Monday, November 18th, 2013

9:10 am9:35 am

How do we simplify and understand a rigid combinatorial object (a finite, discrete graph) in terms of an infinite-dimensional limiting object (a function)? What types of limiting behaviors should we expect empirically? What types do we need mathematically, to ensure consistent inference (leading to algorithms with provable properties, and hence to defensible data-analytic conclusions)?

9:35 am10:00 am

Algorithms developed in the last decade to analyze large networks (centrality, neighbourhood functions, distance distributions) use approximate set representations. The ways in which these approximate set representation are used give no theoretical guarantee, yet the computations are extremely precise (when compared with a ground truth) and even outperform the theoretical precision of the set representations themselves. It would be interesting to understand why.

10:30 am10:55 am

I will recap how exchangeability works for graph data and why it is fundamental to the Bayesian approach; this is closely related to the theory of graph limits. I will then discuss why exchangeable models are misspecified for network data, and summarize what we know about the (so far completely open) problem of finding an alternative concept suitable for networks.

10:55 am11:20 am

In this talk I'll provide an overview of the network scale-up method which enables researchers to estimate the size of hidden populations, such as the groups most at-risk for HIV (e.g, drug injectors, sex workers, and men who have sex with men). I'll present a new (and still unpublished) modeling framework for scale-up estimation, and then I'll describe how we operationalized that framework in large scale-up studies in Brazil and Rwanda. I'll close with what I think are the main limitations of our approach and some ideas for how our approach could be improved.

1:40 pm2:05 pm

I’ll review a general probabilistic framework and large n asymptotics for unlabeled random graphs,introduced by B. and Chen(2009)PNAS. I will show, how various standard models ;block, Chung-Lu, and preferential attachment fall into this framework and will review the asymptotic performance of some standard algorithms.If there is time I will discuss some weaknesses of the approach and a possible cure.

2:05 pm2:30 pm

The term 'relational dataset' refers generically to graphs, matrices, networks. I will review a number of examples coming from different communities (machine learning, statistics, computer science) in which it is desirable to find a small structure in such datasets. Some fundamental computational obstructions appear to emerge in each of these domains, and I will discuss their connections.

3:00 pm3:25 pm

Many massive empirical networks fail to partition into a small number of clusters or modules. As such, standard "global" partitioning algorithms are searching for structure that may not exist. Instead, a number of researchers have demonstrated the advantages of local clustering algorithms that find small clusters. This talk provides a statistical framework for local clustering algorithms by (i) showing how sparse and transitive Stochastic Blockmodels have small "local" clusters and (ii) illustrating that triangles in sparse and stochastic networks lead to the blessing of transitivity. This allows for fast local algorithms with statistical guarantees under a semi-parametric Stochastic Blockmodel that makes limited assumptions on the vast majority of the network.

3:25 pm3:50 pm

We discuss new distributed message-passing-based graph mining frameworks based on MapReduce, Pregel, and ASYMP. Our goal is to develop graph mining algorithms for trillions of nodes. Considering both synchronous and asynchronous message-passing frameworks, we discuss algorithmic challenges and ideas from local graph clustering, and map-reduce-based algorithms, and design scalable graph clustering algorithms.

Tuesday, November 19th, 2013

9:10 am9:35 am
Discussion of developing models that are computable, have identifiable parameters, and for which we can derive some asymptotic/scaling results.
9:35 am10:00 am

A fundamental problem in developmental systems biology is to understand driving forces behind organ formation. In this talk, we use 400+ high-throughput TF images from fruitfly embryoic stage 4-6 to relate these TFs in order to shed light into the TF cascades that trigger organ formation processes. We borrow sparse coding ideas from computational neuroscience/computer vision to decompose the images into principal patterns that capture the locations and shapes of  group TFs and compare with costly human annotations.

This is joint work with Siqi Wu, Erwin Frise, and Antony Joseph.

10:30 am10:55 am
Discussion of how to model network formation in ways that account for both choice and chance in tractable ways.
10:55 am11:20 am

I will describe behavioral experiments in which human subjects must create a network over which they will simultaneously play a coordination game, and compare the results to a setting in which the network was specified exogenously. Perhaps surprisingly, the networks created by the subjects seem fundamentally difficult for solving the coordination task.

2:10 pm2:35 pm

We know that social influence or social contagion exists, and one of the most important problems in network science is to actually measure it. Doing so faces a fundamental obstacle: contagion effects are unidentified in the presence of latent homophily, which is to say almost always. Understanding where this identification problem comes from makes it clear why it invalidates almost all published observational studies on contagion, and why none of the obvious fixes work. There seem to be two ideas which aren't yet known to fail: partial identification or bounding, and control via community discovery. Neither of them will get rid of the fundamental problem, but they might help sometimes.

2:35 pm3:00 pm
Many networks are changing in time and our data on them come in the form of samples, that is, we observe only some of the nodes and link relationships of the network. Often the most practical way to obtain our sample observations is by tracing links from sample nodes to add more nodes to the sample, so our sampling is also a dynamic, adaptive process. I would like to discuss some applications, issues, and challenges in doing this.
3:30 pm3:55 pm

An increasing of news is referred through Facebook and Twitter. We use a large browsing dataset to analyze how the substitution of directly navigated news for social news affects political bias and topic selection.

Wednesday, November 20th, 2013

9:10 am9:35 am

The problem data are a graph with labels on a small subset of the vertices. Many methods to fill in the missing values are based on smoothing via graph Laplacians. We show that those are algebraicly equivalent to kriging with a fixed and somewhat arbitrary covariance. On a social network, the same covariance would be used for age, income and gender. It is more reasonable to tune the covariance based on the observed response values, even if those are relatively few. We see improved performance in hold-out experiments.

This is joint work with Ya Xu and Justin Dyer. It is based largely on Ya Xu's dissertation.

9:35 am10:00 am
Despite mounting empirical observations on online cascades, and advances in theoretical models of diffusion over networks, the growth of online cascades remains unpredictable.
10:30 am10:55 am

Although network data offer several opportunities to improve prediction, the characteristics of real world datasets present a number of challenges to accurately incorporate relational information into machine learning algorithms. In this talk, I will discuss the effects of sampling, parameter tying, and model roll-out on the properties of the resulting statistical models--which occurs through a complex interaction between local model properties, global network structure, and the availability of observed attributes. By understanding the impact of these interactions on algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient analysis methods for large, partially-observable social network and social media datasets.

10:55 am11:20 am

One of the classic problems in social science is understanding how self-enforcing norms emerge within large groups. While many different models have been developed to understand these collective behaviors, ranging from linguistic conventions to cultural practices to ideological consensus, studying these dynamics empirically has remained intractable. Using an experimental online environment, we study how the interaction of individual mechanisms with social networks drives the emergence of consensus even in the absence of the forces traditionally argued to be necessary, such as centralized authority, common knowledge, and focal points. Our results inform the effectiveness of various models of collective behavior for understanding this social process.

1:40 pm2:05 pm
Extracting knowledge and providing insights into the complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Networks are an example of simple, yet powerful tools for capturing relationships among entities over time. I will present a line of work that deals with the estimation of high-dimensional dynamic networks from limited amounts of data. In particular, I will present a general framework of flexible semiparametric models, capable of capturing the dynamics of network changes, and efficient estimation procedures tailored to different assumptions on the underlying dynamics. Illustrative applications of exploring complex real worlds systems will be presented as well.
3:25 pm3:50 pm

The humble triangle is well known to be an important feature in large social networks, and it forms the basis of many data analyses and algorithms. Nonetheless, getting accurate triangle counts in a large, especially streamed, graph is a big challenge. I will give some recent advances in this problem for various settings. This is one of those lucky situations where theory and practice can be unified: interesting theoretical ideas for triangle counting have a practical impact.

Thursday, November 21st, 2013

9:10 am9:35 am

I will summarize some architectures that are currently used in practice (distributed streaming, single large memory or flash machines, Hadoop, sharded key-value stores, etc). I will outline some graph algorithms that run well on these architectures, and describe some problems for which no good combination of algorithm and architecture is currently known.

9:35 am10:00 am

Many network analysis questions can be posed as solving a system of linear equations with a stochastic random-walk matrix, e.g. PageRank. Monte Carlo methods often have the best asymptotic complexity for these problems. Yet, those same methods are notoriously slow to find high precision answers. In this talk, we pose some questions on unifying asymptotic complexity with real-world performance.

10:30 am10:55 am

We consider the following computational model on graphs that slowly change over time.  At each time step, the graph incurs a unit update and an algorithm has to periodically probe the graph to learn about the update and modify its solution accordingly.  We discuss algorithms in this model for basic graph questions including path connectivity, minimum spanning trees, and PageRank.

10:55 am11:20 am

Classical Recommender Systems provide serving schemes that display items to users to optimize some user satisfaction metrics. But what happens when such users are connected to each other as in social media platforms like Facebook, LinkedIn and Twitter? Content in such a network is produced by nodes (users) and flows through edges in the graph. This gives rise to challenging and brand new issues when dealing with classical problems like recommending content to users. I will talk about such issues, based on my experience working with feed recommendation problems at LinkedIn.