Abstracts

Unifying Theory and Experiment for Large-Scale Networks

Monday, November 18th, 2013

Overview: Graph Models

9:00 am–9:10 am

Speaker: Michael Kearns, University of Pennsylvania

Limiting Behavior in Large Networks

9:10 am–9:35 am

Speaker: Patrick Wolfe, University College London

How do we simplify and understand a rigid combinatorial object (a finite, discrete graph) in terms of an infinite-dimensional limiting object (a function)? What types of limiting behaviors should we expect empirically? What types do we need mathematically, to ensure consistent inference (leading to algorithms with provable properties, and hence to defensible data-analytic conclusions)?

Practice is Better than Theory: Using Approximate Counting to Mine Large Graphs

9:35 am–10:00 am

Speaker: Sebastiano Vigna, University of Milan

Algorithms developed in the last decade to analyze large networks (centrality, neighbourhood functions, distance distributions) use approximate set representations. The ways in which these approximate set representation are used give no theoretical guarantee, yet the computations are extremely precise (when compared with a ground truth) and even outperform the theoretical precision of the set representations themselves. It would be interesting to understand why.

Bayesian Models for Graph Data and the Open Problem of Invariance in Networks

10:30 am–10:55 am

Speaker: Peter Orbanz, Columbia University

I will recap how exchangeability works for graph data and why it is fundamental to the Bayesian approach; this is closely related to the theory of graph limits. I will then discuss why exchangeable models are misspecified for network data, and summarize what we know about the (so far completely open) problem of finding an alternative concept suitable for networks.

Bayesian Models for Graph Data and the Open Problem of Invariance in Networks (slides)

Know Your Epidemic: But How?

10:55 am–11:20 am

Speaker: Matt Salganik, Princeton University

In this talk I'll provide an overview of the network scale-up method which enables researchers to estimate the size of hidden populations, such as the groups most at-risk for HIV (e.g, drug injectors, sex workers, and men who have sex with men). I'll present a new (and still unpublished) modeling framework for scale-up estimation, and then I'll describe how we operationalized that framework in large scale-up studies in Brazil and Rwanda. I'll close with what I think are the main limitations of our approach and some ideas for how our approach could be improved.

Overview: Clustering

1:30 pm–1:40 pm

Speaker: Jennifer Neville, Purdue University

Asymptotic Analysis of Unlabelled Networks

1:40 pm–2:05 pm

Speaker: Peter Bickel, UC Berkeley

I’ll review a general probabilistic framework and large n asymptotics for unlabeled random graphs,introduced by B. and Chen(2009)PNAS. I will show, how various standard models ;block, Chung-Lu, and preferential attachment fall into this framework and will review the asymptotic performance of some standard algorithms.If there is time I will discuss some weaknesses of the approach and a possible cure.

Finding Small Structures in Large Datasets

2:05 pm–2:30 pm

Speaker: Andrea Montanari, Stanford University

The term 'relational dataset' refers generically to graphs, matrices, networks. I will review a number of examples coming from different communities (machine learning, statistics, computer science) in which it is desirable to find a small structure in such datasets. Some fundamental computational obstructions appear to emerge in each of these domains, and I will discuss their connections.

Local Clustering and the Blessing of Transitivity

3:00 pm–3:25 pm

Speaker: Karl Rohe, University of Wisconsin-Madison

Many massive empirical networks fail to partition into a small number of clusters or modules. As such, standard "global" partitioning algorithms are searching for structure that may not exist. Instead, a number of researchers have demonstrated the advantages of local clustering algorithms that find small clusters. This talk provides a statistical framework for local clustering algorithms by (i) showing how sparse and transitive Stochastic Blockmodels have small "local" clusters and (ii) illustrating that triangles in sparse and stochastic networks lead to the blessing of transitivity. This allows for fast local algorithms with statistical guarantees under a semi-parametric Stochastic Blockmodel that makes limited assumptions on the vast majority of the network.

Local Clustering and the Blessing of Transitivity (slides)

Large-scale Graph Clustering and Message-passing-based Distributed Framework

3:25 pm–3:50 pm

Speaker: Vahab Mirrokni, Google

We discuss new distributed message-passing-based graph mining frameworks based on MapReduce, Pregel, and ASYMP. Our goal is to develop graph mining algorithms for trillions of nodes. Considering both synchronous and asynchronous message-passing frameworks, we discuss algorithmic challenges and ideas from local graph clustering, and map-reduce-based algorithms, and design scalable graph clustering algorithms.

Tuesday, November 19th, 2013

Overview: Network Formation

9:00 am–9:10 am

Speaker: Ashish Goel, Stanford University

Challenges of Modeling Network Formation in Tractable Ways

9:10 am–9:35 am

Speaker: Arun Chandrasekhar, Stanford University

Discussion of developing models that are computable, have identifiable parameters, and for which we can derive some asymptotic/scaling results.

Relating Developmental Transcription Factors (TFs) Based on Fruitfly Embryoic Images

9:35 am–10:00 am

Speaker: Bin Yu, UC Berkeley

A fundamental problem in developmental systems biology is to understand driving forces behind organ formation. In this talk, we use 400+ high-throughput TF images from fruitfly embryoic stage 4-6 to relate these TFs in order to shed light into the TF cascades that trigger organ formation processes. We borrow sparse coding ideas from computational neuroscience/computer vision to decompose the images into principal patterns that capture the locations and shapes of group TFs and compare with costly human annotations.

This is joint work with Siqi Wu, Erwin Frise, and Antony Joseph.

Issues in Developing Estimable Models of Strategic Network Formation

10:30 am–10:55 am

Speaker: Matt Jackson, Stanford University

Discussion of how to model network formation in ways that account for both choice and chance in tractable ways.

Behavioral Experiments in a Network Formation Game

10:55 am–11:20 am

Speaker: Michael Kearns, University of Pennsylvania

I will describe behavioral experiments in which human subjects must create a network over which they will simultaneously play a coordination game, and compare the results to a setting in which the network was specified exogenously. Perhaps surprisingly, the networks created by the subjects seem fundamentally difficult for solving the coordination task.

Overview: Causality/Experiments

2:00 pm–2:10 pm

Speaker: Matt Jackson, Stanford University

Are Observational Studies of Social Contagion Doomed?

2:10 pm–2:35 pm

Speaker: Cosma Shalizi, Carnegie Mellon University

We know that social influence or social contagion exists, and one of the most important problems in network science is to actually measure it. Doing so faces a fundamental obstacle: contagion effects are unidentified in the presence of latent homophily, which is to say almost always. Understanding where this identification problem comes from makes it clear why it invalidates almost all published observational studies on contagion, and why none of the obvious fixes work. There seem to be two ideas which aren't yet known to fail: partial identification or bounding, and control via community discovery. Neither of them will get rid of the fundamental problem, but they might help sometimes.

On Sampling in Dynamic Networks for Inference, Experiments, and Interventions

2:35 pm–3:00 pm

Speaker: Steve Thompson, Simon Fraser University

Many networks are changing in time and our data on them come in the form of samples, that is, we observe only some of the nodes and link relationships of the network. Often the most practical way to obtain our sample observations is by tracing links from sample nodes to add more nodes to the sample, so our sampling is also a dynamic, adaptive process. I would like to discuss some applications, issues, and challenges in doing this.

The Effect of Social Media on News Consumption

3:30 pm–3:55 pm

Speaker: Markus Mobius, Microsoft Research New England

An increasing of news is referred through Facebook and Twitter. We use a large browsing dataset to analyze how the substitution of directly navigated news for social news affects political bias and topic selection.

Wednesday, November 20th, 2013

Overview: Label Prediction/Diffusion

9:00 am–9:10 am

Speaker: Michael Kearns, University of Pennsylvania

Semi-supervised Learning on Graphs, Using Observed Correlation Structure

9:10 am–9:35 am

Speaker: Art Owen, Stanford University

The problem data are a graph with labels on a small subset of the vertices. Many methods to fill in the missing values are based on smoothing via graph Laplacians. We show that those are algebraicly equivalent to kriging with a fixed and somewhat arbitrary covariance. On a social network, the same covariance would be used for age, income and gender. It is more reasonable to tune the covariance based on the observed response values, even if those are relatively few. We see improved performance in hold-out experiments.

This is joint work with Ya Xu and Justin Dyer. It is based largely on Ya Xu's dissertation.

How Predictable are Online Cascades

9:35 am–10:00 am

Speaker: Lada Adamic, Facebook

Despite mounting empirical observations on online cascades, and advances in theoretical models of diffusion over networks, the growth of online cascades remains unpredictable.

The Impact of Network Structure on Relational Learning and Inference

10:30 am–10:55 am

Speaker: Jennifer Neville, Purdue University

Although network data offer several opportunities to improve prediction, the characteristics of real world datasets present a number of challenges to accurately incorporate relational information into machine learning algorithms. In this talk, I will discuss the effects of sampling, parameter tying, and model roll-out on the properties of the resulting statistical models--which occurs through a complex interaction between local model properties, global network structure, and the availability of observed attributes. By understanding the impact of these interactions on algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient analysis methods for large, partially-observable social network and social media datasets.

The Emergence of Convention: An Experimental Study of Social Order

10:55 am–11:20 am

Speaker: Damon Centola, Massachusetts Institute of Technology

One of the classic problems in social science is understanding how self-enforcing norms emerge within large groups. While many different models have been developed to understand these collective behaviors, ranging from linguistic conventions to cultural practices to ideological consensus, studying these dynamics empirically has remained intractable. Using an experimental online environment, we study how the interaction of individual mechanisms with social networks drives the emergence of consensus even in the absence of the forces traditionally argued to be necessary, such as centralized authority, common knowledge, and focal points. Our results inform the effectiveness of various models of collective behavior for understanding this social process.

The Emergence of Convention: An Experimental Study of Social Order (slides)

Overview: Networks Over Time

1:30 pm–1:40 pm

Speaker: Patrick Wolfe, University College London

Estimating Time-Varying Networks

1:40 pm–2:05 pm

Speaker: Mladen Kolar, Carnegie Mellon University

Extracting knowledge and providing insights into the complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Networks are an example of simple, yet powerful tools for capturing relationships among entities over time. I will present a line of work that deals with the estimation of high-dimensional dynamic networks from limited amounts of data. In particular, I will present a general framework of flexible semiparametric models, capable of capturing the dynamics of network changes, and efficient estimation procedures tailored to different assumptions on the underlying dynamics. Illustrative applications of exploring complex real worlds systems will be presented as well.

Estimating Time-Varying Networks (slides)

Pre-Processing of Dynamic Networks: Its Impact on Analytics

2:05 pm–2:30 pm

Speaker: Sofus Macskássy, Facebook

Pre-Processing of Dynamic Networks: Its Impact on Analytics (slides)

Graph Streams and Sketches: What Can and Can't Be Done

3:00 pm–3:25 pm

Speaker: Andrew McGregor, University of Massachusetts Amherst

Graph Streams and Sketches: What Can and Can't Be Done (slides)

The Trials and Tribulations of Tractably Counting Triangles

3:25 pm–3:50 pm

Speaker: C. Seshadhri, Sandia National Laboratories

The humble triangle is well known to be an important feature in large social networks, and it forms the basis of many data analyses and algorithms. Nonetheless, getting accurate triangle counts in a large, especially streamed, graph is a big challenge. I will give some recent advances in this problem for various settings. This is one of those lucky situations where theory and practice can be unified: interesting theoretical ideas for triangle counting have a practical impact.

The Trials and Tribulations of Tractably Counting Triangles (slides)

Thursday, November 21st, 2013

Overview: Scalability/Practical Challenges

9:00 am–9:10 am

Speaker: Deepak Agarwal, LinkedIn

Graph Algorithms for Modern Data Models

9:10 am–9:35 am

Speaker: Ashish Goel, Stanford University

I will summarize some architectures that are currently used in practice (distributed streaming, single large memory or flash machines, Hadoop, sharded key-value stores, etc). I will outline some graph algorithms that run well on these architectures, and describe some problems for which no good combination of algorithm and architecture is currently known.

On Unifying Asymptotic Complexity with Real-world Performance in Matrix-based Network Computations

9:35 am–10:00 am

Speaker: David Gleich, Purdue University

Many network analysis questions can be posed as solving a system of linear equations with a stochastic random-walk matrix, e.g. PageRank. Monte Carlo methods often have the best asymptotic complexity for these problems. Yet, those same methods are notoriously slow to find high precision answers. In this talk, we pose some questions on unifying asymptotic complexity with real-world performance.

On Unifying Asymptotic Complexity with Real-world Performance in Matrix-based Network Computations (slides)

Evolving Graphs

10:30 am–10:55 am

Speaker: Ravi Kumar, Google

We consider the following computational model on graphs that slowly change over time. At each time step, the graph incurs a unit update and an algorithm has to periodically probe the graph to learn about the update and modify its solution accordingly. We discuss algorithms in this model for basic graph questions including path connectivity, minimum spanning trees, and PageRank.

Theoretical and Practical Challenges of Recommending Content to Users in a Social Network

10:55 am–11:20 am

Speaker: Deepak Agarwal, LinkedIn

Classical Recommender Systems provide serving schemes that display items to users to optimize some user satisfaction metrics. But what happens when such users are connected to each other as in social media platforms like Facebook, LinkedIn and Twitter? Content in such a network is produced by nodes (users) and flows through edges in the graph. This gives rise to challenging and brand new issues when dealing with classical problems like recommending content to users. I will talk about such issues, based on my experience working with feed recommendation problems at LinkedIn.