Programs
Fall 2013

Theoretical Foundations of Big Data Analysis

Aug. 22 – Dec. 20, 2013

We live in an era of "big data": science, engineering, and technology are producing increasingly large data streams, with petabyte and exabyte scales becoming increasingly common. In scientific fields, such data arise in part because tests of standard theories increasingly focus on extreme physical conditions (e.g., particle physics) and in part because science has become increasingly exploratory (e.g., astronomy and genomics). In commerce, massive data arise because so much of human activity is now online and because business models aim to provide services that are increasingly personalized.

The big data phenomenon presents opportunities and perils. On the optimistic side of the coin, massive data may amplify the inferential power of algorithms that have been shown to be successful on modest-size data sets. The challenge is to develop the theoretical principles needed to scale inference and learning algorithms to massive, even arbitrary, scale. On the pessimistic side of the coin, massive data may amplify the error rates that are part and parcel of any inferential algorithm. The challenge is to control such errors even in the face of the heterogeneity and uncontrolled sampling processes underlying many massive data sets. Another major issue is that big data problems often come with time constraints, where a high-quality answer that is obtained slowly can be less useful than a medium-quality answer that is obtained quickly. Overall, we have a problem in which the classical resources of the theory of computation (e.g., time, space, and energy) trade off in complex ways with the data resource.

Various aspects of this general problem are being faced in the theory of computation, statistics, and related disciplines — where topics such as dimension reduction, distributed optimization, Monte Carlo sampling, compressed sensing, low-rank matrix factorization, streaming, and hardness of approximation are of clear relevance — but the general problem remains untackled. This program brought together experts from these areas with the aim of laying the theoretical foundations of the emerging field of big data.

Organizers:

Michael Jordan (UC Berkeley; chair), Stephen Boyd (Stanford University), Peter Bühlmann (ETH Zürich), Ravi Kannan (Microsoft Research India), Michael Mahoney (International Computer Science Institute and UC Berkeley), Muthu Muthukrishnan (Rutgers University and Microsoft Research India)

Long-Term Participants (including Organizers):

Alexandr Andoni (Microsoft Research), Ivona Bezáková (Rochester Institute of Technology), Peter Bickel (UC Berkeley), Josh Bloom (UC Berkeley), Sébastien Bubeck (Princeton University), Aydın Buluç (Lawrence Berkeley National Laboratory), Emmanuel Candès (Stanford University), Amit Chakrabarti (Dartmouth College), James Demmel (UC Berkeley), Petros Drineas (Rensselaer Polytechnic Institute), Noureddine El Karoui (UC Berkeley), Michael Friedlander (University of British Columbia), David Gleich (Purdue University), Alexander Gray (Georgia Institute of Technology and Skytree, Inc.), Moritz Hardt (IBM Almaden), Dorit Hochbaum (UC Berkeley), Kazuo Iwama (Kyoto University), Michael Jordan (UC Berkeley; chair), Ravi Kannan (Microsoft Research India), Valerie King (University of Victoria), Jian Li (Tsinghua University), Michael Mahoney (International Computer Science Institute and UC Berkeley), Andrew McGregor (University of Massachusetts), Muthu Muthukrishnan (Rutgers University and Microsoft Research India), Jennifer Neville (Purdue University), Robert Nowak (University of Wisconsin-Madison), Ely Porat (Bar-Ilan University), Yuval Rabani (Hebrew University, Jerusalem), Chris Ré (Stanford University), Benjamin Recht (University of Wisconsin-Madison), Peter Richtarik (University of Edinburgh), Richard Samworth (University of Cambridge), Leonard Schulman (California Institute of Technology), Daniel Štefankovič (University of Rochester), Mario Szegedy (Rutgers University), Joel Tropp (California Institute of Technology), David Tse (UC Berkeley), Suresh Venkatasubramanian (University of Utah), Martin Wainwright (UC Berkeley), David Woodruff (IBM Almaden)

Research Fellows:

Leonid Barenboim (Weizmann Institute), Xi Chen (New York University), Martin Jaggi (ETH Zürich), Mladen Kolar (University of Chicago), Yi Li (Max Planck Institute, Saarbrücken), Han Liu (Princeton University), Sang-Yun Oh (Lawrence Berkeley National Laboratory), Eric Price (University of Texas at Austin; Google Research Fellow), Or Sheffet (Harvard University), Nikhil Srivastava (Microsoft Research India), Justin Thaler (Yahoo Labs; Microsoft Research Fellow), Caroline Uhler (IST Austria)

Visiting Graduate Students and Postdocs:

John Duchi (Stanford University), Sagar Kale (Dartmouth College), Arindam Khan (Georgia Institute of Technology), Jakub Konečný (University of Edinburgh), Martin Takáč (University of Edinburgh), Gongguo Tang (Colorado School of Mines), Yixin Xu (Rutgers University)

Workshops

Tuesday, Sep. 3 – Friday, Sep. 6, 2013

Organizers:

Michael Jordan (UC Berkeley)
Monday, Sep. 16 – Thursday, Sep. 19, 2013

Organizers:

Petros Drineas (Rensselaer Polytechnic Institute; chair), Francis Bach (INRIA and École Normale Supérieure Paris), Peter Bühlmann (ETH Zürich), Emmanuel Candès (Stanford University), Piotr Indyk (Massachusetts Institute of Technology), Ravi Kannan (Microsoft Research India), Muthu Muthukrishnan (Rutgers University and Microsoft Research India), Robert Nowak (University of Wisconsin-Madison), Stephen Wright (University of Wisconsin-Madison)
Monday, Oct. 21 – Thursday, Oct. 24, 2013

Organizers:

Michael Mahoney (International Computer Science Institute and UC Berkeley; chair), Guy Blelloch (Carnegie Mellon University), John Gilbert (UC Santa Barbara), Chris Ré (Stanford University), Martin Wainwright (UC Berkeley)
Monday, Nov. 18 – Thursday, Nov. 21, 2013

Organizers:

Michael Kearns (University of Pennsylvania; co-chair), Jennifer Neville (Purdue University; co-chair), Deepak Agarwal (LinkedIn), Edo Airoldi (Harvard University), Ashish Goel (Stanford University), Matt Jackson (Stanford University)
Wednesday, Dec. 11 – Saturday, Dec. 14, 2013

Organizers:

Kunal Talwar (chair), Avrim Blum (Carnegie Mellon University), Kamalika Chaudhuri (UC San Diego), Cynthia Dwork (Harvard University), Michael Jordan (UC Berkeley)
Monday, Dec. 15 – Wednesday, Dec. 17, 2014

Organizers:

Michael Jordan (UC Berkeley)

Program image: "Say Big Oh" by Muthu.

Past Internal Program Activities

Tuesday, December 10th, 12:30 pm – 1:30 pm
Yuval Rabani (Hebrew University of Jerusalem) and Eric Price (Massachusetts Institute of Technology)
Tuesday, December 3rd, 12:30 pm – 1:30 pm
Tuesday, November 26th, 12:30 pm – 1:30 pm
Suresh Venkatasubramanian (University of Utah) and Moritz Hardt (IBM Almaden)
Tuesday, November 12th, 12:30 pm – 1:30 pm
Tuesday, November 5th, 12:30 pm – 1:30 pm
Yash Deshpande (Stanford University)
Tuesday, October 15th, 12:30 pm – 1:30 pm
Dorit Hochbaum (UC Berkeley)
Tuesday, October 8th, 12:30 pm – 1:30 pm
Mary Wootters (University of Michigan) and Ely Porat (Bar-Ilan University)
Tuesday, October 1st, 12:30 pm – 1:30 pm
Justin Thaler (Harvard University) and Nikhil Srivastava (Microsoft Research India)