Program
Fall 2018
Foundations of Data Science
Aug. 15–Dec. 14, 2018
Data arising from experimental, observational, and simulational processes in the natural and social sciences, as well as in industrial applications and other domains, have created enormous opportunities for understanding the world we live in. The pursuit of such understanding requires the development of systems and techniques for processing and analyzing data, falling under the general term "data science." Data science is a blend of old and new. Much of the old involves ideas and techniques that have been developed in existing methodological and application domains, and much of the new is being developed in response to new technologies that create enormous quantities of data.
This program brought together researchers working on algorithmic, mathematical, and statistical aspects of modern data science, with the aim of identifying a set of core techniques and principles that form a foundation for the subject. While the foundations of data science lie at the intersection of computer science, statistics, and applied mathematics, each of these disciplines in turn developed in response to particular long-standing problems. Building a foundation for modern data science requires rethinking not only how those three research areas interact with data, implementations, and applications, but also how each of the areas interacts with the others. For example, differing applications in computer science and scientific computing have led to different formalizations of appropriate models, questions to consider, computational environments (such as single machine vs. distributed data centers vs. supercomputers), and so on. Similarly, business, Internet, and social media applications tend to have certain design requirements and generate certain types of questions, and these tend to be very different from those that arise in scientific and medical applications. Despite these differences, the areas also share many similarities. Developing the theoretical foundations of data science requires paying appropriate attention to the questions and issues of domain scientists who generate and use the data, as well as to the computational environments and platforms supporting this work.
Our emphasis was on such topics as dimensionality reduction, randomized numerical linear algebra, optimization, probability in high dimensions, sparse recovery, statistics (including inference and causality), and streaming and sublinear algorithms, as well as a variety of application areas that can benefit from these fields and other techniques for processing massive data sets. Each of these related areas has received attention from a diverse set of research communities, and an important goal for us was to explore and strengthen connections between methods and problems in these areas, discover new perspectives on old problems, and foster interactions among different research communities that address similar problems from quite different perspectives.
This program was supported in part by the Kavli Foundation and the Patrick J. McGovern Foundation.
Organizers:
David Woodruff
(Carnegie Mellon University; chair),
Ken Clarkson
(IBM Almaden),
Ravi Kannan
(Microsoft Research India),
Michael Mahoney
(International Computer Science Institute and UC Berkeley),
Andrea Montanari
(Stanford University),
Santosh Vempala
(Georgia Institute of Technology),
Rachel Ward
(University of Texas at Austin)
Long-Term Participants (including Organizers):
Ery Arias-Castro
(UC San Diego),
Laura Balzano
(University of Michigan),
Peter Bartlett
(Simons Institute, UC Berkeley),
Shai Ben-David
(University of Waterloo),
Peter Bickel
(UC Berkeley),
Vladimir Braverman
(Johns Hopkins University),
Amit Chakrabarti
(Dartmouth College),
Ken Clarkson
(IBM Almaden),
Artur Czumaj
(University of Warwick),
Luc Devroye
(McGill University),
Ilias Diakonikolas
(University of Southern California),
Maryam Fazel
(University of Washington),
Aditya Guntuboyina
(UC Berkeley),
Anupam Gupta
(Carnegie Mellon University),
Mohammad Hajiaghayi
(University of Maryland),
Adel Javanmard
(University of Southern California),
T.S. Jayram
(IBM Almaden),
Jiantao Jiao
(UC Berkeley),
Brendan Juba
(Washington University in St. Louis),
Ravi Kannan
(Microsoft Research India),
Michael Kapralov
(École Polytechnique Fédérale de Lausanne),
Robert Krauthgamer
(Weizmann Institute),
Mike Luby
(Qualcomm Inc),
Gábor Lugosi
(Pompeu Fabra University),
Michael Mahoney
(International Computer Science Institute and UC Berkeley),
Yury Makarychev
(Toyota Technological Institute at Chicago),
Alan Malek
(Massachusetts Institute of Technology),
Dustin Mixon
(Ohio State University),
Andrea Montanari
(Stanford University),
Sayan Mukherjee
(Duke University),
Boaz Nadler
(Weizmann Institute),
Deanna Needell
(UCLA),
Rasmus Pagh
(IT University of Copenhagen),
Jeffrey Phillips
(University of Utah),
Eric Price
(University of Texas at Austin),
Sofya Raskhodnikova
(Boston University),
Fred Roosta
(University of Queensland),
Barna Saha
(University of Massachusetts, Amherst),
Sujay Sanghavi
(University of Texas at Austin),
Michael Saunders
(Stanford University),
Nikhil Srivastava
(UC Berkeley),
Madeleine Udell
(Cornell University),
Santosh Vempala
(Georgia Institute of Technology),
Martin Wainwright
(UC Berkeley),
Bei Wang
(University of Utah),
Rachel Ward
(University of Texas at Austin),
David Woodruff
(Carnegie Mellon University; chair),
Richard Zemel
(University of Toronto)
Research Fellows:
Michal Derezinski
(UC Santa Cruz; Patrick J. McGovern Research Fellow),
Jelena Diakonikolas
(Boston University; Microsoft Research Fellow),
Amir Gholaminejad
(UC Berkeley),
Wooseok Ha
(UC Berkeley),
Gautam Kamath
(Massachusetts Institute of Technology; Microsoft Research Fellow),
Rajiv Khanna
(University of Texas at Austin; Patrick J. McGovern Research Fellow),
Francois Lanusse
(UC Berkeley),
Jerry Li
(Massachusetts Institute of Technology; VMware Research Fellow),
Yian Ma
(UC Berkeley),
Marco Mondelli
(Stanford University; Patrick J. McGovern Research Fellow),
Yan Shuo Tan
(University of Michigan; Patrick J. McGovern Research Fellow)
Visiting Graduate Students and Postdocs:
Ainesh Bakshi
(Carnegie Mellon University),
Soheil Behnezhad
(University of Maryland),
Kush Bhatia
(UC Berkeley),
Xiang Cheng
(UC Berkeley),
Mahsa Derakhshan
(University of Maryland),
Charlie Dickens
(University of Warwick),
Simon Du
(Carnegie Mellon University),
Raaz Dwivedi
(UC Berkeley),
Melih Elibol
(UC Berkeley),
Alireza Farhadi
(University of Maryland),
Avishek Ghosh
(UC Berkeley),
Vipul Gupta
(UC Berkeley),
Sam Hopkins
(UC Berkeley),
Rajesh Jayaram
(Carnegie Mellon University),
Chi Jin
(UC Berkeley),
Swanand Kadhe
(UC Berkeley),
John Kallaugher
(University of Texas at Austin),
Jason Li
(Carnegie Mellon University),
Lydia T. Liu
(UC Berkeley),
Sourabh Pradeep Palande
(University of Utah),
Juan Perdomo
(UC Berkeley),
Hamed Saleh
(University of Maryland),
Saeed Seddighin
(University of Maryland),
Fei Shi
(Carnegie Mellon University),
Zhao Song
(University of Texas at Austin),
Anastasia Voloshinov
(University of Southern California),
Ruosong Wang
(Carnegie Mellon University),
Shirley Wu
(University of Texas at Austin),
Hongyang Zhang
(Carnegie Mellon University),
Banghua Zhu
(UC Berkeley)
Workshops
Monday, Aug. 27–Friday, Aug. 31, 2018
Organizers:
David Woodruff
(Carnegie Mellon University; chair),
Ken Clarkson
(IBM Almaden),
Ravi Kannan
(Microsoft Research India),
Michael Mahoney
(International Computer Science Institute and UC Berkeley),
Andrea Montanari
(Stanford University),
Santosh Vempala
(Georgia Institute of Technology),
Rachel Ward
(University of Texas at Austin)
Monday, Sep. 24–Thursday, Sep. 27, 2018
Organizers:
Petros Drineas
(Purdue University; chair),
Ken Clarkson
(IBM Almaden),
Prateek Jain
(Microsoft Research India),
Michael Mahoney
(International Computer Science Institute and UC Berkeley)
Monday, Oct. 29–Friday, Nov. 2, 2018
Organizers:
Andrea Montanari
(Stanford University; chair),
Emmanuel Candès
(Stanford University),
Ilias Diakonikolas
(University of Southern California),
Santosh Vempala
(Georgia Institute of Technology)
Tuesday, Nov. 27–Friday, Nov. 30, 2018
Organizers:
Robert Krauthgamer
(Weizmann Institute; chair),
Artur Czumaj
(University of Warwick),
Aarti Singh
(Carnegie Mellon University),
Rachel Ward
(University of Texas at Austin)
Monday, Dec. 16–Thursday, Dec. 19, 2019
Organizers:
David Woodruff
(Carnegie Mellon University; chair),
Ken Clarkson
(IBM Almaden),
Ravi Kannan
(Microsoft Research India),
Michael Mahoney
(International Computer Science Institute and UC Berkeley),
Andrea Montanari
(Stanford University),
Santosh Vempala
(Georgia Institute of Technology),
Rachel Ward
(University of Texas at Austin)
Thursday, Jan. 6–Friday, Jan. 7, 2022
Program image by Luisa Lee
Past Internal Program Activities
Friday, December 7th, 12:30 pm–2:00 pm
Wednesday, December 5th, 2:00 pm–3:30 pm
Friday, November 30th, 10:30 am–12:00 pm
Wednesday, November 28th, 2:00 pm–3:30 pm
Tuesday, November 27th, 2:00 pm–3:30 pm
Monday, November 26th, 11:00 am–12:30 pm
Bei Wang
Friday, November 23rd, 10:30 am–12:00 pm
Wednesday, November 21st, 2:00 pm–3:30 pm
Tuesday, November 20th, 2:00 pm–3:30 pm
Monday, November 19th, 11:00 am–12:30 pm
Brendan Juba
Friday, November 16th, 12:30 pm–2:00 pm
Friday, November 16th, 10:30 am–12:00 pm
Wednesday, November 14th, 2:00 pm–3:30 pm
Tuesday, November 13th, 4:00 pm–5:00 pm
Jiantao Jiao
Tuesday, November 13th, 2:00 pm–3:30 pm
Friday, November 9th, 12:30 pm–2:00 pm
Friday, November 9th, 10:30 am–12:00 pm
Wednesday, November 7th, 2:00 pm–3:30 pm
Tuesday, November 6th, 2:00 pm–3:30 pm
Monday, October 15th, 11:00 am–12:30 pm
Shai Ben-David
Monday, October 1st, 11:00 am–12:30 pm
Barna Saha
Monday, September 17th, 11:00 am–12:30 pm
Adel Javanmard, USC
Wednesday, September 12th, 2:00 pm–3:30 pm
Monday, September 10th, 11:00 am–12:30 pm
Robert Krauthgamer, Weizmann Institute of Science
Wednesday, September 5th, 2:00 pm–3:30 pm