Logic and probability are ancient subjects whose unification holds significant potential for the field of artificial intelligence. The BLOG (Bayesian LOGic) language provides a way to write probability models using syntactic and semantic devices from first-order logic. In modern parlance, it is a relational, open-universe probabilistic programming language that allows one to define probability distributions over the entire space of first-order model structures that can be constructed given the constant, function, and predicate symbols of the program. I will describe the language mainly through examples and cover its application to monitoring the Comprehensive Nuclear-Test-Ban Treaty.
Tuesday, October 4th, 2016
Probabilistic model checking is an automatic procedure for establishing if a desired property holds in a probabilistic model, aimed at verifying quantitative probabilistic specifications such as the probability of a critical failure occurring or expected time to termination. Much progress has been made in recent years in algorithms, tools and applications of probabilistic model checking, as exemplified by the probabilistic model checker PRISM (www.prismmodelchecker.org). However, the unstoppable rise of autonomous systems, from robotic assistants to self-driving cars, is placing greater and greater demands on quantitative modelling and verification technologies. To address the challenges of autonomy we need to consider collaborative, competitive and adversarial behaviour, which is naturally modelled using game-theoretic abstractions, enhanced with stochasticity arising from randomisation and uncertainty. This paper gives an overview of quantitative verification and strategy synthesis techniques developed for turn-based stochastic multi-player games, summarising recent advances concerning multi-objective properties and compositional strategy synthesis. The techniques have been implemented in the PRISM-games model checker built as an extension of PRISM.
A transformation mapping a labelled Markov chain to a simple stochastic game is presented. In the resulting simple stochastic game, each vertex corresponds to a pair of states of the labelled Markov chain. The value of a vertex of the simple stochastic game is shown to be equal to the probabilistic bisimilarity distance, a notion due to Desharnais, Gupta, Jagadeesan and Panangaden, of the corresponding pair of states of the labelled Markov chain. Bacci, Bacci, Larsen and Mardare introduced an algorithm to compute the probabilistic bisimilarity distances for a labelled Markov chain. A modification of a basic version of their algorithm for a labelled Markov chain is shown to be the policy iteration algorithm applied to the corresponding simple stochastic game. Furthermore, it is shown that this algorithm takes exponential time in the worst case. Finally, experimental results confirm that it performs better in practice than other algorithms to compute the probabilistic bisimilarity distances. Joint work with Qiyi Tang.
We develop a quantitative analogue of equational reasoning which we call quantitative algebraic reasoning. We define an indexed equality relation of type a =_e b which we think of as saying that “a is approximately equal to b up to an error of e”. The models are universal algebras defined on metric spaces.
We have interesting examples where we have a quantitative equational theory whose free algebras correspond to well known structures. In each case we have finitary and continuous versions. The cases are: Hausdorff metrics from quantitive semilattices; p-Wasserstein metrics (hence also the Kantorovich metric) from barycentric algebras; and the total variation distance from a particular type of barycentric algebras.
This is a joint work with Gordon Plotkin and Prakash Panangaden; preliminary results have been presented at LICS2016.
In this talk, I will give an overview on existing results in the automata/verification community concerning the control of probabilistic systems under partial observation. This includes probabilistic automata (over finite or infinite words), optimal strategies for unbounded horizon objectives in POMDP, and more. I will then list several possible research directions in this area.
We examine the computable content of three key objects in probability theory: the mixing measure in de Finetti's theorem, the Aldous-Hoover-Kallenberg representation of an exchangeable array, and the disintegration map used to form conditional distributions. Joint work with Nathanael Ackerman, Jeremy Avigad, Daniel Roy, and Jason Rute.
Wednesday, October 5th, 2016
Probabilistic Databases (PDBs) extend traditional relational databases by annotating each record with a weight, or a probability. Although PDBs define a very simple probability space, by adding constraints one can represent rich models such as Markov Logic Networks. While in traditional databases query evaluation corresponds to model checking and is always tractable, in probabilistic databases it becomes model counting, a computationally hard problem. Research on probabilistic databases has lead to new approaches to weighted model counting that exploit the structure of the query expression, similar to lifted inference in statistical relational models. For restricted fragments of First Order Logic, a dichotomy theorem holds: for any query in that fragment, weighted model counting is either in polynomial time, or is provably #P-hard, and the complexity can be determined by analyzing the query. I will also describe absolute lower bounds on the runtime of DPLL-based model counting algorithms, which rely on compiling the query into a Decision Diagram, then proving exponential size lower bounds on the size of the Decision Diagram.
In this talk I will discuss a complexity dichotomy for exact query evaluation in probabilistic databases, where each record in the database is an independent probabilistic event. I will show that the data complexity of any relational algebra query, which has no repeating relation symbols and disjunction but may have negation, is either polynomial time or #P-hard, and the tractable queries can be recognised efficiently.
This is joint work with Robert Fink.
Graphical models provide a compelling framework for representing complex high dimensional probability distributions in a compact form. We explore a different type of compact representation based on the Hadamard-Fourier transform. We show that a large class of probabilistic models have a compact Fourier representation. This formal result opens up a new way of approximating complex probability distributions. We demonstrate the advantage of incorporating a Fourier representation into a variable elimination inference strategy. Compared to other state-of-the-art probabilistic inference techniques, we demonstrate significant computational gains in several domains.
This is joint work with Yexiang Xue, Stefano Ermon, Ronan Le Bras, and Carla P. Gomes.
In this talk, I will introduce probabilistic soft logic (PSL), a declarative probabilistic programming language for collective inference from richly structured data. In PSL, logical dependencies are expressed using a collection of weighted rules, expressed in a declarative, datalog-like language. These weighted rules are then interpreted as potential functions in a probabilistic factor graph which we refer to as a hinge-loss Markov random field (HL-MRF). A unique property of HL-MRFs is that maximum a posteriori (MAP) inference is convex; this makes collective inference from richly structured data highly scalable. HL-MRFs unify three different approaches to convex inference: LP approximations for randomized algorithms for solving MAXSAT, local relaxations for probabilistic graphical models, and inference in soft logic. I will show that all three lead to the same inference objective. HL-MRFs typically have richly connected yet sparse dependency structures, and I will describe an inference algorithm that exploits the fine-grained dependency structures and is much more scalable than general-purpose convex optimization approaches.
We consider an agent trying to bring a system to an acceptable state by repeated probabilistic action (stochastic control). Specifically, in each step the agent observes the flaws present in the current state, selects one of them, and addresses it by probabilistically moving to a new state, one where the addressed flaw is most likely absent, but where one or more new flaws may be present. Several recent works on algorithmizations of the Lov\'{a}sz Local Lemma have established sufficient conditions for such an agent to succeed. Motivated by the paradigm of Partially Observable Markov Decision Processes (POMDPs) we study whether such stochastic control is also possible in a noisy environment, where both the process of state-observation and the process of state-evolution are subject to adversarial perturbation (noise). The introduction of noise causes the tools developed for LLL algorithmization to break down since the key LLL ingredient, the sparsity of the causality (dependence) relationship, no longer holds. To overcome this challenge we develop a new analysis where entropy plays a central role, both to measure the rate at which progress towards an acceptable state is made and the rate at which the noise undoes this progress. The end result is a sufficient condition that allows a smooth tradeoff between the intensity of the noise and the amenability of the system, recovering an asymmetric LLL condition in the noiseless case. To our knowledge, this is the first tractability result for a nontrivial class of POMDPs under stochastic memoryless control.
First-order model counting recently emerged as a computational tool for high-level probabilistic reasoning. It is concerned with counting satisfying assignments to sentences in first-order logic and upgrades the successful propositional model counting approaches to probabilistic reasoning. We give an overview of model counting as it is applied in statistical relational learning, probabilistic programming, probabilistic databases, and hybrid reasoning. A short tutorial illustrates the principles behind these solvers. Finally, we show that first-order counting is a fundamentally different problem from the propositional counting techniques that inspired it.
Thursday, October 6th, 2016
Probabilistic programming is, in the abstract, the study of algorithmic processes that represent and transform uncertainty. In practice, there are many probabilistic programming systems that, to varying degrees of generality and efficiency, allow users to characterize states of uncertainty via probability models and update those models in light of data, either exactly or approximately. I will give a survey of the field and characterize some challenges ahead.
If people could communicate with and interactively modify the behavior of AI systems, both people and machines could behave more intelligently. Unfortunately, most AI systems are black boxes designed to solve a single narrowly defined problem, such as chess or face recognition or click prediction, and adjusting their behavior requires deep technical expertise. In this talk, I will describe progress towards more transparent and flexible AI systems capable of augmenting rather than just replacing human intelligence, building on the emerging field of probabilistic programming. Probabilistic programming draws on probability theory, programming languages, and system software to provide concise, expressive languages for modeling and general-purpose inference engines that both humans and machines can use. This talk focuses on BayesDB and Picture, domain-specific probabilistic programming platforms being developed by my research group, aimed at augmenting intelligence in the fields of data science and computer vision, respectively. BayesDB, which is open source and in use by organizations like the Bill & Melinda Gates Foundation and JPMorgan, lets users who lack statistics training understand the probable implications of data by writing queries in a simple, SQL-like language. Picture, a probabilistic language being developed in collaboration with Microsoft, lets users solve hard computer vision problems such as inferring 3D models of faces, human bodies and novel generic objects from single images by writing short (<50 line) computer graphics programs that generate and render random scenes. Unlike bottom-up vision algorithms, Picture programs build on prior knowledge about scene structure and produce complete 3D wireframes that people can manipulate using ordinary graphics software. This talk will also briefly illustrate the fundamentals of probabilistic programming using Venture, an interactive platform suitable for teaching and applications in fields ranging from statistics to robotics, and concludes with a summary of current and future research directions.
Differential Privacy is a notion of privacy for statistical databases that has been studied extensively in recent years. I will give an introduction to Differential Privacy and discuss properties of the underlying measure of distance between distributions, and some implications.
Many modern machine learning applications involve sensitive correlated data, such private information on users connected together into a social network, and measurements of physical activity of a single user across time. However, the current gold standard of privacy in machine learning, differential privacy, cannot adequately address privacy issues in this kind of data. This work looks at a recent generalization of differential privacy, called Pufferfish, that can be used to address privacy in correlated data. The main challenge in applying Pufferfish to correlated data problems is the lack of suitable mechanisms. In this talk, I will present two such mechanisms -- a general mechanism, called the Wasserstein Mechanism, which applies to any Pufferfish framework, and a more computationally efficient one, called the Markov Quilt Mechanism, that exploits structural properties of the correlation model for computational efficiency.
Constrained sampling and counting are two fundamental problems in data analysis. In constrained sampling the task is to sample randomly, subject to a given weighting function, from the set of solutions to a set of given constraints. This problem has numerous applications, including probabilistic reasoning, machine learning, statistical physics, and constrained-random verification. A related problem is that of constrained counting, where the task is to count the total weight, subject to a given weighting function, of the set of solutions of the given constraints. This problem has applications in machine learning, probabilistic reasoning, and planning, among other areas. Both problems can be viewed as aspects of one of the most fundamental problems in artificial intelligence, which is to understand the structure of the solution space of a given set of constraints. This work focuses on the development of new algorithmic techniques for constrained sampling and counting, based on a universal hashing -- a classical algorithmic technique in theoretical computer science. Many of the ideas underlying the proposed approach were go back to the 1980s, but they have never been reduced to practice. Recent progress in Boolean reasoning is enabling us to reduce these algorithmic ideas to practice, and obtain breakthrough results in constrained sampling and counting, providing a new algorithmic toolbox in design verification, machine learning, probabilistic reasoning, and the like. Joint work with Supratik Chakraborty and Kuldeep S. Meel.
There is a long history of interaction between probability and logic. One such context has been the study of countable structures whose relations and functions are assigned probabilities rather than binary truth values; of particular interest are those assignments that do not depend on the labeling of the underlying set. In this talk I will discuss the early work of Gaifman, Scott and Krauss that introduced this line of investigation, and then describe an equivalent setting from descriptive set theory, consisting of a certain space of countable structures with fixed underlying set. The ergodic probability measures on this space that are invariant under permutations of the underlying set provide a natural notion of exchangeable model-theoretic structure: to each such measure there is associated a complete and consistent infinitary theory. Furthermore, any such measure can be seen as a limit of sampling measures derived from a convergent sequence of finite structures. I will discuss these connections, and present some related new results due to Ackerman, Freer, Kruckman and myself.
Friday, October 7th, 2016
We consider a setting known as combinatorial auctions in which self-interested agents have preferences over items or sets of items, and interact with an auction or allocation mechanism that determines what items are given to which agents. We consider the perspective of an outside observer who each day can only see which agents show up and what they get, or perhaps just which agents are satisfied and which are not. Our goal will be from observing a series of such interactions to try to learn the agent preferences and perhaps also the rules of the allocation mechanism.
As an example, consider observing web pages where the agents are advertisers and the winners are those whose ads show up on the given page. Or consider observing the input-output behavior of a server, where the input consists of a set of agents requesting service, and the output is a partition of them into some whose requests are
actually fulfilled and the rest that are not---due to overlap of their resource needs with higher-priority requests. We give learning algorithms for scenarios of this form, for fairly simple but interesting classes of preferences and mechanisms. We also consider a classic Myerson single-item auction setting, where from observing who wins and also being able to participate ourselves we would like to learn agents' valuation distributions.
In examining these problems we will see connections to decision-list learning and to Kaplan-Meier estimators from medical statistics.
This is based on joint work with Yishay Mansour and Jamie Morgenstern.
Probabilistic inference in high-dimensional probabilistic models (i.e., with many variables) is one of the central problems of statistical machine learning and stochastic decision making. To date, only a handful of distinct methods have been developed, most notably (MCMC) sampling, decomposition, and variational methods. In this talk, I will introduce a new approach where random projections are used to simplify a high-dimensional model while preserving some of its key properties. These random projections can be combined with traditional variational inference methods (information projections) and combinatorial optimization tools. These novel randomized approaches provide provable guarantees on the accuracy, and outperform traditional methods in a range of domains.
In this talk we use a simple auction design problem to illustrate a number of approaches to interpolating between worst- and average-case analysis. Specifically, what selling procedure should a seller use to maximize her revenue? This is a hard question because the best auction to run --- like choosing the right opening bid in an eBay auction or the right reserve price in a sponsored search auction --- depends on what bidders are willing to pay. The straightforward worst-case analysis approach fails to produce any meaningful results. The traditional approach in economics is to maximize the seller's expected revenue with respect to a known distribution over what bidders are willing to pay. Two critiques of this "average-case" approach are: (i) it requires detailed knowledge of the input distribution; and (ii) in many cases, the theoretically optimal auction is too complex for direct use. Are there useful "sweet spots" between worst- and average-case analysis, that inherit the robustness of the former model and the strong guarantees of the latter model, and that naturally lead to plausibly implementable solutions? This talk surveys several such interpolations that have proved useful: partial distributional knowledge, in the form of coarse statistics or independent samples; prior-independent auctions, where the goal is to be simultaneously near-optimal for a wide range of distributions; and using appropriate revenue benchmarks to recover meaningful worst-case guarantees.
I will survey some algorithmic models that try to capture uncertainty in optimization problems, talk about some example problems, and indicate some of the techniques and ideas used to tackle the uncertainty in these problems and get provable guarantees on the performance of these algorithms.
Probabilistic programming refers to the idea of using standard programming constructs for specifying probabilistic models and employing generic inference algorithms for answering various queries on these models. Although this idea itself is not new and was, in fact, explored by several programming-language researchers in the early 2000, it is only in the last few years that probabilistic programming has gained a large amount of attention among researchers in machine learning and programming languages, and that serious probabilistic programming languages (such as Anglican, Church, Infer.net, PyMC, Stan, and Venture) started to appear and have been taken up by a nontrivial number of users.
In this talk, I will explain a project that tries to address one of the most important challenges in probabilistic programming: how to build an efficient inference algorithm? The concrete goal of the project is to develop efficient variational inference algorithms that work for any probabilistic models written in an expressive probabilistic programming language such as Anglican and that, at the same time, are able to exploit the structures of these models automatically. I and my colleagues are far from achieving our goal. So far we have tried to develop a black-box variational inference algorithm for general probabilistic programs following Ranganath et al.'s guideline. During the talk, I will explain very preliminary but slightly unexpected results and observations of this project. This is joint work Raphael Monat from ENS Lyon and Yee Whye Teh from Oxford.