Monday, February 26th, 2018

9:30 am – 10:30 am

The Large Synoptic Survey Telescope (LSST) is a large, 8-meter, ground-based telescope that will survey half the sky every few nights in six optical bands. Starting in 2022, it will explore a wide range of astrophysical questions, ranging from discovering "killer" asteroids to examining the nature of dark energy.

The LSST will generate up to 10 million real-time notifications of changes in the sky each night (due either to a change in flux or to motion), around a 10x increase over expected pre-LSST volumes. In this talk, I will give an overview of the LSST data stream, describe the current implementation plans for the processing system, and discuss challenges and opportunities in making high-volume data streams broadly useful to the astronomical community.
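
To make the stream's scale concrete, here is a minimal consumer-side sketch of the kind of filter a science group might run over a night's alerts; the alert fields, thresholds, and usage are hypothetical placeholders, not the LSST alert schema or API.

    # Hypothetical consumer-side filter over a nightly alert stream (not the LSST alert schema).
    from dataclasses import dataclass
    from typing import Iterable, Iterator

    @dataclass
    class Alert:
        object_id: str
        ra: float          # right ascension, degrees
        dec: float         # declination, degrees
        delta_flux: float  # flux change relative to the reference image, in sigma

    def is_interesting(alert: Alert, min_significance: float = 5.0) -> bool:
        # Keep only alerts with a significant brightness change.
        return abs(alert.delta_flux) >= min_significance

    def filter_stream(alerts: Iterable[Alert]) -> Iterator[Alert]:
        # Lazily scan up to ~10 million alerts per night without holding them all in memory.
        return (a for a in alerts if is_interesting(a))

    alerts = [Alert("obj1", 150.1, -30.2, 7.2), Alert("obj2", 151.0, -29.8, 1.1)]
    print([a.object_id for a in filter_stream(alerts)])   # -> ['obj1']

In practice, selections like this are expected to run in downstream community alert brokers rather than against the raw stream.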

11:00 am – 12:00 pm

I describe efforts to apply statistical machine learning to large-scale astronomy datasets in both batch and streaming modes. For the past decade, feature-engineering-based approaches applied to the discovery of supernovae and the characterization of tens of thousands of variable stars led the way to novel astronomical inference. Here I will show that new auto-encoder recurrent neural network architectures, without hand-crafted features, rival those traditional methods. Autonomous discovery and inference are part of a larger worldwide effort to federate precious (and heterogeneous) follow-up resources to maximize our collective scientific returns.
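
As a rough illustration of the kind of architecture described (not the speaker's specific model), a minimal recurrent autoencoder for light-curve sequences might look like the PyTorch sketch below; the layer sizes and the two-channel (time-gap, flux) input encoding are assumptions made here for illustration.

    # Minimal recurrent autoencoder over light-curve sequences; illustrative only.
    import torch
    import torch.nn as nn

    class LightCurveAutoencoder(nn.Module):
        def __init__(self, hidden_size: int = 32):
            super().__init__()
            # Each time step carries (time since previous observation, flux).
            self.encoder = nn.GRU(input_size=2, hidden_size=hidden_size, batch_first=True)
            self.decoder = nn.GRU(input_size=2, hidden_size=hidden_size, batch_first=True)
            self.readout = nn.Linear(hidden_size, 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            _, h = self.encoder(x)        # h is the learned, fixed-size summary of the light curve
            out, _ = self.decoder(x, h)   # reconstruct the sequence from that summary
            return self.readout(out)

    model = LightCurveAutoencoder()
    batch = torch.randn(8, 50, 2)         # 8 light curves, 50 observations each
    loss = nn.functional.mse_loss(model(batch), batch)
    loss.backward()

The learned summary vector plays the role that hand-crafted features played in earlier pipelines, for example as input to a downstream classifier.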

1:30 pm – 2:30 pm

After the ground-breaking detection of gravitational waves (GWs) from merging binary black holes (BHs) during Advanced LIGO's first observing run, the field of GW astronomy made the big leagues with the dazzling discovery of the binary neutron star (NS) merger GW170817. In this talk, I will review what we have learned from recent GW detections, what questions remain open, and what the prospects are for future multi-messenger studies of the transient sky.

Tuesday, February 27th, 2018

9:30 am – 10:30 am

A novel algorithm and implementation for real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example ignition kernels in combustion and tumor cells in medical images. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended features, and tracking the movement of features through overlap in space. Through our extensive work on parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a 30 GB set of fusion simulation data, we observed linear speedup on 1,024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.
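
A small serial sketch of the three-step decomposition may help; the real system parallelizes these steps across many compute nodes, and the threshold and connectivity choices below are illustrative assumptions, not the authors' implementation.

    # Serial sketch of the three steps: mark feature cells, group them, track by spatial overlap.
    import numpy as np
    from scipy import ndimage

    def find_blobs(frame: np.ndarray, threshold: float) -> np.ndarray:
        feature_cells = frame > threshold          # step 1: local identification of feature cells
        labels, _ = ndimage.label(feature_cells)   # step 2: group connected cells into extended features
        return labels

    def track_by_overlap(prev_labels: np.ndarray, curr_labels: np.ndarray) -> dict:
        # Step 3: link each current feature to the previous feature it overlaps most.
        links = {}
        for blob_id in np.unique(curr_labels):
            if blob_id == 0:
                continue
            overlap = prev_labels[curr_labels == blob_id]
            overlap = overlap[overlap > 0]
            links[int(blob_id)] = int(np.bincount(overlap).argmax()) if overlap.size else None
        return links

    prev = find_blobs(np.random.rand(64, 64), threshold=0.95)
    curr = find_blobs(np.random.rand(64, 64), threshold=0.95)
    print(track_by_overlap(prev, curr))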

11:00 am – 12:00 pm

DOE's Office of Basic Energy Sciences (BES) operates scientific user facilities serving tens of thousands of researchers each year, including light sources, neutron sources, and nano-centers. These national facilities provide scientists access to the most advanced accelerators, the fastest detectors, and the state-of-the-art experimental techniques necessary to address and solve the most important problems in materials science, biology, chemistry, energy sciences, and beyond. As these experimental resources become faster, more automated, and increasingly focused on in-situ and in-operando experiments, data volume, velocity, and complexity are increasing exponentially as well. SPOT Suite is an ecosystem of cooperating tools that provides data management, metadata tools, real-time data processing, and web-based visualization for DOE researchers using BES's light source user facilities.

1:30 pm – 2:30 pm
Speaker: Amedeo Perazzo, SLAC National Accelerator Laboratory

This presentation describes the data acquisition, data flow, data reduction, data management and data analysis systems for the upgrade of the Linac Coherent Light Source, LCLS-II, which will start operations at the SLAC National Accelerator Laboratory in 2020. These systems face formidable challenges due to the extremely high data throughput generated by the detectors and to the intensive computational demand for data processing and scientific interpretation.

3:00 pm – 4:00 pm

Around 2030, the High-Luminosity Large Hadron Collider (HL-LHC) experiments will face multiple order-of-magnitude increases in data volume and rate. The high computational complexity of traditional pattern recognition algorithms will limit the experiments' ability to track charged particles in real time and consequently their capacity to discover New Physics phenomena. We discuss the essence and potential of Quantum Associative Memory (QuAM) - a quantum alternative to the state-of-the-art associative memory systems used at the LHC - in the context of the operational requirements of LHC real-time charged-track pattern recognition. We examine the quantum circuits implementing the QuAM storage and recall protocols, theoretical probability bounds, practical limits on QuAM storage capacity, and the difficulties of implementing QuAM on the latest IBM Quantum Experience processors.

Wednesday, February 28th, 2018

9:30 am – 10:30 am

The Large Hadron Collider provides its experiments with an unprecedented amount of data that is impossible to record in full. The trigger systems of each experiment decide whether to retain the data for further analysis, on a timescale of the order of milliseconds. This talk contributes ideas for discussion of real-time decision making in particle physics (focusing on the ATLAS, CMS and LHCb experiments). It also presents physics cases requiring novel data acquisition techniques to make the most of LHC data, such as performing measurements or searches with physics objects reconstructed and analyzed directly within the trigger system.

11:00 am – 12:00 pm

Extreme data rates at the Large Hadron Collider (LHC) provide a unique opportunity for advancing fundamental physics but are also a grand challenge for online and offline algorithms. After briefly describing recent advances for online analysis at the LHC, I will focus on new techniques for accelerating data analysis offline. Fast algorithms are becoming increasingly important offline as the size and complexity of our data continue to grow under increasingly constrained computing resources. As an example, I will show how generative adversarial neural networks hold great promise for reducing the time to perform inference by orders of magnitude.
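
For readers unfamiliar with the technique, here is a toy generator-versus-discriminator training step; the network sizes and random "events" are placeholders for illustration, not the physics-simulation models referenced in the talk.

    # Toy sketch of one GAN training step; the networks and data are placeholders.
    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 64
    G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    real = torch.randn(32, data_dim)   # stand-in for expensively simulated events

    # Discriminator step: learn to separate real events from generated ones.
    fake = G(torch.randn(32, latent_dim)).detach()
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: learn to fool the discriminator; once trained, sampling G
    # replaces the slow simulator, which is where the speed-up comes from.
    fake = G(torch.randn(32, latent_dim))
    loss_g = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()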

1:30 pm – 2:30 pm

The NOAA Warn-on-Forecast (WoF) program is developing thunderstorm-scale (1-3-km grid spacing) numerical weather prediction (NWP) systems that will be used by National Weather Service (NWS) forecasters to improve 0-3-h forecasts of tornadoes, large hail, damaging wind, and flash flooding. Due to the inherent limitations of atmospheric observations and current NWP models, storm-scale forecasts are characterized by large uncertainty.  Storm-scale NWP systems therefore use an ensemble of O(10) members to provide probabilistic forecast information.  Ensemble analyses of storms and the larger-scale environment are generated by assimilating radar, satellite, surface, and upper-air observations every 5-15 min into the three-dimensional model system state (e.g., wind, temperature, humidity, precipitation). Data assimilation at these scales typically uses one of two approaches. The most common is the ensemble Kalman filter (EnKF), which uses statistics from the (non-linear) ensemble forecasts to estimate the covariances between observations and the system state vector to update the analysis. Another common approach uses a hybrid framework that leverages both the flow-dependent covariances provided by the EnKF and the dynamical constraints (e.g., mass conservation) provided by (fixed-covariance) variational approaches. The resulting ensemble of analyses characterizes the uncertainty in the initial atmospheric state. Varying the physics parameterization schemes among the ensemble members additionally accounts for the uncertainty in the model.  Thus, the ensemble forecast initialized from the ensemble analysis ideally provides a representative sampling of the probability distribution function of the future state of the atmosphere.
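
To make the covariance-based update concrete, here is a minimal stochastic (perturbed-observation) EnKF analysis step with a linear observation operator; this is a textbook simplification for illustration, not the operational WoF configuration.

    # Minimal stochastic EnKF analysis step (perturbed observations, linear H).
    import numpy as np

    def enkf_update(X: np.ndarray, H: np.ndarray, y: np.ndarray, r: float) -> np.ndarray:
        """X: (n_state, n_members) forecast ensemble; H: (n_obs, n_state); y: (n_obs,) observations."""
        n_obs, n_mem = H.shape[0], X.shape[1]
        Xp = X - X.mean(axis=1, keepdims=True)               # ensemble perturbations
        HXp = H @ Xp
        P_xy = Xp @ HXp.T / (n_mem - 1)                      # state-observation covariance from the ensemble
        P_yy = HXp @ HXp.T / (n_mem - 1) + r * np.eye(n_obs) # observation-space covariance + obs error
        K = P_xy @ np.linalg.inv(P_yy)                       # Kalman gain
        Y = y[:, None] + np.sqrt(r) * np.random.randn(n_obs, n_mem)  # perturbed observations
        return X + K @ (Y - H @ X)                           # analysis ensemble

    # Toy usage: 3 state variables, 20 members, one observation of the first variable.
    X = np.random.randn(3, 20)
    H = np.array([[1.0, 0.0, 0.0]])
    Xa = enkf_update(X, H, y=np.array([0.5]), r=0.1)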


The ensemble prediction framework presents significant challenges, however. First, large forecast errors still frequently occur, and these arise partly from computational limitations imposed by the real-time requirement. For example, the NWP models cannot be run at arbitrarily fine resolution if forecasts are to be available well in advance of the event; ensemble membership sizes are likewise limited. This has motivated a series of predictability studies to evaluate the impacts of different forecast error sources. These studies will provide valuable guidance for optimizing the design of WoF systems. In addition, initial efforts are underway to apply machine learning techniques (e.g., random forests) to ensemble forecast output to mitigate the impacts of model errors.
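
A minimal sketch of the post-processing idea mentioned above: a random forest trained on ensemble-derived features to predict whether a severe event occurs. The features, labels, and thresholds are synthetic placeholders rather than the actual WoF training data.

    # Random-forest post-processing of ensemble forecast output (synthetic placeholder data).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    # Example features per grid point: ensemble-mean updraft speed, ensemble spread,
    # ensemble probability of severe-level reflectivity (all synthetic here).
    X_train = rng.random((5000, 3))
    y_train = (X_train[:, 0] + 0.5 * rng.random(5000) > 0.9).astype(int)  # synthetic "event observed"

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    event_prob = model.predict_proba(rng.random((10, 3)))[:, 1]  # corrected event probabilities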


A second major challenge is the massive data volume generated by storm-scale ensembles. Each forecast typically comprises O(10) ensemble members, O(10) prognostic and diagnostic variables, O(1,000,000) grid points, and O(10) forecast output times. Due to the rapid evolution of severe thunderstorms, forecasters do not have time to thoroughly interrogate all of this output. This requires that ensemble forecasts be post-processed and distilled into the most operationally relevant severe thunderstorm guidance. Using output from a prototype WoF prediction system that is run each spring, physical and social scientists have begun working closely with NWS forecasters to learn how to tailor the forecast guidance to their needs. The guidance is made available in real time on a public web page, which poses an additional computational challenge.
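
Taking the stated orders of magnitude at face value, and assuming 4 bytes per stored value (an assumption, not a figure from the talk), a quick back-of-the-envelope calculation shows why this output cannot be browsed raw:

    # Back-of-the-envelope output volume per forecast cycle (4 bytes/value is assumed).
    members, variables, grid_points, output_times = 10, 10, 1_000_000, 10
    values = members * variables * grid_points * output_times    # ~1e9 values
    print(f"{values:.1e} values, roughly {values * 4 / 1e9:.0f} GB per forecast")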

Thursday, March 1st, 2018

9:30 am – 10:30 am

Three of the five most congested cities in the world are in the US, at a collective economic cost of $62bn for those three cities alone. Designing transportation solutions for real-world, urban-scale systems has previously been accomplished with very limited analytics because of the computational scale. New large-scale computational capabilities (e.g., cloud computing and supercomputing), data analytics (e.g., machine learning and intelligent data compression), and modeling (e.g., dynamic traffic assignment and agent-based modeling) that scale in both time and space are now possible. By combining massive amounts of data from real-world sensors with very large road network models, both closed-form analytics and emergent behaviors from large-scale agent models can be used to build our understanding of urban-scale problems. This discussion will focus on (1) the real-world issues of handling geospatial data at scale, data veracity issues, and opportunities for intelligent data compression, and (2) the use of large-scale computing to address urban-scale dynamic models that can be used to examine emergent behavior under a variety of conditions.

11:00 am – 12:00 pm

This talk presents inference, control, and game-theoretic algorithms developed to improve traffic flow in transportation networks. The talk will investigate various factors that intervene in decisions made by travelers in large-scale urban environments. We will discuss disruptions in demand due to the rapid expansion of the use of “selfish routing” apps, and how they affect urban planning. These disruptions cause congestion and make traditional approaches to traffic management less effective. Game-theoretic approaches to demand modeling will be presented. These models encompass heterogeneous users (some using routing information, some not) that share the same network and compete for the same commodity (capacity). Results will be presented for static loading, based on Nash-Stackelberg games, and in the context of repeated games, to account for the fact that routing algorithms learn the dynamics of the system over time as users change their behavior. The talk will present some potential remedies envisioned by planners, which range from incentivization to regulation.
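
The textbook two-link "Pigou" network gives a self-contained flavor of how selfish routing can degrade system performance; it is a standard illustration of the game-theoretic framing, not a model taken from the talk.

    # Pigou's two-link network: one link with load-dependent cost x, one with fixed cost 1.
    import numpy as np

    def average_cost(x: float) -> float:
        # x = fraction of drivers on the congestible link; the rest take the fixed-cost link.
        return x * x + (1.0 - x) * 1.0

    # Nash (selfish) equilibrium: the congestible link is never worse than cost 1,
    # so every driver uses it (x = 1) and the average cost is 1.
    nash_cost = average_cost(1.0)

    # A central planner instead minimizes the average cost over the split x.
    xs = np.linspace(0.0, 1.0, 1001)
    optimal_cost = min(average_cost(x) for x in xs)

    print(f"Nash cost {nash_cost:.2f}, optimal cost {optimal_cost:.2f}, "
          f"price of anarchy {nash_cost / optimal_cost:.2f}")

Routing apps tend to push traffic toward the selfish equilibrium, which is what motivates the incentive and regulation remedies mentioned in the abstract.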

1:30 pm – 2:30 pm

We discuss several simulation technologies that have enabled science applications to resolve physical features and processes at unprecedented scales. This in turn has enabled highly-precise reduced-order models that can be used for control and optimization, targeting real-time applications. The first technology, adaptive mesh refinement, enables highly-accurate, fast solutions for localized or intermittent phenomena, although it greatly complicates software. The second breakthrough has been higher-order numerical accuracy in space and time, even in complex and moving geometries, without requiring complex mesh generation. Not only do these approaches reduce simulation error more rapidly, but they better match physical theory and can even provide error sensitivity and reduced-order models at coarse resolutions, if properly designed. The third has been the incredible increase in computational science capability from community software investments, which have adapted to HPC hardware trends to enable massively parallel, high-performance science codes. We will show some multi-scale, multi-physics results that use all three, and can help identify cheaper physical models that, with careful assessment, enable design optimization and real-time control.

3:00 pm – 4:00 pm

In most astronomy contexts, there are trade-offs among the three competing goals of (1) making as-efficient-as-possible measurements of quantities of interest, (2) creating openings for unanticipated discoveries (especially in real time), and (3) leaving behind public data sets of high legacy value for new investigations. I bring an economic or operational attitude to these questions: These trades could be made well if we could put quantitative measures of utility on various kinds of high-level outcomes. But I will also show that such valuation is impossible to do with precision. I discuss some of the relevant issues in the context of the Sloan Digital Sky Survey collection of cosmology experiments, the Astrometry.net image-recognition system, and ground-based radial-velocity-method exoplanet discovery projects. Expect to hear the phrase "Long-term future discounted free-cash flow" (which I will define).
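
For reference, the standard finance notion behind that phrase is sketched below; how the talk maps "cash flows" onto scientific utility is the speaker's point, so that mapping is not assumed here.

    # Standard discounted-value calculation (the finance baseline the quoted phrase refers to).
    def discounted_value(cash_flows, rate):
        # Sum of future flows, each discounted back to the present.
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

    # Example: a project returning 10 units of value per year for 5 years, discounted at 5%/yr.
    print(round(discounted_value([10, 10, 10, 10, 10], rate=0.05), 1))   # 43.3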

Friday, March 2nd, 2018

9:30 am – 10:30 am

Technology has advanced to the point that it is possible to image the entire sky every night and process the data in real time. The sky is hardly static: many interesting phenomena occur, including variable stationary objects such as stars or QSOs, transient stationary objects such as supernovae or M dwarf flares, and moving objects such as asteroids and the stars themselves. Funded by NASA, we have designed and built a sky survey system for the purpose of finding dangerous near-Earth asteroids (NEAs). This system, the "Asteroid Terrestrial-impact Last Alert System" (ATLAS), has been optimized to produce the best survey capability per unit cost, and therefore ATLAS is an efficient and competitive system for finding potentially hazardous asteroids (PHAs) but also for tracking variables and finding transients. I will describe our system, the survey strategy, ATLAS-derived NEA population statistics, and some of the interesting scientific discoveries made by ATLAS and other similar surveys.

11:00 am – 12:00 pm

The challenges of real-time tsunami forecasting are multiple and formidable. The main problem stems from the requirement to produce an accurate forecast within a very short time, based on very limited available data. The problem demands innovative mathematical and computational solutions that can be implemented in a real-time operational environment.
The 2004 Sumatra tsunami triggered intensive efforts to turn tsunami science results into operational tsunami warning capabilities. At present, several tsunami forecast systems based on various modeling and detection capabilities are operational. More than 40 tsunamis since 2004, including the Great East Japan Tsunami of 2011, have provided initial tests of tsunami forecast system performance. A preliminary assessment of forecast performance is now available based on the analysis of the U.S. operational tsunami inundation forecast capability. Assessing forecast performance is important for evaluating the need for improvement and further research. A baseline of tsunami forecast skill has now been established and will be presented based on data from the tsunamis of the past decade. Goals for future improvements and remaining challenges will also be discussed.

1:30 pm – 2:30 pm

Astronomical surveys, justified to achieve specific science objectives, return torrents of data that are inherently likely to produce unexpected discoveries. Realization of these serendipitous science results may be as fruitful as fulfilling the original scientific objectives. In some cases, a multidisciplinary scientific outlook is necessary to identify and effectuate these spin-offs. This talk will describe an astrophysics-inspired technology spin-off that may represent a major step forward for laboratory plasma physics experiments and in-space ion propulsion.

3:00 pm – 4:00 pm

The cost of sequencing has fallen to the point where prokaryotic genome sequencing is a vendor service; but while that sequence is easily obtained, determining what it actually does is expensive and somewhat risky. As a result, enormous amounts of sequence data have been collected but can only be used for unsupervised learning; they cannot be mined for actionable hypotheses or leveraged for bioengineering. We know that bacteria have amazing metabolisms, but we cannot begin to pinpoint where in their genomes they encode that behavior, so we cannot modify that behavior. We are building a platform for the automatic generation of massive-scale functional data to label bacterial genome sequences. With these data we will learn enough about bacterial function to enable the design of genomes with new and improved functions. We hope our efforts will yield true deep learning in genomics, which will form a solid basis for retrosynthesis — the engineering of cells to produce things that no extant cells can now make — at scale.