Abstracts

Regulatory Genomics and Epigenomics

Monday, March 7th, 2016

Algorithms for Single Cell RNAseq Analysis

9:00 am–9:30 am

Speaker: Serafim Batzoglou, Stanford University

No abstract available.

Spectral Algorithms for Learning HMMs and Tree HMMs for Epigenetics Data

9:30 am–10:00 am

Speaker: Kevin Chen, Rutgers University

We are currently working on developing and applying spectral learning algorithms to epigenetics data. Recently, international consortia such as ENCODE and Roadmap Epigenomics have released massive epigenetics data sets from hundreds of human cell types with the aim of interpreting Genome-wide Association Studies for many human diseases. To analyze this data, we have implemented and extensively tested spectral algorithms for HMMs in our Spectacle software and found that they have significantly improved run time and biological interpretability compared to the EM algorithm. This is particularly important when the underlying classes are highly imbalanced, a pervasive issue in biology. To model multiple cell types, we developed novel spectral algorithms for tree structured HMMs and show that the tree model further improves our prediction of functional elements in the genome.

Accurate, Fast, and Model-Aware Transcript Expression Quantification with Salmon

10:00 am–10:30 am

Speaker: Carl Kingsford, Carnegie Mellon University

We introduce Salmon, a method for quantifying transcript abundance from RNA-seq reads that is both extremely fast and that supports rich, experiment-specific models to reduce the effects of biases of the RNA-seq protocol. Salmon does this by combining a novel technique for mapping reads to transcripts with a dual-phase stochastic inference algorithm and a feature-rich probabilistic model. These innovations allow Salmon to obtain very accurate estimates of transcript abundance, while improving on the speed of already-fast techniques such as Sailfish.

This is joint work with Rob Patro and Geet Duggal.

Reconstructing Dynamic Signaling and Regulatory Networks

11:00 am–11:30 am

Speaker: Ziv Bar-Joseph, Carnegie Mellon University

Biological processes, including those involved in immune response and disease progression, are often dynamic. To model the regulatory and signaling networks that are activated as part of these systems we are developing methods to combine the abundant static regulatory, proteomic and epigenetic data with time series gene and miRNA expression data. The reconstructed networks characterize the pathways involved in the response, their time of activation, and the affected genes. I will present methods based on probabilistic graphical models and on combinatorial search algorithms for reconstructing these networks and will discuss application of the methods to study response to flu, HIV progression and to the analysis of single cell data.

Learning Feature-Based Protein-DNA Recognition Models from SELEX Data

11:30 am–12:00 pm

Speaker: Harmen Bussemaker, Columbia University

SELEX-seq and HT-SELEX are sequencing-based methods for elucidating the intrinsic DNA binding specificity of transcription factor (TF) complexes at high resolution. While the amount of raw information that modern SELEX provides is unprecedented, the computational methods for building DNA recognition models (“motifs”) from these data are still far from mature. The standard is to tabulate the relative enrichment of each oligomer of a given length, for which we have developed efficient software. Unfortunately, having to use oligomer tables as an intermediate step for feature-based analysis has two key disadvantages: (i) a limited range over which readout can be analyzed, as counts decrease exponentially with footprint size; and (ii) the requirement for prior ad hoc sequence-based alignment of different oligomers. We present a new and versatile framework for motif discovery from SELEX data that overcomes these limitations. It uses a hierarchical maximum likelihood approach to fit a feature-based biophysically motivated protein-DNA recognition model directly to the raw SELEX data. This allows us to consider base and shape readout in more detail and over a larger footprint than was possible before, which we illustrate using data for the steroid hormone receptors AR and GR. We can now for the first time analyze shape readout for TFs with low binding specificity, which we demonstrate using Hox monomer data.
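
The oligomer-table baseline that this abstract contrasts against can be illustrated with a minimal Python sketch. It assumes two lists of reads (an initial library and a later selection round) and reports, for each k-mer, the ratio of its frequency after selection to its frequency before. The function names, the pseudocount, and the toy reads are illustrative assumptions, not the speaker's software.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def relative_enrichment(initial_reads, selected_reads, k, pseudocount=1.0):
    """Ratio of each k-mer's frequency after selection to its frequency before."""
    before = kmer_counts(initial_reads, k)
    after = kmer_counts(selected_reads, k)
    kmers = set(before) | set(after)
    # The pseudocount keeps k-mers that are absent from one round finite.
    total_before = sum(before.values()) + pseudocount * len(kmers)
    total_after = sum(after.values()) + pseudocount * len(kmers)
    return {
        kmer: ((after[kmer] + pseudocount) / total_after)
        / ((before[kmer] + pseudocount) / total_before)
        for kmer in kmers
    }

# Toy data (hypothetical reads, for illustration only).
round0 = ["ACGTAC", "TTGACA", "GGCATC", "ACGTTT"]
round1 = ["TGACAT", "TTGACA", "ATGACA", "ACGTAC"]
table = relative_enrichment(round0, round1, k=4)
```

Because the number of distinct k-mers grows as 4^k while the read count stays fixed, the counts in such a table thin out quickly as k increases, which is the first limitation the abstract names.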

Beyond Enhancer Modularity: Locus-level Control of Gene Expression in Development

2:00 pm–2:30 pm

Speaker: Angela DePace, Harvard University

Variation in gene regulation is thought to underlie much of human development, disease and evolution. However, we cannot predict which sequence changes will affect gene expression and in turn, downstream organismal phenotypes such as susceptibility to disease. Regulatory sequence variants occur at multiple DNA length-scales, from single nucleotide polymorphisms, to large-scale rearrangements and duplications. Most studies have focused on the effect of polymorphisms on transcription factor binding within enhancers, but the most common type of variation in regulatory DNA is structural. These variants affect locus-level properties, such as redundancy and the distance of enhancers from each other and their target promoter. If enhancers function as independent modules, as is commonly assumed, this type of sequence variation should have little or no effect; however, we and others are questioning this fundamental assumption. I will present evidence that enhancers do not act as independent modules and discuss the resulting challenges for how we annotate and interpret regulatory DNA sequence.

Co-occupancy Networks for Histone Modifications and Chromatin Associated Proteins

2:30 pm–3:00 pm

Speaker: Martin Vingron, Max Planck Institut für molekulare Genetik

This talk will focus on modeling the relationship between histone modifications and gene expression, together with the correlations and anticorrelations among modifications and chromatin modifiers. Some methods for analysis and visualization will be explained and examples of their application will be given. Network construction methods will be explored with the goal of identifying direct interactions of modifications and chromatin modifiers.

Scoring Transcript Variation in Single Cell RNA-seq Data

3:00 pm–3:30 pm

Speaker: Xiuwei Zhang, EMBL

Single cell RNA-sequencing reveals the differences in gene and exon expression levels across individual cells. In particular, recent studies showed considerable differences in the distributions of reads from different cells for the same gene. This variation of isoform usage across single cells was not observed from bulk RNA-seq data. We seek to quantify this variation, understand the sources of the variation, and identify the patterns of the differences in isoform usage. To quantify the variation, we have developed a profile-variation (PV) score for each gene while accounting for various confounding factors in the data, and this score allows us to extract genes with highly variable read density profiles across cells.

Based on the PV score we can study the sources of the transcript variation. Gene Ontology analysis of genes with high PV reveals two levels in the isoform variation in terms of gene functions. As we analyzed data sets from different cell types, we found that the first level of functions is common to all cell types, whereas the second level of functions is cell type specific, for example, immunology related functions in activated T helper cells. We further studied the patterns of the isoform usage across cells. Although we found genes which switch isoforms between cell types, they do not switch in a correlated manner, showing high stochasticity in isoform generation in single cells. Finally, we show that applying our PV score on single cell RNA-seq data finds genes which are not detected on bulk RNA-seq data with traditional methods to be differentially spliced, and these genes potentially represent the gradual change from one cell type to another.

Tuesday, March 8th, 2016

Mapping Nucleosome Positions Using DNase

9:00 am–9:30 am

Speaker: Alexander Hartemink, Duke University

Although deoxyribonuclease I (DNase) was used to probe the structure of the nucleosome in the 1960s and 70s, in the current high-throughput sequencing era, DNase has mainly been used to study genomic regions where nucleosomes are absent. Here, we show that DNase can be used to precisely map the (translational) positions of in vivo nucleosomes genome-wide. Specifically, exploiting a distinctive DNase cleavage profile within nucleosome-associated DNA, we develop a Bayes-factor–based method that can be used to map nucleosome positions along the genome. Compared to methods that require genetically-modified histones, our DNase-based approach is easily applied in any organism, which we demonstrate by producing maps in yeast and human. Compared to MNase-based methods that map nucleosomes based on cuts in linker regions, we utilize DNase cuts both outside and within nucleosomal DNA; the oscillatory nature of the DNase I cleavage profile within nucleosomal DNA enables us to identify translational positioning details not apparent in MNase digestion of linker DNA. Because the oscillatory pattern corresponds to nucleosome rotational positioning, it also reveals the rotational context of transcription factor (TF) binding sites. We show that potential binding sites within nucleosome-associated DNA are often centered preferentially on an exposed major or minor groove. This preferential localization may modulate TF interaction with nucleosome-associated DNA as TFs search for binding sites.

Computational Genomics of Post-Transcriptional Gene Regulation

9:30 am–10:00 am

Speaker: Uwe Ohler, Humboldt University/Max-Delbruck Center

High-throughput sequencing technologies are now allowing us to interrogate intermediate layers of gene expression from nascent transcription to translation. At the same time, new sequencing protocols can help to determine where RNA-binding proteins interact with target transcripts and control these different layers of post-transcriptional gene regulation. These new protocols require and motivate dedicated computational approaches to analyze the resulting noisy data. I present our recent and ongoing projects to identify and analyze interactions of RNA-binding proteins and ribosomes on coding and non-coding transcripts.

Quantitative Modeling of Transcription Factor Binding Specificities using DNA Shape

10:00 am–10:30 am

Speaker: Remo Rohs, University of Southern California

Our current knowledge of genome function is the result of sequence-based data in the form of one-dimensional strings of letters. However, DNA-binding proteins recognize the double helix as a three-dimensional object. Therefore, an understanding of transcription factor (TF) binding specificity must ultimately include DNA shape. The sequence-structure relationship in DNA is highly degenerate, and different nucleotide sequences can give rise to the same structure, while single nucleotide sequence variants sometimes change DNA shape over a region of several base pairs. To explore these effects on a genomic scale, we developed a method for the high-throughput prediction of DNA shape features. We used these structural features to augment nucleotide sequence in binding specificity models derived from statistical machine learning approaches. Based on data derived from high-throughput binding assays for many TFs from diverse protein families, we demonstrated that shape-augmented models are generally more efficient than existing sequence models in terms of accuracy, number of features, and computation time. Our models provide information on the importance of specific DNA sequence and shape features and thus reveal TF family-specific readout mechanisms and better explain why a given TF binds in vivo to a specific genomic target site.

Deep Learning Frameworks for Regulatory Genomics and Epigenomics

11:00 am–11:30 am

Speaker: Anshul Kundaje, Stanford University

We present novel deep learning frameworks capable of learning jointly from raw DNA sequence and diverse functional genomic profiling experiments to learn fundamental predictive relationships between regulatory sequence, chromatin architecture, chromatin state and transcription factor binding. Recently, the ATAC-seq assay was developed to simultaneously profile chromatin accessibility and architecture of regulatory elements from low input samples based on direct in vitro transposition of sequencing adaptors into native chromatin. We train multi-task, multi-modal deep convolutional neural networks (CNNs) on a novel 2D representation of ATAC-seq data that leverages subtle patterns in insert-size distributions to simultaneously predict multiple histone modifications, combinatorial chromatin state and binding sites of a key insulator protein (CTCF) with high accuracy. Models trained on related assays such as DNase-seq and MNase-seq data also achieve high performance genome-wide and across cell-types supporting a fundamental predictive mapping between local chromatin architecture and chromatin state. We develop novel feature importance scores and visualization methods to extract biologically meaningful predictive patterns from deep neural networks. We further present new deep hybrid architectures consisting of convolutional and recurrent layers to predict in-vivo transcription factor binding events and learn regulatory sequence grammars from raw DNA sequence and chromatin accessibility profiles across cell types and tissues. Our methods potentially enable detailed characterization of context-specific regulatory landscapes from low input samples of rare cell types using a single assay.

Genome in 3D: Modeling Chromosome Organization

11:30 am–12:00 pm

Speaker: Leonid Mirny, MIT

The chromosome conformation capture technique (Hi-C) provides comprehensive information about frequencies of spatial interactions between genomic loci. Inferring the 3D organization of chromosomes from these data is a challenging biophysical problem. We develop a top-down approach to biophysical modeling of chromosomes. Starting with a minimal set of biologically motivated interactions, we build ensembles of polymer conformations that can reproduce major features observed in Hi-C experiments. I will present our work on modeling the organization of human metaphase and interphase chromosomes. Our work suggests that active processes of loop extrusion can be a universal mechanism responsible for formation of domains in interphase and chromosome compaction in metaphase.

Incomplete MyoD Transdifferentiation is Mediated by Chromatin Remodeling Deficiencies

2:00 pm–2:30 pm

Speaker: Raluca Gordan, Duke University

MyoD is a master transcription factor critical for normal muscle development. Overexpression of MyoD has been shown to transdifferentiate cells from many non-muscle lineages into cells with muscle-like expression and phenotypic characteristics. However, expression studies on MyoD transdifferentiated cells show that only a fraction of myogenic genes are upregulated in response to MyoD. In addition, many cell-type specific genes are not silenced during MyoD transdifferentiation. The reasons for this incomplete transdifferentiation are unknown, and this characteristic is common for transdifferentiation of other master regulators.

In this study, we aimed to more fully understand the mechanism of MyoD-induced transdifferentiation of fibroblasts, and to identify potential reasons behind incomplete transdifferentiation. To this end, we have analyzed global gene expression (RNA-seq), chromatin accessibility (DNase-seq), and MyoD binding (ChIP-seq) on human primary skin fibroblasts that have been transduced with inducible human MyoD. To determine if transdifferentiation was complete, we compared these data to expression and chromatin accessibility profiles generated from primary human myoblasts and skin fibroblasts. Comparing these three cell types revealed that MyoD contributed to a wide spectrum of DNaseI hypersensitive (DHS) site changes, with some DHS sites being completely reprogrammed, while others are either not reprogrammed or partially reprogrammed. Analysis of MyoD ChIP-seq data in the transdifferentiated cells revealed that the vast majority of myoblast-specific DHS sites that open up, i.e. are reprogrammed, have a strong MyoD ChIP-seq signal. Conversely, non-reprogrammed myoblast-specific DHS sites are not bound by MyoD, despite the fact that many of these sites contain MyoD binding motifs. The mechanisms that drive MyoD binding and chromatin reprogramming at some DHS sites but not others are not currently known.

We used two classification approaches, elastic net and random forest, to identify genomic and epigenomic features that can distinguish reprogrammed versus non-reprogrammed myoblast-specific DHS sites. Interestingly, the most predictive features were the affinity of putative MyoD sites and the affinities of MyoD cofactor sites (such as Meis1), which were higher in the reprogrammed DHS sites. In addition, non-reprogrammed DHS sites were enriched for binding sites of SAND-domain transcription factors. Interestingly, one member of the SAND family, Ski, has been shown to convert non-muscle quail cells to muscle cells, which leads us to hypothesize that overexpression of Ski, in addition to MyoD, could increase transdifferentiation efficiency through reprogramming of DHS sites not opened by MyoD.

A similar classification analysis of reprogrammed versus non-reprogrammed fibroblast-specific DHS sites (i.e. sites that are open in fibroblast but closed in myoblast) identified several epigenomic features predictive of reprogramming status. Active histone marks such as H3K4me2, H3K4me3, H3K27ac, H2az, and H3K9ac, were enriched at non-reprogrammed fibroblast-specific DHS sites, suggesting that these active marks may be responsible for maintaining the chromatin in an open state.

Combined analyses of chromatin accessibility (DNase-seq) and gene expression (RNA-seq) data revealed that, as expected, reprogrammed DHS sites are enriched around myoblast-specific genes that are upregulated in response to MyoD. However, we also found a subset of myogenic genes that are not upregulated during the transdifferentiation process, despite the fact that they have undergone substantial chromatin remodeling. This indicates that, for this subset of myogenic genes, MyoD is capable of inducing a chromatin state that is similar to that of primary muscle lineages, but additional factors are necessary for more complete reprogramming at the gene level.

The Enhancer Mutation Problem: Figuring out Phenotype Based on Genotype

2:30 pm–3:00 pm

Speaker: Nadav Ahituv, UCSF

Mutations in enhancers can lead to a wide range of phenotypes, including Mendelian disease; however, we are currently limited in predicting the phenotypic impact of these mutations. With whole-genome datasets becoming commonly available, we need to obtain a better understanding of the functional consequences of nucleotide variants in enhancer sequences. Here, we will present the SHH limb enhancer, also termed the zone of polarizing activity (ZPA) regulatory sequence (ZRS), as a case study. Several labs including ours have detected mutations in this enhancer that can lead to various limb malformations. Point mutations in this enhancer usually cause polydactyly and triphalangeal thumb, but there are other specific single nucleotide changes in the ZRS that cause a more severe limb phenotype and nucleotide variants that don’t lead to an observable phenotype. Our current tools, both computational and functional, are limited in their ability to predict the phenotypic impact of a novel mutation in this enhancer. Using massively parallel reporter assays (MPRAs) combined with computational tools, we are attempting to address this problem. By designing MPRAs to learn regulatory grammar or to carry out saturation mutagenesis of every possible nucleotide change in the ZRS and other disease-causing enhancers, we are increasing our understanding of the phenotypic consequences of enhancer mutations.

Wednesday, March 9th, 2016

Understanding DNA-binding Preferences of Transcription Factor Homologs

9:00 am–9:30 am

Speaker: Trevor Siggers, Boston University

Many eukaryotic transcription factor (TF) families comprise multiple members with highly similar DNA-binding specificity. A fundamental problem in modeling eukaryotic gene regulatory networks is identifying and modeling factor-specific differences of TF homologs. High-throughput (HT) biochemical approaches for measuring protein-DNA binding provide the rich datasets needed to identify TF-specific preferences. I will present work using protein-binding microarrays (PBMs) to characterize DNA-binding preferences of TF homologs. I will discuss computational approaches and challenges to identify TF-specific binding preferences from PBM datasets. Finally, I will discuss approaches we are using to understand and model homolog-specificity in gene regulatory networks.

Challenges in Sequence-to-Expression Modeling

9:30 am–10:00 am

Speaker: Saurabh Sinha, University of Illinois Urbana-Champaign

No abstract available.

Modeling Gene Expression and Chromatin State in Terms of Regulatory Sites

10:00 am–10:30 am

Speaker: Erik Van Nimwegen, Basel University

No abstract available.

Selecting Genomics Assays

11:00 am–11:30 am

Speaker: William Stafford Noble, University of Washington

Genomic sequencing assays such as ChIP-seq and DNase-seq can measure a wide variety of types of genomic activity, but the high cost of sequencing limits the number of these assays that are usually performed in a given experimental condition. I will discuss a principled method for selecting which genomics assays to perform, given a limited budget. The method relies upon optimization over submodular functions, which are discrete set functions that have properties analogous to certain continuous convex functions. I will also show how a similar submodular optimization approach can be brought to bear on the problem of selecting a representative subset of protein sequences from a large database.
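
The budgeted selection idea described above can be sketched with the classic greedy algorithm for monotone submodular maximization. This is a minimal illustration under stated assumptions: the objective here is simple set coverage (a standard submodular function chosen for clarity, not necessarily the objective used in the talk), and the assay names and the features they "cover" are made up.

```python
def greedy_select(candidates, budget):
    """Greedily pick up to `budget` items maximizing a set-coverage objective.

    `candidates` maps an item name to the set of elements it covers.
    Coverage is monotone and submodular, so greedy selection under a
    cardinality budget has the well-known (1 - 1/e) approximation guarantee.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        for name in sorted(candidates):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = len(candidates[name] - covered)  # marginal gain of adding `name`
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining item adds anything new
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Hypothetical assays and the (made-up) kinds of genomic activity each informs.
assays = {
    "DNase-seq": {"promoter", "enhancer", "insulator"},
    "H3K27ac": {"promoter", "enhancer"},
    "H3K36me3": {"transcribed"},
    "CTCF": {"insulator"},
    "H3K27me3": {"repressed"},
}
selected, covered = greedy_select(assays, budget=2)
```

The diminishing marginal gain is visible in the toy data: once DNase-seq is chosen, H3K27ac and CTCF add nothing new, so the second pick goes to an assay covering a different kind of activity.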

I will also describe some of our work developing methods for using unsupervised machine learning to interpret large, heterogeneous collections of genomic data. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain.

Elucidating Sequence-Structure Binding Motifs by Uncovering Selection Trends in HT-SELEX Experiments

11:30 am–12:00 pm

Speaker: Teresa Przytycka, National Institutes of Health

No abstract available.

Personal Transcription Factor Binding Site Mutations Point to Personal Medical Histories

2:00 pm–2:30 pm

Speaker: Gill Bejerano, Stanford University

I will show that erosion of gene regulation by mutation load appears to significantly contribute to observed heritable phenotypes that manifest in the medical history. The test I will introduce exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them.

Telomere Length, Nature and Nurture

2:30 pm–3:00 pm

Speaker: Martin Kupiec, Tel Aviv University

Telomeres protect the chromosome ends and play important roles in aging and cancer development. We have systematically screened libraries of the yeast Saccharomyces cerevisiae for mutants with altered telomere length. Our work uncovered ~400 TLM (telomere length maintenance) genes responsible for a strict telomere length homeostasis. These genes, most of which are evolutionarily conserved, span a broad range of functional categories and different cellular compartments. Further work followed both “vertical” (Molecular Biology) and “horizontal” (Systems Biology) approaches. The “vertical” approach aims to explore the role of individual genes in telomere length maintenance using genetic, molecular biology and biochemical methodologies. In the “horizontal” approach a bird’s eye view of the system is obtained by combining molecular and systems biological methods. We have started to chart the cellular network underlying telomere length, revealing a complex set of genetic interactions responsible for the very tight length homeostasis. In addition, we have found that environmental cues can affect telomere length and we have started to investigate the interface between this intricate genetic network and environmental signals that affect telomere length. Thus, for the first time, it is possible to study the interface between genome and environment (nature and nurture) in a system in which almost all the genetic “players” are known, and the environment affects them.

Thursday, March 10th, 2016

Hierarchical Regulatory Domain Inference from Hi-C Data

9:00 am–9:30 am

Speaker: Bartek Wilczynski, University of Warsaw

The majority of computational models of gene regulation developed historically ignore the spatial position of genes, as we had almost no information on chromosome structure in living cells at the resolution required for gene expression regulation models. In recent years, several different experimental procedures based on chromosome conformation capture have been developed to probe the contacts between chromosomal regions; combined with next-generation sequencing, they have given us unprecedented insight into the relative distances between various parts of chromosomes in different cell types. From the computational point of view, the chromosome contact matrices pose multiple challenges and interesting problems: from data normalization and statistical testing of contact significance to the more complicated questions regarding the modular structure of regulatory domains. I will talk about two computational approaches, SHERPA (Simple HEuRistic Pearson Aggregation) and OPPA (Optimal PCA-like Pearson Aggregation), that aim at finding the optimal division of chromosomes into a hierarchical domain structure, and I will give examples where we can see that such an approach gives better results than the classical division into a flat topological domain structure.

Analysis Methods for Single Cell RNA-seq with Application to T-cell Function

9:30 am–10:00 am

Speaker: Nir Yosef, UC Berkeley

No abstract available.

Genome-wide Prediction of Enhancers and Their Target Genes using ENCODE data

10:00 am–10:30 am

Speaker: Zhiping Weng, University of Massachusetts

The Encyclopedia of DNA Elements (ENCODE) Consortium has generated tens of thousands of high-throughput genomic datasets with the goal of cataloging all of the functional elements of the human genome. Now, our goal is to integrate these complex data types to annotate regulatory elements such as enhancers and create an encyclopedia of elements for the human and mouse research communities.

We began by analyzing enhancer prediction methods. We tested many different models incorporating data such as DNase-seq, histone mark ChIP-seq, and DNA methylation. We evaluated our methods using experimentally validated enhancer regions from the VISTA enhancer database on four embryonic mouse tissues: limb, hindbrain, midbrain, and neural tube. Overall, the best performing method was centering predictions on DNase peaks and ranking these peaks by the average rank of DNase and H3K27ac signal. We then applied this method to all mouse and human cell types in ENCODE.
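
The average-rank ordering described above can be sketched in a few lines of Python. This is only an illustration of the ranking step: the peak names and signal values are made up, the field names are assumptions, and ties are broken by input order rather than by any scheme the authors may have used.

```python
def rank_descending(values):
    """Return 1-based ranks, where the largest value gets rank 1."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def rank_peaks(peaks):
    """Order peaks by the average of their DNase and H3K27ac ranks (best first)."""
    dnase_ranks = rank_descending([p["dnase"] for p in peaks])
    k27_ranks = rank_descending([p["h3k27ac"] for p in peaks])
    scored = [
        (0.5 * (d + h), p["name"])
        for d, h, p in zip(dnase_ranks, k27_ranks, peaks)
    ]
    return [name for _, name in sorted(scored)]

# Hypothetical DNase peaks with made-up signal values.
peaks = [
    {"name": "peak_a", "dnase": 12.0, "h3k27ac": 3.0},
    {"name": "peak_b", "dnase": 9.0, "h3k27ac": 8.0},
    {"name": "peak_c", "dnase": 2.0, "h3k27ac": 5.0},
    {"name": "peak_d", "dnase": 7.0, "h3k27ac": 1.0},
]
ordered = rank_peaks(peaks)
```

Working with ranks rather than raw signal means the two assays contribute equally regardless of their scales, so a peak that is strong in both (peak_b here) outranks one that is extreme in only one.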

After identifying candidate enhancers, we next sought to identify the target genes of these regions. In order to evaluate different methods, we created training/validation/test datasets from promoter capture Hi-C datasets in GM12878. We began by analyzing correlation-based methods, where enhancer-gene links are predicted by high correlation of DNase or H3K27ac signal across multiple cell types. While these methods have previously been used in the literature, we found that they performed poorly (AUROC=0.6, AUPR=0.06). We then decided to use a Random Forest-based approach which would incorporate additional data such as distance between the gene and enhancer, average DNase and H3K27ac signals as well as correlation. Though this model had a substantial increase in performance (AUROC=0.78, AUPR=0.16), there is still a great deal of improvement that can be made. We hope to add additional features to our model as well as find the best performing model with limited features that can be applied across many different ENCODE cell types.

Utilizing de Bruijn Graphs in Universal Sequence Design for Discovery of Regulatory Elements

11:00 am–11:30 am

Speaker: Ron Shamir, Tel-Aviv University and Yaron Orenstein, MIT

Recent technological advancements allow the measurement of protein binding to thousands of DNA or RNA probes on a single microarray. Since the space on the array is limited, the challenge is how to efficiently generate a minimum-size set of sequences that together cover all k-mers. In this talk, we will first introduce de Bruijn graphs and their applications in efficient coverage of DNA k-mers. Then, we will describe a generalization of the problem, in which the sequences are required to have certain properties (e.g., unstructured RNAs). We will prove that in this formalization the problem is NP-hard and give an (infeasible) approximation algorithm. We will present a heuristic based on random walks in de Bruijn graphs which works well in practice. If time allows, we shall discuss questions arising in analysis of novel high throughput in vitro methods for motif discovery.
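
The unconstrained version of the covering problem has a classic solution: an Eulerian circuit in the de Bruijn graph spells a sequence that contains every k-mer exactly once. The Python sketch below illustrates that baseline only, using Hierholzer's algorithm; it is not the constrained formulation or the random-walk heuristic from the talk, and the function name is an assumption.

```python
from itertools import product

def debruijn_sequence(alphabet, k):
    """Return a linear sequence containing every k-mer over `alphabet` once (k >= 2).

    Nodes of the de Bruijn graph are (k-1)-mers; each k-mer is an edge from
    its prefix to its suffix. An Eulerian circuit uses every edge exactly once.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    nodes = ["".join(p) for p in product(alphabet, repeat=k - 1)]
    unused = {node: list(alphabet) for node in nodes}  # outgoing edges not yet walked
    stack, circuit = [nodes[0]], []
    # Hierholzer's algorithm: walk until stuck, then backtrack and splice in detours.
    while stack:
        node = stack[-1]
        if unused[node]:
            letter = unused[node].pop()
            stack.append((node + letter)[1:])
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    # Spell the walk: the start node, then the last letter of each later node.
    return circuit[0] + "".join(node[-1] for node in circuit[1:])

seq = debruijn_sequence("ACGT", 3)
kmers = {seq[i:i + 3] for i in range(len(seq) - 2)}
```

For DNA 3-mers the result has length 4^3 + 2 = 66, the shortest possible for a single linear sequence, since each of the 64 k-mers needs its own start position.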

Modelling Gene Expression Dynamics with Gaussian Processes

11:30 am–12:00 pm

Speaker: Magnus Rattray, University of Manchester

Gaussian processes provide a convenient and flexible class of non-parametric model for temporal and spatial data. We are applying Gaussian processes in a range of biological applications involving high-throughput time course data, e.g. modeling the elongation dynamics of polymerase, uncovering mRNA production delays, inferring regulatory networks and most recently identifying perturbations and bifurcations from high-throughput expression data. I will provide an overview of Gaussian process inference and describe some of our recent work in modeling gene expression dynamics.

Discovery of Transcription Factor Binding Motifs from Large Sequence Sets

2:00 pm–2:30 pm

Speaker: Esko Ukkonen, Helsinki University

The position weight matrix (PWM) model of binding site motifs of transcription factors specifies a multinomial distribution of sequences that has only one dominating seed sequence. To make the model more accurate, one can use several seeds and also utilize the fact that the transcription factors not only bind to DNA but also to each other, forming dimeric and higher order regulatory complexes. Moreover, the internal dependencies possibly present within the motif should be represented by the model. The talk will describe developments in modeling and predicting binding motifs, using multiseed models that include mixtures of monomeric and dimeric PWMs, and are learned from large sequence sets.
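
The single-seed property of the basic PWM model mentioned above can be seen in a small Python sketch. It treats a PWM as one probability distribution over bases per position, with positions independent; the motif values and function names are hypothetical, and the multiseed and dimeric extensions from the talk are not modeled here.

```python
import math

def pwm_log_probability(pwm, sequence):
    """Log-probability of `sequence` under a PWM with independent positions."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence length must match motif length")
    return sum(math.log(column[base]) for column, base in zip(pwm, sequence))

def seed_sequence(pwm):
    """The single most probable sequence: the best base at each position."""
    return "".join(max(column, key=column.get) for column in pwm)

# Hypothetical 4-position motif; each column is a distribution over A, C, G, T.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
seed = seed_sequence(pwm)
```

Because positions are scored independently, every other sequence's probability falls off with each mismatch to the seed, which is why a single PWM cannot represent two distinct preferred sites or dependencies between positions.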

Chromatin Dynamics and Gene Regulation During Motor Neuron Programming

2:30 pm–3:00 pm

Speaker: Mahmoud Ibrahim, Max-Delbrueck-Center, Berlin

No abstract available.