Abstracts

Monday, April 11th, 2016

11:00 am - 11:30 am

Biological networks of an organism show how different biochemical entities, such as enzymes or genes, interact with each other to perform vital functions for that organism. In this talk, we will discuss the computational challenges centered on uncertainty in the topology of biological networks. We will discuss our new mathematical model, which represents probabilistic networks as collections of polynomials. We show that this is a powerful model that enables solving seemingly very tough computational problems on probabilistic networks efficiently and precisely. We will demonstrate the expressive power of this model on the signal reachability problem, which asks whether an extracellular signal can travel from a membrane receptor to a reporter gene.
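
To make the signal reachability problem concrete, here is a minimal Python sketch. It is not the polynomial model from the talk: it assumes that every edge is present independently with a given probability and simply enumerates all deterministic topologies, which is only feasible for tiny networks. The function name, the toy edges and their probabilities are illustrative assumptions.

```python
from itertools import product


def reachability_probability(edges, source, target):
    """Probability that `target` is reachable from `source`.

    edges: list of (u, v, p) directed edges, each assumed to be present
    independently with probability p (an assumption of this sketch).
    Brute force over all 2^m edge subsets, so only usable on toy inputs.
    """
    total = 0.0
    for present in product([False, True], repeat=len(edges)):
        weight = 1.0      # probability of this particular topology
        adjacency = {}
        for (u, v, p), keep in zip(edges, present):
            weight *= p if keep else (1.0 - p)
            if keep:
                adjacency.setdefault(u, []).append(v)
        # Depth-first search from the source in this topology.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if target in seen:
            total += weight
    return total


# Hypothetical toy pathway: two routes from the receptor to a
# transcription factor, then one edge to the reporter gene.
toy_edges = [
    ("receptor", "kinase", 0.9),
    ("kinase", "tf", 0.8),
    ("receptor", "tf", 0.5),
    ("tf", "reporter", 0.7),
]
prob = reachability_probability(toy_edges, "receptor", "reporter")
```

Each term of the sum is a product of edge probabilities p and complements (1 - p), so the result is itself a polynomial in the edge probabilities. The enumeration is exponential in the number of edges, which is the cost a more compact representation would need to avoid.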

11:30 am - 12:00 pm
In protein-protein interaction (PPI) networks, or more general protein-protein association networks, functional similarity is often inferred based on some notion of proximity among proteins in a local neighborhood. In prior work, we introduced diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture more fine-grained notions of similarity from the neighborhood structure, which we showed could improve the accuracy of network-based function-prediction algorithms. Boehnlein, Chin, Sinha and Liu have recently shown that a variant of the DSD metric has deep connections to Green's function, the normalized Laplacian, and the heat kernel of the graph.
 
Because DSD is based on random walks, changing the probabilities of the underlying random walk gives a natural way to incorporate experimental error and noise (allowing us to place confidence weights on edges), incorporate biological knowledge in terms of known biological pathways, or weight subnetwork importance based on tissue-specific expression levels, or known disease processes. Our framework provides a mathematically natural way to integrate heterogeneous network data sources for classical function prediction and disease gene prioritization problems.
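
The idea of a random-walk-based distance with confidence-weighted edges can be sketched as follows. This is a minimal illustration, assuming the commonly cited formulation in which each node is described by the expected number of visits a k-step random walk starting there makes to every node, and two nodes are compared by the L1 distance between those vectors. The function names and the toy weights are assumptions, not taken from the abstract.

```python
def transition_matrix(weights):
    """Row-normalise a non-negative weight matrix into walk probabilities.

    weights[i][j] is the confidence placed on edge (i, j); every node is
    assumed to have at least one edge, otherwise the row sum is zero.
    """
    return [[w / sum(row) for w in row] for row in weights]


def expected_visits(P, k):
    """he[i][j] = expected visits to j by a k-step walk from i (I + P + ... + P^k)."""
    n = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    he = [row[:] for row in power]
    for _ in range(k):
        power = [[sum(power[i][m] * P[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                he[i][j] += power[i][j]
    return he


def diffusion_distance(weights, k=5):
    """Pairwise L1 distance between the expected-visit vectors of all nodes."""
    he = expected_visits(transition_matrix(weights), k)
    n = len(he)
    return [[sum(abs(a - b) for a, b in zip(he[i], he[j]))
             for j in range(n)] for i in range(n)]


# Hypothetical 4-protein network; larger weights mean higher edge confidence.
toy_weights = [
    [0.0, 0.9, 0.2, 0.0],
    [0.9, 0.0, 0.5, 0.0],
    [0.2, 0.5, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
]
distances = diffusion_distance(toy_weights, k=5)
```

Changing an entry of `toy_weights` changes the transition probabilities and therefore the distances, which is the mechanism by which edge confidence or pathway knowledge would enter. Because the distance is an L1 norm between vectors, it is symmetric and satisfies the triangle inequality by construction.
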
2:00 pm - 2:30 pm

Large-scale biological networks map functional relationships between most genes in the genome and can potentially uncover high level organizing principles governing cellular functions. Despite the availability of an incredible wealth of network data, our current understanding of their functional organization is very limited and essentially opaque to biologists. To facilitate the discovery of functional structure and advance its biological interpretation, we developed a systematic quantitative approach to determine which functions are represented in a network, which parts of the network they are associated with and how they are related to one another. Our method, named Spatial Analysis of Functional Enrichment (SAFE), detects network regions that are statistically overrepresented for a functional group or a quantitative phenotype of interest, and provides an intuitive visual representation of their relative positioning within the network. Using SAFE, we examined the most recent genetic interaction network from budding yeast Saccharomyces cerevisiae, which was derived from the quantitative growth analysis of over 20 million double mutants. By annotating the genetic interaction network with GO biological process, protein localization and protein complex membership data, SAFE showed that the network is structured hierarchically and reflects the functional organization of the yeast cell at many different levels of resolution. In addition, we analyzed the network using a large-scale chemical genomics dataset and generated a global view of the yeast cellular response to chemical treatment. This view recapitulated the known modes-of-action of chemical compounds and identified a potentially novel mechanism of resistance to the anti-cancer drug bortezomib. Our results demonstrate that SAFE is a powerful tool for annotating biological networks and a unique framework for understanding the global wiring diagram of the cell.

2:30 pm - 3:00 pm

Cellular processes are largely controlled by the protein-protein and protein-DNA interactions that define them. While conservation of common protein domains can indicate which proteins are likely to engage in these interactions, how they determine what partners to interact with is a much more complicated question. Many experimental techniques have been developed to answer this question; however, many are biased towards high-affinity interactions, are labor intensive, or require specialized equipment or expertise. To address these limitations we are expanding the application of a simple bacterial hybrid assay that employs multiple reporters simultaneously. By normalizing the output of a test reporter to the presence of a secondary reporter, we are able to return outputs that are strongly correlated with the affinity of the test interaction. We have applied this approach to measure both protein-DNA and protein-protein interactions, recovering signal above background for known, low-affinity interactions that are often missed by common methodology. We hope that continued development of this platform will allow us to harness the 10^9 transformation efficiency of bacteria and screen large libraries to capture the low end of affinity while providing affinity-informed specificities.

3:30 pm - 4:00 pm

We are interested in the causes of bronchopulmonary dysplasia (BPD), a respiratory complication of preterm birth whose etiology is the subject of ongoing debate.  Molecular causes of this disorder, and their potential relationship with lifelong respiratory health, are relatively unexplored. We consider the problem of identifying molecular pathways implicated in BPD and two pulmonary disorders affecting patients at later life stages (asthma and COPD).  In this talk, we will define the notion of "pathway centrality" in a molecular network and demonstrate how this concept can be used to find pathways potentially mediating observed expression changes in pulmonary disorders.  Our observations identify common molecular pathways and processes between all three disorders, generate novel hypotheses, and highlight developmental delays that may contribute to BPD.  A temporal modeling technique based on outlier detection methods lends additional support to the developmental delay hypothesis.

Tuesday, April 12th, 2016

9:30 am - 10:00 am

Rapid advances in high-throughput technologies, including next-generation sequencing, proteomics, and metabolomics, are providing exceptionally detailed descriptions of the molecular changes that occur in diseases. However, it is difficult to use these data to reveal new therapeutic insights for several reasons. Despite their power, each of these methods still only captures a small fraction of the cellular response. Moreover, when different assays are applied to the same problem, they provide apparently conflicting answers. I will show how specific network modeling approaches reveal the underlying consistency of the data by identifying small, functionally coherent pathways linking the disparate observations. These patient-specific networks may provide critical insights for targeted therapies.

10:00 am - 10:30 am
DNA sequencing technologies now allow the collection of somatic mutations in a large number of patients from the same cancer type. One of the main goals in the analysis of such datasets is the identification of driver mutations associated with cancer, distinguishing them from random, passenger mutations. This goal is hindered by the extensive genetic heterogeneity in cancer, with different genes mutated in different patients. This heterogeneity is due to the fact that genes and mutations act in the context of networks, with groups of interacting genes (pathways) that perform different cellular functions, and each function can be altered by mutating any of the genes in the pathway.
 
I will discuss two problems that arise in the analysis of cancer somatic mutations in network contexts. The first is the problem of finding connected subnetworks of a large gene-gene interaction network that are mutated in a large number of patients, a problem that we prove is NP-hard. I will present a polynomial-time approximation algorithm that we designed to identify subnetworks with provable guarantees on their quality. I will also present a recent ILP formulation that identifies solutions of better quality than the approximation algorithm and is much faster on data from large cancer studies.
 
The second problem is that of identifying subnetworks of a large gene-gene interaction network that have somatic mutations associated with survival from genome-wide mutation data of a large cohort of cancer patients. We formally define the associated computational problem by using a score for subnetworks based on the test statistic of the log-rank test, a widely used statistical test for comparing the survival of two given populations. We show that the computational problem is NP-hard in general and that it remains NP-hard even when restricted to graphs with at least one node of large degree, the case of interest for gene-gene interaction networks. I will present Network of Mutations Associated with Survival (NoMAS), a novel color-coding-based algorithm that we have designed to find subnetworks of a large interaction network whose mutations are associated with survival time. I will present results of the application of NoMAS on large cancer datasets.
11:00 am - 11:30 am

Identification and prioritization of molecular alterations that potentially act as drivers of cancer remains a crucial challenge in cancer genomics and a bottleneck in therapeutic development. The problem is particularly complicated by the extensive mutational heterogeneity observed within cancer (sub)types, which yields a long-tailed distribution of mutated genes across patients and possibly implies the existence of many private drivers. To address this problem we have developed HIT’nDRIVE, a combinatorial algorithm that integrates genomic and transcriptomic (expression) data to identify patient-specific gene alterations that can collectively influence the dysregulated transcriptome of the patient. HIT’nDRIVE aims to solve the “random-walk facility location” (RWFL) problem on a gene/protein interaction network. RWFL differs from the standard facility location problem in its use of “hitting time” as the distance measure: the expected minimum number of hops in a random walk originating from any sequence-altered gene (i.e., a potential driver) to reach an expression-altered gene. Interestingly, when hitting time is used as the distance measure, the distance between multiple facilities and a “target” is not the minimum of the individual distances. HIT’nDRIVE reduces RWFL (with multi-hitting time as the distance) to a weighted multi-set cover problem, which it solves as an integer linear program (ILP). Applying HIT’nDRIVE to 2200 (TCGA) tumors from four major cancer types has revealed many potentially druggable driver genes, several of which happen to be private. It is also possible to perform accurate phenotype prediction for these samples using only HIT’nDRIVE-implied driver genes and their “network modules of influence” (subnetworks involving each driver gene where the aggregate expression profile correlates well with the cancer phenotype) as features, providing additional evidence that these genes may be driving the cancer phenotype.
Further analysis of these modules reveals patterns of mutual exclusivity among multiple driver genes modulating oncogenic or metabolic networks.
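
The hitting time used as a distance measure above can be illustrated with a small Python sketch. It computes the standard single-source hitting time of a simple random walk, not the multi-hitting time or the ILP described in the abstract, and it assumes an undirected, connected, unweighted network. The function name and the toy network are hypothetical.

```python
def hitting_times(adjacency, target, tol=1e-12, max_iter=100000):
    """Expected number of steps for a simple random walk to first reach `target`.

    adjacency: dict mapping each node to a list of neighbours (undirected,
    connected graph assumed). Solves h[target] = 0 and
    h[u] = 1 + mean(h[v] for v in neighbours of u) by repeated sweeps.
    """
    h = {node: 0.0 for node in adjacency}
    for _ in range(max_iter):
        largest_change = 0.0
        for node, neighbours in adjacency.items():
            if node == target:
                continue
            updated = 1.0 + sum(h[v] for v in neighbours) / len(neighbours)
            largest_change = max(largest_change, abs(updated - h[node]))
            h[node] = updated
        if largest_change < tol:
            break
    return h


# Hypothetical three-gene chain a - b - c.
toy_path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
times_to_c = hitting_times(toy_path, "c")   # walks ending at c
times_to_a = hitting_times(toy_path, "a")   # walks ending at a
times_to_b = hitting_times(toy_path, "b")   # walks ending at b
```

On this chain the walk from a to b takes one step in expectation, while the walk from b to a takes three, so hitting time is not symmetric. That asymmetry is one reason it behaves differently from the shortest-path distance used in standard facility location.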

11:30 am - 12:00 pm

No abstract available.

2:00 pm - 2:30 pm
Precision medicine needs to be data driven, and the corresponding analyses comprehensive and systematic. We use models of biological systems to integrate diverse types of information. This ranges from multiple high-throughput datasets, functional annotations and orthology data to expert knowledge about biochemical reactions and biological pathways. Such integrative systems are used to develop new hypotheses and answer complex questions such as what factors cause disease; which patients are at high risk; whether patients will respond to a given treatment; and how to rationally select a combination therapy for an individual patient.
 
Semantic biological pathway modeling has been studied for some time, but it is still at an early stage of development. Specifically, we discuss challenges and experiences in the design and construction of pathway representation models, as well as tools and strategies for using these models for visualization, data integration, and hypothesis generation. Such models of integrated signaling cascades may enable characterizing and in turn treating cancer successfully. Using a systematic graph-theoretic analysis of relevant networks, we predict and validate drug combinations for "repairing" cancer networks.
2:30 pm - 3:00 pm
We are faced with a flood of molecular and clinical data. Various biomolecules interact in a cell to perform biological functions, forming large, complex systems. Large numbers of patient-specific datasets are available, providing complementary information on the same disease type. The challenge is how to mine these complex data systems to answer fundamental questions, gain new insight into diseases and improve therapeutics. Just as computational approaches for analyzing genetic sequence data have revolutionized biological understanding, the expectation is that analyses of networked “omics” and clinical data will have similar ground-breaking impacts. However, dealing with these data is nontrivial, since many questions we ask about them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions.
 
We develop methods for extracting new biomedical knowledge from the wiring patterns of large networked biomedical data, linking network wiring patterns with function and translating the information hidden in the wiring patterns into everyday language.  We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratification of patients into groups having different clinical outcomes, prediction of driver genes whose mutations trigger the onset and development of cancers, and re-purposing of drugs for treating particular cancer patient groups. Our new methods stem from network science approaches coupled with graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets.

Wednesday, April 13th, 2016

9:30 am - 10:00 am

The ENCODE project, via generation of unprecedented transcriptomic and epigenomic profiles, has revealed a complex layer of transcriptional regulation mediated by distal regulatory enhancers distributed throughout the human genome. These data open up more questions than they answer. We will discuss the nuclear organization of these enhancers, how they can be used for better genotype-phenotype associations, and their potential emergent properties by virtue of their spatial proximity.

10:00 am - 10:30 am

Non-coding variants implicated in genome-wide association studies (GWAS) are enriched in enhancer elements active in disease-relevant cellular contexts. Identifying context-specific target genes and downstream pathways affected by enhancers harboring regulatory variants remains a challenge. We develop novel learning algorithms that leverage the modular dynamics of gene expression and enhancer-associated chromatin marks across a vast collection of diverse human cell types and tissues from the ENCODE and Roadmap Epigenomics Projects to infer highly connected, context-specific enhancer-gene networks. Chromatin conformation maps and expression QTLs validate the superior accuracy and tissue specificity of our predicted networks compared to existing approaches. We find that a significant proportion of enhancers do not associate with their nearest genes, indicating pervasive distal regulation potentially mediated by long-range chromatin contacts. Linked enhancers significantly improve tissue-specific regression models of gene expression. Distal co-association of regulatory sequence motifs suggests synergistic regulation of genes by multiple enhancers, with a key role for protein-protein interactions between lineage-specific transcription factors in mediating enhancer-promoter interactions. Networks of cooperating enhancers with shared motif composition and target genes are depleted of disease-associated variants, suggesting regulatory buffering mechanisms. We demonstrate the utility of our context-specific enhancer-gene links to predict putative target genes, biological processes and pathways of non-coding variants associated with diverse traits and diseases.

11:00 am - 11:30 am

In network biology, a cell is commonly described as a gene regulatory network, and as such a cell type is modeled by a state-dependent system over the network. Hence, understanding the topological structures of gene regulatory networks plays a crucial role in uncovering the biology of cell types. The talk will cover our recent work on the topological structures and dynamics of cell-specific regulatory networks.

11:30 am - 12:00 pm

No abstract available.

2:00 pm - 2:30 pm

No abstract available. 

2:30 pm - 3:00 pm

Understanding and predicting the phenotypic effects of gene copy number variations is important for understanding the way in which cells buffer expression changes and for disease studies. Genetic alterations propagate through the molecular system, disrupting biological activities within cells. In particular, it has become clear that deviations from normal gene dosage are associated with multiple disorders in a range of species including humans. Genome-wide expression profiling of Drosophila melanogaster deficiency heterozygotes reveals diverse genomic responses. We have systematically examined deficiencies on the left arm of chromosome 2 and (i) characterized gene-by-gene dosage responses/compensations, (ii) their impact on the gene network and (iii) their impact on expression noise, and (iv) developed methods to utilize these data to study TF-gene regulation. We show that, surprisingly, expression noise was increased by gene dosage compensation, a property of gene deletions that could contribute to the phenotypic heterogeneity of diseases associated with haploinsufficiency. Additionally, we show that both expression changes and expression variation associated with a reduced dose of transcription factors propagate through the gene interaction network, impacting a large number of downstream genes. Finally, we utilized our data to learn new regulatory interactions via a new iterative algorithm called Rewire Network Component Analysis (Rewire_NCA) that we developed for this purpose.

3:30 pm - 4:00 pm

The fission yeast Schizosaccharomyces pombe has more metazoan-like features than the budding yeast Saccharomyces cerevisiae with similarly facile genetics. Yet, it is significantly under-studied with little functional genomic information available. Here, we screened the whole fission yeast proteome three times (>75 million protein pairs) to generate the first high-coverage high-quality binary interactome network for S. pombe, FissionNet, comprising ~2300 interactions among ~1300 proteins. ~50% of these interactions were previously not reported in any species. FissionNet unravels previously unreported interactions implicated in processes such as gene silencing and pre-mRNA splicing. We developed a rigorous network comparison framework that accounts for assay sensitivity and specificity, revealing extensive species-specific network rewiring between fission yeast, budding yeast, and human. Surprisingly, although genes are better conserved between the yeasts, S. pombe interactions are significantly better conserved in human than in S. cerevisiae. Our framework also reveals that different modes of gene duplication influence the extent to which paralogous proteins are functionally repurposed. Finally, cross-species interactome mapping demonstrates that coevolution of interacting proteins is remarkably prevalent, a result with important implications for studying human disease in model organisms. Overall, FissionNet is a valuable resource for understanding protein functions and their evolution.

4:00 pm - 4:30 pm

In this work we shift the focus of two common biological network problems: the global network alignment problem and the problem of differential analysis. We do so by moving away from identifying local structural similarities or differences, and instead embed the networks into a continuous metric space based on function. We introduce a new solution, CANDL (Coarsely Aligning Networks with Diffusion and Landmarks). Unlike previous methods that seek to conserve local motifs, this technique focuses instead on finding coherent, functionally related groups of genes across species. In the second part of the talk, we show that this functional embedding allows for comparisons across networks that are concerned with differences, not just similarities.

Thursday, April 14th, 2016

9:30 am - 10:00 am
In combined experimental and computational approaches, we aim to elucidate the role of post-translational protein modifications (PTMs), such as phosphorylation, in dynamic cellular processes, and to investigate how the large number of changing PTMs is coordinated in cellular protein networks and, likewise, how PTMs may modulate protein-protein interaction networks.
 
The global analysis of post-translational modification and phosphorylation-dependent interactions indicates coordinated targeting of specific molecular functions via PTMs at different levels, emphasizing a protein network approach as requisite to better understand the impact of modifications on cellular signaling. For example, local linear phospho-motif information can be augmented with network context of kinases and phospho-proteins to substantially increase specificity in substrate-kinase assignments. We present novel approaches to enhance network-based predictions of kinase-substrate relationships.
 
Network approaches are about to strongly shape our view on how specificity is achieved in cellular signal transduction and post-transcriptional regulation and may ultimately reveal the molecular changes in cellular processes that occur in human diseases such as cancer.
10:00 am - 10:30 am

In January of this year, the number of publicly available gene expression assays topped 1.9 million. Near the time of this workshop, there will be 2 million samples available. Our lab is developing algorithms to integrate these data into models of the underlying biological systems that can be used to discover the pathways and processes that play roles in cells' responses to their environment. One of the methods that we've developed, ADAGE, adapts techniques from deep learning to perform unsupervised extraction of co-regulated modules from noisy publicly available data. Once trained, the ADAGE model can be applied to newly generated data to reveal the pathways altered by a newly performed experiment. This analysis, the output of which resembles a pathway analysis from commonly used software, is unsupervised and entirely data-driven. This means that the technique can be applied to systems for which gene expression data exist but no curated knowledge bases are available. Subsampling analysis suggests that there are currently about 150 organisms for which enough data exist to construct an ADAGE model, and for many of these, curated knowledge bases are unavailable or limited to homology-transferred annotations. In addition to continuing methodological developments, we are also developing the software infrastructure to provide data-driven pathway analysis for this set of organisms.

11:00 am - 11:30 am
Most biological characteristics of a cell arise from the complex interactions between its numerous constituents such as DNA, RNA, proteins and small molecules. Cells use signaling pathways and regulatory mechanisms to coordinate multiple functions, allowing them to respond to and acclimate to an ever-changing environment. For such complex biological systems, representation as a parameterized network and graph theoretical analysis of this network have led to many useful insights. In this talk, I will describe some recent and past works of mine and my collaborators in synthesizing, analyzing and simplifying biological networks that utilize non-trivial higher-order connectivity properties of these networks via pathway level information such as the following:
 
• I will describe efficient computational approaches for synthesizing and simplifying signal transduction networks.
• I will describe efficient computational approaches for measuring topological degeneracy and redundancy of biological networks and resulting biological conclusions one can draw from these measurements such as the role of currency metabolites in redundancy of metabolite networks.
• I will discuss how non-trivial higher-order connectivities of regulatory networks based on the topological structure of the geodesics can lead to useful biological insights such as knock-outs of up or down regulation of one node by another.
 
Biological implications and conclusions of our results will be illustrated on published biological networks such as the ABA-induced stomatal closure network (for A. thaliana), the C. elegans metabolic network and the T-cell large granular lymphocyte (T-LGL) survival signaling network.
11:30 am - 12:00 pm

Protein networks are increasingly used to enrich our knowledge about disease by integrating diverse information sources such as sequence and expression data into one computational framework. In this talk I will describe two recent works that use network propagation to associate novel genes and modules with disease. I will demonstrate how the propagation methodology allows processing raw mutation and expression signals to infer disease components that cannot be readily revealed from the measured molecular data.

This is joint work with the labs of Mehmet Koyuturk and Erich Wanker.
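
The network propagation idea mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration of one common formulation (iterative propagation with a restart to the prior signal and a symmetric degree normalisation), not necessarily the variant used in the talk. The function name, the value of `alpha` and the toy network are assumptions.

```python
import math


def propagate(adjacency, prior, alpha=0.6, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * W F + (1 - alpha) * prior until convergence.

    adjacency: dict node -> list of neighbours (undirected, no isolated nodes).
    prior: dict node -> initial signal (e.g. 1.0 for a mutated gene).
    W[u][v] = 1 / sqrt(deg(u) * deg(v)) is an assumed normalisation.
    """
    degree = {u: len(vs) for u, vs in adjacency.items()}
    scores = {u: prior.get(u, 0.0) for u in adjacency}
    for _ in range(max_iter):
        updated = {}
        for u, neighbours in adjacency.items():
            spread = sum(scores[v] / math.sqrt(degree[u] * degree[v])
                         for v in neighbours)
            updated[u] = alpha * spread + (1.0 - alpha) * prior.get(u, 0.0)
        delta = max(abs(updated[u] - scores[u]) for u in adjacency)
        scores = updated
        if delta < tol:
            break
    return scores


# Hypothetical chain of four proteins with a signal on the first one only.
toy_network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_scores = propagate(toy_network, {"a": 1.0})
```

On this toy chain, proteins with no measured signal still receive a score that decays with distance from the seed, which is how propagation can highlight genes that are not visible in the raw mutation or expression data.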

2:00 pm - 2:30 pm
The world is on average growing older, with people over 60 years of age representing 11% of the global population. Because of this, and because susceptibility to diseases increases with age, studying molecular causes of aging continues to gain importance. However, human aging is hard to study experimentally due to long lifespan as well as ethical constraints. Therefore, human aging-related knowledge needs to be inferred computationally. Computational analyses of gene expression or genomic sequence data, which have been indispensable for investigating human aging, are limited to studying genes (or their protein products) in isolation, ignoring their cellular interconnectivities. But proteins do not function in isolation; instead, they carry out cellular processes by interacting with other proteins. And this is exactly what biological networks, such as protein-protein interaction (PPI) networks, model. Thus, analyzing topologies of proteins in PPI networks could contribute to our understanding of the processes of aging.

The majority of the current methods for analyzing systems-level PPI networks deal with their static representations, due to limitations of biotechnologies for PPI collection, even though cellular functioning is dynamic. For this reason, and because different data types can give complementary biological insights, we integrate current static PPI network data with aging-related gene expression data to computationally infer dynamic, age-specific PPI networks. Then, we apply a series of sensitive measures of network topology to the  dynamic PPI network data to study cellular changes with age. For example, we apply a graphlet-based measure of local network position (or centrality) of a node; graphlets are small connected induced subgraphs. By doing so, we find that while global PPI network topologies do not significantly change with age, local topologies (i.e., network centralities) of a number of genes do. We predict such genes to be key players in the processes of aging [1]. We demonstrate the credibility of our predictions by: 1) observing significant overlap between our predicted aging-related genes and known "ground truth" aging-related genes; 2) observing significant overlap between functions and diseases that are enriched in our aging-related predictions and those that are enriched in the "ground truth" data; 3) providing evidence that diseases which are enriched in our aging-related predictions are linked to human aging; and 4) validating our high-scoring novel predictions in the literature.
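
As a small illustration of what a graphlet-based description of a node looks like, the sketch below counts, for every node, how many 2-node and 3-node graphlets it touches and in which position. It is a simplified stand-in for the graphlet-based centrality mentioned above, which uses larger graphlets and a weighted combination of counts. The function name and the toy network are hypothetical.

```python
from itertools import combinations


def small_graphlet_counts(adjacency):
    """Per-node counts of positions in graphlets on 2 and 3 nodes.

    adjacency: dict node -> set of neighbours (undirected, no self-loops).
    Returns dict node -> counts for: edge (the node's degree), end of an
    induced 3-node path, middle of an induced 3-node path, and triangle.
    """
    counts = {u: {"edge": len(vs), "path_end": 0, "path_middle": 0, "triangle": 0}
              for u, vs in adjacency.items()}
    for u, neighbours in adjacency.items():
        for v, w in combinations(sorted(neighbours), 2):
            if w in adjacency[v]:
                # u, v, w are mutually connected: a triangle through u.
                counts[u]["triangle"] += 1
            else:
                # v - u - w is an induced path with u in the middle.
                counts[u]["path_middle"] += 1
                counts[v]["path_end"] += 1
                counts[w]["path_end"] += 1
    return counts


# Hypothetical network: a triangle a-b-c with a pendant node d attached to c.
toy_ppi = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_counts = small_graphlet_counts(toy_ppi)
```

Computing such count vectors separately for each age-specific snapshot gives a per-node time series of local topology, which is the kind of signal the static-graphlet analysis described above compares across ages.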
 
In the above work, we study network (e.g., graphlet-based) positions of a node in each individual (static) age-specific PPI network "snapshot" and then simply consider time series of the results. In the process, we still overlook likely important relationships between the different snapshots. To capture the inter-snapshot relationships explicitly, we take the well-established and proven ideas behind static graphlets to the next level and develop a novel theory of dynamic graphlets, which are needed to allow for truly dynamic analysis of the age-specific PPI networks [2]. When we apply the dynamic graphlet approach to study human aging (just as described above), this approach further improves upon our previous work in terms of the quality of aging-related predictions. Namely, our new predictions lead to better overlap with "ground truth" aging-related data as well as to more aging-relevant functional and disease enrichments. Importantly, our new approach unveils novel knowledge about human aging with high (e.g., literature) validation accuracy, thus complementing the existing aging-related knowledge.
 
1. Faisal F.E. and Milenković T. 2014. Dynamic networks reveal key players in aging. Bioinformatics 30(12):1721-29.
2. Hulovatyy Y., Chen H. and Milenković T. 2015. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31(12):i171-180.

 

2:30 pm - 3:00 pm

In systems biology, the solution space for a broad range of problems is composed of sets of functionally associated biomolecules. Since connectivity in molecular interaction networks is an indicator of functional association, such sets can be identified from connected induced subgraphs of molecular interaction networks. Applications typically quantify the relevance (e.g., modularity, conservation, disease association) of connected subnetworks using an objective function and use a search algorithm to identify sets of subnetworks that maximize this objective function. Efficient enumeration of connected subgraphs of a large graph is therefore useful for these applications, and many existing search algorithms can be used for this purpose. However, there is a lack of non-heuristic algorithms that minimize the total number of subgraphs evaluated during the search for subgraphs that maximize the objective function. In this talk, we describe and evaluate an algorithm that reduces the computations necessary to enumerate subgraphs that maximize an objective function given a monotonically decreasing bounding function.
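
To illustrate how a bounding function can cut down the number of connected subgraphs evaluated, here is a generic branch-and-bound sketch in Python. It is not the algorithm presented in the talk. It assumes an additive objective (the sum of per-node scores), a size limit, and an optimistic bound that adds the largest positive scores still outside the current set; all names and the toy data are hypothetical.

```python
def best_connected_subgraph(adjacency, score, max_size):
    """Search connected node sets of size <= max_size for the maximum total score.

    adjacency: dict node -> set of neighbours (undirected graph).
    score: dict node -> float. Returns (best set, best value, sets evaluated).
    """
    def objective(nodes):
        return sum(score[n] for n in nodes)

    def bound(nodes):
        # Upper bound on the objective of any superset within the size limit;
        # it can only shrink as the set grows.
        room = max_size - len(nodes)
        gains = sorted((score[n] for n in adjacency
                        if n not in nodes and score[n] > 0), reverse=True)
        return objective(nodes) + sum(gains[:room])

    best_nodes, best_value = None, float("-inf")
    seen = set()
    stack = [frozenset([n]) for n in adjacency]
    evaluated = 0
    while stack:
        nodes = stack.pop()
        if nodes in seen:
            continue
        seen.add(nodes)
        evaluated += 1
        value = objective(nodes)
        if value > best_value:
            best_nodes, best_value = nodes, value
        if len(nodes) == max_size or bound(nodes) <= best_value:
            continue  # no extension of this set can beat the current best
        frontier = {v for u in nodes for v in adjacency[u]} - nodes
        for v in frontier:
            stack.append(nodes | {v})
    return best_nodes, best_value, evaluated


# Hypothetical chain a - b - c - d - e with per-gene relevance scores.
toy_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
             "d": {"c", "e"}, "e": {"d"}}
toy_score = {"a": 2.0, "b": -1.0, "c": 3.0, "d": -5.0, "e": 3.5}
best_set, best_val, n_evaluated = best_connected_subgraph(toy_graph, toy_score, 3)
```

Growing sets only through neighbouring nodes keeps every candidate connected, and the bound test discards whole families of supersets at once. How many sets are skipped depends on how tight the bound is and on the order in which candidates are visited.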

3:30 pm - 4:00 pm
A longstanding puzzle in human genetics is what causes many hereditary diseases to manifest clinically in only a few tissues. For example, familial mutations in BRCA1 lead primarily to breast and ovarian cancers, and familial mutations in RB1 lead primarily to retinoblastoma, while the causal BRCA1 and RB1 genes are expressed ubiquitously throughout the body without harming many other tissues. We approach this fundamental question using comparative network analyses and assess it quantitatively for the first time. First, we integrated recent RNA-sequencing profiles with protein interaction data and constructed protein interactomes for tens of human tissues. Using this resource, we assessed the similarity between the interactomes, and then characterized different factors that contribute to the tissue specificity of over 300 hereditary diseases.
 
The tissue specificity of some diseases was easily explained: the causal genes were expressed exclusively in the disease-manifesting tissue (e.g., Duchenne muscular dystrophy, caused by germline mutations in the muscle-specific DMD gene). However, this applied to only ~6% of the diseases in our dataset. Another factor was the expression level of causal genes: in ~30% of the diseases, the causal gene was ubiquitously expressed yet had elevated expression preferentially in the disease tissue, suggesting that this elevated expression could lead to its tissue-specific phenotypes. The most intriguing factor was identified by comparative interactome analysis. We found that in 20% of the diseases, causal proteins had an increased number of tissue-exclusive protein interactions in their disease tissues compared to unaffected tissues. In several cases, these tissue-exclusive interactions highlighted previously identified disease mechanisms, showing that comparative interactome analysis provides a powerful approach for interrogating disease etiologies. Our most recent finding relates to the role of paralogues of causal genes. For years, it was hypothesized that paralogues that are functionally redundant with causal genes compensate for them across tissues, except within the disease tissue. However, this hypothesis was never analyzed rigorously. Using our resource, we found strong quantitative evidence for this hypothesis. In summary, network approaches can shed much-needed light on the ways in which genetic aberrations lead to phenotypes, and can effectively enhance our currently limited understanding of the molecular basis of a large variety of hereditary diseases.
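
The comparative interactome analysis described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the rule that an interaction is kept in a tissue only when both partners are expressed above a threshold, the Jaccard index as the similarity measure, and the made-up genes A to D with invented expression values. It is not the speakers' pipeline.

```python
def tissue_interactome(ppi_edges, expression, tissue, threshold=1.0):
    """Keep an interaction only if both partners are expressed in `tissue`.

    The expression threshold is a hypothetical filtering rule.
    """
    expressed = {gene for gene, levels in expression.items()
                 if levels.get(tissue, 0.0) >= threshold}
    return {frozenset(edge) for edge in ppi_edges if set(edge) <= expressed}


def jaccard(edges_a, edges_b):
    """Similarity of two interactomes as the overlap of their edge sets."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def exclusive_interactions(interactomes, tissue, gene):
    """Interactions of `gene` that occur in `tissue` and in no other tissue."""
    elsewhere = set().union(
        *(edges for name, edges in interactomes.items() if name != tissue))
    return {edge for edge in interactomes[tissue]
            if gene in edge and edge not in elsewhere}


# Invented toy data: four genes, two tissues, four known interactions.
expression = {
    "A": {"muscle": 5.0, "brain": 4.0},
    "B": {"muscle": 3.0, "brain": 0.2},
    "C": {"muscle": 0.1, "brain": 6.0},
    "D": {"muscle": 2.0, "brain": 2.0},
}
ppi_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D")]

interactomes = {tissue: tissue_interactome(ppi_edges, expression, tissue)
                for tissue in ("muscle", "brain")}
similarity = jaccard(interactomes["muscle"], interactomes["brain"])
muscle_only = exclusive_interactions(interactomes, "muscle", "A")
```

In this toy example gene A is expressed in both tissues, yet its interaction with B exists only in muscle, which is the kind of tissue-exclusive interaction the abstract refers to.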

Friday, April 15th, 2016

9:30 am - 10:00 am
In this talk I will introduce a novel probabilistic model for combining coding variants and gene expression to identify a set of genes driving a given disease (a disease mechanism). I will show how to use gene networks as priors to improve the robustness of disease mechanism detection. I will contrast this approach with methods such as dmGWAS that first estimate the marginal importance of genes and then combine them using a network. Our simulations show that using networks as priors has several advantages, including the ability to identify larger gene sets and disjoint subnetworks, and lower sensitivity to noise in the network. Our experiments indicate that using networks as priors makes it possible to take advantage of current knowledge about protein interactions while remaining less sensitive to that noise.
 
10:00 am - 10:30 am
Many research questions can be approached by extracting meaningful subnetwork modules from biological networks. Some approaches for this task are based on network information alone, e.g., to identify protein complexes or conserved subnetworks, while others compute modules based on additional information such as gene expression profiles or other “omics” data. 
 
I will present simple combinatorial models for two different problems: global network alignment and active module discovery. Despite their simplicity, the models lead to NP-hard optimization problems. I will present exact algorithms that build on previous work of the mathematical optimization community on the related problems of Quadratic Assignment and Prize-Collecting Steiner Trees. The resulting algorithms work well in practice, both on simulated and real-world data from various case studies. If time permits, I will describe our recent efforts to address the combined problem of finding conserved active modules.
11:00 am - 11:30 am
Speaker: Fengzhu Sun, USC
The increasing availability of time series data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many available analytical techniques for detecting interactions, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. We developed algorithms for LSA with and without replicates, together with statistical theory for evaluating its statistical significance based on the classical theory of Feller (1951) on the range of partial sums of Markov random variables with mean 0. We applied the LSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified. Recent large-scale comparative studies of different methods for identifying interactions among OTUs in metagenomics studies clearly showed the superior performance of LSA in most situations. We implemented the eLSA technique and the theoretical p-value calculation in an easy-to-use analytic software package, which can be accessed at http://meta.usc.edu/softs/lsa.
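
To make the idea of a local, possibly time-delayed association concrete, here is a minimal sketch of a local-similarity-style score computed by dynamic programming. It is not the eLSA package's code. It assumes the commonly described recurrence (running positive and negative sums along alignments within a maximum delay, reset to zero when they go negative) and uses plain z-scoring as a simplified normalization; `local_similarity` and `max_delay` are illustrative names, and the input series must not be constant.

```python
import numpy as np


def local_similarity(x, y, max_delay):
    """Signed local similarity score of two equal-length time series.

    Finds the contiguous aligned segment, with |shift| <= max_delay, whose
    summed products of normalized values is largest in absolute value.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Simplified normalization (an assumption): z-score each series.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()

    pos = np.zeros((n + 1, n + 1))  # best positively associated run ending at (i, j)
    neg = np.zeros((n + 1, n + 1))  # best negatively associated run ending at (i, j)
    best, sign = 0.0, 1
    for i in range(1, n + 1):
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)
            if pos[i, j] > best:
                best, sign = pos[i, j], 1
            if neg[i, j] > best:
                best, sign = neg[i, j], -1
    return sign * best / n


# Illustrative series: y repeats x two time points later.
t = np.arange(30)
x = np.sin(t / 3.0)
y = np.roll(x, 2)
no_delay = local_similarity(x, y, max_delay=0)
with_delay = local_similarity(x, y, max_delay=3)
```

Allowing a delay can only keep or raise the magnitude of the score, because the zero-delay alignment is always among the candidates. Assessing significance, whether by permutation or by the theoretical approximation mentioned in the abstract, is a separate step not shown here.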
11:30 am - 12:00 pm
We will describe a new algorithm for finding coherent and flexible modules in 3-way data, e.g., measurements of gene expression for a group of patients over a sequence of time points. Our method can identify both core modules that appear in multiple patients and patient-specific augmentations of these core modules that contain additional genes. Our algorithm uses a hierarchical Bayesian data model and Gibbs sampling. We demonstrate its utility and advantages in the analysis of gene expression time series following septic shock response, and in the analysis of brain fMRI time series of subjects at rest. Networks are used to put the results in biological and medical context.
 
Joint work with D. Amar, A. Maron-Katz, D. Yekutieli (Tel Aviv University), and T. Hendler (Sourasky Medical Center)