Monday, February 1st, 2016
Cancer progression is an evolutionary process, characterized by the accumulation of mutations, that is responsible for tumor growth, clinical progression, and the development of drug resistance. Evolutionary theory can be used to describe the dynamics of tumor cell populations and to make inferences about the evolutionary history of a tumor from molecular profiling data. We present recent approaches to modeling cancer evolution, including population genetics models of tumorigenesis, phylogenetic methods for intra-tumor subclonal diversity, and probabilistic graphical models of tumor progression.
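As a minimal illustration of what a population genetics model of tumorigenesis looks like, the sketch below simulates a Wright-Fisher population in which each cell is summarized by its number of driver mutations. The multiplicative fitness (1 + s)^k, the per-generation mutation probability `mu`, and all parameter values are illustrative assumptions, not details from the talk.

```python
import numpy as np


def simulate_wright_fisher(pop_size=1000, generations=500, mu=1e-3, s=0.1, seed=0):
    """Simulate driver accumulation in a Wright-Fisher population.

    Each cell carries k drivers and has fitness (1 + s)**k (assumed model).
    Returns the mean number of drivers per cell after each generation.
    """
    rng = np.random.default_rng(seed)
    drivers = np.zeros(pop_size, dtype=int)  # all cells start driver-free
    mean_drivers = []
    for _ in range(generations):
        fitness = (1.0 + s) ** drivers
        probs = fitness / fitness.sum()
        # Selection and drift: resample parents in proportion to fitness
        parents = rng.choice(pop_size, size=pop_size, p=probs)
        drivers = drivers[parents]
        # Each offspring gains one new driver with probability mu
        drivers = drivers + (rng.random(pop_size) < mu)
        mean_drivers.append(drivers.mean())
    return np.array(mean_drivers)
```

Calling `simulate_wright_fisher(s=0.0)` and `simulate_wright_fisher(s=0.1)` with the same seed is a quick way to compare neutral accumulation with selection-driven accumulation of drivers.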
Huge amounts of diverse biomedical data, in particular cancer data, are available, but most established analysis methods were tailored to a single dataset, which limits their power to detect disease-specific events. Here we take an integrative approach to exploit data from multiple studies and across many diseases while correcting for biases arising from the complexity of the data (e.g., different technologies, cancer subtypes, or other diseases). For example, we extracted clinically meaningful and reliable disease biomarkers by analyzing more than 14,500 gene expression profiles from more than 180 studies. We detected cancer subtype-specific differentially expressed genes by correcting for both biological and disease-ontology-related biases. The detected gene sets are highly informative, and integrating them with non-expression data (e.g., somatic mutations and biological networks) reveals therapeutic potential.
No abstract available.
In this short talk, we will present algorithmic approaches to questions pertaining to the identification of breakage-fusion-bridge cycles, chromothripsis, episome formation, and other mechanistic explanations for amplification in the tumor genome.
Tuesday, February 2nd, 2016
This talk will focus on the combinatorial method LICHeE, designed to efficiently reconstruct multi-sample cancer cell lineage trees and infer the subclonal composition of tumor samples using somatic single nucleotide variants (SSNVs). Given a set of validated, deeply sequenced SSNVs from multiple normal and tumor samples of an individual cancer patient, LICHeE uses the presence patterns and variant allele frequencies (VAFs) of SSNVs across the samples as lineage markers, relying on the perfect phylogeny model (PPM), which assumes that mutations do not recur independently in different cells. This assumption allows us to formulate a set of SSNV ordering constraints, which LICHeE leverages to limit the search space of possible underlying lineage trees and to evaluate the validity of the resulting topologies. In particular, LICHeE's key strategy is to encode all possible precedence relationships among clusters of SSNVs into an evolutionary constraint network, which embeds all possible valid lineage trees and allows us to formulate the task of inferring such trees as a search for all spanning trees satisfying the derived PPM-based constraints. Due to this substantial reduction in search space, LICHeE can process large SSNV datasets in seconds. LICHeE reports a set of lineage trees that are fully consistent with the SSNV presence patterns and VAFs within each sample under the PPM. For each such tree, LICHeE also estimates the subclonal mixtures of the samples, inferring sample heterogeneity simultaneously with phylogenetic cell lineage tree reconstruction. LICHeE's effectiveness has been demonstrated on several large, recently published ultra-deep-sequencing multi-sample datasets, as well as on simulated datasets. For more information, please see the following publication:
Popic, V., Salari, R., Hajirasouliha, I., Kashef-Haghighi, D., West, R.B. and Batzoglou, S., 2015. Fast and scalable inference of multi-sample cancer lineages. Genome Biology, 16(1), p.91.
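The constraint-network idea can be sketched under simplifying assumptions: each SSNV cluster is summarized by one VAF per sample (0.0 meaning absent), a hypothetical `root` node with VAF 0.5 everywhere stands for the germline, a cluster may precede another only if its VAF is at least as high in every sample, and a spanning tree is kept only if, in every sample, the children of each node have a summed VAF no larger than the node's own. The function names and the brute-force enumeration are illustrative; LICHeE's actual implementation uses error margins, SSNV clustering, and a more efficient search.

```python
from itertools import product


def build_constraint_network(vafs, root_vaf=0.5):
    """Return (nodes, parents) where parents[v] lists nodes allowed to precede v.

    vafs: dict cluster_id -> tuple of VAFs, one per sample (0.0 = absent).
    """
    n_samples = len(next(iter(vafs.values())))
    nodes = {"root": (root_vaf,) * n_samples, **vafs}
    parents = {v: [] for v in vafs}
    for v, v_vafs in vafs.items():
        for u, u_vafs in nodes.items():
            # u may precede v only if its VAF is >= v's VAF in every sample;
            # this also forces u to be present wherever v is present
            if u != v and all(a >= b for a, b in zip(u_vafs, v_vafs)):
                parents[v].append(u)
    return nodes, parents


def reaches_root(choice):
    """Check that parent pointers contain no cycle, i.e. every node reaches root."""
    for start in choice:
        node, seen = start, set()
        while node != "root":
            if node in seen:
                return False
            seen.add(node)
            node = choice[node]
    return True


def valid_lineage_trees(vafs, root_vaf=0.5, tol=1e-9):
    """Enumerate spanning trees of the network that satisfy the VAF sum rule."""
    nodes, parents = build_constraint_network(vafs, root_vaf)
    ids = list(vafs)
    trees = []
    for combo in product(*(parents[v] for v in ids)):
        choice = dict(zip(ids, combo))
        if not reaches_root(choice):
            continue
        # Sum rule: children's VAFs may not exceed the parent's VAF in any sample
        ok = all(
            sum(vafs[c][i] for c in ids if choice[c] == u) <= u_vafs[i] + tol
            for u, u_vafs in nodes.items()
            for i in range(len(u_vafs))
        )
        if ok:
            trees.append(choice)
    return trees
```

For a toy two-sample input `{"A": (0.4, 0.3), "B": (0.2, 0.0), "C": (0.1, 0.25)}`, the network allows B and C to hang from either root or A, but only the tree with both under A passes the sum rule. The pruning by the network is what keeps the enumeration small compared with all labeled trees.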
Wednesday, February 3rd, 2016
We present a novel method for detecting and genotyping somatic structural variations (SVs) in multiple whole-genome sequencing (WGS) tumor samples taken from a single cancer patient. In contrast to standard SV discovery approaches for cancer genomes, which do not leverage phylogenetic information, we make use of the multi-sample lineage tree reconstructed from ultra-deep sequencing of somatic SNVs. We demonstrate that leveraging lineage trees boosts sensitivity in detecting and genotyping SVs. Our method pools samples that share a common ancestor in the tree and finds clusters of discordant paired-end reads that suggest the same SV breakpoint across these samples. Placing SVs onto specific branches of the lineage tree yields a more comprehensive roadmap of the tumor genome's evolution, beginning at the zygote.
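A toy version of the pooling step, under simplifying assumptions: the lineage tree is a child-to-parent map whose leaves are tumor samples, each sample contributes discordant read pairs as `(chrom1, pos1, chrom2, pos2)` tuples, and two pairs are taken to support the same breakpoint if they join the same chromosomes with both ends within a fixed window. The function names, `window`, and `min_support` are hypothetical and are not the method's actual parameters.

```python
from collections import defaultdict


def descendant_samples(tree, node):
    """Return the leaf samples below `node`; `tree` maps child -> parent."""
    children = defaultdict(list)
    for child, parent in tree.items():
        children[parent].append(child)
    stack, leaves = [node], set()
    while stack:
        n = stack.pop()
        if children[n]:
            stack.extend(children[n])
        else:
            leaves.add(n)
    return leaves


def cluster_discordant_pairs(pairs, window=500):
    """Greedily cluster read pairs: each pair joins the first cluster whose
    founding pair has the same chromosomes and both ends within `window` bp."""
    clusters = []
    for c1, p1, c2, p2 in sorted(pairs):
        for cl in clusters:
            rc1, rp1, rc2, rp2 = cl[0]
            if (c1, c2) == (rc1, rc2) and abs(p1 - rp1) <= window and abs(p2 - rp2) <= window:
                cl.append((c1, p1, c2, p2))
                break
        else:
            clusters.append([(c1, p1, c2, p2)])
    return clusters


def pooled_sv_calls(tree, reads_by_sample, node, min_support=3, window=500):
    """Pool discordant pairs from all samples below `node`; keep supported clusters."""
    pooled = [r for s in descendant_samples(tree, node) for r in reads_by_sample.get(s, [])]
    return [cl for cl in cluster_discordant_pairs(pooled, window) if len(cl) >= min_support]
```

With `min_support=3`, a breakpoint supported by two reads in one sample and one read in a sibling sample is missed when each sample is analyzed alone but is called when the pairs are pooled at their common ancestor, which is the kind of sensitivity gain the abstract describes.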
Thursday, February 4th, 2016
Friday, February 5th, 2016
Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a fully Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in a myeloproliferative neoplasm.
Alternative isoform usage is known to have an important impact on some genes related to cancer progression. Mutations in spliceosome genes have also been reported in some cancers and could result in widespread alterations of splicing patterns in tumors. In my talk, I will discuss methods we have developed for detecting alternative splicing patterns in cancer. The first approach addresses finding differences in splicing patterns between known groups. The second uses clustering methods to find clusters characterized by differences in alternative splicing.
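One common way to quantify alternative splicing of an exon, used here as an illustrative assumption rather than a description of the speaker's methods, is the percent spliced in (PSI): inclusion reads divided by inclusion plus exclusion reads. The sketch computes PSI per sample and runs a two-sided permutation test on the difference in mean PSI between two known groups; the function names are hypothetical.

```python
import numpy as np


def psi(inclusion, exclusion):
    """Percent spliced in per sample: inclusion / (inclusion + exclusion)."""
    inclusion = np.asarray(inclusion, dtype=float)
    exclusion = np.asarray(exclusion, dtype=float)
    return inclusion / (inclusion + exclusion)


def delta_psi_permutation_test(psi_a, psi_b, n_perm=10000, seed=0):
    """Two-sided permutation test on mean(psi_a) - mean(psi_b)."""
    rng = np.random.default_rng(seed)
    psi_a, psi_b = np.asarray(psi_a, dtype=float), np.asarray(psi_b, dtype=float)
    observed = psi_a.mean() - psi_b.mean()
    pooled = np.concatenate([psi_a, psi_b])
    n_a = len(psi_a)
    count = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the difference in means
        perm = rng.permutation(pooled)
        diff = perm[:n_a].mean() - perm[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

Testing each exon this way and correcting for multiple testing across exons would be a natural next step; the clustering question instead groups samples by their PSI profiles without predefined labels.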
Data-driven molecular classification of cancer patients for diagnosis, prognosis, or drug response prediction is often challenging due to the high dimensionality of omics data, which leads to suboptimal prediction performance and makes robust biomarkers difficult to identify. One possible strategy to overcome this issue is to replace the input omics data with simpler representations that are more amenable to statistical learning. In this talk, I will discuss two recent attempts to represent high-dimensional omics profiles with simpler, rank-based representations: one based on full-quantile normalization, where the target distribution is optimized to solve the learning problem, and one based on all pairwise comparisons, which enables efficient learning with kernel methods. This is joint work with Marina Le Morvan and Yunlong Jiao.
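Both rank-based representations can be sketched with NumPy. The pairwise-comparison map sends a profile x with n features to the vector of signs sign(x_i - x_j) over all pairs i < j, scaled by 1/sqrt(n(n-1)/2); the inner product of two such vectors equals Kendall's tau (tau-a, which matches the usual tau when there are no ties), so a kernel method can work with the tau value instead of the explicit vector. The quantile normalization shown uses a fixed, user-supplied target distribution, whereas the talk describes optimizing that target for the learning task. Function names are illustrative.

```python
import numpy as np


def pairwise_sign_features(x):
    """Map a profile to scaled signs sign(x_i - x_j) over all pairs i < j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return np.sign(x[i] - x[j]) / np.sqrt(len(i))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / number of pairs."""
    n = len(x)
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += np.sign(x[a] - x[b]) * np.sign(y[a] - y[b])
    return total / (n * (n - 1) / 2)


def quantile_normalize(profile, target):
    """Replace each value by the target value of the same rank.

    Assumes len(target) == len(profile); ties are broken by position.
    """
    profile = np.asarray(profile, dtype=float)
    ranks = np.argsort(np.argsort(profile))  # 0-based rank of each entry
    return np.sort(np.asarray(target, dtype=float))[ranks]
```

Because the explicit feature vector has n(n-1)/2 entries, it is only practical for small n; computing Kendall's tau directly gives the same kernel value without materializing it.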