Fast and Scalable Inference of Cancer Cell Lineages Using Multi-sample Deep Sequencing Somatic SNVs
Calvin Lab
This talk will focus on LICHeE, a combinatorial method designed to efficiently reconstruct multi-sample cancer cell lineage trees and infer the subclonal composition of tumor samples using somatic single nucleotide variants (SSNVs). Given a set of validated, deeply sequenced SSNVs from multiple normal and tumor samples of an individual cancer patient, LICHeE uses the presence patterns and variant allele frequencies (VAFs) of the SSNVs across the samples as lineage markers. It relies on the perfect phylogeny model (PPM), which assumes that mutations do not recur independently in different cells. This assumption allows us to formulate a set of SSNV ordering constraints, which LICHeE uses to limit the search space of possible underlying lineage trees and to evaluate the validity of the resulting topologies. In particular, LICHeE's key strategy is to encode all possible precedence relationships among clusters of SSNVs into an evolutionary constraint network. This network embeds all valid lineage trees, so the task of inferring such trees becomes a search for all spanning trees that satisfy the derived PPM-based constraints. Due to this substantial reduction in search space, LICHeE can process large SSNV datasets in seconds. LICHeE reports a set of lineage trees that are fully consistent with the SSNV presence patterns and VAFs within each sample under the PPM. Because sample heterogeneity is inferred simultaneously with the reconstruction of the phylogenetic cell lineage tree, LICHeE also provides, for each such tree, estimates of the subclonal mixtures of the samples. LICHeE's effectiveness has been demonstrated on several large, recently published ultra-deep-sequencing multi-sample datasets, as well as on simulated datasets. For more information, please see the following publication: Popic, V., Salari, R., Hajirasouliha, I., Kashef-Haghighi, D., West, R.B. and Batzoglou, S., 2015. Fast and scalable inference of multi-sample cancer lineages. Genome Biology, 16(1), p. 91.
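
To make the idea of a constraint network concrete, here is a minimal Python sketch of a search for spanning trees under PPM-style constraints. It is an illustration under stated assumptions, not LICHeE's implementation. The names (`can_precede`, `build_network`, `valid_trees`), the error margin `EPS`, the toy VAF values, and the brute-force enumeration are all hypothetical. The sketch assumes two simplified constraints: a cluster may precede another only if its VAF is at least as large in every sample, and the VAFs of a cluster's children may not sum to more than the cluster's own VAF in any sample.

```python
from itertools import product

EPS = 0.05  # hypothetical VAF error margin, not a value taken from LICHeE


def can_precede(u, v, eps=EPS):
    """Assumed ordering constraint: u may be an ancestor of v only if
    u's VAF is at least v's VAF (within eps) in every sample."""
    return all(a + eps >= b for a, b in zip(u, v))


def build_network(vafs, eps=EPS):
    """Return the (parent, child) edges of the constraint network."""
    names = list(vafs)
    return [(p, c) for p in names for c in names
            if p != c and can_precede(vafs[p], vafs[c], eps)]


def reaches_root(node, parent_of, root):
    """Follow parent links from node; False if a cycle is hit first."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent_of[node]
    return True


def sum_ok(parent_of, vafs, eps=EPS):
    """Assumed sum constraint: in every sample, the children's VAFs
    together must not exceed the parent's VAF (within eps)."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    for parent, kids in children.items():
        for i, parent_vaf in enumerate(vafs[parent]):
            if sum(vafs[k][i] for k in kids) > parent_vaf + eps:
                return False
    return True


def valid_trees(vafs, root, eps=EPS):
    """Brute-force enumeration of spanning trees of the network that
    satisfy both constraints (fine for toy inputs only)."""
    nodes = [n for n in vafs if n != root]
    edges = build_network(vafs, eps)
    parents = {n: [p for p, c in edges if c == n] for n in nodes}
    trees = []
    for choice in product(*(parents[n] for n in nodes)):
        parent_of = dict(zip(nodes, choice))
        if all(reaches_root(n, parent_of, root) for n in nodes) \
                and sum_ok(parent_of, vafs, eps):
            trees.append(parent_of)
    return trees


# Toy input: per-cluster VAFs in two samples; values are made up.
vafs = {
    "germline": (0.5, 0.5),   # root, assumed heterozygous germline
    "A": (0.4, 0.4),          # present in both samples
    "B": (0.3, 0.0),          # private to sample 1
    "C": (0.0, 0.25),         # private to sample 2
}
trees = valid_trees(vafs, root="germline")
for tree in trees:
    print(tree)
```

On this toy input the network admits four candidate spanning trees, and only the one in which clusters B and C both descend from A satisfies the sum constraint in both samples. Enumerating every parent assignment grows exponentially with the number of clusters, so this is only practical for a handful of clusters.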