PhD Proposal: Accurate and Scalable Phylogenetic Tree Estimation via Graph Cuts

Talk
Yunheng Han
Time: 
05.09.2024 14:00 to 15:00
Location: 

IRB 3137

An evolutionary tree, commonly referred to as a phylogeny, depicts the relationships among biological entities, like species or organisms. Phylogenetic trees provide insight into key questions related to biodiversity, evolution, ecology, and genetics. However, they are difficult to estimate accurately because of complex evolutionary processes, the hardness of the related optimization problems, and incomplete and inaccurate data in practice. To address these challenges, I introduce TREE-QMC, a new method for estimating a species tree from gene trees based on four-leaf trees, called quartets. TREE-QMC is based on the divide-and-conquer approach proposed by Sagi Snir and Satish Rao (2010). My major contribution is showing that a mathematical object called “the quartet graph” can be directly built from gene trees (without explicitly enumerating quartets). This enables TREE-QMC to run in cubic time, making it the first method to break the quartic time barrier (without down-sampling the input) since the divide-and-conquer approach was proposed in 2010. Moreover, TREE-QMC improves accuracy under challenging scenarios by normalizing the quartet weights to account for “artificial taxa”, which are introduced during the divide phase so subproblem solutions can be combined during the conquer phase. I recently showed how to extend the algorithm to incorporate extra information including branch lengths and support values in weighting quartets, further improving robustness to error-prone inputs, including homology errors. Going forward, I propose future directions to extend the TREE-QMC approach to reconstruct phylogenetic networks rather than trees because tree representation is insufficient for many evolutionary scenarios, like hybridization or horizontal gene transfers.