In 2014, the journal Science featured an article on the bird tree of life that highlighted the essential role of algorithms and supercomputers in enabling modern research in evolutionary biology for all types of living beings. Now, a decade and a giant leap in tool development later, part of the team that coordinated the computer analyses at that time has co-authored another paper, published in Nature, on the complexity of avian evolution.
Phylogenetic relationships are key to understanding the evolution of species. These relationships are typically identified by comparing, among other things, similarities in DNA or anatomical features. An international team of researchers from the “Bird 10,000 Genomes Project” (B10K) has now analyzed the genomes of 363 bird species, using their intergenic regions and a plethora of computational methods. The result is a well-supported tree that nevertheless exhibits a stunning degree of discordance.
Attaining these results requires large amounts of data to resolve discrepancies, which can stem from the diversity of species sampled, the phylogenetic method used, and the choice of genomic regions. Some of the most essential tools for processing these data were developed by the Computational Molecular Evolution group (CME) at the Heidelberg Institute for Theoretical Studies (HITS), together with scientists from its sister group, the Biodiversity Computing Group (BCG) at the Institute of Computer Science (ICS) of the Foundation for Research and Technology Hellas (FORTH), Heraklion, Greece.
Enabling research in evolutionary biology
“The new computational approaches allowed us to reconstruct over 150,000 local phylogenies across the whole genome, each of which provides a small window into the evolutionary history of birds,” says Josefin Stiller, one of the lead authors of the study and a former visitor to the CME group at HITS. “What we mainly do is to enable research in evolutionary biology via software, algorithms and model development,” says CME group leader Alexandros Stamatakis, who also holds an EU-funded ERA chair at FORTH. “The ParGenes software, for example, which is very central for the paper, can efficiently schedule the inference of a huge number of per-gene phylogenetic trees on distinct input gene datasets on a large compute cluster. This is classic fundamental computer science as it focuses on efficient job scheduling.”
ParGenes is based on RAxML-NG, the group’s flagship phylogenetic inference tool, and ModelTest-NG, a tool for selecting the best-fit statistical model of evolution for a given dataset. The NG in the tool names stands for Next Generation, which denotes a set of existing tools (mainly the group’s own) that have been completely redesigned and rewritten since 2014 to make them more maintainable, versatile, and scalable. RAxML-NG in particular is very flexible in the sense that it scales seamlessly from a laptop to a supercomputer. For this paper, it was also used as a stand-alone tool to infer a tree on the dataset comprising the entire genomes on a supercomputer.
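To give a flavor of the job-scheduling problem Stamatakis mentions, the following minimal Python sketch distributes per-gene inference jobs across workers using a textbook greedy heuristic (longest job first, always assigned to the least-loaded worker). It is an illustration only: the article does not describe how ParGenes actually schedules jobs, and the function name `schedule_jobs`, the gene names, and the cost estimates are all hypothetical.

```python
import heapq


def schedule_jobs(job_costs, n_workers):
    """Assign jobs to workers with a greedy longest-job-first heuristic.

    job_costs: dict mapping a (hypothetical) gene name to an estimated cost.
    Returns (assignment, loads), both dicts keyed by worker index.
    """
    # Min-heap of (current load, worker index); the least-loaded worker is on top.
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(n_workers)}

    # Handle the most expensive jobs first so small jobs can fill the gaps later.
    for job, cost in sorted(job_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + cost, worker))

    loads = {w: sum(job_costs[j] for j in jobs) for w, jobs in assignment.items()}
    return assignment, loads


# Hypothetical cost estimates, e.g. derived from alignment size.
example_costs = {"gene_a": 8, "gene_b": 7, "gene_c": 6, "gene_d": 5, "gene_e": 4}
assignment, loads = schedule_jobs(example_costs, n_workers=2)
print(assignment)
print(loads)
```

The heuristic is fast and simple but not guaranteed to find the best possible balance; a production scheduler for a large compute cluster would also have to deal with jobs that themselves run on several cores and with inaccurate cost estimates.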
“Pythia”: Machine learning to predict phylogenetic difficulty
“A late addition to this paper was the Pythia difficulty prediction developed by Julia Haag, a PhD student in my group. Given an input dataset it predicts how difficult a phylogenetic inference on that dataset will be, that is, how much signal for a single tree there is in the data, using machine learning techniques,” says Stamatakis. “As our Nature paper focuses a lot on assessing the phylogenetic and evolutionary signal in different genomic regions of the bird genome, it was a very useful addition to the paper as we can now also provide phylogenetic difficulty scores for the distinct genomic regions assessed.”
A versatile and flexible tool for researchers
The tools that the CME group has developed and that were used in this paper are all open source and very highly cited. RAxML-NG in particular regularly enables research in different disciplines of the life sciences. During the pandemic, for example, the tool was used to analyze how the distinct viral strains evolved. “In this project, we provide the basic toolbox for our fellow scientists that actually enables them to do their science,” says Stamatakis. “I personally find this very gratifying.”
Publication:
Stiller J et al.: Complexity of avian evolution revealed by family-level genomes. Nature (advance online publication), 1 April 2024, DOI: 10.1038/s41586-024-07323-1
https://www.nature.com/articles/s41586-024-07323-1
HITS, the Heidelberg Institute for Theoretical Studies, was established in 2010 by physicist and SAP co-founder Klaus Tschira (1940-2015) and the Klaus Tschira Foundation as a private, non-profit research institute. HITS conducts basic research in the natural, mathematical, and computer sciences. Major research directions include complex simulations across scales, making sense of data, and enabling science via computational research. Application areas range from molecular biology to astrophysics. An essential characteristic of the Institute is interdisciplinarity, implemented in numerous cross-group and cross-disciplinary projects. The base funding of HITS is provided by the Klaus Tschira Foundation.