GFKIT: Memoization for faster computations on genealogical forests

30. September 2024

The field of population genetics attempts to advance our understanding of evolutionary processes. It has applications, for example, in medical research, wildlife conservation, and — in conjunction with recent advances in ancient DNA sequencing technology — studying human migration patterns over the past few thousand years. The basic toolbox of population genetics includes genealogical trees, which describe the shared evolutionary history among individuals of the same species. They are calculated on the basis of genetic variations. However, in recombining organisms, a single tree is insufficient to describe the evolutionary history of the whole genome. Instead, a collection of correlated trees can be used, where each describes the evolutionary history of a consecutive region of the genome.

Our C++ library Genealogical Forest Kit (gfkit) is 2.1 to 11.2 (median 4.0) times faster than the state-of-the-art tool (tskit) on empirical and simulated datasets at computing important population genetics statistics such as the Allele Frequency Spectrum, Patterson’s f, the Fixation Index, Tajima’s D, pairwise Lowest Common Ancestors, and others. On Lowest Common Ancestor queries with more than two samples as input, gfkit scales asymptotically better than the state-of-the-art (tskit), and is thus up to 990 times faster.

See more here: https://github.com/lukashuebner/gfkit

Publication:
Lukas Hübner and Alexandros Stamatakis. Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp.5:1-5:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/LIPIcs.WABI.2024.5



About HITS

HITS, the Heidelberg Institute for Theoretical Studies, was established in 2010 by physicist and SAP co-founder Klaus Tschira (1940-2015) and the Klaus Tschira Foundation as a private, non-profit research institute. HITS conducts basic research in the natural, mathematical, and computer sciences. Major research directions include complex simulations across scales, making sense of data, and enabling science via computational research. Application areas range from molecular biology to astrophysics. An essential characteristic of the Institute is interdisciplinarity, implemented in numerous cross-group and cross-disciplinary projects. The base funding of HITS is provided by the Klaus Tschira Foundation.

Switch to the German homepage or stay on this page