EPSRC Reference: |
EP/W007134/1 |
Title: |
Algebraic Invariants for Phylogenetic Network Inference |
Principal Investigator: |
Leggett, Dr R |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Research Faculty |
Organisation: |
Earlham Institute |
Scheme: |
Standard Research - NR1 |
Starts: |
04 January 2022 |
Ends: |
03 January 2023 |
Value (£): |
64,069
|
EPSRC Research Topic Classifications: |
Algebra & Geometry |
Artificial Intelligence |
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
The key goal in phylogenetics is to infer the evolutionary histories of species from the DNA sequence data of their living relatives. This has applications in many fields, such as tracing the mutations of viral outbreaks, understanding speciation events to aid conservation, and even reconstructing the histories of ancient manuscripts that were copied by hand over generations.
Most evolutionary histories can be described with a phylogenetic tree, in which the "leaves" represent species alive today and the vertices higher up the tree represent their common ancestors. However, for many biological problems a tree cannot properly represent the evolutionary history of the species involved; such species are said to have undergone "horizontal evolution". One example occurs in microbiomes, where different microbial species can share portions of their DNA through a process called horizontal gene transfer. This is one mechanism by which antibiotic resistance spreads between bacteria, so being able to detect when such events have occurred has important implications for human health. To describe horizontal evolution, biologists use a phylogenetic network: a tree structure serves as a backbone, onto which further edges are drawn to represent horizontal evolution events.
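As an illustration of the backbone-plus-edges description, the sketch below stores a small rooted network as a directed graph, one common representation that is assumed here rather than taken from this project. All function and vertex names are hypothetical. A horizontal edge is added by subdividing two tree edges and joining the new vertices, which creates a "reticulation" vertex with two parents.

```python
from collections import Counter

def add_horizontal_edge(edges, donor, recipient, s="s", h="h"):
    """Return a new edge list with a horizontal edge from edge `donor` to edge `recipient`.

    Each chosen edge (parent, child) is subdivided by a new vertex (s on the
    donor edge, h on the recipient edge), and the arc s -> h is added.
    """
    new_edges = [e for e in edges if e not in (donor, recipient)]
    (a, b), (c, d) = donor, recipient
    new_edges += [(a, s), (s, b), (c, h), (h, d), (s, h)]
    return new_edges

def reticulations(edges):
    """Vertices with two or more parents, i.e. where lineages merge."""
    indegree = Counter(child for _, child in edges)
    return sorted(v for v, k in indegree.items() if k >= 2)

def leaves(edges):
    """Vertices with no children: the species sampled today."""
    parents = {p for p, _ in edges}
    return sorted({c for _, c in edges} - parents)

# Tree backbone ((A,B),(C,D)) rooted at r, with internal vertices u and v.
backbone = [("r", "u"), ("r", "v"), ("u", "A"), ("u", "B"), ("v", "C"), ("v", "D")]
# Hypothetical transfer from the lineage leading to B into the lineage leading to C.
network = add_horizontal_edge(backbone, donor=("u", "B"), recipient=("v", "C"))

print(reticulations(backbone))  # []
print(reticulations(network))   # ['h']
print(leaves(network))          # ['A', 'B', 'C', 'D']
```

The leaves are unchanged by the extra edge; only the internal structure gains a vertex with two incoming edges, which is what distinguishes the network from its tree backbone.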
Inferring the evolutionary histories of species that have undergone horizontal evolution is particularly challenging, and is the focus of much current research in phylogenetics. One method of phylogenetic inference uses algebraic invariants. These have been developed extensively for inferring evolution along a tree, and in some cases have been shown to outperform other methods. For phylogenetic networks, however, very little research on algebraic invariants has been done. This project will develop and test algebraic invariants as a method for phylogenetic network inference.
For a particular phylogenetic network, the process of evolution along it can be modelled using a type of probabilistic model called a Markov model. Under this model, one can calculate the probability of observing each particular pattern of DNA bases at the leaves of the network, and these probabilities can be expressed as polynomials in the numerical parameters of the model. By allowing the numerical parameters to vary freely (i.e. treating them as variables), we can represent the network by the set of all probability distributions it can produce, and this set can be described by polynomial equations. The set of solutions of a system of polynomial equations is an object that algebraists call an algebraic variety. Working with this model lets us use the powerful machinery of algebraic geometry to determine whether observed DNA sequence data is a good fit for the network. In particular, we can describe the variety corresponding to a network using algebraic invariants: polynomials that evaluate to zero on every probability distribution the network can produce. To determine whether a particular network is a good fit for observed DNA sequence data, the idea is to calculate the frequencies of the patterns in the data and then evaluate the network's algebraic invariants at these frequencies. The resulting values measure how closely the data matches the network: for sufficiently long sequences that evolved on the network, they should be close to zero.
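The following minimal sketch illustrates the idea in the simplest case, a tree with four leaves, rather than a network. It assumes the two-state Cavender-Farris-Neyman (CFN) model in place of a four-state DNA model, places a uniform root distribution at an internal vertex, and uses two quadratic invariants of the quartet tree 12|34 written in Fourier (Hadamard) coordinates. All function names and parameter values are hypothetical. Exact rational arithmetic shows that the invariants of the generating tree vanish exactly, while those of a different tree topology do not.

```python
from fractions import Fraction
from itertools import product

def transition(p):
    """CFN (two-state symmetric) transition matrix; p is the probability of a change."""
    return [[1 - p, p], [p, 1 - p]]

def quartet_probabilities(p1, p2, p3, p4, p5):
    """Exact site-pattern probabilities on the unrooted quartet tree 12|34.

    Leaves 1 and 2 attach to internal vertex u, leaves 3 and 4 to internal
    vertex v, and p5 is the change probability on the internal edge u-v.
    The root is placed at u with a uniform distribution on {0, 1}.
    Each probability is a polynomial in p1, ..., p5, evaluated here exactly.
    """
    m1, m2, m3, m4, m5 = (transition(p) for p in (p1, p2, p3, p4, p5))
    probs = {}
    for x in product((0, 1), repeat=4):
        total = Fraction(0)
        for a, b in product((0, 1), repeat=2):  # unobserved states at u and v
            total += (Fraction(1, 2) * m1[a][x[0]] * m2[a][x[1]]
                      * m5[a][b] * m3[b][x[2]] * m4[b][x[3]])
        probs[x] = total
    return probs

def fourier(probs):
    """Fourier (Hadamard) coordinates q_g = sum over x of (-1)^(g.x) * P(x)."""
    return {g: sum((-1) ** sum(gi * xi for gi, xi in zip(g, x)) * px
                   for x, px in probs.items())
            for g in product((0, 1), repeat=4)}

def unit(*leaves):
    """Fourier label with a 1 at each listed leaf (leaves numbered 1 to 4)."""
    return tuple(int(i + 1 in leaves) for i in range(4))

def quartet_invariants(q, i, j, k, l):
    """Two quadratic CFN invariants for the quartet split ij|kl."""
    f1 = q[unit(i, j)] * q[unit(k, l)] - q[unit()] * q[unit(1, 2, 3, 4)]
    f2 = q[unit(i, k)] * q[unit(j, l)] - q[unit(i, l)] * q[unit(j, k)]
    return f1, f2

params = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 20), Fraction(1, 4), Fraction(1, 8))
probs = quartet_probabilities(*params)
q = fourier(probs)
print(quartet_invariants(q, 1, 2, 3, 4))  # (Fraction(0, 1), Fraction(0, 1))
print(quartet_invariants(q, 1, 3, 2, 4))  # (Fraction(-147, 2000), Fraction(147, 2000))
```

Writing each edge parameter as θ = 1 - 2p, the Fourier coordinates of this model factor as products of θ values along paths in the tree, which is why the invariants take such a simple binomial form; for networks the corresponding expressions are more complicated.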
This project will examine how effective this method is for inferring phylogenetic networks from DNA sequence data. To do this, we will use the most recent developments in the field to calculate the invariants for a small class of phylogenetic networks. Next, we will develop a computational tool that uses these invariants to infer the network that best describes the evolutionary history underlying a set of DNA sequence data. We will then test our tool on both simulated and real DNA sequence data, and compare the results with those of state-of-the-art methods.
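The inference step can be sketched in a simplified setting: four leaves, the two-state CFN model, and a tree rather than a network. All names, parameter values and the scoring rule are assumptions for illustration, not the project's tool. The sketch simulates site patterns on the tree 12|34, converts the observed pattern frequencies to Fourier coordinates, scores each of the three possible quartet topologies by the sum of squares of two of its quadratic invariants, and selects the topology with the lowest score.

```python
import random
from collections import Counter
from itertools import product

# The three unrooted quartet topologies, written as splits ij|kl.
SPLITS = {"12|34": (1, 2, 3, 4), "13|24": (1, 3, 2, 4), "14|23": (1, 4, 2, 3)}

def simulate_quartet(n_sites, p, rng):
    """Simulate binary site patterns under the CFN model on the tree 12|34.

    p = (p1, p2, p3, p4, p5): change probabilities on the four pendant edges
    and on the internal edge u-v; the state at u is uniform on {0, 1}.
    """
    def flip(state, prob):
        return state ^ (rng.random() < prob)  # change state with probability prob

    patterns = []
    for _ in range(n_sites):
        u = rng.randrange(2)
        v = flip(u, p[4])
        patterns.append((flip(u, p[0]), flip(u, p[1]), flip(v, p[2]), flip(v, p[3])))
    return patterns

def fourier_from_counts(patterns):
    """Fourier coordinates computed from observed pattern frequencies."""
    n = len(patterns)
    freqs = Counter(patterns)
    return {g: sum((-1) ** sum(a * b for a, b in zip(g, x)) * c
                   for x, c in freqs.items()) / n
            for g in product((0, 1), repeat=4)}

def unit(*leaves):
    """Fourier label with a 1 at each listed leaf (leaves numbered 1 to 4)."""
    return tuple(int(i + 1 in leaves) for i in range(4))

def invariant_score(q, i, j, k, l):
    """Sum of squares of two quadratic CFN invariants for the split ij|kl."""
    f1 = q[unit(i, j)] * q[unit(k, l)] - q[unit()] * q[unit(1, 2, 3, 4)]
    f2 = q[unit(i, k)] * q[unit(j, l)] - q[unit(i, l)] * q[unit(j, k)]
    return f1 ** 2 + f2 ** 2

def infer_quartet(patterns):
    """Return the topology whose invariants are closest to zero, plus all scores."""
    q = fourier_from_counts(patterns)
    scores = {name: invariant_score(q, *leaves) for name, leaves in SPLITS.items()}
    return min(scores, key=scores.get), scores

rng = random.Random(1)
data = simulate_quartet(20000, (0.1, 0.2, 0.15, 0.25, 0.125), rng)
best, scores = infer_quartet(data)
print(best)    # the generating topology, 12|34, should score lowest
print(scores)
```

Summing squared invariants is only one simple way to combine them; a practical tool would likely need to account for the differing scales and sampling variances of the invariants, and for networks the set of invariants to evaluate is larger and harder to compute.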
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
|