EPSRC Reference: |
EP/Y01720X/1 |
Title: |
Artificial intelligence methods applied to Genomic Data for improved health (AGENDA) |
Principal Investigator: |
Ennis, Professor S |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Human Development and Health |
Organisation: |
University of Southampton |
Scheme: |
Standard Research - NR1 |
Starts: |
02 October 2023 |
Ends: |
01 April 2025 |
Value (£): |
624,229
|
EPSRC Research Topic Classifications: |
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
Variation between genomes is the driving force behind inter-individual differences in health outcomes. Some patterns of genetic variation that cause disease or alter susceptibility to poor health have been very difficult to detect using limited genetic information on modest numbers of patients. Sequencing data, by contrast, capture the wealth of genetic variation. By 2024, at great cost to UK taxpayers, 500k patients and 100k newborn babies will have had their genomes sequenced through NHS Genomic Strategy initiatives.
Genomic data generation has outpaced the development of new methods to best realise their value. Older approaches, such as genome-wide association studies, detect punctate points of common variation across the genome. These methods are statistically underpowered for sequencing data, whose hallmark is rare, very rare, and unique genetic changes. As more and more people have their DNA sequenced, the complete set of observed genetic variants is expanding, but because most variants occur in few people, the data are increasingly sparse. New methods are essential to collapse the vastness of genomic data into a more intuitive and useful form.
The lack of new methods means that interpretation of genomic data currently lags behind data generation. Many of the new mutations found in a patient's genome are of uncertain clinical significance. This is causing huge delays in reviewing and reporting genomic test results. There is a national shortage of clinical bioinformaticians with expertise in genomic data interpretation and reporting - yet much of their valuable time is being spent on labour-intensive manual curation. Scalable, digital, knowledge-inference tools are essential to improve turnaround times so that patients can benefit from accurate diagnoses and targeted therapies.
Applied to massive cohorts, AI has the power to reveal cryptic, non-linear patterns between patient subgroups. Methods developed within this project will help dissolve the discipline-specific barriers between genomicists and computer scientists. This project develops algorithms to assimilate and reduce the dimensionality of immense yet sparse genomic data into intuitive gene-level 'GenePy' matrices. For each individual variant, information on its population frequency, its conservation across species, and its impact on protein function and interaction is retained. For each patient, these data are then collapsed across the set of variants observed in their sequence of an entire gene, providing a pathogenic burden score - for each person, for each gene. We have demonstrated that these scores accurately detect the majority of established diagnoses for thousands of Genomics England patients with recessive diseases. In addition, our hypothesis-free methods detect hundreds of causal variants missed by manual curation. These methods can be implemented with limited manpower, in a fraction of the time, for thousands of samples. This project will develop these tools to incorporate more complex genetic variants and to harness the value of long-read sequencing data.
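The collapse from variant-level annotations to a per-person, per-gene burden score can be illustrated with a minimal sketch. This is not the project's scoring formula: the function name `gene_burden_matrix`, the record fields (`patient`, `gene`, `allele_freq`, `deleteriousness`, `alt_alleles`) and the weighting (deleteriousness multiplied by the negative log10 of population frequency and by the number of alternate alleles) are all assumptions chosen only to show the shape of the computation.

```python
import math
from collections import defaultdict


def gene_burden_matrix(variants, freq_floor=1e-5):
    """Collapse variant records into {patient: {gene: burden score}}.

    Each record is a dict with hypothetical keys:
      patient, gene      - identifiers
      allele_freq        - population frequency of the variant (0..1)
      deleteriousness    - predicted impact, scaled 0..1
      alt_alleles        - 1 (heterozygous) or 2 (homozygous)
    The weighting below is an illustrative assumption, not a published formula.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for v in variants:
        # Floor the frequency so unobserved (zero-frequency) variants stay finite.
        freq = max(v["allele_freq"], freq_floor)
        rarity = -math.log10(freq)  # rarer variants contribute more
        scores[v["patient"]][v["gene"]] += (
            v["deleteriousness"] * rarity * v["alt_alleles"]
        )
    # Convert nested defaultdicts to plain dicts for the caller.
    return {patient: dict(genes) for patient, genes in scores.items()}


example_variants = [
    {"patient": "P1", "gene": "GENE_A", "allele_freq": 0.01,
     "deleteriousness": 0.5, "alt_alleles": 1},
    {"patient": "P1", "gene": "GENE_A", "allele_freq": 0.001,
     "deleteriousness": 1.0, "alt_alleles": 2},
    {"patient": "P2", "gene": "GENE_B", "allele_freq": 0.0,
     "deleteriousness": 1.0, "alt_alleles": 1},
]
burden = gene_burden_matrix(example_variants)
```

The result is a sparse patient-by-gene mapping: many rare variants per person are reduced to one number per gene, which is the form of dimensionality reduction the paragraph above describes.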
As GenePy scores scale variants to the gene level, they are intuitive input data for various modelling approaches. By mapping GenePy scores onto gene-interaction networks, topology analyses can reveal biological pathway mechanisms and therapeutic targets, and identify novel biomarkers for the development of future clinical tests.
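One simple way to picture mapping gene-level scores onto an interaction network is neighbour smoothing, where each gene's score is blended with the mean score of the genes it interacts with. The sketch below is only an assumption about what such an analysis might look like: the function name `smooth_over_network`, the edge-list input and the blending weight `alpha` are hypothetical and are not taken from the project.

```python
def smooth_over_network(scores, edges, alpha=0.5):
    """Blend each gene's score with the mean score of its network neighbours.

    scores - {gene: burden score} for one patient (missing genes count as 0)
    edges  - iterable of (gene_a, gene_b) undirected interactions
    alpha  - hypothetical weight given to the neighbourhood (0 = ignore network)
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    smoothed = {}
    for gene in set(scores) | set(neighbours):
        own = scores.get(gene, 0.0)
        nbrs = neighbours.get(gene, ())
        # Isolated genes have no neighbourhood signal to add.
        nbr_mean = (
            sum(scores.get(n, 0.0) for n in nbrs) / len(nbrs) if nbrs else 0.0
        )
        smoothed[gene] = (1 - alpha) * own + alpha * nbr_mean
    return smoothed


patient_scores = {"A": 4.0, "B": 0.0}
interactions = [("A", "B"), ("B", "C")]
smoothed_scores = smooth_over_network(patient_scores, interactions, alpha=0.5)
```

In this toy case gene B carries no variants of its own but gains a non-zero score because it interacts with the burdened gene A, which is how a network view can surface pathway-level signal that single-gene scores miss.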
Using the existing wealth of experimentally derived functional evidence of impact for thousands of point mutations in the human genome, AI can help us learn to interpret the most likely clinical impact of the billions of new variants we are discovering. This project uses AI to train protein modelling software to categorise genetic variants as benign, as likely to impair protein function, or as indeterminate and requiring additional modelling. We will define the steps required to have an end-to-end automated pipeline that can provide functional support to interpret data for personalised medicine.
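The three-way categorisation described above can be sketched as a thresholding step applied to a model's predicted probability that a variant impairs protein function. The function name `triage_variant` and the two cut-offs are hypothetical, chosen for illustration only. The summary does not state how the categories would actually be assigned.

```python
def triage_variant(p_impair, benign_below=0.2, impair_above=0.8):
    """Assign a variant to one of three categories from a predicted probability.

    p_impair      - model-estimated probability of impaired protein function
    benign_below  - hypothetical cut-off below which a variant is called benign
    impair_above  - hypothetical cut-off above which it is called impairing
    Anything in between is flagged for additional modelling.
    """
    if not 0.0 <= p_impair <= 1.0:
        raise ValueError("p_impair must be a probability between 0 and 1")
    if p_impair < benign_below:
        return "benign"
    if p_impair > impair_above:
        return "likely_impairing"
    return "indeterminate"


calls = [triage_variant(p) for p in (0.05, 0.5, 0.95)]
```

Keeping an explicit indeterminate band means uncertain variants are routed to further modelling rather than being forced into a benign or impairing call.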
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.soton.ac.uk |