
Details of Grant 

EPSRC Reference: EP/N011317/1
Title: High-dimensional mixture model selection and alternative splicing
Principal Investigator: Savage, Dr RS
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Duke University; Inst for Res in Biomed (IRB Barcelona); Mayo Clinic and Foundation (Rochester)
Department: Statistics
Organisation: University of Warwick
Scheme: First Grant - Revised 2009
Starts: 01 October 2015
Ends: 30 September 2017
Value (£): 99,528
EPSRC Research Topic Classifications:
Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
Healthcare
Related Grants:
Panel History:
Panel Date | Panel Name | Outcome
07 Sep 2015 | EPSRC Mathematics Prioritisation Panel Sept 2015 | Announced
Summary on Grant Application Form
Suppose that you are sitting alone at a table in a crowded restaurant, where you can hear a mixture of several conversations. This mixed sound is not particularly interesting in itself. Suppose, however, that you could record it and then use an algorithm to extract the individual conversations taking place in the restaurant: that would certainly be much more informative. In statistical jargon this is called a "mixture model", and mixture models are useful in a surprisingly large number of real-life situations.

As a motivating application we consider an important biomedical problem called alternative splicing. Although all humans have the same genes encoded in their DNA, it turns out that each gene can be expressed in several variations (called splicing variants), and that these variations perform different functions in the organism; some may even contribute to complex neurodegenerative diseases or cancer. Fortunately, recently developed technologies produce data that, for the first time, allow us to study this phenomenon in detail. We can now observe the overall expression of a gene and, as in the restaurant example, we would like to learn from it the individual contribution of each variant (indeed, whether a given variant is present at all). These technologies are becoming cheaper every year, and one can easily envision a near future in which they are part of regular medical check-ups, but solving this mixture problem poses formidable methodological and practical challenges. For instance, even for a single gene the number of possible solutions is larger than the number of atoms in the universe, and the required calculations can be prohibitive even on the latest computers. This example highlights some of the most important challenges common to many modern applications of mixture models, so solving them would have positive implications in a much wider range of areas (e.g. technology, industry, public policy, social sciences).
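The claim that even a single gene admits more possible solutions than there are atoms in the universe can be checked with a back-of-the-envelope count. The Python sketch below rests on a deliberately crude, hypothetical simplification that is not part of the project description: every non-empty subset of a gene's exons is treated as a candidate splicing variant, and a candidate solution is any subset of those variants (i.e. a choice of which variants are present). The figure of 10^80 atoms is the commonly quoted order-of-magnitude estimate, and the function names are illustrative only.

```python
# Hypothetical illustration: count candidate "solutions" for one gene under a
# crude simplification (not the project's actual model space).

ATOMS_IN_UNIVERSE = 10 ** 80  # commonly quoted order-of-magnitude estimate


def n_candidate_variants(n_exons: int) -> int:
    """Assumption: any non-empty subset of exons is a candidate variant."""
    return 2 ** n_exons - 1


def n_models(n_exons: int) -> int:
    """A candidate solution is any subset of the candidate variants."""
    return 2 ** n_candidate_variants(n_exons)


def smallest_gene_exceeding(bound: int) -> int:
    """Smallest exon count whose number of candidate solutions exceeds bound."""
    n = 1
    while n_models(n) <= bound:
        n += 1
    return n


if __name__ == "__main__":
    e = smallest_gene_exceeding(ATOMS_IN_UNIVERSE)
    # Exon count, number of candidate variants, digits in the model count.
    print(e, n_candidate_variants(e), len(str(n_models(e))))
```

Under these assumptions the script reports that a gene with only 9 exons already yields 2^511 candidate solutions (a 154-digit number), far beyond 10^80. Real genes can have many more exons, although in practice biological constraints rule out many exon combinations.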

In this project we aim to develop a framework that can be used to solve the alternative splicing problem and other challenging mixture model problems. Our first goal is to propose a novel formulation for general mixture models, of a type that has proven highly successful in other complex settings, and to study both its theoretical and practical aspects. In our example, this formulation says that when identifying the different conversations in the restaurant, no two tables can be uttering exactly the same words (otherwise they should be regarded as a single conversation). This apparently simple consideration turns out to have important mathematical consequences that greatly simplify the problem. Our second goal is to apply these general principles to the alternative splicing problem, where we will also bring scientific considerations to bear to ensure that the solution is useful in practice. Our third goal is to propose and study strategies for fast and accurate computation, since the required calculations can quickly become prohibitive, so that a computer can find the solution in reasonable time. As part of this project we will provide open-source software that others can use freely for their own research or applied data analysis.
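One way to see why the "no two identical conversations" condition helps: in a mixture model, an exactly duplicated component adds nothing to the fit, so the likelihood alone cannot distinguish a two-component model from a three-component model containing a duplicate. The Python sketch below illustrates this under stated assumptions and is not necessarily the formulation the project develops: it uses a one-dimensional Gaussian mixture with a known, shared standard deviation, and adds a hypothetical separation penalty (the log of the product of squared pairwise distances between component means) that drops to minus infinity whenever two means coincide. All function names are illustrative.

```python
import math
from itertools import combinations
from typing import Sequence


def mixture_loglik(x: Sequence[float], weights: Sequence[float],
                   means: Sequence[float], sd: float = 1.0) -> float:
    """Log-likelihood of a 1-D Gaussian mixture with a shared, known sd."""
    norm = sd * math.sqrt(2.0 * math.pi)
    total = 0.0
    for xi in x:
        # Mixture density at xi: weighted sum of the component densities.
        dens = sum(w * math.exp(-0.5 * ((xi - m) / sd) ** 2) / norm
                   for w, m in zip(weights, means))
        total += math.log(dens)
    return total


def separation_log_penalty(means: Sequence[float], scale: float = 1.0) -> float:
    """Hypothetical penalty: log prod_{i<j} ((mu_i - mu_j) / scale)^2.

    Equals -inf when two component means coincide, so duplicated
    ("identical conversation") components are ruled out.
    """
    penalty = 0.0
    for a, b in combinations(means, 2):
        d2 = ((a - b) / scale) ** 2
        if d2 == 0.0:
            return -math.inf
        penalty += math.log(d2)
    return penalty


def penalised_score(x: Sequence[float], weights: Sequence[float],
                    means: Sequence[float], sd: float = 1.0) -> float:
    """Log-likelihood plus separation penalty (higher is better)."""
    return mixture_loglik(x, weights, means, sd) + separation_log_penalty(means)


if __name__ == "__main__":
    data = [-2.1, -1.9, -2.0, 2.0, 1.8, 2.2]
    two = ([0.5, 0.5], [-2.0, 2.0])
    # Third component duplicates the second one.
    three = ([0.5, 0.25, 0.25], [-2.0, 2.0, 2.0])
    # Log-likelihoods agree up to floating-point rounding.
    print(mixture_loglik(data, *two), mixture_loglik(data, *three))
    # Penalised scores: finite for two components, -inf for the duplicate.
    print(penalised_score(data, *two), penalised_score(data, *three))
```

With data clustered around -2 and 2, weights (0.5, 0.5) with means (-2, 2) give the same log-likelihood as weights (0.5, 0.25, 0.25) with means (-2, 2, 2); only the penalised score rules out the redundant third component. Penalties that vanish when components overlap are one way to express this idea; other constructions are possible.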

Given the technical challenges involved, the bulk of the research will be carried out at the Dept. of Statistics at the University of Warwick by the PI, working with other members of the department and with several statistical and biomedical collaborators from prestigious overseas universities and hospitals. These collaborators will be actively involved in the project, e.g. helping to translate our methodology into biomedical research and clinical practice, or verifying that our statistical predictions are indeed accurate.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.warwick.ac.uk