EPSRC logo

Details of Grant 

EPSRC Reference: EP/N014189/1
Title: Joining the dots: from data to insight
Principal Investigator: Brodzki, Professor J
Other Investigators:
Frey, Professor JG Forster, Professor J Smith, Professor PJS
Djukanovic, Professor R Niranjan, Professor M Gilmour, Professor S
Skipp, Professor P
Researcher Co-Investigators:
Project Partners:
Department: Sch of Mathematical Sciences
Organisation: University of Southampton
Scheme: Standard Research
Starts: 01 December 2015 Ends: 30 November 2019 Value (£): 1,218,040
EPSRC Research Topic Classifications:
Algebra & Geometry Information & Knowledge Mgmt
Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel DatePanel NameOutcome
03 Sep 2015 Making Sense From Data Panel - Full Proposals Announced
Summary on Grant Application Form
The relentless growth of the amount, variety, availability, and the rate of change of data has profoundly transformed essentially all aspects of human life. The Big Data revolution has created a paradox: While we create and collect more data than ever before, it is not always easy to unlock the information it contains. To turn the easy availability of data into a major scientific and economic advantage, it is imperative that we create analytic tools that would be equal to the challenge presented by the complexity of modern data.

In recent years, breakthroughs in topological data analysis and machine learning have paved the way for significant progress towards creating efficient and reliable tools to extract information from data.

Our proposal has been designed to address the scope of the call as follows.

To 'convert the vast amounts of data produced into understandable, actionable information' we will create a powerful fusion of machine learning, statistics, and topological data analysis. This combination of statistical insight, with computational power of machine learning with the flexibility, scalability, and visualisation tools of topology will allow a significant reduction of complexity of the data under study. The results will be output in a form that is best suited to the intended application or a scientific problem at hand. This way, we will create a seamless pathway from data analysis to implementation, which will allow us to control every step of this process. In particular, the intended end user will be able to query the results of the analysis to extract the information relevant to them. In summary, our work will provide tools to extract information from complex data sets to support user investigations or decisions.

It is now well established that a main challenge of Big Data is how 'to efficiently and intelligently extract knowledge from heterogeneous, distributed data while retaining the context necessary for its interpretation'. This will be addressed first of all by developing techniques for dealing with heterogenous data. A main strength of topology is its ability to identify simple components in complex systems. It can also provide guiding principles on how to combine elements to create a model of a complex system. It also provides numerical techniques to control the overall shape of the resulting model to ensure that it fits with the original constraints. We will use the particular strengths of machine learning, statistics and topology to identify the main properties of data, which will then be combined to provide an overall analysis of the data. For example, a collection of text documents can be analysed using machine learning techniques to create a graph which captures similarities between documents in a topological way. This is an efficient way to classify a corpus of documents according to a desired set of keywords. An important part of our investigation will be to develop robust techniques of data fusion. This is important in many applications. One of our main applications will address the problem of creating a set of descriptors to diagnose and treat asthma. There are five main pathways for clinical diagnosis of asthma, each supported by data. To create a coherent picture of the disease we need to understand how to combine the information contained in these separate data sets to create the so called 'asthma handprint' which is a major challenge in this part of medicine.

Every novel methodology of data analysis has to prove that its 'techniques are realistic, compatible and scalable with real- world services and hardware systems'. The best way to do that is to engage from the outset with challenging applications , and to ensure that theoretic and modelling solutions fit well the intended applications. We offer a unique synergy between theory and modelling as well as world-class facilities in medicine and chemistry which will provide a strict test for our ideas and results.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.soton.ac.uk