
Details of Grant 

EPSRC Reference: EP/V002694/1
Title: New challenges in robust statistical learning
Principal Investigator: Cannings, Dr T I
Other Investigators:
Researcher Co-Investigators:
Project Partners:
BIOS Health Ltd; Cambridge Cancer Genomics
Department: Sch of Mathematics
Organisation: University of Edinburgh
Scheme: New Investigator Award
Starts: 01 May 2021
Ends: 30 April 2024
Value (£): 266,366
EPSRC Research Topic Classifications:
Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 24 Nov 2020
Panel Name: EPSRC Mathematical Sciences Prioritisation Panel November 2020
Outcome: Announced
Summary on Grant Application Form
In recent years, our ability to collect, store and process vast amounts of data, coupled with rapid advances in technology, has led to the widespread adoption of data-driven decision-making. This includes new application areas, such as precision medicine, where doctors are using data to inform their diagnoses and treatment recommendations. In other areas, such as finance, banks use huge amounts of historical data to decide whether a new customer is likely to default on their loan repayments. It is often the case that we are required to make a discrete prediction about some future patient or customer, based on some (training) data relating to existing patients or customers. In statistics, problems of this type are called classification problems.

Many methods for classification are built on the assumption that any future data we may encounter has the same distribution as our training data. Of course, this assumption is not always valid -- data relating to one set of patients or customers will not necessarily follow the same distribution as data from a new set of people. In this research, we will develop new robust classification algorithms that can deal with noisy and incomplete data. In particular, we will develop methodology that enables practitioners to combine multiple sources of noisy data, propose modifications to existing methods that guarantee they are robust to corruptions in the data, and introduce novel ways of overcoming the issues caused by missing data. We will also provide new theoretical understanding of the limitations of decision-making algorithms when faced with noisy, corrupted and incomplete data.

There are a number of scenarios where our new approaches will be applicable:

- We may have data collected from patients in a particular location (lab or hospital) but wish to make predictions in a different location.

- We may not have access to the full dataset. For example, for privacy reasons, users may choose not to disclose some of their personal information. In other settings, we may be required to anonymise the data by removing some identifying covariates.

- Often the data involved are so complex that we do not observe the true data. Instead, we only have access to an approximation of the data. This typically occurs in modern settings, where practitioners use crowd-sourcing services such as Amazon Mechanical Turk to label their data -- such services are rarely perfectly accurate.

- It may be that an adversary is able to arbitrarily contaminate a small proportion of the data (for instance by performing artificial activity online).

Our work will enable practitioners to utilise data that is currently not appropriate for use. We will also provide new insight into the kinds of data that are most useful for a particular purpose.

Key Findings, Potential use in non-academic contexts, Sectors submitted by the Researcher:
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ed.ac.uk