
Details of Grant 

EPSRC Reference: EP/W016117/1
Title: A suite of new nonparametric methods for missing data and data from heterogeneous sources based on the theory of Frechet classes
Principal Investigator: Berrett, Dr T
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Statistics
Organisation: University of Warwick
Scheme: New Investigator Award
Starts: 01 February 2022
Ends: 31 January 2024
Value (£): 116,297
EPSRC Research Topic Classifications:
Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 24 Nov 2021
Panel Name: EPSRC Mathematical Sciences Prioritisation Panel November 2021
Outcome: Announced
Summary on Grant Application Form
From the traditional settings of clinical trials to the technologically driven mass collection of data in many modern application areas, the statistician's raw material is often plagued with missing data. Whether this is down to nonresponse or to the increasing heterogeneity of data sources, incompleteness is typically unavoidable in practice. The vast majority of statistical procedures are designed for use with complete information, and without it may become inapplicable, uninterpretable or unreliable. However, restricting attention to complete cases, i.e. data points without missing variables, will often drastically reduce the utility of a data set, both by throwing away useful information in the non-complete cases and by introducing the possibility of bias when the complete cases do not provide a representative sample of the population.

When a practitioner encounters missing data, the first questions they must ask themselves concern the mechanism by which the data came to be missing, and whether the missingness will cause serious problems in the analysis of their data set and the interpretation of their results. If the absence of information on certain variables can be modelled as independent of the value of the data, then the data is said to be Missing Completely at Random (MCAR), and subsequent analysis is significantly simpler than it would otherwise be. However, the consequences of making this assumption without proper basis can be severe.
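
As a rough illustration of why the MCAR assumption matters, the following minimal simulation compares the complete-case mean when missingness is independent of the data with the complete-case mean when missingness depends on the value itself. The distribution, sample size and observation probabilities are arbitrary choices made for this sketch and are not taken from the project.

```python
import random


def complete_case_mean(values, observed):
    """Mean of the values flagged as observed (the complete cases)."""
    kept = [v for v, o in zip(values, observed) if o]
    return sum(kept) / len(kept)


def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0

    # MCAR: each value is observed with probability 0.5, independently of x.
    mcar_observed = [rng.random() < 0.5 for _ in x]

    # Not MCAR (illustrative assumption): positive values are observed
    # less often than negative ones, so missingness depends on the value.
    dependent_observed = [rng.random() < (0.8 if v < 0 else 0.2) for v in x]

    return (complete_case_mean(x, mcar_observed),
            complete_case_mean(x, dependent_observed))


mcar_mean, dependent_mean = simulate()
print(f"complete-case mean under MCAR:        {mcar_mean:+.3f}")
print(f"complete-case mean, value-dependent:  {dependent_mean:+.3f}")
```

Under MCAR the complete-case mean stays close to the true mean of 0, whereas under the value-dependent mechanism it is pulled clearly below 0, which is the kind of bias described above.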

We will begin with a rigorous study of the consequences of the MCAR assumption, presenting new characterisations of this property and providing novel connections to concepts studied in other fields, including copula theory and convex and computational geometry. Leveraging knowledge developed in these disciplines, we will design new tools for statisticians, bring new perspectives to the analysis of incomplete data, and open up new frontiers in the study of missingness. Specifically, we will link the property of MCAR to Fréchet classes and compatibility.
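
To make the notion of compatibility concrete, here is a minimal sketch for three binary variables X1, X2, X3 of which only pairs are ever observed together. It asks whether the three pairwise distributions could all be margins of a single joint distribution, i.e. whether the corresponding Fréchet class is non-empty. The function names and the restriction to binary variables are assumptions made for illustration; this is not the project's own procedure.

```python
def joint_range(p1, p2, p3, p12, p13, p23):
    """Range of t = P(X1=1, X2=1, X3=1) keeping all eight cells non-negative.

    p_i  = P(Xi = 1) and p_ij = P(Xi = 1, Xj = 1).
    Every cell of the 2x2x2 joint table is an affine function of t, e.g.
    P(1,1,0) = p12 - t and P(1,0,0) = p1 - p12 - p13 + t, and the eight
    cells always sum to 1, so feasibility reduces to an interval for t.
    """
    lower = max(0.0, p12 + p13 - p1, p12 + p23 - p2, p13 + p23 - p3)
    upper = min(p12, p13, p23, 1.0 - p1 - p2 - p3 + p12 + p13 + p23)
    return lower, upper


def compatible(p1, p2, p3, p12, p13, p23, tol=1e-12):
    """True if some joint distribution has the given pairwise margins."""
    lower, upper = joint_range(p1, p2, p3, p12, p13, p23)
    return lower <= upper + tol


# Three independent fair coins: the pairwise margins fit together.
independent_ok = compatible(0.5, 0.5, 0.5, 0.25, 0.25, 0.25)

# X1 = X2 and X2 = X3, yet X1 = 1 - X3: each pair is a valid distribution,
# but no joint distribution of (X1, X2, X3) can have all three margins.
cyclic_ok = compatible(0.5, 0.5, 0.5, p12=0.5, p13=0.0, p23=0.5)

print(independent_ok, cyclic_ok)
```

If the data were MCAR, the distributions seen under each observation pattern would all be margins of one underlying joint distribution, so incompatibility of the kind shown in the second example would be evidence against the assumption.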

With the necessary framework in place, we will introduce hypothesis tests for the assumption of MCAR. In the first instance these will be applicable to contingency tables, but they will be extended to continuous data through binning. Certain alternatives are indistinguishable from the null, but we will show that these tests have power against all fixed alternative hypotheses that are distinguishable, and give situations in which they have optimal power.

Although it is a crucial first step, the assumption of MCAR is often too restrictive to be useful in practice. However, it may be that the missingness can be explained by certain fully observed variables, a setting known as covariate-dependent missingness (CDM). Using additional insights from the problem of conditional independence testing, we may extend our earlier work to test this more flexible assumption, which is similar to, though stronger than, the usual Missing at Random (MAR) assumption.

In high-dimensional settings, such flexible tests are likely to have low power, so we are limited to simpler tests. To circumvent this issue, our next goal will be to define and analyse new tests for a relaxed version of the problem, which attempt to detect only those departures from the null that manifest as incompatibility of means and covariance matrices. We will show that all such departures can be detected, even when the dimension grows polynomially in the sample size.
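
A small sketch of what incompatibility of covariance matrices can look like: suppose three standardised variables are only ever observed in pairs, giving three separately obtained correlations. These can come from one common population only if they fit into a single positive semidefinite 3x3 correlation matrix. The setting, the function name and the use of population correlations (with no allowance for sampling error) are simplifying assumptions made for this illustration.

```python
def correlations_compatible(r12, r13, r23, tol=1e-12):
    """True if some valid 3x3 correlation matrix has these off-diagonal entries.

    With unit diagonal, the matrix is positive semidefinite exactly when
    every |r| <= 1 and its determinant is non-negative.
    """
    if any(abs(r) > 1.0 for r in (r12, r13, r23)):
        return False
    det = 1.0 + 2.0 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return det >= -tol


# Strong positive correlation in all three pairs can coexist.
all_positive = correlations_compatible(0.9, 0.9, 0.9)

# X1 strongly positive with X2, X2 strongly positive with X3,
# but X1 strongly negative with X3: no common covariance matrix exists.
mixed_signs = correlations_compatible(r12=0.9, r13=-0.9, r23=0.9)

print(all_positive, mixed_signs)
```

Each pairwise correlation in the second example is perfectly valid on its own; the contradiction only appears when the pieces are put together, which is the type of departure a test based on means and covariances would target.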

Once hypothesis tests have been carried out and reasonable assumptions developed, a practitioner will typically want to perform inference, such as estimating an unknown quantity with confidence. In the framework we provide, the construction of confidence intervals for linear estimands is dual to the testing problems we consider. We will combine our new technology with empirical process theory to provide minimal-width confidence intervals, even in settings where consistent estimation is not possible.
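
To illustrate a setting where consistent estimation is not possible, consider two binary variables that are never observed together. The linear estimand P(X1 = 1, X2 = 1) is then only known to lie in an interval given by the classical Fréchet-Hoeffding bounds. The sketch below computes that population-level range; it ignores sampling variability, so it is not a confidence interval and not the construction proposed in the project.

```python
def frechet_hoeffding_bounds(p1, p2):
    """Sharp bounds on P(X1 = 1, X2 = 1) given only P(X1 = 1) and P(X2 = 1)."""
    lower = max(0.0, p1 + p2 - 1.0)
    upper = min(p1, p2)
    return lower, upper


# Illustrative marginal probabilities (assumed values, not real data).
lower, upper = frechet_hoeffding_bounds(0.7, 0.6)
print(f"P(X1=1, X2=1) lies between {lower:.2f} and {upper:.2f}")
```

No amount of additional data of this type narrows the range, so the natural target for inference is an interval that covers the estimand while being as short as the data allow.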
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.warwick.ac.uk