EPSRC Reference: |
EP/V00641X/2 |
Title: |
Partial recovery of missing responses - a toolbox for efficient design and analysis when data may be missing not at random |
Principal Investigator: |
Mitra, Dr R |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Statistical Science |
Organisation: |
UCL |
Scheme: |
Standard Research |
Starts: |
01 September 2022 |
Ends: |
20 October 2023 |
Value (£): |
104,252
|
EPSRC Research Topic Classifications: |
Statistics & Appl. Probability |
|
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
Missing data are a common problem in many application areas. The presence of missing values complicates analyses, and if not dealt with properly can result in incorrect conclusions being drawn from the data. It is often helpful to assume there is a process that produces the missing values, typically called a missing data mechanism. A particularly problematic scenario is when this mechanism is in part determined by some other unknown variables, such as the missing values themselves. This is known as a missing not at random (MNAR) mechanism.
If missing values arise due to a MNAR mechanism then conclusions drawn from the data will typically be biased. Also, importantly, it is not possible to know whether this problem occurs or not in the data. This is the challenging problem area that this proposal seeks to address, namely developing procedures that can best test whether or MNAR occurs in the data.
The proposal will consider scenarios where it is possible to estimate some of the missing values through a follow up sample. The main purpose of this is to learn about the missing data mechanism and specifically test whether the MNAR assumption is valid or not. Further, the recovered data will also help to correct for the effect the missing data have on conclusions. The proposal makes use of optimal design techniques to decide which missing values to follow up. Essentially certain missing values might yield more information about the type of missing data mechanism than others; in addition some values might be more likely than others to be recovered. In this way we would ensure maximum information from the recovered data is obtained. This will allow data analysts to determine whether the presence of MNAR is likely and take appropriate action.
We will collaborate with our project partners, the Office for National Statistics and NHS Blood and Transplant in the development of these methods. Our project partners will provide relevant data for us to consider realistic scenarios and we will discuss interim results with them to ensure our methods are most useful for practitioners. We will also present the work as part of a missing data course at the African Institute of Mathematical Sciences (AIMS) to maximise the global benefit of the work.
The methods developed in this proposal will be disseminated through papers and presentations. In addition, we will create a free to use R package that will implement the methods to allow easy uptake by users. We will provide training in using this R package as part of a two-day workshop where we will describe our methods to users. A dedicated website will be updated throughout the project to describe developments and facilitate engagement with interested parties.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
|