Details of Grant 

EPSRC Reference: EP/Y036395/1
Title: FAIRClinical: FAIR-ification of Supplementary Data to Support Clinical Research
Principal Investigator: Beck, Dr T
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: School of Medicine
Organisation: University of Nottingham
Scheme: Standard Research - NR1
Starts: 01 March 2024
Ends: 28 February 2026
Value (£): 101,253
EPSRC Research Topic Classifications:
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:  
Summary on Grant Application Form
The project aims to enhance the FAIR-ness of all supplementary data files and to significantly improve the reuse of unstructured clinical case report forms (CRFs). Supplementary data are commonly attached to a scientific publication, either directly in biomedical libraries such as PMC or via generalist deposition platforms such as Zenodo. The file types and formats are highly heterogeneous (e.g., PDF, XLS, CSV, GIF). CRFs collect patient data in clinical research studies and trials, and represent an information-rich subset of both the clinical research literature and unstructured clinical study supplementary data. We propose to enrich the contents of all supplementary data, and therefore their interoperability, findability and reusability, by delivering more normalised contents.

The envisaged normalisation will be performed along four dimensions, which are common in dataset management catalogues: 1) administrative metadata (e.g., author names, affiliations, licensing models), 2) descriptive metadata (e.g., diseases, genes or gene products, population sizes, experimental settings), 3) structural metadata (e.g., textual contents, images) and 4) cross-references to other data deposition catalogues (e.g., URLs, PIDs). The descriptive metadata layer may vary significantly across specialised life and health science areas; for this layer we propose to explore the semantic enrichment of clinical information. We will provide broad FAIR-ification of supplementary data files, covering all PMC contents, combined with a specific effort to structure CRFs into an Electronic Health Record (EHR)-like dataset.
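
As an illustration only, the four dimensions could be represented as a single normalised record per supplementary file. The sketch below is not the project's schema: the class name `SupplementaryFileRecord`, its field names, the helper `dimension_coverage` and the example values are all hypothetical and chosen for readability.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SupplementaryFileRecord:
    """Hypothetical normalised record for one supplementary data file."""

    file_name: str
    file_format: str  # e.g. "PDF", "XLS", "CSV", "GIF"

    # 1) Administrative metadata
    authors: List[str] = field(default_factory=list)
    affiliations: List[str] = field(default_factory=list)
    license: Optional[str] = None

    # 2) Descriptive metadata
    diseases: List[str] = field(default_factory=list)
    genes: List[str] = field(default_factory=list)
    population_size: Optional[int] = None

    # 3) Structural metadata
    text_content: Optional[str] = None
    image_count: int = 0

    # 4) Cross-references to other deposition catalogues (URLs or PIDs)
    cross_references: List[str] = field(default_factory=list)

    def dimension_coverage(self) -> Dict[str, bool]:
        """Report which of the four dimensions hold at least one value."""
        return {
            "administrative": bool(
                self.authors or self.affiliations or self.license
            ),
            "descriptive": bool(
                self.diseases
                or self.genes
                or self.population_size is not None
            ),
            "structural": bool(self.text_content or self.image_count),
            "cross_references": bool(self.cross_references),
        }


# Made-up example values, for illustration only.
record = SupplementaryFileRecord(
    file_name="supplementary_table_1.csv",
    file_format="CSV",
    authors=["A. Author"],
    license="CC-BY-4.0",
    diseases=["asthma"],
    population_size=120,
)
coverage = record.dimension_coverage()
```

A coverage check of this kind is one simple way to see which dimensions still need enrichment for a given file; here the structural and cross-reference dimensions are still empty.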

We propose to process CRFs to extract the clinical concepts required to populate reference standards (e.g., OMOP, FHIR) using standard value sets (e.g., ICD-10, LOINC). The resulting dataset is expected to exhibit properties very similar to EHR contents (e.g., Anamnesis, Intervention, Diagnosis, Prescription). Unlike real EHR contents, which are subject to specific regulations, this unique dataset will be openly accessible and is likely to support a broad range of health and life science research projects. The newly created collection of near-EHR data will be made available on Zenodo, while SIBiLS (the SIB Literature Services) will be used to provide access to the supplementary data files in the same way that it already provides access to MEDLINE and PMC articles. In addition, the methods and pipelines developed here could serve as a toolbox for structuring unstructured CRFs and other unstructured clinical text in hospital settings.
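
To make the target representation concrete, the sketch below turns a free-text CRF diagnosis field into FHIR-style `Condition` resources coded with ICD-10. It is a minimal illustration under stated assumptions, not the project's pipeline: the exact-match dictionary `TOY_ICD10` stands in for whatever concept-extraction method is actually used, the function names and the semicolon-separated input format are hypothetical, and only a small subset of `Condition` fields is populated.

```python
from typing import Dict, List, Optional, Tuple

# Code system URI used in FHIR for ICD-10 codings.
ICD10_SYSTEM = "http://hl7.org/fhir/sid/icd-10"

# Toy lookup (assumption): a real pipeline would need proper concept
# recognition and normalisation rather than exact string matching.
TOY_ICD10: Dict[str, Tuple[str, str]] = {
    "type 2 diabetes mellitus": ("E11", "Type 2 diabetes mellitus"),
    "essential hypertension": ("I10", "Essential (primary) hypertension"),
}


def to_condition(term: str, patient_id: str) -> Optional[dict]:
    """Build a minimal FHIR-style Condition dict, or None if unmapped."""
    clean = term.strip()
    match = TOY_ICD10.get(clean.lower())
    if match is None:
        return None
    code, display = match
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [
                {"system": ICD10_SYSTEM, "code": code, "display": display}
            ],
            "text": clean,  # keep the original wording from the CRF
        },
    }


def extract_conditions(diagnosis_field: str, patient_id: str) -> List[dict]:
    """Split a semicolon-separated diagnosis field and map each term."""
    resources = []
    for term in diagnosis_field.split(";"):
        resource = to_condition(term, patient_id)
        if resource is not None:
            resources.append(resource)
    return resources


# Made-up CRF field content and identifier, for illustration only.
conditions = extract_conditions(
    "Type 2 diabetes mellitus; essential hypertension; unknown finding",
    "crf-001",
)
```

Terms without a mapping are dropped silently in this sketch; a real pipeline would more likely record them for manual review.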

Key Findings; Potential use in non-academic contexts; Impacts; Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.nottingham.ac.uk