Details of Grant

EPSRC Reference:

EP/R014507/1

Title:

Learning Sparse Features from 4D fMRI Data for Brain Disease Diagnosis

Principal Investigator:

Lu, Professor H

Other Investigators:

Researcher Co-Investigators:

Project Partners:

University of Oxford

Department:

Computer Science

Organisation:

University of Sheffield

Scheme:

First Grant - Revised 2009

Starts:

01 January 2018

Ends:

30 June 2019

Value (£):

100,730

EPSRC Research Topic Classifications:

Artificial Intelligence

EPSRC Industrial Sector Classifications:

Healthcare

Information Technologies

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
05 Sep 2017	EPSRC ICT Prioritisation Panel Sept 2017	Announced

Summary on Grant Application Form

Machine learning endows computers with the ability to learn from data to help solve real-world problems. Due to the growth of big data, machine learning methods have become increasingly important tools in a wide range of applications including bioinformatics, computer vision, economics, and medicine. This project investigates machine learning for extracting useful information from fMRI data to help clinicians make more accurate diagnoses for certain brain diseases and develop more effective treatments for them.

Currently, deep learning is the most popular machine learning method. However, it has highly complex architectures and needs vast amounts of data to learn a huge number of parameters. This leads to difficulties when the number of data examples available (n) is very small compared to the number of features in each data example (p), which is the "large p, small n" problem. Indeed, Geoff Hinton, the godfather of deep learning, said recently: "One problem we still haven't solved is getting neural nets to generalise well from small amounts of data".

Most existing solutions for the "large p, small n" problem represent data as vectors. With growing data dimensionality, such vector-based methods become inadequate for severe "large p, small n" problems, e.g., machine learning on fMRI data. fMRI data are sequences of 3D volumes, i.e., 4D data. They are noisy, big, and multidimensional, making comprehensive manual analysis infeasible and machine learning challenging. A typical whole-brain fMRI scan sequence has tens of millions of features (voxel measurements), with a file size over 100 MB. For such data, even a simple linear basis needs tens of millions of parameters (deep learning will need far more), but in practice we often have sequences for only dozens of individuals in a particular fMRI study because of the high cost.

Therefore, we aim to develop a new machine learning method for severe cases of "large p, small n" for multidimensional data such as whole-brain fMRI. We will take a tensor-based approach, where a tensor refers to a multidimensional array. Tensor-based methods have a much smaller number of parameters than vector-based ones. For the typical whole-brain fMRI data above, a tensor-based multilinear basis needs only a few hundred parameters, several orders of magnitude fewer than those needed by a vector-based linear basis. We will generalise state-of-the-art sparse feature learning methods for vector input to tensor-based ones for tensor input.
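
The parameter counts quoted above can be checked with a small back-of-the-envelope sketch. The scan dimensions below (64 x 64 x 40 voxels over 160 time points) and the 4-byte storage per measurement are illustrative assumptions, not figures from the project. The sketch also assumes the multilinear basis is rank-one, i.e., one weight vector per tensor mode, which is one common form of tensor-based basis and not necessarily the one the project will use.

```python
from math import prod

# Hypothetical 4D scan shape (x, y, z, time); chosen for illustration only.
shape = (64, 64, 40, 160)


def linear_basis_params(shape):
    """Vector-based linear basis: one weight per voxel measurement,
    so the count is the product of the mode sizes."""
    return prod(shape)


def multilinear_basis_params(shape):
    """Rank-one multilinear basis (an assumption): one weight vector per mode,
    so the count is the sum of the mode sizes."""
    return sum(shape)


vector_params = linear_basis_params(shape)
tensor_params = multilinear_basis_params(shape)

# Assumed storage of 4 bytes per measurement (e.g., 32-bit floats).
size_mb = vector_params * 4 / 1e6

print(f"features / linear-basis parameters: {vector_params:,}")
print(f"multilinear-basis parameters:       {tensor_params:,}")
print(f"approximate raw size:               {size_mb:.1f} MB")
```

Under these assumptions the linear basis needs about 26 million parameters while the rank-one multilinear basis needs 328, which is consistent with the "tens of millions" versus "a few hundred" contrast in the summary.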

This will be the first study to learn sparse features directly from tensor representations of multidimensional data in a scalable and interpretable way. We will apply our algorithms to a large fMRI dataset on attention deficit hyperactivity disorder (ADHD) to accomplish two major tasks: prediction and interpretation. Firstly, we will detect ADHD and classify its subtypes via a small number of automatically selected voxels. Secondly, collaborating with a brain imaging expert, we will analyse the connectivity of brain regions corresponding to selected voxels to interpret the classification results, gain insights, and identify biomarkers to assist clinicians in further diagnosis and treatment. Our results will be fully reproducible, with the dataset in the public domain and our software to be released as open source. The success of this project will advance the state of the art in machine learning and provide a new enabling software tool for applications with severe "large p, small n" problems such as medical imaging with high-cost scanners (e.g., MRI or 3D mammography machines) and translational bioinformatics with big genomic data.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

Description	This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.shef.ac.uk