
Details of Grant 

EPSRC Reference: EP/K009362/1
Title: Bayesian Inference for Big Data with Stochastic Gradient Markov Chain Monte Carlo
Principal Investigator: Teh, Professor YW
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Statistics
Organisation: University of Oxford
Scheme: Standard Research
Starts: 01 August 2013 Ends: 31 July 2016 Value (£): 200,070
EPSRC Research Topic Classifications:
Artificial Intelligence; Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
EP/K009575/1 EP/K009850/1
Panel History:
Panel Date: 04 Sep 2012 | Panel Name: EPSRC ICT Responsive Mode - Sep 2012 | Outcome: Announced
Summary on Grant Application Form
We are in the midst of an information revolution, where advances in science and technology, as well as the day-to-day operation of successful organisations and businesses, are increasingly reliant on the analyses of data. Driving these advances is a deluge of data, which is far outstripping the increase in computational power available. The importance of managing, analysing, and deriving useful understanding from such large scale data is highlighted by high-profile reports by McKinsey and The Economist as well as other outlets, and by the EPSRC's recent ICT priority of "Towards an Intelligent Information Infrastructure".

Bayesian analysis is one of the most successful families of methods for analysing data, and is now widely adopted in the statistical sciences as well as in AI technologies such as machine learning. The Bayesian approach offers a number of attractive advantages over other methods: flexibility in constructing complex models from simple parts; fully coherent inferences from data; natural incorporation of prior knowledge; explicit modelling assumptions; precise reasoning about uncertainties over model order and parameters; and protection against overfitting.

On the other hand, there is a general perception that Bayesian methods can be too slow to be practically useful on big data sets. This is because exact Bayesian computations are typically intractable, so a range of more practical approximate algorithms is needed, including variational approximations, sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC). MCMC methods arguably form the most popular class of Bayesian computational techniques, owing to their flexibility, general applicability and asymptotic exactness. Unfortunately, MCMC methods do not scale well to big data sets, since they require many iterations to reduce Monte Carlo noise, and each iteration already involves an expensive sweep through the whole data set.

In this project we propose to develop the theoretical foundations for a new class of MCMC inference procedures that can scale to billions of data items, thus unlocking the strengths of Bayesian methods for big data. The basic idea is to use a small subset of the data during each parameter update iteration of the algorithm, so that many iterations can be performed cheaply. This introduces excess stochasticity into the algorithm, which can be controlled by annealing the update step sizes towards zero as the number of iterations increases. The resulting algorithm is a cross between an MCMC algorithm and a stochastic optimization algorithm. An initial exploration of this procedure, which we call stochastic gradient Langevin dynamics (SGLD), was presented by us recently (Welling and Teh, ICML 2011).
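The update described above can be sketched in code. The following is a minimal illustrative implementation of SGLD on a toy problem (inferring the mean of a Gaussian with known variance), not the project's software; the step-size schedule, minibatch size, prior, and all variable names are assumptions made for illustration, loosely following Welling and Teh (ICML 2011).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: infer the mean theta of a Gaussian with known unit variance,
# under a prior theta ~ N(0, 10). The posterior is available in closed form,
# which makes this a convenient sanity check (illustrative choice).
N = 10_000
true_theta = 2.0
data = rng.normal(true_theta, 1.0, size=N)

def grad_log_prior(theta):
    # d/dtheta of log N(theta; 0, 10)
    return -theta / 10.0

def grad_log_lik(theta, batch):
    # Sum over the minibatch of d/dtheta of log N(x; theta, 1)
    return np.sum(batch - theta)

n = 100          # minibatch size: each iteration touches only n << N items
theta = 0.0
samples = []
for t in range(1, 5001):
    # Step size annealed towards zero as iterations increase
    # (a standard polynomial schedule; the constants are illustrative)
    eps = 1e-4 / t**0.55
    batch = data[rng.integers(0, N, size=n)]
    # Unbiased stochastic estimate of the full-data log-posterior gradient:
    # the minibatch likelihood term is rescaled by N/n
    grad = grad_log_prior(theta) + (N / n) * grad_log_lik(theta, batch)
    # Langevin update: gradient half-step plus injected Gaussian noise
    # with variance eps, giving the cross between stochastic optimization
    # and MCMC described in the text
    theta += 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps))
    samples.append(theta)

# Discard early iterates as burn-in before summarising the posterior
posterior_mean = np.mean(samples[1000:])
```

After burn-in, the iterates fluctuate around the true posterior mean (close to 2.0 here), while each update has cost proportional to the minibatch size rather than the full data set.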

Our proposal is to lay the mathematical foundations for understanding the theoretical properties of such stochastic MCMC algorithms, and to build on these foundations to develop more sophisticated algorithms. We aim to understand the conditions under which the algorithm is guaranteed to converge, and the type and speed of convergence. Using this understanding, we aim to develop algorithmic extensions and generalizations with better convergence properties, including preconditioning, adaptive and Riemannian methods, Hamiltonian Monte Carlo methods, online Bayesian learning methods, and approximate methods with large step sizes. These algorithms will be empirically validated on real world problems, including large scale data analysis problems for text processing and collaborative filtering, which are standard problems in machine learning, and large scale data from ID Analytics, a partner company interested in detecting identity theft and fraud.
Key Findings; Potential use in non-academic contexts; Impacts; Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ox.ac.uk