
Details of Grant 

EPSRC Reference: EP/N01426X/1
Title: ReComp: sustained value extraction from analytics by recurring selective re-computation
Principal Investigator: Missier, Professor P
Other Investigators:
Watson, Professor P
Chinnery, Professor P
James, Professor PM
Researcher Co-Investigators:
Project Partners:
DataONE
University of Manchester, The
Department: Sch of Computing
Organisation: Newcastle University
Scheme: Standard Research
Starts: 01 February 2016
Ends: 31 July 2019
Value (£): 584,269
EPSRC Research Topic Classifications:
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel Date | Panel Name | Outcome
03 Sep 2015 | Making Sense From Data Panel - Full Proposals | Announced
Summary on Grant Application Form
As the cost of allocating computing resources to data-intensive tasks continues to decrease, large-scale data analytics becomes ever more affordable, continuously providing new insights from vast amounts of data. Increasingly, predictive models that encode knowledge from data are used to drive decisions in a broad range of areas, from science to public policy, to marketing and business strategy. The process of learning such actionable knowledge relies upon information assets, including the data itself, the know-how that is encoded in the analytical processes and algorithms, as well as any additional background and prior knowledge. Because these assets continuously change and evolve, models may become obsolete over time, leading to poor decisions in the future, unless they are periodically updated.

This project is concerned with the need and opportunities for selective recomputation of resource-intensive analytical workloads. The decision on how to respond to changes in these information assets requires striking a balance between the estimated cost of recomputing the model, and the expected benefits of doing so. In some cases, for instance when using predictive models to diagnose a patient's genetic disease, new medical knowledge may invalidate a large number of past cases. On the other hand, such changes in knowledge may be marginal or even irrelevant for some of the cases. It is therefore important to be able, firstly, to determine which past results may potentially benefit from recomputation, secondly, to determine whether it is technically possible to reproduce an old computation, and thirdly, when this is the case, to assess the costs and relative benefits associated with the recomputation.

The project investigates the hypothesis that, based on these determinations, and given a budget for allocating computing resources, it should be possible to accurately identify and prioritise analytical tasks that should be considered for recomputation.

Our approach considers three types of meta-knowledge that are associated with analytics tasks, namely (i) knowledge of the history of past results, that is, the provenance metadata that describes which assets were used in the computation, and how; (ii) knowledge of the technical reproducibility of the tasks; and (iii) cost/benefit estimation models.

Element (i) is required to determine which prior outcomes may potentially benefit from changes in information assets, while reproducibility analysis (ii) is required to determine whether an old analytical task is still functional and can actually be performed again, possibly with new components and on newer input data.

As the first two of these elements are independent of the data domain, we aim to develop a general framework that can then be instantiated with domain-specific cost/benefit models, to provide decision support for prioritising and then carrying out resource-intensive recomputations over a broad range of analytics application domains.
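
As an illustration only, the decision described above could be sketched as follows. This is not the project's framework: the `PastResult` fields, the `select_for_recomputation` function, the asset names and all numbers are hypothetical, and the greedy ranking by benefit per unit cost is a simple stand-in heuristic rather than a method taken from the proposal. The benefit estimator is passed in as a function to mirror the idea of plugging a domain-specific model into a domain-independent core.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class PastResult:
    """A past analytical outcome with its meta-knowledge (hypothetical fields)."""
    result_id: str
    assets_used: FrozenSet[str]  # (i) provenance: assets the computation depended on
    reproducible: bool           # (ii) whether the task can still be run
    est_cost: float              # (iii) estimated cost of recomputing it


def select_for_recomputation(
    results: Iterable[PastResult],
    changed_assets: Set[str],
    benefit: Callable[[PastResult], float],  # domain-specific estimator (assumed)
    budget: float,
) -> List[PastResult]:
    """Pick results to recompute without exceeding the budget (greedy heuristic)."""
    # Keep only results that used a changed asset and can actually be re-run.
    candidates = [
        r for r in results
        if r.reproducible and r.est_cost > 0 and (r.assets_used & changed_assets)
    ]
    # Rank by estimated benefit per unit of cost, best first.
    ranked = sorted(candidates, key=lambda r: benefit(r) / r.est_cost, reverse=True)
    chosen, spent = [], 0.0
    for r in ranked:
        if spent + r.est_cost <= budget:
            chosen.append(r)
            spent += r.est_cost
    return chosen


# Made-up example: one asset ("variant-db") has changed.
results = [
    PastResult("patient-A", frozenset({"variant-db"}), True, 4.0),
    PastResult("patient-B", frozenset({"variant-db"}), False, 1.0),
    PastResult("patient-C", frozenset({"gene-panel"}), True, 2.0),
    PastResult("patient-D", frozenset({"variant-db", "gene-panel"}), True, 3.0),
]
made_up_benefit = {"patient-A": 8.0, "patient-B": 9.0, "patient-C": 5.0, "patient-D": 9.0}
chosen = select_for_recomputation(
    results, {"variant-db"}, lambda r: made_up_benefit[r.result_id], budget=5.0
)
print([r.result_id for r in chosen])
```

In this sketch, patient-B is excluded because it is no longer reproducible and patient-C because its provenance does not mention the changed asset. A greedy ranking does not guarantee the best possible use of the budget; it is used here only to keep the example short.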

Both (i) and (ii) entail technical challenges, as systematically collecting the provenance of complex analytical tasks, and ensuring their reproducibility, require instrumentation of the data processing environments. We plan to experiment with workflows, a form of high-level programming and middleware technology, to address both these problems.

To show the flexibility and generality of our framework, we will test and validate it on two very different case studies where decision making is driven by analytical knowledge, namely genetic diagnostics and policy making for Smart Cities.

Key Findings, Potential use in non-academic contexts, Impacts, and Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ncl.ac.uk