EPSRC logo

Details of Grant 

EPSRC Reference: EP/N023056/1
Title: MAGIC: MAnaGing InComplete Data - New Foundations
Principal Investigator: Libkin, Professor L
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Centre for Semantic Web Research LIAFA Logicblox
University of Rome I (La Sapienza)
Department: Sch of Informatics
Organisation: University of Edinburgh
Scheme: EPSRC Fellowship
Starts: 01 October 2016 Ends: 31 August 2022 Value (£): 1,140,111
EPSRC Research Topic Classifications:
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel DatePanel NameOutcome
01 Mar 2016 EPSRC ICT Fellowships Interview Panel - Mar 2016 Announced
03 Feb 2016 EPSRC ICT Prioritisation Panel - Feb 2016 Announced
Summary on Grant Application Form
In our data-driven world, one can hardly spend a day without using highly complex software systems that we have learned to rely on, but that at the same time are known to produce incorrect results. These are systems we have on our laptops, they power websites of companies, and they keep companies and government offices running. And yet the incorrect behaviour is built into them, it is a part of multiple standards, and very little effort is made to change things.

The systems are commercial DBMSs (database management systems). We can rely on them as long as information they store is complete. In an autonomous environment, this is a reasonable assumption, but these days data is generated by a huge number of users and applications, and its incompleteness is a fact of life. The moment incompleteness enters the picture, everything changes: unexpected behaviour occurs; queries that we teach students to write and give them full marks for stop working, and one loses trust in the results one gets from such data.

To make matters worse, many modern applications of data, including data integration, data exchange, ontology based data access, data quality, inconsistency management and a host of others, have incompleteness built into them, and try to rely on standard techniques for handling it. This inevitably leads to questionable, or sometimes plain incorrect results. The key reason behind this is the complexity vs correctness tradeoff: current techniques guaranteeing correctness carry a huge complexity price, and applications look for ways around it, sacrificing correctness in the process.

Our main goal is to end this sorry state of affairs. Correctness and efficiency can co-exist, but we need to develop new foundations of the field of incomplete information, and a new set of techniques based on these foundations, to reconcile the two.

To do so, we need to rethink the very basics of the field; crucially, we need to understand what it means to answer queries over incomplete data with correctness guarantees. The classical theory uses a single one-size-fits-all definition, that, upon a careful examination, does not appear to be universally correct. We have an approach that will let us develop a proper theory of correctness and apply it in standard data management tasks and their new applications as well.

It is crucial to make this approach practical. Commercial systems concentrate on efficient evaluation, sacrificing correctness. Our solutions for delivering correct answers will be implementable on top of existing systems, to guarantee their applicability. There will be no need to throw away existing products and techniques to take advantage of new approaches to handling incomplete information.

Our initial set of goals is to deliver solutions for fixing problems with commercial DBMSs, namely to show how to provide correctness guarantees at low and acceptable costs. We shall do so for a wide variety of queries, going far beyond what is now known to be possible. After that, we shall look at applications that will let us, for the first time, ensure correctness of very expressive queries over integrated and exchanged data. We shall expand further, both in terms of data models (e.g., graphs, XML, noSQL databases), and applications (inconsistent data, ontologies). We shall also look at solutions that take into account very large amounts of data, and produce approximate answers in those scenarios.

With the toolkit we develop, the "curse of incomplete information", i.e., the perceived impossibility of achieving correctness and efficiency simultaneously, should be a thing of the past.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ed.ac.uk