
Details of Grant 

EPSRC Reference: EP/K00607X/1
Title: MaSI3: A Massively Scalable Intelligent Information Infrastructure
Principal Investigator: Motik, Professor B
Other Investigators:
Researcher Co-Investigators:
Project Partners:
ExperienceOn Ventures, Samsung Electronics UK Ltd
Department: Computer Science
Organisation: University of Oxford
Scheme: EPSRC Fellowship
Starts: 01 January 2013 Ends: 31 December 2017 Value (£): 817,862
EPSRC Research Topic Classifications:
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel Date | Panel Name | Outcome
30 Aug 2012 | EPSRC ICT Fellowships Interviews - Aug 2012 | Announced
18 Jul 2012 | EPSRC ICT Responsive Mode - July 2012 | Announced
Summary on Grant Application Form
Ontology-based Data Management Systems (ODMSs) are a new kind of data management system specifically designed to manage the large semi-structured data sets needed to power modern intelligent applications. Most ODMSs are based on the Resource Description Framework (RDF) data model, which was designed specifically for representing semi-structured data. An RDF data set consists of triples and is often viewed as a graph with labelled vertices and edges. The structure of RDF data is described using an ontology: a set of logical axioms that gives semantics to the graph and enables the derivation of new triples via reasoning. The ontology is often expressed in the Web Ontology Language (OWL), sometimes extended with the Semantic Web Rule Language (SWRL). The main task of an ODMS is to answer queries over the given ontology and data set, with queries commonly expressed in the SPARQL language. Reasoning plays a key role in query answering, and modern intelligent applications commonly require the integration of taxonomic, spatio-temporal, mereological, and other kinds of reasoning.
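
To make the graph view and the derivation of new triples concrete, here is a minimal Python sketch (Python 3.9+). It assumes a toy data set with hypothetical ex: names and hard-codes two RDFS-style rules (transitivity of rdfs:subClassOf and inheritance of rdf:type along it) in place of a full OWL/SWRL reasoner. New triples are derived by naive forward chaining until a fixpoint is reached, which is one simple way of computing a materialisation; it is an illustration, not the project's method.

```python
# Minimal sketch: RDF-style triples as a labelled graph, plus naive
# forward-chaining materialisation with two hard-coded RDFS-style rules.
# The ex: names are hypothetical; this is not an OWL or SWRL reasoner.

Triple = tuple[str, str, str]  # (subject, predicate, object) = labelled edge

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def apply_rules(triples: set[Triple]) -> set[Triple]:
    """Return the triples derivable in one step from `triples`."""
    derived: set[Triple] = set()
    subclass = [(s, o) for (s, p, o) in triples if p == SUBCLASS]
    for (a, b) in subclass:
        # Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
        for (b2, c) in subclass:
            if b == b2:
                derived.add((a, SUBCLASS, c))
        # Rule 2: (x type A), (A subClassOf B) => (x type B)
        for (x, p, cls) in triples:
            if p == TYPE and cls == a:
                derived.add((x, TYPE, b))
    return derived


def materialise(triples: set[Triple]) -> set[Triple]:
    """Apply the rules repeatedly until no new triples appear (a fixpoint)."""
    closure = set(triples)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new


data = {
    ("ex:alice", TYPE, "ex:Professor"),
    ("ex:Professor", SUBCLASS, "ex:Academic"),
    ("ex:Academic", SUBCLASS, "ex:Person"),
}

for t in sorted(materialise(data) - data):
    print(t)
# prints the three derived triples:
# ('ex:Professor', 'rdfs:subClassOf', 'ex:Person')
# ('ex:alice', 'rdf:type', 'ex:Academic')
# ('ex:alice', 'rdf:type', 'ex:Person')
```

This naive loop re-derives every known triple on each round; practical engines usually rely on semi-naive evaluation, which joins only against triples derived in the previous round.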

ODMSs can and do exploit implementation techniques described in the database literature. The computational problems that such systems need to solve, however, are very hard, so developing robustly scalable systems is extremely challenging and usually requires a combination of heuristics and careful engineering. Although significant progress has been made and state-of-the-art ODMSs can now handle nontrivial data sets, their performance still falls far short of what modern "data-hungry" applications require. This is partly due to the sheer size of the data sets that must be processed, and partly due to the complexity of the reasoning tasks that must be performed.

Critical to the performance of ODMSs is the fact that the units of data they store (i.e., triples) are very small, so typical queries must be quite large in order to retrieve useful information. Efficiently answering such queries requires exhaustive data indexing; however, building and maintaining these indices can itself compromise scalability, particularly during update-intensive tasks such as materialisation-based reasoning. Moreover, although query evaluation is subpolynomial in data size, it is NP-hard in query size, so techniques that are effective on small queries may fail on large and complex ones. Finally, scaling ODMSs to deal with Big Data will inevitably require distributed data storage and query processing, but existing data partitioning schemes are unlikely to fully exploit the potential for parallelisation or to minimise distributed processing on large queries.
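
The tension between small triples, large queries, and exhaustive indexing can be illustrated with a minimal Python sketch (Python 3.9+). It assumes a hypothetical in-memory store, `TripleIndex`, that keeps three permutation indexes (SPO, POS and OSP) so that any triple pattern can be answered by a direct lookup, and a hypothetical `evaluate` function that answers a query given as a list of triple patterns (variables start with `?`) by backtracking over the patterns. Every insertion updates all three indexes, which is the maintenance cost mentioned above, and the backtracking search can explore a number of partial matches that grows exponentially with the number of patterns.

```python
# Minimal sketch of an in-memory triple store with permutation indexes and
# backtracking evaluation of conjunctive (basic graph pattern) queries.
# TripleIndex, evaluate and the ex: names are hypothetical illustrations.
from collections import defaultdict
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class TripleIndex:
    def __init__(self) -> None:
        # Each index maps two components to the set of matching third components.
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # Every insertion updates all three indexes: lookups get faster,
        # but each update (e.g. during materialisation) costs more.
        self.triples.add((s, p, o))
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> list[Triple]:
        """Return the triples matching a pattern; None marks an unbound position."""
        if s is not None and p is not None:
            return [(s, p, x) for x in self.spo.get(s, {}).get(p, set()) if o is None or x == o]
        if s is not None and o is not None:
            return [(s, x, o) for x in self.osp.get(o, {}).get(s, set())]
        if p is not None and o is not None:
            return [(x, p, o) for x in self.pos.get(p, {}).get(o, set())]
        if s is not None:
            return [(s, q, x) for q, xs in self.spo.get(s, {}).items() for x in xs]
        if p is not None:
            return [(x, p, y) for y, xs in self.pos.get(p, {}).items() for x in xs]
        if o is not None:
            return [(x, q, o) for x, qs in self.osp.get(o, {}).items() for q in qs]
        return list(self.triples)


def is_var(term: str) -> bool:
    return term.startswith("?")


def evaluate(index: TripleIndex, patterns: list[Triple],
             binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    # Plug in already-bound variables; unbound variables become None.
    lookup = [binding.get(t) if is_var(t) else t for t in first]
    for triple in index.match(*lookup):
        extended, consistent = dict(binding), True
        for term, value in zip(first, triple):
            # A variable repeated within a pattern must take a single value.
            if is_var(term) and extended.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            yield from evaluate(index, rest, extended)


store = TripleIndex()
for t in [
    ("ex:alice", "ex:supervises", "ex:bob"),
    ("ex:bob", "ex:worksAt", "ex:oxford"),
    ("ex:carol", "ex:supervises", "ex:dave"),
    ("ex:dave", "ex:worksAt", "ex:cambridge"),
]:
    store.add(*t)

# Who supervises someone working at ex:oxford?
query = [("?x", "ex:supervises", "?y"), ("?y", "ex:worksAt", "ex:oxford")]
for answer in evaluate(store, query):
    print(answer)  # prints {'?x': 'ex:alice', '?y': 'ex:bob'}
```

The sketch evaluates patterns in the order given; the order matters a great deal for large queries, and practical engines usually pick a join order using statistics about the data.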

Due to these issues, we believe that the robust scalability required by modern ODMS applications can only be achieved through the principled application of techniques that provide provable performance and/or tractability guarantees. Such techniques will not only allow for better and more consistent performance, but will also help ODMS users to understand, and thus avoid, performance bottlenecks. We plan to develop the relevant techniques by synthesising and extending results from three distinct research fields: databases, knowledge representation, and mathematical network theory. Combining these techniques with insightful engineering and extensive optimisation will, we believe, allow us to implement a new ODMS whose scalability surpasses that of existing systems by several orders of magnitude. Finally, we will exploit our contacts with industry (see enclosed Letters of Support) to evaluate and tune our ODMS in real-world settings. We will thus lay both the theoretical and the practical foundations for a massively scalable intelligent information infrastructure capable of powering modern data-intensive applications.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ox.ac.uk