EPSRC Reference: |
EP/F031092/1 |
Title: |
On-Demand Data Integration: Dataspaces by Refinement |
Principal Investigator: |
Paton, Professor NW |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer Science |
Organisation: |
University of Manchester, The |
Scheme: |
Standard Research |
Starts: |
01 July 2008 |
Ends: |
30 June 2011 |
Value (£): |
572,897 |
|
EPSRC Research Topic Classifications: |
Information & Knowledge Mgmt |
|
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
18 Oct 2007 | ICT Prioritisation Panel (Technology) | Announced |
|
Summary on Grant Application Form |
Web search engines, such as Google or Yahoo, provide access to large numbers of distributed resources. However, the questions that such search engines can support are limited, and they do not exploit structure within the accessed resources. For example, it is not possible to ask the question "What is the phone number of the department where Suzanne Embury works?", even though this information can be obtained by navigating from the result of a search for "Suzanne Embury". Nevertheless, one feature that has made search engines successful is that they need minimal configuration; for example, no manual annotation of pages is required before they can be searched. As a result, search engines can be seen as providing low-cost, low-quality access to distributed data resources.
Data integration infrastructures from the database community, by contrast, provide relatively high-cost, high-quality solutions. Where there are multiple data resources, distributed query processing systems provide the illusion that there is only one data resource, and allow complex questions to be answered that refer to data from multiple resources. For example, they could support the question about phone numbers above, even when the information about who Suzanne works for is stored in a different database from the phone number of her department. However, this precision in question answering can only be supported where the relationships between data sources have been manually identified, and inconsistencies resolved, as part of a time-consuming and largely manual data integration process. This proposal seeks to explore the space between search engines and distributed data management systems by providing several of the benefits of the latter at much reduced configuration cost. The term dataspace has been coined to refer to infrastructures that support precise question answering over resources that have been integrated at minimal cost.
At present, dataspaces are more a vision than a reality. Many design decisions that explore cost/quality trade-offs need to be made, and new techniques will be required for inter-relating data resources, for ranking query answers, and for interacting with users about the likely quality of the answers obtained. The proposed research hypothesizes that there is no single best position on the cost/quality trade-off between fully automated and manually constructed data integration. We therefore propose to develop a flexible software architecture in which it is possible to experiment with different components for constructing mappings between resources, annotating the mappings with measures of their quality, and ranking results according to user-specified criteria. This architecture, in turn, enables exploration of alternative approaches to the design of the components, in particular with a view to allowing incremental refinement of an initial integration that was constructed automatically.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.man.ac.uk |