Details of Grant 

EPSRC Reference: EP/K021788/1
Title: Enriching, repairing and merging taxonomies by inducing qualitative spatial representations from the web
Principal Investigator: Schockaert, Professor S
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: Cardiff University
Scheme: First Grant - Revised 2009
Starts: 01 May 2013
Ends: 30 June 2014
Value (£): 98,966
EPSRC Research Topic Classifications:
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel Date | Panel Name | Outcome
21 Nov 2012 | EPSRC ICT Responsive Mode - Nov 2012 | Announced
Summary on Grant Application Form
Taxonomies encode how different terms or concepts from a given domain are related to each other. They are used to standardise vocabularies (e.g. biologists use taxonomies to organise species into broader categories such as family and order), and to categorise content so that it can be searched more easily (e.g. librarians assigning categories from a taxonomy to books). While taxonomies are traditionally the result of a careful and time-consuming manual process, recent developments on the World Wide Web have led to a proliferation of taxonomies of a more informal nature. Online retailers such as Amazon, for instance, organise their products using an ad hoc taxonomy, which reflects how customers use their website rather than any commitment to the semantics of the underlying product categories. Similarly, applications such as Foursquare allow users to contribute to a taxonomy of place types.

While these informal taxonomies are useful for organising online content (e.g. products on Amazon, or venues on Foursquare), they are often of poor quality and difficult to reuse across different applications. Moreover, like traditional taxonomies, they focus on a very limited set of semantic relations; usually only the relation "is a sub-category of" is considered. In practice, however, the semantic relationship between two categories may not be so clear-cut, among other reasons because of the existence of borderline cases (e.g. should a pub which serves food be categorised as a restaurant?). Nonetheless, the widespread availability of taxonomies is potentially of great interest, provided that they can be improved using automated methods.

The goal of this project is to study how such an improvement can be realised, by statistically analysing meta-data that is available on the web, and in particular from so-called Web 2.0 websites such as Flickr, where users describe photos using short textual annotations called tags.

The proposed approach is built on the idea of discovering semantic relationships between categories by statistically analysing such meta-data. On the one hand, these relations will encode information about typicality and similarity. To see why such relations are useful, consider an application which allows a user to search for restaurants in Cardiff. The search engine may rank venues of type "restaurant" by taking into account features such as distance to the city centre and average ratings (if available). However, as a further criterion, one would also want to see "normal" restaurants before venues such as breakfast places, coffee houses, or pubs, which may be considered restaurants, broadly speaking, but are not what users would typically be interested in when searching for restaurants. Similarly, when the user's query asks about "Sichuan restaurants in Cardiff", and no such restaurants are known, instances of the most similar categories may be shown instead (e.g. Cantonese restaurants).
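
As a rough illustration of the similarity idea, the following Python sketch picks a fallback category by comparing tag-count vectors with cosine similarity. The tag counts, the category names, and the choice of cosine similarity are all assumptions made for this example; they are not the project's data or its actual method.

```python
import math
from collections import Counter

# Hypothetical tag counts per category (invented for illustration only).
TAG_COUNTS = {
    "sichuan restaurant": Counter({"chinese": 8, "spicy": 9, "noodles": 4}),
    "cantonese restaurant": Counter({"chinese": 9, "dimsum": 7, "noodles": 5}),
    "pub": Counter({"beer": 10, "food": 3, "music": 2}),
}


def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(category, tag_counts):
    """Return the other category whose tag profile is closest to `category`."""
    others = (c for c in tag_counts if c != category)
    return max(others, key=lambda c: cosine(tag_counts[category], tag_counts[c]))


# If no Sichuan restaurants are known, fall back to the closest category.
fallback = most_similar("sichuan restaurant", TAG_COUNTS)
```

With these invented counts, the fallback for "sichuan restaurant" is "cantonese restaurant", because the two categories share tags while "pub" shares none.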

On the other hand, the relations that are discovered will also encode information that can help us to pinpoint likely errors in existing taxonomies and that can help us to merge different taxonomies to get a single coherent view of a given domain. In particular, these relations will allow us to detect irregularities in existing taxonomies. For example, given the assumption that similar categories usually have similar properties, and the knowledge that Cantonese and Sichuan restaurants are very similar, a taxonomy in which Cantonese and Sichuan restaurants are both sub-categories of Chinese restaurants will be considered more regular than a taxonomy in which they have different super-categories.
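
The regularity assumption can be sketched in Python as a simple check that flags pairs of highly similar categories placed under different super-categories. The taxonomy, the similarity scores, and the 0.8 threshold below are hypothetical values chosen for illustration; the project summary does not specify how regularity is measured.

```python
from itertools import combinations

# Hypothetical taxonomy: each category maps to its super-category.
PARENT = {
    "sichuan restaurant": "chinese restaurant",
    "cantonese restaurant": "asian restaurant",
    "pub": "nightlife",
}

# Hypothetical pairwise similarity scores in [0, 1] (invented for illustration).
SIMILARITY = {
    frozenset({"sichuan restaurant", "cantonese restaurant"}): 0.9,
    frozenset({"sichuan restaurant", "pub"}): 0.1,
    frozenset({"cantonese restaurant", "pub"}): 0.1,
}


def irregular_pairs(parent, similarity, threshold=0.8):
    """Flag pairs that are very similar yet have different super-categories."""
    flagged = []
    for a, b in combinations(sorted(parent), 2):
        score = similarity.get(frozenset({a, b}), 0.0)
        if score >= threshold and parent[a] != parent[b]:
            flagged.append((a, b))
    return flagged


suspects = irregular_pairs(PARENT, SIMILARITY)
```

Here the only flagged pair is Cantonese and Sichuan restaurants, which are very similar but sit under different super-categories. A flagged pair is a candidate for repair, not a confirmed error.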

Our approach is unique in its data-driven method for enriching taxonomies with semantic relations for common-sense reasoning, as well as in the proposed methods for repairing and merging existing taxonomies. Regarding applications, the results of this project will form a crucial stepping-stone towards more intelligent search engines.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.cf.ac.uk