
Details of Grant 

EPSRC Reference: EP/K01904X/1
Title: Visual Sense. Tagging visual data with semantic descriptions
Principal Investigator: Mikolajczyk, Professor K
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Vision, Speech and Signal Processing (CVSSP)
Organisation: University of Surrey
Scheme: Standard Research - NR1
Starts: 08 February 2013
Ends: 31 August 2015
Value (£): 331,317
EPSRC Research Topic Classifications:
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:  
Summary on Grant Application Form
Recent years have witnessed unprecedented growth in the number of image and video collections, partly due to the increasing popularity of photo and video sharing websites: one such website alone (Flickr) stores billions of images. Nor is this the only way in which visual content is present on the Web; in fact, most web pages contain some form of visual content. However, while traditional tools for search and retrieval can successfully handle textual content, they are not equipped to handle such heterogeneous documents. This new type of content demands the development of new, efficient tools for search and retrieval.

The large number of readily accessible multimedia data collections poses both an opportunity and a challenge. The opportunity lies in the potential to mine this data to automatically discover mappings between visual and textual content. The challenge is to develop tools to classify, filter, browse and search such heterogeneous data. In brief, the data is available, but the tools to make sense of it are missing.

The Visual Sense project aims to automatically mine the semantic content of visual data to enable "machine reading" of images. In recent years, we have witnessed significant advances in the automatic recognition of visual concepts. These advances have enabled systems that can automatically generate keyword-based image annotations. However, such annotations, e.g. "man" and "pot", fall far short of the more meaningful descriptive captions necessary for indexing and retrieval of images, for example, "Man cooking in kitchen". The goal of this project is to go a step further and predict semantic image representations that can be used to generate more informative sentence-based image annotations, thus facilitating search and browsing of large multi-modal collections. It will address the following key open research challenges:
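
As an illustration of the gap between keyword and sentence-level annotation, the following minimal sketch (in Python; the triple format, class and template are illustrative assumptions, not the project's method) shows how a richer semantic representation, e.g. a subject-predicate-scene triple, could be rendered as a descriptive caption:

    # Hypothetical sketch: from a semantic representation to a caption.
    from dataclasses import dataclass

    @dataclass
    class SemanticTriple:          # assumed representation, for illustration
        subject: str               # detected object, e.g. "man"
        predicate: str             # inferred action/relation, e.g. "cooking"
        scene: str                 # scene type, e.g. "kitchen"

    def caption_from_triple(t: SemanticTriple) -> str:
        # A fixed template; a real system would use NLP generation models.
        return f"{t.subject.capitalize()} {t.predicate} in {t.scene}."

    keywords = ["man", "pot"]      # keyword-level annotation
    triple = SemanticTriple("man", "cooking", "kitchen")
    print(keywords)                     # ['man', 'pot']
    print(caption_from_triple(triple))  # Man cooking in kitchen.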

1) Develop methods that can derive a semantic representation of visual content. Such representations must go beyond the detection of objects and scenes to also capture a wide range of relations between objects.

2) Extend state-of-the-art natural language processing techniques to the tasks of mining large collections of multi-modal documents and of generating image captions, using both semantic representations of the visual content and object/scene type models derived from semantic representations of the textual component of those documents.

3) Develop learning algorithms that can exploit available multi-modal data to discover mappings between visual and textual content. These algorithms should be able to leverage 'weakly' annotated data and be robust to large amounts of noise (a toy illustration of such a cross-modal mapping follows this list).
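
To make challenge 3 concrete, here is a minimal sketch assuming toy synthetic features and a linear ridge-regression mapping; none of this is the project's actual method, only an illustration of discovering a visual-to-textual mapping from noisy, weakly paired data:

    # Hypothetical sketch: learn a linear mapping from visual features to
    # textual features using noisy (weakly annotated) image-caption pairs.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d_img, d_txt = 500, 64, 32

    # Toy data: text features are a noisy linear function of image features;
    # the heavy noise stands in for unreliable web annotations.
    A_true = rng.normal(size=(d_txt, d_img))
    imgs = rng.normal(size=(n, d_img))
    txts = imgs @ A_true.T + 2.0 * rng.normal(size=(n, d_txt))

    # Ridge regression: regularisation keeps the mapping stable under noise.
    lam = 10.0
    W = np.linalg.solve(imgs.T @ imgs + lam * np.eye(d_img), imgs.T @ txts).T

    # Retrieval check: does each mapped image land nearest its own caption?
    sims = (imgs @ W.T) @ txts.T
    top1 = (sims.argmax(axis=1) == np.arange(n)).mean()
    print(f"top-1 caption retrieval accuracy: {top1:.2f}")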

Thus, the main focus of the Visual Sense project is the development of machine learning methods for extracting knowledge and information from large collections of visual and textual content, and for fusing this information across modalities. The tools and techniques developed in this project will have a variety of applications, which we will demonstrate through three case studies: 1) evaluation of generated descriptive image captions on established international image annotation benchmarks, 2) re-ranking for improved image search, and 3) automatic illustration of articles with images.
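
As a toy illustration of case study 2 (the query, captions and token-overlap score below are assumptions, not the project's approach), search results can be re-ranked by how well each image's automatically generated caption matches the text query:

    # Hypothetical sketch: re-rank image search results by caption-query overlap.
    def overlap(query: str, caption: str) -> float:
        q, c = set(query.lower().split()), set(caption.lower().split())
        return len(q & c) / len(q | c) if q | c else 0.0  # Jaccard similarity

    query = "man cooking in kitchen"
    results = [  # (image id, generated caption) in the engine's original order
        ("img_17", "a dog running on the beach"),
        ("img_42", "man cooking a meal in a kitchen"),
        ("img_08", "a man holding a pot"),
    ]
    for image_id, caption in sorted(
            results, key=lambda r: overlap(query, r[1]), reverse=True):
        print(image_id, caption)  # img_42 first, then img_08, then img_17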

To address these broad challenges, the project will build on expertise from multiple disciplines, including computer vision, machine learning and natural language processing (NLP). It brings together four research groups, from the University of Surrey (UK), the Institut de Robotica i Informatica Industrial (IRI, Spain), the Ecole Centrale de Lyon (ECL, France), and the University of Sheffield (UK), each with well-established and complementary expertise in its respective area of research.
Key Findings, Potential Use in Non-Academic Contexts, Impacts, and Sectors Submitted by the Researcher:
This information can now be found on Gateway to Research (GtR): http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.surrey.ac.uk