
Details of Grant 

EPSRC Reference: EP/I004327/1
Title: Machine Learning Methods for Personalised, Abstractive Summarisation of Consumer-Generated Media
Principal Investigator: Bontcheva, Professor K
Other Investigators:
Researcher Co-Investigators:
Project Partners:
BT; Elsevier (International); Nokia; OXFORD INTERNET INSTITUTE; The Fizzback Group Ltd.; The Press Association Ltd.
Department: Computer Science
Organisation: University of Sheffield
Scheme: Career Acceleration Fellowship
Starts: 01 October 2010
Ends: 31 May 2018
Value (£): 591,755
EPSRC Research Topic Classifications:
Artificial Intelligence; Human-Computer Interactions; Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Creative Industries; Information Technologies
Related Grants:
Panel History:
Panel Date: 09 Jun 2010
Panel Name: EPSRC Fellowships 2010 Interview Panel F
Outcome: Announced
Summary on Grant Application Form
The success of Web 2.0 and consumer-generated media (CGM) is based on tapping into the social nature of human interactions, by making it possible for people to voice their opinions, become part of a virtual community, and collaborate remotely. Taking micro-blogging as an example, the growth in Twitter visits between 2008 and 2009 was over 1,000%, and it is projected that by 2010 around 10% of all internet users will be on Twitter. This unprecedented rise in the volume and importance of online content has resulted in companies and individuals spending ever-increasing amounts of time trying to keep up with relevant CGM. It is estimated that 700 person-hours per year is the absolute minimum that companies and public services need to spend on CGM monitoring, online user engagement, and discovery of new information.
This fellowship is about helping people to cope with the resulting information overload, through automatic methods that can adapt to an individual's information-seeking goals and briefly summarise the relevant media, thereby supporting information interpretation and decision making. Automatic text summarisation is key to our goal and consists of compressing the meaning of text documents while preserving the relevant information contained within them. While there has been a lot of research on well-authored texts such as news, summarisation of social media is still in its infancy, with research focused on product reviews. A key experimental finding has been that, due to the characteristics of social media (product reviews in particular), it is better first to abstract the relevant information from the different documents and sites and then to use natural language generation to create a fluent text based on this information.
In this fellowship I will investigate and evaluate new machine learning methods for personalised, abstractive multi-document summarisation across different social media, for example diachronic summaries that combine Twitter posts, blog articles, and Facebook wall messages on a given topic. In contrast to previous work, we will pursue an inter-disciplinary approach, which will help us study the social dimension of CGM summarisation and establish actual user needs. The second research challenge is that the algorithms need to be robust in the face of this noisy, jargon-heavy and dynamic content, and need models capable of representing the contradictory and strongly temporal nature of CGM. A key novel contribution of our work is personalising the summaries, based on a model of user interests, goals, and social context. Issues such as trustworthiness, privacy, and online communities (with their hubs and authorities) will also play an important role. The fourth research challenge is to generate personalised abstractive summaries that can help users with sensemaking and content interpretation.
An exciting element of my research will be studying the different kinds of summaries that are useful for a variety of real users (companies, journalists, and the general public) through multi-disciplinary collaborations with the Press Association, British Telecom, the Oxford Internet Institute, and Sheffield's Department of Journalism. A key project deliverable will be a publicly available browser plugin that provides easy access to the automatically generated summaries. This will allow me to evaluate the project results with real users, on a large scale. It will also provide a new evaluation challenge for the Natural Language Generation community, as researchers will be able to compare their summarisers against those delivered by our open-source algorithms. Last but not least, the fellowship not only covers foundational multi-disciplinary research but also tests the results in several Digital Economy pilot experiments involving commercial partners (The Press Association, British Telecom, Fizzback).
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.shef.ac.uk