
Details of Grant 

EPSRC Reference: EP/F055765/1
Title: Global Inference for Summarization Using Integer Linear Programming
Principal Investigator: Lapata, Professor M
Other Investigators:
Grothey, Dr A
Researcher Co-Investigators:
Project Partners:
Department: Sch of Informatics
Organisation: University of Edinburgh
Scheme: Standard Research
Starts: 26 January 2009
Ends: 25 January 2012
Value (£): 269,809
EPSRC Research Topic Classifications:
Comput./Corpus Linguistics
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date    Panel Name    Outcome
21 Apr 2008    ICT Prioritisation Panel (April 2008)    Announced
Summary on Grant Application Form
Summarization is the process of condensing a source text into a shorter version while preserving its information content. The applications of summarization are many and varied, ranging from quick access to news and scientific articles to meeting browsers and systems that aid physicians in gathering patient information. Humans summarize effortlessly on a daily basis (e.g., by describing the contents of a lecture, a meeting or a movie), but producing high-quality summaries automatically remains a challenge. The difficulty lies primarily in the nature of the task, which is complex, must satisfy many constraints (e.g., summary length, informativeness, coherence, grammaticality) and ultimately requires large-scale text understanding. Since robust text understanding is beyond the capabilities of current NLP technology, most work today focuses on extractive summarization. The idea here is to create a summary simply by identifying and subsequently concatenating the most important sentences in a document. Without a great deal of linguistic analysis, it is possible to create summaries for a wide range of documents, independently of style, text type, and subject matter. Unfortunately, extracts often suffer from low readability and poor text quality.
In this project we will develop novel models for single-document summarization that break away from the sentence extraction paradigm. We will model summarization as an optimization problem and use integer linear programming (ILP) to find a summary that is best for the application, task, or user at hand. The ILP formulation is advantageous for two reasons. First, it allows us to explicitly encode the constraints our output summaries must meet. Second, ILP is a well-studied optimization problem with efficient algorithms for finding a globally optimal solution in the presence of many conflicting constraints.
This proposal aims to shift the summarization paradigm by developing novel and unified models based on the ILP framework that are able to identify what is important in a document and express it appropriately. The success of this research will have a significant and far-reaching impact on summarization and related areas (e.g., information retrieval) that could not be brought about by incrementally extending conventional models.
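To make the idea of summarization as constrained optimization concrete, the following is a minimal sketch and not the project's actual model. It assumes a hypothetical setting in which each candidate text unit has a precomputed importance score and a length in words, and in which some pairs of units are marked as redundant with each other. The function name `summarize_ilp`, its parameters, and the toy numbers are all illustrative assumptions. The resulting 0-1 program is solved here by exhaustive enumeration, which is only feasible for a handful of units; a realistic system would pass the same objective and constraints to a dedicated ILP solver.

```python
from itertools import product


def summarize_ilp(scores, lengths, max_length, exclusive_pairs=()):
    """Solve a toy 0-1 integer linear program by brute-force enumeration.

    maximise    sum_i scores[i] * x[i]
    subject to  sum_i lengths[i] * x[i] <= max_length     (length budget)
                x[i] + x[j] <= 1 for (i, j) in exclusive_pairs  (redundancy)
                x[i] in {0, 1}

    All inputs are hypothetical: `scores` and `lengths` stand in for whatever
    importance and size estimates a real model would provide.
    """
    n = len(scores)
    best_x, best_value = None, float("-inf")
    # Enumerate every binary assignment; exponential in n, so toy-sized only.
    for x in product((0, 1), repeat=n):
        # Length constraint: the selected units must fit the budget.
        if sum(length * xi for length, xi in zip(lengths, x)) > max_length:
            continue
        # Redundancy constraints: at most one unit from each exclusive pair.
        if any(x[i] + x[j] > 1 for i, j in exclusive_pairs):
            continue
        value = sum(score * xi for score, xi in zip(scores, x))
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


if __name__ == "__main__":
    # Four hypothetical units; units 0 and 1 are assumed to be redundant.
    selected, value = summarize_ilp(
        scores=[5, 4, 3, 2],
        lengths=[10, 8, 6, 4],
        max_length=18,
        exclusive_pairs=[(0, 1)],
    )
    print(selected, value)
```

In this toy instance the optimum selects units 1, 2 and 3, which fill the 18-word budget exactly with a total score of 9. Selecting the two highest-scoring units (0 and 1) would reach the same score, but that choice is ruled out by the redundancy constraint, which illustrates how explicitly encoded constraints shape the globally optimal summary.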
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ed.ac.uk