
Details of Grant 

EPSRC Reference: EP/G051070/1
Title: Lexical Acquisition for the Biomedical Domain
Principal Investigator: Korhonen, Professor A
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Computer Science and Technology
Organisation: University of Cambridge
Scheme: First Grant Scheme
Starts: 01 August 2009
Ends: 30 September 2012
Value (£): 285,398
EPSRC Research Topic Classifications:
Artificial Intelligence
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 04 Mar 2009
Panel Name: ICT Prioritisation Panel (March 09)
Outcome: Announced
Summary on Grant Application Form
Natural Language Processing (NLP) is now critically needed to assist the processing, mining and extraction of knowledge from the rapidly growing literature in the area of biomedicine. In recent years, considerable progress has been made in the development of basic NLP techniques for biomedicine. The current challenge is to improve these techniques with richer and deeper analysis capable of supporting a wide range of real-world tasks. High-quality lexical resources (e.g. accurate and comprehensive lexicons and word classifications) are critically needed for this. Most lexical resources used in current systems are developed manually by linguists. Manual work is extremely costly, and the resulting resources require extensive labour-intensive porting to new (sub-)domains and tasks. Automatic acquisition or updating of lexical information from repositories of un-annotated text (e.g. corpora of biomedical articles) is a more promising avenue to pursue. Since lexical acquisition gathers usage and frequency information directly from relevant data, it can considerably enhance the viability and portability of NLP technology. Research into automatic lexical acquisition is now starting to produce large-scale resources useful for practical NLP tasks. However, the application of such techniques to biomedical texts has been limited because many existing techniques require adaptation before they can perform optimally in this linguistically challenging domain. In this project, we will take existing techniques capable of acquiring basic syntactic-semantic information for verbs from corpus data and will adapt them to the biomedical domain. We will focus on verbal (i) subcategorization frames, (ii) selectional preferences, and (iii) lexical-semantic classes. This information, when tailored to the domain in question, can aid key NLP tasks such as parsing, anaphora resolution, Information Extraction (IE), and question-answering (QA).
Building on our pilot studies and expanding on the adaptive, state-of-the-art text processing tools available to us, we will improve existing techniques further and extend them with novel unsupervised and semi-supervised methods capable of supporting efficient domain adaptation. We will evaluate and demonstrate the capabilities of our techniques directly and in the context of practical BIO-NLP tasks. We will use the final version of the system to acquire a substantial lexical database from a biomedical corpus. The resulting resource will be distributed freely to the research community, along with the software which can be used to tune the frequency information stored in the database to particular biomedical sub-domains/tasks.
We expect this project to (i) advance BIO-NLP and improve its usefulness for practical tasks in biomedicine, (ii) advance NLP by improving the accuracy, robustness and portability of lexical acquisition to real-world tasks, and (iii) provide an important large-scale study of domain-adaptation in the critical area of lexical acquisition.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.cam.ac.uk