EPSRC Reference: |
EP/G051070/1 |
Title: |
Lexical Acquisition for the Biomedical Domain |
Principal Investigator: |
Korhonen, Professor A |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer Science and Technology |
Organisation: |
University of Cambridge |
Scheme: |
First Grant Scheme |
Starts: |
01 August 2009 |
Ends: |
30 September 2012 |
Value (£): |
285,398
|
EPSRC Research Topic Classifications: |
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
04 Mar 2009 | ICT Prioritisation Panel (March 09) | Announced |
|
Summary on Grant Application Form |
Natural Language Processing (NLP) is now critically needed to assist the processing, mining and extraction of knowledge from the rapidly growing literature in the area of biomedicine. In recent years, considerable progress has been made in the development of basic NLP techniques for biomedicine. The current challenge is to improve these techniques with richer and deeper analysis capable of supporting a wide range of real-world tasks. High-quality lexical resources (e.g. accurate and comprehensive lexicons and word classifications) are critically needed for this. Most lexical resources used in current systems are developed manually by linguists. Manual work is extremely costly, and the resulting resources require extensive labour-intensive porting to new (sub-)domains and tasks. Automatic acquisition or updating of lexical information from repositories of un-annotated text (e.g. corpora of biomedical articles) is a more promising avenue to pursue. Since lexical acquisition gathers usage and frequency information directly from relevant data, it can considerably enhance the viability and portability of NLP technology. Research into automatic lexical acquisition is now starting to produce large-scale resources useful for practical NLP tasks. However, the application of such techniques to biomedical texts has been limited because many existing techniques require adaptation before they can perform optimally in this linguistically challenging domain. In this project, we will take existing techniques capable of acquiring basic syntactic-semantic information for verbs from corpus data and will adapt them to the biomedical domain. We will focus on verbal (i) subcategorization frames, (ii) selectional preferences, and (iii) lexical-semantic classes. This information, when tailored to the domain in question, can aid key NLP tasks such as parsing, anaphora resolution, Information Extraction (IE), and question-answering (QA).
Building on our pilot studies and expanding on the adaptive, state-of-the-art text processing tools available to us, we will improve existing techniques further and extend them with novel unsupervised and semi-supervised methods capable of supporting efficient domain adaptation. We will evaluate and demonstrate the capabilities of our techniques directly and in the context of practical BIO-NLP tasks. We will use the final version of the system to acquire a substantial lexical database from a biomedical corpus. The resulting resource will be distributed freely to the research community, along with the software which can be used to tune the frequency information stored in the database to particular biomedical sub-domains/tasks.
We expect this project to (i) advance BIO-NLP and improve its usefulness for practical tasks in biomedicine, (ii) advance NLP by improving the accuracy, robustness and portability of lexical acquisition to real-world tasks, and (iii) provide an important large-scale study of domain-adaptation in the critical area of lexical acquisition.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.cam.ac.uk |