EPSRC Reference: |
EP/L010291/1 |
Title: |
Adaptive Context-Dependent Machine Translation for Heterogeneous Text |
Principal Investigator: |
Cohn, Dr TA |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer Science |
Organisation: |
University of Sheffield |
Scheme: |
EPSRC Fellowship |
Starts: |
01 October 2013 |
Ends: |
30 September 2018 |
Value (£): |
907,946
|
EPSRC Research Topic Classifications: |
Artificial Intelligence |
Comput./Corpus Linguistics |
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
While automatic machine translation technologies are undoubtedly
useful to a wide range of users, they have many shortcomings. Notably,
they often produce incoherent output for many types of
input text, e.g., medical texts, literature, or even conversational
text. This project aims to develop new machine translation (MT) systems
which can be adapted more efficiently to new domains and text styles,
and which can handle heterogeneous mixed-domain inputs. This is framed as a
multi-task machine learning problem in which a collection of
domain-specific translation systems is learned jointly, leveraging
correlations between related domains. This approach will help to
reduce the large data requirements of current translation systems, while
also improving translation quality across a wide range of
language pairs and application domains.
Existing research has tended to focus on a narrow interpretation of
adaptability, specifically the idea of domain adaptation, in which
there is a single target domain and the challenge is how to produce
good translations by using parallel data drawn from other
domains. This project will address the more general setting where
there can be many target domains, or the testing domain is not known
in advance. This is a considerably more challenging and more
useful setting than the single target domain used in the
domain-adaptation literature. Addressing it should improve overall
translation quality and facilitate portability to new language pairs
and new domains.
This work will create new evaluation resources
to supplement the standard evaluation setting, which uses text from
only one or two domains. The project will create a new
comprehensive evaluation set covering a wide range of topics, drawn
from many different media sources, including user-generated content
from blogs and wikis, and spanning multiple challenging language
pairs. This evaluation set will highlight the shortcomings of
existing machine translation research in terms of handling
heterogeneous inputs and challenging translation domains, and
contribute a critically important dataset to the research community.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.shef.ac.uk |