Details of Grant

EPSRC Reference:

EP/L010291/1

Title:

Adaptive Context-Dependent Machine Translation for Heterogeneous Text

Principal Investigator:

Cohn, Dr TA

Other Investigators:

Researcher Co-Investigators:

Project Partners:

Alpha CRC Ltd

Carnegie Mellon University

Microsoft

Department:

Computer Science

Organisation:

University of Sheffield

Scheme:

EPSRC Fellowship

Starts:

01 October 2013

Ends:

30 September 2018

Value (£):

907,946

EPSRC Research Topic Classifications:

Artificial Intelligence

Comput./Corpus Linguistics

EPSRC Industrial Sector Classifications:

No relevance to Underpinning Sectors

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
17 Jul 2013	EPSRC ICT Responsive Mode - July 2013	Announced
03 Sep 2013	ICT Fellowships Interviews Meeting - Sept 13	Announced

Summary on Grant Application Form

While automatic machine translation technologies are undoubtedly useful to a wide range of users, they have many shortcomings. Notably, they often produce incoherent output when translating many types of input text, e.g., medical texts, literature, or even conversational text. This project aims to develop new machine translation (MT) systems that can be adapted more efficiently to new domains and text styles, and that can handle heterogeneous, mixed-domain inputs. This is framed as a multi-task machine learning problem in which a collection of domain-specific translation systems is learned jointly, leveraging correlations between related domains. This approach will help to reduce the large data requirements of current translation systems, while also improving translation quality across a wide range of language pairs and application domains.

Existing research has tended to focus on a narrow interpretation of adaptability, namely domain adaptation, in which there is a single target domain and the challenge is to produce good translations using parallel data drawn from other domains. This project will address the more general setting in which there can be many target domains, or in which the test domain is not known in advance. This setting is considerably more challenging, but also far more useful, than the single-target-domain setting studied in the domain-adaptation literature. Addressing it will improve overall translation quality and make it easier to port systems to new language pairs and new domains.

This work will create novel evaluation resources to supplement the standard evaluation setting, which uses text from only one or two domains. Specifically, the project will create a comprehensive new evaluation set that covers a wide range of topics, draws on many different media sources (including user-generated content from blogs and wikis), and spans multiple challenging language pairs. This evaluation set will highlight the shortcomings of existing machine translation research in handling heterogeneous inputs and challenging translation domains, and will contribute a critically important dataset to the research community.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts


Impacts


Sectors submitted by the Researcher


Project URL:

Further Information:

Organisation Website:

http://www.shef.ac.uk