Details of Grant 

EPSRC Reference: EP/S001271/1
Title: MTStretch: Low-resource Machine Translation
Principal Investigator: Birch, Dr A
Other Investigators:
Researcher Co-Investigators:
Project Partners:
BBC, Quorate Technology Limited
Department: Sch of Informatics
Organisation: University of Edinburgh
Scheme: EPSRC Fellowship - NHFP
Starts: 29 June 2018
Ends: 28 June 2021
Value (£): 517,456
EPSRC Research Topic Classifications:
Artificial Intelligence, Computational Linguistics
EPSRC Industrial Sector Classifications:
Creative Industries, Information Technologies
Related Grants:
Panel History:
Panel Date: 10 May 2018
Panel Name: EPSRC UKRI CL Innovation Fellowship Interview Panel 6 - 10 and 11 May 2018
Outcome: Announced
Summary on Grant Application Form
Neural machine translation (NMT) has recently made major advances in translation quality, and this technology has been rapidly adopted by industry leaders, such as Google and Amazon, and international organisations, such as the UN and the EU. However, high-performing neural models require many millions of human-translated sentences for training. For many real-world applications, there is not enough data to build useful MT systems. In this project I plan to stretch the resources and capabilities that we have, in order to develop robust MT technologies which are capable of being deployed for low-resource language pairs and for highly specialised low-resource domains.

I will investigate making translation significantly more robust by using the intuition that translated (or parallel) corpora contain enormous redundancies and are an inefficient way to learn to translate. Inspired by human learning, we will study Bayesian models which build up meaning compositionally and are able to learn to learn, thus creating models which only need a few training examples. We will also develop machine learning techniques, such as transfer learning and data augmentation, to extract knowledge from monolingual and parallel resources from other languages and domains. This proposal combines fundamental research in rapid deep learning with lower-risk data-driven machine learning research in order to deliver useful products to our industry partners.

My team will provide translations for language pairs which were not previously well served by automatic machine translation. This will allow our partners, BBC World Service and BBC Monitoring, to cover under-resourced languages. Building on an existing scalable platform, created within the EU project Scalable Understanding of Multilingual MediA (SUMMA), we can already deploy multilingual capabilities in the newsroom. The innovation fellowship will contribute to the commercialisation and sustainability of SUMMA translation components, but crucially it will allow us to cover a wider range of topical and strategic languages. Access to a high-quality translation platform for low-resource languages will help the BBC deliver impartial reporting across the world. Collaboration with our industry partner, Quorate, will demonstrate the commercial potential of our research in the highly specialised domain of financial trading.

In the long term, this project will have a wider impact on British industry by breaking down language barriers affecting international trade, and by significantly improving the quality and resilience of transformative AI language technologies.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ed.ac.uk