
Details of Grant 

EPSRC Reference: EP/I010858/1
Title: Bayesian Synchronous Grammar Induction
Principal Investigator: Blunsom, Dr P
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: University of Oxford
Scheme: First Grant - Revised 2009
Starts: 02 May 2011 Ends: 01 August 2012 Value (£): 100,240
EPSRC Research Topic Classifications:
Artificial Intelligence; Interpreting & Translation
EPSRC Industrial Sector Classifications:
Communications; Information Technologies
Related Grants:
Panel History:
Panel Date | Panel Name | Outcome
07 Sep 2010 | ICT Prioritisation Panel (Sept 2010) | Announced
Summary on Grant Application Form
Statistical Machine Translation (SMT) is the technology that allows computers to learn to translate between human languages (English, French, Chinese, etc.) by being shown large numbers of example translations. This is the technology that drives popular online translation tools such as those provided by Google and Bing (Microsoft).
The last decade of research in SMT has seen rapid progress, as small-scale research systems have matured into large commercial products and popular online tools. Unfortunately, the success of SMT has not been uniform: current state-of-the-art translation output varies markedly in quality depending on the languages being translated. Closely related language pairs (e.g. English and French) can be translated with a high degree of precision, while for distant pairs (e.g. English and Chinese) the result is far from acceptable. This effect is clearly discernible when comparing the state of the art for two well-studied language pairs: Arabic-English and Chinese-English. While the quality of Arabic-English translation could be described as remarkable, translating Chinese into English often results in unreadable output. Clearly, SMT has a long way to go before it is usable across a wide range of languages.
It has been tempting to argue that SMT's current limitations can be overcome simply by increasing the amount of data on which the systems are trained. However, large-scale evaluation campaigns for Chinese-English translation, with ever-increasing model sizes, have not yielded the hoped-for gains. The failure to adequately translate between languages such as Chinese and English can be attributed to two significant shortcomings of current translation models:
1. an inability to model large changes in word order between the input and output languages (referred to as reordering);
2. no reliable mechanism for directly learning phrasal (non-word-based) translation units: a significant issue for non-segmenting languages (languages such as Chinese, which do not use spaces to separate words) and for languages with complex morphology (e.g. German).
While a significant amount of research effort is currently being applied to these issues, the proposed solutions are limited by their focus on more expressive models for producing translations, rather than on how the translation units are learnt in the first place. In this research proposal I argue that both the fundamental structure and the estimation methods of SMT models must change. By recasting the problem of learning translation models as synchronous grammar induction, I aim to build models capable of handling complex translation phenomena, bringing us closer to the goal of readily available translation between all the world's languages. I propose to go beyond current research on inducing statistical translation models by using non-parametric Bayesian methods to directly learn a state-of-the-art synchronous grammar translation model from parallel sentences.
This research will have the following impacts:
1. At present, synchronous grammars for translation are learnt from non-hierarchical word alignment models, losing much of the benefit of the grammar's ability to represent difficult translation phenomena not captured in the alignments. By learning the alignments and the grammar simultaneously in one model, the full power of these hierarchical models will be unlocked.
2. This research will advance the state of the art for learning complex structured models within a non-parametric formulation, an important contribution both to machine translation and to many other areas of machine learning.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ox.ac.uk