Details of Grant 

EPSRC Reference: GR/N19106/01
Title: ENABLING MINORITY LANGUAGE ENGINEERING (EMILLE)
Principal Investigator: McEnery, Professor T
Other Investigators:
Gaizauskas, Professor R
Researcher Co-Investigators:
Project Partners:
Association of Translation Companies ATC
BBC
British Council
Central Ist Indian Languages
Elra
Intern'l Translation Resources
Lake House Printers
National Software Centre
Sesame Computer Projects
Sharp Laboratories of Europe Ltd
Sylheti Translation Research
University of Moratuwa
Department: Linguistics and English Language
Organisation: Lancaster University
Scheme: Standard Research (Pre-FEC)
Starts: 05 June 2000
Ends: 04 October 2003
Value (£): 267,546
EPSRC Research Topic Classifications:
Human Communication in ICT
EPSRC Industrial Sector Classifications:
Creative Industries
No relevance to Underpinning Sectors
Related Grants:
Panel History:  
Summary on Grant Application Form
SUMMARY: The Baker & McEnery (1999) survey of language engineers identified a major need for resources to enable this group to build language engineering (LE) systems for Indic languages. EMILLE will equip those Indic languages identified in the survey (Bengali, Gujarati, Hindi, Panjabi, Singhalese, Tamil and Urdu) with a language engineering infrastructure. To do so we will have to extend a language engineering architecture (goal one), develop corpora (goal two) and build some basic LE applications (goal three).
Goal one - EMILLE will extend GATE to be UNICODE compliant so that it may act as a framework within which goals two and three may be achieved.
Goal two - EMILLE will generate written language corpora of 9,000,000 words for Bengali, Gujarati, Hindi, Panjabi, Singhalese, Tamil and Urdu. For those languages with a UK community large enough to sustain spoken corpus collection (Bengali, Gujarati, Hindi, Panjabi, Singhalese, Tamil and Urdu) EMILLE will also produce spoken corpora of 500,000 words per language.
Goal three - Within the GATE framework, tools will be developed to map a range of font-based representations of Indic writing systems into UNICODE, to part-of-speech tag at least one of the languages represented in the corpus, and to align the parallel corpora within EMILLE.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.lancs.ac.uk