
Details of Grant 

EPSRC Reference: EP/L02411X/1
Title: Large-scale Unsupervised Parsing for Resource-Poor Languages
Principal Investigator: Cohen, Dr S
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Sch of Informatics
Organisation: University of Edinburgh
Scheme: First Grant - Revised 2009
Starts: 11 November 2014
Ends: 10 February 2016
Value (£): 100,651
EPSRC Research Topic Classifications:
Artificial Intelligence, Comput./Corpus Linguistics
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 04 Feb 2014
Panel Name: EPSRC ICT Responsive Mode - Feb 2014
Outcome: Announced
Summary on Grant Application Form
This project focuses on the automatic induction of grammatical structure from raw text. Automatic inference of the syntax of sentences is a long-standing problem in natural language processing, one that originates in studies attempting to build computational models of the way humans learn language.

This problem is still far from solved. There is as yet no fully fledged computer program that takes raw text and returns a computational representation of its syntax (for example, identifying the noun phrases, verb phrases, and prepositional phrases, and how they relate to each other in the text).

This research aims to take a major step toward building such a system. The goal is to derive a new algorithm that recovers, at least partially, the syntax of raw text. The algorithm is based on the assumption that words which frequently co-occur are usually linked not only semantically but also syntactically. For example, if the word "deep" often co-occurs with the word "puddle", the algorithm will assume that "deep" tends to modify "puddle".
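
To illustrate the co-occurrence idea (this sketch is not part of the grant record, and the toy corpus, window size and threshold are hypothetical choices), one can count how often word pairs appear near each other and score each pair with a rough pointwise mutual information (PMI), treating high-scoring pairs as candidate syntactic links:

from collections import Counter
import math

def cooccurrence_links(sentences, window=3, min_pmi=1.0):
    """Score word pairs by a rough PMI within a small window and
    return candidate syntactic links (purely illustrative)."""
    word_counts = Counter()
    pair_counts = Counter()
    total = 0
    for tokens in sentences:
        word_counts.update(tokens)
        total += len(tokens)
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                pair_counts[(w, tokens[j])] += 1
    links = []
    for (w1, w2), c in pair_counts.items():
        pmi = math.log(c * total / (word_counts[w1] * word_counts[w2]))
        if pmi >= min_pmi:
            links.append((w1, w2, pmi))
    return sorted(links, key=lambda x: -x[2])

# "deep" and "puddle" co-occur often, so they surface as a candidate link.
corpus = [["the", "deep", "puddle"], ["a", "deep", "puddle"], ["the", "dog"]]
print(cooccurrence_links(corpus))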

The algorithm is based on a new learning paradigm developed in the machine learning community called "spectral learning". This paradigm has many advantages, most notably its well-founded mathematical basis. This means that we can derive mathematical proofs guaranteeing that the algorithm will be able to learn the syntax of a language if it is exposed to sufficiently large amounts of raw text.
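
At its core, a spectral method reduces learning to linear-algebraic operations such as a singular value decomposition (SVD) of a statistics matrix estimated from text. The minimal sketch below, which assumes only numpy and uses a toy hand-written co-occurrence matrix, shows that kind of decomposition step; it illustrates the general paradigm, not the specific algorithm developed in this project:

import numpy as np

# Toy word-context co-occurrence matrix (rows: words, columns: context words).
# The counts are hypothetical; a real system would estimate them from raw text.
words = ["deep", "puddle", "dog", "barks"]
counts = np.array([
    [0., 8., 0., 0.],
    [8., 0., 1., 0.],
    [0., 1., 0., 7.],
    [0., 0., 7., 0.],
])

# The step shared by many spectral methods: an SVD of the statistics matrix,
# keeping only the top-k singular directions as a low-dimensional representation.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
embeddings = U[:, :k] * S[:k]

for word, vec in zip(words, embeddings):
    print(word, np.round(vec, 2))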

Such proofs are important partly because they bear on the learnability of language by humans. If they show that little data is required to learn syntax, they can shed light on humans' ability to learn language from relatively brief exposure during childhood.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ed.ac.uk