Details of Grant

EPSRC Reference:

EP/C515986/1

Title:

A Unified Model for Speech Recognition and Synthesis

Principal Investigator:

Russell, Professor M

Other Investigators:

Researcher Co-Investigators:

Project Partners:

20-20 Speech Ltd.

Department:

Electronic, Electrical and Computer Eng

Organisation:

University of Birmingham

Scheme:

Standard Research (Pre-FEC)

Starts:

14 September 2005

Ends:

13 March 2009

Value (£):

225,406

EPSRC Research Topic Classifications:

Comput./Corpus Linguistics

Human Communication in ICT

EPSRC Industrial Sector Classifications:

Creative Industries

Related Grants:

Panel History:

Summary on Grant Application Form

Speech is our natural way to communicate with people, but speech communication with computers is extraordinarily difficult. A key problem is variability: two utterances of the same word will result in different acoustic patterns. Current speech recognisers pretend that this variability is random, and try to deal with it using generic statistical techniques rather than by understanding its causes. This can work well, but difficult tasks, such as recognition of fluent conversational speech or of speech in noise, expose its limitations. The most successful approaches to computer speech output involve careful joining together of short fragments of real speech, taking little account of how humans generate speech. Although they deal with different aspects of the same problem, our approaches to computer speech recognition and synthesis have little in common.

In a previous EPSRC project we developed a model of speech which includes a description relating more closely to how the human vocal tract produces a particular sound. We call this an articulatory-based representation. It should provide a way of modelling the underlying factors which give rise to variability in speech. Key components of our model are the articulatory representation, the model of speech dynamics in this representation, and the mapping which converts an articulatory representation into an acoustic representation of speech. We developed the theory of such a model so that its parameters can be learnt automatically from examples of speech, investigated alternative articulatory representations, and demonstrated that the model was viable for speech recognition. One of the key discoveries was that a set of parameters which can be used to control a particular type of speech synthesiser (called a formant synthesiser) is also a suitable articulatory representation for our model. This means that, in principle, a set of models trained for recognition of a particular individual's speech could also be used to configure a speech synthesiser to sound like that individual. We refer to this as a unified model, because it supports both recognition and synthesis.

Our method has several limitations: the 'articulatory-to-acoustic' mapping is linear, although we know this is a non-linear process; the trajectories in the articulatory space are linear, with discontinuities at segment boundaries; and we cannot 'tune' the model to a particular speaker without a lot of speech from that individual. The proposed project has several objectives. First, we will develop the theory necessary to include non-linear articulatory-to-acoustic mappings and improved models of dynamics. In parallel, we will develop adaptation techniques which enable the model to be 'tuned' to an individual given a few examples of that individual's speech. Our final goal is to demonstrate competitive performance in speech and speaker recognition, and in speech synthesis. In speech and speaker recognition we aim to achieve state-of-the-art performance on standard datasets (TIMIT and Switchboard respectively). In speaker recognition we will work with the US Air Force Research Laboratory, and in speech synthesis we will collaborate with Dr Wendy Holmes (20/20 Speech Ltd), a world expert on formant synthesis.
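
The structure of the existing model, as described in the summary, can be illustrated with a small sketch. This is not the project's implementation: the two-dimensional articulatory space, the segment end points, the matrix W and the bias b are all hypothetical values chosen only to show piecewise-linear trajectories with a discontinuity at a segment boundary, followed by a linear articulatory-to-acoustic mapping.

```python
# Illustrative sketch only: every name, dimension and number here is a
# hypothetical assumption, not a value taken from the project.

def linear_trajectory(start, end, n_frames):
    """Interpolate linearly from start to end over n_frames frames."""
    if n_frames == 1:
        return [list(start)]
    return [
        [s + (e - s) * t / (n_frames - 1) for s, e in zip(start, end)]
        for t in range(n_frames)
    ]


def linear_map(matrix, bias, vector):
    """Apply a linear (affine) mapping: matrix * vector + bias."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(matrix, bias)
    ]


# Two segments, each with its own straight-line trajectory in a 2-D
# articulatory space (start point, end point, number of frames).
segments = [
    ((500.0, 1500.0), (700.0, 1200.0), 3),
    ((300.0, 2200.0), (400.0, 2000.0), 3),
]

# Hypothetical linear articulatory-to-acoustic mapping into a 3-D space.
W = [[0.001, 0.0], [0.0, 0.001], [0.001, 0.001]]
b = [0.0, 0.0, 0.0]

# Concatenating the segments leaves a jump at the boundary, because the
# second segment does not start where the first one ends.
articulatory = [
    frame
    for start, end, n in segments
    for frame in linear_trajectory(start, end, n)
]
acoustic = [linear_map(W, b, frame) for frame in articulatory]

for art, ac in zip(articulatory, acoustic):
    print(art, ac)
```

In this sketch the jump between the third and fourth frames corresponds to the boundary discontinuity listed among the limitations, and replacing linear_map with a non-linear function would correspond to the first objective of the proposed project.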

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.bham.ac.uk