EPSRC Reference: |
GR/J55106/01 |
Title: |
INTONATION AND DIALOGUE MODELS AS CONSTRAINTS IN SPEECH RECOGNITION |
Principal Investigator: |
Isard, Mr S |
Department: |
Theoretical and Applied Linguistics |
Organisation: |
University of Edinburgh |
Scheme: |
Standard Research (Pre-FEC) |
Starts: |
01 October 1993 |
Ends: |
31 March 1997 |
Value (£): |
227,143
|
EPSRC Research Topic Classifications: |
Human Communication in ICT |
|
Summary on Grant Application Form |
The objective is to build a speech recognition system that will recognise dialogue speech rather than read utterances. Our particular approach is to make extensive use of intonation and its relation to dialogue structure. The project involves extracting intonational information from utterances, analysing the function that this intonation serves in signalling the dialogue role of the utterance, building a speech recogniser and integrating the three components.

Progress:
The project has several components that will form the basis of a speech recognition system which can perform in a dialogue situation.

Intonational Labelling
A substantial amount of time has been spent on producing a robust method of detecting intonational events (prosodic phrase boundaries, pitch accents and boundary tones) from speech. The current algorithm uses sophisticated feature extraction whereby a number of tracks are produced which correspond to vowelness, obstruentation, energy, pitch and differentiated pitch. These tracks are combined into a 5-dimensional vector which is used as the input to a time-delay neural net. This net produces another track, which represents a plot against time of the likelihood that a section of the utterance contains an intonational event. This track is further processed to give a basic intonational description of the utterance. The technique has an accuracy of about 90%. Current work is concentrated on testing new input feature sets and on more sophisticated post-processing of the neural-net output track.

Dialogue Modelling
We have completed a study of the function of intonation in single-word utterances. It shows that, given a specific discourse context, one can narrow the potential discourse functions of the utterance through a process of elimination based solely upon the utterance's intonation category. Extensions of this work will use the intonation transcription that we are developing in a comparison of longer utterances in dialogue and their functions. The dialogue analysis as it will be used in this project is already complete.

Speech Recognition
A new speech recogniser is being developed on the project. This recogniser uses a hybrid neural-net/hidden Markov model approach. The neural net is used to produce a set of tracks (in a similar way to that described above) representing various broad phonetic classes. The hidden Markov model then uses these tracks as input. The project aims to be able to recognise and process the HCRC Map Task corpus, which consists of spontaneous dialogue speech. Grammar models of the corpus have been built, and current work involves hand labelling of the corpus in order to get data for training phonetic models. Full recognition experiments on the corpus will begin soon.
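As an illustration only, the intonational labelling pipeline described above might be sketched as follows. The summary does not specify how the neural-net output track is post-processed, so the thresholding and minimum-duration rule here are assumptions, as are the function names `stack_tracks` and `find_events` and the parameters `threshold` and `min_frames`. The time-delay neural net itself is not modelled; its likelihood track is taken as given.

```python
from typing import List, Sequence, Tuple


def stack_tracks(*tracks: Sequence[float]) -> List[Tuple[float, ...]]:
    """Combine equal-length per-frame tracks into one feature vector per frame.

    With five tracks (vowelness, obstruentation, energy, pitch and
    differentiated pitch) this yields one 5-dimensional vector per frame.
    """
    if len({len(t) for t in tracks}) != 1:
        raise ValueError("all tracks must have the same number of frames")
    return [tuple(frame) for frame in zip(*tracks)]


def find_events(
    likelihood: Sequence[float],
    threshold: float = 0.5,   # assumed value, not taken from the project
    min_frames: int = 3,      # assumed value, not taken from the project
) -> List[Tuple[int, int, int]]:
    """Turn a likelihood track into (start, end_exclusive, peak) frame indices.

    An event is a run of at least `min_frames` consecutive frames whose
    likelihood is at or above `threshold`; shorter runs are discarded.
    """
    events = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, p in enumerate(list(likelihood) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_frames:
                peak = max(range(start, i), key=lambda j: likelihood[j])
                events.append((start, i, peak))
            start = None
    return events


# Hypothetical likelihood track, one value per frame.
track = [0.1, 0.2, 0.7, 0.9, 0.8, 0.1, 0.6, 0.1, 0.7, 0.8, 0.9]
events = find_events(track)
```

Returning frame indices rather than times keeps the sketch independent of the frame rate, which the summary does not state.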
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Organisation Website: |
http://www.ed.ac.uk |