EPSRC Reference: |
GR/J10204/01 |
Title: |
IMPROVING OF PHONETIC DISCRIMINATION OF HIDDEN MARKOV MODEL BASED SPEECH RECOGNISERS |
Principal Investigator: |
Young, Professor SJ |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Engineering |
Organisation: |
University of Cambridge |
Scheme: |
Standard Research (Pre-FEC) |
Starts: |
01 December 1992 |
Ends: |
31 May 1996 |
Value (£): |
174,227
|
EPSRC Research Topic Classifications: |
Human Communication in ICT |
|
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
To improve the phonetic discrimination of HMM-based systems through the use of high-accuracy sub-phone models, input transformations and discriminative training, and to demonstrate their effectiveness in a working laboratory system.

Progress: The first phase of the work in this project addressed the issue of how to build accurate sub-phone models. The approach has been to start from conventional context-dependent 3-state Hidden Markov Models (HMMs) and then to pool the states to form sub-phones. Initially, we investigated data-driven clustering, in which states were pooled to form a sub-phone based entirely on acoustic similarity [1]. Subsequently, we developed a method based on phonetic decision trees [2]. Both methods work well, but the decision tree approach has the advantage that wider phonetic contexts can be included and models can be built for contexts that have not been seen in the training data.

More recently, the work has focused on the problem of applying transformations and discriminative training to very large vocabulary systems. To facilitate this, we have developed a method of generating lattices from a standard HMM recogniser. These lattices can then be used to run recognition experiments quickly by rescoring rather than by repeating a computationally expensive full search. They can also be used to generate alternative state alignments for a discriminative training scheme.

When the project started, our focus task was medium (1000 word) vocabulary recognition, as exemplified by the ARPA Resource Management Task. Since then, we have transferred our attention to large vocabulary dictation. Work from this project and EPSRC Project GR/K25380 has contributed to the building of the HTK Large Vocabulary Dictation System. This system returned the best performance of any system in the ARPA 1994 CSR Evaluation [3].

[1] Young S.J., Woodland P.C., State Clustering in HMM-based Continuous Speech Recognition. Computer Speech and Language, Vol 8, pp. 369-384, 1994.
[2] Young S.J., Odell J.J., Woodland P.C., Tree-Based State Tying for High Accuracy Acoustic Modelling. Proc. Human Language Technology Workshop, Morgan Kaufmann Publishers Inc, March 1994.
[3] Woodland P.C., Leggetter C.J., Odell J., Valtchev V., Young S.J., The 1994 HTK Large Vocabulary Speech Recognition System. Proc. ICASSP, Detroit, 1995.
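
The data-driven clustering mentioned in the summary (pooling HMM states purely on acoustic similarity) can be illustrated with a minimal sketch. This is not the project's implementation: the single diagonal-covariance Gaussian per state, the variance-normalised distance, the furthest-neighbour merge rule, the stopping threshold and all function names are assumptions made for illustration only.

```python
import math
from itertools import combinations


def state_distance(a, b):
    """Variance-normalised distance between two states (assumed metric).

    Each state is a (mean, variance) pair of equal-length lists,
    i.e. a single diagonal-covariance Gaussian.
    """
    mean_a, var_a = a
    mean_b, var_b = b
    total = 0.0
    for ma, mb, va, vb in zip(mean_a, mean_b, var_a, var_b):
        total += (ma - mb) ** 2 / math.sqrt(va * vb)
    return math.sqrt(total / len(mean_a))


def cluster_distance(c1, c2, states):
    """Furthest-neighbour distance: the largest distance between any
    state in cluster c1 and any state in cluster c2."""
    return max(state_distance(states[i], states[j]) for i in c1 for j in c2)


def cluster_states(states, threshold):
    """Greedy bottom-up pooling of states.

    Starts with one cluster per state and repeatedly merges the closest
    pair of clusters until the closest pair is further apart than
    `threshold` (a hypothetical tuning parameter).
    """
    clusters = [[name] for name in sorted(states)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = cluster_distance(clusters[i], clusters[j], states)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        # i < j is guaranteed by combinations(), so deleting j is safe.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each resulting cluster corresponds to one pooled sub-phone whose member states would share a single output distribution. A scheme of this kind can only pool states that occur in the training data, which is the limitation the summary attributes to the data-driven approach and which the decision tree method avoids.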
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
|
Impacts |
Description |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.cam.ac.uk |