
Details of Grant 

EPSRC Reference: GR/H53174/01
Principal Investigator: Cooke, Professor M
Other Investigators:
Brown, Professor G; Green, Professor PD
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: University of Sheffield
Scheme: Standard Research (Pre-FEC)
Starts: 01 January 1993
Ends: 30 June 1996
Value (£): 108,828
EPSRC Research Topic Classifications:
Vision & Senses - ICT appl.
EPSRC Industrial Sector Classifications:
Related Grants:
Panel History:  
Summary on Grant Application Form
(i) To develop a computational model of auditory scene analysis which integrates primitive and schema-driven grouping principles.
(ii) To apply this model as enabling science in hard, real-world problems such as automated minute-taking.

Progress:
Initial work focussed on the provision of two 'infrastructure' items associated with the grant. The first, implementation of a blackboard architecture for integrating primitive grouping with schema-based grouping, is complete [1]. The second, the collection of a multi-source, multi-speaker corpus, is virtually finished [2]. The corpus was collected during Q3 1995 in collaboration with ATR, Japan. It is currently being annotated at various levels and is due for release on CDROM to the speech and hearing community during Q2 1995. This corpus is the first of its kind, and will provide a valuable resource to the rapidly expanding computational auditory scene analysis community. It will, for instance, be used by several labs within the EU Network SPHERE, which Sheffield coordinates.

In parallel with these activities, we have developed a fundamentally new approach to the recognition of auditory groups [3]. Noting that such groups generally contain only fragmentary evidence for the underlying acoustic source (because other sources will inevitably dominate some time-frequency regions), we have modified standard recognition architectures to handle such occluded material. Results are remarkably robust: in one digit recognition experiment, for instance, we have shown that it is possible to occlude 95% of the available information without appreciable performance degradation. We have also shown constructively how it is possible to learn to recognise speech in noise, an issue which is usually sidestepped in the developmental literature.
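The idea of recognising from occluded material can be illustrated with a minimal sketch (hypothetical code, not the project's implementation): assuming diagonal-covariance Gaussian state models, spectral dimensions judged to be dominated by other sources are simply marginalised out of the state likelihood, so only the reliable, target-dominated dimensions contribute evidence.

```python
import numpy as np

def occluded_log_likelihood(obs, mask, mean, var):
    """Log-likelihood of a feature vector under a diagonal-covariance
    Gaussian state model, marginalising over occluded dimensions.

    obs  : spectral feature vector (e.g. auditory filterbank energies)
    mask : boolean array, True where a dimension is reliable
           (target-dominated), False where it is occluded by other sources
    mean, var : per-dimension Gaussian parameters of the HMM state
    """
    d = obs[mask] - mean[mask]
    # Occluded dimensions integrate to 1 and drop out of the product;
    # only reliable dimensions contribute to the log-likelihood.
    return -0.5 * np.sum(np.log(2 * np.pi * var[mask]) + d**2 / var[mask])
```

Because the occluded dimensions are integrated out rather than filled with arbitrary values, the score degrades gracefully as the reliable region shrinks, which is consistent with the robustness under heavy occlusion reported above.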
We have further elaborated what we believe to be the first model of auditory perceptual induction (the ability to perceive sounds as continuing through noise), and have shown how this can be integrated into the hidden Markov model framework used for occluded speech recognition. Auditory induction is important because it defines situations in which listeners are allowed to make assumptions about masked sources. These advances point towards a new model of speech perception in which fragmentary evidence is used to access speech schemas, which then confirm or deny themselves via auditory induction. We are currently working on a neural oscillator model of auditory perceptual grouping which will eventually produce, automatically, the fragmentary evidence on which schema integration is based [4].

Note that the new end date is 30.6.96 due to a break in contract.

[1] Crawford, Brown, Cooke & Green, JASA, 93(4), 1993.
[2] Crawford, Brown, Cooke & Green, Proc. Inst. of Acoustics, 1994.
[3] Green, Cooke & Crawford, Proc. ICASSP, 1995.
[4] Brown & Cooke, submitted to IJCAI Workshop on Scene Analysis, 1995.
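Neural oscillator models of grouping are typically built from relaxation oscillators of the Terman-Wang type, in which units representing time-frequency regions of the same source synchronise while units for different sources desynchronise. The sketch below (illustrative code with assumed parameter values, not the project's model) simulates a single such oscillator with a fast excitatory variable x and a slow recovery variable y; with sufficient input I the unit cycles between active and silent phases.

```python
import math

def simulate_oscillator(I=0.5, steps=400000, dt=0.001,
                        eps=0.02, gamma=6.0, beta=0.1):
    """Euler simulation of one Terman-Wang relaxation oscillator.

    dx/dt = 3x - x^3 + 2 - y + I   (fast excitatory variable)
    dy/dt = eps * (gamma * (1 + tanh(x / beta)) - y)   (slow recovery)

    Returns the trajectory of x; with positive input I the unit
    oscillates, repeatedly crossing zero between phases.
    """
    x, y = -2.0, 0.0
    xs = []
    for _ in range(steps):
        dx = 3 * x - x**3 + 2 - y + I
        dy = eps * (gamma * (1 + math.tanh(x / beta)) - y)
        x += dt * dx
        y += dt * dy
        xs.append(x)
    return xs
```

In a full grouping network, many such units would be coupled through excitatory connections (within a candidate group) and a global inhibitor (between groups), so that synchronised oscillation marks the fragmentary evidence belonging to one source.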
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
Description
Date Materialised
Sectors submitted by the Researcher
Project URL:  
Further Information:  
Organisation Website: http://www.shef.ac.uk