Details of Grant

EPSRC Reference:

EP/H049665/1

Title:

Audio and Video Based Speech Separation for Multiple Moving Sources Within a Room Environment

Principal Investigator:

Chambers, Professor J

Other Investigators:

Lambotharan, Professor S

Researcher Co-Investigators:

Project Partners:

Department:

Electronic, Electrical & Systems Enginee

Organisation:

Loughborough University

Scheme:

Standard Research

Starts:

01 October 2010

Ends:

31 December 2013

Value (£):

300,747

EPSRC Research Topic Classifications:

Digital Signal Processing
Image & Vision Computing
Music & Acoustic Technology

EPSRC Industrial Sector Classifications:

Information Technologies

Related Grants:

EP/H050000/1

Panel History:

Panel Date	Panel Name	Outcome
16 Mar 2010	ICT Prioritisation Panel (March 10)	Announced

Summary on Grant Application Form

Human beings have developed a unique ability to communicate within a noisy environment, such as at a cocktail party. This skill depends upon the use of both the aural and visual senses together with sophisticated processing within the brain. Mimicking this ability within a machine is very challenging, particularly if the humans are moving, as in a teleconferencing context in which human speakers are walking around a room. In the field of signal processing, researchers have developed techniques to separate one speech signal from a mixture of such signals, as would be measured by a number of microphones, on the basis of audio information only, with the assumption that the humans are static and that typically no more than two humans are within the room. Such approaches have generally been found to fail, however, when the human speakers are moving and when there are more than two of them. Fundamentally new approaches are therefore necessary to advance the state-of-the-art in the field.
Professor Chambers and his team at Loughborough University were the first in the UK to propose a new approach on the basis of combined audio and video processing to solve the source separation problem, but their preliminary approach identified major challenges in audio-visual speaker localization, tracking and separation which must be solved to provide a practical solution for speech separation for multiple moving sources within a room environment. These findings motivate this new project, in which world-leading teams at the University of Surrey, led by Professor Kittler, and at the GIPSA Lab, Grenoble, France, headed by Professor Jutten, are ready to work with Professor Chambers and his team at Loughborough University to advance the state-of-the-art in the field.
In this new project, two postdoctoral researchers will be employed, one at Loughborough and another at Surrey. The first will focus on the development of fundamentally new speech source separation algorithms for moving speakers by using geometrical room acoustic information (for example, the location and number of sources and descriptions of their movement) provided by the second researcher. The research team at Grenoble will provide technical guidance throughout the project on the basis of their considerable experience in source separation, and will work on providing an acoustic noise model for the room environment which will also aid the speech separation process. To achieve these tasks, frequency domain based beamforming algorithms will be developed which exploit microphone arrays having more microphones than speakers, so that new data-independent superdirective robust beamformer design methods can be exploited using mathematical convex optimization. Additionally, further geometric information will be exploited to introduce robustness to errors in the localization information describing the desired source and the interference. To improve the localization information, an array of collaborative cameras will be used together with both audio and visual information. Advanced methods from particle filtering and probabilistic data association will be exploited for improving the tracking performance. Finally, visual voice activity detection will be used to determine the active sources within the beamforming operations. We emphasize that this work is not implementation-driven, so computational complexity for real-time realization will not be a focus; this would be the subject of a future project.
All of the new algorithms will be evaluated in terms of both objective and subjective performance measures on labelled audio and visual datasets acquired at Loughborough and Surrey, and from the CHIL seminar room at Karlsruhe University (UKA), Germany.
To ensure this pioneering work has maximum impact on the UK and international academic and research communities, all the algorithms and datasets will be made available through the project website.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

Description	This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

http://www.lboro.ac.uk/departments/eese/research/communications/signal-processing/

Further Information:

Organisation Website:

http://www.lboro.ac.uk