
Details of Grant 

EPSRC Reference: EP/P011586/1
Title: SCRIPT: Speech Synthesis for Spoken Content Production
Principal Investigator: King, Professor S
Other Investigators:
Yamagishi, Dr J
Researcher Co-Investigators:
Dr O Watts
Project Partners:
Department: Centre for Speech Technology Research
Organisation: University of Edinburgh
Scheme: Standard Research
Starts: 01 December 2016
Ends: 30 November 2019
Value (£): 533,268
EPSRC Research Topic Classifications:
Human Communication in ICT
EPSRC Industrial Sector Classifications:
Creative Industries
Related Grants:
Panel History:
Panel Date: 20 Oct 2016
Panel Name: EPSRC ICT Prioritisation Panel Oct 2016
Outcome: Announced
Summary on Grant Application Form

The cost of producing dynamically-updated media content - such as online video news packages - across multiple languages is very high. Maintaining substantial teams of journalists per language is expensive and inflexible. Modern media organisations like the BBC or the Financial Times need a more agile approach: they must be able to react quickly to changing world events (e.g., breaking news or emerging markets), dynamically allocating their limited resources in response to external demands. Ideally, they would like to create 'pop-up' services and products in previously-unsupported languages, then to scale them up or down later.

The government has set the BBC a target of reaching a global audience of 500 million people by 2022, compared with today's 308 million. The only way to reach such a huge audience is through new language services and efficient production techniques. Text-to-speech - which automatically produces speech from text - offers an attractive solution to this challenge, and the BBC have identified computer-assisted translation and text-to-speech as key technologies that will provide them with new ways of creating and reversioning their content across many languages.

This project's objectives are to push text-to-speech technology towards "broadcast quality" computer-generated speech (i.e., good enough for the BBC to broadcast) in many languages, and to make it cheap and easy to add more languages later. We will do this by combining and extending several distinct pieces of our previous basic research on text-to-speech. We will use the latest data-driven machine learning techniques, and extend them to produce much higher quality output speech. At the same time, we will enable the possibility of human control over the speech. This will allow the user (e.g., a BBC journalist) to adjust the speech to make sure the quality and the speaking style are right for their purposes (e.g., correcting the pronunciation of a difficult word, or putting emphasis in the right place).
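In current practice, this kind of user control over pronunciation and emphasis is often expressed as markup, for example the W3C's Speech Synthesis Markup Language (SSML). As an illustration only - SSML's `phoneme` and `emphasis` elements are real, but this is not necessarily the interface the project will build - a journalist's corrections might be encoded like this:

```python
# Illustrative sketch only: uses real SSML element names (<speak>, <phoneme>,
# <emphasis>) to show markup-based control of synthetic speech. It does not
# represent the SCRIPT project's actual tooling.
import xml.etree.ElementTree as ET

def make_ssml(before: str, hard_word: str, ipa: str, after: str, stressed: str) -> str:
    """Build an SSML fragment fixing one pronunciation and adding one emphasis."""
    speak = ET.Element("speak")
    speak.text = before + " "
    # <phoneme> overrides the synthesiser's guessed pronunciation for a word
    ph = ET.SubElement(speak, "phoneme", alphabet="ipa", ph=ipa)
    ph.text = hard_word
    ph.tail = " " + after + " "
    # <emphasis> marks the word the user wants stressed
    em = ET.SubElement(speak, "emphasis", level="strong")
    em.text = stressed
    em.tail = "."
    return ET.tostring(speak, encoding="unicode")

markup = make_ssml("The city of", "Edinburgh", "ˈɛdɪnbərə",
                   "announced the result", "today")
print(markup)
```

A synthesiser that accepts SSML would then render the marked-up word with the supplied IPA pronunciation and stress the emphasised word.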

The technology we will create for the likes of the BBC will also enable smaller companies and other organisations, state bodies, charities, and individuals to rapidly create high-quality spoken content, in whatever language or domain they are operating. We will work with other types of organisation during the project, to make sure that the technology we create has broad appeal and will be useful to a wide range of companies and individuals.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Description
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ed.ac.uk