Details of Grant 

EPSRC Reference: EP/J016330/1
Title: DOME: Delaying and Overcoming Microprocessor Errors
Principal Investigator: Lujan, Professor M
Other Investigators:
Furber, Professor S B
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: University of Manchester, The
Scheme: Standard Research
Starts: 27 September 2012
Ends: 30 September 2016
Value (£): 589,625
EPSRC Research Topic Classifications:
Computer Sys. & Architecture
EPSRC Industrial Sector Classifications:
Electronics Information Technologies
Related Grants:
EP/J016284/1
Panel History:
Panel Date | Panel Name | Outcome
01 Feb 2012 | EPSRC ICT Responsive Mode - Feb 2012 | Announced
Summary on Grant Application Form
Modern-day computer systems have benefited from being designed and manufactured using an ever-increasing budget of transistors on very reliable integrated circuits. However, this "free lunch" is over, and forgotten nightmares faced by computer pioneers are coming back to haunt us. Not so long ago, unreliable valves were the basic building blocks for computers, and research focussed on how to compute successfully despite this underlying weakness (e.g. von Neumann, 1956, "Probabilistic logics and the synthesis of reliable organisms from unreliable components").

State-of-the-art integrated circuit technologies have now reached the range of 40-22 nanometers, posing significant reliability challenges. Hard or permanent errors can manifest themselves at any point during a processor's lifetime. During manufacturing, errors can render a proportion of a chip incapable of computing, thus decreasing yield and profit.

As components shrink, transistors take less and less time to wear out, becoming more prone to failure in the field. Traditional reliability solutions involve applying high-cost redundancy to the hardware structures within the processor, providing backup spares for when errors occur. On the application side, solutions also involve redundancy by running multiple copies of each piece of software.

A common criticism of current reliability solutions is that they do not consider how the software and hardware can be co-designed synergistically to tackle this challenge. Redesigning and reimplementing general purpose software applications will incur an unaffordable price tag. Our hypothesis is that virtualization technologies (a layer that transparently hides the underlying platform from the application software) have an important role to play. In particular, managed runtime environments (MREs) have become pervasive for high-productivity software developers and represent a promising vehicle for providing reliability mechanisms. Within these systems, applications can be monitored and morphed without user intervention.

There are two complementary strands to our proposed research, focused around a co-designed MRE and multicore computer architecture. Firstly, we will consider wearout mitigation schemes to slow processor ageing and lengthen a chip's lifetime before a hard fault occurs. Secondly, given that an error will occur at some point during a system's life, we will develop error-tolerance approaches that maintain execution on faulty hardware.

If successful, we believe this project will be seen as a significant milestone in the development of wearout-conscious and error-tolerant multicore architectures over the next decade. This research programme will advance our understanding of the field, tackling the UK Microelectronics Grand Challenge of "Moore for Less" that has been signposted by EPSRC. This proposal also tackles a key aspect of the new EPSRC ICT capability priority on "Many-core architectures and concurrency in distributed and embedded systems".

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.man.ac.uk