EPSRC logo

Details of Grant 

EPSRC Reference: EP/W00576X/1
Title: ParaSol: Fine-Grained Thread-Level Parallelism for Single-Threaded Performance
Principal Investigator: Jones, Professor TM
Other Investigators:
Researcher Co-Investigators:
Project Partners:
ARM Ltd
Department: Computer Science and Technology
Organisation: University of Cambridge
Scheme: Standard Research
Starts: 07 February 2022 Ends: 06 August 2025 Value (£): 1,091,793
EPSRC Research Topic Classifications:
Fundamentals of Computing
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel DatePanel NameOutcome
21 Jun 2021 EPSRC ICT Prioritisation Panel 22-23 June 2021 Announced
Summary on Grant Application Form
Since the turn of the century, multicore processors have become commonplace in almost all computing domains. Instead of performance coming solely from the extraction of instruction-level parallelism (ILP), it now also requires software developers or compilers to break applications into multiple streams of instructions to exploit coarse-grained thread-level parallelism (TLP). Whilst extremely beneficial for a large class of programs, single-threaded performance still matters greatly, especially during sequential parts of an application where execution speed can dominate overall program performance (sometimes dubbed "Amdahl's cruel law"). In addition, improvements in single-threaded performance benefit all applications, as each thread experiences a performance uplift, thus impacting all parts of the code-sequential and parallel.

However, improving single-threaded performance is hard. The move to multicore was driven by the power limitations of complex out-of-order hardware schemes to extract ILP (caused by the failure of Dennard scaling in the underlying transistor technologies). While designers do still increase the out-of-order instruction window, unfortunately this only makes a marginal difference and future designs are expected to be limited by Pollack's rule and the fundamental limits of ILP (the ILP wall). Conversely, although many applications would see a major performance boost from taking advantage of TLP, actually extracting it remains a challenge (John Hennessy said writing parallel code is "a problem that's as hard as any that computer science has faced").

This project takes a radically different approach. Instead of going back to the future with elaborate schemes for out-of-order execution, it explores the space between ILP and the coarse-grained TLP exploited by modern multicores. In particular, it focuses on the extraction of fine-grained TLP from a single stream of instructions within and across cores. On the one hand it will investigate schemes to identify and spin-up independent short-running threads (hardware threadlets) transparently to the application, so as to boost single-threaded performance. On the other, it will research compiler techniques to indicate this parallelism, with the hardware able to exploit it within and across multiple tightly coupled cores. If successful, this project would lead to a step change in performance of high-performance cores, driven by increased utilisation of core resources and the ability to increase those resources in a scalable manner. It would also open up a broader design space, trading out-of-order pipeline complexity for ILP with increased TLP, to find better balances between area, efficiency and application-domain suitability.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.cam.ac.uk