EPSRC Reference: |
EP/W00576X/1 |
Title: |
ParaSol: Fine-Grained Thread-Level Parallelism for Single-Threaded Performance |
Principal Investigator: |
Jones, Professor TM |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer Science and Technology |
Organisation: |
University of Cambridge |
Scheme: |
Standard Research |
Starts: |
07 February 2022 |
Ends: |
06 August 2025 |
Value (£): |
1,091,793 |
|
EPSRC Research Topic Classifications: |
Fundamentals of Computing |
|
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
21 Jun 2021 | EPSRC ICT Prioritisation Panel 22-23 June 2021 | Announced |
|
|
Summary on Grant Application Form |
Since the turn of the century, multicore processors have become commonplace in almost all computing domains. Instead of performance coming solely from the extraction of instruction-level parallelism (ILP), it now also requires software developers or compilers to break applications into multiple streams of instructions to exploit coarse-grained thread-level parallelism (TLP). Whilst TLP is extremely beneficial for a large class of programs, single-threaded performance still matters greatly, especially during the sequential parts of an application, whose execution speed can dominate overall program performance (sometimes dubbed "Amdahl's cruel law"). In addition, improvements in single-threaded performance benefit all applications, as each thread experiences a performance uplift, so every part of the code, sequential and parallel, gains.
However, improving single-threaded performance is hard. The move to multicore was driven by the power limitations of the complex out-of-order hardware used to extract ILP, limitations caused by the failure of Dennard scaling in the underlying transistor technologies. Although designers still enlarge the out-of-order instruction window, this makes only a marginal difference, and future designs are expected to be limited by Pollack's rule (single-core performance grows only roughly with the square root of core complexity) and by the fundamental limits of ILP (the ILP wall). Conversely, although many applications would see a major performance boost from exploiting TLP, actually extracting it remains a challenge (John Hennessy said writing parallel code is "a problem that's as hard as any that computer science has faced").
This project takes a radically different approach. Instead of going back to the future with elaborate schemes for out-of-order execution, it explores the space between ILP and the coarse-grained TLP exploited by modern multicores. In particular, it focuses on extracting fine-grained TLP from a single stream of instructions, within and across cores. On the one hand, it will investigate schemes to identify and spin up independent short-running threads (hardware threadlets) transparently to the application, so as to boost single-threaded performance. On the other, it will research compiler techniques to indicate this parallelism, so that the hardware can exploit it within and across multiple tightly coupled cores. If successful, this project would lead to a step change in the performance of high-performance cores, driven by increased utilisation of core resources and the ability to increase those resources in a scalable manner. It would also open up a broader design space, trading the out-of-order pipeline complexity used to extract ILP for increased TLP, to find better balances between area, efficiency and application-domain suitability.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.cam.ac.uk |