EPSRC Reference: |
EP/T01461X/1 |
Title: |
Algorithmic Support for Massive Scale Distributed Systems |
Principal Investigator: |
Shakhlevich, Dr N |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Sch of Computing |
Organisation: |
University of Leeds |
Scheme: |
Standard Research |
Starts: |
01 July 2020 |
Ends: |
30 April 2024 |
Value (£): |
1,010,659
|
EPSRC Research Topic Classifications: |
Mathematical Aspects of OR |
|
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
Resource scheduling in massive-scale distributed systems is the process of matching demand with supply. Demand is associated with requests for resources to execute workloads, such as jobs, tasks and applications. Typical resources in a distributed computing system include servers within a data centre cluster. A scheduler aims to achieve several goals, for example, to maximise system throughput, to minimise response time, to optimise energy usage, etc. These goals may conflict (e.g. throughput versus latency), and the scheduler needs to make a suitable compromise, depending on the user's needs and objectives.
In a data centre system with hundreds of thousands of distributed servers, its massive scale is characterised by a number of factors that contribute to the system complexity:
- the number of server nodes in the cluster, interconnections between resources and heterogeneity of resources (different types of CPUs, memories, local storages);
- the number of concurrent jobs in the system and their arrival rate;
- heterogeneity of jobs (different requirements of CPU, memory and local storage; different patterns of resource usage, long-running jobs vs short-alive jobs; urgent jobs vs jobs with loose deadlines).
The key requirement for the system is its scalability - the ability of the system to sustain the required throughput level (such as operations per second) while confining the perceptional response latencies to a level similar to a small or medium size system.
In our project, we aim to address the following challenges:
(a) scheduling at scale (to make prompt scheduling decisions at a rapid rate);
(b) resource utilisation at scale (to improve utilisation of resources while maintaining high quality of service);
(c) Quality-of-Service provision at scale (to satisfy requirements of diverse workloads).
Existing scheduling algorithms developed for practical systems are often designed largely based on empirical knowledge, experience, and best effort. Due to the lack of theoretical foundation, performance of those algorithms cannot be always guaranteed. On the other hand, scheduling algorithms proposed by the theoretical community are usually based on oversimplified abstract system models. Theoretically sound algorithms, with guaranteed accuracy and time complexity, are often impractical because system models do not reflect practical complexity of real systems, and even minor adjustments of system models towards real systems make algorithms no longer applicable.
In our project, theoretical and applied experts will consolidate efforts to conduct jointly an interdisciplinary study, overcoming the shortcomings of isolated research. Overall, our project is 1) methodologically driven, attempting to extend the applicability of the most powerful techniques of mathematical optimisation; 2) application driven, where the challenges of massive-scale distributed systems invoke new developments of scheduling methodology; and 3) practice driven, where the research direction is based on hands-on experience of distributed systems specialists.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.leeds.ac.uk |