Details of Grant 

EPSRC Reference: EP/T01461X/1
Title: Algorithmic Support for Massive Scale Distributed Systems
Principal Investigator: Shakhlevich, Dr N
Other Investigators:
Erlebach, Professor TR
Xu, Professor J
Strusevich, Professor V
Researcher Co-Investigators:
Project Partners:
Alibaba Group
Edgetic Ltd
Department: Sch of Computing
Organisation: University of Leeds
Scheme: Standard Research
Starts: 01 July 2020
Ends: 30 April 2024
Value (£): 1,010,659
EPSRC Research Topic Classifications:
Mathematical Aspects of OR
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel Date     Panel Name                                                        Outcome
27 Nov 2019    EPSRC Mathematical Sciences Prioritisation Panel November 2019    Announced
Summary on Grant Application Form
Resource scheduling in massive-scale distributed systems is the process of matching demand with supply. Demand is associated with requests for resources to execute workloads, such as jobs, tasks and applications. Typical resources in a distributed computing system include servers within a data centre cluster. A scheduler aims to achieve several goals, such as maximising system throughput, minimising response time and optimising energy usage. These goals may conflict (e.g. throughput versus latency), and the scheduler needs to strike a suitable compromise, depending on the user's needs and objectives.

In a data centre system with hundreds of thousands of distributed servers, massive scale is characterised by several factors that contribute to system complexity:

- the number of server nodes in the cluster, the interconnections between resources and the heterogeneity of resources (different types of CPU, memory and local storage);

- the number of concurrent jobs in the system and their arrival rate;

- heterogeneity of jobs (different requirements for CPU, memory and local storage; different patterns of resource usage; long-running jobs vs short-lived jobs; urgent jobs vs jobs with loose deadlines).

The key requirement for the system is its scalability - the ability of the system to sustain the required throughput level (such as operations per second) while keeping the perceived response latency at a level similar to that of a small or medium-sized system.

In our project, we aim to address the following challenges:

(a) scheduling at scale (to make prompt scheduling decisions at a rapid rate);

(b) resource utilisation at scale (to improve utilisation of resources while maintaining high quality of service);

(c) Quality-of-Service provision at scale (to satisfy requirements of diverse workloads).

Existing scheduling algorithms developed for practical systems are often designed largely on the basis of empirical knowledge, experience and best effort. Because they lack a theoretical foundation, the performance of those algorithms cannot always be guaranteed. On the other hand, scheduling algorithms proposed by the theoretical community are usually based on oversimplified abstract system models. Theoretically sound algorithms, with guaranteed accuracy and time complexity, are often impractical because their system models do not reflect the complexity of real systems, and even minor adjustments of the models towards real systems make the algorithms no longer applicable.

In our project, theoretical and applied experts will combine efforts to jointly conduct an interdisciplinary study, overcoming the shortcomings of isolated research. Overall, our project is 1) methodologically driven, attempting to extend the applicability of the most powerful techniques of mathematical optimisation; 2) application driven, where the challenges of massive-scale distributed systems motivate new developments in scheduling methodology; and 3) practice driven, where the research direction is based on the hands-on experience of distributed systems specialists.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.leeds.ac.uk