Details of Grant 

EPSRC Reference: EP/P009093/2
Title: Decentralised, Large-scale Resource Management in Modern Data Centres
Principal Investigator: Kalyvianaki, Dr E
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Computer Science and Technology
Organisation: University of Cambridge
Scheme: First Grant - Revised 2009
Starts: 02 October 2017
Ends: 30 April 2019
Value (£): 74,915
EPSRC Research Topic Classifications:
Computer Sys. & Architecture
Networks & Distributed Systems
Software Engineering
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:  
Summary on Grant Application Form
The backbone of modern, world-wide Information Technology (IT) and Cloud infrastructure is a global network of data centres (DCs), each equipped with thousands of server machines. Modern large DCs contain 50,000 to 100,000 server machines and run a diverse set of application workloads. Reports show that about three million DCs containing 12 million server machines run all US online operations. We face a DC environment for application deployment of unprecedented scale with regard to the number of server machines and applications.

The enormous number of servers in modern DCs dramatically affects DCs' capital and operational costs. Capital costs cover all initial spending on DC equipment, including server machines. Operational costs cover the DCs' daily operation, including electricity consumption and the salaries of management personnel. The costs of running a DC are enormous. Reports show that 2015 world-wide spending on DC systems was $170 billion, and this is expected to grow by 3% to $175 billion in 2016.

Given the high DC expenditure, it is of paramount importance that modern DCs operate cost-effectively, i.e. that server machines are fully utilised by running applications and that applications are adequately provisioned to meet their performance goals. However, numerous reports show that machines in DCs average only 10-15% CPU utilisation. The main cause of low utilisation has been the practice of over-provisioning applications with enough resources to match even their peak workload demands, however rare those peaks might be. Because workloads typically vary over time in ways that are not known in advance, this practice has led to dramatic under-utilisation of modern DC resources and consequently to excess DC expenditure. Furthermore, practitioners report that current management frameworks are inadequate for performing scalable operational tasks in large-scale environments such as the Cloud. How to tackle the resource management problem in modern large-scale DCs, increasing overall resource utilisation while satisfying applications' performance demands, therefore remains an open challenge.

We propose a new decentralised resource management approach to tackle the under-utilisation problem of DCs.

We envisage a decentralised scheme in which resource schedulers are distributed across the DC and each scheduler controls the resource allocation of a subset of the DC machines, referred to as a cluster; a cluster contains a few hundred servers. The use of cluster schedulers aims to increase the effective utilisation of machines within a cluster in a timely fashion. Global resource planning across all DC servers is achieved through decentralised coordination of all schedulers. Schedulers communicate to exchange resource utilisation information about their clusters, together with application performance information, in order to converge globally. To increase the overall utilisation, the goal is to balance the load across all clusters while avoiding both hotspots and under-utilisation. The novelty of this work lies in the coordination of the distributed set of cluster schedulers for global resource planning. We aim to use a distributed optimisation and control approach.
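
As a purely illustrative sketch of how cluster schedulers might balance load by exchanging utilisation information without a central coordinator, the Python example below uses pairwise gossip averaging. This is a standard decentralised technique chosen here for illustration only; it is not the method proposed in this grant. The names ClusterScheduler, gossip_round and balance are hypothetical, and the sketch assumes that all clusters are the same size and that load can be moved freely between any two clusters.

```python
import random
from dataclasses import dataclass


@dataclass
class ClusterScheduler:
    """Hypothetical scheduler responsible for one cluster of servers."""
    name: str
    utilisation: float  # fraction of the cluster's CPU in use, 0.0 to 1.0


def gossip_round(schedulers, rng):
    """Pair schedulers at random; each pair equalises its two utilisations.

    Assumes equal-sized clusters, so averaging a pair conserves total load.
    With an odd number of schedulers, one sits the round out.
    """
    order = list(schedulers)
    rng.shuffle(order)
    for a, b in zip(order[::2], order[1::2]):
        mean = (a.utilisation + b.utilisation) / 2
        a.utilisation = mean
        b.utilisation = mean


def balance(schedulers, rounds=50, seed=0):
    """Run several gossip rounds; no scheduler ever sees global state."""
    rng = random.Random(seed)
    for _ in range(rounds):
        gossip_round(schedulers, rng)
    return schedulers


if __name__ == "__main__":
    clusters = [
        ClusterScheduler("cluster-a", 0.90),  # hotspot
        ClusterScheduler("cluster-b", 0.10),  # under-utilised
        ClusterScheduler("cluster-c", 0.50),
        ClusterScheduler("cluster-d", 0.30),
    ]
    balance(clusters)
    for c in clusters:
        print(f"{c.name}: {c.utilisation:.3f}")
```

Each exchange involves only two schedulers, yet repeated rounds drive every cluster towards the data-centre-wide mean utilisation, which removes both hotspots and under-utilised clusters. A realistic scheme would also need to account for application performance goals, migration costs and unequal cluster sizes, none of which this sketch models.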

The potential impact of this work is substantial. We anticipate an impact on the Economy of the DC sector and in the domains of People and Knowledge, as the proposed work will assist the development of IT administrators' skills.

The ultimate beneficiary is Society, and in particular developers and end-users of Cloud and IT applications. The UK currently holds the largest data centre market in Europe. The proposed research has the potential to significantly strengthen the position of the UK in the important DC sector and to improve its international standing.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.cam.ac.uk