Details of Grant 

EPSRC Reference: EP/K032968/1
Title: NaaS: Network-as-a-Service in the Cloud
Principal Investigator: Pietzuch, Professor PR
Other Investigators:
Costa, Dr P
Wolf, Professor A
Researcher Co-Investigators:
Project Partners:
Advanced Micro Devices Inc (AMD)
Citrix Systems
NetApp
Netronome
Department: Computing
Organisation: Imperial College London
Scheme: Standard Research
Starts: 01 October 2013
Ends: 31 March 2017
Value (£): 666,149
EPSRC Research Topic Classifications:
Networks & Distributed Systems
EPSRC Industrial Sector Classifications:
Communications
Information Technologies
Related Grants:
EP/K031724/2
EP/K031724/1
EP/K034723/1
Panel History:
Panel Date: 27 Feb 2013
Panel Name: EPSRC ICT Responsive Mode - Feb 2013
Outcome: Announced
Summary on Grant Application Form
Cloud computing has significantly changed the IT landscape. Today it is possible for small companies or even single individuals to access virtually unlimited resources in large data centres (DCs) for running computationally demanding tasks. This has triggered the rise of "big data" applications, which operate on large amounts of data. These include traditional batch-oriented applications, such as data mining, data indexing, log collection and analysis, and scientific applications, as well as real-time stream processing, web search and advertising.

To support big data applications, parallel processing systems, such as MapReduce, adopt a partition/aggregate model: a large input data set is distributed over many servers, and each server processes a share of the data. Locally generated intermediate results must then be aggregated to obtain the final result.
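As a rough illustration of the partition/aggregate model described above, the following Python sketch counts words over a small input. It is a minimal, single-process sketch: the function names (`partition`, `process_share`, `aggregate`, `word_count`) are hypothetical and not taken from MapReduce or from the project, and the per-server step that would run in parallel on separate machines is simulated with an ordinary loop.

```python
from collections import Counter


def partition(records, num_workers):
    """Split the input into num_workers shares (round-robin assignment)."""
    return [records[i::num_workers] for i in range(num_workers)]


def process_share(share):
    """Per-server step: count the words in the local share only."""
    counts = Counter()
    for line in share:
        counts.update(line.split())
    return counts


def aggregate(partials):
    """Combine the locally generated intermediate results into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, num_workers=3):
    shares = partition(records, num_workers)
    # In a real deployment each share would be processed on a different server;
    # here the servers are simulated sequentially (an assumption of this sketch).
    partials = [process_share(share) for share in shares]
    return aggregate(partials)
```

In a data centre, every element of `partials` would have to cross the network to reach the server that runs `aggregate`, which is where the contention discussed in this summary arises.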

An open challenge of the partition/aggregate model is that it results in high contention for network resources in DCs when a large amount of data traffic is exchanged between servers. Facebook reports that, for 26% of processing tasks, network transfers are responsible for more than 50% of the execution time. This is consistent with other studies, showing that the network is often the bottleneck in big data applications.

Improving the performance of such network-bound applications in DCs has attracted much interest from the research community. A class of solutions focuses on reducing bandwidth usage by employing overlay networks to distribute data and to perform partial aggregation. However, this requires applications to reverse-engineer the physical network topology to optimise the layout of overlay networks. Even with perfect knowledge of the physical topology, there are still fundamental inefficiencies: e.g. any logical topology with a server fan-out higher than one cannot be mapped optimally to the physical network if servers have only a single network interface.

Other proposals increase network bandwidth through more complex topologies or higher-capacity networks. New topologies and network over-provisioning, however, increase the DC operational and capital expenditures (up to 5 times according to some estimates), which directly impacts tenant costs. For example, Amazon AWS recently introduced Cluster Compute instances with full-bisection 10 Gbps bandwidth, at an hourly cost 16 times that of the default instance.

In contrast, we argue that the problem can be solved more effectively by providing DC tenants with efficient, easy and safe control of network operations. Instead of over-provisioning, we focus on optimising network traffic by exploiting application-specific knowledge. We term this approach "network-as-a-service" (NaaS) because it allows tenants to customise the service that they receive from the network.

NaaS-enabled tenants can deploy custom routing protocols, including multicast services or anycast/incast protocols, as well as more sophisticated mechanisms, such as content-based routing and content-centric networking.

By modifying the content of packets on-path, tenants can efficiently implement advanced, application-specific network services, such as in-network data aggregation and smart caching. Parallel processing systems such as MapReduce would greatly benefit because data can be aggregated on-path, thus reducing execution times. Key-value stores (e.g. memcached) can improve their performance by caching popular keys within the network, which decreases latency and bandwidth usage compared to end-host-only deployments.
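To make the benefit of on-path aggregation concrete, the toy model below compares how many key/count records reach the final reducer with and without merging at an intermediate hop. It is only a sketch under stated assumptions: a two-level tree in which each rack has one switch, partial results represented as `Counter` objects, and hypothetical function names. It does not represent the NaaS interface, which this summary does not specify.

```python
from collections import Counter


def merge(partials):
    """Merge several partial results (key -> count) into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def records_without_aggregation(racks):
    """Every server's partial result travels unchanged to the reducer."""
    return sum(len(partial) for rack in racks for partial in rack)


def records_with_aggregation(racks):
    """Each rack switch merges its servers' results before forwarding them.

    Returns the number of records forwarded to the reducer and the final result.
    """
    per_rack = [merge(rack) for rack in racks]
    forwarded = sum(len(merged) for merged in per_rack)
    return forwarded, merge(per_rack)


# Two racks with two servers each (made-up illustrative data).
racks = [
    [Counter(a=1, b=1), Counter(a=2, c=1)],
    [Counter(a=1), Counter(a=4, b=2)],
]
baseline = records_without_aggregation(racks)
forwarded, final = records_with_aggregation(racks)
print(baseline, forwarded, dict(final))
```

In this example the reducer receives 5 records instead of 7, and the final result is the same either way. The saving grows with the overlap between the keys held by servers that share a switch.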

The NaaS model has the potential to revolutionise current cloud computing offerings by increasing the performance of tenants' applications through efficient in-network processing, while reducing development complexity. It aims to combine distributed computation and network communication in a single, coherent abstraction, providing a significant step towards the vision of "the DC is the computer".
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.imperial.ac.uk