In the realm of data explosion, it is usually the case that a single
computational processor is unable to store the vast amount of data needed to
do any meaningful computation. The data are generally distributed among a
large number of processors/servers who need to communicate with each other
via a network in order to perform various computational tasks. The recent
trend in big data is a case in point where the rapid acquisition of a vast
amount of data makes it impossible for a single processing unit to handle.
This problem is generally addressed via different storage architectures for
fast access and efficient software paradigms such as MapReduce, Hadoop, and
Spark.
The general bottleneck of any such system can be abstracted by the following
natural computational scenario: Suppose a computational system, consisting of
several processors, wants to perform a task where the input is distributed
among the processors. Instead of being concerned with the computational time
that is required, we are interested in the communication that the processors
need to do among themselves in order to perform the task. Apart from big
data, this problem, and many of its variants, appear frequently in practice
in many guises and in different levels of abstractions--in network protocols
where the goal is to minimize the communication (and thereby error in the
communication) between two network hubs, in VLSI circuit design where the
goal is to minimize energy used and to pack efficiently by decreasing the
number of wires required, also in data-structures, circuit complexity,
auctions and a plethora of other interesting areas of study.
Many sequential algorithms that were widely used in the past have become
greatly inefficient in practice for such distributed systems. The main goal
of the proposed research is to design (or prove the hardness of) fundamental
network algorithms and their generalizations in such distributed models of
com- putation. Among them, the model of two-party communication and query
protocols highlight different challenges in accessing information for
distributively processing data over such large networks where the complete
input is not explicitly accessible, hence we exclusively focus on them in
this project.
Our goal is to study basic graph-algorithmic problems in these models to
thoroughly understand how to overcome different communication bottlenecks. We
will study them in classical setting (deterministic and
randomized/stochastic) as well as in the quantum setting as quantum computing
is undoubtedly the model of computation of the future. Because of our
reliance on efficient network algorithms in modern day-to-day life, we
believe that such research will have a large eventual impact on other areas
of computer science and engineering and, at large, society-this is an
important ingredient of the UK government's RD roadmap of supporting
long-range, fundamental, underpinning science and research. The graph or
network problems we plan to study fall into two broad categories:
connectivity-related problems and flow-related problems. These two classes of
problems have been extremely well studied for over half a century and are
arguably the two fundamental classes of network problems with countless
applications in other areas of research (e.g., operations research,
scheduling, image segmentation, network clustering) and in modern society.
Moreover, they have seen surprising progress in recent times. The novelty of
our approach towards these well-studied graph problems is the following: We
expect, from previous experience, that the insights gained from studying the
communication and query complexity of these problems will advance our
understanding in diverse research areas such as distributed, sequential and
dynamic algorithm design. This project can be viewed as the first step toward
systematically studying such universal and cross-paradigm algorithm design
techniques.
|