
Details of Grant 

EPSRC Reference: EP/K033972/1
Title: A multicriterion approach for cluster validation
Principal Investigator: Hennig, Dr C
Other Investigators:
Researcher Co-Investigators:
Project Partners:
adam&eveDDB, eCommera, Select Statistical Services, University of Hamburg, University of Leuven, University of Valladolid
Department: Statistical Science
Organisation: UCL
Scheme: Standard Research
Starts: 01 June 2013
Ends: 31 May 2017
Value (£): 98,024
EPSRC Research Topic Classifications:
Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 13 Mar 2013
Panel Name: Mathematics Prioritisation Panel Meeting March 2013
Outcome: Announced
Summary on Grant Application Form
Cluster analysis is about finding groups in data. It has applications in areas such as biology, medicine, marketing, computer science, psychology, archaeology, and sociology.

The aim of the proposed project is to address cluster validation, which is a fundamental problem in cluster analysis. Cluster validation refers to both the evaluation of the quality of a clustering and the determination of the number of clusters.

The main idea is to develop a systematic catalogue of cluster validity indexes and to explore their properties, so that a user can meet the requirements of a given application of cluster analysis with an appropriate set or aggregation of criteria. This is original because most existing literature on cluster validation advocates "one criterion fits all" approaches that ignore the specific aims of clustering.

Given such a catalogue, the number of clusters for a given application can be determined either by specifying a set of minimum requirements or by aggregating criteria with weights that depend on the clustering aim. The quality of these approaches will be investigated.
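
The two selection strategies described above can be illustrated with a minimal Python sketch. It is not the project's method: the index names (separation, homogeneity, stability), the numerical values, the standardisation step, and the function names are all hypothetical and chosen only for illustration. It assumes every index has already been computed for each candidate number of clusters and that larger values are better.

```python
from statistics import mean, pstdev

# Hypothetical validity index values for each candidate number of clusters k.
# Illustrative numbers only; larger is assumed to be better for every index.
index_values = {
    2: {"separation": 0.80, "homogeneity": 0.40, "stability": 0.90},
    3: {"separation": 0.70, "homogeneity": 0.75, "stability": 0.85},
    4: {"separation": 0.50, "homogeneity": 0.80, "stability": 0.60},
    5: {"separation": 0.30, "homogeneity": 0.85, "stability": 0.40},
}


def standardise(values_by_k):
    """Rescale each index to mean 0 and standard deviation 1 across the candidates.

    This is one simple (assumed) way to make indexes comparable before weighting.
    """
    names = list(next(iter(values_by_k.values())))
    scaled = {k: {} for k in values_by_k}
    for name in names:
        column = [values_by_k[k][name] for k in values_by_k]
        centre, spread = mean(column), pstdev(column)
        for k in values_by_k:
            deviation = values_by_k[k][name] - centre
            scaled[k][name] = deviation / spread if spread else 0.0
    return scaled


def choose_by_weights(values_by_k, weights):
    """Return the k with the largest weighted sum of standardised indexes."""
    scaled = standardise(values_by_k)
    scores = {
        k: sum(weight * scaled[k][name] for name, weight in weights.items())
        for k in scaled
    }
    return max(scores, key=scores.get), scores


def choose_by_requirements(values_by_k, minimums):
    """Return every k whose index values meet all minimum requirements."""
    return [
        k
        for k, values in sorted(values_by_k.items())
        if all(values[name] >= bound for name, bound in minimums.items())
    ]


# Equal weights: all three aims matter equally.
balanced_k, _ = choose_by_weights(
    index_values, {"separation": 1.0, "homogeneity": 1.0, "stability": 1.0}
)

# Weights for an application where well-separated clusters matter most.
separated_k, _ = choose_by_weights(
    index_values, {"separation": 3.0, "homogeneity": 0.5, "stability": 0.5}
)

# Minimum requirements instead of weights.
admissible = choose_by_requirements(
    index_values, {"homogeneity": 0.7, "stability": 0.7}
)

print(balanced_k, separated_k, admissible)
```

With these illustrative values the equal weights select three clusters, while the separation-heavy weights select two, which shows how the chosen number of clusters can change with the clustering aim.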

The methods will be generalised to clusterings where some data ("outliers") are not assigned to any cluster.

To benchmark cluster analysis methods, these criteria will be used to explain how different clustering methods perform on benchmark data sets in terms of the characteristics of the known true clusterings of those data sets.

The approaches developed for determining the number of clusters will be used to decide how many biological species are present in data sets with genetic information.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: