EPSRC Reference: EP/K033972/1
Title: A multicriterion approach for cluster validation
Principal Investigator: Hennig, Dr C
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Statistical Science
Organisation: UCL
Scheme: Standard Research
Starts: 01 June 2013
Ends: 31 May 2017
Value (£): 98,024
|
EPSRC Research Topic Classifications: Statistics & Appl. Probability
EPSRC Industrial Sector Classifications: No relevance to Underpinning Sectors
|
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
Cluster analysis is about finding groups in data. It has applications in areas such as biology, medicine, marketing, computer science, psychology, archaeology, and sociology.
The aim of the proposed project is to address cluster validation, which is a fundamental problem in cluster analysis. Cluster validation refers to both the evaluation of the quality of a clustering and the determination of the number of clusters.
The main idea is to develop a systematic catalogue of cluster validity indexes and to explore their properties, so that a user can match the requirements of a given application of cluster analysis with an appropriate set or aggregation of criteria. This is original because most existing literature on cluster validation advertises "one criterion fits all" approaches that ignore the specific aims of clustering.
Given such a catalogue, the number of clusters in a given application can be determined either by specifying a set of minimum requirements or by aggregating criteria with weights that depend on the clustering aim. The quality of these approaches will be investigated.
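The two selection strategies described above (weighted aggregation and minimum requirements) can be illustrated with a minimal Python sketch. It is not the project's method: the index names (`homogeneity`, `separation`, `stability`), the function names, and all numbers are hypothetical, and each index is assumed to have been rescaled to [0, 1] with larger values meaning better.

```python
from typing import Dict, List

# Maps a candidate number of clusters k to its validity index values.
Scores = Dict[int, Dict[str, float]]


def aggregate(scores: Scores, weights: Dict[str, float]) -> Dict[int, float]:
    """Weighted mean of the chosen indexes for every candidate k."""
    total = sum(weights.values())
    return {
        k: sum(w * values[name] for name, w in weights.items()) / total
        for k, values in scores.items()
    }


def best_k_by_weights(scores: Scores, weights: Dict[str, float]) -> int:
    """Candidate k with the largest weighted aggregate."""
    agg = aggregate(scores, weights)
    return max(agg, key=agg.get)


def ks_meeting_requirements(scores: Scores, minimums: Dict[str, float]) -> List[int]:
    """All candidate k whose indexes reach every stated minimum."""
    return sorted(
        k
        for k, values in scores.items()
        if all(values[name] >= m for name, m in minimums.items())
    )


# Made-up toy values for illustration only, not results from any data set.
toy_scores: Scores = {
    2: {"homogeneity": 0.50, "separation": 0.90, "stability": 0.80},
    3: {"homogeneity": 0.70, "separation": 0.70, "stability": 0.75},
    4: {"homogeneity": 0.95, "separation": 0.40, "stability": 0.60},
}

# Two hypothetical clustering aims expressed as different weightings.
separation_aim = {"homogeneity": 1.0, "separation": 3.0, "stability": 1.0}
homogeneity_aim = {"homogeneity": 3.0, "separation": 1.0, "stability": 1.0}

k_sep = best_k_by_weights(toy_scores, separation_aim)
k_hom = best_k_by_weights(toy_scores, homogeneity_aim)
k_ok = ks_meeting_requirements(toy_scores, {"homogeneity": 0.6, "separation": 0.5})
```

With these toy values, the separation-oriented weighting selects 2 clusters and the homogeneity-oriented weighting selects 4, while only 3 clusters meets both minimum requirements. The same candidate clusterings can thus lead to different choices depending on the stated aim.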
The methods will be generalised to clusterings where some data ("outliers") are not assigned to any cluster.
To benchmark cluster analysis methods, the criteria will be used to explain how different clustering methods perform on benchmark data sets in terms of the characteristics of the known true clusterings of those data sets.
The approaches developed for determining the number of clusters will be used to decide how many biological species are present in data sets with genetic information.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
|