Evaluation of clustering quality in a multi-criteria context

Category: CODE-SMG
Swimlane: 2021-2022

Clustering is a common task in multivariate data analysis, pattern recognition and machine learning. It consists of grouping a set of instances so that the instances in the same group (called a cluster) are more similar to each other than to those in other groups. Numerous internal cluster validity measures have been defined to assess the quality of clustering solutions, in particular to guide the choice of the number of clusters.
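As an illustration of what an internal validity measure computes, here is a minimal sketch of the mean silhouette width, one well-known measure of this kind. It is written in plain Python for readability; the function name, the toy points and the two labelings are assumptions made for this example and are not part of the proposal.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def silhouette(points, labels):
    """Mean silhouette width of a clustering (range [-1, 1], higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Usual convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (illustrative only): two visibly separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]  # follows the visible groups
poor = [0, 1, 0, 1, 0, 1]  # mixes the two groups

print(silhouette(points, good) > silhouette(points, poor))
```

To guide the choice of the number of clusters, such a score would typically be computed for the partitions obtained with several candidate numbers of clusters, and the best-scoring partition retained.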

The purpose of this work is to extend such quality indicators, established for classical clustering, to the multi-criteria context. In that context, the data are described by features that express performance on different criteria, each with a preference for either small values (e.g., cost) or large values (e.g., accuracy). Multi-criteria clustering methods usually generate clusters with preference relations between them.
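To make the notions of preference direction and preference relation between clusters concrete, the sketch below orients every criterion so that larger values are better, then compares two cluster centroids by Pareto dominance. Dominance between centroids is a deliberate simplification chosen for brevity, not the method targeted by this work; the data, the labels and all function names are hypothetical.

```python
def orient(row, directions):
    """Flip 'min' criteria so that larger is better on every criterion."""
    return tuple(-v if d == "min" else v for v, d in zip(row, directions))


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


# Hypothetical alternatives evaluated on (cost, accuracy).
directions = ("min", "max")  # cost is minimised, accuracy is maximised
data = [(100, 0.90), (110, 0.88), (300, 0.60), (320, 0.55)]
labels = ["A", "A", "B", "B"]

# One oriented centroid per cluster, used here as the cluster's profile.
profiles = {}
for lab in sorted(set(labels)):
    members = [orient(r, directions) for r, l in zip(data, labels) if l == lab]
    profiles[lab] = centroid(members)

print(dominates(profiles["A"], profiles["B"]))
```

In this toy case, cluster A is cheaper and more accurate on average than cluster B, so A is preferred to B. When neither centroid dominates the other, the two clusters are incomparable under this simple relation.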

The work will include an experimental phase to study the behaviour of the quality indicators adapted for multi-criteria clustering on artificial and real data.

Prerequisites:

Programming (Python, Matlab or R); basic knowledge of multivariate or multi-criteria data analysis.

Contact/promoters:
Profs. Christine Decaestecker (cdecaes@ulb.ac.be) and Yves De Smet (Yves.De.Smet@ulb.ac.be)