Internal quality criteria

Internal quality criteria (the Condorcet criterion, the C criterion, the category utility metric, and cut measures) generally measure the compactness of clusters using a similarity measure. They typically assess intra-cluster homogeneity, inter-cluster separability, or a combination of the two, and they use no external information beyond the data itself. Four such criteria are described here.

Condorcet criterion

One suitable approach is to apply Condorcet's solution to the ranking problem. In this case, the criterion is calculated as follows:

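\[
\sum_{C_i \in C} \; \sum_{\substack{x_j, x_k \in C_i \\ x_j \neq x_k}} s(x_j, x_k)
\; + \;
\sum_{C_i \in C} \; \sum_{\substack{x_j \in C_i \\ x_k \notin C_i}} d(x_j, x_k)
\]

where C_i ranges over the clusters of the clustering C, and s(x_j, x_k) and d(x_j, x_k) measure the similarity and the distance of the vectors x_j and x_k.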
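As a rough illustration, the criterion can be computed from precomputed similarity and distance matrices. This is a minimal sketch, assuming the criterion adds the similarities of all ordered pairs inside the same cluster to the distances of all ordered pairs that straddle two clusters. The function name `condorcet_criterion` and the choice d = 1 - s in the usage lines are assumptions made for the example, not part of the original text.

```python
import numpy as np

def condorcet_criterion(S, D, labels):
    """Within-cluster similarity plus between-cluster distance.

    S, D   : (n, n) similarity and distance matrices (hypothetical inputs).
    labels : length-n sequence of cluster assignments.
    Ordered pairs are summed, so each unordered pair contributes twice.
    """
    S = np.asarray(S, dtype=float)
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]     # True where j and k share a cluster
    off_diag = ~np.eye(len(labels), dtype=bool)   # drop the x_j == x_k pairs
    within = S[same & off_diag].sum()
    between = D[~same].sum()
    return within + between

# Toy data: points 0 and 1 are similar, point 2 is far from both.
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
D = 1.0 - S  # assumed distance, for illustration only

good = condorcet_criterion(S, D, [0, 0, 1])  # groups the two similar points
bad = condorcet_criterion(S, D, [0, 1, 1])   # splits them
print(good, bad)
```

A higher value indicates a partition that keeps similar vectors together and dissimilar vectors apart, so the first labelling scores higher than the second.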

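C criterion

The C criterion is an extension of Condorcet's criterion and is defined as follows, where γ is a threshold value:

\[
\sum_{C_i \in C} \; \sum_{\substack{x_j, x_k \in C_i \\ x_j \neq x_k}} \bigl( s(x_j, x_k) - \gamma \bigr)
\; + \;
\sum_{C_i \in C} \; \sum_{\substack{x_j \in C_i \\ x_k \notin C_i}} \bigl( \gamma - s(x_j, x_k) \bigr)
\]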

Category utility metric

Category utility is defined as the increase in the expected number of feature values that can be correctly predicted given a certain clustering. This metric is useful for problems that contain a relatively small number of nominal features, each with a small cardinality.
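The text does not give a formula, so the following sketch assumes the commonly used form of category utility: for each cluster, the sum of squared conditional probabilities of the feature values, minus the same sum computed over the whole dataset, weighted by the cluster's share of the data and averaged over the number of clusters. The function names and the toy records are hypothetical.

```python
from collections import Counter

def _sum_sq_probs(rows):
    """Sum, over every feature and value, of P(feature = value)^2 within rows."""
    n = len(rows)
    total = 0.0
    for column in zip(*rows):                     # iterate feature by feature
        total += sum((count / n) ** 2 for count in Counter(column).values())
    return total

def category_utility(rows, labels):
    """Average gain in predictable feature values from knowing the cluster."""
    n = len(rows)
    baseline = _sum_sq_probs(rows)                # predictability without clusters
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    gain = sum(len(members) / n * (_sum_sq_probs(members) - baseline)
               for members in clusters.values())
    return gain / len(clusters)

# Two nominal features with two values each.
rows = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]

cu_aligned = category_utility(rows, [0, 0, 1, 1])  # clusters match the features
cu_mixed = category_utility(rows, [0, 1, 0, 1])    # clusters ignore the features
print(cu_aligned, cu_mixed)
```

When the clusters line up with the feature values, knowing the cluster makes every value predictable and the score is positive. When the clusters cut across the features, the score drops to zero.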

Cut measures

In some cases, it is useful to represent the clustering problem as a minimum cut problem. In such cases, quality is measured as the ratio of the edge weights that remain after the cut to the total edge weights before the cut. If there is no restriction on the size of the clusters, the optimal value is easy to find. The min-cut measure is therefore revised to penalize unbalanced structures.
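A short sketch shows why the unrestricted ratio is easy to optimize. It assumes an undirected weighted graph given as a symmetric matrix, where an edge "remains" if both endpoints fall in the same cluster. The function name `remaining_weight_ratio` and the toy graph are hypothetical, and the balance penalty itself is not implemented because the text does not specify one.

```python
import numpy as np

def remaining_weight_ratio(W, labels):
    """Share of total edge weight that is not cut by the clustering.

    W      : (n, n) symmetric edge-weight matrix (hypothetical input).
    labels : length-n sequence of cluster assignments.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    upper = np.triu(np.ones_like(W, dtype=bool), k=1)  # count each undirected edge once
    same = labels[:, None] == labels[None, :]          # edges kept inside a cluster
    total = W[upper].sum()
    remaining = W[upper & same].sum()
    return remaining / total

# Two tight pairs (0-1 and 2-3) joined by one weak edge (1-2).
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 3.0
W[2, 3] = W[3, 2] = 3.0
W[1, 2] = W[2, 1] = 1.0

balanced = remaining_weight_ratio(W, [0, 0, 1, 1])  # cuts only the weak edge
trivial = remaining_weight_ratio(W, [0, 0, 0, 0])   # cuts nothing at all
print(balanced, trivial)
```

Putting every node in a single cluster cuts no edges and scores a perfect 1.0, which is why the raw measure needs a penalty for unbalanced partitions before it becomes informative.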
