Internal quality criteria

Internal quality metrics usually measure the compactness of the clusters using some similarity measure. They typically capture intra-cluster homogeneity, inter-cluster separability, or a combination of the two. They do not use any external information besides the data itself.
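As a rough illustration of these two ingredients, the sketch below computes one homogeneity score (the sum of squared distances from each point to its cluster centroid) and one separability score (the smallest squared distance between two centroids). The choice of these two scores and the function names are assumptions made for the example, not definitions taken from the text; the sketch also assumes numeric points and at least two clusters.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compactness_and_separation(points, labels):
    """Return (within-cluster SSE, min squared centroid distance).

    Hypothetical helper: assumes at least two distinct cluster labels.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters to measure separation")

    centroids = {lab: centroid(members) for lab, members in clusters.items()}

    # Homogeneity: lower SSE means tighter clusters.
    sse = sum(
        sq_dist(p, centroids[lab])
        for lab, members in clusters.items()
        for p in members
    )

    # Separability: larger minimum centroid distance means clusters lie further apart.
    labs = list(centroids)
    separation = min(
        sq_dist(centroids[a], centroids[b])
        for i, a in enumerate(labs)
        for b in labs[i + 1:]
    )
    return sse, separation


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
sse, separation = compactness_and_separation(points, [0, 0, 1, 1])
print(sse, separation)
```

Both numbers are computed from the data and the cluster assignment alone, which is what makes them internal criteria.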

Condorcet’s criterion

Another appropriate approach is to apply Condorcet’s solution to the ranking problem. In this case the criterion is calculated as follows:

where s(x_j, x_k) and d(x_j, x_k) measure, respectively, the similarity and the distance of the vectors x_j and x_k.
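For reference, one form of the criterion that is consistent with this description is sketched below in LaTeX. It assumes that C denotes the set of clusters and C_i a single cluster (symbols not defined in the text above), and it should be read as a reconstruction rather than the authoritative definition. Similarities are summed over pairs that share a cluster, and distances over pairs that are split across clusters.

```latex
\[
\sum_{C_i \in C} \; \sum_{\substack{x_j, x_k \in C_i \\ x_j \neq x_k}} s(x_j, x_k)
\;+\;
\sum_{C_i \in C} \; \sum_{\substack{x_j \in C_i \\ x_k \notin C_i}} d(x_j, x_k)
\]
```

Under this reading, a clustering scores well when similar vectors are grouped together and distant vectors are kept apart.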

The C-criterion

The C-criterion is an extension of Condorcet’s criterion and is defined as (where γ is a threshold value):
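A form of the C-criterion consistent with this description is sketched below in LaTeX. As before, C (the set of clusters) and C_i (a single cluster) are assumed symbols, and the expression is a reconstruction that mirrors the structure of Condorcet's criterion, with the similarity s compared against the threshold γ in both terms.

```latex
\[
\sum_{C_i \in C} \; \sum_{\substack{x_j, x_k \in C_i \\ x_j \neq x_k}} \bigl( s(x_j, x_k) - \gamma \bigr)
\;+\;
\sum_{C_i \in C} \; \sum_{\substack{x_j \in C_i \\ x_k \notin C_i}} \bigl( \gamma - s(x_j, x_k) \bigr)
\]
```

Read this way, a pair placed in the same cluster contributes positively only if its similarity exceeds γ, and a pair placed in different clusters contributes positively only if its similarity falls below γ.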

Category utility metric

The category utility is defined as the increase in the expected number of feature values that can be correctly predicted given a certain clustering. This metric is useful for problems that contain a relatively small number of nominal features, each with small cardinality.
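The definition above can be made concrete with a small sketch. It assumes the commonly used formulation in which, for each cluster, the sum of squared conditional probabilities of the feature values is compared with the same sum computed over the whole data set, weighted by cluster size and divided by the number of clusters. The function name and the toy data are hypothetical.

```python
from collections import Counter


def category_utility(rows, labels):
    """Category utility of a clustering over nominal features.

    rows   -- list of equal-length tuples of nominal values
    labels -- cluster id for each row
    Assumed formulation:
      CU = (1/K) * sum_k P(C_k) * (sum_ij P(A_i=v_ij | C_k)^2 - sum_ij P(A_i=v_ij)^2)
    """
    n = len(rows)
    n_features = len(rows[0])

    def sum_sq_probs(subset):
        # Expected number of feature values guessed correctly within `subset`
        # when guessing each value with its observed relative frequency.
        total = 0.0
        for f in range(n_features):
            counts = Counter(row[f] for row in subset)
            total += sum((c / len(subset)) ** 2 for c in counts.values())
        return total

    baseline = sum_sq_probs(rows)  # no clustering information

    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)

    gain = sum(
        (len(members) / n) * (sum_sq_probs(members) - baseline)
        for members in clusters.values()
    )
    return gain / len(clusters)


rows = [("red", "round"), ("red", "round"), ("blue", "square"), ("blue", "square")]
print(category_utility(rows, [0, 0, 1, 1]))  # clusters match the feature values
print(category_utility(rows, [0, 0, 0, 0]))  # a single cluster adds no information
```

With few nominal features of small cardinality the frequency counts stay reliable, which is why the metric suits that setting.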

Edge cut metrics

In some cases it is useful to represent the clustering problem as an edge cut minimization problem. In such instances the quality is measured as the ratio of the remaining edge weights to the total pre-cut edge weights. If there is no restriction on the size of the clusters, finding the optimal value is easy, but the resulting cut tends to split off very small clusters. The min-cut measure is therefore revised to penalize imbalanced structures.
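The sketch below illustrates both ideas on a tiny weighted graph. It computes the ratio of remaining to total edge weight as described above and, as one example of a balance-aware revision, the ratio cut (each cluster's cut weight divided by its size, summed over clusters). The ratio cut is an assumption chosen for illustration; the text does not say which revised measure is intended, and the function name is hypothetical.

```python
from collections import Counter


def edge_cut_stats(edges, labels):
    """Edge-cut quality of a clustering.

    edges  -- iterable of (u, v, weight) for an undirected graph
    labels -- dict mapping each node to its cluster id
    """
    total = 0.0
    cut = 0.0
    cut_per_cluster = {c: 0.0 for c in set(labels.values())}

    for u, v, w in edges:
        total += w
        if labels[u] != labels[v]:
            # The edge crosses a cluster boundary and is removed by the cut.
            cut += w
            cut_per_cluster[labels[u]] += w
            cut_per_cluster[labels[v]] += w

    sizes = Counter(labels.values())
    # Ratio cut: small clusters inflate the score, penalizing imbalance.
    ratio_cut = sum(cut_per_cluster[c] / sizes[c] for c in sizes)

    return {
        "remaining_ratio": (total - cut) / total,
        "cut_weight": cut,
        "ratio_cut": ratio_cut,
    }


# A path a - b - c - d with unit weights.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
balanced = edge_cut_stats(edges, {"a": 0, "b": 0, "c": 1, "d": 1})
lopsided = edge_cut_stats(edges, {"a": 0, "b": 1, "c": 1, "d": 1})
print(balanced)
print(lopsided)
```

Both partitions remove a single unit-weight edge, so the plain remaining-weight ratio cannot tell them apart, while the ratio cut is larger for the partition that isolates a single node.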