Scatter criteria

Internal quality metrics usually measure the compactness of the clusters using some similarity measure. They typically measure intra-cluster homogeneity, inter-cluster separability, or a combination of the two, and they use no external information besides the data itself.

The scalar scatter criteria are derived from the scatter matrices, which reflect the within-cluster scatter, the between-cluster scatter and their sum, the total scatter matrix. For the k-th cluster, the scatter matrix may be calculated as:
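
In the usual notation, assuming C_k denotes the set of instances x assigned to the k-th cluster and μ_k the mean vector of that cluster (symbols chosen here for illustration), a standard form of this definition is:

```latex
\[
S_k = \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^{T}
\]
```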

The within-cluster scatter matrix S_W is calculated as the sum of the last definition over all clusters:
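
Writing S_k for the scatter matrix of the k-th cluster and c for the number of clusters (assumed symbols, with c chosen to match the m-c that appears under the determinant criterion), this sum can be written as:

```latex
\[
S_W = \sum_{k=1}^{c} S_k
\]
```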

The between-cluster scatter matrix may be calculated as:
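
A standard form, assuming N_k is the number of instances in the k-th cluster, μ_k the mean vector of that cluster and c the number of clusters (symbols assumed here), is:

```latex
\[
S_B = \sum_{k=1}^{c} N_k \, (\mu_k - \mu)(\mu_k - \mu)^{T}
\]
```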

where μ is the total mean vector and is defined as:
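
Assuming m is the total number of instances, N_k the size of the k-th cluster and μ_k its mean vector (symbols assumed here), the total mean is the size-weighted average of the cluster means:

```latex
\[
\mu = \frac{1}{m} \sum_{k=1}^{c} N_k \, \mu_k
\]
```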

The total scatter matrix should be calculated as:
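
With C_k denoting the instances of the k-th cluster and μ the total mean vector (notation assumed here), a standard form is the scatter of all instances around μ, which equals the sum of the within-cluster and between-cluster matrices:

```latex
\[
S_T = \sum_{k=1}^{c} \sum_{x \in C_k} (x - \mu)(x - \mu)^{T} = S_W + S_B
\]
```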

Three scalar criteria may be derived from S_W, S_B and S_T.
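
To make the three matrices concrete, the following is a minimal sketch in plain Python that computes S_W, S_B and S_T for a fixed partition of a toy two-dimensional data set. The data, the partition and every function name are illustrative assumptions and do not come from the text; the formulas used are the standard ones (scatter around the cluster mean, size-weighted scatter of the cluster means around the total mean, and scatter of all instances around the total mean).

```python
# Sketch: scatter matrices for a toy 2-D data set with a fixed partition.
# The data, the partition and all function names are illustrative assumptions.

def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]

def add(a, b):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of matrix a by the scalar s."""
    return [[x * s for x in row] for row in a]

def zeros(d):
    """d x d matrix of zeros."""
    return [[0.0] * d for _ in range(d)]

def scatter(points, center):
    """Sum of (x - center)(x - center)^T over the given points."""
    s = zeros(len(center))
    for p in points:
        diff = [pj - cj for pj, cj in zip(p, center)]
        s = add(s, outer(diff, diff))
    return s

def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def scatter_matrices(clusters):
    """Return (S_W, S_B, S_T) for a list of clusters (each a list of points)."""
    all_points = [p for cluster in clusters for p in cluster]
    d = len(all_points[0])
    mu = mean(all_points)                      # total mean vector
    s_w, s_b = zeros(d), zeros(d)
    for cluster in clusters:
        mu_k = mean(cluster)                   # cluster mean vector
        s_w = add(s_w, scatter(cluster, mu_k)) # accumulate per-cluster scatter
        diff = [a - b for a, b in zip(mu_k, mu)]
        s_b = add(s_b, scale(outer(diff, diff), len(cluster)))
    s_t = scatter(all_points, mu)              # scatter around the total mean
    return s_w, s_b, s_t

clusters = [
    [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]],
    [[6.0, 6.0], [8.0, 6.0], [7.0, 9.0]],
]
s_w, s_b, s_t = scatter_matrices(clusters)
print("tr(S_W) =", trace(s_w))
print("tr(S_B) =", trace(s_b))
print("tr(S_T) =", trace(s_t))
```

For this toy partition the traces are 16, 108 and 124 respectively, so they add up as the relation between the total scatter and its two components requires. S_T is computed directly from the data rather than as a sum, which makes the check meaningful.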

The trace criterion

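The trace is the sum of the diagonal elements of a matrix. The trace of S_W equals the sum of squared distances of the instances from their cluster means, so minimizing it amounts to minimizing the sum of squared errors (SSE), which makes it a natural criterion. This criterion, representing the within-cluster scatter, is calculated as: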
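
Assuming C_k denotes the instances of the k-th cluster, μ_k their mean vector and c the number of clusters, and using J_e as an assumed name for the criterion, a standard form is:

```latex
\[
J_e = \operatorname{tr}[S_W] = \sum_{k=1}^{c} \sum_{x \in C_k} \lVert x - \mu_k \rVert^{2}
\]
```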

Another criterion, which may be maximized, is the between-cluster criterion:
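
With N_k the size of the k-th cluster, μ_k its mean vector and μ the total mean vector (symbols assumed here), the trace of the between-cluster scatter matrix can be written as:

```latex
\[
\operatorname{tr}[S_B] = \sum_{k=1}^{c} N_k \, \lVert \mu_k - \mu \rVert^{2}
\]
```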

The determinant criterion

The determinant of a scatter matrix roughly measures the square of the scattering volume. Since S_B is singular whenever the number of clusters is less than or equal to the dimensionality, its determinant is not an appropriate criterion. S_W can also be singular, and certainly is when m-c is less than the dimensionality (m being the number of instances and c the number of clusters). If we assume that S_W is nonsingular, the determinant criterion function using this matrix may be employed:
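
Writing S_k for the scatter matrix of the k-th cluster and c for the number of clusters, and using J_d as an assumed name for the criterion, a standard form is:

```latex
\[
J_d = \lvert S_W \rvert = \left\lvert \sum_{k=1}^{c} S_k \right\rvert
\]
```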

The invariant criterion

The eigenvalues λ_1, λ_2, ..., λ_d of S_W^{-1} S_B are the basic linear invariants of the scatter matrices. Good partitions are ones for which the nonzero eigenvalues are large. As a result, several criteria based on these eigenvalues may be derived. Three such criteria are:
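
Assuming the λ_i are the eigenvalues of S_W^{-1} S_B and that S_W and S_T are nonsingular, a commonly cited set of three (the name J_f is an assumption made here) is:

```latex
\[
\operatorname{tr}[S_W^{-1} S_B] = \sum_{i=1}^{d} \lambda_i
\]
\[
J_f = \operatorname{tr}[S_T^{-1} S_W] = \sum_{i=1}^{d} \frac{1}{1 + \lambda_i}
\]
\[
\frac{\lvert S_W \rvert}{\lvert S_T \rvert} = \prod_{i=1}^{d} \frac{1}{1 + \lambda_i}
\]
```

The first is to be maximized, while the second and third shrink as the eigenvalues grow and are therefore to be minimized.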