Internal quality metrics usually measure the compactness of the clusters using some similarity measure. They typically measure the intra-cluster homogeneity, the inter-cluster separability, or a combination of the two. They do not use any external information besides the data itself.
The scalar scatter criteria are derived from the scatter matrices, reflecting the within-cluster scatter, the between-cluster scatter and their summation — the total scatter matrix. For the k-th cluster, the scatter matrix may be calculated as:
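In the standard formulation, writing C_k for the set of instances assigned to the k-th cluster and μ_k for its mean vector (both symbols are assumed here), the per-cluster scatter matrix is:

```latex
S_k = \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^{T}
```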
The within-cluster scatter matrix S_W is calculated as the sum of the per-cluster scatter matrices over all clusters:
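Writing S_k for the scatter matrix of the k-th cluster and c for the number of clusters (assumed notation), this sum is:

```latex
S_W = \sum_{k=1}^{c} S_k
```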
The between-cluster scatter matrix may be calculated as:
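A standard form, assuming N_k denotes the number of instances in the k-th cluster, μ_k its mean vector, and c the number of clusters, is:

```latex
S_B = \sum_{k=1}^{c} N_k \, (\mu_k - \mu)(\mu_k - \mu)^{T}
```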
where μ is the total mean vector and is defined as:
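With m the total number of instances and the same assumed symbols N_k, μ_k and c, the total mean is the size-weighted average of the cluster means:

```latex
\mu = \frac{1}{m} \sum_{k=1}^{c} N_k \, \mu_k
```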
The total scatter matrix should be calculated as:
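In the standard formulation, the sum runs over all m instances regardless of cluster membership (the symbols m and μ are assumed here), and the result decomposes into the within-cluster and between-cluster parts:

```latex
S_T = \sum_{i=1}^{m} (x_i - \mu)(x_i - \mu)^{T} = S_W + S_B
```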
Three scalar criteria may be derived from S_W, S_B and S_T.
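The three matrices can be computed directly from a data matrix and a vector of cluster labels. The following is a minimal NumPy sketch, not a reference implementation; the function name `scatter_matrices` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_W, S_B, S_T) for data X of shape (m, d) and cluster labels of length m."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)                      # total mean vector
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in np.unique(labels):
        X_k = X[labels == k]                 # instances in cluster k
        mu_k = X_k.mean(axis=0)              # cluster mean
        centered = X_k - mu_k
        S_W += centered.T @ centered         # add the per-cluster scatter matrix
        diff = (mu_k - mu)[:, None]          # column vector (d, 1)
        S_B += X_k.shape[0] * (diff @ diff.T)
    centered_all = X - mu
    S_T = centered_all.T @ centered_all      # total scatter matrix
    return S_W, S_B, S_T

# Toy data (hypothetical): two clusters of two points each in the plane.
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
labels = np.array([0, 0, 1, 1])
S_W, S_B, S_T = scatter_matrices(X, labels)

# The total scatter equals the sum of the within- and between-cluster scatter.
print(np.allclose(S_T, S_W + S_B))
```

On this toy data the clusters differ only along the first axis, so S_B is nonzero only in its first diagonal entry, while S_W captures the spread along the second axis.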
The trace criterion
The trace is the sum of the diagonal elements of a matrix. Minimizing the trace of S_W is equivalent to minimizing the SSE when the SSE is measured as squared Euclidean distance to the cluster means, and it is therefore an acceptable criterion. This criterion, representing the within-cluster scatter, is calculated as:
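Assuming C_k is the set of instances in the k-th cluster, μ_k its mean vector, and c the number of clusters (the name J_e is also an assumption), the criterion in its standard form is:

```latex
J_e = \operatorname{tr}(S_W) = \sum_{k=1}^{c} \sum_{x \in C_k} \lVert x - \mu_k \rVert^{2}
```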
Another criterion, which may be maximized, is the between-cluster criterion:
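In the standard form, with N_k the number of instances in the k-th cluster, μ_k its mean vector, and μ the total mean (assumed symbols), this is the trace of S_B:

```latex
\operatorname{tr}(S_B) = \sum_{k=1}^{c} N_k \, \lVert \mu_k - \mu \rVert^{2}
```

Because tr(S_T) = tr(S_W) + tr(S_B) and S_T does not depend on the partition, maximizing tr(S_B) is equivalent to minimizing tr(S_W).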
The determinant criterion
The determinant of a scatter matrix roughly measures the square of the scattering volume. Since S_B is singular whenever the number of clusters is less than or equal to the dimensionality, its determinant is not an appropriate criterion. S_W may also be singular, and certainly is if m - c is less than the dimensionality, where m is the number of instances and c the number of clusters. If we assume that S_W is nonsingular, the determinant criterion function using this matrix may be employed:
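In its standard form the criterion is J_d = |S_W|, the determinant of the summed per-cluster scatter matrices (the name J_d is assumed here); smaller values indicate tighter clusters. The following NumPy sketch compares two partitions of the same points. The helper names and the toy data are hypothetical.

```python
import numpy as np

def within_scatter(X, labels):
    """Within-cluster scatter matrix S_W for data X (m, d) and cluster labels (m,)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    for k in np.unique(labels):
        X_k = X[labels == k]
        centered = X_k - X_k.mean(axis=0)
        S_W += centered.T @ centered
    return S_W

def determinant_criterion(X, labels):
    """J_d = |S_W|; meaningful only when S_W is nonsingular."""
    return float(np.linalg.det(within_scatter(X, labels)))

# Toy data (hypothetical): two well-separated groups of three points.
X = np.array([[0, 0], [2, 0], [1, 1], [10, 10], [12, 10], [11, 12]], dtype=float)
good = np.array([0, 0, 0, 1, 1, 1])   # respects the two groups
bad = np.array([0, 1, 0, 1, 0, 1])    # mixes the two groups

j_good = determinant_criterion(X, good)
j_bad = determinant_criterion(X, bad)
print(j_good < j_bad)  # the partition that respects the groups scores lower
```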
The invariant criterion
The eigenvalues λ_1, λ_2, ..., λ_d of S_W^{-1} S_B are the basic linear invariants of the scatter matrices. Good partitions are ones for which the nonzero eigenvalues are large. As a result, several criteria based on the eigenvalues may be derived. Three such criteria are:
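The three criteria commonly given in the standard treatment are reproduced below as a reconstruction; the name J_f is an assumption. The second and third follow from S_T = S_W + S_B, which gives S_T^{-1} S_W = (I + S_W^{-1} S_B)^{-1}, a matrix with eigenvalues 1 / (1 + λ_i).

```latex
\operatorname{tr}\!\left(S_W^{-1} S_B\right) = \sum_{i=1}^{d} \lambda_i

J_f = \operatorname{tr}\!\left(S_T^{-1} S_W\right) = \sum_{i=1}^{d} \frac{1}{1 + \lambda_i}

\frac{\lvert S_W \rvert}{\lvert S_T \rvert} = \prod_{i=1}^{d} \frac{1}{1 + \lambda_i}
```

The first criterion is maximized by a good partition, whereas the second and third are minimized, since large eigenvalues make each term 1 / (1 + λ_i) small.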