Internal quality metrics typically measure the compactness of clusters using a similarity measure (for example, the scatter criteria: trace, determinant, and invariant). They usually measure intra-cluster homogeneity, inter-cluster separability, or a combination of the two, and they use no external information beyond the data itself.
The scalar scatter criteria are derived from the scatter matrices, which reflect the intra-cluster scatter, the inter-cluster scatter, and their sum, the total scatter matrix. For the k-th cluster C_k with mean vector μ_k, the scatter matrix can be calculated as follows:
$$S_k = \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^T$$
The intra-cluster scatter matrix S_W is calculated as the sum of the previous definition over all K clusters:
$$S_W = \sum_{k=1}^{K} S_k$$
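The inter-cluster scatter matrix S_B can be calculated as follows:
$$S_B = \sum_{k=1}^{K} N_k (\mu_k - \mu)(\mu_k - \mu)^T$$
where N_k is the number of points in the k-th cluster, μ_k is the mean vector of that cluster, and μ is the total mean vector, defined over all m points as:
$$\mu = \frac{1}{m} \sum_{k=1}^{K} N_k \mu_k$$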
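The total scatter matrix, taken over all points regardless of cluster, is calculated as follows:
$$S_T = \sum_{x \in C_1 \cup \dots \cup C_K} (x - \mu)(x - \mu)^T = S_W + S_B$$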
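The definitions above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `scatter_matrices`, the toy data, and the use of the standard within-cluster, between-cluster, and total scatter definitions are assumptions made for the example.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_W, S_B, S_T) for data X of shape (m, d) and one cluster label per row.

    Hypothetical helper: assumes the standard scatter-matrix definitions.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    mu = X.mean(axis=0)                      # total mean vector
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in np.unique(labels):
        C_k = X[labels == k]                 # points assigned to cluster k
        mu_k = C_k.mean(axis=0)              # mean vector of cluster k
        D = C_k - mu_k
        S_W += D.T @ D                       # adds the cluster scatter matrix S_k
        S_B += len(C_k) * np.outer(mu_k - mu, mu_k - mu)
    D_total = X - mu
    S_T = D_total.T @ D_total                # scatter of all points around the total mean
    return S_W, S_B, S_T

# Toy data (illustrative values only): two well-separated 2-D clusters
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

S_W, S_B, S_T = scatter_matrices(X, labels)
print("S_T equals S_W + S_B:", np.allclose(S_T, S_W + S_B))
```

The final check confirms numerically that the total scatter matrix is the sum of the intra-cluster and inter-cluster matrices, as stated in the text.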
Three scalar criteria can be derived from S_W, S_B and S_T.
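The trace is the sum of the diagonal elements of a matrix. Minimizing the trace of S_W is similar to minimizing the sum of squared errors (SSE) and is therefore commonly used. This criterion, which represents the intra-cluster scatter, is calculated as follows:
$$J_e = \mathrm{tr}[S_W] = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2$$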
Another criterion, which is to be maximized rather than minimized, is the inter-cluster criterion:
$$\mathrm{tr}[S_B] = \sum_{k=1}^{K} N_k \lVert \mu_k - \mu \rVert^2$$
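The two trace criteria can be computed directly from squared distances, without forming the matrices. The sketch below assumes the standard definitions (squared distances to the cluster mean for the intra-cluster trace, size-weighted squared distances between cluster means and the total mean for the inter-cluster trace); the variable names and toy data are illustrative only.

```python
import numpy as np

# Toy data (illustrative values only): two well-separated 2-D clusters
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

mu = X.mean(axis=0)                          # total mean vector
tr_S_W = 0.0                                 # intra-cluster trace criterion (to minimize)
tr_S_B = 0.0                                 # inter-cluster trace criterion (to maximize)
for k in np.unique(labels):
    C_k = X[labels == k]
    mu_k = C_k.mean(axis=0)
    tr_S_W += np.sum((C_k - mu_k) ** 2)      # squared distances to the cluster mean
    tr_S_B += len(C_k) * np.sum((mu_k - mu) ** 2)

tr_S_T = np.sum((X - mu) ** 2)               # trace of the total scatter matrix
print("tr(S_W) =", tr_S_W)
print("tr(S_B) =", tr_S_B)
print("traces add up:", np.isclose(tr_S_T, tr_S_W + tr_S_B))
```

Because the total trace is fixed for a given data set, lowering tr(S_W) by changing the partition raises tr(S_B) by the same amount.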
The determinant of a scatter matrix measures approximately the square of the scattering volume. Since S_B is singular if the number of clusters is less than or equal to the dimensionality, or if m − K is less than the dimensionality (for m data points and K clusters), its determinant is not an appropriate criterion. If we assume that S_W is nonsingular, the determinant criterion function is:
$$J_d = |S_W| = \left| \sum_{k=1}^{K} S_k \right|$$
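A small numerical check can make the singularity argument concrete. In the sketch below, which assumes the standard scatter-matrix definitions and uses illustrative toy data, there are two clusters in two dimensions, so the inter-cluster matrix is rank-deficient while the intra-cluster matrix has a usable determinant.

```python
import numpy as np

# Toy data (illustrative values only): K = 2 clusters in d = 2 dimensions
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

d = X.shape[1]
mu = X.mean(axis=0)
S_W = np.zeros((d, d))
S_B = np.zeros((d, d))
for k in np.unique(labels):
    C_k = X[labels == k]
    mu_k = C_k.mean(axis=0)
    S_W += (C_k - mu_k).T @ (C_k - mu_k)
    S_B += len(C_k) * np.outer(mu_k - mu, mu_k - mu)

J_d = np.linalg.det(S_W)                     # determinant criterion on S_W
rank_S_B = np.linalg.matrix_rank(S_B)        # fewer than d means S_B is singular
print("J_d = |S_W| =", J_d)
print("rank(S_B) =", rank_S_B, "out of", d)
```

Here the number of clusters equals the dimensionality, so S_B has rank lower than d and its determinant carries no information about the partition.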
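The eigenvalues λ_1, λ_2, ..., λ_d of S_W^{-1} S_B are the basic linear invariants of the scatter matrices. Good partitions are those for which the nonzero eigenvalues are large. As a result, several criteria based on the eigenvalues can be derived. Three of these criteria are:
$$\mathrm{tr}[S_W^{-1} S_B] = \sum_{i=1}^{d} \lambda_i$$
$$J_f = \mathrm{tr}[S_T^{-1} S_W] = \sum_{i=1}^{d} \frac{1}{1 + \lambda_i}$$
$$\frac{|S_W|}{|S_T|} = \prod_{i=1}^{d} \frac{1}{1 + \lambda_i}$$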
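The eigenvalue-based criteria can be sketched as below, assuming the standard scatter-matrix definitions and a nonsingular S_W. The names `J_1`, `J_f`, and `J_3` and the toy data are illustrative choices, and the eigenvalues are obtained by solving S_W A = S_B rather than inverting S_W explicitly.

```python
import numpy as np

# Toy data (illustrative values only): two well-separated 2-D clusters
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

d = X.shape[1]
mu = X.mean(axis=0)
S_W = np.zeros((d, d))
S_B = np.zeros((d, d))
for k in np.unique(labels):
    C_k = X[labels == k]
    mu_k = C_k.mean(axis=0)
    S_W += (C_k - mu_k).T @ (C_k - mu_k)
    S_B += len(C_k) * np.outer(mu_k - mu, mu_k - mu)
S_T = S_W + S_B

# Eigenvalues of S_W^{-1} S_B; they are real in theory, so drop any
# numerically tiny imaginary parts.
A = np.linalg.solve(S_W, S_B)                # A = S_W^{-1} S_B (S_W assumed nonsingular)
lam = np.linalg.eigvals(A).real

J_1 = lam.sum()                              # equals tr(S_W^{-1} S_B)
J_f = np.sum(1.0 / (1.0 + lam))              # equals tr(S_T^{-1} S_W)
J_3 = np.prod(1.0 / (1.0 + lam))             # equals |S_W| / |S_T|

print("sum of eigenvalues:", J_1)
print("J_f:", J_f)
print("|S_W| / |S_T|:", J_3)
```

Each quantity can be cross-checked against its matrix form, for example by comparing `J_3` with `np.linalg.det(S_W) / np.linalg.det(S_T)`.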