Dispersion criteria

Internal quality metrics typically measure the compactness of clusters using a similarity measure; the dispersion (scatter) criteria described here, namely trace, determinant, and invariance, are examples. Such metrics usually measure intra-cluster homogeneity, inter-cluster separability, or a combination of the two. They use no external information beyond the data itself.

The scalar dispersion criteria are derived from the dispersion (scatter) matrices, which capture the intra-cluster dispersion, the inter-cluster dispersion, and their sum, the total dispersion matrix. For the k-th cluster C_k with mean vector μ_k, the dispersion matrix can be calculated as follows:

$$ S_k = \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^T $$

The intra-cluster dispersion matrix is calculated by summing the per-cluster matrix S_k of the last definition over all K clusters:

$$ S_W = \sum_{k=1}^{K} S_k $$

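The dispersion matrix between clusters can be calculated as follows:

$$ S_B = \sum_{k=1}^{K} N_k (\mu_k - \mu)(\mu_k - \mu)^T $$

where μ_k is the mean vector of cluster C_k, N_k is the number of points in C_k, and μ is the total mean vector, defined as:

$$ \mu = \frac{1}{m} \sum_{k=1}^{K} N_k \mu_k $$

with m the total number of data points.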

The total dispersion matrix, which equals the sum S_W + S_B, can be calculated as follows:

$$ S_T = \sum_{x \in C_1 \cup \dots \cup C_K} (x - \mu)(x - \mu)^T $$
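
The three dispersion matrices described above can be computed in a few lines. The following is a minimal NumPy sketch, not a reference implementation: the function name `scatter_matrices`, the array layout (one row per data point), and the toy data are assumptions made for illustration.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_W, S_B, S_T) for an (m, d) data array and m cluster labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)                     # total mean vector
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in np.unique(labels):
        X_k = X[labels == k]                # points assigned to cluster k
        mu_k = X_k.mean(axis=0)             # mean vector of cluster k
        centered = X_k - mu_k
        S_W += centered.T @ centered        # adds S_k, the dispersion of cluster k
        diff = (mu_k - mu).reshape(-1, 1)
        S_B += len(X_k) * (diff @ diff.T)   # N_k (mu_k - mu)(mu_k - mu)^T
    S_T = (X - mu).T @ (X - mu)             # dispersion around the total mean
    return S_W, S_B, S_T

# Toy data (illustrative values only): two clusters of four points each.
X = np.array([[0, 0], [2, 0], [0, 2], [2, 2],
              [6, 0], [8, 0], [6, 2], [8, 2]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
S_W, S_B, S_T = scatter_matrices(X, labels)
print(np.allclose(S_W + S_B, S_T))          # checks the identity S_T = S_W + S_B
```

Checking that S_W + S_B reproduces S_T is a cheap sanity test for any implementation of these matrices.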

Three scalar criteria can be derived from S_W, S_B and S_T.

The trace

The trace is the sum of the diagonal elements of a matrix. Minimizing the trace of S_W is equivalent to minimizing the sum of squared errors (SSE), which is why this criterion is so commonly used. Representing the intra-cluster dispersion, it is calculated as follows:

$$ J_e = \mathrm{tr}[S_W] = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2 $$

Another criterion, which is to be maximized, is the trace of the dispersion matrix between clusters:

$$ \mathrm{tr}[S_B] = \sum_{k=1}^{K} N_k \lVert \mu_k - \mu \rVert^2 $$
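
Both trace criteria reduce to sums of squared distances, so they can be evaluated without building the matrices at all. The sketch below illustrates this under assumed names (`trace_criteria`, one row per data point) and toy data; it is an illustration, not a prescribed implementation.

```python
import numpy as np

def trace_criteria(X, labels):
    """Return (tr[S_W], tr[S_B]) computed directly from squared distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)                               # total mean vector
    tr_w = 0.0
    tr_b = 0.0
    for k in np.unique(labels):
        X_k = X[labels == k]
        mu_k = X_k.mean(axis=0)
        tr_w += np.sum((X_k - mu_k) ** 2)             # SSE contribution of cluster k
        tr_b += len(X_k) * np.sum((mu_k - mu) ** 2)   # N_k * ||mu_k - mu||^2
    return tr_w, tr_b

# Toy data (illustrative values only): two clusters of four points each.
X = np.array([[0, 0], [2, 0], [0, 2], [2, 2],
              [6, 0], [8, 0], [6, 2], [8, 2]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
tr_w, tr_b = trace_criteria(X, labels)
print(tr_w, tr_b)
```

A partition with a small first value and a large second value is compact and well separated; their sum is the total squared deviation from the overall mean and is the same for every partition of a given data set.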

The determinant

The determinant of a dispersion matrix roughly measures the square of the scattering volume. Since S_B is singular whenever the number of clusters is less than or equal to the dimensionality, its determinant is not an appropriate criterion. S_W can also be singular, and certainly is when m - K is less than the dimensionality, where m is the number of data points and K the number of clusters. If we assume that S_W is not singular, the determinant criterion function, which is to be minimized, is:

$$ J_d = |S_W| = \left| \sum_{k=1}^{K} S_k \right| $$
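
A small sketch of the determinant criterion follows. The function name `determinant_criterion` and the choice to raise an error when S_W is rank-deficient are assumptions made for illustration; other ways of handling the singular case are possible.

```python
import numpy as np

def determinant_criterion(X, labels):
    """Return |S_W|; raise ValueError when S_W is singular."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    for k in np.unique(labels):
        X_k = X[labels == k]
        centered = X_k - X_k.mean(axis=0)
        S_W += centered.T @ centered        # accumulate the intra-cluster dispersion
    # A rank-deficient S_W has determinant zero and carries no information.
    if np.linalg.matrix_rank(S_W) < d:
        raise ValueError("S_W is singular; the determinant criterion is uninformative")
    return np.linalg.det(S_W)

# Toy data (illustrative values only): two clusters of four points each.
X = np.array([[0, 0], [2, 0], [0, 2], [2, 2],
              [6, 0], [8, 0], [6, 2], [8, 2]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(determinant_criterion(X, labels))
```

When comparing partitions of the same data, smaller values indicate tighter clusters.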

Invariance

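The eigenvalues λ_1, λ_2, ..., λ_d of S_W^{-1} S_B are the basic linear invariants of the dispersion matrices. Good partitions are those for which the non-zero eigenvalues are large. As a result, several criteria that are functions of these eigenvalues can be derived. Three of these criteria are given below; the first is to be maximized and the other two minimized:

$$ \mathrm{tr}[S_W^{-1} S_B] = \sum_{i=1}^{d} \lambda_i $$

$$ J_f = \mathrm{tr}[S_T^{-1} S_W] = \sum_{i=1}^{d} \frac{1}{1 + \lambda_i} $$

$$ \frac{|S_W|}{|S_T|} = \prod_{i=1}^{d} \frac{1}{1 + \lambda_i} $$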
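
The invariance criteria can be sketched from the eigenvalues of S_W^{-1} S_B as follows. The function name `invariant_criteria` and the example matrices are assumptions chosen for illustration, and S_W is assumed to be non-singular.

```python
import numpy as np

def invariant_criteria(S_W, S_B):
    """Return (sum of lambda_i, sum of 1/(1+lambda_i), product of 1/(1+lambda_i))."""
    # Eigenvalues of S_W^{-1} S_B; solve() avoids forming the inverse explicitly.
    # They are real in exact arithmetic, so .real only drops numerical noise.
    lam = np.linalg.eigvals(np.linalg.solve(S_W, S_B)).real
    return lam.sum(), np.sum(1.0 / (1.0 + lam)), np.prod(1.0 / (1.0 + lam))

# Illustrative matrices (assumed values): S_W = 8 I and S_B = diag(72, 0).
S_W = 8.0 * np.eye(2)
S_B = np.array([[72.0, 0.0], [0.0, 0.0]])
S_T = S_W + S_B

trace_sum, j_f, det_ratio = invariant_criteria(S_W, S_B)

# Cross-check the eigenvalue forms against the matrix forms.
print(np.isclose(j_f, np.trace(np.linalg.solve(S_T, S_W))))
print(np.isclose(det_ratio, np.linalg.det(S_W) / np.linalg.det(S_T)))
```

For these matrices the eigenvalues are 9 and 0, so the three criteria come out at roughly 9.0, 1.1, and 0.1.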