Sum of the quadratic error

Sum of the quadratic error

Internal quality metrics typically measure the compactness of clusters using a measure of similarity (such as Sum of Squared Error). It typically measures intra-cluster homogeneity, inter-cluster separability, or a combination of these two. It does not use external information alongside the data itself.

Sum of squared error is the simplest and most widely used criterion measure for clustering. It is calculated as:

Sum of squared error

where C_k is the set of instances of cluster k; μ_k is the vector mean of cluster k. The components of μ_k are calculated as:

Sum of squared error

where N_k = | C_k | is the number of instances belonging to cluster k.

The methods of partitioning which minimize the SSE criterion are often called minimum variance partitions, because by simple algebraic manipulation the SSE criterion can be written:

Sum of squared error

The SSE criterion function is suitable for cases where the clusters form compact clouds well separated from each other.

Additional minimum criteria for SSE can be produced by replacing the value of S_k with expressions such as:

Sum of squared error
To share
en_GBEN
%d bloggers like this: