Sum of the quadratic error
Internal quality metrics typically measure the compactness of clusters using a measure of similarity (such as Sum of Squared Error). It typically measures intra-cluster homogeneity, inter-cluster separability, or a combination of these two. It does not use external information alongside the data itself.
Sum of squared error is the simplest and most widely used criterion measure for clustering. It is calculated as:
where C_k is the set of instances of cluster k; μ_k is the vector mean of cluster k. The components of μ_k are calculated as:
where N_k = | C_k | is the number of instances belonging to cluster k.
The methods of partitioning which minimize the SSE criterion are often called minimum variance partitions, because by simple algebraic manipulation the SSE criterion can be written:
The SSE criterion function is suitable for cases where the clusters form compact clouds well separated from each other.
Additional minimum criteria for SSE can be produced by replacing the value of S_k with expressions such as: