Similarity function
An alternative to the concept of distance is the similarity function s(x_i, x_j), which compares two vectors x_i and x_j. Examples include the cosine measure, the Pearson correlation, the extended Jaccard measure, and the Dice coefficient. Such a function should be symmetric (i.e. s(x_i, x_j) = s(x_j, x_i)), take a large value when x_i and x_j are in some sense "similar", and attain its largest value for identical vectors.
A similarity function whose range is [0,1] is called a dichotomous similarity function. In fact, the methods for calculating "distances" between binary and nominal attributes can be viewed as similarity functions rather than distances.
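The two requirements stated above (symmetry, and the largest value for identical vectors) can be spot-checked numerically on sample data. The following is only a minimal sketch: the names `simple_matching` and `looks_like_similarity` are hypothetical, and the simple matching score (fraction of positions on which two vectors agree) is used purely as an illustrative similarity for binary attributes with values in [0,1]. Passing the check on a sample does not prove the properties in general.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def simple_matching(x: Vector, y: Vector) -> float:
    """Illustrative similarity: fraction of positions where x and y agree."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def looks_like_similarity(
    s: Callable[[Vector, Vector], float],
    samples: Sequence[Vector],
    tol: float = 1e-12,
) -> bool:
    """Spot-check symmetry and 'identical vectors score highest' on samples."""
    for x in samples:
        for y in samples:
            # Symmetry: s(x, y) must equal s(y, x).
            if abs(s(x, y) - s(y, x)) > tol:
                return False
            # No pair may score higher than a vector compared with itself.
            if s(x, y) > s(x, x) + tol:
                return False
    return True


samples = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
print(looks_like_similarity(simple_matching, samples))
```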
Cosine measure
When the angle between the two vectors is a meaningful indicator of their similarity, the normalized inner product can be an appropriate measure of similarity:
s(x_i, x_j) = (x_i^T x_j) / (‖x_i‖_2 · ‖x_j‖_2)
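As a minimal sketch, the normalized inner product can be computed as the dot product of the two vectors divided by the product of their Euclidean norms. The function name `cosine_similarity` is illustrative, and raising an error for a zero vector is a design choice made here, since the angle is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between x and y (normalized inner product)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumption of this sketch: reject zero vectors instead of returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Parallel vectors give a value close to 1; orthogonal vectors give 0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 0], [0, 1]))
```

Because only the direction of each vector matters, rescaling either input by a positive constant leaves the result unchanged.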
Pearson correlation measure
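The normalized Pearson correlation is defined as follows, where x̄ denotes the mean of the components of x over all dimensions:
s(x_i, x_j) = 1/2 · ( (x_i − x̄_i)^T (x_j − x̄_j) / (‖x_i − x̄_i‖_2 · ‖x_j − x̄_j‖_2) + 1 )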
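A minimal sketch of this measure follows. It centers each vector by subtracting the mean of its own components, computes the usual correlation coefficient r in [-1, 1], and then maps it to [0, 1] as (r + 1) / 2. That final rescaling is an assumption about what "normalized" means here; drop it if the plain correlation is wanted. The function name `pearson_similarity` is illustrative.

```python
import math
from typing import Sequence


def pearson_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of x and y, rescaled from [-1, 1] to [0, 1]."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Center each vector on the mean of its own components.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    r = dot / norm
    # Assumed normalization: shift and scale r into [0, 1].
    return 0.5 * (r + 1.0)


# Perfectly correlated vectors score near 1, anti-correlated ones near 0.
print(pearson_similarity([1, 2, 3], [2, 4, 6]))
print(pearson_similarity([1, 2, 3], [3, 2, 1]))
```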
Extended Jaccard measure
The extended Jaccard measure was introduced by Strehl and Ghosh in 2000 and is defined as:
s(x_i, x_j) = (x_i^T x_j) / (‖x_i‖_2^2 + ‖x_j‖_2^2 − x_i^T x_j)
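A minimal sketch of the extended Jaccard measure, taken here to be the inner product divided by the sum of the squared Euclidean norms minus the inner product. The function name `extended_jaccard` is illustrative, and rejecting a pair of zero vectors is a choice made for this sketch. For binary vectors the expression reduces to the ordinary Jaccard coefficient (size of the intersection over size of the union).

```python
from typing import Sequence


def extended_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Extended Jaccard similarity: x.y / (|x|^2 + |y|^2 - x.y)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    denom = sq_x + sq_y - dot
    if denom == 0:
        # Only happens when both vectors are zero; undefined in this sketch.
        raise ValueError("extended Jaccard is undefined for two zero vectors")
    return dot / denom


# Binary example: the vectors share 1 active position out of 3 in their union.
print(extended_jaccard([1, 0, 1], [1, 1, 0]))
```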
Dice coefficient measure
The Dice coefficient measure is similar to the extended Jaccard measure and is defined as follows:
s(x_i, x_j) = (2 · x_i^T x_j) / (‖x_i‖_2^2 + ‖x_j‖_2^2)
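A minimal sketch of the Dice coefficient for real-valued vectors, taken here to be twice the inner product divided by the sum of the squared Euclidean norms. The function name `dice_similarity` is illustrative, and rejecting a pair of zero vectors is a choice made for this sketch.

```python
from typing import Sequence


def dice_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Dice coefficient: 2 * x.y / (|x|^2 + |y|^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y)
    if denom == 0:
        # Only happens when both vectors are zero; undefined in this sketch.
        raise ValueError("Dice coefficient is undefined for two zero vectors")
    return 2 * dot / denom


# Same binary pair as one shared position out of two active positions each.
print(dice_similarity([1, 0, 1], [1, 1, 0]))
```

Compared with the extended Jaccard form, the inner product is not subtracted from the denominator, so partially overlapping vectors score somewhat higher under Dice, while identical vectors still score 1.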