Similarity function

An alternative to the concept of distance is the similarity function s(x_i, x_j), which compares two vectors x_i and x_j. Examples include the cosine measure, the Pearson correlation, the extended Jaccard measure, and the Dice coefficient. A similarity function should be symmetric, i.e. s(x_i, x_j) = s(x_j, x_i), should take a large value when x_i and x_j are "similar", and should attain its largest value for identical vectors.

A similarity function whose target range is [0,1] is called a dichotomous similarity function. In fact, the methods for calculating "distances" for binary and nominal attributes can be viewed as similarity functions rather than distances.

Cosine measure

When the angle between two vectors is a meaningful measure of their similarity, the normalized inner product can be an appropriate similarity measure:

$$ s(x_i, x_j) = \frac{x_i^T x_j}{\|x_i\| \, \|x_j\|} $$
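
As an illustration, the cosine measure (the inner product of the two vectors divided by the product of their Euclidean norms) can be sketched in plain Python. The function name cosine_similarity and the choice to raise an error for zero vectors are assumptions made for this example, not part of the original text.

```python
import math

def cosine_similarity(x, y):
    """Normalized inner product: cosine of the angle between x and y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumption for this sketch: the measure is undefined for a zero vector.
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

# Parallel vectors give a value close to 1 (up to floating-point rounding).
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Orthogonal vectors give 0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because only the angle matters, rescaling either vector by a positive constant leaves the result unchanged.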

Pearson correlation measure

The normalized Pearson correlation is defined as follows, where x̄_i denotes the mean feature value of x_i over all dimensions:

$$ s(x_i, x_j) = \frac{(x_i - \bar{x}_i)^T (x_j - \bar{x}_j)}{\|x_i - \bar{x}_i\| \, \|x_j - \bar{x}_j\|} $$
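
The normalized Pearson correlation amounts to the cosine measure applied after subtracting each vector's own mean from its components. A minimal sketch in plain Python follows. The function name pearson_similarity and the error raised for constant vectors are assumptions of this example.

```python
import math

def pearson_similarity(x, y):
    """Normalized Pearson correlation of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Center each vector on its own mean over all dimensions.
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    # Assumption for this sketch: a constant vector has no defined correlation.
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return dot / (norm_x * norm_y)

# Perfectly linearly related vectors: value close to 1.
print(pearson_similarity([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
# Reversed trend: value close to -1.
print(pearson_similarity([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Unlike the cosine measure, this value is unaffected by adding a constant to every component of a vector, and it ranges over [-1, 1].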

Extended Jaccard measure

The extended Jaccard measure was introduced by Strehl and Ghosh in 2000 and is defined as:

$$ s(x_i, x_j) = \frac{x_i^T x_j}{\|x_i\|^2 + \|x_j\|^2 - x_i^T x_j} $$
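
A short sketch of the extended Jaccard measure in plain Python, taking it as the inner product divided by the sum of the squared norms minus the inner product. The function name extended_jaccard and the handling of two zero vectors are assumptions of this example.

```python
def extended_jaccard(x, y):
    """Extended Jaccard measure: x.y / (|x|^2 + |y|^2 - x.y)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    denom = sq_x + sq_y - dot
    # The denominator is zero only when both vectors are zero.
    if denom == 0:
        raise ValueError("measure is undefined when both vectors are zero")
    return dot / denom

# Binary vectors: one shared attribute out of three present in either vector,
# so the result is approximately 0.333, matching the set-based Jaccard index.
print(extended_jaccard([1.0, 1.0, 0.0], [1.0, 0.0, 1.0]))
# Identical vectors give 1.
print(extended_jaccard([1.0, 2.0], [1.0, 2.0]))
```

For binary vectors the measure reduces to the ordinary Jaccard coefficient (size of the intersection over size of the union), which is why it is described as an extension.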

Dice coefficient measure

The Dice coefficient measure is similar to the extended Jaccard measure and is defined as:

$$ s(x_i, x_j) = \frac{2 \, x_i^T x_j}{\|x_i\|^2 + \|x_j\|^2} $$
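
The Dice coefficient measure, taken here as twice the inner product divided by the sum of the squared norms, can be sketched in plain Python in the same way. The function name dice_similarity and the handling of two zero vectors are assumptions of this example.

```python
def dice_similarity(x, y):
    """Dice coefficient measure: 2 x.y / (|x|^2 + |y|^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    denom = sq_x + sq_y
    # The denominator is zero only when both vectors are zero.
    if denom == 0:
        raise ValueError("measure is undefined when both vectors are zero")
    return 2 * dot / denom

# Binary vectors with one shared attribute and two attributes each: 2*1/4 = 0.5.
print(dice_similarity([1.0, 1.0, 0.0], [1.0, 0.0, 1.0]))
# Identical vectors give 1.
print(dice_similarity([1.0, 2.0], [1.0, 2.0]))
```

The two measures are monotonically related: if J is the extended Jaccard value for a pair of vectors, the Dice value is 2J / (1 + J), so both rank pairs of vectors in the same order.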