Similarity functions

An alternative to the concept of distance is the similarity function s(x_i, x_j), which compares two vectors x_i and x_j. This function should be symmetric (namely, s(x_i, x_j) = s(x_j, x_i)), should take a large value when x_i and x_j are in some sense “similar”, and should attain its largest value for identical vectors.

A similarity function whose target range is [0, 1] is called a dichotomous similarity function. In fact, the methods for calculating “distances” between binary and nominal attributes may be considered similarity functions rather than distances.

Cosine measure

When the angle between two vectors is a meaningful measure of their similarity, the normalized inner product may be an appropriate similarity measure:

$$s(x_i, x_j) = \frac{x_i^T \cdot x_j}{\|x_i\| \cdot \|x_j\|}$$
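As an illustration, the cosine measure can be computed directly from its definition. This is a minimal sketch, assuming vectors are plain Python lists of numbers; the function name `cosine_similarity` is hypothetical, and raising an error for a zero vector (where the measure is undefined) is a design choice made here, not something the text prescribes.

```python
import math


def cosine_similarity(x_i, x_j):
    """Cosine measure: inner product divided by the product of Euclidean norms."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    # The measure is undefined when either vector has zero length.
    if norm_i == 0 or norm_j == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_i * norm_j)


print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors: 0.0
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction: approximately 1.0
```

Because each vector is divided by its norm, the result depends only on the angle between the vectors: multiplying either vector by a positive constant leaves the value unchanged. The value lies in [-1, 1], and in [0, 1] when all features are non-negative.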

Pearson correlation measure

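The normalized Pearson correlation is defined as follows, where $\bar{x}$ denotes the average feature value of x over all dimensions:

$$s(x_i, x_j) = \frac{(x_i - \bar{x}_i)^T \cdot (x_j - \bar{x}_j)}{\|x_i - \bar{x}_i\| \cdot \|x_j - \bar{x}_j\|}$$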
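One way to read this measure is as the cosine measure applied to mean-centered vectors: subtract each vector's average feature value from all of its features, then take the normalized inner product. The sketch below follows that reading; it assumes plain Python lists, the name `pearson_similarity` is hypothetical, and rejecting constant vectors (which have zero norm after centering) is a choice made here.

```python
import math


def pearson_similarity(x_i, x_j):
    """Pearson correlation: cosine measure of the mean-centered vectors."""
    n = len(x_i)
    if n == 0 or n != len(x_j):
        raise ValueError("vectors must be non-empty and of the same length")
    # Average feature value of each vector over all dimensions.
    mean_i = sum(x_i) / n
    mean_j = sum(x_j) / n
    c_i = [a - mean_i for a in x_i]
    c_j = [b - mean_j for b in x_j]
    num = sum(a * b for a, b in zip(c_i, c_j))
    den = math.sqrt(sum(a * a for a in c_i)) * math.sqrt(sum(b * b for b in c_j))
    # A constant vector has zero norm after centering, so the measure is undefined.
    if den == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return num / den


print(pearson_similarity([1, 2, 3], [10, 20, 30]))  # approximately 1.0
print(pearson_similarity([1, 2, 3], [3, 2, 1]))     # approximately -1.0
```

Unlike the cosine measure, this value is also unchanged when a constant is added to every feature of a vector, so [1, 2, 3] and [101, 102, 103] are perfectly correlated.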

Extended Jaccard measure

The extended Jaccard measure was presented by Strehl and Ghosh in 2000 and is defined as:

$$s(x_i, x_j) = \frac{x_i^T \cdot x_j}{\|x_i\|^2 + \|x_j\|^2 - x_i^T \cdot x_j}$$
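The extended Jaccard measure needs only the inner product and the two squared norms. The following is a minimal sketch, assuming plain Python lists; the name `extended_jaccard` is hypothetical. The denominator is at least half of the sum of the squared norms, so it vanishes only when both vectors are zero, which is the single case the code rejects.

```python
def extended_jaccard(x_i, x_j):
    """Extended Jaccard measure: x_i.x_j / (|x_i|^2 + |x_j|^2 - x_i.x_j)."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    sq_norm_i = sum(a * a for a in x_i)
    sq_norm_j = sum(b * b for b in x_j)
    denom = sq_norm_i + sq_norm_j - dot
    # The denominator is zero only when both vectors are zero vectors.
    if denom == 0:
        raise ValueError("extended Jaccard is undefined for two zero vectors")
    return dot / denom


print(extended_jaccard([1, 1, 0, 1], [1, 0, 0, 1]))  # 2 / 3, about 0.667
print(extended_jaccard([2.0, 3.0], [2.0, 3.0]))      # identical vectors: 1.0
```

For binary vectors the inner product counts the positions where both vectors are 1 and each squared norm counts the 1s in that vector, so the measure reduces to the classic Jaccard coefficient |A ∩ B| / |A ∪ B|; in the first call above that is 2 / 3.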

Dice coefficient measure

The Dice coefficient measure is similar to the extended Jaccard measure and is defined as:

$$s(x_i, x_j) = \frac{2 \, x_i^T \cdot x_j}{\|x_i\|^2 + \|x_j\|^2}$$
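A matching sketch for the Dice coefficient, again assuming plain Python lists and using the hypothetical name `dice_coefficient`. It differs from the extended Jaccard computation only in the numerator factor of 2 and in not subtracting the inner product from the denominator.

```python
def dice_coefficient(x_i, x_j):
    """Dice coefficient: 2 * x_i.x_j / (|x_i|^2 + |x_j|^2)."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    denom = sum(a * a for a in x_i) + sum(b * b for b in x_j)
    # The denominator is zero only when both vectors are zero vectors.
    if denom == 0:
        raise ValueError("Dice coefficient is undefined for two zero vectors")
    return 2 * dot / denom


print(dice_coefficient([1, 1, 0, 1], [1, 0, 0, 1]))  # 4 / 5 = 0.8
print(dice_coefficient([2.0, 3.0], [2.0, 3.0]))      # identical vectors: 1.0
```

Writing J for the extended Jaccard value of the same pair of vectors, a little algebra gives D = 2J / (1 + J). For the binary pair above, J = 2/3 yields D = 0.8. Since this mapping is increasing, the two measures order pairs of vectors in the same way, which is one precise sense in which they are similar.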