An alternative to the concept of distance is the similarity function s(x_i, x_j), which compares the two vectors x_i and x_j. This function should be symmetric (namely, s(x_i, x_j) = s(x_j, x_i)), should take a large value when x_i and x_j are in some sense “similar”, and should attain its largest value for identical vectors.
A similarity function whose target range is [0,1] is called a dichotomous similarity function. In fact, the methods for calculating the “distances” in the case of binary and nominal attributes may be regarded as similarity functions rather than distances.
When the angle between the two vectors is a meaningful measure of their similarity, the normalized inner product (also known as the cosine measure) may be an appropriate similarity measure:
$$ s(x_i, x_j) = \frac{x_i^T x_j}{\|x_i\| \cdot \|x_j\|} $$
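The normalized inner product s(x_i, x_j) = x_iᵀx_j / (‖x_i‖·‖x_j‖) can be computed directly from its definition. The sketch below is a minimal plain-Python version that assumes vectors are equal-length sequences of numbers. The function name `cosine_similarity` and the choice to raise an error for a zero vector are illustrative assumptions and are not part of the original definition.

```python
import math


def cosine_similarity(x_i, x_j):
    """Normalized inner product of two equal-length numeric sequences."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    # The angle, and hence the measure, is undefined for a zero vector.
    if norm_i == 0 or norm_j == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_i * norm_j)


# Parallel vectors give 1.0 (up to rounding); orthogonal vectors give 0.0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 0], [0, 1]))
```

Only the angle between the vectors matters, so scaling either vector by a positive constant leaves the value unchanged. The measure ranges over [-1, 1] in general. For non-negative data, where the inner product cannot be negative, it ranges over [0, 1].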
Pearson correlation measure
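The normalized Pearson correlation is defined as follows, where x̄_i denotes the average feature value of x_i over all dimensions:
$$ s(x_i, x_j) = \frac{(x_i - \bar{x}_i)^T (x_j - \bar{x}_j)}{\|x_i - \bar{x}_i\| \cdot \|x_j - \bar{x}_j\|} $$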
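The Pearson correlation can be computed in three steps. First, center each vector on its own mean feature value x̄_i. Then take the normalized inner product of the centered vectors, (x_i − x̄_i)ᵀ(x_j − x̄_j) / (‖x_i − x̄_i‖·‖x_j − x̄_j‖).

The sketch below is a minimal plain-Python version. The function name and the error for constant vectors are illustrative assumptions. The optional `to_unit_interval` flag is a hypothetical addition: it applies the affine map (s + 1)/2, which is one common way to bring a correlation from [-1, 1] into [0, 1]. Whether the chapter's "normalized" form includes this step is not stated here.

```python
import math


def pearson_similarity(x_i, x_j, to_unit_interval=False):
    """Pearson correlation between two equal-length numeric sequences.

    to_unit_interval is a hypothetical option: when True, the result
    s in [-1, 1] is mapped to [0, 1] via (s + 1) / 2.
    """
    if len(x_i) != len(x_j) or not x_i:
        raise ValueError("vectors must be non-empty and of equal length")
    # Mean feature value of each vector, taken over all dimensions.
    mean_i = sum(x_i) / len(x_i)
    mean_j = sum(x_j) / len(x_j)
    # Center each vector on its own mean.
    c_i = [a - mean_i for a in x_i]
    c_j = [b - mean_j for b in x_j]
    dot = sum(a * b for a, b in zip(c_i, c_j))
    denom = math.sqrt(sum(a * a for a in c_i)) * math.sqrt(sum(b * b for b in c_j))
    # A constant vector has zero spread, so the correlation is undefined.
    if denom == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    s = dot / denom
    return (s + 1) / 2 if to_unit_interval else s


# Adding a constant to every feature does not change the correlation.
print(pearson_similarity([1, 2, 3], [11, 12, 13]))
print(pearson_similarity([1, 2, 3], [3, 2, 1], to_unit_interval=True))
```

The centering step is what distinguishes this measure from the cosine measure. The pair [1, 2, 3] and [11, 12, 13] has a Pearson correlation of 1 (up to rounding), whereas its cosine measure is only about 0.95.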
Extended Jaccard measure
The extended Jaccard measure was introduced by Strehl and Ghosh in 2000 and is defined as:
$$ s(x_i, x_j) = \frac{x_i^T x_j}{\|x_i\|^2 + \|x_j\|^2 - x_i^T x_j} $$
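The extended Jaccard measure s(x_i, x_j) = x_iᵀx_j / (‖x_i‖² + ‖x_j‖² − x_iᵀx_j) needs only three sums: the inner product and the two squared norms. The sketch below is a minimal plain-Python version. The function name and the error raised when both inputs are zero vectors are illustrative assumptions.

```python
def extended_jaccard(x_i, x_j):
    """Extended Jaccard similarity of two equal-length numeric sequences."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    sq_i = sum(a * a for a in x_i)  # squared Euclidean norm of x_i
    sq_j = sum(b * b for b in x_j)  # squared Euclidean norm of x_j
    denom = sq_i + sq_j - dot
    # denom >= 0 always, and it is zero only when both vectors are zero.
    if denom == 0:
        raise ValueError("extended Jaccard is undefined for two zero vectors")
    return dot / denom


# For binary vectors this reduces to |A ∩ B| / |A ∪ B|, here 2 / 3.
print(extended_jaccard([1, 1, 0, 1], [1, 0, 0, 1]))
# Identical vectors give the maximum value, 1.0.
print(extended_jaccard([0.5, 2.0], [0.5, 2.0]))
```

For 0/1 vectors, the inner product counts the shared features and each squared norm counts the features present in that vector. The denominator is therefore the size of the union, and the measure coincides with the classic Jaccard coefficient, which is why it is called "extended".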
Dice coefficient measure
The Dice coefficient measure is similar to the extended Jaccard measure and is defined as:
$$ s(x_i, x_j) = \frac{2\, x_i^T x_j}{\|x_i\|^2 + \|x_j\|^2} $$
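The Dice coefficient s(x_i, x_j) = 2x_iᵀx_j / (‖x_i‖² + ‖x_j‖²) uses the same three sums as the extended Jaccard measure but combines them differently. The sketch below is a minimal plain-Python version. The function name and the zero-vector error are illustrative assumptions.

```python
def dice_coefficient(x_i, x_j):
    """Dice similarity of two equal-length numeric sequences."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    sq_i = sum(a * a for a in x_i)  # squared Euclidean norm of x_i
    sq_j = sum(b * b for b in x_j)  # squared Euclidean norm of x_j
    denom = sq_i + sq_j
    # The denominator vanishes only when both vectors are zero vectors.
    if denom == 0:
        raise ValueError("Dice coefficient is undefined for two zero vectors")
    return 2 * dot / denom


# For binary vectors this is 2|A ∩ B| / (|A| + |B|): here 2 * 2 / (3 + 2).
print(dice_coefficient([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.8
```

The similarity to the extended Jaccard measure can be made precise. Writing J for the extended Jaccard value of the same pair, the Dice value equals 2J / (1 + J) for any pair of vectors that are not both zero. For the binary vectors above, J = 2/3 and the Dice value is 4/5. Because 2J / (1 + J) increases with J, the two measures rank pairs of vectors in the same order.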