## Similarity function

An alternative to the concept of distance is the similarity function $s(x_i, x_j)$, which compares two vectors $x_i$ and $x_j$. Common examples are the cosine measure, the Pearson correlation, the extended Jaccard measure, and the Dice coefficient. A similarity function should be symmetric (i.e. $s(x_i, x_j) = s(x_j, x_i)$), should take a large value when $x_i$ and $x_j$ are in some sense "similar", and should reach its largest value for identical vectors.
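As a rough illustration of the two required properties, the sketch below checks symmetry and "identical vectors score highest" for a single pair of vectors. The names `satisfies_similarity_properties` and `inverse_distance_similarity` are hypothetical, and the example similarity $1 / (1 + \lVert x - y \rVert)$ is an assumption chosen only for demonstration; it is not one of the measures discussed in this article.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def satisfies_similarity_properties(
    s: Similarity, x: Vector, y: Vector, tol: float = 1e-12
) -> bool:
    """Check the required properties of s on one pair (x, y) only.

    Symmetry: s(x, y) equals s(y, x) up to tol.
    Identity is maximal: s(x, x) and s(y, y) are at least s(x, y).
    Passing for one pair does not prove the properties hold in general.
    """
    symmetric = math.isclose(s(x, y), s(y, x), rel_tol=0.0, abs_tol=tol)
    identity_is_max = s(x, x) >= s(x, y) - tol and s(y, y) >= s(x, y) - tol
    return symmetric and identity_is_max


# Hypothetical example similarity (not from the text): maps Euclidean distance
# into (0, 1]; it equals 1 exactly when x == y.
def inverse_distance_similarity(x: Vector, y: Vector) -> float:
    return 1.0 / (1.0 + math.dist(x, y))
```

For example, `satisfies_similarity_properties(inverse_distance_similarity, [1.0, 2.0], [2.0, 0.0])` returns `True`, while a non-symmetric score such as `lambda a, b: a[0] - b[0]` fails on the same pair.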

A similarity function whose range is $[0, 1]$ is called a dichotomous similarity function. In fact, the methods for calculating "distances" between binary and nominal attributes can be viewed as similarity functions rather than distances.

## Cosine measure

When the angle between two vectors is a meaningful measure of their similarity, the normalized inner product can be an appropriate similarity measure:

$$
s(x_i, x_j) = \frac{x_i^T \cdot x_j}{\lVert x_i \rVert \cdot \lVert x_j \rVert}
$$
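A minimal sketch of the cosine measure, $s(x_i, x_j) = x_i^T x_j / (\lVert x_i \rVert \lVert x_j \rVert)$, in plain Python. The function name `cosine_similarity` is hypothetical, vectors are assumed to be equal-length sequences of floats, and raising `ValueError` for a zero vector (where the angle is undefined) is a design choice of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Normalized inner product (x . y) / (||x|| * ||y||), a value in [-1, 1]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # The angle, and therefore the cosine, is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Because only the angle matters, the measure ignores vector length: `cosine_similarity([1.0, 2.0], [2.0, 4.0])` is 1 up to floating-point rounding, while orthogonal vectors such as `[1.0, 0.0]` and `[0.0, 1.0]` give `0.0`.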

## Pearson correlation measure

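The normalized Pearson correlation is defined as:

$$
s(x_i, x_j) = \frac{(x_i - \bar{x}_i)^T \cdot (x_j - \bar{x}_j)}{\lVert x_i - \bar{x}_i \rVert \cdot \lVert x_j - \bar{x}_j \rVert}
$$

where $\bar{x}_i$ denotes the mean feature value of $x_i$ over all dimensions.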
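A minimal sketch of the Pearson correlation measure, assuming each vector is centered on the mean of its own components over all dimensions and that the cosine of the centered vectors is then taken, giving a value in $[-1, 1]$. The function name `pearson_similarity` is hypothetical, and rejecting constant vectors (whose centered form is zero) is a design choice of this sketch.

```python
import math
from typing import Sequence


def pearson_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between x and y, treating components as samples.

    Each vector is centered on its own mean over all dimensions; the result
    is the cosine of the two centered vectors, a value in [-1, 1].
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of the same dimension")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_cx = math.sqrt(sum(a * a for a in cx))
    norm_cy = math.sqrt(sum(b * b for b in cy))
    if norm_cx == 0.0 or norm_cy == 0.0:
        # A constant vector has no variation, so the correlation is undefined.
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return dot / (norm_cx * norm_cy)
```

Unlike the cosine measure, this one is also insensitive to adding a constant to every component: `pearson_similarity([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])` is 1 up to rounding, and reversing the order of one vector, as in `[3.0, 2.0, 1.0]`, gives -1.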

## Extended Jaccard measure

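The extended Jaccard measure was introduced by Strehl and Ghosh in 2000 and is defined as:

$$
s(x_i, x_j) = \frac{x_i^T \cdot x_j}{\lVert x_i \rVert^2 + \lVert x_j \rVert^2 - x_i^T \cdot x_j}
$$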
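A minimal sketch of the extended Jaccard measure, $x_i^T x_j / (\lVert x_i \rVert^2 + \lVert x_j \rVert^2 - x_i^T x_j)$, in plain Python. The function name `extended_jaccard` is hypothetical. For real vectors the denominator can only be zero when both vectors are zero, which this sketch rejects with `ValueError`.

```python
from typing import Sequence


def extended_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Extended Jaccard: (x . y) / (||x||^2 + ||y||^2 - x . y)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)  # squared Euclidean norm of x
    sq_y = sum(b * b for b in y)  # squared Euclidean norm of y
    denom = sq_x + sq_y - dot
    if denom == 0.0:
        # For real vectors this happens only when x and y are both zero.
        raise ValueError("extended Jaccard is undefined when both vectors are zero")
    return dot / denom
```

On 0/1 vectors the measure reduces to the set-based Jaccard coefficient $|A \cap B| / |A \cup B|$: `extended_jaccard([1, 1, 0, 1], [1, 0, 0, 1])` gives 2/3, since the two sets share 2 of the 3 positions present in either.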

## Dice coefficient measure

The Dice coefficient measure is similar to the extended Jaccard measure and is defined as:

$$
s(x_i, x_j) = \frac{2 \, x_i^T \cdot x_j}{\lVert x_i \rVert^2 + \lVert x_j \rVert^2}
$$
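A minimal sketch of the Dice coefficient measure, $2 \, x_i^T x_j / (\lVert x_i \rVert^2 + \lVert x_j \rVert^2)$, in plain Python. The function name `dice_similarity` is hypothetical, and rejecting the case where both vectors are zero is a design choice of this sketch.

```python
from typing import Sequence


def dice_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Dice coefficient: 2 (x . y) / (||x||^2 + ||y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)  # squared Euclidean norm of x
    sq_y = sum(b * b for b in y)  # squared Euclidean norm of y
    denom = sq_x + sq_y
    if denom == 0.0:
        # Both vectors are zero, so the ratio is undefined.
        raise ValueError("Dice coefficient is undefined when both vectors are zero")
    return 2.0 * dot / denom
```

On 0/1 vectors this is the set-based $2|A \cap B| / (|A| + |B|)$, so `dice_similarity([1, 1, 0, 1], [1, 0, 0, 1])` gives 0.8. Writing $J$ for the extended Jaccard value of the same pair, the two measures are related by $D = 2J / (1 + J)$, which is why they rank pairs in the same order.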