# Similarity functions

An alternative to the concept of distance is the similarity function s(x_i, x_j), which compares the two vectors x_i and x_j. This function should be symmetric (namely s(x_i, x_j) = s(x_j, x_i)), take a large value when x_i and x_j are somehow “similar”, and attain its largest value for identical vectors.

A similarity function whose target range is [0, 1] is called a dichotomous similarity function. In fact, the methods for calculating the “distances” between binary and nominal attributes may be regarded as similarity functions rather than distances.

## Cosine measure

When the angle between the two vectors is a meaningful measure of their similarity, the normalized inner product may be an appropriate similarity measure:

$$
s(x_i, x_j) = \frac{x_i^T x_j}{\lVert x_i \rVert \, \lVert x_j \rVert}
$$
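As a minimal sketch, the cosine measure can be computed as the inner product of the two vectors divided by the product of their Euclidean norms. The function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for this illustration, not part of the original text.

```python
import math


def cosine_similarity(x, y):
    """Inner product of x and y divided by the product of their Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumption: the measure is treated as undefined when either vector is all zeros.
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Because only the angle matters, rescaling either vector by a positive constant leaves the result unchanged, and orthogonal vectors score 0.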

## Pearson correlation measure

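The normalized Pearson correlation is defined as:

$$
s(x_i, x_j) = \frac{(x_i - \bar{x}_i)^T (x_j - \bar{x}_j)}{\lVert x_i - \bar{x}_i \rVert \, \lVert x_j - \bar{x}_j \rVert}
$$

where $\bar{x}_i$ denotes the average feature value of $x_i$ over all dimensions.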
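The following sketch illustrates the Pearson measure as the cosine measure applied to mean-centered vectors: each vector has its own average feature value subtracted before the normalized inner product is taken. The name `pearson_similarity` and the error handling for constant vectors are assumptions for this example. Note that this form yields values in [-1, 1]; some formulations rescale it to [0, 1], which is not done here.

```python
import math


def pearson_similarity(x, y):
    """Normalized inner product of the mean-centered versions of x and y."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Center each vector on its own average feature value.
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    # Assumption: a constant vector (zero variance) makes the measure undefined.
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return dot / (norm_x * norm_y)
```

Unlike the cosine measure, this quantity is unaffected by adding a constant offset to every feature of a vector.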

## Extended Jaccard measure

The extended Jaccard measure was presented by Strehl and Ghosh in 2000 and is defined as:

$$
s(x_i, x_j) = \frac{x_i^T x_j}{\lVert x_i \rVert^2 + \lVert x_j \rVert^2 - x_i^T x_j}
$$
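A minimal sketch of the extended Jaccard measure, assuming the usual form in which the inner product is divided by the sum of the squared norms minus the inner product. The function name `extended_jaccard` is hypothetical, and raising an error when both vectors are zero is a choice made for this example.

```python
def extended_jaccard(x, y):
    """Inner product divided by (|x|^2 + |y|^2 - inner product)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    denom = sq_x + sq_y - dot
    # The denominator vanishes only when both vectors are all zeros.
    if denom == 0:
        raise ValueError("extended Jaccard is undefined for two zero vectors")
    return dot / denom
```

For binary vectors this reduces to the ordinary Jaccard coefficient: with `[1, 1, 0]` and `[1, 0, 1]`, one shared attribute out of three present overall gives 1/3.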

## Dice coefficient measure

The Dice coefficient measure is similar to the extended Jaccard measure and is defined as:

$$
s(x_i, x_j) = \frac{2 \, x_i^T x_j}{\lVert x_i \rVert^2 + \lVert x_j \rVert^2}
$$
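A minimal sketch of the Dice coefficient measure, assuming the usual form of twice the inner product divided by the sum of the squared norms. The function name `dice_coefficient` is hypothetical, and the error for two zero vectors is a choice made for this example.

```python
def dice_coefficient(x, y):
    """Twice the inner product divided by (|x|^2 + |y|^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    denom = sq_x + sq_y
    # The denominator vanishes only when both vectors are all zeros.
    if denom == 0:
        raise ValueError("Dice coefficient is undefined for two zero vectors")
    return 2 * dot / denom
```

On the binary vectors `[1, 1, 0]` and `[1, 0, 1]` this gives 0.5, compared with 1/3 for the extended Jaccard measure; the two differ only in how the shared part is weighted in the denominator.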