Distance measurements for binary attributes

Many partitioning methods use distance measures to determine how similar or dissimilar any pair of objects is, including objects described by binary attributes. The distance between two instances x_i and x_j is commonly denoted d(x_i, x_j). A valid distance measure must be symmetric and must attain its minimum value (usually zero) for identical vectors. It is called a metric distance measure if it also satisfies the following properties:

1. Triangle inequality: d(x_i, x_k) ≤ d(x_i, x_j) + d(x_j, x_k) for all x_i, x_j, x_k.
2. d(x_i, x_j) = 0 implies x_i = x_j for all x_i, x_j.
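
The metric properties (symmetry, zero distance only for identical vectors, and the triangle inequality d(x_i, x_k) ≤ d(x_i, x_j) + d(x_j, x_k)) can be checked by brute force on a small set of binary vectors. The sketch below is only an illustration: the names mismatch_distance and is_metric_on are hypothetical, and the count of mismatched attributes is assumed as the distance purely for the example.

```python
from itertools import product


def mismatch_distance(x, y):
    """Number of positions where two equal-length 0/1 vectors differ (assumed example distance)."""
    return sum(1 for a, b in zip(x, y) if a != b)


def is_metric_on(points, d):
    """Brute-force check of the metric properties of d over a finite set of points."""
    for x in points:
        for y in points:
            if d(x, y) != d(y, x):            # symmetry
                return False
            if x == y and d(x, y) != 0:       # identical vectors give the minimum (zero)
                return False
            if x != y and d(x, y) == 0:       # zero distance only for identical vectors
                return False
            for z in points:
                if d(x, z) > d(x, y) + d(y, z):  # triangle inequality
                    return False
    return True


# All binary vectors with three attributes
points = list(product([0, 1], repeat=3))
print(is_metric_on(points, mismatch_distance))
```

A check like this only covers the points it is given, so it can refute a candidate measure but does not prove that it is a metric in general.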

In the case of binary attributes, the distance between two objects can be calculated from a contingency table that counts, over all attributes, how often each combination of values (1/1, 1/0, 0/1, 0/0) occurs. A binary attribute is symmetric if its two states are equally valuable. In this case, the simple matching coefficient can be used to assess the dissimilarity between two objects:

d(x_i, x_j) = (r + s) / (q + r + s + t)

where q is the number of attributes that equal 1 for both objects; t is the number of attributes that equal 0 for both objects; and r and s are the numbers of attributes on which the two objects differ (one count for each direction of mismatch).
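
As a minimal sketch of the contingency counts and of the simple matching dissimilarity (r + s) / (q + r + s + t), the following assumes each object is a list of 0/1 values. The function names are hypothetical, and assigning r to the 1/0 case and s to the 0/1 case is a convention assumed here; the result does not depend on it because only r + s is used.

```python
from typing import Sequence, Tuple


def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (q, r, s, t) for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both equal 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # mismatch (assumed: 1 in x, 0 in y)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # mismatch (assumed: 0 in x, 1 in y)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # both equal 0
    return q, r, s, t


def simple_matching_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of attributes on which the two objects disagree."""
    q, r, s, t = contingency_counts(x, y)
    total = q + r + s + t
    if total == 0:
        raise ValueError("vectors must contain at least one attribute")
    return (r + s) / total


x = [1, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 0, 0]
print(contingency_counts(x, y))        # (2, 1, 1, 2)
print(simple_matching_distance(x, y))  # 2 mismatches out of 6 attributes
```

Here the two objects agree on four of six attributes, so the dissimilarity is 2/6.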

A binary attribute is asymmetric if its states are not equally important (the positive outcome is usually considered more important). In this case, the denominator leaves out the negative matches (t), which are treated as unimportant. This is the dissimilarity form of the Jaccard coefficient:

d(x_i, x_j) = (r + s) / (q + r + s)
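
The Jaccard-based dissimilarity (r + s) / (q + r + s) can be sketched in the same way, again assuming objects given as lists of 0/1 values. The function name is hypothetical, and returning 0.0 when both vectors contain no positive value is an assumption made here to avoid division by zero, not something the text specifies.

```python
from typing import Sequence


def jaccard_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Mismatches divided by all attributes except the 0/0 matches."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # positive matches
    mismatches = sum(1 for a, b in zip(x, y) if a != b)    # r + s
    denominator = q + mismatches                           # negative matches (t) are ignored
    if denominator == 0:
        return 0.0  # assumption: two all-zero vectors are treated as identical
    return mismatches / denominator


x = [1, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 0, 0]
print(jaccard_distance(x, y))  # 0.5
```

For these vectors the simple matching dissimilarity would be 2/6, while ignoring the two 0/0 matches raises the value to 2/4. This shows how asymmetric treatment makes objects that share few positive values look less alike.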