Distance measurements for binary attributes
Many partitioning methods use distance measures to determine the similarity or dissimilarity between any pair of objects (for example, objects described by binary attributes). It is common to denote the distance between two instances x_i and x_j as d(x_i, x_j). A valid distance measure must be symmetric and attain its minimum value (usually zero) for identical vectors. The distance measure is called a metric distance measure if it also satisfies the following properties:
1. Triangle inequality: d(x_i, x_k) ≤ d(x_i, x_j) + d(x_j, x_k) for all x_i, x_j, x_k.
2. Identity: d(x_i, x_j) = 0 implies x_i = x_j for all x_i, x_j.
In the case of binary attributes, the distance between objects can be calculated from a contingency table. A binary attribute is symmetric if both of its states are equally important. In this case, the dissimilarity between two objects can be assessed with the simple matching coefficient:
d(x_i, x_j) = (r + s) / (q + r + s + t)
where q is the number of attributes equal to 1 for both objects; t is the number of attributes equal to 0 for both objects; and r and s are the numbers of attributes on which the two objects disagree (r counts the attributes equal to 1 for x_i and 0 for x_j, and s counts the reverse case).
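As an illustration, the counts q, r, s and t can be tallied directly from two binary vectors. The following is a minimal sketch in Python, assuming the usual form of the simple matching dissimilarity, d = (r + s) / (q + r + s + t), and assuming that r counts the (1, 0) mismatches and s the (0, 1) mismatches. The function names and the example vectors are hypothetical and chosen only for this example.

```python
def contingency_counts(x, y):
    """Return (q, r, s, t) for two equal-length binary vectors.

    q: both 1; r: x is 1 and y is 0; s: x is 0 and y is 1; t: both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return q, r, s, t


def simple_matching_dissimilarity(x, y):
    """Fraction of attributes on which the two objects disagree."""
    q, r, s, t = contingency_counts(x, y)
    total = q + r + s + t
    if total == 0:
        raise ValueError("vectors must not be empty")
    return (r + s) / total


# Hypothetical example objects with six binary attributes each.
x_i = [1, 0, 1, 1, 0, 0]
x_j = [1, 1, 0, 1, 0, 0]

print(contingency_counts(x_i, x_j))             # (2, 1, 1, 2)
print(simple_matching_dissimilarity(x_i, x_j))  # 2 mismatches out of 6 attributes
```

Because both the 1-1 and the 0-0 matches appear in the denominator, the two states are treated as equally important, which is what makes this measure suitable for symmetric attributes.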
A binary attribute is asymmetric if its states are not equally important (the positive outcome is usually considered more important). In this case, the denominator ignores the unimportant negative matches (t). The resulting measure is called the Jaccard coefficient:
d(x_i, x_j) = (r + s) / (q + r + s)
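The effect of dropping t is easiest to see on sparse data, where most attributes are 0 for both objects. The sketch below assumes the usual form of the Jaccard dissimilarity, d = (r + s) / (q + r + s). The function name, the example vectors, and the convention of returning 0.0 when both vectors are all zeros are assumptions made for illustration only.

```python
def jaccard_dissimilarity(x, y):
    """Mismatches divided by positive matches plus mismatches (0-0 pairs ignored)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # positive matches
    mismatches = sum(1 for a, b in zip(x, y) if a != b)    # r + s
    if q + mismatches == 0:
        # Assumed convention: two all-zero vectors are treated as identical.
        return 0.0
    return mismatches / (q + mismatches)


# Hypothetical sparse objects: one shared positive, two mismatches, five 0-0 pairs.
x_i = [1, 0, 1, 0, 0, 0, 0, 0]
x_j = [1, 1, 0, 0, 0, 0, 0, 0]

jaccard = jaccard_dissimilarity(x_i, x_j)
# Simple matching for comparison: the same mismatches over all attributes.
simple_matching = sum(1 for a, b in zip(x_i, x_j) if a != b) / len(x_i)

print(jaccard)          # 2 / 3
print(simple_matching)  # 2 / 8
```

Here the five shared zeros make the objects look fairly similar under simple matching, while the Jaccard coefficient reports them as mostly dissimilar because only one positive attribute is shared.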