Distance measures for nominal attributes

Many clustering methods use distance measures to determine the similarity or dissimilarity between any pair of objects. It is useful to denote the distance between two instances x_i and x_j as d(x_i,x_j). A valid distance measure should be symmetric and should attain its minimum value (usually zero) for identical vectors. The distance measure is called a metric distance measure if it also satisfies the following properties:

1. Triangle inequality: $d(x_i,x_k) \le d(x_i,x_j) + d(x_j,x_k)$ for all x_i, x_j, x_k.
2. Identity: $d(x_i,x_j) = 0 \Rightarrow x_i = x_j$.

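The properties of a metric distance measure (symmetry, a zero minimum for identical vectors, the triangle inequality, and identity, meaning zero distance only for identical vectors) can be checked empirically on a finite sample. The Python sketch below does this; the helper names check_metric and hamming and the sample vectors are hypothetical, introduced only for illustration. Passing the checks on a sample is evidence, not proof, that a measure is a metric.

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Sequence

Vector = Sequence[object]


def check_metric(
    d: Callable[[Vector, Vector], float], samples: list[Vector], tol: float = 1e-12
) -> dict[str, bool]:
    """Hypothetical helper: test the distance properties on every pair/triple of samples."""
    pairs = list(product(samples, repeat=2))
    # Symmetry: d(a, b) == d(b, a)
    symmetric = all(abs(d(a, b) - d(b, a)) <= tol for a, b in pairs)
    # Minimum value of zero for identical vectors: d(a, a) == 0
    zero_on_identical = all(abs(d(a, a)) <= tol for a in samples)
    # Identity: d(a, b) == 0 only when a and b are the same vector
    identity = all(d(a, b) > tol for a, b in pairs if list(a) != list(b))
    # Triangle inequality: d(a, c) <= d(a, b) + d(b, c)
    triangle = all(
        d(a, c) <= d(a, b) + d(b, c) + tol for a, b, c in product(samples, repeat=3)
    )
    return {
        "symmetric": symmetric,
        "zero_on_identical": zero_on_identical,
        "identity": identity,
        "triangle": triangle,
    }


def hamming(a: Vector, b: Vector) -> float:
    """Count the positions where two equal-length vectors differ."""
    return float(sum(x != y for x, y in zip(a, b)))


samples = [("red", "small"), ("red", "large"), ("blue", "small")]
print(check_metric(hamming, samples))  # every property reports True
# A constant distance is symmetric but violates the identity property.
print(check_metric(lambda a, b: 0.0, samples)["identity"])  # False
```

The constant measure in the last line illustrates why symmetry and a zero minimum alone do not make a measure a metric.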
When the attributes are nominal, two main approaches may be used:

1. Simple matching:

   $d(x_i,x_j) = \frac{p - m}{p}$

   where p is the total number of attributes and m is the number of attributes on which x_i and x_j take the same value.

2. Creating a binary attribute for each state of each nominal attribute and computing the dissimilarity between the resulting binary vectors with a distance measure for binary attributes.
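Both approaches can be sketched in Python. simple_matching_distance computes d(x_i,x_j) = (p - m)/p, where p is the number of attributes and m the number of matches; one_hot creates one binary attribute per state of each nominal attribute. All function names and the example data are hypothetical. The text does not say which binary measure to apply after encoding, so the Jaccard distance used here is an illustrative assumption, not the prescribed choice.

```python
from __future__ import annotations

from typing import Sequence


def simple_matching_distance(x_i: Sequence[object], x_j: Sequence[object]) -> float:
    """Approach 1: d(x_i, x_j) = (p - m) / p."""
    if len(x_i) != len(x_j) or len(x_i) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    p = len(x_i)                                # total number of attributes
    m = sum(a == b for a, b in zip(x_i, x_j))   # number of matching attributes
    return (p - m) / p


def one_hot(x: Sequence[object], states: Sequence[Sequence[object]]) -> list[int]:
    """Approach 2: one binary attribute per state of each nominal attribute."""
    if len(x) != len(states):
        raise ValueError("need one list of states per attribute")
    bits: list[int] = []
    for value, attr_states in zip(x, states):
        if value not in attr_states:
            raise ValueError(f"unknown state {value!r}")
        bits.extend(1 if value == s else 0 for s in attr_states)
    return bits


def jaccard_distance(u: Sequence[int], v: Sequence[int]) -> float:
    """Illustrative binary measure (assumption): 1 - |u AND v| / |u OR v|."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1 - both / either


states = [["red", "green", "blue"], ["small", "large"]]
x1, x2 = ("red", "small"), ("blue", "small")
print(simple_matching_distance(x1, x2))  # 0.5: p = 2, m = 1
b1, b2 = one_hot(x1, states), one_hot(x2, states)
print(b1, b2)  # [1, 0, 0, 1, 0] [0, 0, 1, 1, 0]
print(round(jaccard_distance(b1, b2), 3))  # 0.667
```

With this encoding each attribute sets exactly one bit, so the Jaccard distance reduces to (2p - 2m)/(2p - m); the two approaches rank pairs the same way here but yield different numeric values.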