Many clustering methods use distance measures to determine the similarity or dissimilarity between any pair of objects. It is useful to denote the distance between two instances x_i and x_j as d(x_i,x_j). A valid distance measure should be symmetric and should attain its minimum value (usually zero) for identical vectors. The distance measure is called a metric distance measure if it also satisfies the following properties:
1. Triangle inequality: d(x_i,x_k) ≤ d(x_i,x_j) + d(x_j,x_k) for all x_i, x_j, x_k.
2. d(x_i,x_j) = 0 implies x_i = x_j, for all x_i, x_j.
When the attributes are nominal, two main approaches may be used:
1. Simple matching:
d(x_i,x_j) = (p - m) / p
where p is the total number of attributes and m is the number of matches.
2. Creating a binary attribute for each state of each nominal attribute and computing the dissimilarity between the resulting binary vectors.
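
The two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the simple matching dissimilarity is taken to be (p - m) / p, the usual definition; the function names, the example attributes, and the choice of the Jaccard dissimilarity for the binary vectors are all hypothetical, since the text does not fix a particular binary measure.

```python
def simple_matching_distance(a, b):
    """Approach 1: fraction of attributes on which two nominal instances disagree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("instances must be non-empty and of equal length")
    p = len(a)                                   # total number of attributes
    m = sum(1 for u, v in zip(a, b) if u == v)   # number of matches
    return (p - m) / p


def to_binary(instance, states):
    """Approach 2, step 1: one binary attribute per state of each nominal attribute.

    `states` lists the possible states of each attribute (an assumed input).
    """
    bits = []
    for value, attribute_states in zip(instance, states):
        bits.extend(1 if value == s else 0 for s in attribute_states)
    return bits


def jaccard_dissimilarity(u, v):
    """Approach 2, step 2: one possible binary dissimilarity (assumed, not from the text)."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical example: three nominal attributes (colour, size, shape).
states = [["red", "green", "blue"], ["small", "large"], ["round", "square"]]
x_i = ("red", "small", "round")
x_j = ("red", "large", "round")

d_matching = simple_matching_distance(x_i, x_j)   # 2 of 3 attributes match
d_binary = jaccard_dissimilarity(to_binary(x_i, states), to_binary(x_j, states))
```

Here the two instances differ only in size, so simple matching gives 1/3, while the Jaccard dissimilarity on the encoded vectors gives 0.5. The two approaches therefore need not agree numerically, and the result of the second depends on which binary measure is chosen.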