Minkowski for numeric attributes
Many partitioning methods use distance measures to determine the similarity or dissimilarity between any pair of objects (for example, the Minkowski distance for numeric attributes). The distance between two instances x_i and x_j is commonly denoted d(x_i, x_j). A valid distance measure must be symmetric and must attain its minimum value (usually zero) for identical vectors. The distance measure is called a metric distance measure if it also satisfies the following properties:
1. Triangle inequality: d(x_i, x_k) ≤ d(x_i, x_j) + d(x_j, x_k) for all instances x_i, x_j, x_k.
2. d(x_i, x_j) = 0 implies x_i = x_j for all instances x_i, x_j.
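Given two instances of dimension p, x_i = (x_i1, x_i2, …, x_ip) and x_j = (x_j1, x_j2, …, x_jp), the distance between the two data instances can be calculated using the Minkowski metric:
$$d(x_i, x_j) = \left( |x_{i1} - x_{j1}|^g + |x_{i2} - x_{j2}|^g + \dots + |x_{ip} - x_{jp}|^g \right)^{1/g}$$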
The commonly used Euclidean distance between two objects is obtained when g = 2. With g = 1, the result is the sum of the absolute differences along each axis (the Manhattan metric), and with g = ∞ it is the largest of the absolute differences along any axis (the Chebyshev metric).
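The three special cases can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function name `minkowski_distance` and the sample points are chosen here for illustration, and the order g is assumed to be at least 1 (with `math.inf` standing in for g = ∞).

```python
import math


def minkowski_distance(x, y, g):
    """Minkowski distance of order g between two equal-length numeric sequences.

    Illustrative helper (name and signature are assumptions, not from the text).
    g = 1 gives Manhattan, g = 2 Euclidean, g = math.inf Chebyshev.
    """
    if len(x) != len(y):
        raise ValueError("instances must have the same dimension")
    if g < 1:
        raise ValueError("order g is assumed to be >= 1")
    # Absolute difference along each axis.
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(g):
        # Limit g -> infinity: the largest per-axis difference.
        return max(diffs)
    return sum(d ** g for d in diffs) ** (1.0 / g)


x_i = (0.0, 0.0)
x_j = (3.0, 4.0)

manhattan = minkowski_distance(x_i, x_j, 1)          # 3 + 4
euclidean = minkowski_distance(x_i, x_j, 2)          # sqrt(9 + 16)
chebyshev = minkowski_distance(x_i, x_j, math.inf)   # max(3, 4)
print(manhattan, euclidean, chebyshev)
```

For these two points the per-axis differences are 3 and 4, so the Manhattan distance is their sum, the Euclidean distance is the length of the hypotenuse, and the Chebyshev distance is the larger of the two.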
The unit of measurement used can affect the clustering analysis. To avoid dependence on the choice of units, the data should be normalized. Standardizing the measurements attempts to give all variables equal weight. However, if each variable is assigned a weight according to its importance, the weighted distance can be calculated as follows:
$$d(x_i, x_j) = \left( w_1 |x_{i1} - x_{j1}|^g + w_2 |x_{i2} - x_{j2}|^g + \dots + w_p |x_{ip} - x_{jp}|^g \right)^{1/g}$$
where each weight w_k is non-negative.
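As a rough illustration of both ideas, the sketch below standardizes one variable and computes a weighted Minkowski distance. It rests on assumptions not stated in the text: normalization is taken to mean z-score standardization (subtract the mean, divide by the population standard deviation), the weights are non-negative, g is finite, and the names `standardize` and `weighted_minkowski` are hypothetical.

```python
import statistics


def standardize(column):
    """Z-score a single variable (one column of the data set).

    Assumption: "normalized" is read as z-score standardization; other
    schemes such as min-max scaling are equally possible.
    """
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def weighted_minkowski(x, y, weights, g=2):
    """Weighted Minkowski distance of finite order g with non-negative weights."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights are assumed to be non-negative")
    # Each axis contributes w_k * |x_k - y_k|^g to the sum.
    terms = [w * abs(a - b) ** g for a, b, w in zip(x, y, weights)]
    return sum(terms) ** (1.0 / g)


# The same heights in centimetres and in metres give identical z-scores
# (up to rounding), so the choice of unit no longer matters.
heights_cm = [150.0, 160.0, 170.0]
heights_m = [1.5, 1.6, 1.7]
print(standardize(heights_cm))
print(standardize(heights_m))

# Equal weights reproduce the unweighted distance; a zero weight
# removes a variable from the comparison entirely.
print(weighted_minkowski((0.0, 0.0), (3.0, 4.0), (1.0, 1.0)))
print(weighted_minkowski((0.0, 0.0), (3.0, 4.0), (4.0, 0.0)))
```

Setting all weights to 1 recovers the ordinary Minkowski distance, which makes it easy to compare weighted and unweighted results on the same data.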