Minkowski distance for numeric attributes
Many partitioning methods use distance measures to determine the similarity or dissimilarity between any pair of objects (for example, the Minkowski distance for numeric attributes). The distance between two instances x_i and x_j is commonly denoted d(x_i, x_j). A valid distance measure must be symmetric and attain its minimum value (usually zero) for identical vectors. The distance measure is called a metric distance measure if it also satisfies the following properties: the triangle inequality, d(x_i, x_k) ≤ d(x_i, x_j) + d(x_j, x_k) for all instances x_i, x_j and x_k, and the identity property, d(x_i, x_j) = 0 only if x_i = x_j.
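Given two instances of dimension p, x_i = (x_i1, x_i2, …, x_ip) and x_j = (x_j1, x_j2, …, x_jp), the distance between the two data instances can be calculated using the Minkowski metric:
d(x_i, x_j) = (|x_i1 - x_j1|^g + |x_i2 - x_j2|^g + … + |x_ip - x_jp|^g)^(1/g)
where g is the order of the distance.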
The commonly used Euclidean distance between two objects is obtained when g = 2. With g = 1, the measure is the sum of the absolute differences along each axis (the Manhattan metric), and as g → ∞ it becomes the largest of the absolute differences along any axis (the Chebyshev metric).
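The three special cases can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `minkowski` and the sample vectors are chosen here for illustration, and the order g is restricted to g ≥ 1, the range in which the measure satisfies the triangle inequality.

```python
import math


def minkowski(x, y, g=2.0):
    """Minkowski distance of order g between two equal-length numeric sequences.

    Computes (sum_k |x_k - y_k|^g)^(1/g); g = math.inf gives the limiting case.
    """
    if len(x) != len(y):
        raise ValueError("instances must have the same dimension")
    if g < 1:
        raise ValueError("g must be >= 1 for the measure to be a metric")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(g):
        # Limit g -> infinity: the largest per-axis difference (Chebyshev).
        return max(diffs, default=0.0)
    return sum(d ** g for d in diffs) ** (1.0 / g)


# Illustrative instances of dimension p = 3 (made-up values).
x_i = (1.0, 2.0, 3.0)
x_j = (4.0, 6.0, 3.0)

manhattan = minkowski(x_i, x_j, g=1)         # 3 + 4 + 0
euclidean = minkowski(x_i, x_j, g=2)         # sqrt(9 + 16 + 0)
chebyshev = minkowski(x_i, x_j, g=math.inf)  # max(3, 4, 0)
```

For these vectors the per-axis differences are 3, 4 and 0, so the three orders give 7, 5 and 4 respectively. For any pair of instances the distance cannot increase as g increases.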
The unit of measurement used can affect the clustering analysis. To avoid dependence on the choice of units, the data should be standardized. Standardizing the measurements attempts to give all variables equal weight. However, if each variable is assigned a weight according to its importance, the weighted distance can be calculated as follows:
d(x_i, x_j) = (w_1 |x_i1 - x_j1|^g + w_2 |x_i2 - x_j2|^g + … + w_p |x_ip - x_jp|^g)^(1/g)
where each weight w_k is non-negative.
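The effect of units and weights can be sketched in a few lines. This example makes two assumptions that the text does not fix: standardization is taken to mean z-scores (subtract the column mean, divide by the population standard deviation), and the weighted distance is taken to be (Σ_k w_k |x_ik - x_jk|^g)^(1/g) with non-negative weights w_k. The function names and the sample data are hypothetical.

```python
import statistics


def standardize(rows):
    """Rescale each column to zero mean and unit population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 so it simply maps to 0.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def weighted_minkowski(x, y, weights, g=2.0):
    """Weighted Minkowski distance: (sum_k w_k * |x_k - y_k|^g)^(1/g)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("instances and weights must have the same dimension")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * abs(a - b) ** g for w, a, b in zip(weights, x, y)) ** (1.0 / g)


# Made-up data: (height, body mass), with height given in two different units.
data_cm = [(170.0, 60.0), (180.0, 80.0), (160.0, 70.0)]
data_m = [(h / 100.0, m) for h, m in data_cm]

z_cm = standardize(data_cm)
z_m = standardize(data_m)

# After standardization the choice of centimetres or metres no longer matters.
d_cm = weighted_minkowski(z_cm[0], z_cm[1], weights=(1.0, 1.0))
d_m = weighted_minkowski(z_m[0], z_m[1], weights=(1.0, 1.0))

# Unequal weights emphasise one variable: here the second one is ignored.
d_first_only = weighted_minkowski((0.0, 0.0), (3.0, 4.0), weights=(4.0, 0.0))
```

On the raw data, the Euclidean distance between the first two instances would be dominated by whichever variable has the larger numeric range. After standardization, `d_cm` and `d_m` agree up to floating-point rounding. Setting a weight to zero removes that variable from the distance entirely.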