Distance measurements for attributes of mixed type

Many partitioning methods use distance measures to determine the similarity or dissimilarity between any pair of objects. It is common to denote the distance between two instances x_i and x_j as d(x_i, x_j). A valid distance measure must be symmetric and attain its minimum value (usually zero) for identical vectors. It is called a metric distance measure if it also satisfies the following properties:

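1. Triangle inequality, for all instances x_i, x_j and x_k:

$$d(x_i, x_k) \le d(x_i, x_j) + d(x_j, x_k)$$

2. Zero distance only between identical instances, for all x_i and x_j:

$$d(x_i, x_j) = 0 \Rightarrow x_i = x_j$$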
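The requirements above (symmetry, a zero minimum for identical vectors, and the triangle inequality) can be checked empirically on a small sample. The sketch below is only an illustration: the helper name `is_metric_on` and the sample points are assumptions, and passing the check on a finite sample does not prove that a measure is a metric in general.

```python
import math
from itertools import product

def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    """Squared Euclidean distance: symmetric, but not a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def is_metric_on(points, dist, tol=1e-12):
    """Check symmetry, identity and the triangle inequality on a finite sample.

    Hypothetical helper for illustration only.
    """
    for a, b in product(points, repeat=2):
        if abs(dist(a, b) - dist(b, a)) > tol:
            return False  # not symmetric
        if (dist(a, b) <= tol) != (a == b):
            return False  # zero distance must coincide with identical points
    for a, b, c in product(points, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            return False  # triangle inequality violated
    return True

sample = [(0.0,), (1.0,), (2.0,)]
euclidean_ok = is_metric_on(sample, euclidean)
squared_ok = is_metric_on(sample, squared_euclidean)
print(euclidean_ok, squared_ok)
```

On this sample the squared Euclidean distance fails because the distance from 0 to 2 is 4, while the two intermediate steps sum to only 2.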

When the instances are characterized by attributes of mixed type, the distance can be calculated by combining different methods. For example, when calculating the distance between instances i and j using a metric such as the Euclidean distance, we can take the difference between nominal and binary attributes to be 0 or 1 ("match" or "mismatch", respectively), and the difference between numeric attributes to be the difference between their normalized values. The square of each of these differences is then added to the total distance. Such a calculation is used in many clustering algorithms.
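The Euclidean-style combination described above might be sketched as follows. This is a minimal illustration under stated assumptions: the function name `mixed_euclidean`, the `kinds` and `ranges` parameters, and the use of min-max normalization for numeric attributes are choices made here, not something the text prescribes.

```python
import math

def mixed_euclidean(a, b, kinds, ranges):
    """Euclidean-style distance over mixed attributes (illustrative sketch).

    kinds[n]  : "nominal" (also used for binary) or "numeric"
    ranges[n] : (min, max) of a numeric attribute, assumed to satisfy max > min
    """
    total = 0.0
    for n, kind in enumerate(kinds):
        if kind == "nominal":
            # 0 for a match, 1 for a mismatch
            diff = 0.0 if a[n] == b[n] else 1.0
        else:
            lo, hi = ranges[n]
            # difference between the min-max normalized values
            diff = (a[n] - b[n]) / (hi - lo)
        total += diff ** 2  # squared differences are accumulated
    return math.sqrt(total)

kinds = ["nominal", "numeric", "nominal"]
ranges = [None, (20.0, 60.0), None]
x_i = ("red", 30.0, True)
x_j = ("blue", 50.0, True)
d = mixed_euclidean(x_i, x_j, kinds, ranges)
print(d)
```

Here the colour mismatch contributes 1, the numeric attribute contributes (20 / 40)^2 = 0.25, and the matching binary attribute contributes 0, so the result is the square root of 1.25.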

The dissimilarity d (x_i, x_j) between two instances, containing p attributes of mixed types, is defined as:

$$d(x_i, x_j) = \frac{\sum_{n=1}^{p} \delta_{ij}^{(n)} \, d_{ij}^{(n)}}{\sum_{n=1}^{p} \delta_{ij}^{(n)}}$$

where the indicator δ_ij^(n) = 0 if one of the values is missing, and 1 otherwise. The contribution of attribute n to the distance between the two objects, d_ij^(n), is calculated according to its type.

If the attribute is binary or categorical:

$$d_{ij}^{(n)} = \begin{cases} 0 & \text{if } x_{i,n} = x_{j,n} \\ 1 & \text{otherwise} \end{cases}$$

If the attribute is continuous-valued (where h ranges over all objects with a non-missing value for attribute n):

$$d_{ij}^{(n)} = \frac{|x_{i,n} - x_{j,n}|}{\max_h x_{h,n} - \min_h x_{h,n}}$$

If the attribute is ordinal, its values are first normalized, and the normalized value z_{i,n} is then treated as a continuous value. A common choice maps the rank r_{i,n} in {1, ..., M_n} to z_{i,n} = (r_{i,n} - 1) / (M_n - 1).
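Putting the three attribute types together, the dissimilarity could be computed roughly as in the sketch below. Several details are assumptions made for illustration: missing values are represented by `None`, ordinal values are given as ranks 1..M_n and mapped to (r - 1) / (M_n - 1), and the names `gower_dissimilarity`, `kinds`, `ranges` and `levels` are hypothetical.

```python
def gower_dissimilarity(x_i, x_j, kinds, ranges, levels):
    """Mixed-type dissimilarity between two instances (illustrative sketch).

    kinds[n]  : "categorical" (also binary), "continuous" or "ordinal"
    ranges[n] : (min, max) over non-missing values of a continuous attribute
    levels[n] : number of ordered states M_n of an ordinal attribute
    Missing values are assumed to be encoded as None.
    """
    weighted_sum = 0.0
    n_compared = 0  # sum of the indicators delta
    for n, kind in enumerate(kinds):
        a, b = x_i[n], x_j[n]
        if a is None or b is None:
            continue  # delta = 0: the attribute is skipped
        if kind == "categorical":
            d_n = 0.0 if a == b else 1.0
        elif kind == "continuous":
            lo, hi = ranges[n]
            d_n = abs(a - b) / (hi - lo)
        elif kind == "ordinal":
            m = levels[n]
            # assumed rank normalization to [0, 1], then treated as continuous
            z_a = (a - 1) / (m - 1)
            z_b = (b - 1) / (m - 1)
            d_n = abs(z_a - z_b)
        else:
            raise ValueError(f"unknown attribute kind: {kind!r}")
        weighted_sum += d_n
        n_compared += 1
    if n_compared == 0:
        raise ValueError("no attribute is observed in both instances")
    return weighted_sum / n_compared

kinds = ["categorical", "continuous", "ordinal", "categorical"]
ranges = [None, (20.0, 60.0), None, None]
levels = [None, None, 3, None]
x_i = ("red", 30.0, 1, None)   # last attribute is missing
x_j = ("blue", 50.0, 3, "yes")
d = gower_dissimilarity(x_i, x_j, kinds, ranges, levels)
print(d)
```

In this example the last attribute is missing for one instance, so only three attributes are compared; their contributions are 1, 0.5 and 1, giving 2.5 / 3.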
