Hopkins statistic

Before clustering a dataset, we can test whether it contains clusters at all. To do so, we test the hypothesis that the data contain patterns against the hypothesis that they are uniformly distributed (homogeneous distribution).

The Hopkins statistic is calculated as follows:

  1. Sample n points p_i uniformly at random from the dataset D and calculate the distance d(p_i) from each one to its nearest neighbor in D (other than the point itself)
  2. Generate n points q_i uniformly distributed in the space of the dataset and calculate the distance d(q_i) from each one to its nearest neighbor in D
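  3. Calculate the quotient H, here in its commonly used form with the distances of the generated points in the numerator:

     $$H = \frac{\sum_{i=1}^{n} d(q_i)}{\sum_{i=1}^{n} d(p_i) + \sum_{i=1}^{n} d(q_i)}$$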

If the data are uniformly distributed, d(p_i) and d(q_i) are similar on average, so the value of H will be approximately 0.5.
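
The three steps can be sketched in a few lines of Python. This is a minimal brute-force illustration, not a reference implementation, and it rests on several assumptions that are not in the text above: the quotient is taken as H = sum d(q_i) / (sum d(p_i) + sum d(q_i)), distances are Euclidean, "the space of the dataset" is approximated by its axis-aligned bounding box, and the names `nearest_distance` and `hopkins` are hypothetical.

```python
import math
import random


def nearest_distance(point, data, skip_index=None):
    """Euclidean distance from `point` to its nearest neighbor in `data`.

    `skip_index` excludes one row of `data`, so that a sampled data point
    is not matched with itself.
    """
    return min(
        math.dist(point, other)
        for j, other in enumerate(data)
        if j != skip_index
    )


def hopkins(data, n, rng):
    """Brute-force Hopkins statistic H for a list of equal-length tuples."""
    dims = len(data[0])
    # Assumption: the bounding box of the data stands in for "the space
    # of the dataset".
    lows = [min(row[k] for row in data) for k in range(dims)]
    highs = [max(row[k] for row in data) for k in range(dims)]

    # Step 1: sample n real points p_i and measure d(p_i).
    sampled = rng.sample(range(len(data)), n)
    d_p = [nearest_distance(data[i], data, skip_index=i) for i in sampled]

    # Step 2: generate n uniform points q_i and measure d(q_i).
    d_q = []
    for _ in range(n):
        q = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        d_q.append(nearest_distance(q, data))

    # Step 3: the quotient H (assumed convention: d(q_i) in the numerator).
    return sum(d_q) / (sum(d_p) + sum(d_q))


rng = random.Random(0)
# Synthetic test data: one uniform sample and one with two tight clusters.
uniform_data = [(rng.random(), rng.random()) for _ in range(500)]
clustered_data = [
    (rng.gauss(cx, 0.02), rng.gauss(cy, 0.02))
    for cx, cy in [(0.2, 0.2), (0.8, 0.8)]
    for _ in range(250)
]

h_uniform = hopkins(uniform_data, n=50, rng=rng)
h_clustered = hopkins(clustered_data, n=50, rng=rng)
print(f"uniform:   H = {h_uniform:.2f}")
print(f"clustered: H = {h_clustered:.2f}")
```

Under the convention assumed here, the uniform sample should give a value near 0.5 and the clustered sample a value close to 1. The nearest-neighbor search is a plain linear scan, which is fine for a few hundred points; for larger datasets a spatial index would be the more practical choice.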
