Before clustering a dataset, we can test whether it actually contains clusters. This means testing the hypothesis that the data contain patterns against the null hypothesis that the data are uniformly distributed (a homogeneous distribution).
The Hopkins statistic is computed as follows:
- Sample n points (p_i) uniformly at random from the dataset (D) and compute the distance from each one to its nearest neighbor in D, other than itself (d(p_i))
- Generate n points (q_i) uniformly distributed in the space of the dataset and compute the distance from each one to its nearest neighbor in D (d(q_i))
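- Compute the quotient H. Conventions differ between references; a common form, which has the property stated next, is $H = \frac{\sum_{i=1}^{n} d(q_i)}{\sum_{i=1}^{n} d(p_i) + \sum_{i=1}^{n} d(q_i)}$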
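If the data are uniformly distributed, the two sets of distances are similar on average and the value of H will be around 0.5; values far from 0.5 indicate a departure from uniformity.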
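
The procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation, and it rests on several assumptions that are not in the text above: the quotient is taken as sum d(q_i) / (sum d(p_i) + sum d(q_i)), so clustered data push H towards 1 (some references use the complementary form); the "space of the dataset" is approximated by its axis-aligned bounding box; and distances are Euclidean. The names `hopkins` and `nearest_distance` are hypothetical helpers written for this example.

```python
import math
import random


def nearest_distance(point, data, skip_index=None):
    """Distance from `point` to its nearest neighbor in `data`.

    `skip_index` excludes one row of `data`, so that a sampled data
    point is not matched with itself.
    """
    best = math.inf
    for j, other in enumerate(data):
        if j == skip_index:
            continue
        best = min(best, math.dist(point, other))
    return best


def hopkins(data, n, rng):
    """Hopkins statistic with the assumed form sum d(q) / (sum d(p) + sum d(q))."""
    dims = len(data[0])
    # Bounding box of the data: the assumed "space of the dataset".
    lows = [min(row[k] for row in data) for k in range(dims)]
    highs = [max(row[k] for row in data) for k in range(dims)]

    # p_i: n distinct points sampled from D, each compared with the rest of D.
    sample_idx = rng.sample(range(len(data)), n)
    d_p = sum(nearest_distance(data[i], data, skip_index=i) for i in sample_idx)

    # q_i: n synthetic points drawn uniformly inside the bounding box.
    d_q = 0.0
    for _ in range(n):
        q = [rng.uniform(lows[k], highs[k]) for k in range(dims)]
        d_q += nearest_distance(q, data)

    return d_q / (d_p + d_q)


rng = random.Random(0)
uniform_data = [(rng.random(), rng.random()) for _ in range(500)]
# Two tight Gaussian blobs as a clearly clustered example.
clustered_data = [
    (rng.gauss(cx, 0.02), rng.gauss(cy, 0.02))
    for cx, cy in [(0.2, 0.2), (0.8, 0.8)]
    for _ in range(250)
]

h_uniform = hopkins(uniform_data, 50, rng)
h_clustered = hopkins(clustered_data, 50, rng)
print(f"uniform:   H = {h_uniform:.2f}")
print(f"clustered: H = {h_clustered:.2f}")
```

With this convention the uniform sample should give a value near 0.5 and the two-blob sample a value close to 1. The brute-force nearest-neighbor search costs O(n · |D|) distance evaluations, which is fine for a sketch; a spatial index would be the usual choice for larger datasets.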