A fast algorithm for computing distance correlation

Chaudhuri, Arin, Hu, Wenhao
Computational Statistics & Data Analysis, 2019, v.135, pp. 15–24
Keywords: algorithms, covariance, data collection
Classical dependence measures such as the Pearson correlation, Spearman’s ρ, and Kendall’s τ can detect only monotonic or linear dependence. To overcome these limitations, Székely et al. proposed the distance covariance and the correlation derived from it. The distance covariance is a weighted L2 distance between the joint characteristic function and the product of the marginal characteristic functions; it is 0 if and only if two random vectors X and Y are independent. This measure can therefore detect the presence of any dependence structure when the sample size is large enough. Székely et al. further showed that the sample distance covariance can be calculated simply from double-centered Euclidean distance matrices, which typically requires O(n²) cost, where n is the sample size. Quadratic computing time greatly limits the use of the distance covariance for large data. For the sample distance covariance between two univariate random variables, a simple, exact O(n log n) algorithm is developed. The proposed algorithm essentially consists of two sorting steps, so it is easy to implement. Empirical results show that the proposed algorithm is significantly faster than state-of-the-art methods. The algorithm’s speed will enable researchers to explore complicated dependence structures in large datasets.
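To make the O(n²) baseline concrete, the following is a minimal sketch (in NumPy) of the standard sample distance covariance of Székely et al., built from double-centered pairwise distance matrices. This is the naive quadratic-time computation the abstract refers to, not the paper's O(n log n) algorithm; the function name `distance_covariance` is our own label for illustration.

```python
import numpy as np

def distance_covariance(x, y):
    """Naive O(n^2) sample distance covariance (V-statistic form) for
    univariate samples x and y, following Szekely et al. The paper's
    contribution is an O(n log n) algorithm for this same quantity."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-difference (Euclidean) distance matrices: O(n^2).
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix: subtract row and column means,
    # add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Squared sample distance covariance is the mean elementwise product;
    # clamp at 0 to guard against tiny negative floating-point round-off.
    return np.sqrt(max((A * B).mean(), 0.0))
```

Normalizing by the corresponding distance variances, dCor(x, y) = dCov(x, y) / sqrt(dCov(x, x) · dCov(y, y)), yields the distance correlation, which equals 1 for an exact linear relationship.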