Session S17 - Stochastic Systems: Analysis, Numerics and Applications
Friday, July 16, 12:40 ~ 13:15 UTC-3
Yule's "nonsense correlation" for Gaussian random walks
Frederi Viens
Michigan State University, United States
We provide an exact formula for the second moment of the empirical correlation of two independent Gaussian random walks, as well as implicit formulas for higher moments, and some practical theorems and numerical results in the in-fill-asymptotics regime.
This empirical correlation $\rho_n$, defined for two related series of data of length $n$ using the standard Pearson correlation statistic, which is appropriate for i.i.d. data with two moments, is known as Yule's "nonsense correlation" in honor of the statistician G. Udny Yule. In 1926, Yule described the phenomenon by which random walks and other time series are not appropriate inputs for this statistic when it is used to gauge the independence of data series. He observed empirically that its distribution is not concentrated around 0 but is diffuse over the entire interval $(-1,1)$. This well-documented effect has been roundly ignored by many scientists over the decades, up to the present day, even sparking recent controversies in important areas like climate-change attribution. Since the 1960s, probability theorists have wanted to remove any possible ambiguity about the issue by computing the variance of the continuous-time version $\rho$ of Yule's nonsense correlation, based on the paths of two independent Brownian motions. This problem eluded the best minds until it was finally closed by Philip Ernst and two co-authors 90 years after Yule's observation, in a paper published in 2017 in the Annals of Statistics. The more practical question of what happens with $\rho_n$ in discrete time remained. We address it here by computing the moments of $\rho_n$ in the case of Gaussian data, the second moment being explicit. We also relate $\rho$ and $\rho_n$ by estimating the speed of convergence of the second moment of their difference, which we find tends to zero at the rate $1/n^2$. This result is important in practice, since it could help justify using statistical properties of $\rho$ when devising tests for pairs of time series of moderate length.
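The phenomenon described above can be illustrated with a small Monte Carlo sketch. This is not the method of the talk, which computes moments exactly; it only simulates the Pearson statistic for pairs of independent Gaussian random walks and compares it with the same statistic on i.i.d. Gaussian data. The function name `pearson_rows`, the sample sizes, and the seed are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def pearson_rows(x, y):
    """Pearson correlation between matching rows of x and y (shape: reps x n)."""
    # Center each row by its own time average.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    num = (xc * yc).sum(axis=1)
    den = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return num / den


# Illustrative choices (assumptions, not values from the talk).
n, reps = 200, 5000
rng = np.random.default_rng(0)

# Independent standard Gaussian increments for the two series.
dx = rng.standard_normal((reps, n))
dy = rng.standard_normal((reps, n))

# i.i.d. data: the statistic concentrates near 0.
rho_iid = pearson_rows(dx, dy)

# Random walks: cumulative sums of the same increments.
rho_walk = pearson_rows(np.cumsum(dx, axis=1), np.cumsum(dy, axis=1))

second_moment_iid = float(np.mean(rho_iid ** 2))
second_moment_walk = float(np.mean(rho_walk ** 2))
share_large = float(np.mean(np.abs(rho_walk) > 0.5))

print(f"second moment, i.i.d. data:   {second_moment_iid:.4f}")
print(f"second moment, random walks:  {second_moment_walk:.4f}")
print(f"share of walks with |rho_n| > 0.5: {share_large:.3f}")
```

In such a run, the second moment for i.i.d. data should be of order $1/n$, while for random walks it stays bounded away from zero however large $n$ is, which is the diffuse behavior Yule observed.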
In this presentation, we outline the ideas behind the proofs of our results. They are based on a symbolically tractable integro-differential representation formula for the moments of any order in a class of empirical correlations, which was first established and investigated in the aforementioned paper by Ernst et al. and in a 2019 arXiv preprint by Ernst, L.C.G. Rogers, and Quan Zhou. It is only because we succeeded in computing moment generating functions of the various objects used in defining $\rho_n$ that we succeeded in estimating the speed of convergence of its variance. We conjecture that the rate $1/n^2$ applies because of the random-walk structure (independence of increments), while for other types of time series, such as mean-reverting ones, the rate is instead $1/n$.
This work is partially supported by the US National Science Foundation award DMS-1811779.
Joint work with Philip Ernst (Rice University, Houston, TX, USA) and Dongzhou Huang (Rice University, Houston, TX, USA).