\(\newcommand{\abs}[1]{\left\lvert#1\right\rvert}\) \(\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\) \(\newcommand{\as}{\overset{a.s.}{\to}}\) \(\DeclareMathOperator*{\E}{\mathbb{E}}\)

Classical vs High-dimensional


Let $n$ be the number of observations and $p$ the number of variables. The classical regime lets $n$ diverge while holding $p$ fixed. In contrast, the high-dimensional regime lets both $n$ and $p$ diverge, with $p/n \to \gamma > 0$. Many classical results break down in that case. Here I consider the eigenvalues and eigenvectors of a high-dimensional covariance matrix. This has immediate implications for covariance estimation, and also for every statistical tool built on covariance estimates: PCA, GLS, GMM, classification, portfolio optimization, etc.

Consider a simple case   $X_i \overset{iid}{\sim} \mathcal{N}_p(\mathbf{0}, \Sigma),\quad i=1,\ldots, n.$ How should we estimate $\Sigma$?

Some notation:

Sample covariance estimator   $S = \frac{1}{n}\sum_{i=1}^n X_iX_i' = \frac{1}{n} X'X,$ where $X$ is the $n\times p$ matrix with rows $X_i'.$
Eigendecompositions   $\Sigma = ULU' = \sum_{j=1}^p \ell_j \mathrm{u}_j \mathrm{u}_j', \quad S = V\Lambda V' = \sum_{j=1}^p \lambda_j \mathrm{v}_j \mathrm{v}_j'.$
Eigenvalues are assumed distinct and sorted in decreasing order. Eigenvectors are normalized so that their first element is positive.
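The notation above can be made concrete with a short NumPy sketch. It is only an illustration: the function name `sample_covariance_eig`, the diagonal `Sigma`, and the sizes `n` and `p` are assumptions chosen for the example, not part of the original setup.

```python
import numpy as np

def sample_covariance_eig(X):
    """Return (S, lam, V) with S = X'X/n = V diag(lam) V'."""
    n = X.shape[0]
    S = X.T @ X / n                      # the mean is known to be zero, so no centering
    lam, V = np.linalg.eigh(S)           # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]        # re-sort in decreasing order
    lam, V = lam[order], V[:, order]
    signs = np.where(V[0, :] < 0, -1.0, 1.0)
    V = V * signs                        # flip columns so the first element is positive
    return S, lam, V

rng = np.random.default_rng(0)
n, p = 200, 5
Sigma = np.diag([5.0, 4.0, 3.0, 2.0, 1.0])   # hypothetical population covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S, lam, V = sample_covariance_eig(X)
print(np.round(lam, 3))
```

Flipping the sign of an eigenvector leaves $\mathrm{v}_j\mathrm{v}_j'$ unchanged, so the sign convention only pins down a unique representative.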

Classical Regime

In the classical regime, $S$ is a very good estimator (Anderson 1963, Van der Vaart 2000):
Unbiased   $\E(S) = \Sigma.$
Consistent   $S \as \Sigma$ as $n\to\infty.$
Asymptotically normal eigenvalues   \(\sqrt{n}(\lambda_j-\ell_j) \overset{d}{\to} \mathcal{N}(0,2\ell_j^2), \quad j=1,\ldots,p.\)
Invertible   $S^{-1}$ exists almost surely once $n \ge p$ (for positive definite $\Sigma$).
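The asymptotic normality of the sample eigenvalues can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: the population eigenvalues `ell`, the sample size `n`, and the number of replications `reps` are assumed values, and $\Sigma$ is taken to be diagonal for convenience.

```python
import numpy as np

rng = np.random.default_rng(1)
ell = np.array([4.0, 2.0, 1.0])   # hypothetical population eigenvalues, Sigma = diag(ell)
n, reps = 2000, 400               # p = 3 stays fixed while n is large

# Draw `reps` independent samples of size n from N(0, diag(ell)).
X = rng.standard_normal((reps, n, ell.size)) * np.sqrt(ell)
S = np.swapaxes(X, 1, 2) @ X / n          # stack of sample covariances, shape (reps, p, p)
lam = np.linalg.eigvalsh(S)[:, ::-1]      # sample eigenvalues in decreasing order

z = np.sqrt(n) * (lam - ell)              # centered and scaled eigenvalues
ratio = z.var(axis=0) / (2 * ell**2)      # close to 1 if the variance is 2 * ell_j^2
print(np.round(ratio, 2))
```

With $p$ fixed and $n$ large, each entry of `ratio` should be near one, up to Monte Carlo noise.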

It gets trickier in high dimensions. What happens to the eigenvalues and eigenvectors is especially interesting. There are three key features: eigenvalue spreading, eigenvalue bias, and eigenvector inconsistency.

High-dimensional Regime

Eigenvalue spreading

Marchenko-Pastur (1967)

In high dimensions, the sample eigenvalues $\lambda_j$ are more spread out than their population counterparts $\ell_j.$ In fact, the higher the dimension relative to the sample size, the greater the spreading.
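A quick simulation illustrates the spreading. This is a sketch under assumed settings: $\Sigma = I_p$, so every population eigenvalue equals 1, with a fixed `n = 400` and a few hypothetical values of `p`. The helper name `sample_eigenvalue_range` is made up for the example.

```python
import numpy as np

def sample_eigenvalue_range(n, p, rng):
    """Smallest and largest eigenvalue of S = X'X/n when Sigma = I_p."""
    X = rng.standard_normal((n, p))
    S = X.T @ X / n
    lam = np.linalg.eigvalsh(S)    # ascending order
    return lam[0], lam[-1]

rng = np.random.default_rng(0)
n = 400
spreads = {}
for p in (20, 100, 200, 400):
    lo, hi = sample_eigenvalue_range(n, p, rng)
    spreads[p] = hi - lo           # width of the sample spectrum
    print(f"p/n = {p / n:.2f}: sample eigenvalues in [{lo:.3f}, {hi:.3f}]")
```

Although all population eigenvalues are equal, the range of the sample eigenvalues widens as $p/n$ grows.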

Consider the case when $\Sigma = I_p,$ i.e. $\ell_1 = \ldots = \ell_p = 1,$ and $p/n \to \gamma \in (0, 1].$

Empirical distribution of the eigenvalues of the sample covariance   \(F_p(x) := \frac{1}{p} \# \{ \lambda_j\le x \}\)

Ukrainian mathematicians Marchenko & Pastur (MP) showed that this empirical distribution converges, $F_p(x) \to F(x),$ with the limit pdf and cdf given on $[\lambda_-, \lambda_+]$ by:

\[f^{MP}(x) = \frac{\sqrt{(\lambda_+-x)(x-\lambda_-)}}{2\pi x \gamma}, \quad \lambda_+ = (1+\sqrt{\gamma})^2, \quad \lambda_- = (1-\sqrt{\gamma})^2.\]
\[\begin{split} F(x) = & \frac{1}{2} + \frac{1}{2\pi \gamma} \Big[\sqrt{(\lambda_+-x)(x-\lambda_-)} \\ & + (1+\gamma)\arcsin\left(\frac{x-1-\gamma}{2\sqrt{\gamma}}\right) + (1-\gamma)\arcsin\left(\frac{(1-\gamma)^2-(1+\gamma)x}{2x\sqrt{\gamma}}\right)\Big]. \end{split}\]
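The two formulas can be transcribed directly into code, which also gives a way to check them against each other. The sketch below uses only the standard library and assumes $0 < \gamma \le 1$. The function names `mp_edges`, `mp_pdf`, and `mp_cdf` are hypothetical, and the clamping inside `_asin` is a numerical safeguard added here, not part of the formulas.

```python
import math

def mp_edges(gamma):
    """Support endpoints (lambda_minus, lambda_plus) of the MP law."""
    return (1 - math.sqrt(gamma)) ** 2, (1 + math.sqrt(gamma)) ** 2

def mp_pdf(x, gamma):
    """MP density for Sigma = I_p and 0 < gamma <= 1; zero outside the support."""
    lo, hi = mp_edges(gamma)
    if x <= lo or x >= hi:
        return 0.0
    return math.sqrt((hi - x) * (x - lo)) / (2 * math.pi * gamma * x)

def _asin(t):
    # Clamp to [-1, 1] to guard against rounding error at the support edges.
    return math.asin(max(-1.0, min(1.0, t)))

def mp_cdf(x, gamma):
    """MP distribution function for 0 < gamma <= 1, using the closed form above."""
    lo, hi = mp_edges(gamma)
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    root = math.sqrt(max((hi - x) * (x - lo), 0.0))
    a = _asin((x - 1 - gamma) / (2 * math.sqrt(gamma)))
    b = _asin(((1 - gamma) ** 2 - (1 + gamma) * x) / (2 * x * math.sqrt(gamma)))
    return 0.5 + (root + (1 + gamma) * a + (1 - gamma) * b) / (2 * math.pi * gamma)

gamma = 0.25
lo, hi = mp_edges(gamma)
print(f"support: [{lo:.4f}, {hi:.4f}], F(1) = {mp_cdf(1.0, gamma):.4f}")
```

Differentiating the bracket in $F(x)$ term by term gives $\sqrt{(\lambda_+-x)(x-\lambda_-)}/x$, so $F'(x) = f^{MP}(x)$. Integrating `mp_pdf` numerically from $\lambda_-$ to $x$ should therefore reproduce `mp_cdf(x, gamma)`.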