\(\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\) \(\DeclareMathOperator*{\E}{\mathbb{E}}\)

**This section is under development!**

# Classical vs High-dimensional

Let $n$ be the number of observations and $p$ the number of variables.
The **classical** regime lets $n$ diverge while holding $p$ fixed. In contrast, the **high-dimensional** regime lets both $n$ and $p$ diverge, with $p/n \to \gamma > 0$. Many classical results break down in that case. Here I consider the eigenvalues and eigenvectors of a high-dimensional covariance matrix. This has immediate implications for covariance estimation, and also for every statistical tool built on covariance estimates: PCA, GLS, GMM, classification, portfolio optimization, etc.

Consider the simple case $X_i \overset{iid}{\sim} \mathcal{N}_p(\mathbf{0}, \Sigma),\quad i=1,\ldots, n.$ How should we estimate $\Sigma$?

Some notation:

Sample covariance estimator: $S = \frac{1}{n}\sum_{i=1}^n X_iX_i' = \frac{1}{n} X'X,$ where $X$ is the $n \times p$ matrix with rows $X_i'.$ Eigendecompositions: $\Sigma = ULU' = \sum_{j=1}^p \ell_j \mathrm{u}_j \mathrm{u}_j', \quad S = V\Lambda V' = \sum_{j=1}^p \lambda_j \mathrm{v}_j \mathrm{v}_j'.$ Eigenvalues are assumed distinct and sorted in decreasing order. Eigenvectors are normalized so that their first element is positive.
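To make the notation concrete, here is a minimal NumPy sketch of $S$ and its eigendecomposition under the conventions above (decreasing eigenvalues, first element of each eigenvector positive). The helper names `sample_covariance` and `sorted_eig`, the seed, and the diagonal $\Sigma$ with eigenvalues $(3, 2, 1)$ are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def sample_covariance(X):
    """S = X'X / n for an n-by-p data matrix X whose rows have mean zero."""
    n = X.shape[0]
    return X.T @ X / n

def sorted_eig(A):
    """Eigenvalues in decreasing order; eigenvector columns with first element positive."""
    vals, vecs = np.linalg.eigh(A)          # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]          # reorder to decreasing
    vals, vecs = vals[order], vecs[:, order]
    signs = np.where(vecs[0, :] < 0, -1.0, 1.0)
    return vals, vecs * signs               # flip columns whose first entry is negative

rng = np.random.default_rng(0)
n, p = 500, 3
ell = np.array([3.0, 2.0, 1.0])             # assumed population eigenvalues
Sigma = np.diag(ell)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

S = sample_covariance(X)
lam, V = sorted_eig(S)
print("sample eigenvalues:", lam)
```

Flipping the sign of an eigenvector does not change $\mathrm{v}_j \mathrm{v}_j'$, so the normalization only pins down a unique representative and leaves $S = V\Lambda V'$ intact.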

# Classical Regime

In the classical regime, $S$ is a very good estimator (Anderson 1963, Van der Vaart 2000):

Unbiased: $\E(S) = \Sigma.$

Asymptotically normal eigenvalues: \(\sqrt{n}(\lambda_j-\ell_j) \overset{d}{\to} \mathcal{N}(0,2\ell_j^2), \quad j=1,\ldots,p.\)
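The asymptotic variance $2\ell_j^2$ can be checked with a small Monte Carlo experiment. The sketch below fixes $p = 2$ and compares the simulated variance of $\sqrt{n}(\lambda_1 - \ell_1)$ with $2\ell_1^2$. The choices $\Sigma = \mathrm{diag}(2, 1)$, $n = 400$, 2000 replications, and the seed are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
ell = np.array([2.0, 1.0])          # assumed population eigenvalues, Sigma = diag(ell)
n, reps = 400, 2000                 # assumed sample size and number of replications

# Draw `reps` independent samples of size n from N(0, diag(ell)).
X = rng.standard_normal((reps, n, ell.size)) * np.sqrt(ell)

# Stack of sample covariance matrices, shape (reps, p, p).
S = X.transpose(0, 2, 1) @ X / n

# eigvalsh sorts each set of eigenvalues in ascending order; the last is the largest.
lam_max = np.linalg.eigvalsh(S)[:, -1]

z = np.sqrt(n) * (lam_max - ell[0])
print("Monte Carlo variance:         ", z.var())
print("Asymptotic variance 2*ell_1^2:", 2 * ell[0] ** 2)
```

With $p$ this small relative to $n$, the two numbers should be close, up to simulation noise.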

*None of this holds in high dimensions*
What happens to the eigenvalues and eigenvectors in high dimensions is especially interesting. There are three key features: **eigenvalue spreading**, **eigenvalue bias** and **eigenvector inconsistency**.

# High-dimensional Regime

## Eigenvalue spreading

#### Marchenko-Pastur (1967)

In high dimensions, the sample eigenvalues $\lambda_j$ are more spread out than their population counterparts $\ell_j.$ In fact, the higher the dimension relative to the sample size, the greater the spreading.

Consider the case $\Sigma = I_p,$ i.e. $\ell_1 = \ldots = \ell_p = 1,$ with $p/n \to \gamma \le 1.$

Empirical distribution of the eigenvalues of the sample covariance: \(F_p(x) := \frac{1}{p} \# \{ \lambda_j\le x \}\)
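A quick simulation illustrates the spreading when $\Sigma = I_p$: every population eigenvalue equals 1, yet the sample eigenvalues fan out more and more as $p/n$ grows. The sketch also evaluates $F_p(x)$ at a point. The function names `sample_eigenvalues` and `empirical_cdf`, the seed, and the values $n = 1000$, $p \in \{10, 100, 500\}$ are assumptions chosen for illustration.

```python
import numpy as np

def sample_eigenvalues(n, p, rng):
    """Eigenvalues (decreasing) of S = X'X/n with X_i ~ N(0, I_p)."""
    X = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(X.T @ X / n)[::-1]

def empirical_cdf(lam, x):
    """F_p(x): the fraction of sample eigenvalues that are <= x."""
    return np.mean(lam <= x)

rng = np.random.default_rng(2)
n = 1000
spreads = {}
for p in (10, 100, 500):
    lam = sample_eigenvalues(n, p, rng)
    spreads[p] = lam.max() - lam.min()
    print(f"p/n = {p / n:.2f}: min = {lam.min():.3f}, max = {lam.max():.3f}, "
          f"F_p(1) = {empirical_cdf(lam, 1.0):.3f}")
```

The average of the sample eigenvalues stays close to 1 because it equals $\mathrm{tr}(S)/p$; the distortion shows up in the range of the eigenvalues, not in their mean.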

Ukrainian mathematicians Marchenko & Pastur (MP) showed that this empirical distribution converges, $F_p(x) \to F(x),$ with the limit pdf given by: