
× This section is under development!

A

ssume that eigenvectors are normalized to unit $\ell_2$-norm and have a nonnegative first component. The sign convention guarantees uniqueness of the eigenvector associated with an eigenvalue with geometric multiplicity one.

# Vector norms

For $\mathrm{v} = (v_i)_{i=1}^n \in \mathbb{R}^n$.

$\norm{\mathrm{v}}_p = (\sum_{i}^n\abs{v_i}^p)^{1/p},$ hence it is a norm for $p=[1,\infty].$ $\norm{\mathrm{v}}_0 =$ # of nonzero elements. $\norm{\mathrm{v}}_\infty = \underset{1\le i \le n}{\max} \abs{v_i}$.

# Matrix norms

For $A \in \mathbb{R}^{m \times n}, r = rank(A) \le \min(m,n)$, SVs sorted $s_1 \ge s_2 \ge \ldots$.

### Operator/Spectral norm

$\norm{A} = \underset{x\in\mathbb{R}^n\backslash \{0\}}{\max} \frac{\norm{Ax}_2}{\norm{x}_2} = \underset{x\in S^{n-1}}{\max} \norm{Ax}_2 = \underset{x\in S^{n-1},\ y\in S^{m-1}}{\max} \inner{Ax,y} = s_1(A) = \norm{s}_\infty$

### Frobenius/Euclidean norm

$\norm{A}_F = \sqrt{\sum_i^m\sum_j^n \abs{A_{ij}}^2} = \sqrt{\sum_i^r s_i^2(A_{i})} = \sqrt{tr(A'A)} = \sqrt{\inner{A,A}} = \norm{s}_2$

$\norm{AA'}_F = \norm{A'A}_F \le \norm{A}_F^2$ $\norm{A+B}_F^2 = \norm{A}_F^2 + \norm{B}_F^2 + 2\inner{A+B}_F$ $\norm{A}_F \le \norm{A} _* \le \sqrt{r}\norm{A}_F$ if $A = U\Sigma V'$, then $\norm{A}_F^2 = \norm{\Sigma}_F^2$ $\norm{A}_F^2 = \sum_i^n \norm{\mathbb{a}_i}^2,$ where $\mathbb{a}_i$ are columns.

### HS norm

For $X,Y \in \mathbb{R}^{n\times n}$, the matrix inner product is $\inner{X,Y} = tr(XY')$, which induces the Hilbert-Schmidt norm $\norm{X}_{HS} = \sqrt{\inner{X,X}}$.

## Inequalities

$\norm{A} \le \norm{A}_F \le \sqrt{r} \norm{A}$. $s_i \le \frac{1}{\sqrt{i}} \norm{A}_F$. $\norm{s}_\infty \le \norm{s}_2 \le \sqrt{r} \norm{s} _\infty$.

To be added a lot more!

# Asymptotic Notation

For functions $f,g$:

• $f_n = \mathcal{O}(g_n)$ means $\exists C\in (0,\infty)$ s.t. $f_n \le Cg_n$.
• $f_n = \Omega(g_n)$ means $\exists C\in (0,\infty)$ s.t. $f_n \ge Cg_n$.
• $f_n = \Theta(g_n)$ means $f_n = \mathcal{O}(g_n)$ and $f_n = \Omega(g_n)$.
• Note: no $\cdot_p$ means $C$ is deterministic, nonrandom.

#### $\mathcal{O}_p$ and $\mathcal{o}_p$

$\boxed{X_n = \mathcal{O}_p(g_n) \Longleftrightarrow \forall \epsilon>0, \, \exists M>0 \; \text{ s.t. } \; \mathbb{P}(\abs{X_n/g_n}\ge M) < \epsilon}$ $\boxed{X_n = \mathcal{o}_p(g_n) \Longleftrightarrow \forall \epsilon>0 \quad \underset{n\to\infty}{\lim} \mathbb{P}(\abs{X_n/g_n}\ge \epsilon) = 0}$

if $X_n = \mathcal{O}_p(f_n)$ and $Y_n = \mathcal{O}_p(g_n)$:

• $X_n Y_n = \mathcal{O}_p(f_ng_n)$.
• $\abs{X_n}^s = \mathcal{O}_p(f_n^s), \quad s>0$.
• $X_n + Y_n = \max\{\mathcal{O}_p(f_n), \mathcal{O}_p(g_n)\}$.

if $X_n = \mathcal{o}_p(f_n)$ and $Y_n = \mathcal{o}_p(g_n)$:

• $X_n Y_n = \mathcal{o}_p(f_ng_n)$.
• $\abs{X_n}^s = \mathcal{o}_p(f_n^s), \quad s>0$.
• $X_n + Y_n = \max\{\mathcal{o}_p(f_n), \mathcal{o}_p(g_n)\}$.

if $X_n = \mathcal{o}_p(f_n)$ and $Y_n = \mathcal{O}_p(g_n)$:

• $X_n Y_n = \mathcal{o}_p(f_ng_n)$.
• $X_n + Y_n = \mathcal{O}_p(g_n)$.

## Continuous mapping Thm

Given $f: \mathbb{R}^k \to \mathbb{R}^m$ is “almost surely continuous”.

• $X_n \overset{d}{\to} X \Longrightarrow f(X_n) \overset{d}{\to} f(X)$.
• $X_n \overset{d}{\to} X \Longrightarrow f(X_n) \overset{p}{\to} f(X)$.
• $X_n \overset{d}{\to} X \Longrightarrow f(X_n) \overset{as}{\to} f(X)$.

## Slutsky’s Lemma

if $X_n \overset{d}{\to} X$ and $Y_n \overset{d}{\to} c$:

• $X_n + Y_n \overset{d}{\to} X + c$.
• $X_nY_n \overset{d}{\to} cX$.
• $X_n/Y_n \overset{d}{\to} X/c$.

## SVD

$\underset{n\times m}{A} = \sum_i^r s_i\mathrm{u}_i\mathrm{v}_i',$ where $r=rank(A)$.

$s_i(A) = \sqrt{\lambda_i(AA')} = \sqrt{\lambda_i(A'A)}$.

If $A$ is symmetric, then also: $s_i(A) = \abs{\lambda_i(A)}$

### Courant-Fisher min-max Thm

Courant–Fischer variational representation of max eigenvalue & eigenvector:

$\mathrm{v}_1(\widehat{\Sigma}) = \underset{\norm{z}_2 = 1}{\argmax} \; z'\widehat{\Sigma}z$

Alternative equivalent variational representation is in terms of the semidefinite program (SDP):

$Z^* = \underset{Z\in\mathbb{S}^p_+, \, tr(Z)=1}{\argmax} \, tr(\widehat{\Sigma}Z)$

### Johnson–Lindenstrauss

Given a set $Q$ of $q$ points in $\mathbb{R}^N$ with $N$ typically large, we would like to embed these points into a lower-dimensional Euclidean space $\mathbb{R}^n$ while approximately preserving the relative distances between any two of these points. The question is how small can we make $n$ (relative to $q$) and which types of embeddings work?

Given $\epsilon \in (0, 1),$ for every set $Q$ of $q$ points in $\mathbb{R}^N$, if $n$ is a positive integer such that $n>n_0 = O(\ln(q)/\epsilon^2)$, there exists a Lipschitz f’n $f: \mathbb{R}^N \to \mathbb{R}^n$ s.t.
$(1-\epsilon)\norm{u-v}^2_2 \le \norm{f(u) - f(v)}^2_2 \le (1+\epsilon)\norm{u-v}^2_2, \quad \forall \; u,v\in Q.$