\(\newcommand{\abs}[1]{\left\lvert#1\right\rvert}\) \(\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\) \(\newcommand{\inner}[1]{\left\langle#1\right\rangle}\) \(\newcommand{\as}{\overset{a.s.}{\to}}\) \(\newcommand{\d}{\overset{d}{\to}}\) \(\DeclareMathOperator*{\argmin}{arg\,min}\) \(\DeclareMathOperator*{\argmax}{arg\,max}\) \(\DeclareMathOperator*{\E}{\mathbb{E}}\) \newtheorem*{lemma}{Lemma} %Stars mean no numbering
Add: Bandwidth Estimation, As properties, LC, LL
Nonparametric Density Estimation
-
For discrete $X$:
\(\displaystyle \; \hat{f}(x)=\frac{n_0}{n}=\frac{1}{h}\frac{\text{# of $x_i=x$}}{n}=\frac{1}{n}\sum_{i=1}^{n} \mathbb{1} \{x_i=x \}\) -
For continuous $X$:
\(\displaystyle \hat{f}(x)=\frac{1}{h}\frac{n_0}{n}=\frac{1}{h}\frac{\text{# of $x_i \in $}(x-h/2,x+h/2)}{n}=\frac{1}{hn}\sum_{i=1}^{n} \mathbb{1} \{\frac{x_i-x}{h} \}\)
Note: \(f(x)=\frac{d}{dx} F(x)={\displaystyle\lim_{h\to0}}\frac{F(x+h/2)-F(x-h/2)}{h}= {\displaystyle\lim_{h \to 0}}\frac{P(x-h/2< X < x+h/2)}{h}\) -
$\hat{f}(x)$ is not differentiable, use kernel, Rosenblatt (1952): \(\hat{f}(x)=\dfrac{1}{nh}\sum_{i=1}^{n} \mathbb{K}(\dfrac{x_i-x}{h})\)
(i) standard normal \(\mathbb{K}(\psi)=\frac{1}{\sqrt{2\pi}}\exp^{-\frac{1}{2}\psi^2}\)
(ii) uniform \(\mathbb{K}(\psi)=(2c)^{-1}, \text{ for } -c < \psi < c\) and $0$ o/w.
Bias and Variance of $\hat{f}$
Denote $\hat{f}=\frac{1}{n}\sum_{i=1}^{n}z_i$, where $z_i = \frac{1}{h}\mathbb{K}(\frac{x_i-x}{h})$.
- \(\displaystyle \E{\hat{f}(x)} = \E{z_1} = \frac{1}{h}\int_{x_1}\mathbb{K}(\underbrace{\frac{x_1-x}{h}}_{\equiv\psi})\underbrace{f(x_1)}_{unknown}dx_1 = \frac{1}{h} \int_{\psi}\mathbb{K}(\psi)f(x+h\psi)h d\psi\)
\(\approx \int_{\psi}\mathbb{K}(\psi)\big[f(x) + h\psi f^{(1)}(x) + \frac{h^2\psi^2}{2!} f^{(2)}(x) \big]d\psi\)
\(= f(x) \times 1 + hf^{(1)}(x) \times 0 + \frac{h^2}{2}f^{(2)}(x)\underbrace{\int_{\psi}\psi^2\mathbb{K}(\psi)d\psi}_{\equiv \mu_2}\)
\(= f(x) + \frac{h^2}{2}f^{(2)}(x)\mu_2\)
\(\displaystyle \text{BIAS}(\hat{f}(x)) = \frac{h^2}{2}f^{(2)}(x)\mu_2 = O(h^2)\)
\(\displaystyle \mathbb{V}(\hat{f}(x)) = \frac{1}{n}\mathbb{V}(z_1) = \frac{1}{n}\big[ \E{z_1^2-(\E{z_1})^2} \big] =\)
- \(\displaystyle \E{z_1^2}=\frac{1}{h^2}\int_{x_1}\mathbb{K}^2(\frac{x_1-x}{h})f(x_1)dx_1
=\frac{1}{h^2} \int_{\psi}\mathbb{K}^2(\psi)f(x+h\psi)hd\psi\)
\(\stackrel{\textit{Taylor}}{\approx} \frac{1}{h} \int_{\psi}\mathbb{K}^2(\psi)\big[ f(x) + h\psi f^{(1)}(x) \big] d\psi\\ =\frac{f(x)}{h}\int_{\psi}\mathbb{K}^2(\psi)d\psi + f^{(1)}(x)\int_{\psi}\psi\mathbb{K}^2(\psi)d\psi\)
-
\(\displaystyle \stackrel{Local}{\text{MSE}}(\hat{f}(x))=\text{BIAS}^2(\hat{f}(x)) + \mathbb{V}(\hat{f}(x))\)
\(= \frac{h^4}{4}(f^{(2)}(x))^2\mu_2^2 + \frac{f(x)}{nh}\int_{\psi}\mathbb{K}^2(\psi)d\psi\)
\(= h^4\lambda_1(x)+\frac{1}{nh}\lambda_2(x)\)- \[\displaystyle \lambda_1(x) \equiv \frac{1}{4}(f^{(2)}(x))^2\mu_2^2\]
- \[\displaystyle \lambda_2(x) \equiv f(x)\int_{\psi}\mathbb{K}^2(\psi)d\psi,\]
-
\(\displaystyle \stackrel{Global}{\text{IMSE}}(\hat{f}(x))= \int_{x}\stackrel{Local}{\text{MSE}}(\hat{f}(x))dx\)
\(= h^4\int_{x}\frac{1}{4}(f^{(2)}(x))^2\mu_2^2dx + \frac{1}{nh}\int_{x}f(x)\int_{\psi}\mathbb{K}^2(\psi)d\psi dx\)
\(= h^4\frac{1}{4}\mu_2^2\int_{x}(f^{(2)}(x))^2dx + \frac{1}{nh}\int_{\psi}\mathbb{K}^2(\psi)d\psi 1\)
\(= h^4\lambda_1+\frac{1}{nh}\lambda_2\)- \[\displaystyle \lambda_1 \equiv \frac{1}{4}\mu_2^2\int_{x}(f^{(2)}(x))^2dx \leftarrow unknown\]
- \[\displaystyle \lambda_2 \equiv \int_{\psi}\mathbb{K}^2(\psi)d\psi\]
-
Choose bandwidth \(h\) to minimize IMSE
\(\displaystyle \frac{\partial \text{IMSE}}{\partial h}= 4h^3 \lambda_1 - \frac{1}{nh^2}\lambda_2 =0 \rightarrow h_{opt}=n^{-1/5}(\frac{\lambda_2}{4\lambda_1})^{1/5} \propto n^{-1/5}\) -
Substitute \(h_{opt}\) into IMSE and minimize wrt to \(K(\psi)\) s.t. \(\int K{\psi} d\psi =1\) and \(\int \psi^2 K{\psi} = 1\) to obtain
\(\displaystyle K^{opt}(\psi)=\begin{cases} \frac{3}{4}(1-\psi^2), |\psi|\le 1 \\ 0, \text{ o/w} \end{cases}, \text{ Bartlett's (Epanechnikov's) Kernel}\)
Bandwidth Estimation
Note that $\lambda_1$ is unknown because of unknown $f(x)$, hence to obtain $h$ can use one of the following methods.
-
Ad-hoc Method
Assume $f(x) \sim N(0, \sigma_x^2)$. For $K(\psi)$ normal, it can be shown that $h_{opt}=1.06\sigma_x n^{-1/5}$ - Plug-in Method
Repeat the following loop until the difference becomes small \begin{enumerate}- Start with ad-hoc $h_{opt}$ and estimate $\hat{f}(x)$
- Calculate $\hat{\lambda}_1$ using $\hat{f}^{(2)}(x)$ instead of $f^{(2)}(x)$
- Use $\hat{\lambda}1$ to calculate new $\hat{h}{new}$
- Repeat the loop using the last $\hat{h}_{new}$ obtained \end{enumerate}
-
ISE (Cross-Validation) Method
Minimize ISE($h$) wrt $h$ (use grid search)\(\displaystyle \text{ISE}(h)= \int_{x} (\hat{f}(x) - f(x))^2 dx\) \(\displaystyle = \int_{x} \hat{f}^2(x) dx +\underbrace{\int_{x} f^2(x) dx}_{\text{can be dropped}} - 2\int_{x} \hat{f}(x)f(x) dx \\ = \int_{x} \hat{f}^2(x) dx - 2\E\hat{f}(x) = \int_{x} \hat{f}^2(x) dx - 2 \frac{1}{n}\sum_{i=1}^{n}\hat{f}(x_i)\)
- \[\hat{f}(x)=\dfrac{1}{nh}\sum_{i=1}^{n} \mathbb{K}(\dfrac{x_i-x}{h})\]
- \[\hat{f}^2(x)=\dfrac{1}{n^2h^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbb{K}(\dfrac{x_i-x}{h}) \mathbb{K}(\dfrac{x_j-x}{h})\]
\(\displaystyle \frac{1}{n^2h^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \int_{x} \mathbb{K}(\frac{x_i-x}{h})\mathbb{K}(\dfrac{x_j-x}{h})dx - \frac{2}{n^2h}\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbb{K}(\frac{x_i-x_j}{h})\\ =\frac{1}{n^2h^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \int_{x} \mathbb{K}(\frac{x_i-x}{h})\mathbb{K}(\dfrac{x_j-x}{h})dx - \frac{2}{n(n-1)h} \underset{i\ne j}{\sum^{n}\sum^{n}} \mathbb{K}(\frac{x_i-x_j}{h})\)
Asymptotic Properties of $\hat{f}$
Under assumptions (A1), (A4), (A8) and (A9) we have
\(\displaystyle
\frac{1}{h} \E K^r \left( \frac{x_1-x}{h} \right) = \int K^r(\psi) f(h\psi +x) d\psi
\to f(x) \int K^r (\psi) d\psi \text{ as } n \to \infty\)
-
Asymptotic Mean \(\displaystyle \E \hat{f}(x) = \E z_1 = \frac{1}{h} \E K(\frac{x_i-x}{h}) \overset{Lemma}{\to} f(x) \int_{\psi} K(\psi) d \psi = f(x) \text{ as } n\to \infty\)
-
Asymptotic Variance \(\displaystyle \mathbb{V} (\hat{f}(x)) = \mathbb{V} (\frac{1}{n} \sum_{i=1}^{n} z_i)\) \(= \frac{1}{n} (\E z_1^2 -(\E z_1)^2) = \frac{1}{nh} \left[ \frac{1}{h} \E K^2(\frac{x_i-x}{h})\right] - \frac{1}{n}(\E z_1)^2\) \(nh \mathbb{V} (\hat{f}(x)) = \frac{1}{h}\E K^2(\frac{x_i-x}{h}) - h (\E z_1)^2 \overset{Lemma}{\to} f(x) \int K^2 (\psi) d\psi - 0 \cdot f^2(x)\) \(= f(x) \int K^2 (\psi) d\psi \text{ as } n\to \infty\)
-
Weak Consistency We have that MSE\((\hat{f}(x)) \to 0\) as \(n \to \infty\). Use Chebyshev inequality to show that \(\displaystyle P[|\hat{f} - f|\ge \epsilon] \le \frac{\E(\hat{f}-f)^2}{\epsilon^2}\), and hence \(\displaystyle P[|\hat{f} - f|\le \epsilon] \to 1\), i.e. \(\displaystyle p \lim_{n \to n} \hat{f} = f\)
-
Asymptotic Normality \(\displaystyle z \equiv \frac{\hat{f}(x) - \E \hat{f}(x)}{\sqrt{\mathbb{V} (\hat{f}(x))}}\) \(= \frac{\frac{1}{n} \sum_{i=1}^{n}(z_i - \E z_i)}{\sqrt{\frac{1}{n} \mathbb{V}(z_1)}}\) \(=\sum_{i=1}^{n} L_{n,i}, \text{ where } L_{n,i} \equiv \frac{1}{n} \frac{z_i - \E z_i}{\frac{1}{n} V(z_1)}\)
Lyapunov CLT
Let \(\{ X_{n,i}\}\) be a sequence of independent (not necessarily identically distributed) RVs, with \(\E X_{n,i} = \mu_{n,i}\) and \(\mathbb{V} (X_{n,i}) = \sigma^2_n < \infty\). Denote \(\displaystyle L_{n,i} \equiv \frac{X_{n,i} - \mu_{n,i}}{\sigma_n}\).
If for some \(\delta>0\) the condition \(\lim_{n \to \infty} \sum_{i=1}^{n} \E \abs{L_{n,i}}^{2+\delta}=0\) is satisfied, then \(\displaystyle \sum_{i=1}^{n} L _{n,i} \overset{d}{\to} \mathcal{N}(0,1)\)