Newey-West adjustment provides a consistent estimate of the covariance matrix in the presence of autocorrelation in the sample. The origin of the Newey-West adjustment is closely related to the Generalized Method of Moments. This blog post covers the motivation, the definition, and some properties of the Generalized Method of Moments (GMM).
1. Motivations of GMM
Let $\Omega$ denote the set of sample points in the underlying probability space in some estimation problem, and let $E$ denote the expectation operator. For a stochastic process $\{x_n: n \ge 1\}$ defined on this probability space, we can obtain a finite segment of one realization of the process, i.e., $\{x_n(\omega_0): 1 \le n \le N\}$. This sequence can be treated as the observed data.
Let the moment function $f(x, \theta)$, viewed as a measurable function of $\theta \in \mathbb{R}^p$ with values in $\mathbb{R}^q$, be continuous and continuously differentiable (i.e., $C^1$). Intuitively, this function measures the discrepancy between some feature of the population and the corresponding quantities of interest. To be more specific, the parameter $\theta$ is a $(p \times 1)$ vector, and we know $q$ moments based on the underlying distribution (or probability measure). From simple algebra, we know a system with $p$ parameters admits a solution if we have $p$ constraints. However, we face the situation where we have more constraints than parameters.
Set the population moment conditions $E[f(x_i, \theta_0)] = 0$ (note that $\theta_0$ is the solution of the equations based on the population moments; it is a deterministic quantity given full information about the population). The associated sample moments are given by:
$f_n(\theta) = \frac{1}{n} \sum_{i = 1}^nf(x_i, \theta)$
We want to estimate $\theta_0$ by solving the equations $f_n(\theta) = 0$ (more specifically, by setting the bias of the sample moments to zero). Note that we have $q - p$ additional moments (if $q > p$), and the remedy for this situation is called GMM, which was introduced by Hansen [1982]. Intuitively, if we cannot find a $\theta$ that solves the equation system $f_n(\theta) = 0$ exactly, then we should at least keep the sample moments as close to zero as possible. The strategy is therefore to measure the distance between $f_n(\theta)$ and $0$.
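As a concrete illustration, consider a hypothetical over-identified example: $x_i$ is exponential with rate $\theta$, so $E[x_i] = 1/\theta$ and $E[x_i^2] = 2/\theta^2$ give $q = 2$ moment conditions for $p = 1$ parameter. Below is a minimal Python sketch (all names are my own, not from any GMM library) that evaluates the sample moments $f_n(\theta)$:

```python
import numpy as np

def f(x, theta):
    """Moment function: q = 2 conditions for the p = 1 exponential example,
    x - 1/theta and x^2 - 2/theta^2, both mean-zero at the true theta."""
    return np.array([x - 1.0 / theta,
                     x**2 - 2.0 / theta**2])

def f_n(xs, theta):
    """Sample moments f_n(theta) = (1/n) * sum_i f(x_i, theta)."""
    return np.mean([f(x, theta) for x in xs], axis=0)

rng = np.random.default_rng(0)
xs = rng.exponential(scale=1.0 / 2.0, size=500)  # simulated data, true theta_0 = 2
print(f_n(xs, 2.0))  # both entries should be close to zero at theta_0
```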
2. The definition of GMM
Def:
Suppose the function $f$ satisfies all the properties in section 1. We have an observed sample $\{x_i: i = 1, 2, 3, \dots\}$, and we want to estimate the parameter vector $\theta$ ($p \times 1$) with true value $\theta_0$. Let $E[f(x_i, \theta_0)]$ denote a set of $q$ population moments and $f_n(\theta)$ denote the associated sample counterparts (since the observations are given, we treat $\mathbf{x}$ as constant in this system). Define the criterion function $Q_n(\theta)$ as:
$Q_n(\theta) = f_n(\theta)^TW_nf_n(\theta)$
where $W_n$ is a weighting matrix that converges to a positive definite matrix $W$ as $n$ grows large. Then the GMM estimator of $\theta_0$ is given by
$\hat{\theta} = \arg\min_{\theta} Q_n(\theta)$
Note that $Q_n(\theta)$ measures the distance between $\theta$ and $\theta_0$, since $f_n(\theta_0)$ converges to zero in probability. In other words, we are measuring the distance between $\theta_0$ and $\theta$ in a metric space.
If $p = q$, then GMM degenerates to the classical Method of Moments (MM).
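Continuing the hypothetical exponential sketch above (reusing `f_n` and `xs`), the GMM estimator can be obtained by numerically minimizing $Q_n(\theta)$; here the identity matrix serves as a simple weighting matrix:

```python
import numpy as np
from scipy.optimize import minimize

def Q_n(theta, xs, W):
    """GMM criterion Q_n(theta) = f_n(theta)^T W f_n(theta)."""
    g = f_n(xs, theta[0])  # sample moments, shape (q,)
    return g @ W @ g

W = np.eye(2)  # identity weighting matrix as a simple first choice
res = minimize(Q_n, x0=np.array([1.0]), args=(xs, W), method="Nelder-Mead")
theta_hat = res.x[0]
print(theta_hat)  # should be close to the true value 2
```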
3. The properties of GMM
3.1 Assumptions
Given some assumptions, we can conclude that the GMM estimator is consistent and asymptotically normally distributed.
Assumption 1: We have at least as many moment equations as parameters ($q \ge p$); the Jacobian matrix of the moment equations evaluated at $\theta_0$ has full column rank $p$; and $\theta_0$ is the unique solution of the moment equation system.
Assumption 2: The weak law of large numbers holds, which means that for any $\epsilon > 0$ we have $\lim \limits_{n \to \infty}\mathbb{P}[|f_n(\theta_0) - E[f(x_i, \theta_0)]| > \epsilon] = 0$, i.e., $f_n(\theta_0)$ converges in probability to the population moment $E[f(x_i, \theta_0)] = 0$.
Assumption 3: The sample moments satisfy a central limit theorem, i.e., $\sqrt{n}\,f_n(\theta_0)$ is asymptotically normal, so that $f_n(\theta_0)$ has a finite asymptotic covariance matrix $\frac{1}{n}F$.
3.2 The distribution of GMM estimator
Under these assumptions, the GMM estimator is consistent and asymptotically normally distributed, with asymptotic covariance matrix $V_{GMM}$:
$V_{GMM} = \frac{1}{n} [G(\theta_0)^{T}WG(\theta_0)]^{-1}G(\theta_0)^{T}WFWG(\theta_0) [G(\theta_0)^{T}WG(\theta_0)]^{-1}$
where $G(\theta_0)$ is the Jacobian matrix of the population moment functions evaluated at the true parameter value $\theta_0$.
I will not reproduce the proof of this result here due to its complexity, but the delta method provides some intuition (see here for more details).
Note that the variance of the GMM estimator depends on the choice of $W_n$. If we review the criterion function of GMM, we can see that this distance function is the error sum of squares of the sample moments. Given what we know about the condition number of a matrix ($\frac{\lambda_{max}}{\lambda_{min}}$) and the Gauss-Markov conditions, it makes sense to normalize the errors in the moments by their variance (recall WLS and the hat matrix).
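Continuing the sketch, the sandwich covariance above can be computed with plug-in estimates: a numerical Jacobian for $G(\theta_0)$ and the sample covariance of the moment contributions for $F$ (both are simple estimators I chose for illustration, reusing `f`, `f_n`, `xs`, `W`, and `theta_hat` from earlier):

```python
def jacobian_G(xs, theta, eps=1e-6):
    """Numerical Jacobian G = d f_n / d theta, shape (q, p); here p = 1."""
    base = f_n(xs, theta)
    return ((f_n(xs, theta + eps) - base) / eps).reshape(-1, 1)

def covariance_F(xs, theta):
    """Estimate F as the sample covariance of the contributions f(x_i, theta)."""
    contrib = np.array([f(x, theta) for x in xs])  # shape (n, q)
    return np.cov(contrib, rowvar=False)

G = jacobian_G(xs, theta_hat)
F = covariance_F(xs, theta_hat)
bread = np.linalg.inv(G.T @ W @ G)
V_gmm = bread @ G.T @ W @ F @ W @ G @ bread / len(xs)  # sandwich formula above
print(V_gmm)
```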
3.3 The optimal weighting matrix
With the optimal choice of weighting matrix, $W = F^{-1}$, the GMM estimator is asymptotically efficient, with covariance matrix:
$V_{GMM}^* = \frac{1}{n} [G(\theta_0)^TF^{-1}G(\theta_0)]^{-1}$
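To see why the sandwich formula collapses under this choice, substitute $W = F^{-1}$ into the general expression (writing $G$ for $G(\theta_0)$):

$\frac{1}{n} [G^TF^{-1}G]^{-1}G^TF^{-1}FF^{-1}G[G^TF^{-1}G]^{-1} = \frac{1}{n} [G^TF^{-1}G]^{-1}$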
However, to compute the matrix $F$ (the asymptotic covariance matrix of the sample moments), we have to estimate $\theta$ first (recall the EM algorithm). We resolve this circularity by adopting a multi-step method:
Step 1: Choose a sub-optimal weighting matrix ($I$, for example); this gives us a consistent estimate of $\theta$, which we can then use to estimate the matrix $F$.
Step 2: Using the estimated $F^{-1}$ as the weighting matrix, re-estimate $\theta_0$.
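In the running sketch, the two steps look like this (reusing `Q_n`, `covariance_F`, and `xs` from above):

```python
# Step 1: first-step estimate with W = I, then plug it into the estimate of F.
res1 = minimize(Q_n, x0=np.array([1.0]), args=(xs, np.eye(2)), method="Nelder-Mead")
F_hat = covariance_F(xs, res1.x[0])

# Step 2: re-estimate theta with the (approximately) optimal weighting W = F^{-1}.
res2 = minimize(Q_n, x0=res1.x, args=(xs, np.linalg.inv(F_hat)), method="Nelder-Mead")
print(res2.x[0])  # two-step GMM estimate of theta_0
```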
Reference
- Zsochar, P. Short Introduction to the Generalized Method of Moments. Hungarian Statistical Review.
- Hansen, L. (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50(4), 1029. doi: 10.2307/1912775