Kalman Filter Notes - Part 1: The Discrete Kalman Filter

R. E. Kalman published a paper describing a recursive solution to the discrete-data linear filtering problem. The Kalman filter gives us a computational method to estimate unobservable variables by minimizing the posterior error covariance. Moreover, the Kalman filter lets us obtain consistent parameter estimates by maximizing the likelihood function of the error innovations.

1. The basics of the discrete Kalman filter

Generally, the Kalman filter solves the problem of estimating the unobservable state $x \in \mathcal{R}^n$ of a discrete-time controlled process. The dynamics of the unobservable state (an $n \times 1$ vector) can be written as:

$x_k = Ax_{k - 1} + Bu_{k - 1}+w_{k - 1}$

The measurement variable $z_k$ is a linear transformation of the unobservable state plus an additive noise term $v_k$:

$z_k = Hx_k+v_k$

Note that we assume the noise terms $w$ and $v$ are white, independent of each other, and Gaussian with zero mean and covariance matrices $Q$ and $R$, respectively: $p(w) \sim N(0, Q)$ and $p(v) \sim N(0, R)$.
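To make the notation concrete, here is a small numpy sketch of the system matrices for a toy 1-D constant-velocity tracker (my own illustrative choice, not from any reference; all numerical values are assumptions):

```python
import numpy as np

# Toy 1-D constant-velocity system: state x = [position, velocity]^T,
# measurement z = position + noise. All values below are illustrative.
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])   # transition matrix of the state
B = np.zeros((2, 1))         # no control input in this toy example
H = np.array([[1.0, 0.0]])   # we only observe the position
Q = 0.01 * np.eye(2)         # covariance of the process noise w
R = np.array([[0.25]])       # covariance of the measurement noise v
```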

I will use the same symbols as Greg Welch and Gary Bishop's An Introduction to the Kalman Filter to show you the computational origin of the Kalman filter.

Notations:

$\hat{x}_k^{-} \in \mathcal{R}^n$ is the a priori estimate of the unobservable state $x$ at time $k$ given the information up to time $k-1$ (the filtration $\mathcal{F}_{k-1}$).

$\hat{x}_k$ is the a posteriori state estimate given the information at time $k$ (i.e., with the help of $z_k$).

A priori estimate error: $e^{-}_{k} = x_k - \hat{x}_k^{-}$.

A posteriori estimate error: $e_{k} = x_k - \hat{x}_k$.

A priori estimate error covariance matrix: $P^{-}_{k} = \mathbb{E}[e^{-}_ke^{-T}_k]$

A posteriori estimate error covariance matrix: $P_{k} = \mathbb{E}[e_ke^{T}_k]$

We assume the a posteriori estimate is related to the a priori estimate as follows:

$\hat{x_k} = \hat{x^{-}_k} + K_k(z_k - H\hat{x^{-}_k})$

Remark: If you know something about the Gaussian distribution, you may recognize this as the best unbiased estimator given the observations when the noises are Gaussian (it is a consequence of the conditional distribution of a joint Gaussian). I firmly believe this assumed form has a probabilistic origin. The term $z_k - H\hat{x}^{-}_k$ goes by intimidating names such as the measurement residual or the innovation.
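In code, the update is simply "prior plus gain-weighted innovation". A minimal sketch, treating the gain $K$ as given (its optimal value is derived below):

```python
def blend(x_prior, z, K, H):
    """x_k_hat = x_k^- + K (z_k - H x_k^-): correct the a priori estimate
    by the gain-weighted innovation (measurement residual)."""
    innovation = z - H @ x_prior   # z_k - H x_k^-
    return x_prior + K @ innovation
```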

Since we want to minimize the a posteriori error of each entry of the state vector, we should minimize the trace of $P_k$. Note that in the classical Kalman filter the transition matrix of the system is deterministic; we only need to account for uncontrollable factors such as the measurement noise and the transition (process) noise.

$\hat{x}_k = \hat{x}^{-}_k + K_k(z_k - H\hat{x}^{-}_k) \leftrightarrow \hat{x}_k = \hat{x}^{-}_k + K_k(Hx_k + v_k - H\hat{x}^{-}_k)$

$\leftrightarrow x_k - \hat{x}_k = (I - K_kH)(x_k - \hat{x}^{-}_k) - K_kv_k$

Taking the expectation of $e_ke_k^T$, and using the independence of $e_k^-$ and $v_k$ together with $\mathbb{E}[v_kv_k^T] = R$, we get:

$P_k = (I - K_kH)P_k^-(I - K_kH)^T+K_kRK_k^T$ (*)
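A point worth noting: equation (*) holds for any gain $K_k$, not just the optimal one. A direct transcription, assuming the numpy setup above:

```python
def posterior_cov(P_prior, K, H, R):
    """Equation (*): P_k = (I - K H) P_k^- (I - K H)^T + K R K^T.
    Holds for any gain K, optimal or not."""
    I_KH = np.eye(P_prior.shape[0]) - K @ H
    return I_KH @ P_prior @ I_KH.T + K @ R @ K.T
```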

Let’s review some tricks of matrix differentiation. You can find more details about layout conventions in the Wikipedia article on matrix calculus.

$\frac{d(tr(AB))}{dA} = B^T$

$\frac{d(tr(ACA^T))}{dA} = 2AC$ (for symmetric $C$, which holds for the covariance matrices used here)

(It is trivial if you know $tr(AB) = \sum_i\sum_j a_{ij}b_{ji}$)
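If you distrust these identities, a quick finite-difference check (my own throwaway script) confirms them numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
A_ = rng.standard_normal((3, 3))
B_ = rng.standard_normal((3, 3))
C_ = rng.standard_normal((3, 3))
C_ = C_ + C_.T                     # the second identity needs symmetric C

def num_grad(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function of a matrix."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# d tr(AB)/dA = B^T
assert np.allclose(num_grad(lambda X: np.trace(X @ B_), A_), B_.T, atol=1e-5)
# d tr(A C A^T)/dA = 2 A C  (for symmetric C)
assert np.allclose(num_grad(lambda X: np.trace(X @ C_ @ X.T), A_),
                   2 * A_ @ C_, atol=1e-5)
```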

Setting the derivative of $tr(P_k)$ with respect to $K_k$ to zero, we get the optimal $K_k$ (the Kalman gain):

$K_k = P^{-}_kH^T(HP^{-}_kH^T+R)^{-1}$
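In code this is a one-liner; a sketch that solves a linear system instead of forming the inverse explicitly (a standard numerical precaution, not something specific to the KF):

```python
def kalman_gain(P_prior, H, R):
    """K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}."""
    S = H @ P_prior @ H.T + R                   # innovation covariance
    # Equivalent to P_prior @ H.T @ inv(S), but numerically safer:
    return np.linalg.solve(S.T, H @ P_prior.T).T
```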

Substituting the Kalman gain into (*), we get the a posteriori covariance matrix $P_k$ in simplified form:

$P_k = (I - K_kH)P_k^-$

Note that as $R \to 0$ (no error in the measurement), the Kalman gain approaches its maximum value $H^{-1}$. As $P_k^- \to 0$, the Kalman gain approaches its minimum value $0$. You can see the adaptiveness of the method here: if we are more confident in the transition equation of the unobservable state, we should put relatively more weight on the a priori estimate; if we are more confident in the measurement equation, we should move the a posteriori estimate closer to the observed measurement.
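A scalar sanity check of the two limits, with $H = 1$ so that $H^{-1} = 1$ (the numbers are purely illustrative):

```python
H_ = 1.0
P_ = 1.0
for R_ in [1.0, 1e-2, 1e-6]:                 # measurement noise shrinking
    K = P_ * H_ / (H_ * P_ * H_ + R_)
    print(f"R = {R_:g}  ->  K = {K:.6f}")    # K -> 1/H = 1 as R -> 0

R_ = 1.0
for P_ in [1.0, 1e-2, 1e-6]:                 # prior uncertainty shrinking
    K = P_ * H_ / (H_ * P_ * H_ + R_)
    print(f"P^- = {P_:g}  ->  K = {K:.6f}")  # K -> 0 as P_k^- -> 0
```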

With the best a posteriori estimates in hand, we can produce the new a priori estimates for the next step (the time update):

$\hat{x}_k^- = A\hat{x}_{k-1} + Bu_{k-1}$

$P_k^- = AP_{k-1}A^T + Q$
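Putting the time update and the measurement update together gives the full recursion. A minimal sketch of the loop, assuming the toy matrices defined earlier and measurements supplied as column vectors:

```python
def kalman_filter(zs, x0, P0, A, H, Q, R, B=None, us=None):
    """Run the discrete KF over a sequence of measurements zs."""
    n = x0.shape[0]
    x, P = x0.copy(), P0.copy()
    estimates = []
    for k, z in enumerate(zs):
        # Time update (predict): a priori estimates for step k
        x = A @ x + (B @ us[k] if B is not None and us is not None else 0.0)
        P = A @ P @ A.T + Q
        # Measurement update (correct): a posteriori estimates
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + K @ (z - H @ x)                 # blend in the innovation
        P = (np.eye(n) - K @ H) @ P
        estimates.append(x.copy())
    return estimates
```

For the toy system above, `kalman_filter(zs, np.zeros((2, 1)), np.eye(2), A, H, Q, R)` returns the sequence of a posteriori state estimates.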

Remark: In the Kalman filter system we know the transition of the unobservable states exactly. In this setting, the Kalman filter gives a solution for estimating the unobservable state $x_k$ based on the information available at time $k$. If we know nothing about the transition of the unobservable states (an unknown $A$, for example), we can estimate the parameters by maximizing the likelihood function built from the measurement equation.

2. Discussion of the initial values

Unfortunately, we need to feed four tricky initial values into the Kalman filter system: $x_0^-$, $P_0^-$, $R$, and $Q$. Even more unfortunately, the KF is sensitive to these initial values, so we have to tune them as hyperparameters of our model. The best guidance on tuning them I have ever come across is that the innovation ($z_k - H\hat{x}_k^-$) should be white noise with zero mean if the KF works well (see Optimal State Estimation: Kalman, $H_\infty$, and Nonlinear Approaches). We have to test different hyperparameters to make the model sensible.
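A simple diagnostic in that spirit (my own sketch, assuming scalar measurements): collect the innovations and check that their mean and lag-1 autocorrelation are close to zero.

```python
def innovation_diagnostics(zs, x0, P0, A, H, Q, R):
    """Run the filter and report whiteness statistics of the innovations."""
    n = x0.shape[0]
    x, P = x0.copy(), P0.copy()
    innovations = []
    for z in zs:                               # z: scalar measurement
        x = A @ x                              # time update (no control)
        P = A @ P @ A.T + Q
        nu = z - (H @ x).item()                # innovation z_k - H x_k^-
        innovations.append(nu)
        S = H @ P @ H.T + R                    # measurement update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K * nu
        P = (np.eye(n) - K @ H) @ P
    nu = np.asarray(innovations)
    print("innovation mean:", nu.mean())                           # ~ 0
    print("lag-1 autocorr :", np.corrcoef(nu[:-1], nu[1:])[0, 1])  # ~ 0
```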

Thank you for reading my blog! The KF is a big topic and I will dig deeper in the next parts of these notes.

×