Statistical Inference-Notes-Part2-Bayesian Method

Keywords: Posterior distribution, Bayesian decision problems, James-Stein estimator, Stein’s Lemma

1. A review of Bayesian risk

The Bayesian risk of decision rule $d$ can be written as:

$r(\pi, d) = \int_{\Theta} R(\theta, d)\, \pi(\theta)\, d\theta = \int_{\Theta} \int_{\mathcal{X}} L(\theta, d(x))\, f(x; \theta)\, dx\, \pi(\theta)\, d\theta$

$\quad = \int_{\Theta} \int_{\mathcal{X}} L(\theta, d(x))\, f(x; \theta)\, \pi(\theta)\, dx\, d\theta = \int_{\mathcal{X}} f(x) \int_{\Theta} L(\theta, d(x))\, \pi(\theta \mid x)\, d\theta\, dx$

Note that once $\theta$ is treated as a random variable with prior $\pi$, the sampling density $f(x; \theta)$ is interpreted as the conditional density $f(x \mid \theta)$, so $f(x; \theta)\, \pi(\theta) = f(x)\, \pi(\theta \mid x)$ by the definition of conditional probability; this justifies the last step above.

Since $f(x) \geq 0$ does not depend on the decision rule, it suffices to minimize the posterior expected loss pointwise, i.e. for each $x$ we choose $d(x)$ to minimize $\int_{\Theta} L(\theta, d(x))\, \pi(\theta \mid x)\, d\theta$.

Several well-known decision rules are determined by the action set and the loss function. For example, under squared error loss the Bayes decision is the mean of the posterior distribution, while under a 0-1 (indicator) loss the optimal decision is the MAP estimator, i.e. the mode of the posterior (in this framework sometimes described via the highest posterior density, HPD).
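As a concrete illustration of how the loss function determines the Bayes decision, here is a minimal numerical sketch (the Beta(2, 5) prior, the counts 7 out of 20, and the grid sizes are arbitrary choices for the example, not from the notes): the posterior mean minimizes the posterior expected squared loss, while the posterior mode (MAP) is the decision associated with 0-1 loss.

```python
import numpy as np
from scipy import stats

# Hypothetical example: Beta(2, 5) prior, 7 successes out of n = 20 trials.
a, b = 2.0, 5.0
x, n = 7, 20
posterior = stats.beta(a + x, b + n - x)          # conjugate update

# Bayes decision under squared error loss: the posterior mean.
post_mean = posterior.mean()                      # (a + x) / (a + b + n) = 1/3

# Bayes decision under 0-1 loss: the posterior mode (MAP).
post_map = (a + x - 1) / (a + b + n - 2)          # mode of a Beta(9, 18) density

# Numerical check that the posterior mean minimizes the posterior expected squared loss.
theta = np.linspace(1e-3, 1 - 1e-3, 2000)         # grid over the parameter space
w = posterior.pdf(theta)
w /= w.sum()                                      # discretized posterior weights
candidates = np.linspace(0, 1, 1001)
exp_loss = [(w * (theta - d) ** 2).sum() for d in candidates]
best = candidates[np.argmin(exp_loss)]

print(post_mean, best, post_map)                  # ~0.333, ~0.333, 0.320
```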

2. From Bayes rule to Minimax rule

In Part 1 of these notes, we proved that if a Bayes rule (or an $\epsilon$-extended Bayes rule) is an equalizer decision rule, then it is minimax. (Being an equalizer is a sufficient condition for minimaxity here, not a necessary one.)

Example 1: Find a minimax estimator of $\theta$ based on a single observation $X \sim \text{Bin}(n, \theta)$ with $n$ known, under squared error loss $L(\theta, d) = (\theta - d)^2$.

Solution: Although any prior distribution is allowed, let us choose the conjugate prior. The conjugate prior of the binomial distribution is the Beta distribution (for the multinomial distribution, the conjugate prior is the Dirichlet distribution). We will use the fact that under squared error loss the Bayes estimator is the mean of the posterior distribution.

For each $x$, the posterior expected loss is $\int_{\Theta} (\theta - d(x))^2\, \pi(\theta \mid x)\, d\theta$. Let $d_1$ denote the posterior mean, $d_1(x) = \bar{\theta}(x) = \int_{\Theta} \theta\, \pi(\theta \mid x)\, d\theta$, and let $d_2$ be any other decision rule.

Expanding $(\theta - d)^2 = (\theta - \bar{\theta})^2 + 2(\theta - \bar{\theta})(\bar{\theta} - d) + (\bar{\theta} - d)^2$ and taking posterior expectations (the cross term vanishes because $\mathbb{E}_{\theta \mid x}[\theta - \bar{\theta}] = 0$),

$\mathbb{E}_{\theta \mid x}[(\theta - d_1(x))^2] - \mathbb{E}_{\theta \mid x}[(\theta - d_2(x))^2] = -(\bar{\theta}(x) - d_2(x))^2 \leq 0$.

Therefore, under squared error loss no decision rule can do better than the posterior mean; integrating over $x$ gives $r(\pi, d_1) \leq r(\pi, d_2)$.

With the conjugate $\text{Beta}(a, b)$ prior, the posterior mean is $\bar{\theta} = \frac{a + \sum x_i}{a + b + n}$ (where $\sum x_i = X$ counts the successes, each $x_i = 0$ or $1$). We want to find a prior that makes this Bayes rule an equalizer decision rule. Let $c$ denote $a + b + n$.
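For completeness, the conjugacy computation behind this formula (a standard calculation, written with the $\text{Beta}(a, b)$ prior density proportional to $\theta^{a-1}(1-\theta)^{b-1}$):

$\pi(\theta \mid x) \propto \theta^{x}(1-\theta)^{n-x} \cdot \theta^{a-1}(1-\theta)^{b-1} = \theta^{a+x-1}(1-\theta)^{b+n-x-1}$,

so $\theta \mid x \sim \text{Beta}(a + x,\, b + n - x)$ and the posterior mean is $\frac{a + x}{(a + x) + (b + n - x)} = \frac{a + x}{a + b + n}$.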

The risk function can be written as (recall the definition of the equalizer decision rule):

$\mathbb{E}[(\frac{a + X}{c} - \theta)^2] = \frac{1}{c^2}\mathbb{E}[(X + a - c\theta)^2]$

$\quad = \frac{1}{c^2}\left(\operatorname{Var}(X) + (\mathbb{E}[X] + a - c\theta)^2\right) = \frac{1}{c^2}\left(n\theta(1-\theta) + n^2 \theta^2 + 2n\theta (a - c\theta) + (a - c\theta)^2\right)$

This is a quadratic in $\theta$. To make the risk constant (an equalizer rule), we choose $a$ and $b$ so that the coefficients of $\theta$ and $\theta^2$ vanish, which gives $a = b = \sqrt{n}/2$. The resulting minimax estimator is $d(X) = \frac{X + \sqrt{n}/2}{n + \sqrt{n}}$ (equivalently $\frac{\sum x_i + \sqrt{n}/2}{n + \sqrt{n}}$), with constant risk $a^2/c^2 = \frac{1}{4(1 + \sqrt{n})^2}$.
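As a sanity check of the equalizer property, here is a minimal numerical sketch (the sample size $n = 25$ and the $\theta$ grid are arbitrary choices for the illustration): the risk of this estimator is constant in $\theta$, equal to $\frac{1}{4(1+\sqrt{n})^2}$, while the risk of the MLE $X/n$ varies with $\theta$.

```python
import numpy as np
from scipy import stats

n = 25                                                # arbitrary sample size for the check
k = np.arange(n + 1)                                  # support of X ~ Bin(n, theta)
d_minimax = (k + np.sqrt(n) / 2) / (n + np.sqrt(n))   # equalizer Bayes rule
d_mle = k / n                                         # maximum likelihood estimator

for theta in [0.1, 0.3, 0.5, 0.7, 0.9]:
    pmf = stats.binom.pmf(k, n, theta)
    r_minimax = np.sum(pmf * (d_minimax - theta) ** 2)
    r_mle = np.sum(pmf * (d_mle - theta) ** 2)
    print(f"theta={theta:.1f}  minimax risk={r_minimax:.6f}  MLE risk={r_mle:.6f}")

# The minimax risk is 1 / (4 * (1 + sqrt(n))**2) = 1/144 = 0.006944... for every theta,
# whereas the MLE risk theta * (1 - theta) / n changes with theta.
```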

Example 2: A review of the (German) tank problem. Tanks are identified by IDs ranging from 1 to an unknown number $N$. We observe a single tank whose ID is $k$. What is the minimax estimator of $N$ under the loss function $(N - d(k))^2$?

By the same logic, the Bayes estimator is $d(k) = \sum_{N} N\, \text{Pr}(N \mid k)$. We need a prior distribution under which this expectation converges. The risk function can be written as $\mathbb{E}[(N - d(k))^2 \mid N] = N^2 - 2N\, \mathbb{E}[d(k) \mid N] + \mathbb{E}[d(k)^2 \mid N]$. Let us place a uniform prior on the number of tanks over $\{1, \dots, \Omega\}$ (after observing $k$, the posterior is supported on $\{k, \dots, \Omega\}$); we want to find the $\Omega$ that makes the resulting Bayes rule minimax.

Therefore, the decision rule can be written as $d(k) = \sum_{N=k}^{\Omega} N\, \frac{\text{Pr}(k \mid N)\, \text{Pr}(N)}{\text{Pr}(k)}$. Note that $\text{Pr}(k) = \sum_{N=k}^{\Omega} \text{Pr}(k \mid N)\, \text{Pr}(N) = \sum_{N=k}^{\Omega} \frac{1}{N} \cdot \frac{1}{\Omega}$ (the terms with $N < k$ vanish because $\text{Pr}(k \mid N) = 0$ there).

Therefore $d(k) = \frac{\Omega - k + 1}{\sum_{N=k}^{\Omega} \frac{1}{N}}$ (the prior constant $1/\Omega$ cancels). The risk function can be written as:

$N^2 - \sum_{k=1}^N \frac{2(\Omega - k + 1)}{\sum_{i=k}^\Omega \frac{1}{i}} + \frac{1}{N}\sum_{k=1}^N \frac{(\Omega - k + 1)^2}{\left(\sum_{i=k}^\Omega \frac{1}{i}\right)^2}$, which is essentially impossible to solve for $\Omega$ analytically. Therefore, let us consider a numerical approach: we want to find the $\Omega$ that maximizes the Bayes risk (i.e. approaches a least favorable prior). The classic alternative is the frequentist route of constructing an unbiased estimator from the sufficient statistic. We will further discuss this problem in the following notes.
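Since the risk cannot be handled analytically, here is a minimal numerical sketch of the search (the candidate values of $\Omega$ and the range of $N$ scanned are arbitrary choices for the illustration, and `bayes_rule` is a hypothetical helper name): for each candidate $\Omega$ it builds the Bayes rule $d(k) = (\Omega - k + 1)/\sum_{i=k}^{\Omega} 1/i$ and evaluates the risk $\mathbb{E}[(N - d(k))^2 \mid N]$ over a range of $N$, so one can inspect how flat the risk curve is and how its worst case behaves as $\Omega$ varies.

```python
import numpy as np

def bayes_rule(omega: int) -> np.ndarray:
    """Bayes rule d(k) = (omega - k + 1) / sum_{i=k}^{omega} 1/i, for k = 1, ..., omega."""
    inv = 1.0 / np.arange(1, omega + 1)
    tail = np.cumsum(inv[::-1])[::-1]             # tail[k-1] = sum_{i=k}^{omega} 1/i
    counts = omega - np.arange(1, omega + 1) + 1  # number of posterior support points, omega - k + 1
    return counts / tail

# Arbitrary scan over candidate priors: for each omega, evaluate the frequentist risk
# R(N, d) = E[(N - d(k))^2 | N], i.e. the average over k = 1..N, and report its worst case.
for omega in [50, 100, 200]:
    d = bayes_rule(omega)
    risks = [np.mean((N - d[:N]) ** 2) for N in range(1, omega + 1)]
    print(omega, round(float(max(risks)), 2))
```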

