Keywords: linear operator, compact linear operator, Mercer’s theorem, eigenfunction, foundation of the kernel trick
1. Eigenfunctions
An eigenfunction of a linear operator $D$ defined on some function space is any non-zero function $f$ in that space that, when acted upon by $D$, is simply multiplied by some scaling factor called an eigenvalue, i.e.
$Df = \lambda f$
Note that boundary conditions may greatly restrict the allowed values of $\lambda$.
The set of all possible eigenvalues of $D$ is sometimes called its spectrum.
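As a minimal numerical sketch of the definition (the choice $D = d/dt$, $f(t) = e^{2t}$, and the grid are illustrative assumptions, not from the text): the exponential is an eigenfunction of differentiation with eigenvalue $2$, since $Df = 2f$.

```python
import numpy as np

# Check numerically that f(t) = exp(2t) is an eigenfunction of D = d/dt
# with eigenvalue lambda = 2, i.e. D f = 2 f.
t = np.linspace(0.0, 1.0, 1001)
f = np.exp(2.0 * t)

df = np.gradient(f, t)   # numerical derivative, approximates D f
ratio = df / f           # should be close to the eigenvalue 2 everywhere

print(np.abs(ratio[1:-1] - 2.0).max())  # small: ratio is ~2 at interior points
```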
Moreover, consider an inner-product function space whose inner product is defined as:
$< f, g > = \int_{\Omega} f^{*}(t)g(t)dt$ (where $*$ denotes the complex conjugate)
Suppose the function space has an orthonormal basis given by the set of functions $\{u_1(t), u_2(t), u_3(t), \dots\}$ with
$< u_i, u_j > = \delta_{i, j}$ (Kronecker delta), such that any function can be written as a linear combination of the basis functions:
$f(t) = \sum_{j = 1}^n b_j u_j(t)$ (in the infinite-dimensional case the sum becomes an infinite series, as in a Fourier series)
Define a matrix representation of the linear operator $D$ with elements
$A_{i, j} = < u_i, Du_j > = \int_{\Omega} u^{*}_i(t) D u_{j}(t) dt$
We can expand the function $Df(t)$ either in the basis $\{u_j\}$ or in $\{Du_j\}$, i.e.
$Df(t) = \sum_{j = 1}^n c_j u_j(t) = \sum_{j = 1}^n b_j Du_j(t)$
Taking the inner product of each side of the above equation with an arbitrary basis function $u_i$:
$< u_i, Df > = \sum_{j = 1}^n c_{j} \int_{\Omega} u^{*}_i(t)u_j(t)dt = \sum_{j = 1}^n c_{j} \delta_{i, j} = c_i$, and likewise $< u_i, Df > = \sum_{j = 1}^n b_{j} \int_{\Omega} u^{*}_i(t)Du_j(t)dt = \sum_{j = 1}^n b_j A_{i, j}$, so $c_i = \sum_{j = 1}^n A_{i, j} b_j$.
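The relation $c_i = \sum_j A_{i,j} b_j$ can be checked numerically. The setup below (operator $D = d/dt$, Fourier basis on $[0, 2\pi)$, random coefficients) is an illustrative assumption: $A = \mathrm{diag}(ij)$ for this basis, and the coefficients of $Df$ recovered by inner products should equal $A b$.

```python
import numpy as np

# Verify c = A b: expand f in the Fourier basis, differentiate numerically,
# project Df back onto the basis, and compare with A @ b.
t = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
dt = t[1] - t[0]
js = np.arange(-3, 4)
rng = np.random.default_rng(0)
b = rng.standard_normal(len(js))                          # coefficients of f

U = np.exp(1j * np.outer(js, t)) / np.sqrt(2.0 * np.pi)   # row j is u_j(t)
f = b @ U                                                 # f = sum_j b_j u_j
Df = np.gradient(f, t)                                    # numerical D f

c = (np.conj(U) @ Df) * dt          # c_i = <u_i, Df> by a Riemann sum
A = np.diag(1j * js)                # A_ij = <u_i, D u_j> = i*j*delta_ij
print(np.allclose(c, A @ b, atol=1e-3))   # True
```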
2. Mercer’s Theorem
Suppose $K$ is a continuous symmetric non-negative definite kernel. Then there is an orthonormal basis $\{e_j\}$ of $L^2[a, b]$ consisting of eigenfunctions of $T_K$ such that the corresponding sequence of eigenvalues $\{\lambda_j\}$ is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on $[a, b]$, and $K$ has the representation $K(t, s) = \sum_{j = 1}^{\infty} \lambda_j e_j(s) e_j(t)$
where the convergence is absolute and uniform.
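A discrete analogue of this expansion can be sketched numerically (the RBF kernel and the grid are illustrative assumptions): eigendecomposing the kernel matrix on a grid gives a nonnegative spectrum, and summing $\lambda_j e_j(s) e_j(t)$ over all eigenpairs recovers the kernel.

```python
import numpy as np

# Discrete analogue of Mercer's theorem: eigendecompose the kernel matrix
# of an RBF kernel on a grid; all eigenvalues are nonnegative (up to
# roundoff) and K is recovered from sum_j lambda_j e_j e_j^T.
x = np.linspace(-1.0, 1.0, 50)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # RBF kernel matrix

lam, E = np.linalg.eigh(K)          # eigendecomposition (K is symmetric)
print(lam.min() >= -1e-10)          # True: nonnegative spectrum

K_rec = (E * lam) @ E.T             # sum_j lam_j * e_j e_j^T
print(np.allclose(K, K_rec))        # True: the expansion reproduces K
```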
Here $T_{K}$ is the integral operator defined by $[T_{K}\phi](x) = \int_{a}^b K(x, s)\phi(s) ds$; it is linear, since
$[T_{K}(\phi_1 + \phi_2)](x) = \int_{a}^b K(x, s)(\phi_1(s) + \phi_2(s))ds = \int_{a}^b K(x, s) \phi_1(s) ds + \int_{a}^b K(x, s) \phi_2(s) ds$
In other words, the kernel function can be written as an inner product of feature vectors:
$K_{i, j} = \phi(x_i)^T \phi(x_j)$
$\phi$ is also called a basis function (or feature map); its range can be a finite-dimensional space $\mathbb{R}^n$ or an infinite-dimensional function space.
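As a concrete sketch of such a feature map (the polynomial kernel on $\mathbb{R}^2$ is an illustrative assumption): for $\kappa(x, y) = (x^T y)^2$, an explicit map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2) \in \mathbb{R}^3$, so that $\kappa(x, y) = \phi(x)^T \phi(y)$.

```python
import numpy as np

# Explicit feature map for the polynomial kernel k(x, y) = (x . y)^2 on R^2:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so k(x, y) = phi(x) . phi(y).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print((x @ y) ** 2, phi(x) @ phi(y))   # both equal 1.0
```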
2.1 Mercer Kernels
Let $X = \{x_1, x_2, \dots, x_n\}$ be a finite set of $n$ samples from $\mathcal{X}$. The Gram matrix of $X$ is defined as $\mathbf{K}(X; \kappa) \in \mathbb{R}^{n \times n}$, or $\mathbf{K}$ for short, such that $(\mathbf{K})_{ij} = \kappa(x_i, x_j)$. If $\mathbf{K}$ is positive definite for every finite subset $X \subset \mathcal{X}$, then $\kappa$ is called a Mercer kernel, or a positive definite kernel. (Note that a Mercer kernel is symmetric by definition.)
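This condition can be checked empirically for a given kernel (the RBF kernel, sample sizes, and random seed below are illustrative assumptions; passing such spot checks does not prove the kernel is Mercer, it only fails to falsify it):

```python
import numpy as np

# Empirical check of the Mercer condition for the RBF kernel
# kappa(x, x') = exp(-||x - x'||^2 / 2): Gram matrices of random sample
# sets should have no negative eigenvalues (up to roundoff).
rng = np.random.default_rng(42)

def gram(X):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return np.exp(-0.5 * sq)

min_eig = np.inf
for _ in range(5):
    X = rng.standard_normal((20, 3))          # 20 random samples from R^3
    eigvals = np.linalg.eigvalsh(gram(X))     # Gram matrix is symmetric
    min_eig = min(min_eig, eigvals.min())

print(min_eig > -1e-10)   # True: every Gram matrix is positive semidefinite
```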
According to Mercer’s theorem, such a kernel can be written as $\kappa(x, x^\prime) = \phi(x)^T \phi(x^\prime)$. The basis function can be read off from the eigen-expansion of the kernel: $\phi(x) = (\sqrt{\lambda_1}\,e_1(x), \sqrt{\lambda_2}\,e_2(x), \dots)$.
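A finite-sample sketch of recovering $\phi$ from the kernel alone (the RBF kernel and grid are illustrative assumptions; extending the recovered map to new points is what the Nyström method does): take $\phi_j(x_i) = \sqrt{\lambda_j}\,e_j(x_i)$ from the Gram-matrix eigendecomposition, so that $\Phi \Phi^T$ reproduces $\mathbf{K}$.

```python
import numpy as np

# Recover an (approximate) feature map from the Gram matrix alone:
# phi_j(x_i) = sqrt(lambda_j) * e_j(x_i), so Phi @ Phi.T == K.
x = np.linspace(0.0, 1.0, 30)
K = np.exp(-10.0 * (x[:, None] - x[None, :]) ** 2)   # RBF Gram matrix

lam, E = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)       # clip tiny negative roundoff to zero
Phi = E * np.sqrt(lam)              # row i is the feature vector phi(x_i)

print(np.allclose(Phi @ Phi.T, K))  # True: inner products reproduce K
```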