Keywords: support of distribution, forward KL, reverse KL, absolute continuity (these greatly influence the properties of the approximation)
Keywords: AD algorithm, evaluation trace, Dual number
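As a quick illustration of the dual-number idea behind forward-mode AD, here is a minimal sketch (my own toy example, not code from the post): each arithmetic operation on a `Dual` carries the value and its derivative along the evaluation trace.

```python
# Toy forward-mode AD via dual numbers (illustrative sketch only).
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot  # primal value and derivative ("tangent") part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule, recorded alongside the evaluation trace
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def derivative(f, x):
    return f(Dual(x, 1.0)).dot  # seed the tangent with dx/dx = 1

# d/dx (x*x + 3x) at x = 2 gives 2*2 + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # -> 7.0
```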
Keywords: Itô operator, properties of the trace, Fokker-Planck equation, L2 adjoint, Hilbert space
I have had to solve the GPU computing toolkit problem several times, since different projects frequently switch between TensorFlow 1 and TensorFlow 2. This tutorial collects the details of setting up a GPU environment for TensorFlow. I won’t cover how to set up a Python environment or anything about hardware in this blog. This tutorial is for Windows (the logic also applies to other OSes, but the links will no longer apply).
This note covers mean-field and structured variational families. Beyond VI, mean-field and other variational families can also be used for inference in probabilistic neural networks.
Compared to MCMC, variational inference is a faster method for approximating a difficult-to-compute posterior distribution. Variational inference casts the problem as optimization, while MCMC is an asymptotically exact sampling method.
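To make "VI as optimization" concrete, here is a minimal hypothetical sketch (not from the post): we approximate a target density N(3, 1) with a variational family q = N(mu, 1) by stochastic gradient ascent on the ELBO, using the reparameterization trick. The target, family, and step size are all assumptions chosen for illustration.

```python
import random
random.seed(0)

def log_p(x):
    # Unnormalized target log-density: N(3, 1) up to a constant.
    return -0.5 * (x - 3.0) ** 2

def elbo_grad(mu, n=64):
    # Monte-Carlo gradient of E_eps[log_p(mu + eps)] with eps ~ N(0, 1);
    # the entropy of q is constant in mu, so it drops out of the gradient.
    g = 0.0
    for _ in range(n):
        eps = random.gauss(0.0, 1.0)
        g += -(mu + eps - 3.0)  # d/dmu of log_p(mu + eps)
    return g / n

mu = 0.0
for step in range(500):
    mu += 0.1 * elbo_grad(mu)  # ascend the ELBO

print(mu)  # converges near the true posterior mean, 3.0
```

With this family the reverse-KL optimum is exactly mu = 3, so the stochastic iterates fluctuate in a small neighborhood of it.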
Metric embedding plays an important role in unsupervised machine learning and algorithm analysis. Embedding methods are also considered among the most important tools in the design of approximation algorithms.
Mathematical programming is the basis for a wide range of fields. This blog summarizes the sufficient conditions for strong duality. Moreover, it is a summary of the Mathematical Programming lecture notes by David P. Williamson.
Parameterized functions with similar training error can diverge widely in generalization performance. However, a flat minimum may imply a low-complexity neural network structure. Some SGD variants have been shown to converge to flatter minima, which potentially makes the solutions of nonconvex optimization more robust. The first part of this note reviews flat minima (Hochreiter and Schmidhuber, 1997). The second part introduces the properties of gradient descent algorithms and their visualization.