Programming is the basis for a wide range of fields. This blog post summarizes the sufficient conditions for strong duality. It is also a summary of the Mathematical Programming lecture notes (David P. Williamson).
Parameterized functions with similar training error can diverge widely in generalization performance. However, flat minima may imply a low-complexity neural network structure. Some SGD methods have been shown to converge to flatter minima, which potentially makes solutions of nonconvex optimization more robust. The first part of this note reviews Flat Minima (Hochreiter and Schmidhuber, 1997). The second part introduces the properties and visualization of gradient descent algorithms.
Generally, stochastic approximation methods are a family of iterative methods whose goal is to recover some property of a function that depends on random variables. Applications of stochastic approximation range from deep learning (e.g., SGD) to online learning methods.
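To make the iterative idea concrete, below is a minimal sketch of a stochastic approximation update in the Robbins-Monro / SGD style; the synthetic linear-regression data, the quadratic loss, and the 1/t step-size schedule are all assumptions made purely for illustration, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration: y = X @ w_true + noise (an assumption).
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)                       # current iterate
for t in range(1, 5001):
    i = rng.integers(len(X))          # draw one random sample per step
    grad = (X[i] @ w - y[i]) * X[i]   # noisy gradient of 0.5 * (x_i . w - y_i)^2
    w -= (1.0 / t) * grad             # Robbins-Monro style step size ~ 1/t

print("estimation error:", np.linalg.norm(w - w_true))
```

Each update uses only a single random sample, so the gradient is a noisy estimate of the full-batch gradient; the decaying step size is what lets the iterates average out that noise over time.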