Notes

Parameterized functions with similar training error can diverge widely in generalization performance. However, flat minima may imply a low-complexity neural network structure. Some SGD methods have been shown to converge to flatter minima, which potentially makes the solutions of nonconvex optimization more robust. The first part of this note reviews Flat Minima (Hochreiter and Schmidhuber, 1997). The second part introduces the properties and visualization of gradient descent algorithms.
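As a rough illustration of why flatness matters, the sketch below (not from the original note) probes the sharpness of a minimum by perturbing the parameters with small random noise and measuring how much the loss increases; a flat minimum tolerates perturbations with little loss increase, while a sharp one does not. The toy quadratic losses, the perturbation radius, and the helper names are illustrative assumptions, not the paper's definition.

```python
# Minimal sketch: estimate "sharpness" of a minimum as the average loss
# increase under small random parameter perturbations. Toy quadratic bowls
# are used in place of a real network loss (illustrative assumption).
import numpy as np

rng = np.random.default_rng(0)

def loss(w, curvature):
    # Simple quadratic bowl; `curvature` controls how sharp the minimum is.
    return 0.5 * np.sum(curvature * w ** 2)

def sharpness(w, curvature, radius=0.1, n_samples=100):
    # Average loss increase over random perturbations of norm `radius`.
    base = loss(w, curvature)
    increases = []
    for _ in range(n_samples):
        d = rng.normal(size=w.shape)
        d = radius * d / np.linalg.norm(d)
        increases.append(loss(w + d, curvature) - base)
    return float(np.mean(increases))

w = np.zeros(10)                   # both toy minima sit at w = 0
flat_curv = np.full(10, 0.1)       # wide, flat bowl
sharp_curv = np.full(10, 10.0)     # narrow, sharp bowl

print("flat minimum  sharpness:", sharpness(w, flat_curv))
print("sharp minimum sharpness:", sharpness(w, sharp_curv))
```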
