AdaHessian: a second-order optimizer for deep learning
Aug 9, 2021
Most of the optimizers used in deep learning are (stochastic) gradient descent methods: they only consider the gradient of the loss function. Second-order methods, in comparison, also take the curvature of the loss function into account. With that extra information, better update steps can be computed (at least in theory). There are only a few second order…
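To make the distinction concrete, here is a minimal sketch on a toy convex quadratic (the matrix `A` and vector `b` below are illustrative assumptions, not anything from a real model). It contrasts a plain gradient-descent step with a Newton step, which rescales the gradient by the inverse Hessian:

```python
import numpy as np

# Toy loss: f(x) = 0.5 * x^T A x - b^T x (a convex quadratic).
# A is the Hessian of f; for a quadratic it is constant.
A = np.array([[3.0, 0.2],
              [0.2, 1.0]])  # illustrative values
b = np.array([1.0, 2.0])    # illustrative values

def grad(x):
    """Gradient of the toy loss: A x - b."""
    return A @ x - b

x0 = np.zeros(2)

# First-order step: move along the negative gradient,
# scaled by a hand-picked learning rate.
lr = 0.1
x_gd = x0 - lr * grad(x0)

# Second-order (Newton) step: rescale the gradient by the
# inverse Hessian. On a quadratic this lands exactly on the
# minimum in a single step.
x_newton = x0 - np.linalg.solve(A, grad(x0))

print("gradient descent step:", x_gd)
print("Newton step:          ", x_newton)
print("true minimum:         ", np.linalg.solve(A, b))
```

The Newton step reaches the minimum in one update here, while gradient descent takes many small steps whose size depends on the learning rate. The catch, of course, is that for a deep network the full Hessian is far too large to form or invert, which is exactly the problem methods like AdaHessian try to work around.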