SGD with momentum adds a momentum term to plain SGD, accumulating a decaying average of past gradients to smooth out noisy stochastic updates. Adam combines the momentum and RMSProp approaches: it stores an exponentially decaying average of past gradients m_t and of past squared gradients v_t, and uses the bias-corrected estimates m̂_t and v̂_t in its update rule. Adadelta is an extension of Adagrad that restricts the accumulation of past squared gradients to a fixed-size window (implemented as an exponentially decaying average), which mitigates Adagrad's aggressively decaying learning rate. A sketch of all three update rules follows below.
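
For concreteness, here is a minimal NumPy sketch of the three update rules. The function names, state-passing style, and hyperparameter defaults (gamma, beta1, beta2, rho, eps) are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def sgd_momentum_step(w, g, v, lr=0.01, gamma=0.9):
    """SGD with momentum: v is a decaying average of past gradients."""
    v = gamma * v + lr * g          # momentum term smooths noisy gradients
    w = w - v
    return w, v

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: decaying averages of gradients (m) and squared gradients (v)."""
    m = beta1 * m + (1 - beta1) * g          # first moment estimate m_t
    v = beta2 * v + (1 - beta2) * g**2       # second moment estimate v_t
    m_hat = m / (1 - beta1**t)               # bias-corrected m̂_t
    v_hat = v / (1 - beta2**t)               # bias-corrected v̂_t
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def adadelta_step(w, g, eg2, edx2, rho=0.95, eps=1e-6):
    """Adadelta: a decaying window of squared gradients E[g^2] replaces
    Adagrad's full sum, so the effective step size does not shrink monotonically."""
    eg2 = rho * eg2 + (1 - rho) * g**2                   # E[g^2]
    dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * g   # update scaled by RMS ratio
    edx2 = rho * edx2 + (1 - rho) * dx**2                # E[dx^2]
    w = w + dx
    return w, eg2, edx2
```

Each function takes the optimizer state explicitly and returns the updated parameters and state; Adam additionally needs the step count t for bias correction, since m and v are initialized at zero and would otherwise be biased toward zero early in training.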