This document summarizes a lecture on optimization and neural networks from a course on artificial intelligence at UC Berkeley. It introduces gradient ascent as a method for training logistic regression and neural networks: the weights are repeatedly updated in the direction of the gradient of the log-likelihood objective function. Neural networks are presented as a generalization of logistic regression that learns features from the data automatically through multiple hidden layers of nonlinear transformations, rather than relying on hand-designed features. The universal function approximation theorem is discussed, which states that a neural network with enough hidden units can approximate any continuous function to arbitrary accuracy. Automatic differentiation is noted as the mechanism for efficiently computing the gradients these updates require, with backpropagation as its application to neural networks.
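
As a concrete illustration of these ideas, the sketch below performs gradient ascent on the logistic regression log-likelihood, using JAX's automatic differentiation (`jax.grad`) to supply the gradient. The toy dataset, step size, and iteration count are illustrative assumptions and are not taken from the lecture.

```python
# A minimal sketch, assuming a tiny synthetic dataset: gradient ascent on the
# logistic regression log-likelihood, with the gradient obtained by automatic
# differentiation rather than a hand-derived formula.
import jax
import jax.numpy as jnp

def log_likelihood(w, X, y):
    # z_i = w . x_i; log P(y_i = 1 | x_i) = log_sigmoid(z_i),
    # log P(y_i = 0 | x_i) = log_sigmoid(-z_i).
    z = X @ w
    return jnp.sum(y * jax.nn.log_sigmoid(z) + (1 - y) * jax.nn.log_sigmoid(-z))

# Automatic differentiation: gradient of the objective with respect to w.
grad_ll = jax.grad(log_likelihood)

# Hypothetical toy data: 4 examples, 2 features plus a bias column of ones.
X = jnp.array([[ 1.0,  0.5, 1.0],
               [ 2.0,  1.0, 1.0],
               [-1.0, -0.5, 1.0],
               [-2.0, -1.5, 1.0]])
y = jnp.array([1.0, 1.0, 0.0, 0.0])

w = jnp.zeros(3)
alpha = 0.1  # step size; an assumed value for illustration
for _ in range(200):
    # Gradient *ascent*: move the weights in the direction of the gradient,
    # since we are maximizing the log likelihood.
    w = w + alpha * grad_ll(w, X, y)

print("learned weights:", w)
print("final log likelihood:", log_likelihood(w, X, y))
```

The same loop would carry over to a neural network: only `log_likelihood` changes to the network's objective, while automatic differentiation (reverse-mode, i.e. backpropagation) still provides the gradient.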