Stochastic gradient descent (SGD) is an iterative optimization algorithm used in machine learning to minimize a cost function by updating a model's parameters after each training sample (or small batch of samples). Because each update uses only a noisy estimate of the true gradient, SGD converges toward a minimum of the cost function: a global minimum for convex problems, but in general only a local minimum or stationary point. Each step is far cheaper to compute than a full-batch gradient descent step, though the frequent, noisy updates make the iterates less stable. Techniques such as dynamic sampling (growing the batch size over iterations) and gradient aggregation (averaging gradient information across iterations) can reduce this noise.
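A minimal sketch of the per-sample update described above, fitting a one-variable linear model by least squares; the synthetic data, learning rate, and epoch count here are illustrative choices, not part of any standard:

```python
import numpy as np

# Fit y = w*x + b with SGD: one parameter update per training sample.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + 0.5 + rng.normal(scale=0.1, size=200)  # true w=3.0, b=0.5

w, b = 0.0, 0.0
lr = 0.1  # learning rate (step size)
for epoch in range(20):
    for i in rng.permutation(len(X)):   # visit samples in random order
        err = (w * X[i] + b) - y[i]     # residual for this single sample
        w -= lr * err * X[i]            # gradient of 0.5*err^2 w.r.t. w
        b -= lr * err                   # gradient of 0.5*err^2 w.r.t. b

print(f"w ≈ {w:.2f}, b ≈ {b:.2f}")
```

Shuffling each epoch keeps successive gradient estimates roughly independent; with a fixed learning rate the iterates fluctuate around the minimizer rather than settling exactly on it, which is the noise the techniques above aim to dampen.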