Gradient descent is an optimization algorithm, widely used in supervised learning, that minimizes a differentiable loss function. At each iteration it takes a step in the negative direction of the gradient of the loss with respect to the weights, scaled by a learning rate. Stochastic gradient descent is a variant that updates the weights after each individual training example (or small minibatch), using a noisy estimate of the gradient rather than computing it over the full batch. The perceptron is an algorithm that updates the weights only when its prediction is incorrect. It converges, in the sense that it stops making updates, once the weights correctly classify all training examples; this is guaranteed to happen after finitely many updates only when the data are linearly separable.
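The two update rules above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not a production implementation: the learning rate, epoch counts, and the linear toy problem are all assumptions chosen so both loops converge quickly. The SGD loop fits a linear model under squared loss, stepping against the per-example gradient; the perceptron loop updates only on misclassified points and stops once a full pass makes no mistakes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear problem (assumed setup): learn w so that X @ w matches y.
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

# --- Stochastic gradient descent on squared loss ---
w = np.zeros(2)
lr = 0.1  # learning rate (assumed value)
for epoch in range(20):
    for i in rng.permutation(len(X)):
        # Gradient of 0.5 * (x·w - y)^2 with respect to w, for one example.
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad  # step in the negative gradient direction

# --- Perceptron: update only when the prediction is wrong ---
labels = np.where(X @ true_w > 0, 1, -1)
# Keep only points with a clear margin so the data are linearly separable
# with room to spare (the convergence guarantee needs separability).
mask = np.abs(X @ true_w) > 1.0
Xp, yp = X[mask], labels[mask]

p = np.zeros(2)
converged = False
for epoch in range(100):
    mistakes = 0
    for i in range(len(Xp)):
        if yp[i] * (Xp[i] @ p) <= 0:  # misclassified (or on the boundary)
            p += yp[i] * Xp[i]        # perceptron update rule
            mistakes += 1
    if mistakes == 0:  # a full error-free pass: weights classify everything
        converged = True
        break
```

Note the contrast: SGD keeps adjusting the weights on every example as long as any residual error remains, while the perceptron leaves the weights untouched on correctly classified points, which is why it halts exactly when the training set is fully separated.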