Batch normalization is a technique introduced in 2015 by Google researchers Sergey Ioffe and Christian Szegedy to address training issues such as internal covariate shift and vanishing gradients. During training, it normalizes the inputs to each unit to zero mean and unit variance using the statistics of the current mini-batch, then applies a learnable scale and shift (gamma and beta) so the layer retains its representational power. This lets deeper models train with higher learning rates and makes them less sensitive to weight initialization. In the original formulation, batch normalization is inserted before the activation function of each layer; at inference time, the mini-batch statistics are replaced by running estimates of the mean and variance accumulated during training.
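To make the training-versus-inference distinction concrete, here is a minimal NumPy sketch of a batch normalization forward pass. The function name, parameter names, and the momentum-based running-average update are illustrative choices, not the API of any particular library.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, running_mean, running_var,
                       training=True, momentum=0.1, eps=1e-5):
    """Normalize a batch of activations x with shape (batch, features)."""
    if training:
        # Use the statistics of the current mini-batch.
        batch_mean = x.mean(axis=0)
        batch_var = x.var(axis=0)
        # Update running estimates for use at inference time.
        running_mean = (1 - momentum) * running_mean + momentum * batch_mean
        running_var = (1 - momentum) * running_var + momentum * batch_var
        x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)
    else:
        # At inference, use the running estimates, not the batch statistics.
        x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    # Learnable scale (gamma) and shift (beta) restore representational power.
    out = gamma * x_hat + beta
    return out, running_mean, running_var

# Usage: a mini-batch of 4 examples with 3 features each.
x = np.random.randn(4, 3)
gamma, beta = np.ones(3), np.zeros(3)
running_mean, running_var = np.zeros(3), np.ones(3)
out, running_mean, running_var = batch_norm_forward(
    x, gamma, beta, running_mean, running_var, training=True)
```

The same call with `training=False` skips the batch statistics entirely and normalizes with the accumulated running mean and variance, which is why inference behavior does not depend on the composition of the batch.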