1. Batch normalization normalizes the activations in a network on a mini-batch basis. It standardizes the inputs to each layer by subtracting the batch mean and dividing by the batch standard deviation.
2. During training, the mean and variance of the current mini-batch are used to normalize the activations. In addition, a learnable scale parameter gamma and shift parameter beta are applied to the normalized values, so the layer can recover any representation the normalization removed.
3. At test time, running estimates of the mean and variance (accumulated over the training set, typically as exponential moving averages of the batch statistics) are used instead of the batch statistics. This makes the output deterministic and independent of batch composition and batch size at test time.
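The three points above can be sketched in a single forward pass. This is a minimal NumPy illustration, not a production implementation; the function name `batch_norm` and the parameter defaults (`momentum`, `eps`) are assumptions for the example.

```python
import numpy as np

def batch_norm(x, gamma, beta, running_mean, running_var,
               training=True, momentum=0.9, eps=1e-5):
    """Sketch of a batch-norm forward pass over a (N, D) mini-batch."""
    if training:
        # Point 2: normalize with the current mini-batch's statistics.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Accumulate running estimates for use at test time (point 3).
        running_mean = momentum * running_mean + (1 - momentum) * mean
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        # Point 3: at test time, use the running (training-set) estimates.
        mean, var = running_mean, running_var
    # Point 1: subtract the mean, divide by the standard deviation.
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Point 2: learnable scale (gamma) and shift (beta).
    return gamma * x_hat + beta, running_mean, running_var
```

In training mode each normalized feature has roughly zero mean and unit variance across the batch before gamma and beta are applied; in eval mode the same inputs always produce the same outputs regardless of what else is in the batch.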