Generative models aim to learn a model distribution p(x|θ) that approximates the data distribution q from training samples. Three common ways to measure the discrepancy between p and q are: 1) the Kullback-Leibler (KL) divergence, which is what maximum likelihood estimation minimizes; 2) the Jensen-Shannon (JS) divergence, which the original generative adversarial network (GAN) objective minimizes at the optimal discriminator; 3) optimal transport (OT) distances, such as the 1-Wasserstein distance, which remain smooth even when p and q have disjoint supports and underlie Wasserstein GANs.
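As a minimal illustration of the three measures (not tied to any particular model), the sketch below compares two toy discrete distributions over the same 1-D support using SciPy. Note that `scipy.spatial.distance.jensenshannon` returns the square root of the JS divergence (a metric), so it is squared to recover the divergence itself.

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance
from scipy.spatial.distance import jensenshannon

# Toy discrete distributions over the support {0, 1, 2, 3}.
support = np.array([0.0, 1.0, 2.0, 3.0])
p = np.array([0.10, 0.40, 0.40, 0.10])   # stand-in for the model distribution
q = np.array([0.25, 0.25, 0.25, 0.25])   # stand-in for the data distribution

# 1) KL divergence KL(q || p): the quantity maximum likelihood minimizes
#    (up to the constant entropy of q). scipy.stats.entropy(q, p) computes it.
kl = entropy(q, p)

# 2) JS divergence: jensenshannon() returns its square root, so square it.
js = jensenshannon(p, q) ** 2

# 3) 1-Wasserstein distance on the real line, with support points as
#    locations and p, q as their probability masses.
w1 = wasserstein_distance(support, support, u_weights=p, v_weights=q)

print(f"KL(q||p) = {kl:.4f}, JS(p,q) = {js:.4f}, W1(p,q) = {w1:.4f}")
```

Unlike KL and JS, the 1-Wasserstein distance takes the geometry of the support into account: moving mass to a nearby bin costs less than moving it far away, which is the source of its smoothness.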