This document summarizes a survey of model compression techniques for large language models. It covers four common families of methods: pruning, knowledge distillation, low-rank factorization, and quantization. Pruning removes redundant parameters, either weight by weight (unstructured pruning) or in whole units such as neurons, attention heads, or layers (structured pruning). Knowledge distillation trains a smaller student model to mimic the outputs of a larger teacher model. Low-rank factorization approximates large weight matrices by products of much smaller ones. Quantization replaces 32-bit floating-point weights and activations with lower-precision values, typically 8-bit integers, cutting both memory footprint and compute cost. The document analyzes these methods and their impact on metrics such as model size, compression ratio, and inference speed, as sketched in the examples below.
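To make the pruning idea concrete, here is a minimal NumPy sketch of unstructured magnitude pruning, under the common assumption that the smallest-magnitude weights contribute least to the output; the function name `magnitude_prune` and the 50% sparsity level are illustrative choices, not details taken from the survey.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero out the smallest-magnitude weights."""
    k = int(sparsity * w.size)                    # number of weights to drop
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]  # k-th smallest magnitude
    return w * (np.abs(w) > threshold)            # keep only larger weights

w = np.random.randn(8, 8).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.5)
print("fraction zeroed:", float(np.mean(w_sparse == 0)))  # roughly 0.5
```

Structured pruning follows the same principle but applies the mask to whole rows, columns, or attention heads, which yields speedups on dense hardware at the cost of coarser granularity.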
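The distillation objective can likewise be sketched in a few lines. The version below uses the widely known temperature-softened KL divergence between teacher and student output distributions (Hinton-style soft targets); the temperature `T = 2.0` and the helper names are assumptions for illustration, not specifics from the survey.

```python
import numpy as np

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)      # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """KL divergence between temperature-softened teacher and student outputs."""
    p = softmax(teacher_logits, T)             # soft targets from the teacher
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9)), axis=-1)
    return float((T ** 2) * kl.mean())         # T^2 keeps gradient scale comparable

teacher = np.random.randn(4, 10)               # logits for 4 examples, 10 classes
student = np.random.randn(4, 10)
print(distillation_loss(student, teacher))
```

A higher temperature spreads probability mass over more classes, exposing the teacher's relative preferences among incorrect answers, which is where much of the transferred knowledge lies.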
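Finally, a minimal sketch of symmetric per-tensor INT8 quantization, one simple variant among many; the `quantize`/`dequantize` helpers and the single shared scale are illustrative assumptions rather than the survey's specific scheme.

```python
import numpy as np

def quantize(w: np.ndarray, num_bits: int = 8):
    """Symmetric per-tensor quantization: map floats onto signed integers."""
    qmax = 2 ** (num_bits - 1) - 1             # 127 for 8 bits
    scale = float(np.abs(w).max()) / qmax      # one scale shared by the tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers and stored scale."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize(w)
print("max abs error:", float(np.abs(w - dequantize(q, scale)).max()))
```

Storing `int8` values instead of `float32` gives roughly a 4x size reduction on its own, and integer arithmetic is typically cheaper than floating point on supported hardware, which is where the inference-speed gains come from.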