This document discusses n-gram models for natural language processing tasks. It begins by introducing n-gram models and how they are used to estimate the conditional probability of a word given its context. Specifically, it explains the Markov assumption that n-gram models make: the probability of a word depends only on the previous n−1 words. The document then provides examples of unigram, bigram, and trigram probability computations. It also discusses techniques such as smoothing and backoff that address the data sparsity issues n-gram models face. Finally, it briefly mentions some shortcomings of n-gram models.
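As a concrete illustration of the kind of conditional probability computation summarized above, the sketch below shows a maximum-likelihood bigram estimate from raw counts. The function name, toy corpus, and overall setup are assumptions for illustration only, not taken from the document itself.

```python
from collections import Counter

def bigram_probability(tokens, w_prev, w):
    """MLE estimate of P(w | w_prev) = count(w_prev, w) / count(w_prev)."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # counts of adjacent word pairs
    unigram_counts = Counter(tokens)                   # counts of single words
    if unigram_counts[w_prev] == 0:
        return 0.0  # unseen context: MLE is undefined, return 0 here
    return bigram_counts[(w_prev, w)] / unigram_counts[w_prev]

# Toy example: estimate P("am" | "i") from a small corpus
tokens = "I am Sam Sam I am I do not like green eggs and ham".lower().split()
print(bigram_probability(tokens, "i", "am"))  # 2 of the 3 occurrences of "i" are followed by "am" -> ~0.667
```

Counts of unseen bigrams are zero under this estimator, which is exactly the sparsity problem that smoothing and backoff techniques are meant to address.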