N-gram models are statistical language models that can be used for word prediction. They estimate the probability of a word given the previous N-1 words. Higher-order N-gram models (larger N) capture more context, but they require more training data because the number of possible word sequences grows rapidly with N. N-gram probabilities are estimated by counting word sequences in a large training corpus and computing conditional probabilities: the probability of a word given its history is the count of the full N-word sequence divided by the count of the preceding N-1 words (the maximum likelihood estimate). For a bigram model, for example, P(eat | to) is the number of times "to eat" occurs divided by the number of times "to" occurs. Because they reflect which words actually follow one another in real text, these probabilities encode both syntactic patterns and semantic tendencies in language.
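To make the estimation step concrete, here is a minimal sketch of a bigram (N=2) model in Python. The toy corpus, the `<s>`/`</s>` sentence-boundary markers, and the helper names (`train_bigram_model`, `predict_next`) are illustrative assumptions rather than a standard API, and a practical model would also need smoothing to handle sequences never seen in training.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Estimate bigram conditional probabilities by maximum likelihood.

    corpus: a list of tokenized sentences (lists of word strings).
    Returns a dict mapping previous_word -> {next_word: P(next | previous)}.
    """
    bigram_counts = defaultdict(Counter)
    for sentence in corpus:
        # Pad with boundary markers so the first word is conditioned on <s>.
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            bigram_counts[prev][curr] += 1

    # Convert counts to conditional probabilities: P(w | prev) = C(prev, w) / C(prev).
    model = {}
    for prev, next_counts in bigram_counts.items():
        total = sum(next_counts.values())
        model[prev] = {w: c / total for w, c in next_counts.items()}
    return model

def predict_next(model, previous_word):
    """Return candidate next words sorted by estimated probability."""
    candidates = model.get(previous_word, {})
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)

# Tiny toy corpus; a real model would be trained on millions of sentences.
corpus = [
    "i want to eat lunch".split(),
    "i want to sleep".split(),
    "i need to eat".split(),
]
model = train_bigram_model(corpus)
print(predict_next(model, "want"))  # [('to', 1.0)]
print(predict_next(model, "to"))    # [('eat', 0.666...), ('sleep', 0.333...)]
```

In this sketch, the model predicts "eat" after "to" with probability 2/3 because "to eat" occurs twice in the corpus while "to" occurs three times, which is exactly the counting-and-dividing procedure described above.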