This document surveys text processing techniques used in information retrieval systems: tokenization, stop-word removal, normalization, stemming and lemmatization, and the handling of different languages. For each technique it gives examples and notes the issues to consider. Tokenization, for example, may involve splitting text at punctuation and deciding whether "San Francisco" is one token or two. Normalization covers case folding, accent handling, and the creation of equivalence classes. The document also compares stemming with lemmatization, noting that stemming is less accurate but requires less linguistic knowledge.
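
As a rough illustration of how these steps might fit together, the sketch below chains tokenization, normalization (case folding and accent stripping), phrase merging, stop-word removal, and a crude suffix-stripping stemmer. Everything in it is an assumption made for illustration: the stop list, the phrase table, the suffix rules, and all function names are hypothetical, and the stemmer is a toy stand-in, not the Porter stemmer or any published algorithm.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop list; real systems use longer lists or none.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

# Hypothetical table of multiword units that should stay a single token.
PHRASES = {("san", "francisco"): "san francisco"}


def tokenize(text):
    """Split text into runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text)


def fold(token):
    """Case-fold and strip accents so that variants share one equivalence class."""
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def merge_phrases(tokens):
    """Join adjacent tokens that form a known phrase into one token."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def crude_stem(token):
    """Toy stemmer: strip the first matching suffix if at least 3 characters remain."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the full pipeline and return the list of index terms."""
    tokens = [fold(t) for t in tokenize(text)]
    tokens = merge_phrases(tokens)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("Retrieving the documents in San Francisco"))
    # ['retriev', 'document', 'san francisco']
```

The stemmer output `retriev` is not a dictionary word, which reflects the trade-off noted above: a rule-based stemmer needs no lexicon but is less accurate than a lemmatizer, which would need a vocabulary and morphological analysis to return `retrieve`.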