- Lexical analyzer reads source program character by character to produce tokens. It returns tokens to the parser one by one as requested.
- A token represents a set of strings defined by a pattern and has a type and attribute to uniquely identify a lexeme. Regular expressions are used to specify patterns for tokens.
- A finite automaton can be used as a lexical analyzer to recognize tokens. Non-deterministic finite automata (NFA) and deterministic finite automata (DFA) are commonly used, with DFA being more efficient for implementation. Regular expressions for tokens are first converted to NFA and then to DFA.