Lex & Yacc
Prepared by:
Taha Malampattiwala 140050107052
Outline
• Lex:
o Theory
o Execution
o Example
• Yacc:
o Theory
o Description
o Example
• Lex & Yacc linking
Lex
• Lex is a generator of lexical analyzers, widely used on Unix.
• It is mostly used together with the Yacc parser generator.
• Written by Mike Lesk and Eric Schmidt.
• It reads an input specification that describes the lexical analyzer and
outputs source code implementing that analyzer in the C programming language.
• The specification consists of patterns (regular expressions); the generated
C code scans the input for strings that match them, for example identifiers.
Lex
• A simple pattern: letter(letter|digit)*
• This pattern matches a string of characters that begins with a single letter
followed by zero or more letters or digits.
• Lex translates regular expressions into a program that mimics a finite
state automaton (FSA).
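As a rough illustration of what "mimics an FSA" means, the pattern above can be hand-coded as a small state machine in C. This is only a sketch: the state names and the function matches_identifier are made up for this example, and real Lex output is table-driven and looks quite different.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hand-written sketch of an automaton for letter(letter|digit)*.
   State and function names are illustrative, not Lex output. */
enum state { START, IN_IDENT, REJECT };

bool matches_identifier(const char *s)
{
    enum state st = START;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        switch (st) {
        case START:     /* the first character must be a letter */
            st = isalpha(c) ? IN_IDENT : REJECT;
            break;
        case IN_IDENT:  /* afterwards any mix of letters and digits */
            st = isalnum(c) ? IN_IDENT : REJECT;
            break;
        case REJECT:    /* dead state: no transition leads out of it */
            return false;
        }
    }
    return st == IN_IDENT;  /* IN_IDENT is the only accepting state */
}
```

Calling matches_identifier("x1") follows START -> IN_IDENT -> IN_IDENT and accepts, while "1x" falls into REJECT on its first character.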
Lex
• Limitation: Lex cannot recognize nested structures such as balanced
parentheses, because it has only states and transitions between states and
therefore cannot keep track of the nesting depth.
• Lex is good at pattern matching; nested structures are left to Yacc, whose
parser uses a stack.
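To see why a fixed set of states is not enough, consider a hand-written check for balanced parentheses. The sketch below (the function name parens_balanced is invented for this example) needs a counter that can grow without bound, which is exactly the kind of memory a finite automaton lacks.

```c
#include <stdbool.h>

/* Sketch: checks that every '(' has a matching ')'.
   The depth counter plays the role of a stack; a finite automaton
   would need one state per possible depth, and there is no upper bound. */
bool parens_balanced(const char *s)
{
    long depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth == 0)
                return false;   /* closing parenthesis with nothing open */
            depth--;
        }
        /* every other character is ignored in this sketch */
    }
    return depth == 0;          /* everything opened was closed */
}
```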
Lex
• The input to Lex is divided into three sections.
...definitions section...
%%
...rules section...
%%
...C code section (subroutines)...
Operators
" \ [ ] ^ - ? . * + | ( ) $ / { } % < >
• To use one of these as a text character, escape it with a backslash:
\$ = "$"
\\ = "\"
• Every character other than blank, tab (\t), newline (\n) and the operators
listed above is always a text character.
Precedence of Operators
• Levels of precedence, highest first:
o Kleene closure (*), ?, +
o concatenation
o alternation (|)
• All operators are left associative.
• Example: a*b|cd* = ((a*)b)|(c(d*))
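The grouping ((a*)b)|(c(d*)) can be made concrete with a hand-written matcher that has one helper per alternative. This is a sketch under two assumptions: the function names are invented, and it checks whole strings, whereas a Lex scanner matches prefixes of its input.

```c
#include <stdbool.h>

/* (a*)b : zero or more 'a', then exactly one 'b', then end of string */
static bool match_a_star_b(const char *s)
{
    while (*s == 'a')
        s++;
    return s[0] == 'b' && s[1] == '\0';
}

/* c(d*) : one 'c', then zero or more 'd', then end of string */
static bool match_c_d_star(const char *s)
{
    if (*s != 'c')
        return false;
    s++;
    while (*s == 'd')
        s++;
    return *s == '\0';
}

/* Alternation binds loosest, so the whole pattern is "either branch". */
bool match_pattern(const char *s)
{
    return match_a_star_b(s) || match_c_d_star(s);
}
```

The string "acd" is rejected here. It would be accepted if alternation bound tighter than concatenation, that is, if the pattern were read as a*(b|c)d*.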
Lex
• Pattern matching primitives.
Lex
• Pattern matching examples.
Lex
• Lex predefined variables.
Lex
• In the definitions section, whitespace must separate the defined name from its associated expression.
• Code in the definitions section is copied as-is to the top of the generated C file and must be
bracketed with "%{" and "%}" markers.
• Substitutions in the rules section are surrounded by braces ({letter}) to distinguish them from literals.
An Overview of Lex
Lex source program -> Lex -> lex.yy.c
lex.yy.c -> C compiler -> a.out
input -> a.out -> tokens
Usage
• To run Lex on a source file, type
lex scanner.l
• This produces a file named lex.yy.c, a C program implementing the
lexical analyzer.
• To compile lex.yy.c and link it with the Lex library, type
cc lex.yy.c -ll
• To run the lexical analyzer program, type
./a.out < inputfile
Any Questions So Far?
Yacc - Yet Another Compiler-Compiler
• Yacc reads a grammar and generates C code for a parser.
• Grammars are written in Backus-Naur Form (BNF).
• BNF grammars are used to express context-free languages.
• To parse an expression, the parser applies the grammar rules in reverse,
reducing the expression to a single nonterminal.
• This is known as bottom-up or shift-reduce parsing.
• A stack (LIFO) stores the symbols recognized so far.
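The shift and reduce steps can be sketched by hand for the small grammar expr -> expr + expr | expr - expr | id that these slides use later. This is a simplified illustration only: the function name parse_expr, the single-character token encoding ('i' for id) and the fixed-size stack are assumptions of the sketch, and a Yacc-generated parser is driven by state tables rather than by inspecting the stack like this.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_MAX 256   /* arbitrary limit chosen for the sketch */

/* tokens: a string over 'i' (id), '+' and '-'. Returns true if the whole
   token string reduces to a single expr, written 'E' on the stack. */
bool parse_expr(const char *tokens)
{
    char stack[STACK_MAX];
    size_t top = 0;                     /* number of symbols on the stack */

    for (; *tokens != '\0'; tokens++) {
        char t = *tokens;
        if (t != 'i' && t != '+' && t != '-')
            return false;               /* unknown token */
        if (top == STACK_MAX)
            return false;               /* stack full */
        stack[top++] = t;               /* shift */

        /* reduce while the right-hand side of a rule is on top */
        for (;;) {
            if (top >= 1 && stack[top - 1] == 'i') {
                stack[top - 1] = 'E';   /* expr -> id */
            } else if (top >= 3 && stack[top - 1] == 'E'
                       && (stack[top - 2] == '+' || stack[top - 2] == '-')
                       && stack[top - 3] == 'E') {
                top -= 2;               /* expr -> expr op expr */
                stack[top - 1] = 'E';
            } else {
                break;
            }
        }
    }
    return top == 1 && stack[0] == 'E';
}
```

For the input "i+i-i" the stack goes through E, E +, E + E, E, E -, E - E, E. Reducing as early as possible is a design choice of this sketch; it makes + and - group to the left.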
Yacc
• Input to yacc is divided into three sections.
...definitions...
%%
...rules...
%%
... subroutines...
Yacc
• The definitions section consists of:
o token declarations.
o C code bracketed by "%{" and "%}".
• The rules section consists of:
o the BNF grammar.
• The subroutines section consists of:
o user subroutines.
An Overview of YACC
(1) Parser generation time: YACC source (*.y) -> yacc -> y.tab.c, y.tab.h, y.output
(2) Compile time: y.tab.c -> C compiler/linker -> a.out
(3) Run time: Token stream -> a.out -> Abstract Syntax Tree
Lex vs. Yacc
• Lex
o Lex generates C code for a lexical analyzer, or scanner.
o Lex uses patterns that match strings in the input and
converts the strings to tokens.
• Yacc
o Yacc generates C code for a syntax analyzer, or parser.
o Yacc uses grammar rules that allow it to analyze tokens
from Lex and create a syntax tree.
Yacc & Lex Together
• The grammar:
program -> program expr | ε
expr -> expr + expr | expr - expr | id
• program and expr are nonterminals.
• id is a terminal (a token returned by lex).
• An expression may be:
o the sum of two expressions.
o the difference of two expressions.
o an identifier.
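Lex file
Yacc file
Linking lex & yacc
Lex source (Lexical Rules) -> Lex -> lex.yy.c, which defines yylex()
Yacc source (Grammar Rules) -> Yacc -> y.tab.c, which defines yyparse()
Input -> yyparse() -> Parsed Input
yyparse() calls yylex() each time it needs the next token.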
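The calling relationship between the two generated functions can be imitated in plain C. In the sketch below both functions are hand-written stand-ins, not generated code: the token value ID, the helper set_input and the buffer src are assumptions made for the example, and the parser is a simple loop for the grammar program -> program expr | ε, expr -> expr + expr | expr - expr | id instead of a table-driven shift-reduce parser. What it does keep from the real interface is that yylex() returns 0 at end of input and yyparse() returns 0 on success and 1 on a syntax error.

```c
#include <ctype.h>

enum { ID = 258 };            /* illustrative token code above the char range */

static const char *src = "";  /* hypothetical input buffer for the sketch */

void set_input(const char *text) { src = text; }

/* Stand-in scanner: skips whitespace, returns ID for letter(letter|digit)*,
   the character itself for anything else, and 0 at end of input. */
int yylex(void)
{
    while (*src == ' ' || *src == '\t' || *src == '\n')
        src++;
    if (*src == '\0')
        return 0;
    if (isalpha((unsigned char)*src)) {
        while (isalnum((unsigned char)*src))
            src++;
        return ID;
    }
    return (unsigned char)*src++;
}

/* Stand-in parser: pulls tokens from yylex() one at a time. */
int yyparse(void)
{
    int tok = yylex();

    while (tok != 0) {                   /* program -> program expr | empty */
        if (tok != ID)
            return 1;                    /* every expr starts with an id */
        tok = yylex();
        while (tok == '+' || tok == '-') {
            if (yylex() != ID)
                return 1;                /* operator must be followed by id */
            tok = yylex();
        }
    }
    return 0;
}
```

With set_input("a + b - c1 d") the parser sees two expressions, a + b - c1 and d, and yyparse() returns 0. These names would clash with the real generated functions, so the sketch is meant to be read or compiled on its own, not linked with lex.yy.c or y.tab.c.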