Lex and Yacc
COP - 3402
General Compiler Infra-structure
Scanner
(tokenizer)
Parser Semantic
Routines
Analysis/
Transformations/
optimizations
Code
Generator
Program source
(stream of
characters)
Tokens
Syntactic
Structure
IR: Intermediate
Representation (1)
Assembly code
IR: Intermediate
Representation (2)
Symbol and
Attribute Tables
Lex
Yacc
Lex & Yacc
• Lex
– generates C code for the lexical analyzer (scanner)
– Token patterns specified by regular expressions
• Yacc
– generates C code for a LR(1) syntax analyzer (parser)
– BNF rules for the grammar
Lex
 lex is a program (generator) that generates lexical analyzers,
(widely used on Unix).
 It is mostly used with Yacc parser generator.
 Written by Eric Schmidt and Mike Lesk.
 It reads the input stream (specifying the lexical analyzer ) and
outputs source code implementing the lexical analyzer in the C
programming language.
 Lex will read patterns (regular expressions); then produces C
code for a lexical analyzer that scans for identifiers.
Lex predefined variables and
functions
Lex – Pattern Matching
Primitives
Lex – Pattern Matching
Examples
Example: Simple Calculator
• Computes the basic arithmetic operations
• Allows declaration of variables
• Enough to illustrate the basic structure of
Lex and Yacc programs
Lex program structure
… definitions …
%%
… rules …
%%
… subroutines …
%{
#include <stdio.h>
#include "y.tab.h"
int c;
extern int yylval;
%}
%%
" " ;
[a-z] { c = yytext[0]; yylval = c - 'a';
return(LETTER); }
[0-9]* { yylval = atoi(yytext);
return(NUMBER); }
[^a-z0-9b] { c = yytext[0]; return(c); }
Pattern Matching and Action
[a-z] { c = yytext[0]; yylval = c - 'a';
return(LETTER); }
[0-9]* { yylval = atoi(yytext);
return(NUMBER); }
Match a character
in the a-z range
Match a positive integer
(sequence of 0-9 digits)
Place the offset c – ‘a’
In the stack
Place the integer value
In the stack
Buffer
Yacc
• Grammars described by rules using a
variant of the Backus Naur Form (BNF)
– Context-free grammars
• LALR(1) parse table is generated
automatically based on the rules
• Actions are added to the rules and
executed after each reduction
Yacc Program Structure
%{
#include <stdio.h>
int regs[26];
int base;
%}
%token NUMBER LETTER
%left '+' '-‘
%left '*' '/‘
%%
list: | list stat 'n' |list error 'n' {yyerrok;} ;
stat: expr {printf("%dn",$1);}
| LETTER '=' expr {regs[$1] = $3;};
expr:
'(' expr ')' {$$ = $2;} |
expr '+' expr {$$ = $1 + $3;} |
LETTER {$$ = regs[$1];}
%%
main(){return(yyparse());}
yyerror(CHAR *s){fprintf(stderr, "%sn",s);}
yywrap(){ return(1);}
… definitions …
%%
… rules …
%%
… subroutines …
Rule Reduction and Action
stat: expr {printf("%dn",$1);}
| LETTER '=' expr {regs[$1] = $3;};
expr:
expr '+' expr {$$ = $1 + $3;} |
LETTER {$$ = regs[$1];}
Grammar rule Action
“or” operator:
For multiple RHS
Further reading…

“A Compact Guide to Lex & Yacc”,
Thomas Niemann (recommended);

“Lex & Yacc”, Doug Brown (O’Reily);

Lots of resources on the web
• Check our website for some suggestions
Conclusions
• Yacc and Lex are very helpful for building
the compiler front-end
• A lot of time is saved when compared to
hand-implementation of parser and
scanner
• They both work as a mixture of “rules” and
“C code”
• C code is generated and is merged with
the rest of the compiler code

Compiler Design Brief in Computer Application

  • 1.
  • 2.
    General Compiler Infra-structure Scanner (tokenizer) ParserSemantic Routines Analysis/ Transformations/ optimizations Code Generator Program source (stream of characters) Tokens Syntactic Structure IR: Intermediate Representation (1) Assembly code IR: Intermediate Representation (2) Symbol and Attribute Tables Lex Yacc
  • 3.
    Lex & Yacc •Lex – generates C code for the lexical analyzer (scanner) – Token patterns specified by regular expressions • Yacc – generates C code for a LR(1) syntax analyzer (parser) – BNF rules for the grammar
  • 4.
    Lex  lex isa program (generator) that generates lexical analyzers, (widely used on Unix).  It is mostly used with Yacc parser generator.  Written by Eric Schmidt and Mike Lesk.  It reads the input stream (specifying the lexical analyzer ) and outputs source code implementing the lexical analyzer in the C programming language.  Lex will read patterns (regular expressions); then produces C code for a lexical analyzer that scans for identifiers.
  • 5.
  • 6.
    Lex – PatternMatching Primitives
  • 7.
    Lex – PatternMatching Examples
  • 8.
    Example: Simple Calculator •Computes the basic arithmetic operations • Allows declaration of variables • Enough to illustrate the basic structure of Lex and Yacc programs
  • 9.
    Lex program structure …definitions … %% … rules … %% … subroutines … %{ #include <stdio.h> #include "y.tab.h" int c; extern int yylval; %} %% " " ; [a-z] { c = yytext[0]; yylval = c - 'a'; return(LETTER); } [0-9]* { yylval = atoi(yytext); return(NUMBER); } [^a-z0-9b] { c = yytext[0]; return(c); }
  • 10.
    Pattern Matching andAction [a-z] { c = yytext[0]; yylval = c - 'a'; return(LETTER); } [0-9]* { yylval = atoi(yytext); return(NUMBER); } Match a character in the a-z range Match a positive integer (sequence of 0-9 digits) Place the offset c – ‘a’ In the stack Place the integer value In the stack Buffer
  • 11.
    Yacc • Grammars describedby rules using a variant of the Backus Naur Form (BNF) – Context-free grammars • LALR(1) parse table is generated automatically based on the rules • Actions are added to the rules and executed after each reduction
  • 12.
    Yacc Program Structure %{ #include<stdio.h> int regs[26]; int base; %} %token NUMBER LETTER %left '+' '-‘ %left '*' '/‘ %% list: | list stat 'n' |list error 'n' {yyerrok;} ; stat: expr {printf("%dn",$1);} | LETTER '=' expr {regs[$1] = $3;}; expr: '(' expr ')' {$$ = $2;} | expr '+' expr {$$ = $1 + $3;} | LETTER {$$ = regs[$1];} %% main(){return(yyparse());} yyerror(CHAR *s){fprintf(stderr, "%sn",s);} yywrap(){ return(1);} … definitions … %% … rules … %% … subroutines …
  • 13.
    Rule Reduction andAction stat: expr {printf("%dn",$1);} | LETTER '=' expr {regs[$1] = $3;}; expr: expr '+' expr {$$ = $1 + $3;} | LETTER {$$ = regs[$1];} Grammar rule Action “or” operator: For multiple RHS
  • 14.
    Further reading…  “A CompactGuide to Lex & Yacc”, Thomas Niemann (recommended);  “Lex & Yacc”, Doug Brown (O’Reily);  Lots of resources on the web • Check our website for some suggestions
  • 15.
    Conclusions • Yacc andLex are very helpful for building the compiler front-end • A lot of time is saved when compared to hand-implementation of parser and scanner • They both work as a mixture of “rules” and “C code” • C code is generated and is merged with the rest of the compiler code