PhD Program in Computer Science
Seminar Title:
YACC (Yet Another Compiler Compiler)
Prepared By:
Omer Salih Dawood Omer
Email:Omercomail@gmail.com
Supervised By:
Dr. Mohamed Elhafiz Mustafa Musa
2
Agenda:
 Overview and History
 What Do Compilers Do?
 The Structure of a Compiler
 Review of Parser
 How Does YACC Work?
 References.
3
Overview and History
 The first real compiler
 FORTRAN compilers of the late 1950s
 18 person-years to build
 Today, we can build a simple compiler in a few month.
 Crafting and efficient and reliable compiler is still
challenging.
 Another kind of language processor, called an
interpreter, differs from a compiler in that it executes
programs without explicitly performing a translation
 Advantages and Disadvantages of an interpreter
What Do Compilers Do?
Source
Program
Encoding
OutputInterpreter
Data
What Do Compilers Do? (Cont’d.)
 Advantage
 Modification to program during execution
 Interactive debugging
 Not for every language, e.g., Basic, Pascal
 Dynamic-typed languages
 Variable types may change at run time, e.g., LISP.
 Difficult to compile
 Better diagnostics
 Source code is available.
 Machine independence
 However, the interpreter itself must be portable.
What Do Compilers Do? (Cont’d.)
 Disadvantage
 Slower execution due to repeated examination
 Substantial space overhead
The Structure of a Compiler
 Modern compilers are syntax-directed
 Compilation is driven the syntactic structure of
programs; i.e., actions are associated with the
structures.
 Any compiler must perform two major tasks
 Analysis of the source program
 Synthesis of a machine-language program
The Structure of a Compiler
Scanner Parser
Semantic
Routines
Code
Generator
Optimizer
Source
Program
(Character
Stream)
Tokens Syntactic
Structure
Intermediate
Representation
Target Machine
Code
Symbol and
Attribute
Tables
(Used by all Phases
of The Compiler)
The structure of a Syntax-Directed Compiler
The Structure of a Compiler
(Cont’d.)
 Scanner
 The scanner begins the analysis of the source program
by reading the input, character by character, and
grouping characters into individual words and symbols
(tokens)
 The tokens are encoded and then are fetched to the
parser for syntactic analysis
 For details.
 Scanner generators
regular
exp for
tokens
finite
automata
as programs
lex or scangen
The Structure of a Compiler
(Cont’d.)
 Parser
 Given a formal syntax specification (typically as a
context-free grammar [CFG]), the parse reads tokens
and groups them into units as specified by the
productions of the CFG being used.
 While parsing, the parser verifies correct syntax, and if
a syntax error is found, it issues a suitable diagnostic.
 As syntactic structure is recognized, the parser either
calls corresponding semantic routines directly or builds
a syntax tree.
grammar
yacc or llgen
parser
The Structure of a Compiler
(Cont’d.)
 Semantic Routines
 Perform two functions
 Check the static semantics of each construct
 Do the actual translation for generating IR
 The heart of a compiler
 Optimizer
 The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent
but improved IR code.
 This phase can be very complex and slow.
The Structure of a Compiler
(Cont’d.)
 One-pass compiler
 No optimization is required
 Retargetable compiler
 Many machine description files, e.g., gcc
 Compiler writing tools
 Compiler generators or compiler-compilers
 Lex and Yacc
 E.g., scanner and parser generators[1]
Review of Parser
 Yacc provides a general tool for imposing structure on
the input to a computer program. The Yacc user
prepares a specification of the input process; this
includes rules describing the input structure, code to
be invoked when these rules are recognized, and a low-
level routine to do the basic input. Yacc then generates
a function to control the input process[2].
 Parser invokes scanner for tokens.
 Parser analyze the syntactic structure according to
grammars.
 Finally, parser executes the semantic routines[3].
15
yacc
Generate a new parser code from grammar
YACC source (*.y)
y.tab.h
y.tab.c
C compiler/linker
Compile a new parser
y.tab.c a.out
a.out
Parse source code
Token stream
Abstract
Syntax
Tree
y.output
How Does YACC Work?
16
YACC Format
%{
C declarations
%}
yacc declarations
%%
Grammar rules
%%
Additional C code
17
YACC Example
%{
#include <stdio.h>
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression { printf("= %dn", $1); }
;
expression: expression '+' NUMBER { $$ = $1 + $3; }
| expression '-' NUMBER { $$ = $1 - $3; }
| NUMBER { $$ = $1; }
;
%%
int yyerror(char *s)
{
fprintf(stderr, "%sn", s);
return 0;
}
int main(void)
{
yyparse();
return 0;
}
C declarations
yacc declarations
Grammar rules
Additional C code
18
YACC Declaration Summary
`%start'
Specify the grammar's start symbol
`%union'
Declare the collection of data types that semantic values may have
`%token'
Declare a terminal symbol (token type name) with no precedence or
associativity specified
`%type'
Declare the type of semantic values for a nonterminal symbol
19
YACC Declaration Summary
`%right'
Declare a terminal symbol (token type name) that is
right-associative
`%left'
Declare a terminal symbol (token type name) that is left-associative
`%nonassoc'
Declare a terminal symbol (token type name) that is nonassociative
(using it in a way that would be associative is a syntax error)
20
YACC and Bison Command
 Yacc (AT&T)
 yacc –d xxx.y
 Bison (GNU)
 bison –d –y xxx.y
Produce y.tab.c, the
same as above yacc
command
21
Work between LEX and YACC%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
%%
int { return INT; }
char { return CHAR; }
float { return FLOAT; }
{id} { return ID;}
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token CHAR, FLOAT, ID, INT
%%
yacc -d xxx.y
 Produced
y.tab.h
# define CHAR 258
# define FLOAT 259
# define ID 260
# define INT 261
parser.y
scanner.l
References:
 1- Book: Crafting a Compiler with C
Author: Charles N. Fischer and Richard J. LeBlanc, Jr.The
Benjamin/Cumming Publishing Company, Inc
 2- dinosaur.compilertools.net/yacc/
 3- Ying-Hung Jiang “Introduction to Yacc “
 4- “YACC Parser Generator” ,www.csie.ntu.edu.tw/~b92006/YACC-
present-2005.ppt
22

Yacc (yet another compiler compiler)

  • 2.
    PhD Program inComputer Science Seminar Title: YACC (Yet Another Compiler Compiler) Prepared By: Omer Salih Dawood Omer Email:Omercomail@gmail.com Supervised By: Dr. Mohamed Elhafiz Mustafa Musa 2
  • 3.
    Agenda:  Overview andHistory  What Do Compilers Do?  The Structure of a Compiler  Review of Parser  How Does YACC Work?  References. 3
  • 4.
    Overview and History The first real compiler  FORTRAN compilers of the late 1950s  18 person-years to build  Today, we can build a simple compiler in a few month.  Crafting and efficient and reliable compiler is still challenging.
  • 5.
     Another kindof language processor, called an interpreter, differs from a compiler in that it executes programs without explicitly performing a translation  Advantages and Disadvantages of an interpreter What Do Compilers Do? Source Program Encoding OutputInterpreter Data
  • 6.
    What Do CompilersDo? (Cont’d.)  Advantage  Modification to program during execution  Interactive debugging  Not for every language, e.g., Basic, Pascal  Dynamic-typed languages  Variable types may change at run time, e.g., LISP.  Difficult to compile  Better diagnostics  Source code is available.  Machine independence  However, the interpreter itself must be portable.
  • 7.
    What Do CompilersDo? (Cont’d.)  Disadvantage  Slower execution due to repeated examination  Substantial space overhead
  • 8.
    The Structure ofa Compiler  Modern compilers are syntax-directed  Compilation is driven the syntactic structure of programs; i.e., actions are associated with the structures.  Any compiler must perform two major tasks  Analysis of the source program  Synthesis of a machine-language program
  • 9.
    The Structure ofa Compiler Scanner Parser Semantic Routines Code Generator Optimizer Source Program (Character Stream) Tokens Syntactic Structure Intermediate Representation Target Machine Code Symbol and Attribute Tables (Used by all Phases of The Compiler) The structure of a Syntax-Directed Compiler
  • 10.
    The Structure ofa Compiler (Cont’d.)  Scanner  The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens)  The tokens are encoded and then are fetched to the parser for syntactic analysis  For details.  Scanner generators regular exp for tokens finite automata as programs lex or scangen
  • 11.
    The Structure ofa Compiler (Cont’d.)  Parser  Given a formal syntax specification (typically as a context-free grammar [CFG]), the parse reads tokens and groups them into units as specified by the productions of the CFG being used.  While parsing, the parser verifies correct syntax, and if a syntax error is found, it issues a suitable diagnostic.  As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree. grammar yacc or llgen parser
  • 12.
    The Structure ofa Compiler (Cont’d.)  Semantic Routines  Perform two functions  Check the static semantics of each construct  Do the actual translation for generating IR  The heart of a compiler  Optimizer  The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.  This phase can be very complex and slow.
  • 13.
    The Structure ofa Compiler (Cont’d.)  One-pass compiler  No optimization is required  Retargetable compiler  Many machine description files, e.g., gcc  Compiler writing tools  Compiler generators or compiler-compilers  Lex and Yacc  E.g., scanner and parser generators[1]
  • 14.
    Review of Parser Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a specification of the input process; this includes rules describing the input structure, code to be invoked when these rules are recognized, and a low- level routine to do the basic input. Yacc then generates a function to control the input process[2].  Parser invokes scanner for tokens.  Parser analyze the syntactic structure according to grammars.  Finally, parser executes the semantic routines[3].
  • 15.
    15 yacc Generate a newparser code from grammar YACC source (*.y) y.tab.h y.tab.c C compiler/linker Compile a new parser y.tab.c a.out a.out Parse source code Token stream Abstract Syntax Tree y.output How Does YACC Work?
  • 16.
    16 YACC Format %{ C declarations %} yaccdeclarations %% Grammar rules %% Additional C code
  • 17.
    17 YACC Example %{ #include <stdio.h> %} %tokenNAME NUMBER %% statement: NAME '=' expression | expression { printf("= %dn", $1); } ; expression: expression '+' NUMBER { $$ = $1 + $3; } | expression '-' NUMBER { $$ = $1 - $3; } | NUMBER { $$ = $1; } ; %% int yyerror(char *s) { fprintf(stderr, "%sn", s); return 0; } int main(void) { yyparse(); return 0; } C declarations yacc declarations Grammar rules Additional C code
  • 18.
    18 YACC Declaration Summary `%start' Specifythe grammar's start symbol `%union' Declare the collection of data types that semantic values may have `%token' Declare a terminal symbol (token type name) with no precedence or associativity specified `%type' Declare the type of semantic values for a nonterminal symbol
  • 19.
    19 YACC Declaration Summary `%right' Declarea terminal symbol (token type name) that is right-associative `%left' Declare a terminal symbol (token type name) that is left-associative `%nonassoc' Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error)
  • 20.
    20 YACC and BisonCommand  Yacc (AT&T)  yacc –d xxx.y  Bison (GNU)  bison –d –y xxx.y Produce y.tab.c, the same as above yacc command
  • 21.
    21 Work between LEXand YACC%{ #include <stdio.h> #include "y.tab.h" %} id [_a-zA-Z][_a-zA-Z0-9]* %% int { return INT; } char { return CHAR; } float { return FLOAT; } {id} { return ID;} %{ #include <stdio.h> #include <stdlib.h> %} %token CHAR, FLOAT, ID, INT %% yacc -d xxx.y  Produced y.tab.h # define CHAR 258 # define FLOAT 259 # define ID 260 # define INT 261 parser.y scanner.l
  • 22.
    References:  1- Book:Crafting a Compiler with C Author: Charles N. Fischer and Richard J. LeBlanc, Jr.The Benjamin/Cumming Publishing Company, Inc  2- dinosaur.compilertools.net/yacc/  3- Ying-Hung Jiang “Introduction to Yacc “  4- “YACC Parser Generator” ,www.csie.ntu.edu.tw/~b92006/YACC- present-2005.ppt 22