1. University of Dammam
Girls’ College of Science
Department of Computer Science
Compiler Engineering Lab
COMPILER ENGINEERING
LAB # 1: INTRODUCTION & LEXICAL ANALYSIS
2. WHAT IS A COMPILER?
• A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
• An important part of this translation process is that the compiler reports to its user the presence of errors in the source program.
Department of Computer Science -
25-29/2/12 2
Compiler Engineering Lab
3. COMPILER THEORY
Source program -> Compiler -> Target program
Compiler -> Error messages
4. COMPILER ENVIRONMENT TOOLS
• Many software tools that manipulate source programs first perform some kind of analysis.
• Some examples of such tools include:
5. 1- STRUCTURE EDITOR
• Takes as input a sequence of commands to build a source program
• Performs the text creation and modification functions of a text editor
• Analyzes the program text, putting an appropriate hierarchical structure on the source program
• Checks that the input is correctly formed
• Can supply keywords automatically
• Can jump from a begin or left parenthesis to its matching end or right parenthesis
6. 2- PRETTY PRINTERS
• Analyzes a program and prints it in such a way that the structure of the program becomes clearly visible.
7. 3- STATIC CHECKERS
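• Reads a program
• Analyzes it
• Attempts to discover potential bugs without running the program
• Can catch some logical errors, such as code that can never be executed or a variable that might be used before it is defined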
8. 4 - INTERPRETERS
• Instead of producing a target program as a translation, an interpreter performs the operations implied by the source program.
• What is the difference between a compiler and an interpreter?
9. COMPILER PHASES
10. PARTS OF COMPILATION
1. Analysis
The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
2. Synthesis
The synthesis part constructs the desired target program from the intermediate representation.
11. PROCESSING ENDS OF A COMPILER
1. Front-End
Consists of phases that depend primarily on the source language and are largely independent of the target machine (lexical analysis, syntactic analysis, symbol table creation, semantic analysis, intermediate code generation).
2. Back-End
Includes those portions of the compiler that depend on the target machine and do not depend on the source language (code optimization, code generation).
12. COMPILER PHASES
Source Program
Compiler Front End (Analysis):
  Lexical analysis ("Scanning")
  Syntax (hierarchical) analysis ("Parsing")
  Contextual analysis ("Semantic Analysis")
Compiler Back End (Synthesis):
  Intermediate Code
  Object Code
Machine Language
• Phases are important to simplify the compiler's structure
13. COMPILER PHASES INTERACTION (VIA DATA STRUCTURE)
Source Program Text
  -> Lexical analysis -> Tokens
  -> Syntax analysis -> Abstract Syntax Tree (AST)
  -> Contextual analysis -> Decorated AST + Symbol Table
  -> Intermediate Code phase -> Intermediate Code
  -> Object Code phase -> Object Code
Machine Language
15. COMPILER CONSTRUCTION TOOLS
• A compiler can be written like any other program
• A programmer can use general software development tools such as:
• Debuggers
• Version managers
• Profilers
• More specialized tools have been developed to help implement various phases of a compiler
16. 1- SCANNER GENERATORS
• Generate lexical analyzers from a specification based on regular expressions.
17. 2- PARSER GENERATORS
• Produce syntax analyzers from input that is based on a context-free grammar.
• In early compilers, syntax analysis consumed a large fraction of the running time and a large fraction of the intellectual effort of writing a compiler.
• Using a parser generator makes it possible to implement this phase in a few days.
18. 3- SYNTAX-DIRECTED TRANSLATION ENGINES
• Produce collections of routines that walk the parse tree, generating the intermediate code
19. 4 - AUTOMATIC CODE
GENERATOR
• Takes a collection of rules that define the
translation of each operation of the
intermediate language into the machine
language for the target machine
20. 5 - DATA FLOW ENGINE
• Much of the information needed to perform good code optimization involves "data-flow analysis"
• Data-flow analysis is the gathering of information about how values are transmitted from one part of a program to each other part
21. LEXICAL ANALYSIS
FIRST PHASE OF A COMPILER
22. INSERTING A LEXICAL ANALYZER
BETWEEN THE INPUT AND THE PARSER
Input -> (read character) -> Lexical Analyzer -> (pass token and its attribute) -> Parser
Lexical Analyzer -> (push back character) -> Input
23. LEXICAL ANALYZER MECHANISM
• Read the characters from the input
• Group them into lexemes
• Pass the tokens formed by the lexemes together
with their attribute values to the later stages
• In some situations the lexical analyzer has to read some characters ahead before it can decide on the token to be returned to the parser. For example, after reading < it must read one more character to tell < from <= or <>.
• The extra character has to be pushed back onto the input, because it can be the beginning of the next lexeme.
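The read-ahead and push-back step can be sketched in C for the relational operators. This is only an illustrative sketch: the function name `read_relop`, the `enum relop` codes, and the choice to read from a `FILE *` parameter instead of directly from stdin are assumptions made here, not part of the lab handout.

```c
#include <stdio.h>

/* Hypothetical attribute codes for relational operators. */
enum relop { RELOP_NONE, LT, LE, EQ, NE, GT, GE };

/* Reads one relational operator from `in`.
   After '<' or '>' one extra character is read; if it does not
   extend the operator it is pushed back with ungetc, because it
   may be the beginning of the next lexeme. */
enum relop read_relop(FILE *in)
{
    int c = getc(in);
    int next;

    switch (c) {
    case '=':
        return EQ;
    case '<':
        next = getc(in);
        if (next == '=') return LE;
        if (next == '>') return NE;
        if (next != EOF) ungetc(next, in);
        return LT;
    case '>':
        next = getc(in);
        if (next == '=') return GE;
        if (next != EOF) ungetc(next, in);
        return GT;
    default:
        if (c != EOF) ungetc(c, in);   /* not a relop: leave the input untouched */
        return RELOP_NONE;
    }
}
```

Calling `read_relop(stdin)` would apply the same logic to keyboard input.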
24. IMPLEMENTING THE INTERACTION
Read character using getchar() -> Lexical Analyzer Lexan() -> pass token and its attribute
Lexical Analyzer Lexan() -> push back character F using ungetc(F, stdin)
25. LEX …
• Lex is a tool that has been widely used to specify lexical analyzers for a variety of languages
• Using such a tool allows us to show how the specification of patterns using regular expressions can be combined with actions
26. REGULAR EXPRESSION PATTERNS
FOR TOKENS
Regular expression   Token   Attribute-value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      Pointer to table entry
num                  num     Pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
27. LEX SPECIFICATION
• A Lex program consists of three parts:
1. Declarations
2. Translation rules
3. Auxiliary procedures
28. 1- DECLARATIONS SECTION
Includes declarations of variables, manifest constants and regular definitions.
A manifest constant is an identifier that is declared to represent a constant.
29. DEFINITION OF MANIFEST CONSTANT USED
BY THE TRANSLATION RULES
LT , LE, EQ , NE , GT , GE , IF ,
THEN , ELSE , ID , NUMBER ,
RELOP, AROP
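One possible way to declare these manifest constants in C is with enumerations. The sketch below is an assumption about how a lab solution might do it: the numeric values, the split into two enums, and the helper `token_name` are all hypothetical and not taken from the slides.

```c
/* Token codes start above 255 so they can never collide with a
   plain character value (an assumption of this sketch, not
   something the lab specifies). */
enum token {
    IF = 256, THEN, ELSE, ID, NUMBER, RELOP, AROP
};

/* Attribute values that accompany a RELOP token. */
enum relop_attr { LT = 1, LE, EQ, NE, GT, GE };

/* Returns a printable name for a token code, for debugging output. */
const char *token_name(int token)
{
    switch (token) {
    case IF:     return "IF";
    case THEN:   return "THEN";
    case ELSE:   return "ELSE";
    case ID:     return "ID";
    case NUMBER: return "NUMBER";
    case RELOP:  return "RELOP";
    case AROP:   return "AROP";
    default:     return "UNKNOWN";
    }
}
```

Named constants keep the translation rules readable: an action can say `return(RELOP)` instead of returning a bare number.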
30. REGULAR DEFINITIONS
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
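To see what the id and number definitions accept, they can be hand-translated into small C matchers. This is a sketch under stated assumptions: the names `match_id` and `match_number` are hypothetical, the input is a NUL-terminated string, and `isalpha`/`isalnum` are assumed to behave as in the default "C" locale, where they correspond to [A-Za-z] and [A-Za-z0-9].

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s matching
   id = letter (letter | digit)*, or 0 if there is none. */
size_t match_id(const char *s)
{
    size_t i = 0;

    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[i]))
        i++;
    return i;
}

/* Length of the longest prefix of s matching
   number = digit+ (. digit+)? (E [+-]? digit+)?, or 0 if there is none. */
size_t match_number(const char *s)
{
    size_t i = 0, j;

    while (isdigit((unsigned char)s[i]))
        i++;
    if (i == 0)
        return 0;                       /* digit+ is mandatory */

    /* Optional fraction: only taken if a digit follows the dot. */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i]))
            i++;
    }

    /* Optional exponent: only taken if it is complete. */
    if (s[i] == 'E') {
        j = i + 1;
        if (s[j] == '+' || s[j] == '-')
            j++;
        if (isdigit((unsigned char)s[j])) {
            while (isdigit((unsigned char)s[j]))
                j++;
            i = j;
        }
    }
    return i;
}
```

For instance, `match_number("6.")` stops after the 6, because the optional fraction requires at least one digit after the dot.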
31. 2-TRANSLATION RULES
are statements of the form
p1 {action1}
p2 {action2}
...
pn {actionn}
• where each pi is a regular expression and each actioni is a program fragment describing what action the lexical analyzer should take when pattern pi matches a lexeme
32. 2- TRANSLATION RULES
{ws}      { /* no action and no return */ }
if        { return(IF); }
then      { return(THEN); }
else      { return(ELSE); }
"<"       { val = LT; return(RELOP); }
(and similarly for the other relational operators)
{id}      { val = install_id(); return(ID); }
{number}  { val = install_num(); return(NUMBER); }
33. 3-AUXILIARY PROCEDURES
• Holds whatever auxiliary procedures are needed by the actions
• A lexical analyzer created by Lex behaves in concert with a parser in the following manner: when activated by the parser, the lexical analyzer begins reading its remaining input, one character at a time, until it has found the longest prefix of the input that is matched by one of the regular expressions pi. Then it executes actioni.
34. CONT..
• Typically, actioni will return control to the parser. If it does not, the lexical analyzer proceeds to find more lexemes until an action causes control to return to the parser
• The lexical analyzer returns a single quantity to the parser: the token
• To pass an attribute value with information about the lexeme, we can set a global variable called val
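The behaviour described above (skip input whose action does not return, pick the longest matching prefix, return one token and leave extra information in val) can be imitated by hand in C. The sketch below is illustrative only: `next_token`, the token codes, and the use of val to hold the lexeme length are assumptions. It also relies on a Lex convention the slides do not state: when two patterns match a prefix of the same length, the rule listed first wins.

```c
#include <ctype.h>
#include <string.h>

enum { DONE = 0, IF = 256, ID };   /* hypothetical token codes */

/* Global attribute slot: the scanner returns only the token code,
   so extra information about the lexeme travels through val.
   Here it holds the lexeme length, purely for illustration. */
int val;

/* Returns the next token from *p and advances *p past its lexeme.
   White space has "no action and no return": scanning simply
   continues, so control goes back to the caller only when a token
   (or the end of the input) is found. */
int next_token(const char **p)
{
    const char *s = *p;
    size_t len = 0;

    while (isspace((unsigned char)*s))      /* ws: keep looking */
        s++;
    if (*s == '\0') {
        *p = s;
        return DONE;
    }
    if (!isalpha((unsigned char)*s)) {      /* any other character is its own token */
        *p = s + 1;
        val = 1;
        return (unsigned char)*s;
    }
    while (isalnum((unsigned char)s[len]))  /* longest prefix matching id */
        len++;
    *p = s + len;
    val = (int)len;
    /* On the input "if", the keyword and id both match two characters;
       the keyword rule is listed first, so it wins the tie. On "iffy",
       id matches a longer prefix and therefore wins. */
    if (len == 2 && strncmp(s, "if", 2) == 0)
        return IF;
    return ID;
}
```

Scanning the string "  if iffy x" with repeated calls would yield IF, ID, ID and then DONE.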
35. AUXILIARY PROCEDURES
• install_id ( )
Procedure to install the lexeme
• install_num ( )
similar procedure to install a lexeme
that is a number
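A very small version of install_id might look like the following. It is a sketch, not the procedure from the slides: the slides call install_id( ) without arguments, whereas this version takes the lexeme as a parameter, and the fixed-size table, its limits, and the -1 error value are all assumptions made for illustration.

```c
#include <string.h>

#define MAX_SYMBOLS 100
#define MAX_LEXEME  32

/* A deliberately tiny symbol table: a fixed array searched linearly. */
static char symtable[MAX_SYMBOLS][MAX_LEXEME];
static int  symcount = 0;

/* Installs the lexeme in the symbol table if it is not already there.
   Returns the index of its entry, or -1 if the lexeme is too long
   or the table is full. */
int install_id(const char *lexeme)
{
    int i;

    if (strlen(lexeme) >= MAX_LEXEME)
        return -1;
    for (i = 0; i < symcount; i++)
        if (strcmp(symtable[i], lexeme) == 0)
            return i;                 /* already installed */
    if (symcount == MAX_SYMBOLS)
        return -1;
    strcpy(symtable[symcount], lexeme);
    return symcount++;
}
```

The returned index can serve as the "pointer to table entry" attribute, so installing the same identifier twice gives the same value.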
36. WRITING A LEXICAL ANALYZER
• Write a lexical analyzer using the C++ language.
• Write it as a function called from inside main( )
• Name that function Lexan
• The Lexan function returns the value of the token
37. THE LEXICAL ANALYZER WILL DO..
• Read a character from the user
• If the character is a blank (space) or a tab (written '\t'), no token is returned to the parser; exit the function
• If the character is a newline (written '\n'), the line number is incremented and no token is returned
• If the character is a single digit, its value is stored in Tokenval
38. MORE THAN ONE DIGIT ..
• Allow the user to enter a sequence of characters
• While the user keeps entering digits after the first digit, the analyzer accepts more digits
• Each time, the analyzer computes the Tokenval
• If the next character is not a digit, push the character back
• Print the result after each step to see the output
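The steps on the last two slides can be put together roughly as follows. This is one possible sketch, not the expected lab answer, and it departs from the slides in a few labeled ways: it is plain C that also compiles as C++, it reads from a `FILE *` parameter instead of calling getchar() directly, it keeps scanning after white space instead of exiting the function, it does not print intermediate results, and the token codes NUM and DONE are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

#define NUM  256    /* hypothetical token code for a number */
#define DONE 257    /* hypothetical token code for end of input */

int lineno = 1;     /* current line number */
int tokenval = 0;   /* attribute of the last token returned */

/* Returns the next token read from `in`.
   Blanks and tabs are skipped, newlines increment lineno, and a run
   of digits is folded into tokenval one digit at a time. */
int lexan(FILE *in)
{
    int t;

    for (;;) {
        t = getc(in);
        if (t == EOF) {
            return DONE;
        } else if (t == ' ' || t == '\t') {
            continue;                       /* no token for white space */
        } else if (t == '\n') {
            lineno++;                       /* no token, just count the line */
        } else if (isdigit(t)) {
            tokenval = t - '0';
            while ((t = getc(in)) != EOF && isdigit(t))
                tokenval = tokenval * 10 + (t - '0');
            if (t != EOF)
                ungetc(t, in);              /* the first non-digit goes back */
            return NUM;
        } else {
            tokenval = 0;
            return t;                       /* any other character is its own token */
        }
    }
}
```

Calling `lexan(stdin)` in a loop from main( ) and printing the returned token together with tokenval would show each step of the analysis.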
40. READING CHARACTER FROM THE USER
#include <stdio.h>
int getchar(void);
• Gets a character from stdin.
• getchar, which may be implemented as a macro, returns the next character on the standard input stream stdin.
• On success, getchar returns the character read, converted to an int without sign extension, so the result is never negative.
• On end-of-file or error, getchar returns EOF.
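Because the return type is int, the result should be stored in an int variable so that EOF can be told apart from every valid character. The sketch below illustrates this with a hypothetical helper `count_input`; it uses getc on a `FILE *` parameter, which is equivalent to getchar() when the stream is stdin.

```c
#include <stdio.h>

/* Counts the characters and lines on a stream.
   The result of getc is kept in an int, not a char, so that EOF
   stays distinguishable from every valid character. */
void count_input(FILE *in, long *chars, long *lines)
{
    int c;

    *chars = 0;
    *lines = 0;
    while ((c = getc(in)) != EOF) {
        (*chars)++;
        if (c == '\n')
            (*lines)++;
    }
}
```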
41. PUSHING BACK CHARACTERS
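#include <stdio.h>
int ungetc(int c, FILE *stream);
• Pushes a character back onto an input stream, for example ungetc(c, stdin).
• ungetc pushes the character c back onto the named input stream, which must be open for reading. This character will be returned by the next read on that stream, such as the next call to getchar when the stream is stdin. Pushing back one character is always supported; pushing back more than one without an intervening read may fail.
• On success, ungetc returns the character pushed back; on failure it returns EOF.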
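A common use of ungetc is a one-character "peek". The helper name `peek_char` below is hypothetical, and the sketch assumes at most one character is ever pushed back between reads.

```c
#include <stdio.h>

/* Returns the next character on `in` without consuming it,
   or EOF if there is nothing left to read. */
int peek_char(FILE *in)
{
    int c = getc(in);

    if (c != EOF)
        ungetc(c, in);   /* put it back so the next read sees it again */
    return c;
}
```

With `peek_char(stdin)`, a lexical analyzer can look at the next character and still leave it on the input for the next lexeme.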
42. TEST CHARACTER IF (DIGIT) OR NOT
#include <ctype.h>
int isdigit(int c);
• Tests for a decimal-digit character ('0' to '9').
• isdigit, which may be implemented as a macro, classifies character codes, typically by table lookup
• isdigit returns nonzero if c is a digit and zero otherwise.
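A short sketch of how isdigit might be combined with the digit-to-value conversion needed for Tokenval. The helper name `digit_value` and its -1 result for non-digits are assumptions made for illustration.

```c
#include <ctype.h>

/* Returns the numeric value 0..9 of a digit character, or -1 if c
   is not a decimal digit. The cast to unsigned char matters when c
   comes from a plain char, which may be negative. */
int digit_value(char c)
{
    if (isdigit((unsigned char)c))
        return c - '0';   /* '0'..'9' are consecutive in C */
    return -1;
}
```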
43. QUESTIONS?
Thank you for listening