ProjectCompilers.pdf
Project: Construction of a Simple Parser
INSTRUCTIONS:
You will illustrate the basic phases of the compilation process (lexical, syntax, and
semantic analysis) through a simple compiler for a programming language model
“NEWLANG”.
The programming language “NEWLANG” is very simple.
A. Lexical Conventions of NEWLANG
1. The keywords of the language are the following:
declare read write
All keywords are reserved, and must be written in lower case.
2. Special symbols are the following:
{ } ( ) = + - ;
3. Other tokens are NAME, which represents a string of letters and digits starting with a
letter, and NUMBER, which represents a sequence of digits.
Lower and upper case letters are distinct.
4. White space consists of blanks, newlines, and tabs. White space is ignored, except that it
must separate NAMEs, NUMBERs, and keywords.
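As a rough illustration of these conventions, the C sketch below classifies a single, already-isolated lexeme. It is not the Flex specification the project asks for. The enum names and the classify function are hypothetical and do not come from the handout.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds; the names are illustrative only. */
enum token_kind { T_DECLARE, T_READ, T_WRITE, T_NAME, T_NUMBER, T_SYMBOL, T_ERROR };

/* Classify one already-isolated lexeme according to conventions 1 to 3. */
enum token_kind classify(const char *lexeme)
{
    size_t i, n = strlen(lexeme);

    if (n == 0)
        return T_ERROR;

    /* Keywords are reserved and lower case only, so test them first. */
    if (strcmp(lexeme, "declare") == 0) return T_DECLARE;
    if (strcmp(lexeme, "read") == 0)    return T_READ;
    if (strcmp(lexeme, "write") == 0)   return T_WRITE;

    /* Special symbols are single characters. */
    if (n == 1 && strchr("{}()=+-;", lexeme[0]) != NULL)
        return T_SYMBOL;

    /* NAME: a letter followed by letters or digits. */
    if (isalpha((unsigned char)lexeme[0])) {
        for (i = 1; i < n; i++)
            if (!isalnum((unsigned char)lexeme[i]))
                return T_ERROR;
        return T_NAME;
    }

    /* NUMBER: one or more digits. */
    for (i = 0; i < n; i++)
        if (!isdigit((unsigned char)lexeme[i]))
            return T_ERROR;
    return T_NUMBER;
}
```

Because keywords must be lower case and letter case is significant, a call such as classify("Declare") yields T_NAME rather than T_DECLARE in this sketch.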
B. Syntax and Semantics of NEWLANG
The syntax of “NEWLANG” is described by the grammar rules defined as follows:
program : { statement_list }
;
statement_list : statement ; statement_list
| ε
;
statement : declaration
| assignment
| read_statement
| write_statement
;
declaration : declare var
;
assignment : var = expression
;
read_statement : read var
;
write_statement : write expression
;
expression : term
| term + term
| term - term
;
term : NUMBER
| var
| ( expression )
;
var : NAME
;
The semantics of “NEWLANG” should be clear: a “NEWLANG” program consists of a
sequence of read/write or assignment statements. There are integer-valued variables
(which need to be declared before they are used), and expressions are restricted to
addition and subtraction.
C. Example of NEWLANG source program
A simple NEWLANG program is shown below:
{
declare xyz;
xyz = (33+3)-35;
write xyz;
}
The output of the above program is, of course, 1.
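To make the arithmetic concrete, here is a minimal hand-written evaluator for the expression and term rules of section B. It is only a sketch: the project itself uses Bison, the var alternative of term is left out, overflow is not checked, and all function names are assumptions made for this illustration.

```c
#include <ctype.h>

/* Cursor into the expression text; a hypothetical helper for this sketch. */
static const char *cur;

static void skip_ws(void)
{
    while (*cur == ' ' || *cur == '\t' || *cur == '\n')
        cur++;
}

static int parse_expression(int *ok);

/* term : NUMBER | ( expression )    (the var alternative is omitted here) */
static int parse_term(int *ok)
{
    int value = 0;

    skip_ws();
    if (isdigit((unsigned char)*cur)) {
        while (isdigit((unsigned char)*cur))
            value = value * 10 + (*cur++ - '0');
        return value;
    }
    if (*cur == '(') {
        cur++;
        value = parse_expression(ok);
        skip_ws();
        if (*cur == ')')
            cur++;
        else
            *ok = 0;            /* missing closing parenthesis */
        return value;
    }
    *ok = 0;                    /* neither a NUMBER nor a parenthesis */
    return 0;
}

/* expression : term | term + term | term - term */
static int parse_expression(int *ok)
{
    int left = parse_term(ok);

    skip_ws();
    if (*cur == '+') {
        cur++;
        return left + parse_term(ok);
    }
    if (*cur == '-') {
        cur++;
        return left - parse_term(ok);
    }
    return left;
}

/* Evaluate a whole expression string; *ok is set to 0 on a syntax error. */
int eval_expression(const char *text, int *ok)
{
    int value;

    cur = text;
    *ok = 1;
    value = parse_expression(ok);
    skip_ws();
    if (*cur != '\0')
        *ok = 0;                /* trailing input the grammar does not allow */
    return value;
}
```

Calling eval_expression("(33+3)-35", &ok) returns 1 with ok set to 1. Since the grammar permits at most one operator per expression, an input such as "1+2+3" is rejected (ok becomes 0) unless parentheses are added.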
D. Project Implementation
The project consists of three phases:
Phase I: Lexical Analysis
With the aid of the lexical analysis generator tool Flex, you will construct the lexical
analyzer, in order to transform a sequence of input characters into a sequence of tokens.
Phase II: Syntax Analysis
With the aid of the syntax analysis generator tool Bison, you will construct the syntax
analyzer, the parser, in order to check whether the sequence of tokens is grammatically
correct, according to the grammar rules that define the syntax of the source language.
Looking at the grammar rules for “NEWLANG” (see section B, above) it seems clear
that a program is syntactically correct if the structure of the tokens matches the structure
of a <program> as defined by these rules.
Phase III: Semantic Analysis
Having established that the source text is syntactically correct, the compiler must now
perform additional checks such as determining the type of expressions and checking that
all statements are correct with respect to the typing rules, that variables have been
properly declared before they are used, etc.
This phase is carried out using information from the parse tree
and the symbol table.
In our project, very little needs to be checked, due to the
extreme simplicity of the
language. The only checks that are performed verify that a
variable has been declared
before it is used, and whether a variable has been re-declared.
E. Project Implementation Hints
Some important notes follow to help you in the implementation
of the project.
1. In the Yacc/Bison specification, you need to specify the
types of the attributes the
grammar symbols can hold. For example:
%union {
    char *str;
    int val;
}
. . .
%type <str> NAME var
%type <val> NUMBER
. . .
2. For the symbol table implementation, you need to define a
data structure (e.g., NODE)
with the appropriate members. For example, you may need to
have a member of type
string to hold the name of the identifier, a member of type
integer to hold the
kind of the identifier (e.g., READ, WRITE, or NAME),
and a member of type
integer to mark a symbol in the table as declared.
Further, you may declare an array of type NODE with a maximum size, say 100, or
you may create a dynamic linked list (the latter is considered a more effective
solution than the former).
Also, for the symbol table management, you need to define two
important operations:
The insert and lookup (find) operations.
The insert function will create a new entry in the symbol table,
whenever a new
identifier is declared.
The lookup (find) operation will search for a specific identifier
in the symbol
table and return whether it has found it or not.
Moreover, you may need to insert the keywords read, write, and declare into the
symbol table before parsing begins (that is, before main() calls yyparse()), so that
they cannot be used as ordinary variables (identifiers).
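The hints above could be sketched in C roughly as follows, using the linked-list variant. The member names, the enum kind constants, and the init_keywords helper are assumptions made for this illustration; the handout only names NODE, insert, and lookup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical identifier kinds, mirroring the READ/WRITE/NAME example. */
enum kind { KIND_DECLARE, KIND_READ, KIND_WRITE, KIND_NAME };

typedef struct node {
    char *name;           /* identifier text */
    int kind;             /* one of enum kind */
    int declared;         /* 1 once the symbol counts as declared */
    struct node *next;    /* next entry in the list */
} NODE;

static NODE *table = NULL;    /* head of the linked list */

/* Return the entry for name, or NULL if it is not in the table. */
NODE *lookup(const char *name)
{
    NODE *p;

    for (p = table; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Add a new entry at the head of the list; returns NULL if allocation fails. */
NODE *insert(const char *name, int kind)
{
    NODE *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    p->kind = kind;
    p->declared = 1;
    p->next = table;
    table = p;
    return p;
}

/* Pre-load the reserved words so they cannot be used as variables. */
void init_keywords(void)
{
    insert("declare", KIND_DECLARE);
    insert("read", KIND_READ);
    insert("write", KIND_WRITE);
}
```

In this sketch main() would call init_keywords() before yyparse(). The action for a declaration would call lookup() first and insert() only when the name is absent. Returning the NODE pointer instead of a plain found/not-found flag is a design choice that lets callers inspect kind and declared.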
F. Error Messages
A simple “NEWLANG” program with errors is shown below:
{
read x;
x = x+2;
y = x+3;
write y;
declare z;
z = 3- 2;
declare z;
}
When you run your compiler, the following messages should
appear:
$ ./out test-errors.nl
error no 1: Variable "x" not declared (line 2)
error no 2: Variable "x" not declared (line 3)
error no 3: Variable "x" not declared (line 3)
error no 4: Variable "x" not declared (line 4)
error no 5: Variable "y" not declared (line 4)
error no 6: Variable "y" not declared (line 5)
error no 7: Variable "z" already declared (line 8)
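One possible way to produce messages in this format is sketched below in C. It assumes the parser actions call a hypothetical on_declare() for each declaration and a hypothetical on_use() for each variable occurrence, passing the current line number. The fixed-size array, the 64-character name limit, and all function names are assumptions, not part of the handout.

```c
#include <stdio.h>
#include <string.h>

#define MAX_VARS 100
#define MAX_NAME 64

static char declared[MAX_VARS][MAX_NAME];   /* names declared so far */
static int n_declared = 0;
static int error_count = 0;

static int is_declared(const char *name)
{
    int i;

    for (i = 0; i < n_declared; i++)
        if (strcmp(declared[i], name) == 0)
            return 1;
    return 0;
}

/* Called for "declare var": report a re-declaration, otherwise record the name. */
void on_declare(const char *name, int line)
{
    if (is_declared(name)) {
        printf("error no %d: Variable \"%s\" already declared (line %d)\n",
               ++error_count, name, line);
    } else if (n_declared < MAX_VARS) {
        strncpy(declared[n_declared], name, MAX_NAME - 1);
        declared[n_declared][MAX_NAME - 1] = '\0';
        n_declared++;
    }
}

/* Called for every use of a variable (read, assignment, or expression). */
void on_use(const char *name, int line)
{
    if (!is_declared(name))
        printf("error no %d: Variable \"%s\" not declared (line %d)\n",
               ++error_count, name, line);
}

/* Number of semantic errors reported so far. */
int semantic_errors(void)
{
    return error_count;
}
```

Replaying the sample program as calls (on_use("x", 2), on_use("x", 3) twice, on_use("x", 4), on_use("y", 4), on_use("y", 5), on_declare("z", 6), on_use("z", 7), on_declare("z", 8)) prints the seven messages listed above. In the expected output the undeclared x on line 4 is reported before y. One way to get that order is to check the assignment target only when the whole assignment rule is reduced, but this is an inference and the handout does not state it.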
README.txt
> bison -vd project.y
> flex project.fl
> gcc -o out project.tab.c lex.yy.c -lfl
> ./out myprog.nl
test.nl
{
declare a;
declare b;
declare c;
declare d;
read a;
read b;
c = (a+3)-2;
d=c;
write (b+d);
}
test-errors.nl
{
read x;
x = x+2;
y = x+3;
write y;
declare z;
z = 3- 2;
declare z;
}