module 4.pptx

LEX
• Lex is a programming tool that is very useful
for pattern matching.
• Using this tool, a lexical analyzer can be
generated.

 A LEX program consists of a declaration section, a rule section
and a C-code section (discussed in the next section) and is saved with
the extension ".l". This LEX program is the input to the LEX compiler.
 By default, the LEX compiler generates a C program with the file
name "lex.yy.c".
 Using a C compiler, compile the program "lex.yy.c" to generate
the executable file "a.out".
 a.out, the output of the C compiler, is the lexical analyzer:
when executed, it accepts the input stream and generates a
sequence of tokens.

STRUCTURE OF LEX PROGRAM
DEFINITION SECTION
%%
RULE SECTION
%%
CODE SECTION

 %% is the delimiter that marks the beginning of the rule
section.
 The second %% is optional, but the first is required to
mark the beginning of the rules. The definitions and the
code (subroutines) sections are often omitted.

A SIMPLE LEX PROGRAM
Program to identify numbers and letters
%{
#include<stdio.h>
%}
%%
[0-9]+ printf("Found an integer: %s\n", yytext);
[A-Za-z0-9]+ printf("Found a string: %s\n", yytext);
%%
int main(void)
{
yylex(); /* scan the input until end of file */
return 0;
}

Output:
$ vi pg1.l
$ lex pg1.l
$ gcc lex.yy.c -ll
$ ./a.out
12
Found an integer: 12
$ ./a.out
avengers
Found a string: avengers

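Program to recognize whether a given sentence is simple or
compound.
%{
#include <stdio.h>
int f=0; /* number of conjunctions seen */
%}
%%
"because"|"but"|"so"|"also"|"and"|"or"|"hence" {f++;}
[a-zA-Z]+ ; /* any other whole word: the or inside story is not counted */
\n {return 0;} /* stop at the end of the sentence */
. ; /* ignore everything else */
%%
int main(void)
{
printf("Enter the sentence:\n");
yylex();
if(f>0)
printf("Compound sentence\n");
else
printf("Simple sentence\n");
return 0;
}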

Output
$ vi pg2.l
$ lex pg2.l
$ gcc lex.yy.c -ll
$ ./a.out
Enter the sentence: Jack and rose
Compound sentence
$ ./a.out
Enter the sentence: Titanic is love story
Simple sentence
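
Because lex patterns match anywhere in the input, a pattern such as "or" also matches inside a longer word like "story". The following is a minimal C sketch of the same check done on whole words only. The function name count_conjunctions is hypothetical and is not part of lex; a sentence would be called compound when the count is greater than zero.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: counts the conjunctions in a sentence,
   comparing whole words only, so the "or" inside "story" is ignored. */
int count_conjunctions(const char *sentence)
{
    static const char *conj[] = {"because", "but", "so", "also",
                                 "and", "or", "hence"};
    int count = 0;
    const char *p = sentence;

    while (*p != '\0') {
        /* Skip anything that is not a letter. */
        while (*p != '\0' && !isalpha((unsigned char)*p))
            p++;
        /* Collect one word (truncated to the buffer size). */
        char word[32];
        size_t n = 0;
        while (isalpha((unsigned char)*p)) {
            if (n < sizeof word - 1)
                word[n++] = *p;
            p++;
        }
        word[n] = '\0';
        /* Compare the word against every conjunction. */
        for (size_t i = 0; n > 0 && i < sizeof conj / sizeof conj[0]; i++)
            if (strcmp(word, conj[i]) == 0)
                count++;
    }
    return count;
}
```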

PARSER AND LEXER
COMMUNICATION
 When you use a lex scanner and a yacc parser together, the parser
is the higher level routine.
 It calls the lexer yylex() whenever it needs a token from the input.
 The lexer then scans through the input recognizing tokens. As soon
as it finds a token of interest to the parser, it returns to the parser,
returning the token's code as the value of yylex().
 Not all tokens are of interest to the parser—in most programming
languages the parser doesn't want to hear about comments and
whitespace for example.
 For these ignored tokens, the lexer doesn't return so that it can
continue on to the next token without bothering the parser.
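
The calling pattern described above can be sketched in plain C. This is only an illustration under stated assumptions: toy_lex, count_tokens and the token codes are hypothetical stand-ins for yylex(), the yacc parser and the codes in y.tab.h.

```c
#include <ctype.h>

/* Hypothetical token codes; a real yacc parser gets them from y.tab.h. */
enum { TOK_END = 0, TOK_NUMBER = 258, TOK_NAME = 259 };

static const char *toy_input;   /* text still to be scanned */

/* Toy lexer in the style of yylex(): skips whitespace without returning,
   and returns only when it has a token the parser cares about. */
int toy_lex(void)
{
    while (*toy_input == ' ' || *toy_input == '\t' || *toy_input == '\n')
        toy_input++;                      /* ignored: the parser never sees it */
    if (*toy_input == '\0')
        return TOK_END;
    if (isdigit((unsigned char)*toy_input)) {
        while (isdigit((unsigned char)*toy_input))
            toy_input++;
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)*toy_input)) {
        while (isalnum((unsigned char)*toy_input))
            toy_input++;
        return TOK_NAME;
    }
    return (unsigned char)*toy_input++;   /* any other character is its own token */
}

/* Toy "parser": the higher-level routine that pulls tokens on demand. */
int count_tokens(const char *text)
{
    int n = 0;
    toy_input = text;
    while (toy_lex() != TOK_END)
        n++;
    return n;
}
```

For the text "  x1   = 42 \n" the caller sees three tokens (a name, the character '=', and a number); the whitespace is consumed inside the lexer.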

A YACC PARSER
 YACC provides a general tool for imposing structure on the
input to a computer program.
 The input specification is a collection of grammar rules.
 Each rule describes an allowable structure and gives it a
name.
 The YACC user prepares a specification of the input process.
 From it, YACC generates a function to control the input process. This
function is called a parser.
 The name is an acronym for “Yet Another Compiler Compiler”.
 YACC generates the code for the parser in the C programming
language.

Steps in writing YACC Program:
1st step: Using the gedit editor, create a file with the extension .y. For
example: prg1.y for the yacc file and prg1.l for the lex file
2nd Step: yacc -d prg1.y
3rd Step: lex prg1.l
4th Step: cc y.tab.c lex.yy.c -ll
5th Step: ./a.out
 When we run YACC, it generates the parser file y.tab.c (and, with the
-d option, the header file y.tab.h).
 To obtain tokens, the generated parser calls yylex.
 Function yylex has a return type of int, and returns the token.
 Values associated with the token are returned by lex in the
variable yylval.
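
The split between the token code (the return value) and the token value (a shared variable) can be sketched in C as follows. The names val_lex, toy_yylval and sum_numbers are hypothetical; in a real lex/yacc program the variable is yylval and its type comes from the yacc file.

```c
#include <ctype.h>

enum { VAL_END = 0, VAL_NUMBER = 258 };  /* hypothetical token codes */

static const char *val_input;  /* text still to be scanned */
static int toy_yylval;         /* stands in for yylval */

/* Returns the token code; the token's value travels in toy_yylval. */
int val_lex(void)
{
    while (*val_input != '\0' && !isdigit((unsigned char)*val_input))
        val_input++;                       /* skip everything but digits */
    if (*val_input == '\0')
        return VAL_END;
    toy_yylval = 0;
    while (isdigit((unsigned char)*val_input))
        toy_yylval = toy_yylval * 10 + (*val_input++ - '0');
    return VAL_NUMBER;
}

/* Caller in the role of the parser: reads the value after each NUMBER. */
int sum_numbers(const char *text)
{
    int total = 0;
    val_input = text;
    while (val_lex() == VAL_NUMBER)
        total += toy_yylval;
    return total;
}
```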

STRUCTURE OF YACC
SOURCE PROGRAM
DEFINITION SECTION
%%
RULE SECTION
%%
CODE SECTION

DEFINITION SECTION
Type        Description
%union      Defines the stack type for the parser. It is a union of
            various data, structures or objects.
%token      Declares the terminals returned by the yylex function to
            YACC. A token can also have a type associated with it for
            good type checking and syntax-directed translation. The type
            of a token is specified as %token <stack member> tokenName.
            Ex: %token NAME NUMBER
%type       Specifies the type of a non-terminal symbol in the grammar
            rules. The format is %type <stack member> non-terminal.
%nonassoc   Specifies that a terminal symbol has no associativity.
%left       Specifies the left associativity of a terminal symbol.
%right      Specifies the right associativity of a terminal symbol.
%start      Specifies the L.H.S. non-terminal symbol of a production
            rule which should be taken as the starting point of the
            grammar rules.
%prec       Changes the precedence level associated with a particular
            rule to that of the following token name or literal.
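
The effect of declaring an operator %left or %right can be illustrated with a small C sketch that groups a chain of subtractions both ways. The function names are hypothetical and the code does not use yacc; it only shows why the grouping changes the result.

```c
/* Left associativity: a - b - c is grouped as (a - b) - c. */
int minus_left(const int *v, int n)
{
    int acc = v[0];
    for (int i = 1; i < n; i++)
        acc = acc - v[i];
    return acc;
}

/* Right associativity: a - b - c is grouped as a - (b - c). */
int minus_right(const int *v, int n)
{
    int acc = v[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = v[i] - acc;
    return acc;
}
```

Both functions assume n is at least 1. For the values 8, 4, 2 the left grouping gives (8 - 4) - 2 = 2 and the right grouping gives 8 - (4 - 2) = 6.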

RULE SECTION
 The rules section simply consists of a list of grammar rules.
 A grammar rule has the form:
A : BODY ;
 ‘A’ represents a nonterminal name; the colon (:) and the
semicolon (;) are YACC punctuation.
 ‘BODY’ represents a sequence of zero or more names and literals.
 The names used in the body of a grammar rule may represent
tokens or nonterminal symbols. A literal consists of a character
enclosed in single quotes.
 Names representing tokens must be declared as follows in the
declarations section:
%token name1 name2…

 Every name not defined in the declarations section is assumed to
represent a nonterminal symbol.
 Every nonterminal symbol must appear on the left side of at least
one rule. Of all the nonterminal symbols, one, called the start symbol,
has particular importance.
 The parser is designed to recognize the start symbol.
 By default, the start symbol is taken to be the left-hand side of the first
grammar rule in the rules section.
 With each grammar rule, the user may associate actions to be
performed each time the rule is recognized in the input.
 These actions may return values, and may obtain the values returned
by previous actions.
 The lexical analyzer can return values for tokens, if desired.
 An action is an arbitrary C statement. Actions are enclosed in curly
braces.
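
How an action obtains the values returned by earlier actions can be pictured with a value stack. The sketch below models, under simplifying assumptions, what happens when a rule such as expr : expr '+' expr with the action { $$ = $1 + $3; } is recognized. The names vstack, push_value and reduce_add are hypothetical and are not the code YACC generates.

```c
#define STACK_MAX 64

static double vstack[STACK_MAX];  /* one semantic value per stacked symbol */
static int vtop = 0;              /* number of values on the stack */

/* Pushes the value of a newly recognized symbol. */
void push_value(double v)
{
    if (vtop < STACK_MAX)
        vstack[vtop++] = v;
}

/* Models recognizing  expr : expr '+' expr  { $$ = $1 + $3; }
   Assumes at least three values are on the stack. */
double reduce_add(void)
{
    double d1 = vstack[vtop - 3];  /* $1: value of the left expr */
    double d3 = vstack[vtop - 1];  /* $3: value of the right expr */
    double result = d1 + d3;       /* $$ */
    vtop -= 3;                     /* pop the three right-hand-side symbols */
    push_value(result);            /* push the value of the left-hand side */
    return result;
}
```

Pushing 2, a placeholder for '+', and 3, then calling reduce_add(), leaves the single value 5 on the stack.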

RUNNING LEX AND YACC
PROGRAMS
Program to recognize a valid variable, which starts with a letter,
followed by any number of letters or digits.
Step 1: create the lex file to recognize tokens (filename.l)
Step 2: create the yacc file to recognize the pattern (filename.y)
Step 3: compile both the lex and yacc files
lex filename.l
yacc -d filename.y
gcc lex.yy.c y.tab.c -ll
Step 4: execute the file
./a.out

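Lex file: pg2.l
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[a-zA-Z] {return L;} /* a letter */
[0-9] {return D;} /* a digit */
[\t\n] {return 0;} /* end of input for the parser */
. {printf("Invalid Variable\n"); exit(0);}
%%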

Yacc file: pg2.y
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
%}
%token L D
%%
S : V {printf("Valid Variable\n");
exit(0); }
;
V : L | V L | V D
;
%%
int main(void)
{
printf("Enter Variable\n");
yyparse();
return 0;
}
void yyerror(const char *s)
{
printf("Invalid Variable\n");
}
int yywrap(void)
{
return 1;
}

Output :
$ vi pg2.l
$ vi pg2.y
$ lex pg2.l
$ yacc -d pg2.y
$ gcc lex.yy.c y.tab.c -ll
$ ./a.out
Enter variable
Bond007
Valid variable
$ ./a.out
Enter variable
007james
Invalid variable
$
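
For comparison, the rule V : L | V L | V D (a letter followed by any number of letters or digits) can be checked by a few lines of hand-written C. This is a sketch only; is_valid_variable is a hypothetical name and the function is not generated by lex or yacc.

```c
#include <ctype.h>

/* Mirrors the grammar V : L | V L | V D, i.e. a letter followed by
   any number of letters or digits. Returns 1 if valid, else 0. */
int is_valid_variable(const char *s)
{
    if (!isalpha((unsigned char)s[0]))
        return 0;                 /* must start with a letter (V : L) */
    for (int i = 1; s[i] != '\0'; i++)
        if (!isalnum((unsigned char)s[i]))
            return 0;             /* only letters and digits may follow */
    return 1;
}
```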

REGULAR EXPRESSION
 A regular expression is used to describe a pattern.
 Regular expressions are widely used in lex.
 They are written in a meta language.
 The characters used in this meta language are part of
the standard ASCII character set.
 An expression is made up of symbols.
 Normal symbols are characters and numbers, but
there are other symbols that have special meaning
in Lex.

Character       Meaning
A-Z, 0-9, a-z   Characters and numbers that form part of the pattern.
.               Matches any character except \n.
-               Used to denote a range. Example: A-Z implies all characters
                from A to Z.
[ ]             A character class. Matches any character in the brackets. If
                the first character is ^, it indicates a negated pattern.
                Example: [abC] matches either of a, b, and C.
*               Matches zero or more occurrences of the preceding pattern.
+               Matches one or more occurrences of the preceding pattern
                (no empty string). Ex: [0-9]+ matches "1", "111" or "123456"
                but not an empty string.
?               Matches zero or one occurrence of the preceding pattern.
                Ex: -?[0-9]+ matches a signed number including an optional
                leading minus.
$               Matches end of line as the last character of the pattern.
{ }             1) Indicates how many times a pattern can be present.
                2) If they contain a name, they refer to a substitution by
                that name.
\               Used to escape meta characters. Also used to remove the
                special meaning of characters as defined in this table.
^               Matches beginning of line as the first character of the
                pattern. As the first character inside [ ], negation.
|               Matches either the preceding regular expression or the
                following regular expression. Ex: cow|sheep|pig matches any
                of the three words.
"<symbols>"     Literal meanings of characters. Meta characters inside the
                quotes lose their special meaning.
/               Lookahead. Matches the preceding pattern only if followed by
                the succeeding expression. Example: A0/1 matches A0 only if
                A01 is the input.
( )             Groups a series of regular expressions together into a new
                regular expression. Ex: (01) represents the character
                sequence 01. Parentheses are useful when building up complex
                patterns with *, + and |.
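
Several of the operators in the table can be tried out from C with the POSIX regular expression functions in regex.h. This is a sketch under two assumptions: a POSIX system is available, and only operators shared by lex and POSIX extended syntax are used (lex-specific features such as quoted strings, / lookahead and {name} substitution are not supported there). The helper name matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regular expression
   pattern, 0 if it does not, and -1 if the pattern is invalid.
   Anchor the pattern with ^ and $ to require a whole-string match. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, matches("^-?[0-9]+$", "-42") reports a match, while the same pattern applied to an empty string does not, because + requires at least one digit.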

A WORD COUNT PROGRAM
Program to count the number of characters, words, spaces and
lines in a given input file.
 To count and recognize character we use variable ‘c’
 To count and recognize word we use variable ‘w’
 To count and recognize space we use variable ‘s’
 To count and recognize lines we use variable ‘l’
 Create a text file to hold the input.

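Lex file: pg3.l
%{
#include <stdio.h>
int c=0,w=0,s=0,l=0;
%}
%%
[ \t] {s++;} /* a space or tab */
\n {l++;} /* end of a line */
[a-zA-Z0-9]+ {w++; c+=yyleng;} /* one word; yyleng is its length */
. ; /* ignore any other character */
%%
int main(int argc,char *argv[])
{
if(argc==2)
{
yyin=fopen(argv[1],"r");
if(yyin==NULL)
{
printf("cannot open %s\n",argv[1]);
return 1;
}
yylex();
printf("The number of lines=%d\n",l);
printf("The number of spaces=%d\n",s);
printf("The number of words=%d\n",w);
printf("The number of characters are=%d\n",c);
}
else
{
printf("insufficient arguments\n");
}
return 0;
}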

Text.txt
hi welcome to word count program
Lets begin

Output:
[student@localhost ~]$ gedit pg3.l
[student@localhost ~]$ lex pg3.l
[student@localhost ~]$ vi text.txt
[student@localhost ~]$ cc lex.yy.c -ll
[student@localhost ~]$ ./a.out text.txt
The number of lines=2
The number of spaces=6
The number of words=8
The number of characters are=36
[student@localhost ~]$ cat text.txt
hi welcome to word count program
Lets begin
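
The counting logic can also be written directly in C, which makes the definition of a word explicit: a run of letters and digits. This is a sketch with hypothetical names (struct counts, count_text); it counts only alphanumeric characters as characters.

```c
#include <ctype.h>

struct counts { int chars, words, spaces, lines; };

/* Counts alphanumeric characters, words (runs of alphanumerics),
   spaces/tabs and newlines in a NUL-terminated text. */
struct counts count_text(const char *text)
{
    struct counts k = {0, 0, 0, 0};
    int in_word = 0;

    for (const char *p = text; *p != '\0'; p++) {
        unsigned char ch = (unsigned char)*p;
        if (isalnum(ch)) {
            k.chars++;
            if (!in_word)
                k.words++;        /* first character of a new word */
            in_word = 1;
        } else {
            in_word = 0;
            if (ch == ' ' || ch == '\t')
                k.spaces++;
            else if (ch == '\n')
                k.lines++;
        }
    }
    return k;
}
```

Counting words as runs of characters, rather than counting separators, keeps the word count correct when two spaces follow each other.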

PROGRAM TO EVALUATE AN ARITHMETIC EXPRESSION INVOLVING
OPERATORS +, -, * AND /.
 To identify numbers, the lex rule is
[0-9]+|([0-9]*\.[0-9]+) which returns NUM.
 To identify the operators in the runtime input, the lex
pattern is [-+*/()] and the action part returns the character in
yytext.
 To identify an invalid expression, we use the lex rule . whose
action part reports the error.
 The lex file only identifies the numbers and operators.
 The yacc file performs the necessary arithmetic operations
and gives the final answer of the expression.

Lex part : lab1b.l
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[0-9]+|([0-9]*\.[0-9]+) {yylval.dval = atof(yytext); return NUM;}
[-+*/()] {return (*yytext);}
\n { return (*yytext); }
. { printf("Invalid Expression\n"); exit(0); }
%%

Yacc Part: lab1b.y
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
%}
%union
{
double dval;
}
%token <dval> NUM
%left '+' '-'
%left '*' '/'
%left '(' ')'
%type <dval> expr
%%

line : expr '\n' { printf("valid expression and value is =%g\n",$1); exit(0); }
;
expr : expr '+' expr {$$=$1+$3;}
| expr '-' expr {$$=$1-$3;}
| expr '*' expr {$$=$1*$3;}
| expr '/' expr {if($3==0) { printf("Divide by zero error\n"); exit(0); }
else {$$=$1/$3; } }
| '(' expr ')' {$$=$2;}
| NUM {$$=$1;}
;
%%

int main(void)
{
printf("\n enter the arithmetic expression:\n");
yyparse();
return 0;
}
void yyerror(const char *s)
{
printf("\n invalid expression\n");
exit(0);
}
int yywrap(void)
{
return 1;
}

OUTPUT:
[student@localhost ~]$ lex lab1b.l
[student@localhost ~]$ yacc -d lab1b.y
[student@localhost ~]$ gcc lex.yy.c y.tab.c -ll
[student@localhost ~]$ ./a.out
Enter the arithmetic expression 2.0+3.5*4.0
Valid expression and the value is 16
[student@localhost ~]$ ./a.out
Enter expression a+b
invalid expression
[student@localhost ~]
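
The same precedence and associativity (* and / bind tighter than + and -, all left associative) can be obtained without yacc by a hand-written recursive descent evaluator. The sketch below is a different technique from the table-driven parser yacc generates; the names evaluate, ev_expr, ev_term and ev_factor are hypothetical, and whitespace is not accepted, as in the lex file.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *ev_p;   /* current position in the expression */
static int ev_error;       /* set to 1 on a syntax or divide-by-zero error */

static double ev_expr(void);

/* factor : NUM | '(' expr ')' */
static double ev_factor(void)
{
    if (*ev_p == '(') {
        ev_p++;
        double v = ev_expr();
        if (*ev_p == ')')
            ev_p++;
        else
            ev_error = 1;            /* missing closing parenthesis */
        return v;
    }
    if (!isdigit((unsigned char)*ev_p) && *ev_p != '.') {
        ev_error = 1;                /* no number where one was expected */
        return 0.0;
    }
    char *end;
    /* strtod also accepts forms such as 1e3 that the lex rule does not. */
    double v = strtod(ev_p, &end);
    if (end == ev_p) {
        ev_error = 1;
        return 0.0;
    }
    ev_p = end;
    return v;
}

/* term : factor { ('*' | '/') factor }, binds tighter than + and - */
static double ev_term(void)
{
    double v = ev_factor();
    while (*ev_p == '*' || *ev_p == '/') {
        char op = *ev_p++;
        double rhs = ev_factor();
        if (op == '*')
            v *= rhs;
        else if (rhs == 0.0)
            ev_error = 1;            /* divide by zero */
        else
            v /= rhs;
    }
    return v;
}

/* expr : term { ('+' | '-') term }, left associative */
static double ev_expr(void)
{
    double v = ev_term();
    while (*ev_p == '+' || *ev_p == '-') {
        char op = *ev_p++;
        double rhs = ev_term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Evaluates text; stores 1 in *ok on success and 0 on any error. */
double evaluate(const char *text, int *ok)
{
    ev_p = text;
    ev_error = 0;
    double v = ev_expr();
    if (*ev_p != '\0')
        ev_error = 1;                /* trailing input, e.g. "2)" or "a+b" */
    *ok = !ev_error;
    return v;
}
```

With this sketch, evaluate("2.0+3.5*4.0", &ok) yields 16 because the multiplication is performed first, and evaluate("a+b", &ok) sets ok to 0.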

PROGRAM TO RECOGNIZE STRINGS ‘aaab’, ‘abbb’, ‘ab’
AND ‘a’ USING THE GRAMMAR (a^n b^n, n >= 0).
Lex part : lab5b.l
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%%
a return A;
b return B;
[ ]+ return empty;
\n return *yytext;
. { printf( "Invalid String \n" ); exit(0); }
%%

Yacc Part : lab5b.y
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
%}
%token A B empty
%%
S1 : S '\n' { printf( "Valid String \n" ); exit(0); }
| S2 '\n' { printf( "Valid String \n" ); exit(0); }
;
S : A B
| A S B
;
S2 : empty
;
%%

int main(void)
{
printf("Enter string:\n"); yyparse();
return 0;
}
void yyerror(const char *s)
{
printf("Invalid String \n");
}
int yywrap(void)
{
return 1;
}

OUTPUT:
[root@localhost ~]# lex lab5b.l
[root@localhost ~]# yacc -d lab5b.y
[root@localhost ~]# cc lex.yy.c y.tab.c
[root@localhost ~]# ./a.out
enter the string aaabbb
valid string
[root@localhost ~]# ./a.out
enter the string
aab
invalid string
[root@localhost ~]#
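
The language accepted by the rule S : A B | A S B can also be checked by simply counting, which is a useful cross-check for the parser's answers. The sketch below uses counting rather than parsing; is_anbn is a hypothetical name.

```c
/* Returns 1 if s has the form a^n b^n with n >= 1 (the strings
   produced by S : A B | A S B), else 0. */
int is_anbn(const char *s)
{
    int a = 0, b = 0;
    while (*s == 'a') {   /* count the leading a's */
        a++;
        s++;
    }
    while (*s == 'b') {   /* count the b's that follow */
        b++;
        s++;
    }
    /* Nothing may remain, and the two counts must agree. */
    return *s == '\0' && a >= 1 && a == b;
}
```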

A WORKING INTRODUCTION TO
SHIFT-REDUCE PARSING
 YACC uses the shift-reduce parsing methodology to parse the given input.
The shift-reduce parser is essentially a pushdown automaton. It consists
of a finite state machine with a stack. The stack is used to hold terminal
and/or non-terminal symbols. The following is a gentle introduction to
shift-reduce parsing.
 A shift-reduce parser is initialized in the following configuration.
STACK: $ I/P BUFFER: <Input to be parsed> $
 The input to be parsed, which is a sequence of terminal symbols, is
stored in an input buffer with '$' symbol at the end (used as an end-
marker). The stack is initialized to contain just the symbol '$'.

The parser works by repeatedly performing the following actions until an
error is encountered or the input is successfully parsed:
1. Read the next terminal symbol from the input, remove it from the
input and push it onto the stack. This operation is called a shift. (The
shift operation will be explained in detail later.)
2. Do some conditional operations on the stack. These operations are
called reductions. Not every iteration may involve reductions.
(Reductions will be explained in detail later.)

 Parsing ends successfully when the input buffer is empty (except for the
end-marker '$') and the stack contains nothing but the '$' followed by
the start symbol of the grammar. Error condition occurs when the input
does not belong to the language of the grammar and the parser detects
the same. We will look at error conditions later.
 Consider the following context free grammar. This will be used as a
running example for this section.
expr : expr '+' expr (Production 1)
| expr '*' expr (Production 2)
| '(' expr ')' (Production 3)
| '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' (Production 4)
;

Let us consider parsing of the input 2+2*3 using this grammar. When
the parsing process begins, the contents of the stack and the input
buffer would be as follows:
STACK: $ I/P BUFFER: 2 + 2 * 3 $
On successful completion of parsing, they would be:
STACK: $ expr I/P BUFFER: $

At each step of parsing, the parser takes an action resulting in a
configuration change. A shift-reduce parser can take four
possible parser-actions:
1. Shift is the parser-action of removing the next unread terminal
from the input buffer and pushing it onto the stack. (The input
terminal gets “shifted” to the stack).
2. Reduce is the parser-action of replacing one or more grammar
symbols on top of the stack that match the body of a
production with the corresponding production head. The
contents on top of the stack that match the right side of a
production are called a handle. The process of replacing a handle
with the corresponding production head is called a reduction.
3. Accept is the parser-action indicating that the entire input has
been parsed successfully. The parser executes an accept action
only if it reaches the accepting configuration, one in which the
input buffer is empty and the stack contains just '$' followed by the
start variable:
STACK: $ <start_variable> I/P BUFFER: $
4. Error is the parser-action taken when the parser detects that the
input does not belong to the language of the grammar.

THE PARSER'S ITERATION STEPS
Input : 2 + 3 * ( 4 + 5 ) $
Stack                              Input Buffer
$                                  2 + 3 * ( 4 + 5 ) $
$ 2                                + 3 * ( 4 + 5 ) $
$ expr                             + 3 * ( 4 + 5 ) $
$ expr +                           3 * ( 4 + 5 ) $
$ expr + 3                         * ( 4 + 5 ) $
$ expr + expr                      * ( 4 + 5 ) $
$ expr + expr *                    ( 4 + 5 ) $
$ expr + expr * ( expr             + 5 ) $
$ expr + expr * ( expr + expr      ) $
$ expr + expr * ( expr )           $
$ expr + expr * expr               $
$ expr + expr                      $
$ expr                             $
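
The shift, reduce, accept and error actions can be simulated for the running grammar with a small C sketch. It makes two assumptions that are not part of the grammar itself: the ambiguity is resolved by giving * higher precedence than + and treating both as left associative (as %left declarations would in YACC), and the stack holds single characters, with 'E' standing for expr. The names sr_parse and sr_reduce are hypothetical; a real YACC parser is driven by generated tables instead.

```c
#include <ctype.h>
#include <string.h>

#define SR_MAX 256

/* Tries one reduction on the stack; next is the lookahead terminal
   ('\0' at end of input). Returns 1 if a reduction was made. */
static int sr_reduce(char *st, int *top, char next)
{
    int n = *top;
    if (n >= 1 && isdigit((unsigned char)st[n - 1])) {
        st[n - 1] = 'E';                       /* Production 4: digit */
        return 1;
    }
    if (n >= 3 && st[n - 3] == '(' && st[n - 2] == 'E' && st[n - 1] == ')') {
        st[n - 3] = 'E';                       /* Production 3: ( expr ) */
        *top = n - 2;
        return 1;
    }
    if (n >= 3 && st[n - 3] == 'E' && st[n - 1] == 'E') {
        if (st[n - 2] == '*' ||                /* Production 2 */
            (st[n - 2] == '+' && next != '*')) /* Production 1, unless * follows */
        {
            *top = n - 2;                      /* E op E becomes E */
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if input is accepted by the grammar, else 0. Spaces are skipped. */
int sr_parse(const char *input)
{
    char st[SR_MAX];
    int top = 0;
    const char *p = input;

    for (;;) {
        while (*p == ' ')
            p++;                               /* spaces are not terminals */
        if (sr_reduce(st, &top, *p))
            continue;                          /* reduce as long as possible */
        if (*p == '\0')
            return top == 1 && st[0] == 'E';   /* accept, otherwise error */
        if (strchr("0123456789+*()", *p) == NULL || top == SR_MAX)
            return 0;                          /* error: unknown terminal or overflow */
        st[top++] = *p++;                      /* shift */
    }
}
```

For the input 2 + 3 * ( 4 + 5 ), the sketch delays the reduction of expr + expr while * is the next terminal, then reduces the parenthesized sum, the product and finally the sum before accepting.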
