Ppcrslidesannotated

Solving Difﬁcult LR Parsing Conﬂicts by
Postponing Them

Luis Garcia-Forte and Casiano Rodriguez-Leon
Universidad de La Laguna

CoRTA2010 September 09-10, 2010

Outline

Introduction

Context Dependent Parsing

Nested Parsing

Conclusions

Yacc Limitations

Yacc-like LR parser generators provide ways to solve
shift-reduce mechanisms based on token precedence.

Yacc Limitations

No mechanisms are provided for the resolution of
reduce-reduce conflicts or difficult shift-reduce conflicts.

Yacc Limitations

No mechanisms are provided for the resolution of
reduce-reduce conflicts or difficult shift-reduce conflicts.
To solve such kind of conflicts the language designer has
to modify the grammar.

Postponed Conﬂict Resolution Approach


The strategy presented here extends yacc conﬂict
resolution mechanisms with new ones, supplying ways to
resolve conﬂicts that can not be solved using static
precedences.


The strategy presented here extends yacc conﬂict
resolution mechanisms with new ones, supplying ways to
precedences.
The algorithm for the generation of the LR tables remains
unchanged, but the programmer can modify the parsing
tables during run time.

PPCR Approach: Identify the Conﬂict


Identify the conﬂict: What LR(0)-items/productions and
tokens are involved?.


Tools must support that stage, as for example via the
.output ﬁle generated by eyapp.


Suppose we identify that the participants are the two
LR(0)-items


LR(0)-items
A → α↑


LR(0)-items
A → α↑
and


LR(0)-items
A → α↑
and
B → β↑


LR(0)-items
A → α↑
and
B → β↑
when the lookahead token is


LR(0)-items
A → α↑
and
B → β↑
when the lookahead token is
@.


The software must allow the use of symbolic labels to refer
by name to the productions involved in the conﬂict.


The software must allow the use of symbolic labels to refer
by name to the productions involved in the conﬂict.
In Eyapp names and labels can be explicitly given using
the directive %name, using a syntax similar to this one:

%name :rA A → α
%name :rB B → β


Give a symbolic name to the conﬂict. In this case we
choose isAorB as name of the conﬂict.


Give a symbolic name to the conflict. In this case we
choose isAorB as name of the conflict.
Inside the body section of the grammar, mark the points of
conflict using the new reserved word %PREC followed by
the conflict name:

%name :rA A → α %PREC IsAorB
%name :rA B → β %PREC IsAorB

Conflict Identification: Shift-Reduce Conflicts

The procedure is similar for shift-reduce conflicts.


Let us assume we have identiﬁed a shift-reduce conﬂict
between LR-(0) items


A → α↑


A → α↑
and


A → α↑
and
B→β↑γ


A → α↑
and
B → β ↑ γ for some token ’@’.


A → α↑
and
B → β ↑ γ for some token ’@’.
Again, we must give a symbolic name to A → α and mark
with the new %PREC directive the places where the conﬂict
occurs:

%name :rA A → α %PREC IsAorB
B →β %PREC IsAorB γ

PPCR Approach: The Conﬂict Handler


Introduce a %conflict directive inside the head section
of the translation scheme to specify the way the conﬂict will
be solved.


be solved.
The directive is followed by some code - known as the
conﬂict handler - whose mission is to modify the parsing
tables.


be solved.
The directive is followed by some code - known as the
conﬂict handler - whose mission is to modify the parsing
tables.
This code will be executed each time the associated
conﬂict state is reached.


This is the usual layout of the conﬂict handler for a
reduce-reduce conﬂict:
%conflict IsAorB {
if (is_A) {
$self-YYSetReduce(’@’, ’:rA’ );
}
else {
$self-YYSetReduce(’@’, ’:rB’ );
}
}


reduce-reduce conﬂict:
%conflict IsAorB {
if (is_A) {
}
else {
$self-YYSetReduce(’@’, ’:rB’ );
}
}
Inside a conﬂict code handler the Perl default variable $_
refers to the input and $self refers to the parser object.


shift-reduce conﬂict:
%conflict IsAorB {
if (is_A) {
}
else {
$self-YYSetShift(’@’);
}
}

The method YYSetReduce receives a list of tokens and a
production label, like :rA. The call

$self-YYSetReduce(’@’, ’:rA’ )

sets the parsing action for the state associated with the
conﬂict IsAorB to reduce by the production :rA when the
current lookahead is @.

The method YYSetReduce receives a list of tokens and a
production label, like :rA. The call

$self-YYSetReduce(’@’, ’:rA’ )

conﬂict IsAorB to reduce by the production :rA when the
current lookahead is @.
The method YYSetShift receives a token (or a reference
to a list of tokens). The call

$self-YYSetShift(’@’)

conﬂict IsAorB to shift when the current lookahead is @.


%conflict IsAorB {
if (is_A) { $self-YYSetReduce(’@’, ’:rA’ );
}
else { $self-YYSetReduce(’@’, ’:rB’ ); }
}
The call to is_A represents the context-dependent
dynamic knowledge that allows us to take the right
decision.


%conflict IsAorB {
if (is_A) { $self-YYSetReduce(’@’, ’:rA’ );
}
else { $self-YYSetReduce(’@’, ’:rB’ ); }
}
The call to is_A represents the context-dependent
dynamic knowledge that allows us to take the right
decision.
It is usually a call to a nested parser to check that A → α
applies but it can also be any other contextual information
we have to determine which one is the right production.

A Simple Example: Context Dependant Associativity

$ cat input_for_dynamicgrammar.txt
2-1-1 # left: 0 = (2-1)-1


2-1-1 # left: 0 = (2-1)-1

RIGHT


2-1-1 # left: 0 = (2-1)-1

RIGHT

2-1-1 # right: 2 = 2-(1-1)


2-1-1 # left: 0 = (2-1)-1

RIGHT

2-1-1 # right: 2 = 2-(1-1)

LEFT


2-1-1 # left: 0 = (2-1)-1

RIGHT

2-1-1 # right: 2 = 2-(1-1)

LEFT

3-1-1 # left: 1 = (3-1)-1


2-1-1 # left: 0 = (2-1)-1

RIGHT

2-1-1 # right: 2 = 2-(1-1)

LEFT

3-1-1 # left: 1 = (3-1)-1

RIGHT
3-1-1 # right: 3 = 3-(1-1)

A Simple Example: Body
p: c * {}
;

p: c * {}
;
c: $expr { print $exprn }
| RIGHT { $reduce = 0}
| LEFT { $reduce = 1}
;

p: c * {}
;
;
expr: ’(’ $expr ’)’ { $expr }
| NUM

p: c * {}
;
;
| NUM

| %name :M
expr.left %PREC leftORright
’-’ expr.right %PREC leftORright
{ $left - $right }

p: c * {}
;
;
| NUM

| %name :M
expr.left %PREC leftORright
’-’ expr.right %PREC leftORright
{ $left - $right }
;
%%

A Simple Example: Head Section
$ cat dynamicgrammar.eyp


%{
my $reduce = 1;
%}


%{
my $reduce = 1;
%}
Automatically
%whites /(s*(?:#.*)?s*)/ Generated
%token NUM = /(d+)/ Lexical Analyzer


%{
my $reduce = 1;
%}

%whites /(s*(?:#.*)?s*)/
%token NUM = /(d+)/

%conflict leftORright {
if ($reduce) { $self-YYSetReduce(’-’, ’:M’) }
else { $self-YYSetShift(’-’) }
}


%{
my $reduce = 1;
%}

%whites /(s*(?:#.*)?s*)/
%token NUM = /(d+)/

%conflict leftORright {
if ($reduce) { $self-YYSetReduce(’-’, ’:M’) }
else { $self-YYSetShift(’-’) }
}

%expect 1

%%

Enumerated and Subrange Types in Pascal

The problem is taken from the Bison manual.
type subrange = lo .. hi;
type enum = (a, b, c);


The Pascal standard allows only numeric literals and
constant identiﬁers for the subrange bounds (lo and hi),
but Extended Pascal and many other implementations
allow arbitrary expressions there.


type subrange = (a) .. b;
type enum = (a);


Can't decide up to
type subrange = (a) .. b;
here
type enum = (a);
To aggravate the conﬂict we have added the C comma
operator to the language
type subrange = (a, b, c) .. (d, e);

Enumerated vs Range: Full Grammar
%token ID = /([A-Za-z]w*)/
%token NUM = /(d+)/

%token NUM = /(d+)/
%left ’,’
%left ’-’ ’+’
%left ’*’ ’/’

%token NUM = /(d+)/
%left ’,’
%left ’-’ ’+’
%left ’*’ ’/’
%%
type_decl : ’TYPE’ ID ’=’ type ’;’ ;

%token NUM = /(d+)/
%left ’,’
%left ’-’ ’+’
%left ’*’ ’/’
%%
type : ’(’ id_list ’)’ | expr ’..’ expr ;

%token NUM = /(d+)/
%left ’,’ These are the
%left ’-’ ’+’ productions
%left ’*’ ’/’ involved in the
%% conflict
type : ’(’ id_list ’)’ | expr ’..’ expr ;
id_list : ID | id_list ’,’ ID ;
expr : ID
| NUM | ’(’ expr ’)’
| expr ’,’ expr /* new */
| expr ’+’ expr | expr ’-’ expr
| expr ’*’ expr | expr ’/’ expr
;

Identifying the Conﬂict
$ eyapp -v pascalenumeratedvsrange.eyp
2 reduce/reduce conflicts

Identifying the Conﬂict
$ eyapp -v pascalenumeratedvsrange.eyp
2 reduce/reduce conflicts

State 11:
id_list - ID . (Rule 4)
expr - ID . (Rule 12)

’)’ [reduce using rule 12 (expr)]
’)’ reduce using rule 4 (id_list)
’,’ [reduce using rule 12 (expr)]
’,’ reduce using rule 4 (id_list)

’*’ reduce using rule 12 (expr)
’+’ reduce using rule 12 (expr)
’-’ reduce using rule 12 (expr)
’/’ reduce using rule 12 (expr)

Enumerated vs Range: Body

type_decl : ’type’ ID ’=’ type ’;’ ;



type :
%name ENUM
Lookahead ’(’ id_list ’)’
| %name RANGE
Lookahead expr ’..’ expr
;



type :
%name ENUM
Lookahead ’(’ id_list ’)’
| %name RANGE
Lookahead expr ’..’ expr
;
Lookahead: /* empty */
{ $is_range = $_[0]-YYPreParse(’Range’); }
;

Nested Parsing: Future Work

$ cat Range2.eyp

%from pascalnestedeyapp2 import expr
%%
range: expr ’..’ expr ’;’
;
%%

id_list :
%name ID:ENUM
ID %PREC rangeORenum
| id_list ’,’ ID
;

id_list :
%name ID:ENUM
| id_list ’,’ ID
;
expr : ’(’ expr ’)’ { $_[2] } /* bypass */
| %name ID:RANGE
| %name PLUS expr ’+’ expr
| %name MINUS expr ’-’ expr
| %name TIMES expr ’*’ expr
| %name DIV expr ’/’ expr
| %name COMMA expr ’,’ expr
| %name NUM NUM
;

%%

Enumerated vs Range: Header
%{
my $is_range = 0;
%}
%conflict rangeORenum {
if ($is_range)
{ $self-YYSetReduce([’,’, ’)’], ’ID:RANGE’ ); }
else
{ $self-YYSetReduce([’,’, ’)’], ’ID:ENUM’ ); }
}

%{
my $is_range = 0;
%}
if ($is_range)
else
}
%token NUM = /(d+)/

%{
my $is_range = 0;
%}
if ($is_range)
else
}
%token NUM = /(d+)/
%left ’,’
%left ’-’ ’+’
%left ’*’ ’/’

%{
my $is_range = 0;
%}
if ($is_range)
else
}
%token NUM = /(d+)/
%left ’,’
%left ’-’ ’+’
%left ’*’ ’/’
%expect-rr 2
%%

Enumerated vs Range: Execution

$ eyapp -P Range.eyp



$ eyapp -TC pascalnestedeyapp2.eyp




$ ./pascalnestedeyapp2.pm -t -i -m 1
-c ’type e = (x, y, z);’




$ ./pascalnestedeyapp2.pm -t -i -m 1
-c ’type e = (x, y, z);’

typeDecl_is_type_ID_type(
TERMINAL[e],
ENUM(
idList_is_idList_ID(
idList_is_idList_ID(ID(TERMINAL[x]),
TERMINAL[y]),
TERMINAL[z]
)))

Conclusions

The new conﬂict resolution mechanisms provide ways to
precedences.

Conclusions

precedences.
They also provides ﬁner control over the conﬂict resolution
process than existing alternatives, like GLR and
backtracking LR.

Conclusions

precedences.
backtracking LR.
There are no limitations to PPCR parsing, since the conﬂict
handler is implemented in a universal language and it then
can resort to any kind of nested parsing algorithm.

Conclusions

precedences.
backtracking LR.
The conﬂict resolution mechanisms presented here can be
introduced in any LR parsing tools.

Conclusions

precedences.
backtracking LR.
The conﬂict resolution mechanisms presented here can be
introduced in any LR parsing tools.
It certainly involves more programmer work than branching
methods like GLR or backtracking LR.

http://parse-eyapp.googlecode.com/svn/branches/PPCR

http://parse-eyapp.googlecode.com/svn/branches/PPCR

Thanks!

Ppcrslidesannotated

Recommended

Recommended

More Related Content

Similar to Ppcrslidesannotated

Similar to Ppcrslidesannotated (20)

Recently uploaded

Recently uploaded (20)

Ppcrslidesannotated