Chapter 8
LR Parsing
Introduction to Compilers
Contents
8.1 LR Parsers
8.2 The Canonical Collection of LR(0) Items
Construction of LR Parsing Tables
  8.3 SLR Method
  8.4 CLR Method
  8.5 LALR Method
8.6 Deterministic Parsing of Ambiguous Grammars
8.7 Compaction of LR Parsing Tables
8.8 Implementation of an LR Parser

8.1 LR Parsers
• LR parsers are efficient bottom-up parsers for a large and useful class of context-free grammars.
• The "L" stands for a left-to-right scan of the input; the "R" stands for constructing a rightmost derivation in reverse.
• Why LR parsers are attractive
(1) LR parsers can be constructed for most programming languages.
(2) The LR parsing method is more general than the LL parsing method.
(3) LR parsers detect syntax errors as early as possible in a left-to-right scan.
• However, implementing an LR parser by hand for a typical programming-language grammar is too much work.
=====> Parser Generator

• Parser Generating Systems (PGS)
- The driver routine is the same for all LR parsers;
- only the parsing table changes from one parser to another.
[Figure: Grammar <BNF notation> --> PGS --> Parsing Table; Input --> Driver Routine (consulting the Parsing Table) --> Output]

• LR parser
- Stack : S0 X1 S1 X2 ··· Xm Sm, where Si is a state and Xi ∈ V.
- Configuration of an LR parser :
(S0 X1 S1 ··· Xm Sm, ai ai+1 ··· an $)
 stack contents      unscanned input
[Figure: the driver routine reads the state Sm on top of the stack and the current symbol ai of the input a1 … ai … an $, and consults the parsing table.]

• Parsing Table (ACTION table + GOTO table)
• The LR parsing algorithm ::= same as the shift-reduce parsing algorithm.
• Four actions : 1. shift  2. reduce  3. accept  4. error
[Figure: the parsing table has one row per state; the ACTION table has one column per terminal and the GOTO table has one column per nonterminal.]

1. ACTION[Sm, ai] = shift S
::= (S0 X1 S1 ··· Xm Sm, ai ai+1 ··· an $)
  ⊢ (S0 X1 S1 ··· Xm Sm ai S, ai+1 ··· an $)
2. ACTION[Sm, ai] = reduce A → α and |α| = r
::= (S0 X1 S1 ··· Xm Sm, ai ai+1 ··· an $)
  ⊢ (S0 X1 S1 ··· Xm-r Sm-r, ai ai+1 ··· an $), GOTO(Sm-r, A) = S
  ⊢ (S0 X1 S1 ··· Xm-r Sm-r A S, ai ai+1 ··· an $)
3. ACTION[Sm, ai] = accept : parsing is completed.
4. ACTION[Sm, ai] = error : the parser has discovered an error and calls an error recovery routine.

ex) G : 1. LIST → LIST , ELEMENT
        2. LIST → ELEMENT
        3. ELEMENT → a
Parsing Table :
          ACTION table         GOTO table
state     a     ,     $        LIST   ELEMENT
  0       s3                   1      2
  1             s4    acc
  2             r2    r2
  3             r3    r3
  4       s3                          5
  5             r1    r1
where sj means shift and stack state j,
ri means reduce by production numbered i,
acc means accept, and blank means error.

• Input : ω = a, a
• Parsing configurations (for the grammar and parsing table of the LIST example) :
action          configuration
initial         (0 , a,a$)
s3              (0 a 3 , ,a$)
r3, GOTO 2      (0 ELEMENT 2 , ,a$)
r2, GOTO 1      (0 LIST 1 , ,a$)
s4              (0 LIST 1 , 4 , a$)
s3              (0 LIST 1 , 4 a 3 , $)
r3, GOTO 5      (0 LIST 1 , 4 ELEMENT 5 , $)
r1, GOTO 1      (0 LIST 1 , $)
accept
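The driver routine described above can be sketched in a few lines. This is a minimal illustration, not the textbook's implementation: the dictionaries `RULES`, `ACTION`, `GOTO` and the function `lr_parse` are names assumed here, and the table entries are copied from the LIST example. Only the state stack is kept, since the grammar symbols on the stack are implied by the states.

```python
# Grammar: 1. LIST -> LIST , ELEMENT   2. LIST -> ELEMENT   3. ELEMENT -> a
# RULES maps a rule number to (left-hand side, length of right-hand side).
RULES = {1: ("LIST", 3), 2: ("LIST", 1), 3: ("ELEMENT", 1)}

# ACTION and GOTO entries of the LIST example; a missing key means "error".
ACTION = {
    (0, "a"): ("s", 3),
    (1, ","): ("s", 4), (1, "$"): ("acc", 0),
    (2, ","): ("r", 2), (2, "$"): ("r", 2),
    (3, ","): ("r", 3), (3, "$"): ("r", 3),
    (4, "a"): ("s", 3),
    (5, ","): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "LIST"): 1, (0, "ELEMENT"): 2, (4, "ELEMENT"): 5}


def lr_parse(tokens):
    """Return the rule numbers used, in order, or raise SyntaxError."""
    stack = [0]                        # state stack; starts in state 0
    tokens = list(tokens) + ["$"]      # append the end marker
    pos, output = 0, []
    while True:
        state, a = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, a), ("err", 0))
        if kind == "s":                # shift: push the new state, advance input
            stack.append(arg)
            pos += 1
        elif kind == "r":              # reduce: pop |rhs| states, then GOTO on lhs
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(arg)
        elif kind == "acc":
            return output
        else:
            raise SyntaxError(f"unexpected {a!r} in state {state}")
```

Calling `lr_parse(["a", ",", "a"])` returns `[3, 2, 3, 1]`, the same sequence of reductions as in the configuration trace above.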

• The techniques for producing LR parsing tables
- Simple LR (SLR) : LR(0) items, FOLLOW
  Easy to implement but not powerful (for some grammars a parsing table cannot be constructed).
- Canonical LR (CLR) : LR(1) items
  The most powerful, but very hard to build.
- Lookahead LR (LALR) : LR(1) items, or LR(0) items with lookahead
  The method generally used.
[Figure: nested grammar classes, SLR inside LALR inside CLR.]
8.2 The Canonical Collection of LR(0) Items
LR(0) items
• Methods for constructing an LR parsing table from a grammar
- Required : C0, the canonical collection of sets of LR(0) items
① SLR
② LALR
③ CLR
• Definition : an LR(0) item
- a production with a dot at some position of the right side.
ex) A → XYZ ∈ P,
[A → .XYZ] [A → X.YZ] [A → XY.Z] [A → XYZ.]
- Meaning of A → X.YZ : X has already been read and YZ are the symbols still to be read; once YZ has been read, a reduction by A → XYZ is made.

• mark symbol ::= the symbol after the dot, if it exists.
- Example : in the LR(0) item [A → X.YZ] the mark symbol is Y; [A → XYZ.] has no mark symbol.
• kernel item ::= [A → α.β] if α ≠ ε, or A = S'.
• closure item ::= an item whose dot is at the beginning, as in [A → .α];
- it is the result of performing the CLOSURE operation.
• reduce item ::= an item whose dot is at the end of the production, as in [A → α.].

• [A → α.β] means that
- an input string derivable from α has just been seen, and
- if an input string derivable from β is seen next, we may be able to reduce by the production A → αβ.
• Definition : Augmented Grammar
G = (VN, VT, P, S)
→ G' = (VN ∪ {S'}, VT, P ∪ {S' → S}, S')
where S' is a new start symbol not in VN.
- The purpose of this new starting production is to tell the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs when and only when the parser is about to reduce by S' → S.

• If S ⇒*rm αAω ⇒rm αβ1β2ω, then αβ1 is a viable prefix.
- A viable prefix is a prefix of a right-sentential form, produced during a rightmost derivation from the given grammar, that does not extend past the handle; it is exactly what has already been analyzed and is held on the parsing stack.
• We say the item [A → β1.β2] is valid for the viable prefix αβ1 if there is a derivation S ⇒*rm αAω ⇒rm αβ1β2ω.
"In general, an item will be valid for many viable prefixes."
• What it means for the LR(0) item [A → β1.β2] to be valid
- It decides whether to shift or to reduce when the stack holds αβ1.
- β2 ≠ ε : the handle is not yet on top of the stack, so shift.
- β2 = ε : the item has the form [A → β1.] and β1 is the handle, so reduce by this production.
• For any LR grammar, collecting the sets of LR(0) items that are valid for every viable prefix gives an LR parser (parsing table) that can analyze a given string.

• Canonical collection of LR(0) items
::= the set of valid items for each viable prefix that can appear on the stack of an LR parser.
• Computation : CLOSURE & GOTO functions
• The CLOSURE operation
- Definition :
CLOSURE(I) = I ∪ {[B → .γ] | [A → α.Bβ] ∈ CLOSURE(I), B → γ ∈ P}
- Meaning :
[A → α.Bβ] in CLOSURE(I) indicates that, at some point in the parsing process, we next expect to see a substring derivable from Bβ as input.
If B → γ is a production, we would also expect to see a substring derivable from γ at this point. For this reason, we also include [B → .γ] in CLOSURE(I).

• CLOSURE
- To collect the sets of valid LR(0) items, add the production S' → S to the given grammar and, starting from this production, compute the sets of LR(0) items one after another by advancing the mark. Whenever the mark symbol is a nonterminal, as in [A → α.Bβ], the LR(0) items whose left-hand side is that nonterminal must belong to the same set; this is called the closure.
- The LR(0) items that belong to the same set form one state.
- If I is a set of items of the given grammar, the CLOSURE of I is
CLOSURE(I) = I ∪ {[B → .γ] | [A → α.Bβ] ∈ CLOSURE(I), B → γ ∈ P}.
- CLOSURE first contains I itself; for each mark symbol that is a nonterminal, the productions of that nonterminal are added as closure items.
- If the mark symbol of an item obtained by CLOSURE is a nonterminal, the CLOSURE computation is applied to that item again, repeatedly.

• Computing algorithm :
Algorithm CLOSURE(I) ;
begin
  CLOSURE := I ;
  repeat
    if [A → α.Bβ] ∈ CLOSURE and B → γ ∈ P then
      if [B → .γ] ∉ CLOSURE then
        CLOSURE := CLOSURE ∪ {[B → .γ]}
      fi
    fi
  until no change
end.
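The CLOSURE algorithm can be written as a small worklist loop. The sketch below is illustrative only: the item representation `(lhs, rhs, dot)`, the dictionary `GRAMMAR` (the augmented expression grammar, used here as an assumed example) and the function name `closure` are choices made for this example.

```python
# Augmented expression grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar=GRAMMAR):
    """LR(0) closure of a set of items, each item being (lhs, rhs, dot)."""
    result = set(items)
    work = list(items)                 # items whose mark symbol is still unexplored
    while work:
        _, rhs, dot = work.pop()
        # The mark symbol is the symbol right after the dot, if any.
        if dot < len(rhs) and rhs[dot] in grammar:
            for prod in grammar[rhs[dot]]:
                item = (rhs[dot], prod, 0)      # closure item [B -> .gamma]
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)
```

For instance, `closure({("E'", ("E",), 0)})` contains seven items, one for each production of the augmented grammar with the dot at the left end, while the closure of `[E → E.+T]` is the item itself because its mark symbol is a terminal.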

ex) E' → E
    E → E + T | T
    T → T * F | F
    F → (E) | id
• CLOSURE({[E' → .E]})
= {[E' → .E], [E → .E+T], [E → .T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}.
• CLOSURE({[E → E.+T]}) = {[E → E.+T]}.
ex) S → AS | b
    A → SA | a
• CLOSURE({[S → A.S]}) = {[S → A.S], [S → .AS], [S → .b], [A → .SA], [A → .a]}.
(Textbook p. 348, Exercise 8.6)
• When the mark symbol is a terminal, the CLOSURE of an item is the item itself.
• The CLOSURE function collects all LR(0) items that must be considered in one state.
• The GOTO function is used to move from one state to the next state.
• This corresponds to reading the mark symbols one after another from the current marked position in order to find the handle.

• The GOTO operation
- The GOTO function computes the next state reached by parsing the mark symbol.
- Definition : GOTO(I, X) = CLOSURE({[A → αX.β] | [A → α.Xβ] ∈ I}), X ∈ V.
- Meaning :
If I is the set of items that are valid for some viable prefix γ, then GOTO(I, X) is the set of items that are valid for the viable prefix γX.
- The state reached from state I by parsing X is obtained by taking every LR(0) item in I whose mark symbol is X, moving the dot past the mark symbol, and applying CLOSURE to the resulting items.
ex) I = {[E' → E.], [E → E.+T]}
GOTO(I, +) = CLOSURE({[E → E+.T]})
= {[E → E+.T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}
ex) I = {[E → .T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}
GOTO(I, T) = CLOSURE({[E → T.], [T → T.*F]})
= {[E → T.], [T → T.*F]}

• Canonical Collection
C0 = {CLOSURE({[S' → .S]})} ∪ {GOTO(I, X) | I ∈ C0, X ∈ V}
- Starting from the added production S' → S, applying CLOSURE and GOTO repeatedly yields all sets of valid LR(0) items; C0 is the set whose elements are these sets.
• How to construct C0 from a given grammar
- The start state is the CLOSURE of the LR(0) item [S' → .S] of the added production.
- The start state is I0; the next state, I1, is obtained with the GOTO function.
- In the same way, GOTO is applied for each mark symbol to build states; a newly built state that differs from every existing state is added as a new state.
- This is repeated for every state until no new state is produced.
- One state is a set of LR(0) items, and C0 is the set of states :
C0 = {I0, I1, ···, In}

• We are now ready to give the algorithm that constructs C0, the canonical collection of sets of LR(0) items for an augmented grammar :
Algorithm Canonical_Collection;
begin
  C0 := { CLOSURE({[S' → .S]}) };
  repeat
    for I ∈ C0 do
      Closure := CLOSURE(I);
      for each X ∈ MARK SYMBOL of Closure do
        J := GOTO(I, X);
        if ∃ Ji ∈ C0 such that Ji = J then GOTO[I, X] := Ji
        else GOTO[I, X] := J;
             C0 := C0 ∪ {J}
        fi
      end for
    end for
  until no change
end.
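As a rough sketch of the algorithm above, the following code builds C0 for the augmented LIST grammar. The names `GRAMMAR`, `closure`, `goto` and `canonical_collection` and the item representation `(lhs, rhs, dot)` are assumptions made for this example. States are numbered in order of discovery, so the numbering may differ from the one used on the slides.

```python
# Augmented LIST grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "ACCEPT": [("LIST",)],
    "LIST": [("LIST", ",", "ELEMENT"), ("ELEMENT",)],
    "ELEMENT": [("a",)],
}


def closure(items):
    """LR(0) closure of a set of (lhs, rhs, dot) items."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def canonical_collection(start="ACCEPT"):
    """Return (states, edges); edges maps (state index, symbol) to a state index."""
    states = [closure({(start, GRAMMAR[start][0], 0)})]
    edges = {}
    for i, state in enumerate(states):          # the list grows while we iterate
        marks = sorted({r[d] for (_, r, d) in state if d < len(r)})
        for x in marks:
            target = goto(state, x)
            if target not in states:            # a state not seen before
                states.append(target)
            edges[(i, x)] = states.index(target)
    return states, edges
```

For the LIST grammar this yields six states and six GOTO edges, the same counts as I0 to I5 in the example that follows.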

• Example : construct C0 for the following grammar.
LIST → LIST , ELEMENT
LIST → ELEMENT
ELEMENT → a
(1) Augmented grammar :
ACCEPT → LIST
LIST → LIST , ELEMENT
LIST → ELEMENT
ELEMENT → a
(2) C0
- I0 : CLOSURE({[ACCEPT → .LIST]}) = {[ACCEPT → .LIST], [LIST → .LIST , ELEMENT], [LIST → .ELEMENT], [ELEMENT → .a]}.
- GOTO(I0, LIST) = I1 = {[ACCEPT → LIST.], [LIST → LIST. , ELEMENT]}.
- GOTO(I0, ELEMENT) = I2 = {[LIST → ELEMENT.]}.
- GOTO(I0, a) = I3 = {[ELEMENT → a.]}.
- GOTO(I1, ,) = I4 = {[LIST → LIST , .ELEMENT], [ELEMENT → .a]}.
- GOTO(I4, ELEMENT) = I5 = {[LIST → LIST , ELEMENT.]}.
- GOTO(I4, a) = I3.

• GOTO graph
::= a directed graph in which the nodes are labeled by the sets of items and the edges by grammar symbols.
Ex) GOTO graph of the LIST grammar :
I0 --LIST--> I1
I0 --ELEMENT--> I2
I0 --a--> I3
I1 --,--> I4
I4 --ELEMENT--> I5
I4 --a--> I3

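• Example : construct C0 for the following grammar using the GOTO graph.
Start from the LR(0) item [P' → .P] of the added production and apply the CLOSURE and GOTO functions until no new state is produced.
P → b D ; S e
D → d ; D | d
S → s ; S | s

I0 : [P' → .P] [P → .bD;Se]
I1 : [P' → P.]
I2 : [P → b.D;Se] [D → .d;D] [D → .d]
I3 : [P → bD.;Se]
I4 : [D → d.;D] [D → d.]
I5 : [P → bD;.Se] [S → .s;S] [S → .s]
I6 : [D → d;.D] [D → .d;D] [D → .d]
I7 : [P → bD;S.e]
I8 : [S → s.;S] [S → s.]
I9 : [D → d;D.]
I10 : [P → bD;Se.]
I11 : [S → s;.S] [S → .s;S] [S → .s]
I12 : [S → s;S.]
GOTO graph edges :
I0 --P--> I1, I0 --b--> I2
I2 --D--> I3, I2 --d--> I4
I3 --;--> I5
I4 --;--> I6
I5 --S--> I7, I5 --s--> I8
I6 --D--> I9, I6 --d--> I4
I7 --e--> I10
I8 --;--> I11
I11 --S--> I12, I11 --s--> I8
Note : for an ε-production, the LR(0) item [A → .] is both a closure item and a reduce item.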

Construction of LR Parsing Tables
• Three methods
- SLR (Simple LR) : C0, FOLLOW
- CLR (Canonical LR) : C1
- LALR (Lookahead LR) : C1, or C0 with lookahead
• Parsing Table
[Figure: rows are the states 0, 1, 2, 3, …; the columns for VT ∪ {$} hold the actions shift, reduce, accept and error, and the columns for VN hold the GOTO entries.]

• State i is constructed from Ii, where Ii ∈ C0.
• The size of the parsing table depends on the number of states, that is, on |C0| or |C1|, and |C0| << |C1|.
• The size of the parsing table :
SLR : |V| × |C0|
CLR : |V| × |C1|
LALR : |V| × |C0|

8.3 Constructing an SLR parsing table
::= The method of constructing the SLR parsing table from C0.
• Construction algorithm : C0 = {I0, I1, I2, ..., In}
1. ACTION[i, a] := "shift j" if [A → α.aβ] ∈ Ii and GOTO(Ii, a) = Ij.
2. ACTION[i, a] := "reduce A → α" for all a ∈ FOLLOW(A), if [A → α.] ∈ Ii and A ≠ S'.
3. ACTION[i, $] := "accept" if [S' → S.] ∈ Ii.
4. GOTO[i, A] := j if GOTO(Ii, A) = Ij.
5. All undefined entries are "error"; the initial state is the state i with [S' → .S] ∈ Ii.
• For reduce items, FOLLOW is used to decide on which symbols to reduce.

• Characteristics of the SLR method
- For a reduce item, the reduce action is taken on the FOLLOW symbols of the left-hand side of its production.
- An item [A → α.] in a state means that the state was reached by parsing α, so in that state the reduce action by A → α must be taken.
- The parsing table entries are therefore filled in for the symbols in FOLLOW(A).
- Reducing on FOLLOW symbols means that, while analyzing a given string, the reduce action is decided by looking at the symbol that comes after the nonterminal of the production; the symbols already seen are on top of the stack.
- The production to reduce by is thus chosen from the FOLLOW symbols of the nonterminal; as many symbols as the length of the right-hand side are removed from the stack and replaced by the left-hand-side nonterminal.
- If this process reaches the start symbol, the given string is a correct sentence; otherwise it is not.

ex) G : 0. A → L   (A : ACCEPT, L : LIST, E : ELEMENT)
        1. L → L , E
        2. L → E
        3. E → a
C0 :
I0 : [A → .L] [L → .L,E] [L → .E] [E → .a]
I1 : [A → L.] [L → L.,E]
I2 : [L → E.]
I3 : [E → a.]
I4 : [L → L,.E] [E → .a]
I5 : [L → L,E.]
GOTO graph edges : I0 --L--> I1, I0 --E--> I2, I0 --a--> I3, I1 --,--> I4, I4 --E--> I5, I4 --a--> I3
FOLLOW(A) = {$}
FOLLOW(L) = {',', $}
FOLLOW(E) = {',', $}
• Parsing Table :
          ACTION table         GOTO table
state     a     ,     $        LIST   ELEMENT
  0       s3                   1      2
  1             s4    acc
  2             r2    r2
  3             r3    r3
  4       s3                          5
  5             r1    r1

• ex) G : 1. S → L = R
          2. S → R
          3. L → * R
          4. L → id
          5. R → L
• C0 :
I0 : [S' → .S] [S → .L=R] [S → .R] [L → .*R] [L → .id] [R → .L]
I1 : [S' → S.]
I2 : [S → L.=R] [R → L.]
I3 : [S → R.]
I4 : [L → *.R] [R → .L] [L → .*R] [L → .id]
I5 : [L → id.]
I6 : [S → L=.R] [R → .L] [L → .*R] [L → .id]
I7 : [L → *R.]
I8 : [R → L.]
I9 : [S → L=R.]
GOTO graph edges :
I0 --S--> I1, I0 --L--> I2, I0 --R--> I3, I0 --*--> I4, I0 --id--> I5
I2 --=--> I6
I4 --R--> I7, I4 --L--> I8, I4 --*--> I4, I4 --id--> I5
I6 --R--> I9, I6 --L--> I8, I6 --*--> I4, I6 --id--> I5
FOLLOW(S) = {$}
FOLLOW(L) = {=, $}
FOLLOW(R) = {$, =}
• Consider I2 :
- ACTION[2, =] := "shift 6"
- ACTION[2, =] := "reduce R → L" (∵ = ∈ FOLLOW(R))
→ shift/reduce conflict
→ Not SLR(1)
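The conflict in I2 can be reproduced mechanically by applying SLR rules 1 and 2 to that single state. The sketch below is only an illustration: the names `FOLLOW`, `I2`, `SHIFT_TARGET`, `slr_actions` and `conflicts` are assumed for this example, and the shift target is copied from the GOTO graph rather than computed.

```python
# FOLLOW sets of the grammar S -> L = R | R, L -> * R | id, R -> L.
FOLLOW = {"S": {"$"}, "L": {"=", "$"}, "R": {"=", "$"}}

# State I2 as (lhs, rhs, dot) items.
I2 = [("S", ("L", "=", "R"), 1), ("R", ("L",), 1)]
SHIFT_TARGET = {"=": 6}          # GOTO(I2, =) = I6, taken from the GOTO graph


def slr_actions(items, shift_target, follow):
    """Collect every SLR action of one state, keyed by input symbol."""
    actions = {}
    for lhs, rhs, dot in items:
        if dot < len(rhs):                       # rule 1: shift on the mark symbol
            sym = rhs[dot]
            if sym in shift_target:
                actions.setdefault(sym, set()).add(f"shift {shift_target[sym]}")
        else:                                    # rule 2: reduce on FOLLOW(lhs)
            for a in follow[lhs]:
                actions.setdefault(a, set()).add(f"reduce {lhs} -> {' '.join(rhs)}")
    return actions


def conflicts(actions):
    """Keep only the symbols that received more than one action."""
    return {a: acts for a, acts in actions.items() if len(acts) > 1}
```

Here `conflicts(slr_actions(I2, SHIFT_TARGET, FOLLOW))` reports two actions for `=`, "shift 6" and "reduce R -> L", which is the shift/reduce conflict noted above.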

Example : construct the SLR parsing table for the following grammar.
E → E + T | T
T → T * F | F
F → (E) | id
(1) Augmented productions :
0. S' → E
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
(2) C0 :
I0 : [S' → .E] [E → .E+T] [E → .T] [T → .T*F] [T → .F] [F → .(E)] [F → .id]
I1 : [S' → E.] [E → E.+T]
I2 : [E → T.] [T → T.*F]
I3 : [T → F.]
I4 : [F → (.E)] [E → .E+T] [E → .T] [T → .T*F] [T → .F] [F → .(E)] [F → .id]
I5 : [F → id.]
I6 : [E → E+.T] [T → .T*F] [T → .F] [F → .(E)] [F → .id]
I7 : [T → T*.F] [F → .(E)] [F → .id]
I8 : [F → (E.)] [E → E.+T]
I9 : [E → E+T.] [T → T.*F]
I10 : [T → T*F.]
I11 : [F → (E).]
GOTO graph edges :
I0 --E--> I1, I0 --T--> I2, I0 --F--> I3, I0 --(--> I4, I0 --id--> I5
I1 --+--> I6
I2 --*--> I7
I4 --E--> I8, I4 --T--> I2, I4 --F--> I3, I4 --(--> I4, I4 --id--> I5
I6 --T--> I9, I6 --F--> I3, I6 --(--> I4, I6 --id--> I5
I7 --F--> I10, I7 --(--> I4, I7 --id--> I5
I8 --)--> I11, I8 --+--> I6
I9 --*--> I7
FOLLOW(E) = {$, +, )}
FOLLOW(T) = {*, +, ), $}
FOLLOW(F) = {*, +, ), $}
(3) Parsing table :
         ACTION table                          GOTO table
state    id    +     *     (     )     $       E    T    F
  0      s5                s4                  1    2    3
  1            s6                      acc
  2            r2    s7          r2    r2
  3            r4    r4          r4    r4
  4      s5                s4                  8    2    3
  5            r6    r6          r6    r6
  6      s5                s4                       9    3
  7      s5                s4                            10
  8            s6                s11
  9            r1    s7          r1    r1
 10            r3    r3          r3    r3
 11            r5    r5          r5    r5

• Parsing of the string a*a+a (each a is an id token)
step  stack        input    action      output
 0    0            a*a+a$   shift 5
 1    0a5          *a+a$    reduce 6    6
 2    0F           *a+a$    GOTO 3
 3    0F3          *a+a$    reduce 4    4
 4    0T           *a+a$    GOTO 2
 5    0T2          *a+a$    shift 7
 6    0T2*7        a+a$     shift 5
 7    0T2*7a5      +a$      reduce 6    6
 8    0T2*7F       +a$      GOTO 10
 9    0T2*7F10     +a$      reduce 3    3
10    0T           +a$      GOTO 2
11    0T2          +a$      reduce 2    2
12    0E           +a$      GOTO 1
13    0E1          +a$      shift 6
14    0E1+6        a$       shift 5
15    0E1+6a5      $        reduce 6    6
16    0E1+6F       $        GOTO 3
17    0E1+6F3      $        reduce 4    4
18    0E1+6T       $        GOTO 9
19    0E1+6T9      $        reduce 1    1
20    0E           $        GOTO 1
21    0E1          $        accept

• Shift-reduce conflict
- When an SLR parsing table is built, a state in which a mark symbol also belongs to the FOLLOW set of a reduce item: the parser cannot decide whether to shift or to reduce.
- Example : if a state contains the two items below and a ∈ FOLLOW(A), then on input symbol a the parser cannot decide whether to reduce by A → α or to shift.
[A → α.]
[B → β.aγ]
• Reduce-reduce conflict
- Several reduce items can appear in one state. If their FOLLOW sets share a symbol, the parser cannot decide from that symbol which production to reduce by.
- Example : if a state contains the two items below and a belongs to both FOLLOW(A) and FOLLOW(B), then on input symbol a the parser cannot decide which production to reduce by.
[A → α.]
[B → β.]

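• When a shift-reduce or reduce-reduce conflict occurs
- The grammar is not an SLR grammar.
- The syntax analyzer cannot parse deterministically.
- The conflict arises either because the grammar is ambiguous or because the SLR method decides reduce actions from FOLLOW symbols.
- Some conflicts can be resolved by deciding the reduce action from the exact context that can follow the nonterminal in that state.
• Parsing table construction in the SLR method
- A reduce action is created for every symbol in FOLLOW.
- In some states a particular FOLLOW symbol can never appear, so the reduce action for that symbol is wrong.
- That is, if the item [A → α.] belongs to state i, reduce actions are created for all of FOLLOW(A), but FOLLOW(A) may contain symbols that cannot follow A in that state; reduce actions on such symbols are invalid.
- It is appropriate to reduce only on the symbols that can actually follow A in that state.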
 Key point of SLR
In the SLR method, if [A  .]  Ii, then M[i,a] := reduce A  
for all a  FOLLOW(A). But in some situations, a cannot be a
follow symbol of A in State i. Thus, the reduction by A  
would be invalid on a in that state. To solve this problem, we
must carry more information that will allow us to rule out some
of these invalid reductions by A  .This is called the lookahead
of the item that is a state-dependent FOLLOW symbol.
LR Parsing 37
8.4 Constructing CLR Parsing Tables

How a CLR parsing table is constructed
• Think of the lookahead as a state-dependent FOLLOW.
- FOLLOW : the set of terminal symbols that can follow a nonterminal in any sentential form.
- Lookahead : the set of terminal symbols that can follow the left-hand side of an item in one particular state.
• CLR (Canonical LR)
- Reduce actions for a reduce item are created only for its lookahead symbols.
- Because a lookahead set is a subset of the FOLLOW set, deterministic parsing tables can be built for a larger class of grammars than with the SLR method.
- Building a parsing table by the CLR method requires lookahead information,
- so each item carries its own lookahead.
- An LR(1) item is an LR(0) item augmented with lookahead information.
[Figure: the lookahead set drawn inside the FOLLOW set.]
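• An LR(1) item has the form [A → α.β, a], where A → αβ ∈ P and a ∈ VT ∪ {$}.
- The first component, A → α.β, is called the core and has the same meaning as an LR(0) item.
- The second component, a, is called the lookahead of the item; for a reduce item it is the symbol on which the reduce action is taken.
• Building a parsing table by the CLR method
- Construct C1, the canonical collection of sets of valid LR(1) items.
- The CLOSURE and GOTO functions are needed.
- GOTO is the same as for C0; only CLOSURE differs, because of the lookahead information.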
• If I is a set of LR(1) items, CLOSURE is defined as follows.
CLOSURE(I) = I ∪ {[B → .γ, b] | [A → α.Bβ, a] ∈ CLOSURE(I), B → γ ∈ P, b ∈ FIRST(βa)}
- The CLOSURE of a set of LR(1) items is like the CLOSURE of a set of LR(0) items; the only difference is that lookaheads are computed and attached.
- In [A → α.Bβ, a], the FIRST of β, which follows the mark symbol B, becomes the lookahead of the item [B → .γ].
- If β can derive ε, the lookahead a of the item [A → α.Bβ, a] also becomes a lookahead.
- Hence the lookahead of the item [B → .γ] is FIRST(βa).
CLOSURE({[A → α.Bβ, a]}) = {[A → α.Bβ, a]} ∪ {[B → .γ, b] | B → γ ∈ P, b ∈ FIRST(βa)}.
• CLOSURE operation on LR(1) items :
CLOSURE(I) = I ∪ {[B → .γ, b] | [A → α.Bβ, a] ∈ CLOSURE(I), B → γ ∈ P, b ∈ FIRST(βa)}.
ex) G : S' → S
        S → CC
        C → cC
        C → d
CLOSURE({[S' → .S, $]}) = {[S' → .S, $], [S → .CC, $], [C → .cC, c/d], [C → .d, c/d]}.
- We use the notation [C → .cC, c/d] as shorthand for the two items [C → .cC, c] and [C → .cC, d].
• CLOSURE({[A → α.Bβ, a]}) = {[A → α.Bβ, a]} ∪ {[B → .γ, b] | B → γ ∈ P, b ∈ FIRST(βa)}.
FIRST(S') = FIRST(S) = FIRST(C) = {c, d}
FOLLOW(S') = FOLLOW(S) = {$}
FOLLOW(C) = {c, d, $}
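The LR(1) CLOSURE can be sketched in the same style as the LR(0) version, with the lookahead carried as a fourth item component. The names `GRAMMAR`, `FIRST`, `first_of` and `closure1` are assumptions of this example, and the simplified `first_of` is valid only because this grammar has no ε-productions.

```python
# Augmented grammar S' -> S, S -> CC, C -> cC | d.
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
FIRST = {"S'": {"c", "d"}, "S": {"c", "d"}, "C": {"c", "d"}}


def first_of(symbols):
    """FIRST of a nonempty symbol string; assumes no nonterminal derives epsilon."""
    head = symbols[0]
    return FIRST.get(head, {head})      # a terminal (or $) is its own FIRST


def closure1(items):
    """LR(1) closure of a set of (lhs, rhs, dot, lookahead) items."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # Lookaheads of the new items: FIRST(beta a), beta = symbols after B.
            for b in first_of(rhs[dot + 1:] + (la,)):
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)
```

Applied to `{("S'", ("S",), 0, "$")}` this produces six items: the two with lookahead `$` and the four written as [C → .cC, c/d] and [C → .d, c/d] in the example above.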

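• ex) C1 for the grammar S' → S, S → CC, C → cC | d :
I0 : [S' → .S, $] [S → .CC, $] [C → .cC, c/d] [C → .d, c/d]
I1 : [S' → S., $]
I2 : [S → C.C, $] [C → .cC, $] [C → .d, $]
I3 : [C → c.C, c/d] [C → .cC, c/d] [C → .d, c/d]
I4 : [C → d., c/d]
I5 : [S → CC., $]
I6 : [C → c.C, $] [C → .cC, $] [C → .d, $]
I7 : [C → d., $]
I8 : [C → cC., c/d]
I9 : [C → cC., $]
GOTO graph edges :
I0 --S--> I1, I0 --C--> I2, I0 --c--> I3, I0 --d--> I4
I2 --C--> I5, I2 --c--> I6, I2 --d--> I7
I3 --C--> I8, I3 --c--> I3, I3 --d--> I4
I6 --C--> I9, I6 --c--> I6, I6 --d--> I7
- I6 differs from I3 only in the second components (the lookaheads).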
• Construction of the CLR parsing table
::= same as SLR except that
ACTION[i, a] := reduce A → α if [A → α., a] ∈ Ii.
(rules : 1. S → CC, 2. C → cC, 3. C → d)
         ACTION table        GOTO table
state    c     d     $       S    C
  0      s3    s4            1    2
  1                  acc
  2      s6    s7                 5
  3      s3    s4                 8
  4      r3    r3
  5                  r1
  6      s6    s7                 9
  7                  r3
  8      r2    r2
  9                  r2

• Parsing of the string cdd
step  stack      input   action      output
 0    0          cdd$    shift 3
 1    0c3        dd$     shift 4
 2    0c3d4      d$      reduce 3    3
 3    0c3C       d$      GOTO 8
 4    0c3C8      d$      reduce 2    2
 5    0C         d$      GOTO 2
 6    0C2        d$      shift 7
 7    0C2d7      $       reduce 3    3
 8    0C2C       $       GOTO 5
 9    0C2C5      $       reduce 1    1
10    0S         $       GOTO 1
11    0S1        $       accept

ex) G : S → L = R | R        augmented        G' : 0) S' → S
        L → * R | id         =========>            1) S → L = R
        R → L                                      2) S → R
                                                   3) L → * R
                                                   4) L → id
                                                   5) R → L
• C1 :
I0 : [S' → .S, $] [S → .L=R, $] [S → .R, $] [L → .*R, =/$] [L → .id, =/$] [R → .L, $]
GOTO(I0, S) = I1 : [S' → S., $]
GOTO(I0, L) = I2 : [S → L.=R, $] [R → L., $]
GOTO(I0, R) = I3 : [S → R., $]
GOTO(I0, *) = I4 : [L → *.R, =/$] [R → .L, =/$] [L → .*R, =/$] [L → .id, =/$]
GOTO(I0, id) = I5 : [L → id., =/$]
GOTO(I2, =) = I6 : [S → L=.R, $] [R → .L, $] [L → .*R, $] [L → .id, $]
GOTO(I4, R) = I7 : [L → *R., =/$]
GOTO(I4, L) = I8 : [R → L., =/$]
GOTO(I4, *) = I4
GOTO(I4, id) = I5
GOTO(I6, R) = I9 : [S → L=R., $]
GOTO(I6, L) = I10 : [R → L., $]
GOTO(I6, *) = I11 : [L → *.R, $] [R → .L, $] [L → .*R, $] [L → .id, $]
GOTO(I6, id) = I12 : [L → id., $]
GOTO(I11, R) = I13 : [L → *R., $]
GOTO(I11, L) = I10
GOTO(I11, *) = I11
GOTO(I11, id) = I12

• Parsing Table
         ACTION table                GOTO table
state    id     *      =     $       S    L    R
  0      s5     s4                   1    2    3
  1                          acc
  2                    s6    r5
  3                          r2
  4      s5     s4                        8    7
  5                    r4    r4
  6      s12    s11                       10   9
  7                    r3    r3
  8                    r5    r5
  9                          r1
 10                          r5
 11      s12    s11                       10   13
 12                          r4
 13                          r3

• LR(1) Parsing of * id = id
                   ( S0, * id = id $ )
S4          ===>   ( S0 * S4, id = id $ )
S5          ===>   ( S0 * S4 id S5, = id $ )
r4, GOTO 8  ===>   ( S0 * S4 L S8, = id $ )
r5, GOTO 7  ===>   ( S0 * S4 R S7, = id $ )
r3, GOTO 2  ===>   ( S0 L S2, = id $ )
S6          ===>   ( S0 L S2 = S6, id $ )
...

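8.5 Constructing LALR Parsing Tables
• LALR (LookAhead LR)
- More powerful than the SLR method because it uses the lookahead information of items.
- The parsing table can be made the same size as the SLR table by merging the CLR states whose items have the same cores.
- Handles almost all languages that are described by an unambiguous context-free grammar.
- Most recent parser generating systems use the LALR method.
• How to build an LALR parsing table
- From C1
  - easy to explain in theory
  - not practical, because C1 becomes too large : the number of states grows greatly with the lookaheads and construction takes a long time
- From C0
  - theoretically more complex and difficult
  - practical, because it needs less time and memory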

• The C1 method
- LR(1) item : [A → α.β, a], where A → α.β is the core and a is the lookahead.
- The general idea of the algorithm is to construct C1 and, if no conflicts arise, merge the sets with common cores.
- In general, a core is a set of LR(0) items for the grammar at hand. Thus SLR and LALR tables for a grammar always have the same number of states.
• ex)
I3 + I6 → I36 : {[C → c.C, c/d/$], [C → .cC, c/d/$], [C → .d, c/d/$]}.
I4 + I7 → I47 : {[C → d., c/d/$]}.
I8 + I9 → I89 : {[C → cC., c/d/$]}.
• Parsing table
         ACTION table         GOTO table
state    c      d      $      S    C
  0      s36    s47           1    2
  1                    acc
  2      s36    s47                5
 36      s36    s47                89
 47      r3     r3     r3
  5                    r1
 89      r2     r2     r2
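Merging LR(1) states by core is mostly a grouping step. The sketch below shows only that step for the six states that get merged in this example; the dictionary `C1`, the string encoding of cores, and the function `merge_by_core` are assumptions made for illustration, and the GOTO entries would still have to be renamed afterwards.

```python
from collections import defaultdict

# Some LR(1) states of the S -> CC grammar; each item is (core, lookahead).
C1 = {
    3: {("C -> c.C", "c"), ("C -> c.C", "d"), ("C -> .cC", "c"),
        ("C -> .cC", "d"), ("C -> .d", "c"), ("C -> .d", "d")},
    6: {("C -> c.C", "$"), ("C -> .cC", "$"), ("C -> .d", "$")},
    4: {("C -> d.", "c"), ("C -> d.", "d")},
    7: {("C -> d.", "$")},
    8: {("C -> cC.", "c"), ("C -> cC.", "d")},
    9: {("C -> cC.", "$")},
}


def merge_by_core(states):
    """Union the item sets of all states that have the same set of cores."""
    groups = defaultdict(lambda: ([], set()))
    for number, items in sorted(states.items()):
        core = frozenset(c for c, _ in items)    # drop the lookaheads
        names, merged = groups[core]
        names.append(number)
        merged |= items                          # in-place union of the items
    # Name each merged state by concatenating the original numbers, e.g. "36".
    return {"".join(map(str, names)): merged for names, merged in groups.values()}
```

With this input the result has the three states "36", "47" and "89", and the reduce item of "47" carries the lookaheads c, d and $, matching the row for state 47 in the table above.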

• Merging states with common cores can never produce a shift-reduce conflict that was not present in one of the original states, because shift actions depend only on the core, not on the lookahead. It is possible, however, that a merger will produce a reduce-reduce conflict.
- shift/reduce conflict : the parser cannot decide whether to shift or to reduce.
- reduce/reduce conflict : the parser cannot decide which of several reductions to make.

• The C0 method
- more complex, but needs less time and space.
- the C1 method : simple, but time and space consuming.
- C0 plus lookahead
• Efficient computation of lookahead sets
- Definition : LA(p, [A → α.]) = {a | a ∈ FIRST(ω), S' ⇒* δAω ⇒ δαω, δα accesses p}.
- "δα accesses p" means that state p is reached from the start state after seeing δα.
- The lookahead of the item [A → α.] in state p is the FIRST of the symbols that follow the nonterminal A in the states reached by going back over α.
- LA(p, [A → α.]) is the set of terminals that can follow the nonterminal A in state p.

• Computing formula :
LA(p, [A → α.β]) = ∪ ( FIRST(β2) ⊕ LA(q, [B → β1.Aβ2]) ),
where the union ranges over all q ∈ PRED(p, α) and all items [B → β1.Aβ2] ∈ q.
- X ⊕ Y = X if ε ∉ X, and (X − {ε}) ∪ Y otherwise.
- PRED(p, α) = {q | p ∈ GOTO(q, α)}.
- lookahead of the augmented rule : LA(I0, [S' → .S]) = {$}.
• Computing lookahead sets by recursive calls
function LALR(p : state; I : item) : set of VT ;
  assume I = [A → α.β];
  LALR := {};
  if A <> S' then
    for q ∈ PRED(p, α) do
      for [B → β1.Aβ2] ∈ q do
        LALR := LALR ∪ (FIRST(β2) − {ε});
        if ε ∈ FIRST(β2) and MAP(q, [B → β1.Aβ2]) then
          LALR := LALR ∪ LALR(q, [B → β1.Aβ2])
        fi
      end for
    end for
  else LALR := {$}
  fi
end function

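ex) (Textbook p. 323, [Example 15]) C0 for the grammar S → L = R | R, L → * R | id, R → L :
I0 : [S' → .S] [S → .L=R] [S → .R] [R → .L] [L → .*R] [L → .id]
I1 : [S' → S.]
I2 : [S → L.=R] [R → L.]
I3 : [S → R.]
I4 : [L → *.R] [R → .L] [L → .*R] [L → .id]
I5 : [L → id.]
I6 : [S → L=.R] [R → .L] [L → .*R] [L → .id]
I7 : [L → *R.]
I8 : [R → L.]
I9 : [S → L=R.]
GOTO graph edges :
I0 --S--> I1, I0 --L--> I2, I0 --R--> I3, I0 --*--> I4, I0 --id--> I5
I2 --=--> I6
I4 --R--> I7, I4 --L--> I8, I4 --*--> I4, I4 --id--> I5
I6 --R--> I9, I6 --L--> I8, I6 --*--> I4, I6 --id--> I5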

• LA(I2, [R → L.])
= FIRST(ε) ⊕ LA(I0, [S → .R]) = LA(I0, [S → .R])
= FIRST(ε) ⊕ LA(I0, [S' → .S]) = {$}
• LA(I5, [L → id.])
1. From state I0 :
(FIRST(ε) ⊕ LA(I0, [R → .L])) ∪ (FIRST(=R) ⊕ LA(I0, [S → .L=R]))
= LA(I0, [R → .L]) ∪ {=} = (FIRST(ε) ⊕ LA(I0, [S → .R])) ∪ {=}
= LA(I0, [S → .R]) ∪ {=} = {$} ∪ {=} = {$, =}
2. From state I4 : LA(I4, [L → *.R]) = {$, =} ∪ {$, =} ∪ {$} = {$, =}
(see the next slide)
3. From state I6 : the same as case ⓒ for I4, so the lookahead is {$}
→ LA(I5, [L → id.]) = {$, =} ∪ {$, =} ∪ {$} = {$, =}

• The predecessor states reached by going back over * from I4 are {I0, I4, I6}.
ⓐ In I0 : the FIRST of the symbols that follow L is the same as in case 1, so the lookahead is {$, =}.
ⓑ In I4 : the FIRST of the symbols that follow L is the same as in ⓐ, so the lookahead is {$, =}.
ⓒ In I6 : FIRST(ε) ⊕ LA(I6, [R → .L]) = LA(I6, [R → .L])
= FIRST(ε) ⊕ LA(I6, [S → L=.R])
= LA(I6, [S → L=.R]) = FIRST(ε) ⊕ LA(I0, [S' → .S])
= {ε} ⊕ {$} = {$}
• Construction of LALR parsing tables
- same as the SLR method except that
ACTION[p, a] := reduce A → α for all a ∈ LA(p, [A → α.]).

• Parsing of id=id (in this trace, rule 5 is L → id, rule 3 is R → L, and rule 1 is S → L = R) :
Step  stack        Input    Action      Output
 0    0            id=id$   shift 5
 1    0id5         =id$     reduce 5    5
 2    0L           =id$     GOTO 2
 3    0L2          =id$     shift 6
 4    0L2=6        id$      shift 5
 5    0L2=6id5     $        reduce 5    5
 6    0L2=6L       $        GOTO 8
 7    0L2=6L8      $        reduce 3    3
 8    0L2=6R       $        GOTO 9
 9    0L2=6R9      $        reduce 1    1
10    0S           $        GOTO 1
11    0S1          $        accept

8.6 Deterministic Parsing of Ambiguous Grammars
• No ambiguous grammar is LR, so an ambiguous grammar always produces shift-reduce or reduce-reduce conflicts. Some ambiguous grammars are nevertheless quite useful for specifying languages, and they can also speed up the parser.
• Shift-reduce and reduce-reduce conflicts can be resolved using precedence and associativity information.
- Precedence : if the lookahead operator has higher precedence than the operator in the handle → shift; if lower → reduce
- Associativity (equal precedence) : left → reduce; right → shift

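Ex) 1. E → E + E   2. E → E * E   3. E → (E)   4. E → id
C0 :
I0 : [E' → .E] [E → .E+E] [E → .E*E] [E → .(E)] [E → .id]
I1 : [E' → E.] [E → E.+E] [E → E.*E]
I2 : [E → (.E)] [E → .E+E] [E → .E*E] [E → .(E)] [E → .id]
I3 : [E → id.]
I4 : [E → E+.E] [E → .E+E] [E → .E*E] [E → .(E)] [E → .id]
I5 : [E → E*.E] [E → .E+E] [E → .E*E] [E → .(E)] [E → .id]
I6 : [E → (E.)] [E → E.+E] [E → E.*E]
I7 : [E → E+E.] [E → E.+E] [E → E.*E]
I8 : [E → E*E.] [E → E.+E] [E → E.*E]
I9 : [E → (E).]
GOTO graph edges :
I0 --E--> I1, I0 --(--> I2, I0 --id--> I3
I1 --+--> I4, I1 --*--> I5
I2 --E--> I6, I2 --(--> I2, I2 --id--> I3
I4 --E--> I7, I4 --(--> I2, I4 --id--> I3
I5 --E--> I8, I5 --(--> I2, I5 --id--> I3
I6 --)--> I9, I6 --+--> I4, I6 --*--> I5
I7 --+--> I4, I7 --*--> I5
I8 --+--> I4, I8 --*--> I5
• Conflicts in I7 and I8 :
state    id    +        *        (    )     $     E
I7             r1,s4    r1,s5         r1    r1
I8             r2,s4    r2,s5         r2    r2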
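The precedence and associativity rules can be applied to the four conflicting entries of I7 and I8 with a small helper. This is a sketch under assumptions the slides do not state explicitly: `*` is taken to bind tighter than `+`, both operators are taken to be left-associative, and the names `PREC`, `ASSOC` and `resolve` are invented for this example.

```python
# Assumed operator table: a larger number means higher precedence.
PREC = {"+": 1, "*": 2}
ASSOC = {"+": "left", "*": "left"}


def resolve(rule_op, lookahead):
    """Choose 'shift' or 'reduce' when the handle E rule_op E meets `lookahead`."""
    if PREC[lookahead] > PREC[rule_op]:
        return "shift"                 # the lookahead binds tighter
    if PREC[lookahead] < PREC[rule_op]:
        return "reduce"                # the handle binds tighter
    # Equal precedence: associativity decides.
    return "reduce" if ASSOC[rule_op] == "left" else "shift"
```

Under these assumptions, state I7 (handle E + E) reduces on `+` and shifts on `*`, while state I8 (handle E * E) reduces on both `+` and `*`.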

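• The dangling-else problem
- In nested if statements, which if does a later else belong to?
- Example : construct the parsing table for the following grammar and follow the syntax analysis of the string iiaea.
S → iSeS | iS | a
(1) Augmented productions :
0. S' → S
1. S → iSeS
2. S → iS
3. S → a
(2) C0 and GOTO graph :
I0 : [S' → .S] [S → .iSeS] [S → .iS] [S → .a]
I1 : [S' → S.]
I2 : [S → i.SeS] [S → i.S] [S → .iSeS] [S → .iS] [S → .a]
I3 : [S → a.]
I4 : [S → iS.eS] [S → iS.]
I5 : [S → iSe.S] [S → .iSeS] [S → .iS] [S → .a]
I6 : [S → iSeS.]
GOTO graph edges :
I0 --S--> I1, I0 --i--> I2, I0 --a--> I3
I2 --S--> I4, I2 --i--> I2, I2 --a--> I3
I4 --e--> I5
I5 --S--> I6, I5 --i--> I2, I5 --a--> I3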
(3) Parsing table :
         ACTION table               GOTO
state    i     e     a     $        S
  0      s2          s3             1
  1                        acc
  2      s2          s3             4
  3            r3          r3
  4            s5          r2
  5      s2          s3             6
  6            r1          r1
- A shift-reduce conflict occurs in state I4 : the symbols that can follow S are e and $, so on the symbol e the parser must choose between shift and reduce.
- In the dangling-else problem an else is normally attached to the nearest if, which corresponds to right associativity, so the action chosen in I4 on e is shift.

8.7 Compaction of LR Parsing Tables
• Parsing table
- the size of the parsing table : |states| × |V|
- for a typical programming-language grammar : |V| = 100, |states| = 300, so the parsing table has 30,000 entries.
[Figure: parsing table with rows for the states 0, 1, 2, 3, …; the ACTION table holds shift, reduce, accept and error entries and the GOTO table holds GOTO entries.]

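• Compaction methods
(1) Identical action entries can be represented by one entry, and pointers can be used.
ex) [Figure: the ACTION column for states 0 to 6 holds pointers into a list of distinct rows: (s2, s3), (acc), (r3, r3), (s5, r2), (r1, r1); states with identical rows share one entry.]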

(2) By creating a list of (symbol, action) pairs for the actions of each state, further space efficiency can be achieved; the pair (any, x) gives the default action x for every symbol not listed.
ex) state 0,2,5 : (i,s2), (a,s3), (any,error)
    state 1 : ($,acc), (any,error)
    state 3 : (any,r3)
    state 4 : (e,s5), ($,r2), (any,error)
    state 6 : (any,r1)
(3) Encoding the GOTO field.
- form : GOTO[current-state, A] = next-state, where A ∈ VN.
- make a list of (current-state, next-state) pairs for each nonterminal.

S : (0,1), (any,error)
L : (0,2), (4,8), (6,8), (any,error)
R : (0,3), (4,7), (6,9), (any,error)
(remarks) ======> • Representation of a sparse matrix.
                  • Use dynamic storage.
ex) [Figure 8.10] --- Text p.326
[Figure: GOTO table with rows VN = S, L, R and columns for the states 0 to 9; the nonblank entries are S : 0→1; L : 0→2, 4→8, 6→8; R : 0→3, 4→7, 6→9.]
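The list encoding of method (2), combined with the sharing of identical rows from method (1), might look as follows. The names `ROWS`, `compact`, `lookup` and `TABLE` are assumptions of this sketch, and the rows are the ACTION rows of the dangling-else table. Using a reduce as the default action, as the slides do for states 3 and 6, means an erroneous symbol may trigger some reductions before the error is reported, but it is never shifted.

```python
# ACTION rows of the dangling-else table; missing entries are errors.
ROWS = {
    0: {"i": "s2", "a": "s3"}, 1: {"$": "acc"}, 2: {"i": "s2", "a": "s3"},
    3: {"e": "r3", "$": "r3"}, 4: {"e": "s5", "$": "r2"},
    5: {"i": "s2", "a": "s3"}, 6: {"e": "r1", "$": "r1"},
}


def compact(row):
    """Encode one row as (symbol, action) pairs ending with an ('any', x) pair."""
    actions = set(row.values())
    if len(actions) == 1 and next(iter(actions)).startswith("r"):
        return [("any", actions.pop())]          # a lone reduce becomes the default
    return list(row.items()) + [("any", "error")]


def lookup(pairs, symbol):
    """Return the action for `symbol`, falling back to the ('any', x) default."""
    for sym, action in pairs:
        if sym in (symbol, "any"):
            return action


# Share identical lists between states, as in compaction method (1).
_lists = {}
TABLE = {}
for state, row in ROWS.items():
    key = tuple(compact(row))
    TABLE[state] = _lists.setdefault(key, key)
```

States 0, 2 and 5 end up pointing at the same list object, so the seven rows are stored as five distinct lists.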

8.8 Implementation of an LR parser
• Parser Generator System
[Figure: Grammar --> PGS --> Parsing table; Token stream --> Driver Routine (consulting the parsing table) --> Result of parsing]
• Parsing Table
ptbl[S, X] = shift  : > 0
             reduce : < 0
             accept : NO_RULES + 1
             error  : 0
(rows : states, columns : symbols)
• Parsing stack
- The parsing stack consists of a symbol stack and a state stack that operate in parallel.
  - Symbol stack : stores grammar symbols
  - State stack : stores states
[Figure: the symbol stack … X Y Z beside the state stack … Sm-2 Sm-1 Sm, with sp pointing to the top.]
• LR parser for Mini C (Text pp. 341-346)
- Mini C Grammar (Text pp. 578-581)
(1) number of rules : 97
(2) number of symbols : 85
(3) number of states : 153
Grammar for AST
