Extended Context-Free Grammars Parsing with
Generalized LL
Author: Artem Gorokhov
Saint Petersburg University
Programming Languages and Tools Lab, JetBrains
March 4,2017
Artem Gorokhov (SPbU) March 4,2017 1 / 15
Artem Gorokhov (SPbU) March 4,2017 2 / 15
Extended Context-Free Grammar
S = a M*
M = a? (B K)+
| u B
B = c | 𝜀
Artem Gorokhov (SPbU) March 4,2017 3 / 15
=⇒
Artem Gorokhov (SPbU) March 4,2017 4 / 15
Artem Gorokhov (SPbU) March 4,2017 5 / 15
Artem Gorokhov (SPbU) March 4,2017 5 / 15
Existing solutions
ANTLR, Yacc, Bison
Artem Gorokhov (SPbU) March 4,2017 6 / 15
Existing solutions
ANTLR, Yacc, Bison
Can’t use ECFG without transformation
Admit only subclass of Context-Free languages (LL(k), LR(k))
Artem Gorokhov (SPbU) March 4,2017 6 / 15
Existing solutions
ANTLR, Yacc, Bison
Can’t use ECFG without transformation
Admit only subclass of Context-Free languages (LL(k), LR(k))
Some research on ECFG parsing
Artem Gorokhov (SPbU) March 4,2017 6 / 15
Existing solutions
ANTLR, Yacc, Bison
Can’t use ECFG without transformation
Admit only subclass of Context-Free languages (LL(k), LR(k))
Some research on ECFG parsing
No tools
LL(k), LR(k)
Artem Gorokhov (SPbU) March 4,2017 6 / 15
Existing solutions
ANTLR, Yacc, Bison
Can’t use ECFG without transformation
Admit only subclass of Context-Free languages (LL(k), LR(k))
Some research on ECFG parsing
No tools
LL(k), LR(k)
Generalized LL
Artem Gorokhov (SPbU) March 4,2017 6 / 15
Existing solutions
ANTLR, Yacc, Bison
Can’t use ECFG without transformation
Admit only subclass of Context-Free languages (LL(k), LR(k))
Some research on ECFG parsing
No tools
LL(k), LR(k)
Generalized LL
Admit arbitrary CFG (including ambiguous)
Can’t use ECFG without transformation
Artem Gorokhov (SPbU) March 4,2017 6 / 15
Existing solutions
ANTLR, Yacc, Bison
Can’t use ECFG without transformation
Admit only subclass of Context-Free languages (LL(k), LR(k))
Some research on ECFG parsing
No tools
LL(k), LR(k)
Generalized LL
Admit arbitrary CFG (including ambiguous)
Can’t use ECFG without transformation
Artem Gorokhov (SPbU) March 4,2017 6 / 15
Automata and ECFGs
Grammar G0
S = a*S b? | c =⇒
RA for grammar G0
cS
S
a
b
ε
ε
Artem Gorokhov (SPbU) March 4,2017 7 / 15
Recursive Automata Minimization
Grammar G1
S = K K K K K K |K a K K K K
K = S K | a K | a
Automaton for G1
K
a
K K KK
S
K
K K K K
a
K
K
K
S
a
Minimized automaton for G1
a S
a
K
K
S
K K K K
K
K
Artem Gorokhov (SPbU) March 4,2017 8 / 15
Derivation Trees for Recursive Automata
Input:
aacb
Automaton:
cS
S
a
b
a
Derivation trees:
S,0,4
b,3,4a,0,1 a,1,2
c,2,3
S,2,3
S,0,4
b,3,4a,0,1
a,1,2
c,2,3
S,2,3
S,1,3
S,0,4
b,3,4
a,0,1
a,1,2
c,2,3
S,1,4
S,2,3
Artem Gorokhov (SPbU) March 4,2017 9 / 15
SPPF for Recursive Automata
Input:
aacb
Automaton:
cS
S
a
b
a
Shared Packed Parse Forest:
S,0,4
b,3,4
a,0,1 a,1,2
3,1,3
c,2,3
S,1,4
S,2,3
3,0,3
S,1,3
2,0,2
Artem Gorokhov (SPbU) March 4,2017 10 / 15
SPPF for Recursive Automata
Input:
aacb
Automaton:
cS
S
a
b
a
Shared Packed Parse Forest:
S,0,4
b,3,4
a,0,1 a,1,2
3,1,3
c,2,3
S,1,4
S,2,3
3,0,3
S,1,3
2,0,2
Artem Gorokhov (SPbU) March 4,2017 10 / 15
SPPF for Recursive Automata
Input:
aacb
Automaton:
cS
S
a
b
a
Shared Packed Parse Forest:
S,0,4
b,3,4
a,0,1 a,1,2
3,1,3
c,2,3
S,1,4
S,2,3
3,0,3
S,1,3
2,0,2
Artem Gorokhov (SPbU) March 4,2017 10 / 15
SPPF for Recursive Automata
Input:
aacb
Automaton:
cS
S
a
b
a
Shared Packed Parse Forest:
S,0,4
b,3,4
a,0,1 a,1,2
3,1,3
c,2,3
S,1,4
S,2,3
3,0,3
S,1,3
2,0,2
Artem Gorokhov (SPbU) March 4,2017 10 / 15
Input processing
Descriptors queue
Descriptor (G, i, U, T) uniquely defines parsing process state
G - position in grammar
i - position in input
U - stack node
T - current parse forest root
Artem Gorokhov (SPbU) March 4,2017 11 / 15
Input processing
Descriptors queue
Descriptor (G, i, U, T) uniquely defines parsing process state
G - position in grammar state of RA
i - position in input
U - stack node
T - current parse forest root
Artem Gorokhov (SPbU) March 4,2017 11 / 15
Input processing
Input : bc
Grammar:
S = (a | b | S) c?
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : bc
Grammar:
S = a C_opt
| b C_opt
| S C_opt
C_opt = 𝜀 | c
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : ∙ bc
Grammar:
S = ∙ a C_opt
| b C_opt
| S C_opt
C_opt = 𝜀 | c
Descriptors queue
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : ∙ bc
Grammar:
S = a C_opt
| ∙ b C_opt
| S C_opt
C_opt = 𝜀 | c
Descriptors queue
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : ∙ bc
Grammar:
S = a C_opt
| b C_opt
| ∙ S C_opt
C_opt = 𝜀 | c
Descriptors queue
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : ∙ bc
Grammar:
S = ∙ a C_opt
| b C_opt
| S C_opt
C_opt = 𝜀 | c
Descriptors queue
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : ∙ bc
Grammar:
S = a C_opt
| ∙ b C_opt
| S C_opt
C_opt = 𝜀 | c
Descriptors queue
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : b ∙ c
Grammar:
S = a C_opt
| b ∙ C_opt
| S C_opt
C_opt = 𝜀 | c
Descriptors queue
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
b,0,1
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : b ∙ c
Grammar:
S = a C_opt
| b C_opt
| S C_opt
C_opt = ∙ 𝜀 | c
Descriptors queue
C_opt = ∙𝜀, 1, . . . , . . .
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : b ∙ c
Grammar:
S = a C_opt
| b C_opt
| S C_opt
C_opt = 𝜀 | ∙ c
Descriptors queue
C_opt = ∙c, 1, . . . , . . .
C_opt = ∙𝜀, 1, . . . , . . .
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : b ∙ c
Grammar:
S = a C_opt
| b C_opt
| S C_opt
C_opt = 𝜀 | ∙ c
Descriptors queue
C_opt = ∙c, 1, . . . , . . .
C_opt = ∙𝜀, 1, . . . , . . .
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : bc∙
Grammar:
S = a C_opt
| b C_opt
| S C_opt
C_opt = 𝜀 | c ∙
Descriptors queue
C_opt = ∙c, 1, . . . , . . .
C_opt = ∙𝜀, 1, . . . , . . .
S = ∙ S C_opt, 0, . . . , . . .
S = ∙ b C_opt, 0, . . . , . . .
S = ∙ a C_opt, 0, . . . , . . .
C_opt,1,2
c,1,2
Artem Gorokhov (SPbU) March 4,2017 12 / 15
Input processing
Input : bc
Automaton :
ca
b
S
S
Artem Gorokhov (SPbU) March 4,2017 13 / 15
Input processing
Input : ∙ bc
Automaton :
ca
b
S
S Descriptors queue
S, 0, . . . , . . .
Artem Gorokhov (SPbU) March 4,2017 13 / 15
Input processing
Input : ∙ bc
Automaton :
ca
b
S
S b,0,1
S,0,1
b,0,1
Artem Gorokhov (SPbU) March 4,2017 13 / 15
Evaluation
Grammar G1
S = K K K K K K |K a K K K K
K = S K | a K | a
RA for grammar G1
a S
a
K
K
S
K K K K
K
K
Experiment results for input a40
Memory usage
Time,secDescriptors Stack Edges SPPF Nodes
Grammar 7,940 6,974 111,127,244 81
RA 5,830 4,234 74,292,078 54
Ratio 27% 39% 33 % 35 %
Artem Gorokhov (SPbU) March 4,2017 14 / 15
Applicability
Graph parsing: all input strings in one graph
abcd
abfd
=⇒
ba
c
d
f
Graph parsing results
Memory usage
Time, minDescriptors Stack Edges Stack Nodes
Grammar 21,134,080 7,482,789 2,731,529 02.26
RA 9,153,352 2,792,330 839,148 01.25
Ratio 57% 63% 69 % 45 %
Artem Gorokhov (SPbU) March 4,2017 15 / 15

TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL

  • 1.
    Extended Context-Free GrammarsParsing with Generalized LL Author: Artem Gorokhov Saint Petersburg University Programming Languages and Tools Lab, JetBrains March 4,2017 Artem Gorokhov (SPbU) March 4,2017 1 / 15
  • 2.
    Artem Gorokhov (SPbU)March 4,2017 2 / 15
  • 3.
    Extended Context-Free Grammar S= a M* M = a? (B K)+ | u B B = c | 𝜀 Artem Gorokhov (SPbU) March 4,2017 3 / 15
  • 4.
    =⇒ Artem Gorokhov (SPbU)March 4,2017 4 / 15
  • 5.
    Artem Gorokhov (SPbU)March 4,2017 5 / 15
  • 6.
    Artem Gorokhov (SPbU)March 4,2017 5 / 15
  • 7.
    Existing solutions ANTLR, Yacc,Bison Artem Gorokhov (SPbU) March 4,2017 6 / 15
  • 8.
    Existing solutions ANTLR, Yacc,Bison Can’t use ECFG without transformation Admit only subclass of Context-Free languages (LL(k), LR(k)) Artem Gorokhov (SPbU) March 4,2017 6 / 15
  • 9.
    Existing solutions ANTLR, Yacc,Bison Can’t use ECFG without transformation Admit only subclass of Context-Free languages (LL(k), LR(k)) Some research on ECFG parsing Artem Gorokhov (SPbU) March 4,2017 6 / 15
  • 10.
    Existing solutions ANTLR, Yacc,Bison Can’t use ECFG without transformation Admit only subclass of Context-Free languages (LL(k), LR(k)) Some research on ECFG parsing No tools LL(k), LR(k) Artem Gorokhov (SPbU) March 4,2017 6 / 15
  • 11.
    Existing solutions ANTLR, Yacc,Bison Can’t use ECFG without transformation Admit only subclass of Context-Free languages (LL(k), LR(k)) Some research on ECFG parsing No tools LL(k), LR(k) Generalized LL Artem Gorokhov (SPbU) March 4,2017 6 / 15
  • 12.
    Existing solutions ANTLR, Yacc,Bison Can’t use ECFG without transformation Admit only subclass of Context-Free languages (LL(k), LR(k)) Some research on ECFG parsing No tools LL(k), LR(k) Generalized LL Admit arbitrary CFG (including ambiguous) Can’t use ECFG without transformation Artem Gorokhov (SPbU) March 4,2017 6 / 15
  • 13.
    Existing solutions ANTLR, Yacc,Bison Can’t use ECFG without transformation Admit only subclass of Context-Free languages (LL(k), LR(k)) Some research on ECFG parsing No tools LL(k), LR(k) Generalized LL Admit arbitrary CFG (including ambiguous) Can’t use ECFG without transformation Artem Gorokhov (SPbU) March 4,2017 6 / 15
  • 14.
    Automata and ECFGs GrammarG0 S = a*S b? | c =⇒ RA for grammar G0 cS S a b ε ε Artem Gorokhov (SPbU) March 4,2017 7 / 15
  • 15.
    Recursive Automata Minimization GrammarG1 S = K K K K K K |K a K K K K K = S K | a K | a Automaton for G1 K a K K KK S K K K K K a K K K S a Minimized automaton for G1 a S a K K S K K K K K K Artem Gorokhov (SPbU) March 4,2017 8 / 15
  • 16.
    Derivation Trees forRecursive Automata Input: aacb Automaton: cS S a b a Derivation trees: S,0,4 b,3,4a,0,1 a,1,2 c,2,3 S,2,3 S,0,4 b,3,4a,0,1 a,1,2 c,2,3 S,2,3 S,1,3 S,0,4 b,3,4 a,0,1 a,1,2 c,2,3 S,1,4 S,2,3 Artem Gorokhov (SPbU) March 4,2017 9 / 15
  • 17.
    SPPF for RecursiveAutomata Input: aacb Automaton: cS S a b a Shared Packed Parse Forest: S,0,4 b,3,4 a,0,1 a,1,2 3,1,3 c,2,3 S,1,4 S,2,3 3,0,3 S,1,3 2,0,2 Artem Gorokhov (SPbU) March 4,2017 10 / 15
  • 18.
    SPPF for RecursiveAutomata Input: aacb Automaton: cS S a b a Shared Packed Parse Forest: S,0,4 b,3,4 a,0,1 a,1,2 3,1,3 c,2,3 S,1,4 S,2,3 3,0,3 S,1,3 2,0,2 Artem Gorokhov (SPbU) March 4,2017 10 / 15
  • 19.
    SPPF for RecursiveAutomata Input: aacb Automaton: cS S a b a Shared Packed Parse Forest: S,0,4 b,3,4 a,0,1 a,1,2 3,1,3 c,2,3 S,1,4 S,2,3 3,0,3 S,1,3 2,0,2 Artem Gorokhov (SPbU) March 4,2017 10 / 15
  • 20.
    SPPF for RecursiveAutomata Input: aacb Automaton: cS S a b a Shared Packed Parse Forest: S,0,4 b,3,4 a,0,1 a,1,2 3,1,3 c,2,3 S,1,4 S,2,3 3,0,3 S,1,3 2,0,2 Artem Gorokhov (SPbU) March 4,2017 10 / 15
  • 21.
    Input processing Descriptors queue Descriptor(G, i, U, T) uniquely defines parsing process state G - position in grammar i - position in input U - stack node T - current parse forest root Artem Gorokhov (SPbU) March 4,2017 11 / 15
  • 22.
    Input processing Descriptors queue Descriptor(G, i, U, T) uniquely defines parsing process state G - position in grammar state of RA i - position in input U - stack node T - current parse forest root Artem Gorokhov (SPbU) March 4,2017 11 / 15
  • 23.
    Input processing Input :bc Grammar: S = (a | b | S) c? Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 24.
    Input processing Input :bc Grammar: S = a C_opt | b C_opt | S C_opt C_opt = 𝜀 | c Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 25.
    Input processing Input :∙ bc Grammar: S = ∙ a C_opt | b C_opt | S C_opt C_opt = 𝜀 | c Descriptors queue S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 26.
    Input processing Input :∙ bc Grammar: S = a C_opt | ∙ b C_opt | S C_opt C_opt = 𝜀 | c Descriptors queue S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 27.
    Input processing Input :∙ bc Grammar: S = a C_opt | b C_opt | ∙ S C_opt C_opt = 𝜀 | c Descriptors queue S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 28.
    Input processing Input :∙ bc Grammar: S = ∙ a C_opt | b C_opt | S C_opt C_opt = 𝜀 | c Descriptors queue S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 29.
    Input processing Input :∙ bc Grammar: S = a C_opt | ∙ b C_opt | S C_opt C_opt = 𝜀 | c Descriptors queue S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 30.
    Input processing Input :b ∙ c Grammar: S = a C_opt | b ∙ C_opt | S C_opt C_opt = 𝜀 | c Descriptors queue S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . b,0,1 Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 31.
    Input processing Input :b ∙ c Grammar: S = a C_opt | b C_opt | S C_opt C_opt = ∙ 𝜀 | c Descriptors queue C_opt = ∙𝜀, 1, . . . , . . . S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 32.
    Input processing Input :b ∙ c Grammar: S = a C_opt | b C_opt | S C_opt C_opt = 𝜀 | ∙ c Descriptors queue C_opt = ∙c, 1, . . . , . . . C_opt = ∙𝜀, 1, . . . , . . . S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 33.
    Input processing Input :b ∙ c Grammar: S = a C_opt | b C_opt | S C_opt C_opt = 𝜀 | ∙ c Descriptors queue C_opt = ∙c, 1, . . . , . . . C_opt = ∙𝜀, 1, . . . , . . . S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 34.
    Input processing Input :bc∙ Grammar: S = a C_opt | b C_opt | S C_opt C_opt = 𝜀 | c ∙ Descriptors queue C_opt = ∙c, 1, . . . , . . . C_opt = ∙𝜀, 1, . . . , . . . S = ∙ S C_opt, 0, . . . , . . . S = ∙ b C_opt, 0, . . . , . . . S = ∙ a C_opt, 0, . . . , . . . C_opt,1,2 c,1,2 Artem Gorokhov (SPbU) March 4,2017 12 / 15
  • 35.
    Input processing Input :bc Automaton : ca b S S Artem Gorokhov (SPbU) March 4,2017 13 / 15
  • 36.
    Input processing Input :∙ bc Automaton : ca b S S Descriptors queue S, 0, . . . , . . . Artem Gorokhov (SPbU) March 4,2017 13 / 15
  • 37.
    Input processing Input :∙ bc Automaton : ca b S S b,0,1 S,0,1 b,0,1 Artem Gorokhov (SPbU) March 4,2017 13 / 15
  • 38.
    Evaluation Grammar G1 S =K K K K K K |K a K K K K K = S K | a K | a RA for grammar G1 a S a K K S K K K K K K Experiment results for input a40 Memory usage Time,secDescriptors Stack Edges SPPF Nodes Grammar 7,940 6,974 111,127,244 81 RA 5,830 4,234 74,292,078 54 Ratio 27% 39% 33 % 35 % Artem Gorokhov (SPbU) March 4,2017 14 / 15
  • 39.
    Applicability Graph parsing: allinput strings in one graph abcd abfd =⇒ ba c d f Graph parsing results Memory usage Time, minDescriptors Stack Edges Stack Nodes Grammar 21,134,080 7,482,789 2,731,529 02.26 RA 9,153,352 2,792,330 839,148 01.25 Ratio 57% 63% 69 % 45 % Artem Gorokhov (SPbU) March 4,2017 15 / 15