2. Instaparse is a Clojure library for building
parsers from context-free grammars
3. What is a parser?
● A program that takes some input data (usually a string) and produces a
data structure (usually a parse tree), based on some grammar (usually a
context-free grammar)
4. What’s a context-free grammar?
Formal definition: a context-free grammar is a 4-tuple (V, Σ, R, S), where
V = finite set of non-terminals or variables. Each variable represents a clause, a
phrase, or a syntactic category
Σ = finite set of terminals. The set of terminals is the alphabet of the language
R = finite relation from V to (V ∪ Σ)*. Each member of R is a rewrite rule or
production
S = the starting symbol, which must be an element of V
Adapted from https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions
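The four parts of the definition map fairly directly onto plain Clojure data. The sketch below is only an illustration: the map keys (:V, :Sigma, :R, :S) and the helper well-formed? are made-up names, and the rules describe a small example grammar for runs of a's followed by runs of b's, written out as plain recursive productions.

```clojure
;; A context-free grammar as a 4-tuple, stored in a map.
;; Keywords stand for non-terminals, strings for terminals.
;; All names here are hypothetical, chosen for this sketch.
(def cfg
  {:V     #{:S :AB :A :B}               ; non-terminals
   :Sigma #{"a" "b"}                    ; terminals (the alphabet)
   :R     #{[:S  []] [:S [:AB :S]]      ; S  -> (empty) | AB S
            [:AB [:A :B]]               ; AB -> A B
            [:A ["a"]] [:A ["a" :A]]    ; A  -> a | a A
            [:B ["b"]] [:B ["b" :B]]}   ; B  -> b | b B
   :S     :S})                          ; starting symbol

(defn well-formed?
  "True when the start symbol is in V and every rule maps a member of V
  to a sequence drawn from V and Sigma."
  [{:keys [V Sigma R S]}]
  (and (contains? V S)
       (every? (fn [[lhs rhs]]
                 (and (contains? V lhs)
                      (every? #(or (contains? V %) (contains? Sigma %)) rhs)))
               R)))
```

Calling (well-formed? cfg) returns true. Swapping the start symbol for something outside V, or adding a rule that mentions an unknown terminal, makes it return false.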
5. What’s a context-free grammar?
The “context-free” bit means that a rule for a non-terminal can be applied wherever
that non-terminal appears, regardless of the symbols around it (its context).
There are other kinds of grammars, some more powerful than CFGs and some less. See
the Chomsky hierarchy for more.
7. A simple CFG
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Non-terminals: S, AB, A and B
8. A simple CFG
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Terminals: ‘a’ and ‘b’
9. A simple CFG
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Productions: each of the four lines above is one production
10. A simple CFG
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Starting symbol: S
11. Productions
Each production is a rule.
You can replace the symbol on the left with
symbol(s) on the right.
‘+’ means “one or more”; ‘*’ means “zero or
more”
Non-terminals can be recursively defined, and
can appear on both the left and right sides of rules
Terminals only appear on the right side of a rule
If you imagine a tree, non-terminals are interior
nodes, terminals are leaf nodes (we’ll see this
more later)
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
17. Some other strings that this grammar can generate
aaabbbbababaaaabbbb
abbbbbbbbb
abababababababab
aabbaabbbbbaaaaaab
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
18. Running a grammar “forwards”
generates strings that conform to a
grammar
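Here is a rough sketch of what running the a's-and-b's grammar forwards could look like in plain Clojure. It is not part of instaparse. The names rules, generate and samples are invented for this example, and the `*` and `+` shorthands are expanded by hand into recursive rules. The seed 42 is arbitrary.

```clojure
;; Hand-expanded grammar: keywords are non-terminals, strings are terminals.
(def rules
  {:S  [[] [:AB :S]]        ; S  ::= AB*
   :AB [[:A :B]]            ; AB ::= A B
   :A  [["a"] ["a" :A]]     ; A  ::= 'a'+
   :B  [["b"] ["b" :B]]})   ; B  ::= 'b'+

(defn generate
  "Expands sym into a string, picking one alternative at random for
  every non-terminal it meets. Terminals are returned unchanged."
  [^java.util.Random rng sym]
  (if (string? sym)
    sym
    (let [alts (get rules sym)
          alt  (nth alts (.nextInt rng (int (count alts))))]
      (apply str (map #(generate rng %) alt)))))

;; Ten sample strings, generated from a seeded Random so runs are repeatable.
(def samples
  (let [rng (java.util.Random. 42)]
    (vec (repeatedly 10 #(generate rng :S)))))
```

Every string in samples is some number of "a's then b's" blocks, possibly zero blocks (the empty string), because S is allowed to expand to nothing.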
19. Running a grammar “backwards” over a
string tells us if that string is valid,
according to the grammar
20. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
aaaaaabbaab
22. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
A bbaab
23. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
A B A B
24. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
AB AB
25. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
S
26. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
S
VALID!
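The reduction walked through on the last few slides can be sketched in a few lines of plain Clojure. This is a hand-rolled check for this one grammar only, not a general parsing algorithm and not how instaparse works internally. The function name reduction-steps is made up for the example.

```clojure
(defn reduction-steps
  "Reduces s bottom-up: runs of letters become :A / :B, adjacent
  :A :B pairs become :AB, and a sequence of :AB becomes :S.
  Returns the intermediate steps, or nil when s is not valid."
  [s]
  (when (re-matches #"[ab]*" s)                       ; only a's and b's allowed
    (let [runs  (re-seq #"a+|b+" s)                   ; "aaaaaa" "bb" "aa" "b"
          syms  (mapv #(if (= \a (first %)) :A :B) runs)
          pairs (partition-all 2 syms)]
      (when (every? #(= [:A :B] %) pairs)             ; each pair must be A then B
        [syms (vec (repeat (count pairs) :AB)) :S]))))

(reduction-steps "aaaaaabbaab")
;; => [[:A :B :A :B] [:AB :AB] :S]

(reduction-steps "aaba")
;; => nil
```

The three elements of the result line up with the "A B A B", "AB AB" and "S" slides. A string that cannot be reduced all the way to S gives nil.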
28. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
aaaaaabbaab
29. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
aaaaaa bb aa b
A      B  A  B
30. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
aaaaaa bb aa b
A      B  A  B
AB        AB
31. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
aaaaaa bb aa b
A      B  A  B
AB        AB
S
32. Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
aaaaaa bb aa b
A      B  A  B
AB        AB
S
Parse Tree
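To make the idea of a parse tree concrete, here is a small hand-written sketch that builds the same tree as nested Clojure vectors, with each node written as [tag child child ...]. It only handles this one grammar and leans on regular expressions, so treat it as an illustration of the tree's shape rather than a real parser. The name parse-as-and-bs is invented for the example.

```clojure
(defn parse-as-and-bs
  "Returns a nested-vector parse tree for s, or nil when s is not
  a sequence of 'a's then b's' blocks."
  [s]
  (when (re-matches #"(?:a+b+)*" s)
    (into [:S]
          ;; each match is [whole-match run-of-as run-of-bs]
          (for [[_ as bs] (re-seq #"(a+)(b+)" s)]
            [:AB
             (into [:A] (map str as))     ; one leaf per character
             (into [:B] (map str bs))]))))

(parse-as-and-bs "aaaaaabbaab")
;; => [:S
;;     [:AB [:A "a" "a" "a" "a" "a" "a"] [:B "b" "b"]]
;;     [:AB [:A "a" "a"] [:B "b"]]]
```

Non-terminals end up as the interior nodes (the keyword tags) and terminals end up as the leaves (the one-character strings).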
33. What is a parser?
● A program that takes some input data (usually a string) and produces a
data structure (usually a parse tree), based on some grammar (usually a
context-free grammar)
34. Instaparse is a Clojure library for building
parsers from context-free grammars
35. A grammar that recognizes runs of a’s and b’s
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
37. Hello, instaparse
instaparse-talk.core=> (require '[instaparse.core :as insta])
nil
instaparse-talk.core=> (def as-and-bs
#_=> (insta/parser
#_=> "S = AB*
#_=> AB = A B
#_=> A = 'a'+
#_=> B = 'b'+"))
#'instaparse-talk.core/as-and-bs
38. Hello, instaparse
instaparse-talk.core=> (as-and-bs "aaabbbaabb")
[:S [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]] [:AB [:A "a" "a"] [:B "b" "b"]]]
instaparse-talk.core=> (pprint *1)
[:S
[:AB [:A "a" "a" "a"] [:B "b" "b" "b"]]
[:AB [:A "a" "a"] [:B "b" "b"]]]
nil
instaparse-talk.core=> (insta/visualize (as-and-bs "aaabbbaabb"))
nil
40. Walking the parse tree
Parse trees are just Clojure data! We have a TON of great ways to handle them
● Recursive or iterative processing using case or core.match (pattern
matching)
● Zippers (functional navigation and “editing” of trees)
● insta/transform
● seq / tree-seq
● Enlive (CSS-style selectors for Clojure data structures)
● Any other way that you want to walk nested vectors in clojure!
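As a quick taste of two of these options, the sketch below walks a parse tree with tree-seq and with a small recursive function that dispatches on the node tag using case. The tree is typed in by hand so the example runs without instaparse. It has the same shape as the parser output shown earlier. The names tree, nodes, leaves and summarize are made up for the example.

```clojure
;; A parse tree in the nested-vector shape: [tag child child ...]
(def tree
  [:S
   [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]]
   [:AB [:A "a" "a"] [:B "b" "b"]]])

;; tree-seq visits every node depth-first. Vectors are branches, and the
;; children of a vector are everything after its tag.
(def nodes (tree-seq vector? rest tree))

;; The leaves are the strings; joined together they give back the input.
(def leaves (filter string? nodes))

(defn summarize
  "Recursively turns the tree into a vector of run lengths,
  dispatching on each node's tag."
  [[tag & children]]
  (case tag
    :S  (mapv summarize children)
    :AB (into {} (map summarize children))
    :A  [:as (count children)]
    :B  [:bs (count children)]))

(apply str leaves)
;; => "aaabbbaabb"

(summarize tree)
;; => [{:as 3, :bs 3} {:as 2, :bs 2}]
```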
41. Example: replacing a node with a zipper
instaparse-talk.core=> (-> (zip/vector-zip (as-and-bs "aaaabbbbaabbabbb"))
pprint)
[[:S
[:AB [:A "a" "a" "a" "a"] [:B "b" "b" "b" "b"]]
[:AB [:A "a" "a"] [:B "b" "b"]]
[:AB [:A "a"] [:B "b" "b" "b"]]]
nil]
nil
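The replacement step itself might look something like the sketch below, which uses only clojure.zip from the standard library. The tree is a shortened, hand-typed stand-in for the parser output, and the replacement node [:A "x"] is arbitrary. The names tree and edited are invented for the example.

```clojure
(require '[clojure.zip :as zip])

;; Hand-typed stand-in for a parse tree.
(def tree
  [:S
   [:AB [:A "a" "a"] [:B "b" "b"]]
   [:AB [:A "a"] [:B "b" "b" "b"]]])

(def edited
  (-> (zip/vector-zip tree)
      zip/down                  ; at :S
      zip/right                 ; at the first [:AB ...]
      zip/down                  ; at :AB
      zip/right                 ; at [:A "a" "a"]
      (zip/replace [:A "x"])    ; swap that node for a new one
      zip/root))                ; zip back up to a plain vector

edited
;; => [:S [:AB [:A "x"] [:B "b" "b"]] [:AB [:A "a"] [:B "b" "b" "b"]]]
```

The original tree is untouched. The zipper hands back a new tree with just that one node swapped out.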
49. Wrapping up
● Parsers turn text into trees
● Clojure is great at walking through trees
● Instaparse makes it easy to parse things
○ Programming languages
○ Config files
○ Data
○ Lots more!
The docs for instaparse are amazing. A lot of my examples were lifted straight
from them. Read the docs. They’re great. Everyone on the project did a fantastic job.
https://github.com/Engelberg/instaparse