Instaduction to Instaparse
CAP-CLUG - 2019-03-13
Instaparse is a Clojure library for building
parsers from context-free grammars
What is a parser?
● Program that takes some input data (usually a string), and produces a
data-structure (usually a parse tree), based on some grammar (usually a
context-free grammar)
What’s a context-free grammar?
Formal definition:
V = finite set of non-terminals or variables. Each variable represents a clause or a
phrase, or a syntactic category
𝚺 = finite set of terminals. The set of terminals is the alphabet of the language
R = finite relation from V to (V ∪ 𝚺)*. Each member of R is a rewrite rule or
production
S = the starting symbol, must be an element of V
Adapted from https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions
What’s a context-free grammar?
The “context-free” bit means that a rule for a non-terminal can be applied wherever
that non-terminal appears, regardless of the symbols around it (the context).
There are other kinds of grammars, some more powerful and some less powerful than CFGs. See
the Chomsky hierarchy for more
A simple CFG
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Non-terminals: S, AB, A, B
Terminals: ‘a’, ‘b’
Productions: the four rules above, one per line
Starting symbol: S
Productions
Each production is a rule.
You can replace the symbol on the left with
the symbol(s) on the right.
‘+’ means “one or more”; ‘*’ means “zero or
more”.
Non-terminals can be defined recursively, and
can appear on both the left and right sides of rules.
Terminals only appear on the right side of a rule.
If you imagine a tree, non-terminals are interior
nodes and terminals are leaf nodes (more on this
later).
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Productions example
S
AB
A B
aaaaaa B
aaaaaabb
Some other strings that this grammar can generate
aaabbbbababaaaabbbb
abbbbbbbbb
abababababababab
aabbaabbbbbaaaaaab
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Running a grammar “forwards”
generates strings that conform to a
grammar
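To make running the grammar “forwards” concrete, here is a minimal sketch in plain Clojure that expands each rule of the slide grammar into a random string. It does not use instaparse. The function names (rand-count, gen-A, gen-B, gen-AB, gen-S) are made up for this illustration, and the cap of three extra repetitions is an arbitrary choice to keep the output short.

```clojure
;; Sketch only: every name below is hypothetical, not part of instaparse.
;; rng is a java.util.Random, so a fixed seed gives a repeatable string.

(defn rand-count
  "Pick a repetition count between min-n and min-n + 3 (arbitrary cap)."
  [^java.util.Random rng min-n]
  (+ min-n (.nextInt rng 4)))

(defn gen-A [rng]                       ; A ::= 'a'+
  (apply str (repeat (rand-count rng 1) "a")))

(defn gen-B [rng]                       ; B ::= 'b'+
  (apply str (repeat (rand-count rng 1) "b")))

(defn gen-AB [rng]                      ; AB ::= A B
  (str (gen-A rng) (gen-B rng)))

(defn gen-S [rng]                       ; S ::= AB*
  (apply str (repeatedly (rand-count rng 0) #(gen-AB rng))))

;; Each call expands S into one string of the language.
(println (gen-S (java.util.Random. 42)))
```

Because S allows zero repetitions of AB, the empty string is a legitimate result.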
Running a grammar “backwards” over a
string tells us if that string is valid,
according to the grammar
Running the CFG backwards
aaaaaabbaab
A bbaab
A B A B
AB AB
S
VALID!
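The validity check above can be sketched in a few lines of plain Clojure. This particular grammar happens to describe a regular language (runs of a's followed by runs of b's, repeated), so a regular expression is enough here. That is a property of this example, not of context-free grammars in general, which usually need a real parser. The helper name valid? is hypothetical.

```clojure
;; Sketch only: valid? is a hypothetical helper, not an instaparse function.
(defn valid?
  "True when s is zero or more groups of an a-run followed by a b-run (S ::= AB*)."
  [s]
  (some? (re-matches #"(?:a+b+)*" s)))

(valid? "aaaaaabbaab") ; => true, the string that reduced to S in the slides
(valid? "ba")          ; => false, an AB must start with an 'a'
(valid? "")            ; => true, because AB* allows zero repetitions
```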
Let’s see that again, but without overwriting
the string
Running the CFG backwards
aaaaaa  bb  aa  b
  A     B   A   B
     AB       AB
          S
Parse Tree
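As a rough illustration of the tree built above, the following plain-Clojure sketch groups the string into runs and nests them under :A, :B, :AB and :S keywords. It is hand-rolled for this one grammar and is not how instaparse works internally. The name parse-runs is hypothetical, and the nested-vector shape is chosen to mirror the output instaparse prints later in the talk.

```clojure
;; Sketch only: parse-runs is a hypothetical helper for this one grammar.
(defn parse-runs
  "Return a nested-vector parse tree for s, or nil when s is not in the language."
  [s]
  (when (re-matches #"(?:a+b+)*" s)
    (let [runs (partition-by identity s)                  ; runs of equal characters
          node (fn [tag run] (into [tag] (map str run)))] ; [:A "a" "a" ...]
      (into [:S]
            (for [[a-run b-run] (partition 2 runs)]       ; pair each a-run with its b-run
              [:AB (node :A a-run) (node :B b-run)])))))

(parse-runs "aaaaaabbaab")
;; => [:S [:AB [:A "a" "a" "a" "a" "a" "a"] [:B "b" "b"]]
;;        [:AB [:A "a" "a"] [:B "b"]]]
```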
Instaparse is a Clojure library for building
parsers from context-free grammars
A grammar that recognizes runs of a’s and b’s
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
ABNF notation for instaparse
Hello, instaparse
instaparse-talk.core=> (require '[instaparse.core :as insta])
nil
instaparse-talk.core=> (def as-and-bs
#_=> (insta/parser
#_=> "S = AB*
#_=> AB = A B
#_=> A = 'a'+
#_=> B = 'b'+"))
#'instaparse-talk.core/as-and-bs
Hello, instaparse
instaparse-talk.core=> (as-and-bs "aaabbbaabb")
[:S [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]] [:AB [:A "a" "a"] [:B "b" "b"]]]
instaparse-talk.core=> (pprint *1)
[:S
 [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]]
 [:AB [:A "a" "a"] [:B "b" "b"]]]
nil
instaparse-talk.core=> (insta/visualize (as-and-bs "aaabbbaabb"))
nil
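The session above only shows strings that parse. A short sketch of the other case: when the input does not match, the parser returns a failure object rather than a tree, and insta/failure? tests for it. The helper name valid? is an assumption for this example; insta/parser and insta/failure? are part of instaparse.core.

```clojure
(require '[instaparse.core :as insta])

;; Same grammar as in the REPL session above.
(def as-and-bs
  (insta/parser
    "S = AB*
     AB = A B
     A = 'a'+
     B = 'b'+"))

;; Hypothetical helper: true when the whole string parses.
(defn valid? [s]
  (not (insta/failure? (as-and-bs s))))

(valid? "aaabbbaabb") ; => true
(valid? "ba")         ; => false, an AB has to start with an 'a'
```

Evaluating (as-and-bs "ba") directly at the REPL prints a description of where the parse got stuck.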
Walking the parse tree
Parse trees are just Clojure data! We have a TON of great ways to handle them
● Recursive or iterative processing using case or core.match (pattern
matching)
● Zippers (functional navigation and “editing” of trees)
● insta/transform
● seq / tree-seq
● Enlive (CSS-style selectors for Clojure data structures)
● Any other way that you want to walk nested vectors in Clojure!
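As one small example of the options listed above, tree-seq from clojure.core can flatten a parse tree into a sequence of nodes. The sketch below works on the tree printed earlier, written out as a literal so it needs no parser. The helper names nodes, leaves and tag-counts are made up for this illustration.

```clojure
;; The parse tree from the "Hello, instaparse" slide, as plain data.
(def tree
  [:S
   [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]]
   [:AB [:A "a" "a"] [:B "b" "b"]]])

;; Vectors are branches; their children are everything after the tag keyword.
(defn nodes [t] (tree-seq vector? rest t))

;; The strings are the terminals (leaf nodes).
(defn leaves [t] (filter string? (nodes t)))

;; Count how many times each non-terminal appears in the tree.
(defn tag-counts [t]
  (frequencies (map first (filter vector? (nodes t)))))

(apply str (leaves tree)) ; => "aaabbbaabb"
(tag-counts tree)         ; => {:S 1, :AB 2, :A 2, :B 2}
```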
Example: replacing a node with a zipper
instaparse-talk.core=> (-> (zip/vector-zip (as-and-bs "aaaabbbbaabbabbb"))
pprint)
[[:S
  [:AB [:A "a" "a" "a" "a"] [:B "b" "b" "b" "b"]]
  [:AB [:A "a" "a"] [:B "b" "b"]]
  [:AB [:A "a"] [:B "b" "b" "b"]]]
 nil]
nil
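The replacement step itself is not visible in the text of the slides, so what follows is only a sketch of how it might look with clojure.zip. It navigates to the first :A node of the tree shown above, swaps in a new node, and rebuilds the tree with zip/root. The function name replace-first-A and the replacement value [:A "x"] are assumptions for illustration.

```clojure
(require '[clojure.zip :as zip])

;; The parse tree of "aaaabbbbaabbabbb", written out as plain data.
(def tree
  [:S
   [:AB [:A "a" "a" "a" "a"] [:B "b" "b" "b" "b"]]
   [:AB [:A "a" "a"] [:B "b" "b"]]
   [:AB [:A "a"] [:B "b" "b" "b"]]])

;; Hypothetical helper: replace the first :A node and return the new tree.
(defn replace-first-A [t]
  (-> (zip/vector-zip t)
      zip/down                  ; at :S
      zip/right                 ; at the first [:AB ...]
      zip/down                  ; at :AB
      zip/right                 ; at [:A "a" "a" "a" "a"]
      (zip/replace [:A "x"])    ; swap the node at this location
      zip/root))                ; zip back up to the whole tree

(replace-first-A tree)
;; => [:S
;;     [:AB [:A "x"] [:B "b" "b" "b" "b"]]
;;     [:AB [:A "a" "a"] [:B "b" "b"]]
;;     [:AB [:A "a"] [:B "b" "b" "b"]]]
```

The original tree is untouched; the zipper returns a new tree that shares the unchanged branches.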
Example: infix to postfix using case statements
Infix: 1 + 2 * 3 - 4 / 5
Postfix: 1 2 3 * + 4 5 / -
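The code for this example did not survive in the text of the slides, so the following is a sketch of one way to do the conversion, not the talk's own code. The grammar is an assumption modelled on the arithmetic example in the instaparse README. The names arithmetic, op-token, ->postfix and infix->postfix are hypothetical. Whitespace is stripped before parsing to keep the grammar small.

```clojure
(require '[instaparse.core :as insta]
         '[clojure.string :as str])

;; Assumed grammar, in the style of the instaparse README arithmetic example.
;; Angle brackets hide a rule name or a token from the resulting tree.
(def arithmetic
  (insta/parser
    "expr = add-sub
     <add-sub> = mul-div | add | sub
     add = add-sub <'+'> mul-div
     sub = add-sub <'-'> mul-div
     <mul-div> = term | mul | div
     mul = mul-div <'*'> term
     div = mul-div <'/'> term
     <term> = number | <'('> add-sub <')'>
     number = #'[0-9]+'"))

(def op-token {:add "+" :sub "-" :mul "*" :div "/"})

;; Walk the tree with case: emit both operands first, then the operator.
(defn ->postfix [[tag & children]]
  (case tag
    :expr   (->postfix (first children))
    :number [(first children)]
    (:add :sub :mul :div)
    (let [[left right] children]
      (concat (->postfix left) (->postfix right) [(op-token tag)]))))

(defn infix->postfix [s]
  (->> (str/replace s #"\s+" "")   ; drop spaces; the grammar has no whitespace rule
       arithmetic
       ->postfix
       (str/join " ")))

(infix->postfix "1 + 2 * 3 - 4 / 5") ; => "1 2 3 * + 4 5 / -"
```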
Example: insta/transform
Apply this fn to nodes that match. Magic!
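The code on this slide is not visible in the text, so here is a minimal sketch of the idea under stated assumptions. insta/transform takes a map from node tags to functions and a parse tree, and works bottom-up: each function receives the already-transformed children of a matching node. The map name run-lengths and the choice to turn runs into counts are made up for this illustration.

```clojure
(require '[instaparse.core :as insta])

(def as-and-bs
  (insta/parser
    "S = AB*
     AB = A B
     A = 'a'+
     B = 'b'+"))

;; Hypothetical transform map: one function per node tag.
(def run-lengths
  {:A  (fn [& as] (count as))                       ; "a" "a" "a" -> 3
   :B  (fn [& bs] (count bs))                       ; "b" "b"     -> 2
   :AB (fn [a-count b-count] {:a a-count :b b-count})
   :S  (fn [& pairs] (vec pairs))})

(insta/transform run-lengths (as-and-bs "aaabbbaabb"))
;; => [{:a 3, :b 3} {:a 2, :b 2}]
```

Tags that are missing from the map are left in place, so a transform can be built up one node type at a time.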
Wrapping up
● Parsers turn text into trees
● Clojure is great at walking through trees
● Instaparse makes it easy to parse things
○ Programming languages
○ Config files
○ Data
○ Lots more!
The docs for instaparse are amazing. A lot of my examples were lifted straight
from them. Read the docs. They’re great. Everyone on the project did a fantastic job.
https://github.com/Engelberg/instaparse
Thanks!
A cool way to visualize a CFG is with a railroad
diagram
