.NET Fest 2019. Alexey Golub. Monadic parser combinators in C# (an easy way to write parsers for complex languages)

Everyone has written parsers at some point, yet many still don't know how to parse HTML without regular expressions. For a long time, parsing languages with recursive grammars seemed like black magic to me, and people who build compilers and domain-specific languages seemed like wizards. It turned out not to be that hard. In this talk I will explain what parsers are, why they are needed and what kinds exist, and, most importantly, show how to move from the traditional ways of writing them to a more convenient and understandable functional approach. During the presentation we will also write a working JSON parser as a proof of concept.

1. Monadic parser combinators in C#
   Speaker: Alexey Golub @Tyrrrz
2. name: Alexey Golub
   primary_occupation: Open Source Developer
   pays_the_bills:
     position: Senior Software Developer
     company: Svitla Systems
   tech_stack:
     - C#
     - .NET Core
     - Azure/AWS
   links:
     - https://github.com/tyrrrz
     - https://twitter.com/tyrrrz
     - https://tyrrrz.me
3. Agenda
   • What is a parser and what does it do?
   • Formal theory of language and grammar
   • Structural representation of context-free grammars
   • Different ways to build a parser
   • The concept of “parser combinators”
   • Live-coding session (writing a JSON parser)
4. What is a parser?
   Input: “123 456,93”
   What we see: 123 456,93
   What we understand: numeric literals (123, 456 and 93), a thousands separator and a decimal separator
   What the computer sees: byte[10] { 49, 50, 51, 32, 52, 53, 54, 44, 57, 51 }
   What we want the computer to understand:
     new SyntacticComponents[]
     {
         new NumericLiteral(123),
         new ThousandsSeparator(),
         new NumericLiteral(456),
         new DecimalSeparator(),
         new NumericLiteral(93)
     }
5. What does a parser do?
   Input: “<foo><bar/></foo>”, “<foo></bar>”, “hello world”
   Parser: grammar + context
   Domain objects (valid input): new XElement("foo") { new XElement("bar") }
   Rejected invalid input:
   • “<foo></bar>”: Unexpected token “</bar>”, expected “</foo>”
   • “hello world”: Unexpected token “hello world”
6. What are parsers used for?
   • Data deserialization (JSON, XML, YAML)
   • Static code analysis (ReSharper, TSLint)
   • Syntax highlighting (VS Code, Highlight.js)
   • Compilers, transpilers, interpreters (Roslyn, Markdig, Babel, SQL)
   • Template engines (Razor, Liquid, Scriban)
   • Natural language processing (spellchecking, translation)
7. Formal language theory
   A language is made up of:
   • Alphabet: the set of allowed characters
   • Words: the set of valid combinations of characters or other words
   • Grammar: the set of rules that define how words are generated
8. Formal grammar
   Regular grammar:
   • A → a, where A is a non-terminal and a is a terminal
   • A → aB, where A and B are non-terminals and a is a terminal
   Context-free grammar:
   • A → α, where A is a non-terminal and α is a string of terminals and/or non-terminals
9. Rule of thumb: does the grammar contain recursion (nested structures)?
   Yes → context-free
   No → regular
10. Syntax trees
    • Context-free languages are structurally represented using syntax trees
    • Syntax trees are used to make sense of the input text
    [Diagram: a syntax tree with a root, non-terminal nodes and terminal nodes]
11. Example AST produced by C-like code:
    while (b != 0)
    {
        if (a > b)
        {
            a = a - b;
        }
        else
        {
            b = b - a;
        }
    }
    return a;
12. Loop/stack-based manual parsers
    • Loop through all characters in the input
    • Maintain context on a stack
    Pros:
    • Performance
    • Fine-tuning
    • Debugging
    Cons:
    • Hard to write/read/maintain
    • Code is not expressive
13. Parser generators
    • Define grammar in a specialized language
    • Generate consuming code in one of the supported languages
    Pros:
    • Expressive
    • Language-agnostic
    Cons:
    • Overhead of an extra language
    • Can’t leverage the power of C# to write grammar
14. Parser combinators
    • Define grammar using higher-order functions
    • Build complex parsers by combining simpler ones
    Pros:
    • Expressive
    • Easy to write/read/maintain
    • Everything is in C#
    Cons:
    • Performance
    • Debugging
15. Parsers vs combinators
    Parser<T>: (success, result, length) = f(input, offset = 0)
    Examples: Char('a'), String("foo"), Digit
    Combinator<T>: Parser<T> = f(parser1, parser2)
    Examples: Or(p1, p2), Many(p), DelimitedBy(p1, p2)
16. Parser combinators illustrated
    Input: 10 + 5
    Parser:
      Number: AtLeastOne(Digit) -> "10" -> '1', '0'
      THEN Sign: Many(WhiteSpace), Or('+', '-', '*', '/'), Many(WhiteSpace) -> " + " -> " ", '+', " "
      THEN Number: AtLeastOne(Digit) -> "5" -> '5'
    Result: Number(10), PlusOperator, Number(5)
17. Live-coding time
    Let’s develop a basic JSON parser using Sprache in C#
18. Links
    • JSON parser from earlier: https://github.com/Tyrrrz/DotNetFest2019
    • Sprache: https://github.com/sprache/Sprache
    • Parsing in C# by Federico Tomassetti: https://tomassetti.me/parsing-in-csharp
    • Formal grammar on Wikipedia: https://en.wikipedia.org/wiki/Formal_grammar
    Other .NET parser-combinator libraries: Superpower (C#), Pidgin (C#), FParsec (F#)
19. Thank you!
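Slide 4 contrasts the raw bytes of “123 456,93” with the components we want the computer to recognize. Below is a minimal hand-written tokenizer sketch (C# 9 or later, for records) that produces those five components. The component names follow the slide, while the `SyntacticComponent` base type and the `NumberTokenizer` class are hypothetical; it also assumes that a space is always a thousands separator and a comma always a decimal separator, and it does not check digit grouping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical component types modelled on the slide's SyntacticComponents example.
abstract record SyntacticComponent { }
sealed record NumericLiteral(int Value) : SyntacticComponent;
sealed record ThousandsSeparator : SyntacticComponent { }
sealed record DecimalSeparator : SyntacticComponent { }

// Hypothetical tokenizer, not code from the talk.
static class NumberTokenizer
{
    // Assumption: ' ' is the thousands separator and ',' the decimal separator,
    // as in "123 456,93"; any other non-digit character is rejected.
    public static List<SyntacticComponent> Tokenize(string input)
    {
        var components = new List<SyntacticComponent>();
        var i = 0;
        while (i < input.Length)
        {
            var c = input[i];
            if (IsDigit(c))
            {
                // A maximal run of digits becomes one numeric literal.
                var start = i;
                while (i < input.Length && IsDigit(input[i])) i++;
                components.Add(new NumericLiteral(int.Parse(input.Substring(start, i - start))));
            }
            else if (c == ' ')
            {
                components.Add(new ThousandsSeparator());
                i++;
            }
            else if (c == ',')
            {
                components.Add(new DecimalSeparator());
                i++;
            }
            else
            {
                throw new FormatException($"Unexpected character '{c}' at position {i}");
            }
        }
        return components;
    }

    // ASCII digits only, matching the bytes 48..57 shown on the slide.
    private static bool IsDigit(char c) => c >= '0' && c <= '9';
}
```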
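Slide 5 shows a parser either turning input into domain objects or rejecting it with a message. As a rough illustration, .NET's built-in LINQ to XML parser, `XElement.Parse`, behaves this way on the slide's three inputs: the first yields an `XElement` tree and the other two throw `XmlException`. The `XmlDemo.TryParse` wrapper is a hypothetical helper, and the error messages are the framework's own wording rather than the slide's.

```csharp
#nullable enable
using System.Xml;
using System.Xml.Linq;

// Hypothetical wrapper around the framework's XML parser.
static class XmlDemo
{
    // Either returns the domain object (an XElement tree) or rejects the input with a message.
    public static (XElement? Element, string? Error) TryParse(string input)
    {
        try
        {
            return (XElement.Parse(input), null);
        }
        catch (XmlException ex)
        {
            // e.g. a start/end tag mismatch for "<foo></bar>", or invalid root data for "hello world"
            return (null, ex.Message);
        }
    }
}
```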
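The grammar rules on slides 8 and 9 can be made concrete with two tiny recognizers. Both grammars are illustrative assumptions, not taken from the talk: N → d N | d (one or more digits) is regular, so a single loop without a stack recognizes it, while S → ( S ) S | ε (balanced parentheses) nests S inside itself, so the recursive-descent version relies on the call stack to remember unclosed parentheses. `GrammarDemo` and its methods are hypothetical names.

```csharp
using System;

// Hypothetical recognizers for two illustrative grammars.
static class GrammarDemo
{
    // Regular (right-linear) grammar:  N → d N | d,  where d is a digit terminal.
    // Each step consumes one terminal and moves to at most one non-terminal, so a
    // plain loop (a finite automaton) is enough; only the current position matters.
    public static bool IsNumber(string input)
    {
        if (input.Length == 0) return false;   // N must produce at least one d
        foreach (var c in input)
            if (c < '0' || c > '9') return false;
        return true;
    }

    // Context-free grammar:  S → ( S ) S | ε
    // S appears nested inside itself, so nesting depth is unbounded and every
    // unclosed '(' must be remembered. Recursive descent keeps them on the call stack.
    public static bool IsBalanced(string input)
    {
        var pos = 0;
        return ParseS(input, ref pos) && pos == input.Length;
    }

    private static bool ParseS(string input, ref int pos)
    {
        if (pos >= input.Length || input[pos] != '(') return true; // S → ε
        pos++;                                                      // consume '('
        if (!ParseS(input, ref pos)) return false;                  // inner S
        if (pos >= input.Length || input[pos] != ')') return false;
        pos++;                                                      // consume ')'
        return ParseS(input, ref pos);                              // trailing S
    }
}
```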
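The AST on slide 11 can be modelled directly as C# records and evaluated by walking the tree. This sketch (C# 9 or later) builds the tree for the snippet by hand and interprets it; the node types, `Interpreter` and `EuclidExample` are hypothetical. It uses `b != 0` as the loop condition, as in the classic subtraction-based Euclidean algorithm: with `a != 0`, the loop never terminates for positive inputs, because `b` reaches 0 while `a` stays positive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node types for the slide's C-like snippet.
abstract record Expr { }
sealed record Var(string Name) : Expr;
sealed record Const(int Value) : Expr;
sealed record BinOp(string Op, Expr Left, Expr Right) : Expr;

abstract record Stmt { }
sealed record Assign(string Name, Expr Value) : Stmt;
sealed record If(Expr Condition, Stmt Then, Stmt Else) : Stmt;
sealed record While(Expr Condition, Stmt Body) : Stmt;
sealed record Return(Expr Value) : Stmt;
sealed record Block(IReadOnlyList<Stmt> Statements) : Stmt;

static class Interpreter
{
    // Expressions evaluate to ints; comparisons yield 1 (true) or 0 (false), as in C.
    public static int Eval(Expr e, Dictionary<string, int> env) => e switch
    {
        Var v => env[v.Name],
        Const c => c.Value,
        BinOp { Op: "-" } b => Eval(b.Left, env) - Eval(b.Right, env),
        BinOp { Op: ">" } b => Eval(b.Left, env) > Eval(b.Right, env) ? 1 : 0,
        BinOp { Op: "!=" } b => Eval(b.Left, env) != Eval(b.Right, env) ? 1 : 0,
        _ => throw new NotSupportedException($"Unknown expression {e}")
    };

    // Executes a statement; returns the value of a reached Return node, otherwise null.
    public static int? Exec(Stmt s, Dictionary<string, int> env)
    {
        switch (s)
        {
            case Assign a:
                env[a.Name] = Eval(a.Value, env);
                return null;
            case If i:
                return Exec(Eval(i.Condition, env) != 0 ? i.Then : i.Else, env);
            case While w:
                while (Eval(w.Condition, env) != 0)
                {
                    var result = Exec(w.Body, env);
                    if (result is not null) return result;
                }
                return null;
            case Return r:
                return Eval(r.Value, env);
            case Block b:
                foreach (var statement in b.Statements)
                {
                    var result = Exec(statement, env);
                    if (result is not null) return result;
                }
                return null;
            default:
                throw new NotSupportedException($"Unknown statement {s}");
        }
    }
}

static class EuclidExample
{
    // while (b != 0) { if (a > b) { a = a - b; } else { b = b - a; } } return a;
    public static Stmt Build() => new Block(new Stmt[]
    {
        new While(
            new BinOp("!=", new Var("b"), new Const(0)),
            new If(
                new BinOp(">", new Var("a"), new Var("b")),
                new Assign("a", new BinOp("-", new Var("a"), new Var("b"))),
                new Assign("b", new BinOp("-", new Var("b"), new Var("a"))))),
        new Return(new Var("a"))
    });
}
```

Running the tree with `a = 48` and `b = 18` walks the same steps the source code would and returns their greatest common divisor, 6.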
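A loop/stack-based manual parser in the spirit of slide 12 can check the tag inputs from slide 5: it scans the characters, pushes opening tag names onto a stack and pops them on matching closing tags. This is a deliberately simplified sketch with a hypothetical `TagValidator` class (tags only, no attributes, text content or whitespace), but it reproduces the rejection messages shown on slide 5.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical loop/stack-based validator. Simplifications: tags only
// (no attributes, text content or whitespace) and a single root element.
static class TagValidator
{
    // Returns null for valid input, otherwise an error message in the slide's style.
    public static string? Validate(string input)
    {
        var open = new Stack<string>();   // context: names of tags not yet closed
        var seenRoot = false;
        var i = 0;
        while (i < input.Length)
        {
            if (input[i] != '<')
                return $"Unexpected token \"{input.Substring(i)}\"";

            var end = input.IndexOf('>', i);
            if (end < 0)
                return $"Unexpected end of input after \"{input.Substring(i)}\"";

            var token = input.Substring(i, end - i + 1); // "<foo>", "</foo>" or "<bar/>"
            i = end + 1;

            if (token.StartsWith("</", StringComparison.Ordinal))
            {
                var name = token.Substring(2, token.Length - 3);
                if (open.Count == 0)
                    return $"Unexpected token \"{token}\"";
                if (open.Peek() != name)
                    return $"Unexpected token \"{token}\", expected \"</{open.Peek()}>\"";
                open.Pop();
            }
            else
            {
                if (open.Count == 0 && seenRoot)
                    return $"Unexpected token \"{token}\""; // a second root element
                seenRoot = true;
                if (!token.EndsWith("/>", StringComparison.Ordinal))
                    open.Push(token.Substring(1, token.Length - 2)); // self-closing tags are not pushed
            }
        }

        if (open.Count > 0)
            return $"Unexpected end of input, expected \"</{open.Peek()}>\"";
        return seenRoot ? null : "Empty input";
    }
}
```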
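The definition on slide 15, `(success, result, length) = f(input, offset)`, maps naturally onto a C# delegate that returns a tuple. Below is a hand-rolled sketch of the slide's primitives (`Char`, `String`, `Digit`) and combinators (`Or`, `Many`, `DelimitedBy`). The delegate shape, the class name `P` and the exact semantics (for example, `DelimitedBy` requiring at least one item) are assumptions for illustration, not Sprache's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser shape from slide 15: (success, result, length) = f(input, offset).
delegate (bool Success, T Value, int Length) Parser<T>(string input, int offset);

// Hypothetical hand-rolled primitives and combinators.
static class P
{
    // Succeeds on one character that matches the predicate.
    public static Parser<char> Satisfy(Func<char, bool> predicate) => (input, offset) =>
        offset < input.Length && predicate(input[offset])
            ? (true, input[offset], 1)
            : (false, default(char), 0);

    public static Parser<char> Char(char expected) => Satisfy(c => c == expected);

    public static readonly Parser<char> Digit = Satisfy(c => c >= '0' && c <= '9');

    // Succeeds when the input continues with exactly `expected` (ordinal comparison).
    public static Parser<string> String(string expected) => (input, offset) =>
        offset + expected.Length <= input.Length &&
        string.CompareOrdinal(input, offset, expected, 0, expected.Length) == 0
            ? (true, expected, expected.Length)
            : (false, default(string), 0);

    // Combinator Or(p1, p2): try p1; if it fails, try p2 at the same offset.
    public static Parser<T> Or<T>(Parser<T> first, Parser<T> second) => (input, offset) =>
    {
        var r = first(input, offset);
        return r.Success ? r : second(input, offset);
    };

    // Combinator Many(p): apply p zero or more times; always succeeds.
    // A zero-length match stops the loop so it cannot spin forever.
    public static Parser<List<T>> Many<T>(Parser<T> parser) => (input, offset) =>
    {
        var items = new List<T>();
        var length = 0;
        for (var r = parser(input, offset); r.Success && r.Length > 0; r = parser(input, offset + length))
        {
            items.Add(r.Value);
            length += r.Length;
        }
        return (true, items, length);
    };

    // Combinator DelimitedBy(p1, p2): one or more p1 separated by p2.
    // A trailing delimiter is left unconsumed.
    public static Parser<List<T>> DelimitedBy<T, TDelimiter>(Parser<T> item, Parser<TDelimiter> delimiter) =>
        (input, offset) =>
        {
            var first = item(input, offset);
            if (!first.Success) return (false, new List<T>(), 0);
            var items = new List<T> { first.Value };
            var length = first.Length;
            while (true)
            {
                var d = delimiter(input, offset + length);
                if (!d.Success) break;
                var next = item(input, offset + length + d.Length);
                if (!next.Success || d.Length + next.Length == 0) break;
                items.Add(next.Value);
                length += d.Length + next.Length;
            }
            return (true, items, length);
        };
}
```

Each combinator only builds a new delegate from existing ones, which is what makes grammars composable: for instance, `P.DelimitedBy(P.Digit, P.Char(','))` applied to `"1,2,3x"` succeeds with the digits 1, 2 and 3 and a length of 5, leaving `x` for whatever parser comes next.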
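In C#, the “monadic” part of the title usually means giving parsers `Select` (map) and `SelectMany` (bind) operations, which is also what lets query syntax (`from ... in ... select ...`) chain parsers one after another. The sketch below re-declares the same hypothetical `Parser<T>` delegate so it stands alone, adds those two operations and expresses the `Number THEN Sign THEN Number` grammar from slide 16 as queries. Sprache supports this query style too, but this code is an independent illustration, not its implementation; `P`, `AtLeastOne` and `Expression` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser shape: (success, result, length) = f(input, offset).
delegate (bool Success, T Value, int Length) Parser<T>(string input, int offset);

static class P
{
    public static Parser<char> Satisfy(Func<char, bool> predicate) => (input, offset) =>
        offset < input.Length && predicate(input[offset])
            ? (true, input[offset], 1)
            : (false, default(char), 0);

    public static Parser<char> Char(char expected) => Satisfy(c => c == expected);

    // Returns the first successful alternative, trying each at the same offset.
    public static Parser<T> Or<T>(params Parser<T>[] parsers) => (input, offset) =>
    {
        foreach (var parser in parsers)
        {
            var r = parser(input, offset);
            if (r.Success) return r;
        }
        return (false, default(T), 0);
    };

    public static Parser<List<T>> Many<T>(Parser<T> parser) => (input, offset) =>
    {
        var items = new List<T>();
        var length = 0;
        for (var r = parser(input, offset); r.Success && r.Length > 0; r = parser(input, offset + length))
        {
            items.Add(r.Value);
            length += r.Length;
        }
        return (true, items, length);
    };

    public static Parser<List<T>> AtLeastOne<T>(Parser<T> parser) => (input, offset) =>
    {
        var r = Many(parser)(input, offset);
        return r.Value.Count > 0 ? r : (false, r.Value, 0);
    };

    // Select (map): transform a successful result without consuming extra input.
    public static Parser<TResult> Select<T, TResult>(this Parser<T> parser, Func<T, TResult> map) =>
        (input, offset) =>
        {
            var r = parser(input, offset);
            return r.Success ? (true, map(r.Value), r.Length) : (false, default(TResult), 0);
        };

    // SelectMany (bind): run one parser, pick the next parser from its result and continue
    // where the first one stopped. Query syntax with several `from` clauses compiles to this.
    public static Parser<TResult> SelectMany<T, TNext, TResult>(
        this Parser<T> parser, Func<T, Parser<TNext>> bind, Func<T, TNext, TResult> project) =>
        (input, offset) =>
        {
            var first = parser(input, offset);
            if (!first.Success) return (false, default(TResult), 0);
            var second = bind(first.Value)(input, offset + first.Length);
            if (!second.Success) return (false, default(TResult), 0);
            return (true, project(first.Value, second.Value), first.Length + second.Length);
        };

    // Field order matters: static initializers run top to bottom.
    public static readonly Parser<char> Digit = Satisfy(c => c >= '0' && c <= '9');
    public static readonly Parser<char> WhiteSpace = Satisfy(char.IsWhiteSpace);

    // Number: AtLeastOne(Digit)
    public static readonly Parser<int> Number =
        AtLeastOne(Digit).Select(digits => int.Parse(new string(digits.ToArray())));

    // Sign: Many(WhiteSpace), Or('+', '-', '*', '/'), Many(WhiteSpace)
    public static readonly Parser<char> Sign =
        from leading in Many(WhiteSpace)
        from op in Or(Char('+'), Char('-'), Char('*'), Char('/'))
        from trailing in Many(WhiteSpace)
        select op;

    // Expression: Number THEN Sign THEN Number
    public static readonly Parser<(int Left, char Op, int Right)> Expression =
        from left in Number
        from op in Sign
        from right in Number
        select (Left: left, Op: op, Right: right);
}
```

The query form reads almost like the grammar on the slide, while the plumbing (passing offsets along and stopping at the first failure) lives once in `SelectMany`.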
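The live-coding session builds the JSON parser with Sprache, and the finished code is in the repository linked on slide 18. The sketch below is not that code and does not use Sprache, so it has no package dependency: it re-declares a hypothetical `Parser<T>` delegate with `Select`/`SelectMany` and assembles a parser for a JSON subset (null, true, false, non-negative integers, strings without escape sequences, arrays and objects). The class `Json`, its members and the lazy `Ref` helper used for the recursive rules are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parser shape: (success, result, length) = f(input, offset).
delegate (bool Success, T Value, int Length) Parser<T>(string input, int offset);

// Hypothetical JSON-subset parser, not the talk's Sprache code. Supports null, true, false,
// non-negative integers (as long), strings without escape sequences, arrays and objects.
static class Json
{
    static Parser<char> Satisfy(Func<char, bool> predicate) => (input, offset) =>
        offset < input.Length && predicate(input[offset]) ? (true, input[offset], 1) : (false, default(char), 0);
    static Parser<char> Char(char c) => Satisfy(x => x == c);
    static Parser<string> Literal(string text) => (input, offset) =>
        offset + text.Length <= input.Length && string.CompareOrdinal(input, offset, text, 0, text.Length) == 0
            ? (true, text, text.Length) : (false, default(string), 0);
    // First successful alternative; the default tuple (false, default, 0) means failure.
    static Parser<T> Or<T>(params Parser<T>[] parsers) => (input, offset) =>
        parsers.Select(p => p(input, offset)).FirstOrDefault(r => r.Success);
    // Defers the field lookup so recursive rules can refer to parsers defined further down.
    static Parser<T> Ref<T>(Func<Parser<T>> get) => (input, offset) => get()(input, offset);

    static Parser<List<T>> Many<T>(Parser<T> parser) => (input, offset) =>
    {
        var items = new List<T>();
        var length = 0;
        for (var r = parser(input, offset); r.Success && r.Length > 0; r = parser(input, offset + length))
        {
            items.Add(r.Value);
            length += r.Length;
        }
        return (true, items, length);
    };

    // Zero or more items separated by `separator`; a trailing separator is left unconsumed.
    static Parser<List<T>> SeparatedBy<T>(Parser<T> item, Parser<char> separator)
    {
        var tail = Many(from s in separator from x in item select x);
        return (input, offset) =>
        {
            var first = item(input, offset);
            if (!first.Success) return (true, new List<T>(), 0);
            var rest = tail(input, offset + first.Length);
            rest.Value.Insert(0, first.Value);
            return (true, rest.Value, first.Length + rest.Length);
        };
    }

    public static Parser<TResult> Select<T, TResult>(this Parser<T> parser, Func<T, TResult> map) => (input, offset) =>
    {
        var r = parser(input, offset);
        return r.Success ? (true, map(r.Value), r.Length) : (false, default(TResult), 0);
    };

    public static Parser<TResult> SelectMany<T, TNext, TResult>(
        this Parser<T> parser, Func<T, Parser<TNext>> bind, Func<T, TNext, TResult> project) => (input, offset) =>
    {
        var a = parser(input, offset);
        if (!a.Success) return (false, default(TResult), 0);
        var b = bind(a.Value)(input, offset + a.Length);
        return b.Success ? (true, project(a.Value, b.Value), a.Length + b.Length) : (false, default(TResult), 0);
    };
    static Parser<T> Token<T>(Parser<T> p) => from ws in Ws from v in p select v; // skips leading whitespace

    // Static fields initialize top to bottom; Value is only reached lazily through Ref.
    static readonly Parser<List<char>> Ws = Many(Satisfy(char.IsWhiteSpace));
    static readonly Parser<char> Digit = Satisfy(c => c >= '0' && c <= '9');
    static readonly Parser<object> Null = Literal("null").Select(_ => (object)null);
    static readonly Parser<object> Bool = Or(Literal("true"), Literal("false")).Select(s => (object)(s == "true"));
    static readonly Parser<object> Number =
        from first in Digit from rest in Many(Digit) select (object)long.Parse(first + new string(rest.ToArray()));
    static readonly Parser<string> Str =
        from open in Char('"') from chars in Many(Satisfy(c => c != '"' && c != '\\')) from close in Char('"')
        select new string(chars.ToArray());
    static readonly Parser<object> JsonArray =
        from open in Token(Char('['))
        from items in SeparatedBy(Ref(() => Value), Token(Char(',')))
        from close in Token(Char(']'))
        select (object)items;
    static readonly Parser<KeyValuePair<string, object>> Member =
        from key in Token(Str) from colon in Token(Char(':')) from value in Ref(() => Value)
        select new KeyValuePair<string, object>(key, value);
    static readonly Parser<object> JsonObject =
        from open in Token(Char('{'))
        from members in SeparatedBy(Member, Token(Char(',')))
        from close in Token(Char('}'))
        select (object)members.ToDictionary(m => m.Key, m => m.Value); // duplicate keys throw
    static readonly Parser<object> Value =
        Token(Or(Null, Bool, Number, Str.Select(s => (object)s), JsonArray, JsonObject));
    // Parses a whole document; anything other than whitespace after the value is rejected.
    public static object Parse(string text)
    {
        var r = Value(text, 0);
        if (!r.Success || r.Length + Ws(text, r.Length).Length != text.Length)
            throw new FormatException("Not valid JSON (within the supported subset)");
        return r.Value;
    }
}
```

Arrays come back as `List<object>`, objects as `Dictionary<string, object>` and numbers as boxed `long`; extending the subset (negative and fractional numbers, string escapes) would mean adding alternatives to `Number` and `Str` without touching the rest of the grammar.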
