Alexey Golub - Writing parsers in C# | 3Shape Meetup

Thorough introduction to language parsing in C#, overview of different approaches, and live-coding session that showcases how to build a working JSON parser using Sprache

  1. Writing parsers in C# (“Projecting arbitrary character streams into C# objects using monadic parser combinators”). Speaker: Alexey Golub (@Tyrrrz)
  2. What is a parser?
     • To parse: to resolve text into logical syntactic components
     • i.e. IEnumerable<T> Parse(IEnumerable<char> text)
     • e.g. double.Parse, XDocument.Parse
  3. Where are parsers used?
     • Data deserialization (JSON, XML, YAML)
     • Static code analysis (ReSharper, TSLint)
     • Syntax highlighting (VS Code, Highlight.js)
     • Compilers, transpilers, interpreters (Roslyn, Markdig, Babel, SQL)
     • Template engines (Razor, Liquid, Scriban)
     • Natural language processing (spellchecking, translation)
  4. What do parsers do?
     • Disambiguate text into domain objects
     • Assert that the text is well-formed
     [Figure: the text “123 456,93” annotated with the labels “numeric literals”, “thousands separator”, “decimal separator”, and “numeric literal”]
  5. Formal language theory
     • Alphabet – set of allowed characters
     • Language – set of words made from characters in the alphabet
     • Grammar – set of rules that define how words are generated
  6. Grammar types
     • Regular grammar – the right-hand side (RHS) of a production rule is a single terminal, or a terminal plus a non-terminal
     • Context-free grammar – the RHS of a production rule is a finite sequence of terminals and/or non-terminals
  7. Rules of thumb
     • If a language has recursive grammar rules, it’s not regular
     • A regular grammar can be represented with regular expressions
     • A context-free grammar cannot be directly represented with regular expressions (in .NET)
  8. Syntax trees
     • The primary goal of a parser is to break down text into syntactic components
     • The syntactic structure of context-free languages is represented by a syntax tree
     • The program can then further evaluate the syntax tree as required
     [Figure: a syntax tree with a root, a non-terminal node, and terminal nodes]
  9. Example AST (abstract syntax tree) produced by C-like code
  10. Approaches
     • Loop/stack-based manual parsers
        • Loop through all characters in the input
        • Maintain context on a stack
     • Parser generators
        • Custom language that defines the grammar
        • Compiles into code that you can execute
     • Parser combinators
        • Each parser is a delegate
        • Parsers can be combined into higher-order parsers
  11. Example from Json.NET (manual parser)
  12. ANTLR (parser generator)
  13. Sprache (parser combinator)
  14. Parser combinators
     • Start by building simple parsers
     • Combine them into more complex parsers
     • Repeat until you reach the root
     • The hierarchy of parsers should resemble the target syntax tree
  15. Parser combinators (illustrated)
     [Figure: the input “10 + 5” is handled by an OperatorParser built as NumberParser THEN WhiteSpaceParser THEN SignParser THEN WhiteSpaceParser THEN NumberParser; the labelled results are Number (10), PlusOperator, and Number (5)]
  16. Coding challenge: let’s develop a basic JSON parser
  17. Further reading
     • Formal grammar on Wikipedia – https://en.wikipedia.org/wiki/Formal_grammar
     • Parsing in C# by Federico Tomassetti – https://tomassetti.me/parsing-in-csharp
  18. Thank you! @Tyrrrz
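
The rule of thumb on slide 7, that a regular grammar can be represented with a regular expression, can be illustrated with a small sketch. The class name `NumericLiteral` and the exact literal syntax (digits, optionally followed by a dot and more digits) are assumptions made for this example and do not come from the talk.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the talk: recognizes a simple numeric literal.
public static class NumericLiteral
{
    // Regular grammar: one or more digits, optionally followed by '.' and one or more digits.
    // \A and \z anchor the match to the whole input.
    private static readonly Regex Pattern = new Regex(@"\A[0-9]+(\.[0-9]+)?\z");

    public static bool IsMatch(string text) => Pattern.IsMatch(text);
}
```

No rule in this grammar refers back to itself, which is why a regular expression is enough here. Nested structures such as JSON arrays inside arrays need a recursive rule and fall outside this approach.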
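
The loop/stack-based approach from slide 10 (loop through all characters, maintain context on a stack) might look roughly like the sketch below. It only checks that JSON-style brackets are balanced. `BracketChecker` is a hypothetical name, and the sketch deliberately ignores brackets that appear inside string literals.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the talk: a tiny loop/stack-based checker.
public static class BracketChecker
{
    // Returns true when every '{' and '[' is closed by its matching bracket in the right order.
    // Simplification: brackets inside string literals are not treated specially.
    public static bool IsBalanced(string text)
    {
        var open = new Stack<char>(); // context: the brackets that are still open

        foreach (var c in text)
        {
            switch (c)
            {
                case '{':
                case '[':
                    open.Push(c);
                    break;
                case '}':
                    if (open.Count == 0 || open.Pop() != '{') return false;
                    break;
                case ']':
                    if (open.Count == 0 || open.Pop() != '[') return false;
                    break;
            }
        }

        // Anything left on the stack was opened but never closed.
        return open.Count == 0;
    }
}
```

The stack is what lets the loop keep track of arbitrarily deep nesting, which is the recursive part of the grammar that a regular expression cannot express directly.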
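
The idea from slides 10, 14 and 15 (each parser is a delegate, and simple parsers are chained with THEN into a higher-order parser) can be sketched without any library. This is a minimal illustration, not Sprache's API: the names `Result<T>`, `Parser<T>`, `MiniParsers`, `Then` and `Return` are all assumptions made for this example.

```csharp
using System;

// Hypothetical result type: either a value plus the remaining input, or a failure.
public sealed class Result<T>
{
    public bool Success { get; }
    public T Value { get; }
    public string Rest { get; }

    private Result(bool success, T value, string rest)
    {
        Success = success;
        Value = value;
        Rest = rest;
    }

    public static Result<T> Ok(T value, string rest) => new Result<T>(true, value, rest);
    public static Result<T> Fail(string rest) => new Result<T>(false, default(T), rest);
}

// "Each parser is a delegate": a function from input text to a result.
public delegate Result<T> Parser<T>(string input);

public static class MiniParsers
{
    // Consumes one or more ASCII digits and yields them as an int
    // (int.Parse throws on values that do not fit into an int).
    public static readonly Parser<int> Number = input =>
    {
        var length = 0;
        while (length < input.Length && input[length] >= '0' && input[length] <= '9') length++;
        return length == 0
            ? Result<int>.Fail(input)
            : Result<int>.Ok(int.Parse(input.Substring(0, length)), input.Substring(length));
    };

    // Consumes zero or more whitespace characters; never fails.
    public static readonly Parser<string> WhiteSpace = input =>
    {
        var length = 0;
        while (length < input.Length && char.IsWhiteSpace(input[length])) length++;
        return Result<string>.Ok(input.Substring(0, length), input.Substring(length));
    };

    // Consumes exactly the given character.
    public static Parser<char> Sign(char c) => input =>
        input.Length > 0 && input[0] == c
            ? Result<char>.Ok(c, input.Substring(1))
            : Result<char>.Fail(input);

    // Succeeds with a fixed value without consuming any input.
    public static Parser<T> Return<T>(T value) => input => Result<T>.Ok(value, input);

    // THEN: runs `first`, then feeds the remaining input to the parser chosen by `next`.
    public static Parser<U> Then<T, U>(this Parser<T> first, Func<T, Parser<U>> next) => input =>
    {
        var result = first(input);
        return result.Success ? next(result.Value)(result.Rest) : Result<U>.Fail(input);
    };

    // Higher-order parser for "<number> + <number>" that yields the sum.
    public static readonly Parser<int> Addition =
        Number.Then(left =>
        WhiteSpace.Then(_ =>
        Sign('+').Then(__ =>
        WhiteSpace.Then(___ =>
        Number.Then(right =>
        Return(left + right))))));
}
```

With these definitions, `MiniParsers.Addition("10 + 5")` succeeds with `Value == 15` and an empty `Rest`, while `MiniParsers.Addition("10 - 5")` fails at the sign. Libraries such as Sprache provide the same kind of composition, typically through LINQ query syntax instead of nested lambdas.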
