Alexey Golub - Writing parsers in C# | 3Shape Meetup

Thorough introduction to language parsing in C#, overview of different approaches, and live-coding session that showcases how to build a working JSON parser using Sprache

Published in: Technology

  1. 1. Writing parsers in C# (“Projecting arbitrary character streams into C# objects using monadic parser combinators”) Speaker: Alexey Golub @Tyrrrz
  2. 2. What is a parser? • To parse — to resolve text into logical syntactic components • i.e. IEnumerable<T> Parse(IEnumerable<char> text) • e.g. double.Parse, XDocument.Parse
  3. 3. Where are parsers used? • Data deserialization (JSON, XML, YAML) • Static code analysis (ReSharper, TSLint) • Syntax highlighting (VS Code, Highlight.js) • Compilers, transpilers, interpreters (Roslyn, Markdig, Babel, SQL) • Template engines (Razor, Liquid, Scriban) • Natural language processing (Spellchecking, Translation)
  4. 4. What do parsers do? • Disambiguate text into domain objects • Assert that the text is well-formed [Diagram: the text "123 456,93" annotated with the labels "numeric literals", "thousands separator", "decimal separator", "numeric literal"]
  5. 5. Formal language theory • Alphabet – set of allowed characters • Language – set of words made from characters in alphabet • Grammar – set of rules that define how words are generated
  6. 6. Grammar types • Regular grammar – RHS of a production rule is a terminal or a terminal plus non-terminal • Context-free grammar – RHS of a production rule is a finite sequence of terminals and/or non-terminals
  7. 7. Rules of thumb • If a language has recursive grammar rules – it’s not regular • Regular grammar can be represented with regular expressions • Context-free grammar cannot be directly represented with regular expressions (in .NET)
  8. 8. Syntax trees • Primary goal of a parser is to break down text into syntactic components • Syntactic structure of context-free languages is represented by a syntax tree • Program can then further evaluate the syntax tree as required [Diagram: a syntax tree with a root, a non-terminal node, and terminal nodes]
  9. 9. Example AST produced by C-like code
  10. 10. Approaches • Loop/stack-based manual parsers: loop through all characters in the input; maintain context on a stack • Parser generators: custom language that defines grammar; compiles into code that you can execute • Parser combinators: each parser is a delegate; parsers can be combined into higher-order parsers
  11. 11. Example from JSON.net (manual parser)
  12. 12. ANTLR (parser generator)
  13. 13. Sprache (parser combinator)
  14. 14. Parser combinators • Start by building simple parsers • Combine them into more complex parsers • Repeat until you reach the root • Hierarchy of parsers should resemble target syntax tree
  15. 15. Parser combinators (illustrated) [Diagram: the input "10 + 5" is parsed as NumberParser THEN WhiteSpaceParser THEN SignParser THEN WhiteSpaceParser THEN NumberParser, yielding Number (10), PlusOperator, Number (5); the labelled building blocks are NumberParser, WhiteSpaceParser, SignParser, and OperatorParser]
  16. 16. Coding challenge Let’s develop a basic JSON parser
  17. 17. Further reading • Formal grammar on Wikipedia – https://en.wikipedia.org/wiki/Formal_grammar • Parsing in C# by Federico Tomassetti – https://tomassetti.me/parsing-in-csharp
  18. 18. Thank you! @Tyrrrz
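
The combinator idea from slides 10, 14, and 15 (each parser is a delegate, and small parsers are chained with THEN into larger ones) can be sketched in plain C# without any library. This is a minimal illustration, not the code from the talk and not Sprache's API: the `Parser<T>` delegate, the `Combinators` class, and the `Satisfy`, `Many`, and `Demo.BuildSum` names are all hypothetical. Only `Select` and `SelectMany` have fixed names, because the C# query syntax looks them up by convention.

```csharp
using System;

// Hypothetical parser type: reads `input` starting at `pos` and returns the
// parsed value plus the next position, or null when it does not match.
public delegate (T Result, int Next)? Parser<T>(string input, int pos);

public static class Combinators
{
    // Matches one character that satisfies the predicate.
    public static Parser<char> Satisfy(Func<char, bool> predicate) => (input, pos) =>
    {
        if (pos < input.Length && predicate(input[pos]))
            return (input[pos], pos + 1);
        return null;
    };

    // Applies `item` repeatedly and returns the matched text;
    // fails if fewer than `min` characters were matched.
    public static Parser<string> Many(Parser<char> item, int min = 0) => (input, pos) =>
    {
        var current = pos;
        while (true)
        {
            var r = item(input, current);
            if (!r.HasValue) break;
            current = r.Value.Next;
        }
        if (current - pos < min) return null;
        return (input.Substring(pos, current - pos), current);
    };

    // Transforms the result of a successful parse.
    public static Parser<U> Select<T, U>(this Parser<T> parser, Func<T, U> map) => (input, pos) =>
    {
        var r = parser(input, pos);
        if (!r.HasValue) return null;
        return (map(r.Value.Result), r.Value.Next);
    };

    // Sequencing ("THEN"): runs `first`, then the parser chosen by `next`,
    // and combines both results. This is what `from ... from ...` compiles to.
    public static Parser<V> SelectMany<T, U, V>(
        this Parser<T> first, Func<T, Parser<U>> next, Func<T, U, V> project) => (input, pos) =>
    {
        var r1 = first(input, pos);
        if (!r1.HasValue) return null;
        var r2 = next(r1.Value.Result)(input, r1.Value.Next);
        if (!r2.HasValue) return null;
        return (project(r1.Value.Result, r2.Value.Result), r2.Value.Next);
    };
}

public static class Demo
{
    // Number THEN whitespace THEN '+' THEN whitespace THEN number.
    public static Parser<int> BuildSum()
    {
        Parser<int> number = Combinators
            .Many(Combinators.Satisfy(char.IsDigit), min: 1)
            .Select(digits => int.Parse(digits));
        Parser<string> spaces = Combinators.Many(Combinators.Satisfy(char.IsWhiteSpace));
        Parser<char> plus = Combinators.Satisfy(c => c == '+');

        return
            from left in number
            from ws1 in spaces
            from op in plus
            from ws2 in spaces
            from right in number
            select left + right;
    }

    public static void Main()
    {
        var result = BuildSum()("10 + 5", 0);
        Console.WriteLine(result.HasValue ? result.Value.Result.ToString() : "parse error");
        // prints 15
    }
}
```

For brevity the sketch evaluates the sum directly instead of building syntax-tree nodes such as Number and PlusOperator; returning a node type from the final `select` would follow the same pattern.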
