Parser Combinators: Writing external DSLs
Wednesday, November 9, 2011

Parsec parser combinators by Denis Yermakov.
http://twitter.com/lifecoder

Parser combinators

  1. Parser Combinators: Writing external DSLs
  2. DSLs
     • A DSL is the ultimate abstraction.
     • You create a language that precisely describes the problem you are trying to solve.
  3. Internal & External
     • A DSL can be internal: built using the host programming language.
     • Or external: you invent a little language for solving your problem.
  4. Internal DSL: Good
     • No external tool dependencies
     • Nice IDE support
     • Type checking for free
     • Easy to implement
     • Gives you the full power of the host language
     • Parsing is free
  5. Internal DSL: Bad
     • It requires an expressive host language.
     • If the language is not expressive enough, the DSL will be verbose and unpleasant to use.
     • It can produce obscure error messages.
  6. External DSL: Good
     • Platform independent
     • Nicer syntax
     • External tools can perform static analysis
     • Good error messages
  7. External DSL: Bad
     • No IDE support
     • Hard to implement
  8. External DSLs
     • External DSLs can be implemented using interpretation or code generation.
     • Parse
     • Analyze
     • Interpret | Generate Code
  9. parseJSON :: String -> Either Error JSONObject
  10. JSON Grammar-like stuff
      object := '{' fields? '}'
      fields := field (',' field)*
      field  := id ':' value
      value  := literal | object | array
  11. Bison (C)
      ANTLR (Java and many others)
      You can also encode the grammar by hand
  12. function parseObject() {
        match('{')
        if (lookahead(ID))
          parseFields()
        match('}')
      }
  13. function parseFields() {
        parseField()
        while (lookahead(',')) {
          consume()
          parseField()
        }
      }
  14. • A parser can either consume some input or not.
      • A parser can fail or succeed.
      • A parser can return some value.
  15. • Sequences are usually just sequential calls.
      • Optional parts of the grammar are just IFs.
      • Zero or more elements is a WHILE loop.
      • A grammar rule is a function.
  16. function parseFields() {
        separatedBy(',', parseField)
      }
      function separatedBy(ch, f) {
        f()
        while (lookahead(ch)) {
          consume()
          f()
        }
      }
  17. function sepBy(ch, p) {
        return seq(p, many(seq(ch, p)))
      }
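
The step from slide 16 to slide 17 (loops replaced by `seq` and `many`) can be sketched in plain Haskell without any library. This is only an illustration under stated assumptions: the `Parser` type and the helper `andThen` (standing in for the slide's `seq`, which clashes with a Prelude name) are hypothetical and are not Parsec's API. The argument order of `sepBy` follows slide 17 (separator first), which differs from Parsec's own `sepBy`.

```haskell
-- Hypothetical minimal parser type: a function from the input to
-- either Nothing (failure) or a result plus the remaining input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Consume one character if it satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \input -> case input of
  (c:rest) | ok c -> Just (c, rest)
  _               -> Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

-- Sequencing: run p, then q on what is left, and pair the results.
-- Plays the role of "seq" on slide 17.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q = Parser $ \input -> do
  (a, rest)  <- runParser p input
  (b, rest') <- runParser q rest
  Just ((a, b), rest')

-- Zero or more repetitions: the WHILE loop of slide 15 as a combinator.
-- It never fails; it loops forever if p succeeds without consuming input.
many :: Parser a -> Parser [a]
many p = Parser $ \input -> case runParser p input of
  Nothing        -> Just ([], input)
  Just (a, rest) -> do
    (as, rest') <- runParser (many p) rest
    Just (a : as, rest')

-- One or more p separated by sep, built only from andThen and many.
sepBy :: Parser s -> Parser a -> Parser [a]
sepBy sep p = Parser $ \input -> do
  ((first, others), rest) <- runParser (p `andThen` many (sep `andThen` p)) input
  Just (first : map snd others, rest)

main :: IO ()
main = do
  let lower = satisfy (`elem` ['a' .. 'z'])
  print (runParser (sepBy (char ',') lower) "a,b,c")  -- prints Just ("abc","")
  print (runParser (sepBy (char ',') lower) "1,2")    -- prints Nothing
```

Because a failed `sep andThen p` simply returns `Nothing` and leaves the original input untouched, this sketch backtracks for free; Parsec makes a different trade-off, which is why it needs `try`.
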
  18. Combinator pattern
  19. Haskell in 60 seconds
  20. main = putStrLn "Hello World!"

      -- Four equivalent definitions of the same function
      times x n = x * n
      times = \x n -> x * n
      times = \x -> \n -> x * n
      times = (*)

      -- Prefix and infix application
      times 2 3
      2 `times` 3

      -- Data types and their constructors
      data Point = Point Int Int
      data Either a b = Left a | Right b

      Point 10 20
      Left "Hello"
  21. import Text.Parsec

      parse :: Parser a
            -> SourceName
            -> String
            -> Either ParseError a
  22. p = satisfy (\x -> x == 'a')
      main = print (parse p "<test>" "abc")
      > Right 'a'

      p = satisfy (\x -> x == 'a')
      main = print (parse p "<test>" "bcd")
      > Left "<test>" (line 1, column 1):
      > unexpected "b"
  23. -- There is something like this in Parsec
      char c = satisfy (\x -> x == c)

      -- Example
      p = char 'a'
      main = print (parse p "<test>" "bcd")
      > Left "<test>" (line 1, column 1):
      > unexpected "b"
      > expecting "a"
  24. char c = satisfy (==c) <?> show [c]
  25. p = char 'a' >>= \x ->
          char 'b' >>= \y ->
          return [x, y]
      main = print (parse p "<test>" "abc")
      > Right "ab"
  26. p = do char 'a'
             char 'b'
             return "ab"
      main = print (parse p "<test>" "abc")
      > Right "ab"
  27. p = do char 'a'
             char 'b' <|> char 'q'
             char 'c' <|> char 'd'
      main = print (parse p "<test>" "abc")
  28. letter = satisfy isAlpha <?> "letter"
      digit  = satisfy isDigit <?> "digit"
      spaces = skipMany space <?> "white space"
      many p = ... censored ...
  29. p = do letter
             many (letter <|> digit)
      main = print (parse p "<test>" "hello123")
      > Right "ello123"
  30. p = do x <- letter
             xs <- many (letter <|> digit)
             return (x:xs)
      main = print (parse p "<test>" "hello123")
      > Right "hello123"
  31. ident = do x <- letter
                 xs <- many (letter <|> digit)
                 return (x:xs)
      p = ident
      main = print (parse p "<test>" "123hello")
      > Left "<test>" (line 1, column 1):
      > unexpected "1"
      > expecting letter
  32. ident = do x <- letter
                 xs <- many (letter <|> digit)
                 return (x:xs)
      p = ident <?> "variable name"
      main = print (parse p "<test>" "123hello")
      > Left "<test>" (line 1, column 1):
      > unexpected "1"
      > expecting variable name
  33. ident = do x <- letter
                 xs <- many (letter <|> digit)
                 return (x:xs)
              <?> "variable name"
      p = ident
      main = print (parse p "<test>" "123hello")
  34. ident = do x <- letter
                 xs <- many (letter <|> digit)
                 return (x:xs)
              <?> "variable name"
      letKeyword = string "let"
      p = ident <|> letKeyword
      main = print (parse p "<test>" "letter")
      > Right "letter"
  35. ident = do x <- letter
                 xs <- many (letter <|> digit)
                 return (x:xs)
              <?> "variable name"
      letKeyword = do s <- string "let"
                      notFollowedBy (letter <|> digit)
                      spaces
                      return s
      p = try(letKeyword) <|> ident
      main = print (parse p "<test>" "letter")
      > Right "letter"
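
Slide 35 wraps `letKeyword` in `try`. The reason is that in Parsec `<|>` only tries its right-hand alternative when the left one fails without consuming input. A small sketch of that behaviour follows; the parser names `noBacktrack` and `withBacktrack` and the keyword "lex" are made up for the illustration, while `string`, `try` and `<|>` are the Parsec functions used on the slides.

```haskell
import Text.Parsec (parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- string "let" matches 'l' and 'e' of "lex" before failing on 'x'.
-- Input has been consumed, so <|> does not attempt string "lex".
noBacktrack :: Parser String
noBacktrack = string "let" <|> string "lex"

-- try rewinds the input when its argument fails,
-- so the second alternative starts again from 'l'.
withBacktrack :: Parser String
withBacktrack = try (string "let") <|> string "lex"

main :: IO ()
main = do
  print (parse noBacktrack   "<test>" "lex")  -- a Left with a parse error
  print (parse withBacktrack "<test>" "lex")  -- prints Right "lex"
```

On slide 35 the same mechanism lets `letKeyword` read "let", notice that a letter follows, and hand the untouched input "letter" over to `ident`.
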
  36. Lexer-like example
  37. type Name = String
      data Expr = Number Integer
                | Var Name
                | Let Name Expr
                | Seq [Expr]
                deriving Show
  38. myDef = emptyDef {
        identStart = letter <|> char '_',
        identLetter = alphaNum <|> char '_',
        opStart = opLetter myDef,
        opLetter = oneOf "=,;",
        reservedOpNames = [ ",", ";", "=" ],
        reservedNames = ["let"],
        caseSensitive = True
        }
  39. -- PT is Text.Parsec.Token, imported qualified
      tokenParser = PT.makeTokenParser myDef
      parens = PT.parens tokenParser
      naturalNumber = PT.natural tokenParser
      keyword = PT.reserved tokenParser
      identifier = PT.identifier tokenParser
      op = PT.reservedOp tokenParser
      commaSep = PT.commaSep tokenParser
      commaSep1 = PT.commaSep1 tokenParser
  40. simple = numberLiteral <|> var
      numberLiteral = do n <- naturalNumber
                         return $ Number n
      var = do s <- identifier
               return $ Var s
  41. letStmt = do keyword "let"
                   defs <- commaSep1 def
                   op ";"
                   return $ Seq defs
      def = do name <- identifier
               op "="
               value <- simple
               return $ Let name value
  42. main = print (parse letStmt "<test>" "let one = 1, two = 2;")
      > Right (
      >   Seq [
      >     Let "one" (Number 1),
      >     Let "two" (Number 2)
      >   ]
      > )
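
Slides 37 to 42 show the lexer-like example in pieces and leave out the imports. The following is one possible way to assemble them into a single file. The import list, the type signatures and the use of `<$>` are additions made for this sketch and are not taken from the slides; the unused `parens` and `commaSep` helpers are dropped.

```haskell
import Text.Parsec (parse, letter, alphaNum, char, oneOf, (<|>))
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as PT

type Name = String

data Expr = Number Integer
          | Var Name
          | Let Name Expr
          | Seq [Expr]
          deriving Show

-- Language definition: what identifiers, operators and keywords look like.
myDef :: PT.LanguageDef ()
myDef = emptyDef
  { PT.identStart      = letter <|> char '_'
  , PT.identLetter     = alphaNum <|> char '_'
  , PT.opStart         = PT.opLetter myDef
  , PT.opLetter        = oneOf "=,;"
  , PT.reservedOpNames = [",", ";", "="]
  , PT.reservedNames   = ["let"]
  , PT.caseSensitive   = True
  }

-- Token-level parsers derived from the definition; each skips trailing whitespace.
tokenParser :: PT.TokenParser ()
tokenParser = PT.makeTokenParser myDef

naturalNumber :: Parser Integer
naturalNumber = PT.natural tokenParser

keyword, op :: String -> Parser ()
keyword = PT.reserved tokenParser
op      = PT.reservedOp tokenParser

identifier :: Parser String
identifier = PT.identifier tokenParser

commaSep1 :: Parser a -> Parser [a]
commaSep1 = PT.commaSep1 tokenParser

-- Grammar: let <name> = <number or name>, ... ;
simple :: Parser Expr
simple = (Number <$> naturalNumber) <|> (Var <$> identifier)

def :: Parser Expr
def = do
  name  <- identifier
  op "="
  value <- simple
  return (Let name value)

letStmt :: Parser Expr
letStmt = do
  keyword "let"
  defs <- commaSep1 def
  op ";"
  return (Seq defs)

main :: IO ()
main = print (parse letStmt "<test>" "let one = 1, two = 2;")
-- prints Right (Seq [Let "one" (Number 1),Let "two" (Number 2)])
```

The token parser takes care of whitespace and of rejecting `let` as an identifier, so the grammar rules themselves stay close to the informal grammar.
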
  43. CSV Parser Example
  44. import Text.Parsec

      csvFile = line `endBy1` eol
      line    = cell `sepBy` (char ',')
      cell    = wsopt >> many (noneOf ",\n")
      wsopt   = many (oneOf " \t")
      eol     = char '\n'

      main = print (parse csvFile "<test>" input)
        where input = "aa, b,c\nd,e,f\n"
      > Right [["aa","b","c"],["d","e","f"]]
  45. import scala.util.parsing.combinator._

      object CSVParser extends RegexParsers {
        override def skipWhitespace = false
        def whitespace = """[ \t]*""".r
        def csv = rep1sep(row, "\n") <~ "\n"
        def row = rep1sep(field, ",")
        def field = opt(whitespace) ~> """[^\n,]""".r
        def parse(s: String) = parseAll(csv, s)
      }
      println(CSVParser.parse("a,b,c\nd,e, f\n"))
      > [3.1] parsed: List(List(a, b, c), List(d, e, f))
  46. Pysec for Python
      Spirit for C++
      In the standard library of Scala
      A lot of others...
  47. Stuff to read
      • Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, by Terence Parr
  48. Stuff to read
      • Domain-Specific Languages, by Martin Fowler
  49. Stuff to read
      • http://learnyouahaskell.com/ Learn You a Haskell for Great Good!
  50. The End
