Language processing patterns

  1. Language processing patterns. Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau, Software Languages Team
  2. (C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
     An EBNF for the 101companies System (context-free syntax)
       company : 'company' STRING '{' department* '}' EOF;
       department : 'department' STRING '{' ('manager' employee) ('employee' employee)* department* '}';
       employee : STRING '{' 'address' STRING 'salary' FLOAT '}';
       STRING : '"' (~'"')* '"';
       FLOAT : ('0'..'9')+ ('.' ('0'..'9')+)?;
     Notation highlighted on the slide: nonterminal, terminal, grouping, repetition.
  3. Another EBNF for the 101companies System (lexical = token-level syntax)
       COMPANY : 'company';
       DEPARTMENT : 'department';
       EMPLOYEE : 'employee';
       MANAGER : 'manager';
       ADDRESS : 'address';
       SALARY : 'salary';
       OPEN : '{';
       CLOSE : '}';
       STRING : '"' (~'"')* '"';
       FLOAT : ('0'..'9')+ ('.' ('0'..'9')+)?;
       WS : (' '|'\r'? '\n'|'\t')+;
  4. What's a language processor? A program that performs language processing, for example as an acceptor, a parser, an analysis, a transformation, ...
  5. Language processing patterns (manual patterns and generative patterns)
       1. The Chopper Pattern
       2. The Lexer Pattern
       3. The Copy/Replace Pattern
       4. The Acceptor Pattern
       5. The Parser Pattern
       6. The Lexer Generation Pattern
       7. The Acceptor Generation Pattern
       8. The Parser Generation Pattern
       9. The Text-to-object Pattern
       10. The Object-to-text Pattern
       11. The Text-to-tree Pattern
       12. The Tree-walk Pattern
       13. The Parser Generation2 Pattern
  6. The Chopper Pattern: approximates the Lexer Pattern
  7. The Chopper Pattern
     Intent: Analyze text at the lexical level.
     Operational steps (run time):
       1. Chop input into “pieces”.
       2. Classify each piece.
       3. Process classified pieces in a stream.
  8. Chopping input into pieces with java.util.Scanner
       scanner = new Scanner(new File(...));
     The default delimiter is whitespace. The scanner is an iterator over the pieces (java.util.Scanner implements Iterator<String>).
  9. Tokens = classifiers of pieces of input
       public enum Token {
           COMPANY,
           DEPARTMENT,
           MANAGER,
           EMPLOYEE,
           NAME,
           ADDRESS,
           SALARY,
           OPEN,
           CLOSE,
           STRING,
           FLOAT,
       }
  10. Classify chopped pieces into keywords, floats, etc.
        public static Token classify(String s) {
            if (keywords.containsKey(s))
                return keywords.get(s);
            else if (s.matches("\"[^\"]*\""))
                return STRING;
            else if (s.matches("\\d+(\\.\\d*)?"))
                return FLOAT;
            else
                throw new RecognitionException(...);
        }
  11. Process token stream to compute salary total
        public static double total(String s) throws ... {
            double total = 0;
            Recognizer recognizer = new Recognizer(s);
            Token current = null;
            Token previous = null;
            while (recognizer.hasNext()) {
                current = recognizer.next();
                if (current==FLOAT && previous==SALARY)
                    total += Double.parseDouble(recognizer.getLexeme());
                previous = current;
            }
            return total;
        }
      The test that previous equals SALARY is not strictly needed here. When could it be needed?
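The chop, classify, and process steps can be put together in one small program. The following is a minimal sketch rather than the code of the javaScanner contribution: the class name `ChopperSketch`, the keyword table, and the use of `IllegalArgumentException` in place of `RecognitionException` are assumptions made for illustration. It reads from a string instead of a file and only works when string literals contain no whitespace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical sketch of the Chopper Pattern: chop, classify, process.
class ChopperSketch {
    enum Token { COMPANY, DEPARTMENT, MANAGER, EMPLOYEE, ADDRESS, SALARY,
                 OPEN, CLOSE, STRING, FLOAT }

    // Assumed keyword table; braces are treated like keywords here.
    private static final Map<String, Token> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("company", Token.COMPANY);
        KEYWORDS.put("department", Token.DEPARTMENT);
        KEYWORDS.put("manager", Token.MANAGER);
        KEYWORDS.put("employee", Token.EMPLOYEE);
        KEYWORDS.put("address", Token.ADDRESS);
        KEYWORDS.put("salary", Token.SALARY);
        KEYWORDS.put("{", Token.OPEN);
        KEYWORDS.put("}", Token.CLOSE);
    }

    // Step 2: classify one piece by table lookup or regular expression.
    static Token classify(String piece) {
        Token keyword = KEYWORDS.get(piece);
        if (keyword != null) return keyword;
        if (piece.matches("\"[^\"]*\"")) return Token.STRING;
        if (piece.matches("\\d+(\\.\\d*)?")) return Token.FLOAT;
        throw new IllegalArgumentException("Unrecognized piece: " + piece);
    }

    // Steps 1 and 3: chop at whitespace and total the floats that follow 'salary'.
    static double total(String input) {
        double total = 0;
        Token previous = null;
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                String piece = scanner.next();
                Token current = classify(piece);
                if (current == Token.FLOAT && previous == Token.SALARY)
                    total += Double.parseDouble(piece);
                previous = current;
            }
        }
        return total;
    }
}
```

Calling `ChopperSketch.total(...)` on a company text whose names and addresses are single words returns the sum of all salaries.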
  12. Demo: http://101companies.org/wiki/Contribution:javaScanner
  13. Summary of implementation aspects
        Declare an enum type for tokens.
        Set up an instance of java.util.Scanner.
        Iterate over pieces (strings) returned by the scanner.
        Classify pieces as tokens, using regular expression matching.
        Implement operations by iteration over pieces. For example:
          Total: aggregates floats
          Cut: copies tokens, modifies floats
  14. A problem with the Chopper Pattern
        Input: company "FooBar Inc." { ...
        Pieces: 'company', '"FooBar', 'Inc."', '{', ...
      There is no general rule for chopping the input into pieces: splitting at whitespace breaks a string literal that contains a space.
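The problem is easy to reproduce with `java.util.Scanner` and its default whitespace delimiter. The helper below is a small illustrative sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper that shows how whitespace chopping splits a string literal.
class ChopProblemSketch {
    static List<String> chop(String input) {
        List<String> pieces = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            // The default delimiter is whitespace, also inside quoted strings.
            while (scanner.hasNext()) pieces.add(scanner.next());
        }
        return pieces;
    }
}
```

For the input `company "FooBar Inc." {` this yields four pieces, two of which are halves of one string literal, and neither half matches the STRING regular expression.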
  15. The Lexer Pattern: fixes the Chopper Pattern
  16. The Lexer Pattern
      Intent: Analyze text at the lexical level.
      Operational steps (run time):
        1. Recognize token/lexeme pairs in input.
        2. Process token/lexeme pairs in a stream.
  17. Terminology
        Token: the classification of a lexical unit.
        Lexeme: the string that makes up the unit.
  18. Lookahead-based decisions: inspect the lookahead and decide which token to recognize.
        if (Character.isDigit(lookahead)) {
            // Recognize float
            ...
            token = FLOAT;
            return;
        }
        if (lookahead=='"') {
            // Recognize string
            ...
            token = STRING;
            return;
        }
        ...
  19. Recognize floats
        if (Character.isDigit(lookahead)) {
            do {
                read();
            } while (Character.isDigit(lookahead));
            if (lookahead=='.') {
                read();
                while (Character.isDigit(lookahead))
                    read();
            }
            token = FLOAT;
            return;
        }
      The code essentially implements this regexp: \d+(\.\d*)?
  20. Demo: http://101companies.org/wiki/Contribution:javaLexer
  21. Summary of implementation aspects
        Declare an enum type for tokens.
        Read characters one by one.
        Use lookahead for decision making.
        Consume all characters of the lexeme.
        Build token/lexeme pairs.
        Implement operations by iteration over pairs.
      Other approaches use automata theory (DFAs).
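These aspects can be sketched as a complete hand-written lexer. This is an illustration under assumptions, not the javaLexer contribution itself: the class name `LexerSketch`, the string-based input, the explicit `WS` token for whitespace, and the exception type are all choices made for this example.

```java
// Hypothetical sketch of the Lexer Pattern with one character of lookahead.
class LexerSketch {
    enum Token { COMPANY, DEPARTMENT, MANAGER, EMPLOYEE, ADDRESS, SALARY,
                 OPEN, CLOSE, STRING, FLOAT, WS }

    private final String input;
    private int pos = 0;     // index of the lookahead character
    private String lexeme;   // lexeme of the most recently recognized token

    LexerSketch(String input) { this.input = input; }

    boolean hasNext() { return pos < input.length(); }
    String getLexeme() { return lexeme; }

    // The lookahead character, or '\0' once the input is exhausted.
    private char lookahead() { return hasNext() ? input.charAt(pos) : '\0'; }
    private void read() { pos++; }

    // Recognize the next token/lexeme pair based on the lookahead.
    Token next() {
        int start = pos;
        Token token;
        char c = lookahead();
        if (Character.isWhitespace(c)) {
            while (Character.isWhitespace(lookahead())) read();
            token = Token.WS;
        } else if (Character.isDigit(c)) {
            do { read(); } while (Character.isDigit(lookahead()));
            if (lookahead() == '.') {
                read();
                while (Character.isDigit(lookahead())) read();
            }
            token = Token.FLOAT;
        } else if (c == '"') {
            read();
            while (hasNext() && lookahead() != '"') read();
            if (!hasNext()) throw new IllegalStateException("Unterminated string");
            read(); // closing quote
            token = Token.STRING;
        } else if (c == '{') {
            read();
            token = Token.OPEN;
        } else if (c == '}') {
            read();
            token = Token.CLOSE;
        } else if (Character.isLetter(c)) {
            while (Character.isLetter(lookahead())) read();
            token = keyword(input.substring(start, pos));
        } else {
            throw new IllegalStateException("Unexpected character at offset " + pos);
        }
        lexeme = input.substring(start, pos);
        return token;
    }

    private static Token keyword(String word) {
        switch (word) {
            case "company": return Token.COMPANY;
            case "department": return Token.DEPARTMENT;
            case "manager": return Token.MANAGER;
            case "employee": return Token.EMPLOYEE;
            case "address": return Token.ADDRESS;
            case "salary": return Token.SALARY;
            default: throw new IllegalStateException("Unknown keyword: " + word);
        }
    }
}
```

Unlike the chopper, this sketch recognizes `"FooBar Inc."` as one STRING lexeme, because the decision where a lexeme ends is made per token kind rather than by a global delimiter.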
  22. A problem with the Lexer Pattern (for the concrete approach discussed)
        if (Character.isDigit(lookahead)) {
            do {
                read();
            } while (Character.isDigit(lookahead));
            if (lookahead=='.') {
                read();
                while (Character.isDigit(lookahead))
                    read();
            }
            token = FLOAT;
            return;
        }
      How do we get (back) the conciseness of regular expressions?
  23. The Copy/Replace Pattern: builds on top of the Lexer Pattern
  24. The Copy/Replace Pattern
      Intent: Transform text at the lexical level.
      Operational steps (run time):
        1. Recognize token/lexeme pairs in input.
        2. Process token/lexeme pairs in a stream.
           1. Copy some lexemes.
           2. Replace others.
  25. Precise copy for comparison
        public Copy(String in, String out) throws ... {
            Recognizer recognizer = new Recognizer(in);
            Writer writer = new OutputStreamWriter(
                                new FileOutputStream(out));
            String lexeme = null;
            Token current = null;
            while (recognizer.hasNext()) {
                current = recognizer.next();
                lexeme = recognizer.getLexeme();
                writer.write(lexeme);
            }
            writer.close();
        }
  26. Copy/replace for cutting salaries in half
        ...
        lexeme = recognizer.getLexeme();
        // Cut salary in half
        if (current == FLOAT && previous == SALARY)
            lexeme = Double.toString(
                        (Double.parseDouble(
                            recognizer.getLexeme()) / 2.0d));
        writer.write(lexeme);
        ...
  27. Demo: http://101companies.org/wiki/Contribution:javaLexer
  28. Summary of implementation aspects
        Build on top of the Lexer Pattern.
        The processor writes to an output stream.
        The processor may maintain history such as “previous”.
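A compact, self-contained sketch of copy/replace follows. To keep it short it swaps the hand-written `Recognizer` for a `java.util.regex` pattern that recognizes one lexeme at a time, and it appends to a `StringBuilder` instead of an output stream; both substitutions, the class name `CopyReplaceSketch`, and the rule that whitespace does not update "previous" are assumptions of this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the Copy/Replace Pattern: copy lexemes, halve salaries.
class CopyReplaceSketch {
    // One lexeme: whitespace, string, float, brace, or keyword (regex stand-in for a lexer).
    private static final Pattern LEXEME =
        Pattern.compile("\\s+|\"[^\"]*\"|\\d+(\\.\\d*)?|[{}]|[a-z]+");

    static String cut(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = LEXEME.matcher(input);
        String previous = null; // last non-whitespace lexeme seen so far
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt())
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            String original = m.group();
            pos = m.end();
            String lexeme = original;
            boolean isFloat = Character.isDigit(original.charAt(0));
            // Replace: a float that directly follows the 'salary' keyword is halved.
            if (isFloat && "salary".equals(previous))
                lexeme = Double.toString(Double.parseDouble(original) / 2.0d);
            out.append(lexeme); // Copy: everything else is written unchanged
            if (!original.trim().isEmpty()) previous = original;
        }
        return out.toString();
    }
}
```

Because whitespace lexemes are copied too, the layout of the input survives the transformation; only the salary floats change.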
  29. The Acceptor Pattern: builds on top of the Lexer Pattern
  30. EBNF for the 101companies System
        company : 'company' STRING '{' department* '}' EOF;
        department : 'department' STRING '{' ('manager' employee) ('employee' employee)* department* '}';
        employee : STRING '{' 'address' STRING 'salary' FLOAT '}';
      Wanted: an acceptor
  31. The Acceptor Pattern
      Intent: Verify syntactical correctness of input.
      Operational steps (run time):
        Recognize lexemes/tokens based on the Lexer Pattern.
        Match terminals.
        Invoke procedures for nonterminals.
        Commit to alternatives based on lookahead.
        Verify elements of sequences one after another.
        Communicate acceptance failure as an exception.
      We assume a recursive descent parser as acceptor.
  32. Recursive descent parsing
      Grammar production:
        department : 'department' STRING '{' ('manager' employee) ('employee' employee)* dept* '}';
      Corresponding procedure:
        void department() {
            match(DEPARTMENT);
            match(STRING);
            match(OPEN);
            match(MANAGER);
            employee();
            while (test(EMPLOYEE)) {
                match(EMPLOYEE);
                employee();
            }
            while (test(DEPARTMENT))
                department();
            match(CLOSE);
        }
  33. Recursive descent parsing: use of alternatives
      A revised production:
        department : 'department' STRING '{' ('manager' employee) ( ('employee' employee) | dept )* '}';
      Corresponding procedure:
        void department() {
            match(DEPARTMENT);
            match(STRING);
            match(OPEN);
            match(MANAGER);
            employee();
            while (test(EMPLOYEE) || test(DEPARTMENT)) {
                if (test(EMPLOYEE)) {
                    match(EMPLOYEE);
                    employee();
                } else
                    department();
            }
            match(CLOSE);
        }
  34. Demo: see class Acceptor.java at http://101companies.org/wiki/Contribution:javaParser
  35. Rules for recursive-descent parsers
        Each nonterminal becomes a (void) procedure.
        RHS symbols become statements.
        Terminals become “match” statements.
        Nonterminals become procedure calls.
        Symbol sequences become statement sequences.
        Star/plus repetitions become while loops with lookahead.
        Alternatives are selected based on lookahead.
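Applying these rules to the first version of the grammar gives one procedure per nonterminal. The sketch below is illustrative and not the Acceptor.java of the demo: it tokenizes with a regular expression as a stand-in for the Lexer Pattern, represents token kinds as plain strings, and uses the hypothetical names `AcceptorSketch`, `kind`, and `accepts`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recursive descent acceptor for the 101companies grammar.
class AcceptorSketch {
    private final List<String> lexemes = new ArrayList<>();
    private int pos = 0;

    AcceptorSketch(String input) {
        // Regex stand-in for a lexer; whitespace is skipped, stray characters
        // become one-character lexemes that later fail to match.
        Matcher m = Pattern.compile("\"[^\"]*\"|\\d+(\\.\\d*)?|[{}]|[a-z]+|\\S").matcher(input);
        while (m.find()) lexemes.add(m.group());
    }

    // Token kind of the lookahead: STRING, FLOAT, EOF, or the lexeme itself.
    private String kind() {
        if (pos >= lexemes.size()) return "EOF";
        String s = lexemes.get(pos);
        if (s.length() >= 2 && s.startsWith("\"")) return "STRING";
        if (Character.isDigit(s.charAt(0))) return "FLOAT";
        return s;
    }

    private boolean test(String expected) { return kind().equals(expected); }

    private void match(String expected) {
        if (!test(expected))
            throw new IllegalStateException("Expected " + expected + " but found " + kind());
        pos++;
    }

    void company() {
        match("company"); match("STRING"); match("{");
        while (test("department")) department();
        match("}"); match("EOF");
    }

    private void department() {
        match("department"); match("STRING"); match("{");
        match("manager"); employee();
        while (test("employee")) { match("employee"); employee(); }
        while (test("department")) department();
        match("}");
    }

    private void employee() {
        match("STRING"); match("{");
        match("address"); match("STRING");
        match("salary"); match("FLOAT");
        match("}");
    }

    // Acceptance failure is communicated as an exception and mapped to false here.
    static boolean accepts(String input) {
        try {
            new AcceptorSketch(input).company();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

`AcceptorSketch.accepts(text)` returns true only when the whole input derives from `company`; a department without a manager, for example, is rejected at `match("manager")`.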
  36. The Parser Pattern: builds on top of the Acceptor Pattern
  37. The Parser Pattern
      Intent: Make the syntactical structure accessible.
      Operational steps (run time):
        Accept input based on the Acceptor Pattern.
        Invoke semantic actions during acceptance.
  38. For comparison: no semantic actions
        void department() {
            match(DEPARTMENT);
            match(STRING);
            match(OPEN);
            match(MANAGER);
            employee();
            while (test(EMPLOYEE)) {
                match(EMPLOYEE);
                employee();
            }
            while (test(DEPARTMENT))
                department();
            match(CLOSE);
        }
  39. Handle events for open and close department
        void department() {
            match(DEPARTMENT);
            String name = match(STRING);
            match(OPEN);
            openDept(name);
            match(MANAGER);
            employee(true);
            while (test(EMPLOYEE)) {
                match(EMPLOYEE);
                employee(false);
            }
            while (test(DEPARTMENT))
                department();
            match(CLOSE);
            closeDept(name);
        }
  40. All handlers for companies
        protected void openCompany(String name) { }
        protected void closeCompany(String name) { }
        protected void openDept(String name) { }
        protected void closeDept(String name) { }
        protected void handleEmployee(
            boolean isManager,
            String name,
            String address,
            Double salary) { }
  41. A parser that totals
        public class Total extends Parser {
            private double total = 0;
            public double getTotal() {
                return total;
            }
            protected void handleEmployee(
                boolean isManager,
                String name,
                String address,
                Double salary) {
                total += salary;
            }
        }
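The interplay of acceptor, handlers, and subclass can be shown in one self-contained sketch. It is not the Parser.java of the demo: the names `ParserSketch` and `TotalSketch`, the regex-based tokenization, and the string-valued token kinds are assumptions, and only the department and employee handlers are included. Names and addresses are passed to handlers with their surrounding quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recursive descent parser that invokes handlers (semantic actions).
class ParserSketch {
    private final List<String> lexemes = new ArrayList<>();
    private int pos = 0;

    ParserSketch(String input) {
        // Regex stand-in for a lexer; whitespace is skipped.
        Matcher m = Pattern.compile("\"[^\"]*\"|\\d+(\\.\\d*)?|[{}]|[a-z]+|\\S").matcher(input);
        while (m.find()) lexemes.add(m.group());
    }

    // Handlers do nothing by default; subclasses override what they need.
    protected void openDept(String name) { }
    protected void closeDept(String name) { }
    protected void handleEmployee(boolean isManager, String name, String address, double salary) { }

    private String kind() {
        if (pos >= lexemes.size()) return "EOF";
        String s = lexemes.get(pos);
        if (s.length() >= 2 && s.startsWith("\"")) return "STRING";
        if (Character.isDigit(s.charAt(0))) return "FLOAT";
        return s;
    }

    private boolean test(String expected) { return kind().equals(expected); }

    // Like the acceptor's match, but returns the matched lexeme.
    private String match(String expected) {
        if (!test(expected))
            throw new IllegalStateException("Expected " + expected + " but found " + kind());
        return pos < lexemes.size() ? lexemes.get(pos++) : "";
    }

    void company() {
        match("company"); match("STRING"); match("{");
        while (test("department")) department();
        match("}"); match("EOF");
    }

    private void department() {
        match("department");
        String name = match("STRING");
        match("{");
        openDept(name);
        match("manager"); employee(true);
        while (test("employee")) { match("employee"); employee(false); }
        while (test("department")) department();
        match("}");
        closeDept(name);
    }

    private void employee(boolean isManager) {
        String name = match("STRING");
        match("{");
        match("address"); String address = match("STRING");
        match("salary"); double salary = Double.parseDouble(match("FLOAT"));
        match("}");
        handleEmployee(isManager, name, address, salary);
    }
}

// A parser that totals salaries by overriding a single handler.
class TotalSketch extends ParserSketch {
    private double total = 0;
    TotalSketch(String input) { super(input); }
    double getTotal() { return total; }
    @Override
    protected void handleEmployee(boolean isManager, String name, String address, double salary) {
        total += salary;
    }
}
```

A typical use is `TotalSketch t = new TotalSketch(text); t.company(); t.getTotal();`. Other operations, such as collecting department names, would override `openDept` in the same way without touching the parsing code.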
  42. Demo: see class Parser.java at http://101companies.org/wiki/Contribution:javaParser
  43. Summary
        Language processing is a programming domain.
        Grammars may define two levels of syntax:
          token/lexeme level (lexical level)
          tree-like structure level (context-free level)
        Both levels are implementable in parsers:
          Recursive descent for parsing
          Use generative tools (as discussed soon)
