© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Language processing
Course "So...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Summary
2
This is an introduct...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Language definition components
...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Language processing components...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Language processing components...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
How to do language processors?...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Haskell-based language process...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
A Pico example
8
begin declare...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Components for Pico
9
Abstract...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Abstract syntax
10
type Name =...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Concrete syntax to be recogniz...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Syntax vs. recognizer
Syntax d...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Recognizer
Leverage parser com...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Parsec
(http://www.haskell.org...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Parser
Add synthesis of abstra...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Type checker
Check that ...
de...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Type system vs. checker
17
Typ...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Interpreter
Leverage natural s...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Pretty printer
Map abstract sy...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Assembly code
Stack-based asse...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Assembly code
21
type Label = ...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Compiler
Use state to keep tra...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Machine
Very much like the int...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Flow charts
24
begin declare i...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Flow charts
Define an abstract ...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Flow charts
26
See Haskell cod...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
“Dot” sublanguage for flowchart...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
All of “dot”
28
dot User’s Man...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Visualizer
Very much like the ...
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Concluding remarks
30
© 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle
Issues with Haskell study
31
L...
Upcoming SlideShare
Loading in …5
×

An introduction on language processing

559 views

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
559
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
10
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

An introduction on language processing

  1. 1. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Language processing Course "Software Language Engineering" University of Koblenz-Landau Department of Computer Science Ralf Lämmel Software Languages Team 1
  2. 2. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Summary 2 This is an introduction to language processing. We classify components of a language definition. We classify components in language processing. We illustrate language processing for a simple language. The processing components are written in Haskell.
  3. 3. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Language definition components Concrete syntax (useful for parsing and humans) Abstract syntax (useful for most processing) Type system Extra rules Dynamic semantics Translation semantics Pragmatics 3
  4. 4. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Language processing components Recognizer (execute concrete syntax) Parser (return parse/syntax trees) Option 1: Return concrete parse trees Option 2: Return abstract syntax trees Imploder (parse tree to abstract syntax trees) Pretty printer or unparser (return text again) 4
  5. 5. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Language processing components cont’d Type checker Interpreter Compiler to machine language Translator / generator to high-level language Software visualizers (e.g., call graph or flow chart) Software analyzers (e.g., dead-code detection) Software transformers (e.g., dead-code elimination) Software metrics tools (e.g., cyclomatic complexity) IDE integration (coloring, navigation, ...) 5
  6. 6. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle How to do language processors? What programming techniques to use? What programming technologies to use? Code generators APIs / combinator libraries Metaprogramming frameworks How to leverage language definitions? 6 Let’s get somedata points today.
  7. 7. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Haskell-based language processors for a simple imperative language 7
  8. 8. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle A Pico example 8 begin declare input : natural,               output : natural,               repnr : natural,               rep : natural;       input := 14;       output := 1;       while input - 1 do           rep := output;           repnr := input;           while repnr - 1 do              output := output + rep;              repnr := repnr - 1           od;           input := input - 1       od end Factorial function in the simple imperative language Pico Pico is an illustrative language which has been used by the SEN1/SWAT team at CWI Amsterdam for many years.
  9. 9. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Components for Pico 9 Abstract syntax Recognizer Parser Type checker Interpreter Pretty printer Assembly code Compiler Machine Flow charts Visualizer https://github.com/slecourse/slecourse/tree/master/sources/pico/
  10. 10. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Abstract syntax 10 type Name = String data Type = NatType | StrType type Program = ([Decl], [Stm]) type Decl = (Name,Type) data Stm = Assign Name Expr | IfStm Expr [Stm] [Stm] | While Expr [Stm] data Expr = Id Name | NatCon Int | StrCon String | Add Expr Expr | Sub Expr Expr | Conc Expr Expr + implementationaldetails (“deriving”) See Haskell code online: AbstractSyntax.hs
  11. 11. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Concrete syntax to be recognized 11 Program ="begin" Declarations {Statement ";"} "end" ; Declarations = "declare" {Declaration ","}* ";" ; Declaration = Id ":" Type; Type = "natural" | "string" ; Statement = Id ":=" Expression | "if" Expression "then" {Statement ";"}* "else" {Statement ";"}* "fi" | "while" Expression "do" {Statement ";"}* "od" ; Expression = Id | String | Natural | "(" Expression ")" | Expression "+" Expression | Expression "-" Expression | Expression "||" Expression Id = [a-z][a-z0-9]* !>> [a-z0-9]; Natural = [0-9]+ ; String = """ !["]* """; We need to implement this grammar.
  12. 12. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Syntax vs. recognizer Syntax definition may be declarative / technology-independent. Unambiguous grammar needed for recognition. Think of dangling else or operator priorities. Recognizers may be hand-written. Recognizers specs may be non-declarative / technology-dependent. 12 The same is true for parsers.
  13. 13. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Recognizer Leverage parser combinators (Parsec) Recognizer = functional program. Parser combinators are monadic. 13 See Haskell code online: Recognizer.hs
  14. 14. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Parsec (http://www.haskell.org/haskellwiki/Parsec, 23 April 2013) “Parsec is an industrial strength, monadic parser combinator library for Haskell. It can parse context-sensitive, infinite look-ahead grammars but it performs best on predictive (LL[1]) grammars.” 14 See also: http://101companies.org/wiki/Technology:Parsec http://hackage.haskell.org/packages/archive/parsec/3.1.3/doc/html/Text-Parsec.html
  15. 15. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Parser Add synthesis of abstract syntax terms to recognizer. Concrete and abstract syntaxes are coupled up. 15 See Haskell code online: Parser.hs
  16. 16. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Type checker Check that ... declarations are unambiguous; all referenced variables are declared; all expected operand types agree with actual ones; ... 16 See Haskell code online: TypeChecker.hs
  17. 17. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Type system vs. checker 17 Type systems ... are based on formal/declarative specifications; they are possibly executable (perhaps inefficiently). Type checkers ... implement type systems efficiently; they provide useful error messages. See Haskell code online: TypeChecker.hs
  18. 18. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Interpreter Leverage natural semantics. Define it as a total function. Use a special error result. Error messages could also be provided that way. 18 See Haskell code online: Interpreter.hs
  19. 19. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Pretty printer Map abstract syntax to text. Composable documents. Vertical / horizontal composition. Indentation. Another case of a combinator library (= DSL). 19 See Haskell code online: PrettyPrinter.hs
  20. 20. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Assembly code Stack-based assembly language. Explicit notion of label and gotos. Could be translated to Bytecode or x86 or .... 20
  21. 21. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Assembly code 21 type Label = String type Name = String data Instr  = DclInt Name -- Reserve a memory location for an integer variable  | DclStr Name -- Reserve a memory location for a string variable  | PushNat Int -- Push integer constant on the stack  | PushStr String -- Push string constant on the stack  | Rvalue Name -- Push the value of a variable on the stack  | Lvalue Name -- Push the address of a variable on the stack  | AssignOp -- Assign value on top, to variable at address top-1  | AddOp -- Replace top two stack values by their sum  | SubOp -- Replace top two stack values by their difference  | ConcOp -- Replace top two stack values by their concatenation  | Label Label -- Associate a label with the next instruction  | Go Label -- Go to instruction with given label  | GoZero Label -- Go to instruction with given label, if top equals 0  | GoNonZero Label -- Go to instruction with given label, if top not equal to 0      deriving (Eq, Show, Read) See Haskell code online: AssemblyCode.hs
  22. 22. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Compiler Use state to keep track of label generator. Generate machine instructions compositionally. A stream could also be used for linear output. 22 See Haskell code online: Compiler.hs
  23. 23. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Machine Very much like the interpreter. Interpret the assembly code instructions. Maintain a “memory” abstraction. Maintain an “instruction” pointer. 23 See Haskell code online: Machine.hs
  24. 24. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Flow charts 24 begin declare input : natural,               output : natural,               repnr : natural,               rep : natural;       input := 14;       output := 1;       while input - 1 do           rep := output;           repnr := input;           while repnr - 1 do              output := output + rep;              repnr := repnr - 1           od;           input := input - 1       od end
  25. 25. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Flow charts Define an abstract syntax of flow charts. No concrete syntax needed. Abstract syntax is mapped to “dot” (graphviz). 25 See Haskell code online: FlowChart.hs
  26. 26. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Flow charts 26 See Haskell code online. type FlowChart = ([Box], [Arrow]) type Box = (Id, BoxType) type Id = String -- Identifier for boxes data BoxType  = Start  | End  | Decision Text  | Activity Text type Text = String -- Text to show type Arrow = ((Id, FromType), Id) data FromType  = FromStart  | FromActivity  | FromYes  | FromNo
  27. 27. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle “Dot” sublanguage for flowcharts 27 digraph FlowChart { id1 [label="Start", shape=box, style=bold]; id2 [label="End", shape=box, style=bold]; id3 [label=" input := 14 ", shape=box]; id4 [label=" output := 1 ", shape=box]; id5 [label=" input - 1 ", shape=diamond]; id6 [label=" rep := output ", shape=box]; id7 [label=" repnr := input ", shape=box]; id8 [label=" repnr - 1 ", shape=diamond]; id9 [label=" output := output + rep ", shape=box]; id10 [label=" repnr := repnr - 1 ", shape=box]; id1 -> id3 [label=" ", headport= n , tailport= s ] id3 -> id4 [label=" ", headport= n , tailport= s ] id4 -> id5 [label=" ", headport= n , tailport= s ] id5 -> id6 [label=" Yes ", headport= n , tailport= sw ] id6 -> id7 [label=" ", headport= n , tailport= s ] id7 -> id8 [label=" ", headport= n , tailport= s ] id8 -> id9 [label=" Yes ", headport= n , tailport= sw ] id9 -> id10 [label=" ", headport= n , tailport= s ] id10 -> id7 [label=" ", headport= n , tailport= s ] id8 -> id4 [label=" No ", headport= n , tailport= se ] id5 -> id2 [label=" No ", headport= n , tailport= se ] }
  28. 28. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle All of “dot” 28 dot User’s Manual, January 26, 2006 34 A Graph File Grammar The following is an abstract grammar for the DOT language. Terminals are shown in bold font and nonterminals in italics. Literal characters are given in single quotes. Parentheses ( and ) indicate grouping when needed. Square brackets [ and ] enclose optional items. Vertical bars | separate alternatives. graph → [strict] (digraph | graph) id ’{’ stmt-list ’}’ stmt-list → [stmt [’;’] [stmt-list ] ] stmt → attr-stmt | node-stmt | edge-stmt | subgraph | id ’=’ id attr-stmt → (graph | node | edge) attr-list attr-list → ’[’ [a-list ] ’]’ [attr-list] a-list → id ’=’ id [’,’] [a-list] node-stmt → node-id [attr-list] node-id → id [port] port → port-location [port-angle] | port-angle [port-location] port-location → ’:’ id | ’:’ ’(’ id ’,’ id ’)’ port-angle → ’@’ id edge-stmt → (node-id | subgraph) edgeRHS [attr-list] edgeRHS → edgeop (node-id | subgraph) [edgeRHS] subgraph → [subgraph id] ’{’ stmt-list ’}’ | subgraph id An id is any alphanumeric string not beginning with a digit, but possibly in- cluding underscores; or a number; or any quoted string possibly containing escaped quotes. An edgeop is -> in directed graphs and -- in undirected graphs. The language supports C++-style comments: /* */ and //.
  29. 29. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Visualizer Very much like the compiler. Generate low-level graph representation. Also very much like the pretty printer. Pretty print expressions and statement in boxes. 29 See Haskell code online: Visualizer.hs
  30. 30. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Concluding remarks 30
  31. 31. © 2012-13 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Issues with Haskell study 31 Locations missing Non-declarative implosion Handling of priorities More analyses of interest Data-flow analysis Control-flow analysis IDE integration needed Some of these issues are addressed by a Rascal implementation of Pico. http://www.rascal-mpl.org/

×