SWP - A Generic Language Parser


This talk was part tongue-in-cheek, part serious, but entirely fun. It was given twice as a lightning talk: once at EuroPython and once at ACCU Python UK 05. It presents a generic, Python-like language parser that actually works. Think of it as an alternative to brackets in Lisp!

  1. “SWP” A Generic Language Parser (Gloop?)
     (SWP == Semantic Whitespace Parser, for want of a better name)
     Michael Sparks
  2. Parse Anything
     ● Got bored of seeing “use Prothon”... “no”
     ● Hacking Python to add a keyword, whilst trivial, wasn't trivial enough
     ● Got bored of seeing “use Prothon's replacement”
     ● Thought it might be a fun thing to try
     ● Got very bored of seeing “use the replacement for Prothon's replacement”, etc.
  3. Parse Anything
     Parse this:

        def displayResult(result,quiet):
            if not quiet:
                print "The result of parsing your program:"
                print result
                print
                if not result:
                    print "Rule match/evaluation order"
                    for rule in r:
                        print " ", rule
                    end
                end
            else:
                if result is None:
                    print "Parse failed"
                else:
                    print "Success"
                end
            end
        end
  4. Parse Anything
     Parse this:

        #
        # Sample Logo-like language using the parser
        #
        shape square:
            pen down
            repeat 4:
                forward 10
                rotate 90
            end
            pen up
        end

        repeat (360/5):
            square()
            rotate 5
        end
  5. Parse Anything
     Parse this:

        #
        # Example based on defining grammars for L-Systems.
        #
        OBJECT tree L_SYSTEM:
            ROOT G
            RULES:
                G -> T { G } { A G } { B G } { C G } (0.00 .. 0.15)
                G -> T { A B G } { B A G } { C A G } (0.15 .. 0.30)
                G -> T { A C G } { B B G } { C B G } (0.30 .. 0.45)
                G -> T { A A G } { B C G } { C C G } (0.45 .. 0.60)
                G -> T { A G } { C G } (0.70 .. 0.80)
                G -> T { A G } { B G } (0.80 .. 0.95)
                G -> T { A G } (0.95 .. 1.00)
                T -> T (0.00 .. 0.75)
            ENDRULES
        ENDOBJECT
  6. Parse Anything
     Parse this:

        #
        # An SML-like language using this parser.
        #
        structure Stk = struct :
            exception EmptyStack_exception
            datatype 'x stack = EmptyStack | push of ('x * 'x stack)
            fun pop(push(x,y)) = y
            fun pop EmptyStack = raise EmptyStack_exception
            fun top(push(x,y)) = x
            fun top EmptyStack = raise EmptyStack_exception
        end
  7. Parse Anything, etc

        EXPORT OBJECT person:
            PRIVATE:
                flat name, telephone
                address::PTR TO LONG
                telephone
            ENDATTRS
        ENDOBJECT

        PROC compare_address(address1::PTR TO LONG, address2::PTR TO LONG):
            # Returns *TRUE* if the address2 exists _inside address1
            DEF result=TRUE, f
            FOR f:=0 TO 5:
                IF address2[f]:
                    IF Not(((StrLen address2[f])==0) AND ((StrLen address1[f])==0)):
                        # The following line incorrectly(?) says that a
                        # NULL string does not exist inside a NULL string.
                        # The IF above corrects this
                        result:=result AND ( ((InStr address1[f],address2[f])<>-1) OR ((StrLe
                    ENDIF
                ENDIF
            ENDFOR
        ENDPROC result
  8. Parse This?!

        OBJECT tree L_SYSTEM:
            ROOT G
            structure Stk = struct :
                exception EmptyStack_exception
                datatype 'x stack = EmptyStack | push of ('x * 'x stack)
                shape square:
                    repeat 4:
                        forward 10
                        rotate 90
                    end
                end
            end
            RULES:
                G -> T { A G } { C G } (0.70 .. 0.80)
                G -> T { A G } { B G } (0.80 .. 0.95)
                G -> T { A G } (0.95 .. 1.00)
            ENDRULES
        ENDOBJECT

        if (__name__ == "__main__"):
            import sys
            assign lexonly False
            assign trace False
            for fields in using query:
                SELECT fname, lname,, :
                    FROM tcontact, tsite
                    WHERE table_contact.objid = "CONTID"
                    AND table_site.objid = "SITEID"
                ENDSELECT
            endfor
            if sys.argv[1]:
                assign source open(sys.argv[1]).read()
            else:
                assign source "junk"
            end
        end
  9. Parsed!
     ['program', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'OBJECT'], ['factorlist', ['factorlist', ['factorlist', ['ID', 'tree']], ['trailedfactor', ['ID', 'L_SYSTEM'], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'ROOT'], ['factorlist', ['ID', 'G']]]]], ['statement_list', ['assignment', '=', ['explist', ['functioncall', ['ID', 'structure'], ['factorlist', ['ID', 'Stk']]]], ['explist', ['functioncall', ['trailedfactor', ['ID', 'struct'], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'exception'], ['factorlist', ['ID', 'EmptyStack_exception']]]]], ['statement_list', ['assignment', '=', ['explist', ['functioncall', ['ID', 'datatype'], ['factorlist', ['factorlist', ['ID', "'x"]], ['ID', 'stack']]]], ['explist', ['infixepr', '|', ['ID', 'EmptyStack'], ['explist', ['functioncall', ['ID', 'push'], ['factorlist', ['factorlist', ['ID', 'of']], ['bracketedexpression', ['bracketedexpression', ['explist', ['infixepr', '*', ['ID', "'x"], ['explist', ['functioncall', ['ID', "'x"], ['factorlist', ['ID', 'stack']]]]]]]]]]]]]], ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'shape'], ['factorlist', ['factorlist', ['trailedfactor', ['ID', 'square'], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'repeat'], ['factorlist', ['factorlist', ['trailedfactor', ['number', 4], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'forward'], ['factorlist', ['number', 10]]]]], ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'rotate'], ['factorlist', ['number', 90]]]]]]]]]]], ['ID', 'end']]]]]]]]]], ['ID', 'end']]]]]]]]]]], ['factorlist', ['ID', 'end']]]]], ['statement_list', ['exprstatement', ['explist', ['functioncall', ['trailedfactor', ['ID', 'RULES'], 
['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['infixepr', '->', ['ID', 'G'], ['explist', ['functioncall', ['ID', 'T'], ['factorlist', ['factorlist', ['factorlist', ['constructorexpression', ['constructorexpression', ['explist', ['functioncall', ['ID', 'A'], ['factorlist', ['ID', 'G']]]]]]], ['constructorexpression', ['constructorexpression', ['explist', ['functioncall', ['ID', 'C'], ['factorlist', ['ID', 'G']]]]]]], ['bracketedexpression', ['bracketedexpression', ['explist', ['infixepr', '..', ['dottedfactor', ['number', 0], ['attribute', ['number', 70]]], ['explist', ['expression', ['dottedfactor', ['number', 0], ['attribute', ['number', 80]]]]]]]]]]]]]]], ['statement_list', ['exprstatement', ['explist', ['infixepr', '->', ['ID', 'G'], ['explist', ['functioncall', ['ID', 'T'], ['factorlist', ['factorlist', ['factorlist', ['constructorexpression', ['constructorexpression', ['explist', ['functioncall', ['ID', 'A'], ['factorlist', ['ID', 'G']]]]]]], ['constructorexpression', ['constructorexpression', ['explist', ['functioncall', ['ID', 'B'], ['factorlist', ['ID', 'G']]]]]]], ['bracketedexpression', ['bracketedexpression', ['explist', ['infixepr', '..', ['dottedfactor', ['number', 0], ['attribute', ['number', 80]]], ['explist', ['expression', ['dottedfactor', ['number', 0], ['attribute', ['number', 95]]]]]]]]]]]]]]], ['statement_list', ['exprstatement', ['explist', ['infixepr', '->', ['ID', 'G'], ['explist', ['functioncall', ['ID', 'T'], ['factorlist', ['factorlist', ['constructorexpression', ['constructorexpression', ['explist', ['functioncall', ['ID', 'A'], ['factorlist', ['ID', 'G']]]]]]], ['bracketedexpression', ['bracketedexpression', ['explist', ['infixepr', '..', ['dottedfactor', ['number', 0], ['attribute', ['number', 95]]], ['explist', ['expression', ['dottedfactor', ['number', 1], ['attribute', ['number', 0]]]]]]]]]]]]]]]]]]]]], ['factorlist', ['ID', 'ENDRULES']]]]]]]]]]]], ['ID', 'ENDOBJECT']]]]], ['statement_list', 
['exprstatement', ['explist', ['functioncall', ['ID', 'if'], ['factorlist', ['factorlist', ['trailedfactor', ['bracketedexpression', ['bracketedexpression', ['explist', ['infixepr', '==', ['ID', '__name__'], ['explist', ['expression', ['string', '__main__']]]]]]], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'import'], ['factorlist', ['ID', 'sys']]]]], ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'assign'], ['factorlist', ['factorlist', ['ID', 'lexonly']], ['ID', 'False']]]]], ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'assign'], ['factorlist', ['factorlist', ['ID', 'trace']], ['ID', 'False']]]]], ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'for'], ['factorlist', ['factorlist', ['factorlist', ['factorlist', ['factorlist', ['ID', 'fields']], ['ID', 'in']], ['ID', 'using']], ['trailedfactor', ['ID', 'query'], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'SELECT'], ['factorlist', ['ID', 'first_name']]], ['explist', ['expression', ['ID', 'last_name']], ['explist', ['expression', ['dottedfactor', ['ID', 'table_contact'], ['attribute', ['ID', 'phone']]]], ['explist', ['expression', ['ID', 'e_mail']], ['explist', ['functioncall', ['dottedfactor', ['ID', 'table_site'], ['attribute', ['trailedfactor', ['ID', 'name'], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'FROM'], ['factorlist', ['ID', 'table_contact']]], ['explist', ['expression', ['ID', 'table_site']]]]], ['statement_list', ['assignment', '=', ['explist', ['functioncall', ['ID', 'WHERE'], ['factorlist', ['dottedfactor', ['ID', 'table_contact'], ['attribute', ['ID', 'objid']]]]]], ['explist', ['expression', ['string', '<CASECONTACTID>']]]], ['statement_list', ['assignment', '=', ['explist', ['functioncall', ['ID', 'AND'], ['factorlist', ['dottedfactor', ['ID', 
'table_site'], ['attribute', ['ID', 'objid']]]]]], ['explist', ['expression', ['string', '<CASESITEID>']]]]]]]]]]]], ['factorlist', ['ID', 'ENDSELECT']]]]]]]]]]]]]], ['ID', 'endfor']]]]], ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'if'], ['factorlist', ['factorlist', ['factorlist', ['dottedfactor', ['ID', 'sys'], ['attribute', ['trailedfactor', ['trailedfactor', ['ID', 'argv'], ['bracketedtrailer', ['explist', ['expression', ['number', 1]]]]], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'assign'], ['factorlist', ['factorlist', ['factorlist', ['ID', 'source']], ['ID', 'open']], ['dottedfactor', ['bracketedexpression', ['bracketedexpression', ['explist', ['expression', ['dottedfactor', ['ID', 'sys'], ['attribute', ['trailedfactor', ['ID', 'argv'], ['bracketedtrailer', ['explist', ['expression', ['number', 1]]]]]]]]]]], ['methodcall', 'read', ['bracketedexpression', None]]]]]]]]]]]]]], ['trailedfactor', ['ID', 'else'], ['blocktrailer', ['block', ['statement_list', ['exprstatement', ['explist', ['functioncall', ['ID', 'assign'], ['factorlist', ['factorlist', ['ID', 'source']], ['string', 'junk']]]]]]]]]], ['ID', 'end']]]]]]]]]]]]]], ['ID', 'end']]]]]]]]
  10. Grammar (SLR)

        program               -> block
        block                 -> BLOCKSTART statement_list BLOCKEND
        statement_list        -> statement*
        statement             -> (expression | expression ASSIGNMENT expression | ) EOL
        expression            -> oldexpression (COMMA expression)*
        oldexpression         -> (factor [factorlist] | factor INFIXOPERATOR expression )
        factorlist            -> factor* factor
        factor                -> ( bracketedexpression | constructorexpression | NUMBER | STRING | ID
                                 | factor DOT dotexpression | factor trailer | factor trailertoo )
        dotexpression         -> (ID bracketedexpression | factor )
        bracketedexpression   -> BRA [ expression ] KET
        constructorexpression -> BRA3 [ expression ] KET3
        trailer               -> BRA2 expression KET2
        trailertoo            -> COLON EOL block
  11. Notes
     ● Just uses a slightly modified PLY (1.5)
     ● All of the examples are parseable by the same parser: no changes to the lexer or parser
     ● Just spits out a syntax tree
     ● Treats everything as a function
  12. Everything's a function
     ● This is a function:

        if bar(bibble=>baz):
            bla bla bla
            bingle bongle
        else:
            babble babble
            this = bing
        endif

     ● Parsed as: call function “if” with the arguments:
       bar(bibble=>baz), codeblock, “else”, codeblock, “endif”
  13. Where...?
     ● http:///
     ● I'd be curious to see someone put a Lisp back end on it :-)
     ● Actually no, don't do that: someone might use this, and then...
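The grammar on slide 10 assumes a lexer that turns indentation into BLOCKSTART/BLOCKEND tokens and line ends into EOL. The slides do not show SWP's lexer, so this is only a minimal standard-library sketch under stated assumptions: the token kind names come from the grammar, but the regular expressions, the hypothetical `tokenize`/`line_tokens` helpers, and the choice to wrap the whole file in one outer block (so that `program -> block` applies) are guesses, not the PLY-based implementation the notes mention.

```python
import re
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, text)

# Hypothetical token table: kind names follow slide 10, patterns are assumptions.
TOKEN_SPEC: List[Tuple[str, str]] = [
    ("STRING", r'"[^"]*"'),
    ("NUMBER", r"\d+"),                    # 0.70 lexes as NUMBER DOT NUMBER, like slide 9's tree
    ("ID", r"[A-Za-z_'][A-Za-z0-9_']*"),   # a leading ' admits SML-style 'x
    ("OP", r"\.\.|[-+*/%<>=!|&^~]+"),      # split into ASSIGNMENT / INFIXOPERATOR below
    ("COLON", r":"),
    ("COMMA", r","),
    ("DOT", r"\."),
    ("BRA", r"\("), ("KET", r"\)"),
    ("BRA2", r"\["), ("KET2", r"\]"),
    ("BRA3", r"\{"), ("KET3", r"\}"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def line_tokens(text: str) -> Iterator[Token]:
    """Tokenise one line whose indentation has already been removed."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        kind, value = match.lastgroup, match.group()
        if kind == "OP":
            kind = "ASSIGNMENT" if value == "=" else "INFIXOPERATOR"
        yield kind, value
        pos = match.end()


def tokenize(source: str) -> Iterator[Token]:
    """Turn indentation changes into BLOCKSTART/BLOCKEND and line ends into EOL."""
    indents = [0]
    yield "BLOCKSTART", ""  # assumption: wrap the file so `program -> block` applies
    for raw in source.splitlines():
        code = raw.split("#", 1)[0]  # naive: also cuts a '#' inside a string
        if not code.strip():
            continue  # blank and comment-only lines carry no indentation
        width = len(code) - len(code.lstrip(" "))  # spaces only, no tabs
        if width > indents[-1]:
            indents.append(width)
            yield "BLOCKSTART", ""
        while width < indents[-1]:
            indents.pop()
            yield "BLOCKEND", ""
        if width != indents[-1]:
            raise IndentationError(f"dedent does not match any outer level: {raw!r}")
        yield from line_tokens(code.strip())
        yield "EOL", ""
    while len(indents) > 1:  # close blocks still open at end of input
        indents.pop()
        yield "BLOCKEND", ""
    yield "BLOCKEND", ""  # closes the wrapping block opened above


if __name__ == "__main__":
    for token in tokenize("shape square:\n    forward 10\nend\n"):
        print(token)
```

Note that nothing is emitted between the COLON and the indented lines except EOL, which is what `trailertoo -> COLON EOL block` expects. The sketch does not suppress EOL inside open brackets, so a bracketed expression cannot span lines here.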
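Slide 12's "everything's a function" idea can be illustrated without the full grammar. This is a hypothetical, much-simplified stand-in (the names `parse`, `parse_block` and `parse_statement` are invented, and it is not SWP's code): each statement becomes `["call", head, args]`, whitespace-separated words become arguments, and a trailing colon turns the following indented lines into one block argument, after which the statement carries on at the same indentation until a line without a colon. It ignores strings, brackets, operators and assignment, all of which the real grammar handles.

```python
from typing import List, Tuple, Union

Node = Union[str, list]   # a word, or ["block", ...] / ["call", head, args]
Line = Tuple[int, str]    # (indent width, stripped text)


def read_lines(source: str) -> List[Line]:
    """Split source into (indent, text) pairs, dropping blank and comment-only lines."""
    lines = []
    for raw in source.splitlines():
        text = raw.strip()
        if text and not text.startswith("#"):
            lines.append((len(raw) - len(raw.lstrip(" ")), text))
    return lines


def parse_block(lines: List[Line], pos: int, indent: int) -> Tuple[list, int]:
    """Parse consecutive statements at exactly `indent`; stop at any other indent."""
    block: list = ["block"]
    while pos < len(lines) and lines[pos][0] == indent:
        stmt, pos = parse_statement(lines, pos, indent)
        block.append(stmt)
    return block, pos


def parse_statement(lines: List[Line], pos: int, indent: int) -> Tuple[list, int]:
    """Parse one statement as a call: first word is the function, the rest are arguments."""
    words: List[Node] = []
    while True:
        text = lines[pos][1]
        pos += 1
        opens_block = text.endswith(":")
        words.extend(text[:-1].split() if opens_block else text.split())
        if not opens_block:
            break  # a line without a trailing ':' ends the statement
        if pos < len(lines) and lines[pos][0] > indent:
            block, pos = parse_block(lines, pos, lines[pos][0])
            words.append(block)  # the whole indented block is one argument
        if pos >= len(lines) or lines[pos][0] != indent:
            break  # input ended or dedented: the statement is complete
    if not words or not isinstance(words[0], str):
        raise SyntaxError("statement has no function name")
    return ["call", words[0], words[1:]], pos


def parse(source: str) -> list:
    """Parse a whole program into ["block", statement, ...]."""
    lines = read_lines(source)
    if not lines:
        return ["block"]
    block, pos = parse_block(lines, 0, lines[0][0])
    if pos != len(lines):
        raise SyntaxError(f"inconsistent indentation at {lines[pos][1]!r}")
    return block


if __name__ == "__main__":
    example = """
if bar(bibble=>baz):
    bla bla bla
    bingle bongle
else:
    babble babble
    this = bing
endif
"""
    print(parse(example))
```

Run on slide 12's code (ending with the `endif` that its argument list implies), the single top-level statement is a call to `if` whose arguments are `bar(bibble=>baz)`, a block, `else`, a block and `endif`. In this sketch a line at the same indentation after a block is read as more arguments to the same call, which would explain why every example on the slides closes its blocks with a trailing word such as `end`, `ENDRULES` or `ENDOBJECT`.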
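The notes say SWP "just spits out a syntax tree", and judging by slide 9 that tree is nested Python lists whose first element names the node type. A hypothetical pair of helpers for consuming it, assuming only that shape, might look like this: `walk` yields every node depth first, and `call_names` reports the identifier at the head of each `functioncall` node, unwrapping `trailedfactor` heads such as `RULES:`.

```python
from typing import Any, Iterator, List


def walk(node: Any) -> Iterator[list]:
    """Yield every list node of a nested-list tree, parents before children."""
    if isinstance(node, list):
        yield node
        for child in node[1:]:  # node[0] is the node-type tag
            yield from walk(child)


def call_names(tree: Any) -> List[str]:
    """Return the called name of each 'functioncall' node, in depth-first order."""
    names = []
    for node in walk(tree):
        if node and node[0] == "functioncall":
            head = node[1]
            # `RULES: <block>` appears as ['trailedfactor', ['ID', 'RULES'], ...]
            while isinstance(head, list) and head and head[0] == "trailedfactor":
                head = head[1]
            if isinstance(head, list) and head and head[0] == "ID":
                names.append(head[1])
    return names


if __name__ == "__main__":
    # Hand-built fragment in the slide 9 format: `repeat 4:` containing `forward 10`.
    fragment = ['functioncall', ['ID', 'repeat'],
                ['factorlist',
                 ['trailedfactor', ['number', 4],
                  ['blocktrailer', ['block',
                   ['statement_list', ['exprstatement', ['explist',
                    ['functioncall', ['ID', 'forward'], ['factorlist', ['number', 10]]]]]]]]],
                 ['ID', 'end']]]
    print(call_names(fragment))  # prints ['repeat', 'forward']
```

Because everything is a function, a list of call heads like this is also a list of the "keywords" a program uses (`OBJECT`, `if`, `for`, `SELECT`, ...), which is where a back end would start.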