Developing a Procedural
Language for PostgreSQL
            
I CAN HAZ P.L.Z PLZ?

          
MKAY... WHY?

 To learn how
MKAY... WHY?

 To learn how
 To get to speak at conferences
MKAY... WHY?

 To learn how
 To get to speak at conferences
 Achieve world domination
MKAY... WHY?

 To learn how
 To get to speak at conferences
 Achieve world domination
UM, WHAT?
WHAT'S LOLCODE?
                                1 Oh hai. In teh beginnin
1 In the beginning, some Guy
                   ...
WHAT'S LOLCODE
A lolcat is an image combining a photograph, most
frequently a cat, with a humorous and idiosyncratic capti...
UM... OKAY...
SO, LOLCODE?

Adam Lindsay had a revelation. From Ceiling Cat.


    quot;This is a love letter to very clever people who
...
CHEEZBURGER

 
CHEEZBURGER

 

      SPEEK LOLCODE
CHEEZBURGER

 

      SPEEK LOLCODE




      ANATUMMY OF A PL
CHEEZBURGER

 

       SPEEK LOLCODE

    FUNKSHUN KALL HANDLRZ


       ANATUMMY OF A PL
CHEEZBURGER

 

       SPEEK LOLCODE

    FUNKSHUN KALL HANDLRZ
      MAEK UN INTURPRETR
       ANATUMMY OF A PL
CHEEZBURGER

 

      SPEEK LOLCODE
I CAN SPEEK LOLCODE

 Data types
    NUMBR: Integer values
    NUMBAR: Floating point values
    YARN: String values
    N...
I CAN SPEEK LOLCODE
 Operators
   Arithmetic
      SUM OF x AN y, DIFF OF x AN y
      PRODUKT OF x AN y, QUOSHUNT OF x AN...
I CAN SPEEK LOLCODE
 Operators (cont.)
   Comparison
      BOTH SAEM x AN y
         WIN iff x == y
      DIFFRINT x AN y
...
I CAN SPEEK LOLCODE

  Comparison
 
BOTH SAEM ANIMAL AN quot;CATquot;, O RLY?
    YA RLY, VISIBLE quot;J00 HAV A CATquot;
...
I CAN SPEEK LOLCODE

  Case
COLOR, WTF?
    OMG quot;Rquot;
        VISIBLE   quot;RED FISHquot;
        GTFO
    OMG quot...
I CAN SPEEK LOLCODE

  Loops
 
IM IN YR <label> <operation> YR <variable>
  [TIL|WILE <expression>]
     <code block>
IM O...
I CAN SPEEK PL/LOLCODE

 PL/LOLCODE-specific stuff
    VISIBLE == PL/pgSQL RAISE
       Alt. VISIBLE <level> quot;Messageq...
I CAN SPEEK PL/LOLCODE

 PL/LOLCODE-specific stuff
    VISIBLE == PL/pgSQL RAISE
       Alt. VISIBLE <level> quot;Messageq...
I CAN SPEEK PL/LOLCODE
 PL/LOLCODE-specific stuff
    VISIBLE == PL/pgSQL RAISE
       Alt. VISIBLE <level> quot;Messagequ...
I CAN SPEEK PL/LOLCODE
HAI
    I HAS A X ITZ SMOOSH quot;MONORAILquot; quot;KITTYquot; MKAY
    I HAS A Y ITZ 3
    I HAS ...
CHEEZBURGER

 

      SPEEK LOLCODE




      ANATUMMY OF A PL
ANATUMMY OF A PL
 Function call handler
    plpgsql_call_handler, plperl_call_handler, etc.
    This function is responsib...
ANATUMMY OF A PL

 Where's the interpreter?
    Some PLs come with their own interpreter
       PL/pgSQL, SQL/PSM, PL/LOLC...
ANATUMMY OF A PL

 Where's the interpreter?
    Some PLs come with their own interpreter
       PL/pgSQL, SQL/PSM, PL/LOLC...
ANATUMMY OF A PL

 Compilation
   Commonly a function will be quot;compiledquot; the first time
   it is run in a session
...
ANATUMMY OF A PL

 Compilation
   Commonly a function will be quot;compiledquot; the first time
   it is run in a session
...
CHEEZBURGER

 

       SPEEK LOLCODE

    FUNKSHUN KALL HANDLRZ


       ANATUMMY OF A PL
FUNKSHUN KALL HANDLRZ

 Job of the function handler
    Figure out if it's a TRIGGER function
    Determine argument value...
FUNKSHUN KALL HANDLRZ

PG_MODULE_MAGIC;
Datum pl_lolcode_call_handler
(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(pl_lolcode_...
FUNKSHUN KALL HANDLRZ
     Get function information
       Form_pg_proc from server/catalog/pg_proc.h
 
Form_pg_proc procS...
FUNKSHUN KALL HANDLRZ

  Get function source
 
procsrcdatum = SysCacheGetAttr(PROCOID,
procTup, Anum_pg_proc_prosrc, &isnu...
FUNKSHUN KALL HANDLRZ

    Get return type
 
typeTup = SearchSysCache(TYPEOID, ObjectIdGetDatum
(procStruct->prorettype), ...
FUNKSHUN KALL HANDLRZ

    Get arguments and their types
 
for (i = 0; i < procStruct->pronargs; i++)
    {
        snprin...
FUNKSHUN KALL HANDLRZ

  Pass the function source to the interpreter
 
pllolcode_yy_scan_string(proc_source);
pllolcode_yy...
FUNKSHUN KALL HANDLRZ
   Return the proper value
if (returnTypeOID != VOIDOID) {
    if (returnVal->type == ident_NOOB) fc...
FUNKSHUN KALL HANDLRZ
 Note the function manager interface (fmgr.[h|c])
   InputFunctionCall
       Calls a previously det...
CHEEZBURGER

 

       SPEEK LOLCODE

    FUNKSHUN KALL HANDLRZ
      MAEK UN INTURPRETR
       ANATUMMY OF A PL
MAEK UN INTURPRETR

 Don't write the parser on your own.
   It's painful
   It's a lot of work
   It's easy to do wrong
  ...
MAEK UN INTURPRETR
 Tools for writing parsers and interpreters
    There are a fair number of them
       ANTLR
       Lex...
MAEK UN INTURPRETR
    Parser generation tools require a language definition
       Generally in a variant of BNF (Backus-...
MAEK UN INTURPRETR

 Flex + Bison == two parts 
    Flex = lexer
        Divides input into a set of labeled tokens
    Bi...
MAEK UN INTURPRETR

    Flex input is mostly straightforward
 
           /* Arithmetic functions */
quot;SUM OFquot;     ...
MAEK UN INTURPRETR

    State machine allows for processing blocks
 
OBTW                     { BEGIN LOL_BLOCKCMT; }
<LOL...
MAEK UN INTURPRETR

  Some code to make memory allocation work
 
#define   YYSTACK_FREE pfree
#define   YYSTACK_MALLOC pal...
MAEK UN INTURPRETR
     Flex makes a big stream of tokens
     Bison turns them into something useful
 
     Define the to...
MAEK UN INTURPRETR

 Bison's job (in this case) is to build a parse tree
    We don't actually execute anything until the ...
MAEK UN INTURPRETR
typedef struct lolNodeList {
        int length;
        lolNode *head;
        lolNode *tail;
} lolNod...
MAEK UN INTURPRETR

 Each lolNode is a particular statement
    VISIBLE
    ORLY
    IHASA
    ...
    Also, constants and...
MAEK UN INTURPRETR

 TROOF node
   Really simple -- handles a TROOF (Boolean) constant
   NodeType node_type = TROOF
   No...
MAEK UN INTURPRETR
 VISIBLE nodes
   Remember, VISIBLE rasies an exception
   Structure members:
      NodeType node_type ...
MAEK UN INTURPRETR

 SUMOF node
   SUM OF x AN y = x + y
   Structure members:
      lolNodeList *list == a two-element li...
MAEK UN INTURPRETR

 ORLY node
   O RLY? == if, YA RLY == then, MEBBE == else if, NO
   WAI == else
   lolNodeList *list p...
MAEK UN INTURPRETR

 So what does Bison need to make this work?
MAEK UN INTURPRETR

    Token definitions
       These describe the data Flex generates
 
%token FOUNDYR IHASA ITZ R
%toke...
MAEK UN INTURPRETR
     A definition of the grammar
 
lol_cmd:

        lol_expression EXPREND   { $$ = $1; }

        | O...
MAEK UN INTURPRETR

    A definition of the data type used in Bison's stack
 
%union     {
           int numbrVal;
      ...
MAEK UN INTURPRETR

    Type declarations to map various declared expressions
    to union members
 
%type   <nodeList> lo...
MAEK UN INTURPRETR

    As the parser sees token streams it can interpret as a
    particular object in the grammar, it ru...
MISELANEEUS

    DEVELOPMENT
 


    ENVIRONMENT




              INTRO TO SPI
SPI
   GIMMEH <var> OUTTA DATABUKKIT quot;<query>quot;
      Note that this only returns one value
   PL/LOLCODE uses a sp...
Datums
 Everything is a Datum
    What a Datum actually is is architecture-dependant
    sizeof(Datum) == sizeof(long) >= ...
Development Environment
 Configure with --enable-debug, --enable-cassert
     --enable-debug
        Include debugging sym...
I CAN HAZ NAP NOW?
U HAZ KWESCHUNZ?
KTHXBYE
Development Environment

jtolley@uber:~/devel/pgsql$ ps -ef | grep postgres
jtolley   7465 6283 1 13:23 ?          00:00:0...
Developing A Procedural Language For Postgre Sql
Developing A Procedural Language For Postgre Sql
Developing A Procedural Language For Postgre Sql
Developing A Procedural Language For Postgre Sql
Developing A Procedural Language For Postgre Sql
Upcoming SlideShare
Loading in …5
×

Developing A Procedural Language For Postgre Sql

2,822 views
2,696 views

Published on

Presentation presented by Joshua Tolley and PostgreSQL Conference West 08. Corresponding video is here: http://www.vimeo.com/3728119

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,822
On SlideShare
0
From Embeds
0
Number of Embeds
204
Actions
Shares
0
Downloads
0
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Developing A Procedural Language For Postgre Sql

  1. 1. Developing a Procedural Language for PostgreSQL  
  2. 2. I CAN HAZ P.L.Z PLZ?  
  3. 3. MKAY... WHY? To learn how
  4. 4. MKAY... WHY? To learn how To get to speak at conferences
  5. 5. MKAY... WHY? To learn how To get to speak at conferences Achieve world domination
  6. 6. MKAY... WHY? To learn how To get to speak at conferences Achieve world domination
  7. 7. UM, WHAT?
  8. 8. WHAT'S LOLCODE? 1 Oh hai. In teh beginnin 1 In the beginning, some Guy Ceiling Cat maded teh skiez made an Image of a cat. And An da Urfs, but he did not the picture was without eated dem. caption, and void. 2 Da Urfs no had shapez An 2 And said Guy did make a haded dark face, An Ceiling caption unto his Image. And Cat rode invisible bike over it was funny, and the Guy did teh waterz. call it good. 3 At start, no has lyte. An 3 And this guy's Image did Ceiling Cat sayz, i can haz become an internet meme, all lite? An lite wuz. Web 2.0-ish and stuff. 4 An Ceiling Cat sawed teh 4 There was much rejoicing. lite, to seez stuffs...
  9. 9. WHAT'S LOLCODE A lolcat is an image combining a photograph, most frequently a cat, with a humorous and idiosyncratic caption in (often) broken English—a dialect which is known as quot;lolspeakquot;, or quot;kittehquot;. The name quot;lolcatquot; is a compound word of quot;LOLquot; and quot;catquot;.[1] Another name is cat macro, being a type of image macro.[2] Lolcats are created for photo sharing imageboards and other internet forums. Lolcats are similar to other anthropomorphic animal-based image macros such as the O RLY?  owl.[3]
  10. 10. UM... OKAY...
  11. 11. SO, LOLCODE? Adam Lindsay had a revelation. From Ceiling Cat. quot;This is a love letter to very clever people who are slightly bored. I had no idea there were so many of us out there.quot;   - FAQ, lolcode.com
  12. 12. CHEEZBURGER  
  13. 13. CHEEZBURGER   SPEEK LOLCODE
  14. 14. CHEEZBURGER   SPEEK LOLCODE ANATUMMY OF A PL
  15. 15. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ ANATUMMY OF A PL
  16. 16. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ MAEK UN INTURPRETR ANATUMMY OF A PL
  17. 17. CHEEZBURGER   SPEEK LOLCODE
  18. 18. I CAN SPEEK LOLCODE Data types NUMBR: Integer values NUMBAR: Floating point values YARN: String values NOOB: Null values TROOF: Boolean values (WIN / FAIL)
  19. 19. I CAN SPEEK LOLCODE Operators Arithmetic SUM OF x AN y, DIFF OF x AN y PRODUKT OF x AN y, QUOSHUNT OF x AN y MOD OF x AN y BIGGR OF x AN y, SMALLR OF x AN y   Boolean BOTH OF x AN y EITHER OF x AN y WON OF x AN y ALL OF x AN y [AN ...] MKAY ANY OF x AN y [AN ...] MKAY NOT x
  20. 20. I CAN SPEEK LOLCODE Operators (cont.) Comparison BOTH SAEM x AN y WIN iff x == y DIFFRINT x AN y FAIL iff x == y Concatenation SMOOSH x y z p d q ... MKAY  Concatenates infinitely many YARNs Casting MAEK x A <type> x IS NOW A <type>
  21. 21. I CAN SPEEK LOLCODE Comparison   BOTH SAEM ANIMAL AN quot;CATquot;, O RLY? YA RLY, VISIBLE quot;J00 HAV A CATquot; MEBBE BOTH SAEM ANIMAL AN quot;MAUSquot; VISIBLE quot;JOO HAV A MAUS? WTF?quot; NO WAI, VISIBLE quot;J00 SUXquot; OIC
  22. 22. I CAN SPEEK LOLCODE Case COLOR, WTF? OMG quot;Rquot; VISIBLE quot;RED FISHquot; GTFO OMG quot;Yquot; VISIBLE quot;YELLOW FISHquot; OMG quot;Gquot; OMG quot;Bquot; VISIBLE quot;FISH HAS A FLAVORquot; GTFO OMGWTF VISIBLE quot;FISH IS TRANSPARENTquot; OIC
  23. 23. I CAN SPEEK LOLCODE Loops   IM IN YR <label> <operation> YR <variable> [TIL|WILE <expression>] <code block> IM OUTTA YR <label>   IT variable The default when nothing else is specified Similar to Perl's $_
  24. 24. I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) Function arguments are named LOL1, LOL2, etc. automatically PL/LOLCODE isn't smart enough yet to let you give them arbitrary names
  25. 25. I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) LIAR LIAR PANTZ Function arguments are named LOL1, LOL2, etc. automatically ON FIRE!!!1 PL/LOLCODE isn't smart enough yet to let you give them arbitrary names
  26. 26. I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) Function arguments are named LOL1, LOL2, etc. automatically PL/LOLCODE isn't smart enough yet to let you give them arbitrary names As of a few nights ago, this is no longer true. W00T!!1
  27. 27. I CAN SPEEK PL/LOLCODE HAI I HAS A X ITZ SMOOSH quot;MONORAILquot; quot;KITTYquot; MKAY I HAS A Y ITZ 3 I HAS A Z ITZ BIGGR OF Y AN 15 BOTH SAEM Z AN Y, O RLY? YA RLY, VISIBLE quot;SUMTHINZ RONG!!quot; NO WAI, VISIBLE quot;THX. CAN I HAZ CHEEZBURGR?quot; OIC IM IN YR MYLOOP Y R DIFF OF Y AN 1 DIFFRINT Y AN BIGGR OF Y AN 5, O RLY? YA RLY, GTFO OIC IM OUTTA YR MYLOOP KTHXBYE
  28. 28. CHEEZBURGER   SPEEK LOLCODE ANATUMMY OF A PL
  29. 29. ANATUMMY OF A PL Function call handler plpgsql_call_handler, plperl_call_handler, etc. This function is responsible for executing each function call for a particular language CREATE LANGUAGE Must be written in a compiled language Must take language_handler as a parameter Function validator plpgsql_validator, plperl_validator Optional Validates functions when they're created Takes an OID argument TRUSTED vs. UNTRUSTED
  30. 30. ANATUMMY OF A PL Where's the interpreter? Some PLs come with their own interpreter PL/pgSQL, SQL/PSM, PL/LOLCODE, PL/Proxy Others use an interpreter from a shared library PL/Perl, PL/Python, PL/R, most everything else Some do other strange things PL/J Talks to a Java virtual machine via RMI calls PLs with built-in interpreters can use PostgreSQL's internal data types to avoid conversion overhead
  31. 31. ANATUMMY OF A PL Where's the interpreter? Some PLs come with their own interpreter PL/pgSQL, SQL/PSM, PL/LOLCODE, PL/Proxy Others use an interpreter from a shared library PL/Perl, PL/Python, PL/R, most everything else Some do other strange things PL/J Talks to a Java virtual machine via RMI calls PLs with built-in interpreters can use PostgreSQL's internal data types to avoid conversion overhead PL/LOLCODE isn't that smart yet
  32. 32. ANATUMMY OF A PL Compilation Commonly a function will be quot;compiledquot; the first time it is run in a session The function call handler looks up argument and return types only once Sometimes keeps the parse tree in memory Avoids having to re-parse the function next time it's called
  33. 33. ANATUMMY OF A PL Compilation Commonly a function will be quot;compiledquot; the first time it is run in a session The function call handler looks up argument and return types only once Sometimes keeps the parse tree in memory Avoids having to re-parse the function next time it's called PL/LOLCODE isn't that smart yet
  34. 34. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ ANATUMMY OF A PL
  35. 35. FUNKSHUN KALL HANDLRZ Job of the function handler Figure out if it's a TRIGGER function Determine argument values Determine return type Find the function source ...alternatively, find a pre-stored parse tree Execute the function Return the proper values
  36. 36. FUNKSHUN KALL HANDLRZ PG_MODULE_MAGIC; Datum pl_lolcode_call_handler (PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1(pl_lolcode_call_handler); Datum pl_lolcode_call_handler(PG_FUNCTION_ARGS) { /* ... */ }
  37. 37. FUNKSHUN KALL HANDLRZ  Get function information Form_pg_proc from server/catalog/pg_proc.h   Form_pg_proc procStruct; HeapTuple procTup; procTup = SearchSysCache(PROCOID, ObjectIdGetDatum(fcinfo- >flinfo->fn_oid), 0, 0, 0); if (!HeapTupleIsValid(procTup)) elog(ERROR, quot;Cache lookup failed for procedure %uquot;, fcinfo->flinfo->fn_oid); procStruct = (Form_pg_proc) GETSTRUCT(procTup); ReleaseSysCache(procTup);
  38. 38. FUNKSHUN KALL HANDLRZ Get function source   procsrcdatum = SysCacheGetAttr(PROCOID, procTup, Anum_pg_proc_prosrc, &isnull); if (isnull) elog(ERROR, quot;Function source is nullquot;); proc_source = DatumGetCString (DirectFunctionCall1(textout, procsrcdatum));
  39. 39. FUNKSHUN KALL HANDLRZ Get return type   typeTup = SearchSysCache(TYPEOID, ObjectIdGetDatum (procStruct->prorettype), 0, 0, 0); if (!HeapTupleIsValid(typeTup)) elog(ERROR, quot;Cache lookup failed for type %uquot;, procStruct->prorettype); typeStruct = (Form_pg_type) GETSTRUCT(typeTup); resultTypeIOParam = getTypeIOParam(typeTup); fmgr_info_cxt(typeStruct->typinput, &flinfo, TopMemoryContext);
  40. 40. FUNKSHUN KALL HANDLRZ Get arguments and their types   for (i = 0; i < procStruct->pronargs; i++) { snprintf(arg_name, 255, quot;LOL%dquot;, i+1); lolDeclareVar(arg_name); LOLifyDatum(fcinfo->arg[i], fcinfo->argnull[i], procStruct->proargtypes.values[i], arg_name); } LOLifyDatum() gets the typeStruct for the argument type, gets the argument value as a C string, and sticks it in PL/LOLCODE's variable table
  41. 41. FUNKSHUN KALL HANDLRZ Pass the function source to the interpreter   pllolcode_yy_scan_string(proc_source); pllolcode_yyparse(NULL); pllolcode_yylex_destroy();   More on this later
  42. 42. FUNKSHUN KALL HANDLRZ Return the proper value if (returnTypeOID != VOIDOID) { if (returnVal->type == ident_NOOB) fcinfo->isnull = true; else { if (returnTypeOID == BOOLOID) retval = InputFunctionCall(&flinfo, lolVarGetTroof(returnVal) == lolWIN ? quot;Tquot; : quot;Fquot;, resultTypeIOParam, -1); else { /* ... */ retval = InputFunctionCall(&flinfo, rettmp, resultTypeIOParam, -1); } } }
  43. 43. FUNKSHUN KALL HANDLRZ Note the function manager interface (fmgr.[h|c]) InputFunctionCall Calls a previously determined datatype input function OutputFunctionCall Opposite of InputFunctionCall DirectFunctionCall[1..9] Call a specific, named function with various numbers of arguments Require FmgrInfo structures to describe the functions to be called Read through fmgr.h and fmgr.c for more details
  44. 44. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ MAEK UN INTURPRETR ANATUMMY OF A PL
  45. 45. MAEK UN INTURPRETR Don't write the parser on your own. It's painful It's a lot of work It's easy to do wrong Someone else already has Writing the interpreter on your own is ok, but use a tool I wrote an LOLCODE interpreter to ... Learn how Be sure it was licensed the way I wanted it licensed (BSD)
  46. 46. MAEK UN INTURPRETR Tools for writing parsers and interpreters There are a fair number of them ANTLR Lex + Yacc Flex + Bison Parsers, language definitions, etc. are complex business Top-down vs. Bottom-up, Associativity rules... I chose Flex + Bison, because that's what PostgreSQL uses I've no particular expertise to choose based on merits I forgot most of my formal languages class :)
  47. 47. MAEK UN INTURPRETR Parser generation tools require a language definition Generally in a variant of BNF (Backus-Naur Form)   <symbol1> ::= <expression> <postal-address> ::= <name-part> <street-address> <zip-part> <name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part> <personal-part> ::= <first-name> | <initial> quot;.quot; <street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL> <zip-part> ::= <town-name> quot;,quot; <state-code> <ZIP-code> <EOL> <opt-jr-part> ::= quot;Sr.quot; | quot;Jr.quot; | <roman-numeral> | quot;quot;Thanks, Wikipedia
  48. 48. MAEK UN INTURPRETR Flex + Bison == two parts  Flex = lexer Divides input into a set of labeled tokens Bison = parser Given a stream of tokens, interpret them and act accordingly
  49. 49. MAEK UN INTURPRETR Flex input is mostly straightforward     /* Arithmetic functions */ quot;SUM OFquot; { return SUMOF; } quot;DIFF OFquot; { return DIFFOF; } quot;PRODUKT OFquot; { return PRODUKTOF; } quot;QUOSHUNT OFquot; { return QUOSHUNTOF; } Some regular expressions [A-Za-z][A-Za-z0-9_]+ { return IDENTIFIER; }
  50. 50. MAEK UN INTURPRETR State machine allows for processing blocks   OBTW { BEGIN LOL_BLOCKCMT; } <LOL_BLOCKCMT>TLDR { BEGIN 0; lol_lex_debug(quot;yylex: block commentquot;); } <LOL_BLOCKCMT>n { lineNumber++; } <LOL_BLOCKCMT>. {}
  51. 51. MAEK UN INTURPRETR Some code to make memory allocation work   #define YYSTACK_FREE pfree #define YYSTACK_MALLOC palloc #define YYFREE pfree #define YYMALLOC palloc ... %option noyyalloc noyyrealloc noyyfree PostgreSQL uses memory contexts, so the parser has to use them too.
  52. 52. MAEK UN INTURPRETR Flex makes a big stream of tokens Bison turns them into something useful   Define the tokens:   %token HAI KTHXBYE EXPREND %token YARN IDENTIFIER NUMBR NUMBAR %token TROOFWIN TROOFFAIL TROOF Define grammar and associated actions %% lol_program: HAI EXPREND lol_cmdblock KTHXBYE EXPREND { executeProgram($3); }
  53. 53. MAEK UN INTURPRETR Bison's job (in this case) is to build a parse tree We don't actually execute anything until the tree is built In some future day, we'll save the parse tree after the first execution, and avoid re-parsing each time a function is called Once the tree is built, we'll pass it to a function that will execute it Specifically, the data structure is a linked list where each node can point to the next node in the list as well as to a new list This is similar to the structure PL/pgSQL uses
  54. 54. MAEK UN INTURPRETR typedef struct lolNodeList { int length; lolNode *head; lolNode *tail; } lolNodeList; struct lolNode { NodeType node_type; struct lolNode *next; /* list for sub-nodes */ lolNodeList *list; NodeVal nodeVal; };
  55. 55. MAEK UN INTURPRETR Each lolNode is a particular statement VISIBLE ORLY IHASA ... Also, constants and variables are nodes YARN NUMBAR ... How each node uses the members of the lolNode struct depends on the node
  56. 56. MAEK UN INTURPRETR TROOF node Really simple -- handles a TROOF (Boolean) constant NodeType node_type = TROOF NodeVal nodeVal = WIN or FAIL When executed, this node simply sets IT to the WIN or FAIL value stored in nodeVal
  57. 57. MAEK UN INTURPRETR VISIBLE nodes Remember, VISIBLE rasies an exception Structure members: NodeType node_type = VISIBLE lolNodeList *list = pointer to a list of nodes containing: A constant string (a 1-node list) An expression evaluating to a string (possibly many nodes) NodeVal nodeVal = a number representing the message level (NOTICE, WARNING, etc.) When executed, this node runs the list, which puts a value in IT, and raises IT as an exception
  58. 58. MAEK UN INTURPRETR SUMOF node SUM OF x AN y = x + y Structure members: lolNodeList *list == a two-element list, where each node represents an operand The result is returned in IT
  59. 59. MAEK UN INTURPRETR ORLY node O RLY? == if, YA RLY == then, MEBBE == else if, NO WAI == else lolNodeList *list points to a set of nodes, one per YA RLY, MEBBE, and NO WAI block. Each YA RLY, MEBBE, and NO WAI block has a sub-list of its own, containing the instructions in the block MEBBE blocks also contain a list of nodes containing the MEBBE expression
  60. 60. MAEK UN INTURPRETR So what does Bison need to make this work?
  61. 61. MAEK UN INTURPRETR Token definitions These describe the data Flex generates   %token FOUNDYR IHASA ITZ R %token SUMOF DIFFOF PRODUKTOF QUOSHUNTOF MODOF BIGGROF SMALLROF AN %token BOTHOF EITHEROF WONOF NOT ALLOF ANYOF MKAY %token BOTHSAEM DIFFRINT These will turn into #define statements (e.g. #define DIFFRINT 123)
  62. 62. MAEK UN INTURPRETR A definition of the grammar   lol_cmd: lol_expression EXPREND { $$ = $1; } | ORLY EXPREND lol_orly_block OIC EXPREND { $$ = lolMakeNode(ORLY, tmpnval, $3); } | WTF EXPREND lol_wtf_block OIC EXPREND { $$ = lolMakeNode(WTF, tmpnval, $3); } | IMINYR IDENTIFIER EXPREND lol_cmdblock IMOUTTAYR IDENTIFIER EXPREND { /* TODO: Check $2 and $7 for equality */ $$ = lolMakeNode(IMINYR, tmpnval, $4); }
  63. 63. MAEK UN INTURPRETR A definition of the data type used in Bison's stack   %union { int numbrVal; double numbarVal; char *yarnVal; struct lolNodeList *nodeList; struct lolNode *node; } This is pretty much always a union
  64. 64. MAEK UN INTURPRETR Type declarations to map various declared expressions to union members   %type <nodeList> lol_program lol_cmdblock lol_orly_block %type <node> lol_const lol_var lol_cmd lol_xaryoprgroup %type <yarnVal> YARN %type <yarnVal> IDENTIFIER
  65. 65. MAEK UN INTURPRETR As the parser sees token streams it can interpret as a particular object in the grammar, it runs the code associated with that object, using $1, $2, etc. to identify objects on the stack   lol_expression: ... | IHASA lol_var ITZ lol_expression { $$ = lolMakeNode (DECL, ($2)->nodeVal, lolMakeList($4)); }
  66. 66. MISELANEEUS DEVELOPMENT   ENVIRONMENT INTRO TO SPI
  67. 67. SPI GIMMEH <var> OUTTA DATABUKKIT quot;<query>quot; Note that this only returns one value PL/LOLCODE uses a special parse tree node for SPI The expression that generates the query is stored in the node's sub-list Executing that list puts the text of the query in IT SPI sends that query to the server for processing The first column of the first row is stuck in the variable in the GIMMEH statement SPIval = SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1); LOLifyString(SPIval, SPI_gettypeid(SPI_tuptable->tupdesc, 1), node- >nodeVal.yarnVal);
  68. 68. Datums Everything is a Datum What a Datum actually is is architecture-dependant sizeof(Datum) == sizeof(long) >= sizeof(void *) >= 4 Different data types are stored in Datums in different ways You need to know what data type the datum is supposed to store SPI_tuptable is a global pointer to a structure that describes the last tuple returned by SPI Based on a Datum and knowledge of the type it represents, you can get useful values from it There are lots of functions, macros, etc. for processing Datums
  69. 69. Development Environment Configure with --enable-debug, --enable-cassert  --enable-debug Include debugging symbols in final library This allows things like gdb to know where in the program you are, and tell you about it Disables optimizations Makes executables bigger --enable-cassert Adds (in)sanity checks Makes things slower More likely to tell you if you do something really stupid cf. memory context problems
  70. 70. I CAN HAZ NAP NOW?
  71. 71. U HAZ KWESCHUNZ?
  72. 72. KTHXBYE
  73. 73. Development Environment jtolley@uber:~/devel/pgsql$ ps -ef | grep postgres jtolley 7465 6283 1 13:23 ? 00:00:00 postgres: jtolley jtolley [local] idle jtolley@uber:~/devel/pgsql$ gdb postgres 7465 GNU gdb 6.8-debian ... Attaching to program: /home/jtolley/devel/pgdb/bin/postgres, process 7465 Reading symbols from /usr/lib/libxml2.so.2...done. Loaded symbols for /usr/lib/libxml2.so.2 ... Loaded symbols for /home/jtolley/devel/pgdb/lib/postgresql/pllolcode.so 0xb8003430 in __kernel_vsyscall () (gdb)

×