Developing A Procedural Language For Postgre Sql

2,947 views

Published on

Presentation presented by Joshua Tolley and PostgreSQL Conference West 08. Corresponding video is here: http://www.vimeo.com/3728119

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,947
On SlideShare
0
From Embeds
0
Number of Embeds
213
Actions
Shares
0
Downloads
0
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Developing A Procedural Language For Postgre Sql

  1. 1. Developing a Procedural Language for PostgreSQL  
  2. 2. I CAN HAZ P.L.Z PLZ?  
  3. 3. MKAY... WHY? To learn how
  4. 4. MKAY... WHY? To learn how To get to speak at conferences
  5. 5. MKAY... WHY? To learn how To get to speak at conferences Achieve world domination
  6. 6. MKAY... WHY? To learn how To get to speak at conferences Achieve world domination
  7. 7. UM, WHAT?
  8. 8. WHAT'S LOLCODE? 1 Oh hai. In teh beginnin 1 In the beginning, some Guy Ceiling Cat maded teh skiez made an Image of a cat. And An da Urfs, but he did not the picture was without eated dem. caption, and void. 2 Da Urfs no had shapez An 2 And said Guy did make a haded dark face, An Ceiling caption unto his Image. And Cat rode invisible bike over it was funny, and the Guy did teh waterz. call it good. 3 At start, no has lyte. An 3 And this guy's Image did Ceiling Cat sayz, i can haz become an internet meme, all lite? An lite wuz. Web 2.0-ish and stuff. 4 An Ceiling Cat sawed teh 4 There was much rejoicing. lite, to seez stuffs...
  9. 9. WHAT'S LOLCODE A lolcat is an image combining a photograph, most frequently a cat, with a humorous and idiosyncratic caption in (often) broken English—a dialect which is known as quot;lolspeakquot;, or quot;kittehquot;. The name quot;lolcatquot; is a compound word of quot;LOLquot; and quot;catquot;.[1] Another name is cat macro, being a type of image macro.[2] Lolcats are created for photo sharing imageboards and other internet forums. Lolcats are similar to other anthropomorphic animal-based image macros such as the O RLY?  owl.[3]
  10. 10. UM... OKAY...
  11. 11. SO, LOLCODE? Adam Lindsay had a revelation. From Ceiling Cat. quot;This is a love letter to very clever people who are slightly bored. I had no idea there were so many of us out there.quot;   - FAQ, lolcode.com
  12. 12. CHEEZBURGER  
  13. 13. CHEEZBURGER   SPEEK LOLCODE
  14. 14. CHEEZBURGER   SPEEK LOLCODE ANATUMMY OF A PL
  15. 15. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ ANATUMMY OF A PL
  16. 16. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ MAEK UN INTURPRETR ANATUMMY OF A PL
  17. 17. CHEEZBURGER   SPEEK LOLCODE
  18. 18. I CAN SPEEK LOLCODE Data types NUMBR: Integer values NUMBAR: Floating point values YARN: String values NOOB: Null values TROOF: Boolean values (WIN / FAIL)
  19. 19. I CAN SPEEK LOLCODE Operators Arithmetic SUM OF x AN y, DIFF OF x AN y PRODUKT OF x AN y, QUOSHUNT OF x AN y MOD OF x AN y BIGGR OF x AN y, SMALLR OF x AN y   Boolean BOTH OF x AN y EITHER OF x AN y WON OF x AN y ALL OF x AN y [AN ...] MKAY ANY OF x AN y [AN ...] MKAY NOT x
  20. 20. I CAN SPEEK LOLCODE Operators (cont.) Comparison BOTH SAEM x AN y WIN iff x == y DIFFRINT x AN y FAIL iff x == y Concatenation SMOOSH x y z p d q ... MKAY  Concatenates infinitely many YARNs Casting MAEK x A <type> x IS NOW A <type>
  21. 21. I CAN SPEEK LOLCODE Comparison   BOTH SAEM ANIMAL AN quot;CATquot;, O RLY? YA RLY, VISIBLE quot;J00 HAV A CATquot; MEBBE BOTH SAEM ANIMAL AN quot;MAUSquot; VISIBLE quot;JOO HAV A MAUS? WTF?quot; NO WAI, VISIBLE quot;J00 SUXquot; OIC
  22. 22. I CAN SPEEK LOLCODE Case COLOR, WTF? OMG quot;Rquot; VISIBLE quot;RED FISHquot; GTFO OMG quot;Yquot; VISIBLE quot;YELLOW FISHquot; OMG quot;Gquot; OMG quot;Bquot; VISIBLE quot;FISH HAS A FLAVORquot; GTFO OMGWTF VISIBLE quot;FISH IS TRANSPARENTquot; OIC
  23. 23. I CAN SPEEK LOLCODE Loops   IM IN YR <label> <operation> YR <variable> [TIL|WILE <expression>] <code block> IM OUTTA YR <label>   IT variable The default when nothing else is specified Similar to Perl's $_
  24. 24. I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) Function arguments are named LOL1, LOL2, etc. automatically PL/LOLCODE isn't smart enough yet to let you give them arbitrary names
  25. 25. I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) LIAR LIAR PANTZ Function arguments are named LOL1, LOL2, etc. automatically ON FIRE!!!1 PL/LOLCODE isn't smart enough yet to let you give them arbitrary names
  26. 26. I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) Function arguments are named LOL1, LOL2, etc. automatically PL/LOLCODE isn't smart enough yet to let you give them arbitrary names As of a few nights ago, this is no longer true. W00T!!1
  27. 27. I CAN SPEEK PL/LOLCODE HAI I HAS A X ITZ SMOOSH quot;MONORAILquot; quot;KITTYquot; MKAY I HAS A Y ITZ 3 I HAS A Z ITZ BIGGR OF Y AN 15 BOTH SAEM Z AN Y, O RLY? YA RLY, VISIBLE quot;SUMTHINZ RONG!!quot; NO WAI, VISIBLE quot;THX. CAN I HAZ CHEEZBURGR?quot; OIC IM IN YR MYLOOP Y R DIFF OF Y AN 1 DIFFRINT Y AN BIGGR OF Y AN 5, O RLY? YA RLY, GTFO OIC IM OUTTA YR MYLOOP KTHXBYE
  28. 28. CHEEZBURGER   SPEEK LOLCODE ANATUMMY OF A PL
  29. 29. ANATUMMY OF A PL Function call handler plpgsql_call_handler, plperl_call_handler, etc. This function is responsible for executing each function call for a particular language CREATE LANGUAGE Must be written in a compiled language Must take language_handler as a parameter Function validator plpgsql_validator, plperl_validator Optional Validates functions when they're created Takes an OID argument TRUSTED vs. UNTRUSTED
  30. 30. ANATUMMY OF A PL Where's the interpreter? Some PLs come with their own interpreter PL/pgSQL, SQL/PSM, PL/LOLCODE, PL/Proxy Others use an interpreter from a shared library PL/Perl, PL/Python, PL/R, most everything else Some do other strange things PL/J Talks to a Java virtual machine via RMI calls PLs with built-in interpreters can use PostgreSQL's internal data types to avoid conversion overhead
  31. 31. ANATUMMY OF A PL Where's the interpreter? Some PLs come with their own interpreter PL/pgSQL, SQL/PSM, PL/LOLCODE, PL/Proxy Others use an interpreter from a shared library PL/Perl, PL/Python, PL/R, most everything else Some do other strange things PL/J Talks to a Java virtual machine via RMI calls PLs with built-in interpreters can use PostgreSQL's internal data types to avoid conversion overhead PL/LOLCODE isn't that smart yet
  32. 32. ANATUMMY OF A PL Compilation Commonly a function will be quot;compiledquot; the first time it is run in a session The function call handler looks up argument and return types only once Sometimes keeps the parse tree in memory Avoids having to re-parse the function next time it's called
  33. 33. ANATUMMY OF A PL Compilation Commonly a function will be quot;compiledquot; the first time it is run in a session The function call handler looks up argument and return types only once Sometimes keeps the parse tree in memory Avoids having to re-parse the function next time it's called PL/LOLCODE isn't that smart yet
  34. 34. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ ANATUMMY OF A PL
  35. 35. FUNKSHUN KALL HANDLRZ Job of the function handler Figure out if it's a TRIGGER function Determine argument values Determine return type Find the function source ...alternatively, find a pre-stored parse tree Execute the function Return the proper values
  36. 36. FUNKSHUN KALL HANDLRZ PG_MODULE_MAGIC; Datum pl_lolcode_call_handler (PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1(pl_lolcode_call_handler); Datum pl_lolcode_call_handler(PG_FUNCTION_ARGS) { /* ... */ }
  37. 37. FUNKSHUN KALL HANDLRZ  Get function information Form_pg_proc from server/catalog/pg_proc.h   Form_pg_proc procStruct; HeapTuple procTup; procTup = SearchSysCache(PROCOID, ObjectIdGetDatum(fcinfo- >flinfo->fn_oid), 0, 0, 0); if (!HeapTupleIsValid(procTup)) elog(ERROR, quot;Cache lookup failed for procedure %uquot;, fcinfo->flinfo->fn_oid); procStruct = (Form_pg_proc) GETSTRUCT(procTup); ReleaseSysCache(procTup);
  38. 38. FUNKSHUN KALL HANDLRZ Get function source   procsrcdatum = SysCacheGetAttr(PROCOID, procTup, Anum_pg_proc_prosrc, &isnull); if (isnull) elog(ERROR, quot;Function source is nullquot;); proc_source = DatumGetCString (DirectFunctionCall1(textout, procsrcdatum));
  39. 39. FUNKSHUN KALL HANDLRZ Get return type   typeTup = SearchSysCache(TYPEOID, ObjectIdGetDatum (procStruct->prorettype), 0, 0, 0); if (!HeapTupleIsValid(typeTup)) elog(ERROR, quot;Cache lookup failed for type %uquot;, procStruct->prorettype); typeStruct = (Form_pg_type) GETSTRUCT(typeTup); resultTypeIOParam = getTypeIOParam(typeTup); fmgr_info_cxt(typeStruct->typinput, &flinfo, TopMemoryContext);
  40. 40. FUNKSHUN KALL HANDLRZ Get arguments and their types   for (i = 0; i < procStruct->pronargs; i++) { snprintf(arg_name, 255, quot;LOL%dquot;, i+1); lolDeclareVar(arg_name); LOLifyDatum(fcinfo->arg[i], fcinfo->argnull[i], procStruct->proargtypes.values[i], arg_name); } LOLifyDatum() gets the typeStruct for the argument type, gets the argument value as a C string, and sticks it in PL/LOLCODE's variable table
  41. 41. FUNKSHUN KALL HANDLRZ Pass the function source to the interpreter   pllolcode_yy_scan_string(proc_source); pllolcode_yyparse(NULL); pllolcode_yylex_destroy();   More on this later
  42. 42. FUNKSHUN KALL HANDLRZ Return the proper value if (returnTypeOID != VOIDOID) { if (returnVal->type == ident_NOOB) fcinfo->isnull = true; else { if (returnTypeOID == BOOLOID) retval = InputFunctionCall(&flinfo, lolVarGetTroof(returnVal) == lolWIN ? quot;Tquot; : quot;Fquot;, resultTypeIOParam, -1); else { /* ... */ retval = InputFunctionCall(&flinfo, rettmp, resultTypeIOParam, -1); } } }
  43. 43. FUNKSHUN KALL HANDLRZ Note the function manager interface (fmgr.[h|c]) InputFunctionCall Calls a previously determined datatype input function OutputFunctionCall Opposite of InputFunctionCall DirectFunctionCall[1..9] Call a specific, named function with various numbers of arguments Require FmgrInfo structures to describe the functions to be called Read through fmgr.h and fmgr.c for more details
  44. 44. CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ MAEK UN INTURPRETR ANATUMMY OF A PL
  45. 45. MAEK UN INTURPRETR Don't write the parser on your own. It's painful It's a lot of work It's easy to do wrong Someone else already has Writing the interpreter on your own is ok, but use a tool I wrote an LOLCODE interpreter to ... Learn how Be sure it was licensed the way I wanted it licensed (BSD)
  46. 46. MAEK UN INTURPRETR Tools for writing parsers and interpreters There are a fair number of them ANTLR Lex + Yacc Flex + Bison Parsers, language definitions, etc. are complex business Top-down vs. Bottom-up, Associativity rules... I chose Flex + Bison, because that's what PostgreSQL uses I've no particular expertise to choose based on merits I forgot most of my formal languages class :)
  47. 47. MAEK UN INTURPRETR Parser generation tools require a language definition Generally in a variant of BNF (Backus-Naur Form)   <symbol1> ::= <expression> <postal-address> ::= <name-part> <street-address> <zip-part> <name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part> <personal-part> ::= <first-name> | <initial> quot;.quot; <street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL> <zip-part> ::= <town-name> quot;,quot; <state-code> <ZIP-code> <EOL> <opt-jr-part> ::= quot;Sr.quot; | quot;Jr.quot; | <roman-numeral> | quot;quot;Thanks, Wikipedia
  48. 48. MAEK UN INTURPRETR Flex + Bison == two parts  Flex = lexer Divides input into a set of labeled tokens Bison = parser Given a stream of tokens, interpret them and act accordingly
  49. 49. MAEK UN INTURPRETR Flex input is mostly straightforward     /* Arithmetic functions */ quot;SUM OFquot; { return SUMOF; } quot;DIFF OFquot; { return DIFFOF; } quot;PRODUKT OFquot; { return PRODUKTOF; } quot;QUOSHUNT OFquot; { return QUOSHUNTOF; } Some regular expressions [A-Za-z][A-Za-z0-9_]+ { return IDENTIFIER; }
  50. 50. MAEK UN INTURPRETR State machine allows for processing blocks   OBTW { BEGIN LOL_BLOCKCMT; } <LOL_BLOCKCMT>TLDR { BEGIN 0; lol_lex_debug(quot;yylex: block commentquot;); } <LOL_BLOCKCMT>n { lineNumber++; } <LOL_BLOCKCMT>. {}
  51. 51. MAEK UN INTURPRETR Some code to make memory allocation work   #define YYSTACK_FREE pfree #define YYSTACK_MALLOC palloc #define YYFREE pfree #define YYMALLOC palloc ... %option noyyalloc noyyrealloc noyyfree PostgreSQL uses memory contexts, so the parser has to use them too.
  52. 52. MAEK UN INTURPRETR Flex makes a big stream of tokens Bison turns them into something useful   Define the tokens:   %token HAI KTHXBYE EXPREND %token YARN IDENTIFIER NUMBR NUMBAR %token TROOFWIN TROOFFAIL TROOF Define grammar and associated actions %% lol_program: HAI EXPREND lol_cmdblock KTHXBYE EXPREND { executeProgram($3); }
  53. 53. MAEK UN INTURPRETR Bison's job (in this case) is to build a parse tree We don't actually execute anything until the tree is built In some future day, we'll save the parse tree after the first execution, and avoid re-parsing each time a function is called Once the tree is built, we'll pass it to a function that will execute it Specifically, the data structure is a linked list where each node can point to the next node in the list as well as to a new list This is similar to the structure PL/pgSQL uses
  54. 54. MAEK UN INTURPRETR typedef struct lolNodeList { int length; lolNode *head; lolNode *tail; } lolNodeList; struct lolNode { NodeType node_type; struct lolNode *next; /* list for sub-nodes */ lolNodeList *list; NodeVal nodeVal; };
  55. 55. MAEK UN INTURPRETR Each lolNode is a particular statement VISIBLE ORLY IHASA ... Also, constants and variables are nodes YARN NUMBAR ... How each node uses the members of the lolNode struct depends on the node
  56. 56. MAEK UN INTURPRETR TROOF node Really simple -- handles a TROOF (Boolean) constant NodeType node_type = TROOF NodeVal nodeVal = WIN or FAIL When executed, this node simply sets IT to the WIN or FAIL value stored in nodeVal
  57. 57. MAEK UN INTURPRETR VISIBLE nodes Remember, VISIBLE rasies an exception Structure members: NodeType node_type = VISIBLE lolNodeList *list = pointer to a list of nodes containing: A constant string (a 1-node list) An expression evaluating to a string (possibly many nodes) NodeVal nodeVal = a number representing the message level (NOTICE, WARNING, etc.) When executed, this node runs the list, which puts a value in IT, and raises IT as an exception
  58. 58. MAEK UN INTURPRETR SUMOF node SUM OF x AN y = x + y Structure members: lolNodeList *list == a two-element list, where each node represents an operand The result is returned in IT
  59. 59. MAEK UN INTURPRETR ORLY node O RLY? == if, YA RLY == then, MEBBE == else if, NO WAI == else lolNodeList *list points to a set of nodes, one per YA RLY, MEBBE, and NO WAI block. Each YA RLY, MEBBE, and NO WAI block has a sub-list of its own, containing the instructions in the block MEBBE blocks also contain a list of nodes containing the MEBBE expression
  60. 60. MAEK UN INTURPRETR So what does Bison need to make this work?
  61. 61. MAEK UN INTURPRETR Token definitions These describe the data Flex generates   %token FOUNDYR IHASA ITZ R %token SUMOF DIFFOF PRODUKTOF QUOSHUNTOF MODOF BIGGROF SMALLROF AN %token BOTHOF EITHEROF WONOF NOT ALLOF ANYOF MKAY %token BOTHSAEM DIFFRINT These will turn into #define statements (e.g. #define DIFFRINT 123)
  62. 62. MAEK UN INTURPRETR A definition of the grammar   lol_cmd: lol_expression EXPREND { $$ = $1; } | ORLY EXPREND lol_orly_block OIC EXPREND { $$ = lolMakeNode(ORLY, tmpnval, $3); } | WTF EXPREND lol_wtf_block OIC EXPREND { $$ = lolMakeNode(WTF, tmpnval, $3); } | IMINYR IDENTIFIER EXPREND lol_cmdblock IMOUTTAYR IDENTIFIER EXPREND { /* TODO: Check $2 and $7 for equality */ $$ = lolMakeNode(IMINYR, tmpnval, $4); }
  63. 63. MAEK UN INTURPRETR A definition of the data type used in Bison's stack   %union { int numbrVal; double numbarVal; char *yarnVal; struct lolNodeList *nodeList; struct lolNode *node; } This is pretty much always a union
  64. 64. MAEK UN INTURPRETR Type declarations to map various declared expressions to union members   %type <nodeList> lol_program lol_cmdblock lol_orly_block %type <node> lol_const lol_var lol_cmd lol_xaryoprgroup %type <yarnVal> YARN %type <yarnVal> IDENTIFIER
  65. 65. MAEK UN INTURPRETR As the parser sees token streams it can interpret as a particular object in the grammar, it runs the code associated with that object, using $1, $2, etc. to identify objects on the stack   lol_expression: ... | IHASA lol_var ITZ lol_expression { $$ = lolMakeNode (DECL, ($2)->nodeVal, lolMakeList($4)); }
  66. 66. MISELANEEUS DEVELOPMENT   ENVIRONMENT INTRO TO SPI
  67. 67. SPI GIMMEH <var> OUTTA DATABUKKIT quot;<query>quot; Note that this only returns one value PL/LOLCODE uses a special parse tree node for SPI The expression that generates the query is stored in the node's sub-list Executing that list puts the text of the query in IT SPI sends that query to the server for processing The first column of the first row is stuck in the variable in the GIMMEH statement SPIval = SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1); LOLifyString(SPIval, SPI_gettypeid(SPI_tuptable->tupdesc, 1), node- >nodeVal.yarnVal);
  68. 68. Datums Everything is a Datum What a Datum actually is is architecture-dependant sizeof(Datum) == sizeof(long) >= sizeof(void *) >= 4 Different data types are stored in Datums in different ways You need to know what data type the datum is supposed to store SPI_tuptable is a global pointer to a structure that describes the last tuple returned by SPI Based on a Datum and knowledge of the type it represents, you can get useful values from it There are lots of functions, macros, etc. for processing Datums
  69. 69. Development Environment Configure with --enable-debug, --enable-cassert  --enable-debug Include debugging symbols in final library This allows things like gdb to know where in the program you are, and tell you about it Disables optimizations Makes executables bigger --enable-cassert Adds (in)sanity checks Makes things slower More likely to tell you if you do something really stupid cf. memory context problems
  70. 70. I CAN HAZ NAP NOW?
  71. 71. U HAZ KWESCHUNZ?
  72. 72. KTHXBYE
  73. 73. Development Environment jtolley@uber:~/devel/pgsql$ ps -ef | grep postgres jtolley 7465 6283 1 13:23 ? 00:00:00 postgres: jtolley jtolley [local] idle jtolley@uber:~/devel/pgsql$ gdb postgres 7465 GNU gdb 6.8-debian ... Attaching to program: /home/jtolley/devel/pgdb/bin/postgres, process 7465 Reading symbols from /usr/lib/libxml2.so.2...done. Loaded symbols for /usr/lib/libxml2.so.2 ... Loaded symbols for /home/jtolley/devel/pgdb/lib/postgresql/pllolcode.so 0xb8003430 in __kernel_vsyscall () (gdb)

×