Programming with Python and PostgreSQL

presentation from PgEast 2011

  1. 1. Programming with Python and PostgreSQL Peter Eisentraut peter@eisentraut.org F-Secure Corporation PostgreSQL Conference East 2011 CC-BY
  2. 2. Partitioning • Part I: Client programming (60 min) • Part II: PL/Python (30 min)
  3. 3. Why Python?
  4. 4. Why Python? Pros: • widely used • easy • strong typing • scripting, interactive use • good PostgreSQL support • client and server (PL) interfaces • open source, community-based
  5. 5. Why Python? Pros: • widely used • easy • strong typing • scripting, interactive use • good PostgreSQL support • client and server (PL) interfaces • open source, community-based Cons: • no static syntax checks, must rely on test coverage • Python community has varying interest in RDBMS
  6. 6. Part I: Client Programming
  7. 7. Example
        import psycopg2
        dbconn = psycopg2.connect('dbname=dellstore2')
        cursor = dbconn.cursor()
        cursor.execute("""
            SELECT firstname, lastname
            FROM customers
            ORDER BY 1, 2
            LIMIT 10
        """)
        for row in cursor.fetchall():
            print("Name: %s %s" % (row[0], row[1]))
        cursor.close()
        dbconn.close()
  8. 8. Drivers
        Name            License  Platforms    Py Versions
        Psycopg         LGPL     Unix, Win    2.4–3.2
        PyGreSQL        BSD      Unix, Win    2.3–2.6
        ocpgdb          BSD      Unix         2.3–2.6
        py-postgresql   BSD      pure Python  3.0+
        bpgsql (alpha)  LGPL     pure Python  2.3–2.6
        pg8000          BSD      pure Python  2.5–3.0+
  9. 9. Drivers (same table as on the previous slide) More details • http://wiki.postgresql.org/wiki/Python • http://wiki.python.org/moin/PostgreSQL
  10. 10. DB-API 2.0 • the standard Python database API • all mentioned drivers support it • defined in PEP 249 • discussions: db-sig@python.org • very elementary (from a PostgreSQL perspective) • outdated relative to Python language development • lots of extensions and incompatibilities possible
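Because every DB-API 2.0 driver exposes the same connect/cursor/execute/fetch shape, the pattern can be tried without a PostgreSQL server. The following is a minimal sketch, assuming the standard library's sqlite3 module as a stand-in driver (the talk itself uses Psycopg) and a made-up two-column customers table. It is written for Python 3. Note that sqlite3 uses `?` placeholders where Psycopg uses `%s`, which is one of the incompatibilities the slide alludes to.

```python
import sqlite3


def first_customers(dbconn, limit):
    """Return up to `limit` (firstname, lastname) rows, sorted by name."""
    cursor = dbconn.cursor()
    try:
        cursor.execute(
            "SELECT firstname, lastname FROM customers ORDER BY 1, 2 LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        cursor.close()


# Hypothetical sample data, only for this sketch.
dbconn = sqlite3.connect(":memory:")
setup = dbconn.cursor()
setup.execute("CREATE TABLE customers (firstname TEXT, lastname TEXT)")
setup.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Bob", "Young"), ("Alice", "Zane"), ("Alice", "Abel")],
)
dbconn.commit()

rows = first_customers(dbconn, 2)
for row in rows:
    print("Name: %s %s" % (row[0], row[1]))
```

Only the `connect()` arguments and the placeholder style are driver-specific here; the cursor calls are the part PEP 249 standardizes.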
  11. 11. Higher-Level Interfaces • Zope • SQLAlchemy • Django
  12. 12. Psycopg Facts • Main authors: Federico Di Gregorio, Daniele Varrazzo • License: LGPLv3+ • Web site: http://initd.org/psycopg/ • Documentation: http://initd.org/psycopg/docs/ • Git, Gitweb • Mailing list: psycopg@postgresql.org • Twitter: @psycopg • Latest version: 2.4 (February 27, 2011)
  13. 13. Using the Driver
        import psycopg2
        dbconn = psycopg2.connect(...)
        ...
  14. 14. Driver Independence?
        import psycopg2
        dbconn = psycopg2.connect(...) # hardcodes driver name
  15. 15. Driver Independence?
        import psycopg2 as dbdriver
        dbconn = dbdriver.connect(...)
  16. 16. Driver Independence?
        dbtype = 'psycopg2' # e.g. from config file
        dbdriver = __import__(dbtype)
        dbconn = dbdriver.connect(...)
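The idea of loading the driver by name can be sketched with `importlib.import_module`, which does the same job as the `__import__` call on the slide. This is an illustrative sketch, not part of the talk: `connect_with` is a hypothetical helper, and "sqlite3" stands in for a driver name read from a config file so that the example runs without a PostgreSQL server.

```python
import importlib


def connect_with(driver_name, *args, **kwargs):
    """Import a DB-API 2.0 driver by name and open a connection with it."""
    dbdriver = importlib.import_module(driver_name)
    return dbdriver, dbdriver.connect(*args, **kwargs)


# "sqlite3" stands in for a configured value; with Psycopg installed the
# name would be "psycopg2" and the argument a libpq connection string.
dbdriver, dbconn = connect_with("sqlite3", ":memory:")
print(dbdriver.apilevel, dbdriver.paramstyle)
dbconn.close()
```

The module-level `paramstyle` attribute is worth checking in such code: sqlite3 reports `qmark` while Psycopg reports `pyformat`, so the SQL text itself is not portable even when the import is.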
  17. 17. Connecting
        # libpq-like connection string
        dbconn = psycopg2.connect('dbname=dellstore2 host=localhost port=5432')
        # same
        dbconn = psycopg2.connect(dsn='dbname=dellstore2 host=localhost port=5432')
        # keyword arguments
        # (not all possible libpq options supported)
        dbconn = psycopg2.connect(database='dellstore2', host='localhost', port=5432)
        DB-API 2.0 says: arguments database dependent
  18. 18. “Cursors” cursor = dbconn.cursor() • not a real database cursor, only an API abstraction • think “statement handle”
  19. 19. Server-Side Cursors cursor = dbconn.cursor(name='mycursor') • a real database cursor • use for large result sets
  20. 20. Executing
        # queries
        cursor.execute("""
            SELECT firstname, lastname
            FROM customers
            ORDER BY 1, 2
            LIMIT 10
        """)
        # updates
        cursor.execute("UPDATE customers SET password = NULL")
        print("%d rows updated" % cursor.rowcount)
        # or anything else
        cursor.execute("ANALYZE customers")
  21. 21. Fetching Query Results
        cursor.execute("SELECT firstname, lastname FROM ...")
        cursor.fetchall()
        [('AABBKO', 'DUTOFRPLOK'), ('AABTSI', 'ZFCKMPRVVJ'),
         ('AACOHS', 'EECCQPVTIW'), ('AACVVO', 'CLSXSGZYKS'),
         ('AADVMN', 'MEMQEWYFYE'), ('AADXQD', 'GLEKVVLZFV'),
         ('AAEBUG', 'YUOIINRJGE')]
  22. 22. Fetching Query Results
        cursor.execute("SELECT firstname, lastname FROM ...")
        for row in cursor.fetchall():
            print("Name: %s %s" % (row[0], row[1]))
  23. 23. Fetching Query Results
        cursor.execute("SELECT firstname, lastname FROM ...")
        for row in cursor.fetchall():
            print("Name: %s %s" % (row[0], row[1]))
        Note: field access only by number
  24. 24. Fetching Query Results
        cursor.execute("SELECT firstname, lastname FROM ...")
        row = cursor.fetchone()
        if row is not None:
            print("Name: %s %s" % (row[0], row[1]))
  25. 25. Fetching Query Results
        cursor.execute("SELECT firstname, lastname FROM ...")
        for row in cursor:
            print("Name: %s %s" % (row[0], row[1]))
  26. 26. Fetching Query Results in Batches
        cursor = dbconn.cursor(name='mycursor')
        cursor.arraysize = 500 # default: 1
        cursor.execute("SELECT firstname, lastname FROM ...")
        while True:
            batch = cursor.fetchmany()
            if not batch:
                break
            for row in batch:
                print("Name: %s %s" % (row[0], row[1]))
  27. 27. Fetching Query Results in Batches
        cursor = dbconn.cursor(name='mycursor')
        cursor.execute("SELECT firstname, lastname FROM ...")
        cursor.itersize = 2000 # default
        for row in cursor:
            print("Name: %s %s" % (row[0], row[1]))
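The `fetchmany()` loop from the batch slides can be wrapped in a generator so that callers iterate over rows while the fetching still happens in batches. The sketch below is an illustration under assumptions: `iter_in_batches` is a hypothetical helper, and sqlite3 with a made-up `numbers` table replaces Psycopg so the code runs standalone on Python 3. With Psycopg, the memory benefit only materializes when the cursor is a named (server-side) cursor.

```python
import sqlite3


def iter_in_batches(cursor, batch_size):
    """Yield rows one at a time while fetching them batch_size at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Hypothetical sample data, only for this sketch.
dbconn = sqlite3.connect(":memory:")
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE numbers (n INTEGER)")
cursor.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(7)])

cursor.execute("SELECT n FROM numbers ORDER BY n")
values = [row[0] for row in iter_in_batches(cursor, 3)]
print(values)
```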
  28. 28. Getting Query Metadata
        cursor.execute("SELECT DISTINCT state, zip FROM customers")
        print(cursor.description[0].name)
        print(cursor.description[0].type_code)
        print(cursor.description[1].name)
        print(cursor.description[1].type_code)
        Output:
        state
        1043 # == psycopg2.STRING
        zip
        23 # == psycopg2.NUMBER
  29. 29. Passing Parameters
        cursor.execute("""
            UPDATE customers SET password = %s WHERE customerid = %s
        """, ["sekret", 37])
  30. 30. Passing Parameters
        Not to be confused with (totally evil):
        cursor.execute("""
            UPDATE customers SET password = %s WHERE customerid = %d
        """ % ("sekret", 37))
  31. 31. Passing Parameters
        cursor.execute("INSERT INTO foo VALUES (%s)", "bar") # WRONG
        cursor.execute("INSERT INTO foo VALUES (%s)", ("bar")) # WRONG
        cursor.execute("INSERT INTO foo VALUES (%s)", ("bar",)) # correct
        cursor.execute("INSERT INTO foo VALUES (%s)", ["bar"]) # correct
        (from Psycopg documentation)
  32. 32. Passing Parameters
        cursor.execute("""
            UPDATE customers SET password = %(pw)s WHERE customerid = %(id)s
        """, {'id': 37, 'pw': "sekret"})
  33. 33. Passing Many Parameter Sets
        cursor.executemany("""
            UPDATE customers SET password = %s WHERE customerid = %s
        """, [["ahTh4oip", 100], ["Rexahho7", 101], ["Ee1aetui", 102]])
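The difference between passing parameters to `execute()` and interpolating them into the SQL string is easiest to see with a hostile value. The following is a sketch under assumptions, not code from the talk: it uses sqlite3 (placeholder `?` instead of Psycopg's `%s`) and a made-up customers table so that it runs on Python 3 without a server. The interpolated statement is only built, never executed.

```python
import sqlite3

# Hypothetical sample data, only for this sketch.
dbconn = sqlite3.connect(":memory:")
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE customers (customerid INTEGER, password TEXT)")
cursor.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(37, "old"), (38, "old")])

hostile = "x' WHERE customerid = 38 --"

# Safe: the value travels separately from the SQL text and is stored as data.
cursor.execute("UPDATE customers SET password = ? WHERE customerid = ?",
               (hostile, 37))

# Evil: string interpolation lets the value rewrite the statement.
evil_sql = ("UPDATE customers SET password = '%s' WHERE customerid = %d"
            % (hostile, 37))
print(evil_sql)

cursor.execute("SELECT password FROM customers ORDER BY customerid")
passwords = [row[0] for row in cursor.fetchall()]

# The single-element gotcha from the slide: ("bar") is just a string.
print(type(("bar")).__name__, type(("bar",)).__name__)
```

Had `evil_sql` been executed, it would have changed customer 38 instead of customer 37, because everything after `--` is a comment.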
  34. 34. Calling Procedures cursor.callproc('pg_start_backup', ['label'])
  35. 35. Data Types
        from decimal import Decimal
        from psycopg2 import Date
        cursor.execute("""
        INSERT INTO orders (orderdate, customerid,
         netamount, tax, totalamount)
        VALUES (%s, %s, %s, %s, %s)""",
            [Date(2011, 3, 23), 12345, Decimal("899.95"), 8.875,
             Decimal("979.82")])
  36. 36. Mogrify
        from decimal import Decimal
        from psycopg2 import Date
        cursor.mogrify("""
        INSERT INTO orders (orderdate, customerid,
         netamount, tax, totalamount)
        VALUES (%s, %s, %s, %s, %s)""",
            [Date(2011, 3, 23), 12345, Decimal("899.95"), 8.875,
             Decimal("979.82")])
        Result:
        "\nINSERT INTO orders (orderdate, customerid,\n netamount, tax, totalamount)\nVALUES ('2011-03-23'::date, 12345, 899.95, 8.875, 979.82)"
  37. 37. Data Types
        cursor.execute("""
            SELECT * FROM orders WHERE customerid = 12345
        """)
        Result:
        (12002, datetime.date(2011, 3, 23), 12345, Decimal('899.95'),
         Decimal('8.88'), Decimal('979.82'))
  38. 38. Nulls
        Input:
        cursor.mogrify("SELECT %s", [None])
        'SELECT NULL'
        Output:
        cursor.execute("SELECT NULL")
        cursor.fetchone()
        (None,)
  39. 39. Booleans
        cursor.mogrify("SELECT %s, %s", [True, False])
        'SELECT true, false'
  40. 40. Binary Data
        Standard way:
        from psycopg2 import Binary
        cursor.mogrify("SELECT %s", [Binary("foo")])
        "SELECT E'\\x666f6f'::bytea"
  41. 41. Binary Data
        Standard way:
        from psycopg2 import Binary
        cursor.mogrify("SELECT %s", [Binary("foo")])
        "SELECT E'\\x666f6f'::bytea"
        Other ways:
        cursor.mogrify("SELECT %s", [buffer("foo")])
        "SELECT E'\\x666f6f'::bytea"
        cursor.mogrify("SELECT %s", [bytearray.fromhex(u"deadbeef")])
        "SELECT E'\\xdeadbeef'::bytea"
        There are more. Check the documentation. Check the versions.
  42. 42. Date/Time
        Standard ways:
        from psycopg2 import Date, Time, Timestamp
        cursor.mogrify("SELECT %s, %s, %s",
            [Date(2011, 3, 23), Time(9, 0, 0),
             Timestamp(2011, 3, 23, 9, 0, 0)])
        "SELECT '2011-03-23'::date, '09:00:00'::time, '2011-03-23T09:00:00'::timestamp"
  43. 43. Date/Time
        Other ways:
        import datetime
        cursor.mogrify("SELECT %s, %s, %s, %s",
            [datetime.date(2011, 3, 23), datetime.time(9, 0, 0),
             datetime.datetime(2011, 3, 23, 9, 0),
             datetime.timedelta(minutes=90)])
        "SELECT '2011-03-23'::date, '09:00:00'::time, '2011-03-23T09:00:00'::timestamp, '0 days 5400.000000 seconds'::interval"
        mx.DateTime also supported
  44. 44. Arrays
        foo = [1, 2, 3]
        bar = [datetime.time(9, 0), datetime.time(10, 30)]
        cursor.mogrify("SELECT %s, %s", [foo, bar])
        "SELECT ARRAY[1, 2, 3], ARRAY['09:00:00'::time, '10:30:00'::time]"
  45. 45. Tuples
        foo = (1, 2, 3)
        cursor.mogrify("SELECT * FROM customers WHERE customerid IN %s", [foo])
        'SELECT * FROM customers WHERE customerid IN (1, 2, 3)'
  46. 46. Hstore
        import psycopg2.extras
        psycopg2.extras.register_hstore(cursor)
        x = {'a': 'foo', 'b': 'bar'}
        cursor.mogrify("SELECT %s", [x])
        "SELECT hstore(ARRAY[E'a', E'b'], ARRAY[E'foo', E'bar'])"
  47. 47. Unicode Support
        Cause all result strings to be returned as Unicode strings:
        psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
        psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
  48. 48. Transaction Control Transaction blocks are used by default. Must use dbconn.commit() or dbconn.rollback()
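The effect of "must use commit() or rollback()" can be observed with any DB-API driver that opens transactions implicitly. The sketch below is an illustration, assuming sqlite3 in its default mode as a stand-in for Psycopg and a made-up table `foo`; `count_rows` is a hypothetical helper. It is written for Python 3.

```python
import sqlite3


def count_rows(dbconn):
    """Return the number of rows currently visible in table foo."""
    cursor = dbconn.cursor()
    cursor.execute("SELECT count(*) FROM foo")
    return cursor.fetchone()[0]


dbconn = sqlite3.connect(":memory:")
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE foo (bar TEXT)")
dbconn.commit()

# The INSERT implicitly opens a transaction; rollback() discards it.
cursor.execute("INSERT INTO foo VALUES (?)", ("discarded",))
dbconn.rollback()
after_rollback = count_rows(dbconn)

# The same statement followed by commit() becomes permanent.
cursor.execute("INSERT INTO foo VALUES (?)", ("kept",))
dbconn.commit()
after_commit = count_rows(dbconn)

print(after_rollback, after_commit)
```

Forgetting the final `commit()` is the usual symptom of this default: the program runs without errors, but the changes vanish when the connection closes.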
  49. 49. Transaction Control: Autocommit
        import psycopg2.extensions
        dbconn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
        cursor = dbconn.cursor()
        cursor.execute("VACUUM")
  50. 50. Transaction Control: Isolation Mode
        import psycopg2.extensions
        dbconn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE) # or other level
        cursor = dbconn.cursor()
        cursor.execute(...)
        ...
        dbconn.commit()
  51. 51. Exception Handling
        StandardError
        |__ Warning
        |__ Error
            |__ InterfaceError
            |__ DatabaseError
                |__ DataError
                |__ OperationalError
                |   |__ psycopg2.extensions.QueryCanceledError
                |   |__ psycopg2.extensions.TransactionRollbackError
                |__ IntegrityError
                |__ InternalError
                |__ ProgrammingError
                |__ NotSupportedError
  52. 52. Error Messages
        try:
            cursor.execute("boom")
        except Exception as e:
            print(e.pgerror)
  53. 53. Error Codes
        import psycopg2.errorcodes
        while True:
            try:
                cursor.execute("UPDATE something ...")
                cursor.execute("UPDATE otherthing ...")
                break
            except Exception as e:
                if e.pgcode == psycopg2.errorcodes.SERIALIZATION_FAILURE:
                    dbconn.rollback() # leave the aborted transaction before retrying
                    continue
                else:
                    raise
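The retry loop on the Error Codes slide can be factored into a reusable helper with an upper bound on attempts. This is a sketch under stated assumptions rather than anything Psycopg provides: `run_with_retry` and `FakeDatabaseError` are hypothetical names, the fake exception merely mimics the `pgcode` attribute of Psycopg errors, and "40001" is the SQLSTATE that `psycopg2.errorcodes.SERIALIZATION_FAILURE` stands for.

```python
SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization_failure


class FakeDatabaseError(Exception):
    """Hypothetical stand-in for a driver error that carries a pgcode."""

    def __init__(self, pgcode):
        Exception.__init__(self, "SQLSTATE %s" % pgcode)
        self.pgcode = pgcode


def run_with_retry(work, rollback, max_attempts=3):
    """Call work(); on a serialization failure roll back and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except Exception as e:
            rollback()  # the failed transaction must be ended either way
            retryable = getattr(e, "pgcode", None) == SERIALIZATION_FAILURE
            if not retryable or attempt == max_attempts:
                raise


attempts = []
rollbacks = []


def flaky_update():
    """Fails twice with a serialization failure, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeDatabaseError(SERIALIZATION_FAILURE)
    return "committed"


result = run_with_retry(flaky_update, lambda: rollbacks.append(1))
print(result, len(attempts), len(rollbacks))
```

With a real connection, `work` would run the statements and commit, and `rollback` would be `dbconn.rollback`. The attempt limit keeps a persistently conflicting workload from looping forever.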
  54. 54. Connection and Cursor Factories
        Want: accessing result columns by name
        Recall:
        dbconn = psycopg2.connect(dsn=...)
        cursor = dbconn.cursor()
        cursor.execute("""
            SELECT firstname, lastname
            FROM customers
            ORDER BY 1, 2
            LIMIT 10
        """)
        for row in cursor.fetchall():
            print("Name: %s %s" % (row[0], row[1])) # stupid :(
  55. 55. Connection and Cursor Factories
        Solution 1: Using DictConnection:
        import psycopg2.extras
        dbconn = psycopg2.connect(dsn=...,
            connection_factory=psycopg2.extras.DictConnection)
        cursor = dbconn.cursor()
        cursor.execute("""
            SELECT firstname, lastname
            FROM customers
            ORDER BY 1, 2
            LIMIT 10
        """)
        for row in cursor.fetchall():
            print("Name: %s %s" % (row['firstname'], # or row[0]
                                   row['lastname'])) # or row[1]
  56. 56. Connection and Cursor Factories
        Solution 2: Using RealDictConnection:
        import psycopg2.extras
        dbconn = psycopg2.connect(dsn=...,
            connection_factory=psycopg2.extras.RealDictConnection)
        cursor = dbconn.cursor()
        cursor.execute("""
            SELECT firstname, lastname
            FROM customers
            ORDER BY 1, 2
            LIMIT 10
        """)
        for row in cursor.fetchall():
            print("Name: %s %s" % (row['firstname'], row['lastname']))
  57. 57. Connection and Cursor Factories
        Solution 3: Using NamedTupleConnection:
        import psycopg2.extras
        dbconn = psycopg2.connect(dsn=...,
            connection_factory=psycopg2.extras.NamedTupleConnection)
        cursor = dbconn.cursor()
        cursor.execute("""
            SELECT firstname, lastname
            FROM customers
            ORDER BY 1, 2
            LIMIT 10
        """)
        for row in cursor.fetchall():
            print("Name: %s %s" % (row.firstname, # or row[0]
                                   row.lastname)) # or row[1]
  58. 58. Connection and Cursor Factories
        Alternative: Using DictCursor/RealDictCursor/NamedTupleCursor:
        import psycopg2.extras
        dbconn = psycopg2.connect(dsn=...)
        # or RealDictCursor, or NamedTupleCursor
        cursor = dbconn.cursor(cursor_factory=psycopg2.extras.DictCursor)
        cursor.execute("""
            SELECT firstname, lastname
            FROM customers
            ORDER BY 1, 2
            LIMIT 10
        """)
        for row in cursor.fetchall():
            print("Name: %s %s" % (row['firstname'], row['lastname']))
            # (resp. row.firstname, row.lastname)
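With a driver that offers no such factories, access by column name can be approximated from `cursor.description`, which DB-API 2.0 requires after a query. The following is a rough sketch of that idea, not Psycopg's implementation: `fetch_named` is a hypothetical helper, and sqlite3 with made-up data replaces PostgreSQL so that the example runs on Python 3 as is.

```python
import sqlite3
from collections import namedtuple


def fetch_named(cursor):
    """Return all remaining rows as named tuples keyed by column name."""
    # The first item of each description entry is the column name.
    Row = namedtuple("Row", [column[0] for column in cursor.description])
    return [Row(*row) for row in cursor.fetchall()]


# Hypothetical sample data, only for this sketch.
dbconn = sqlite3.connect(":memory:")
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE customers (firstname TEXT, lastname TEXT)")
cursor.executemany("INSERT INTO customers VALUES (?, ?)",
                   [("Bob", "Young"), ("Alice", "Abel")])

cursor.execute("SELECT firstname, lastname FROM customers ORDER BY 1, 2")
rows = fetch_named(cursor)
for row in rows:
    print("Name: %s %s" % (row.firstname, row.lastname))
```

Like the named-tuple rows on the slide, these rows still support positional access, so `row[0]` and `row.firstname` refer to the same value. The approach assumes every result column has a name that is a valid Python identifier.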
  59. 59. Supporting New Data Types Only a finite list of types is supported by default: Date, Binary, etc. • map new PostgreSQL data types into Python • map new Python data types into PostgreSQL
  60. 60. Mapping New PostgreSQL Types Into Python
        import psycopg2
        import psycopg2.extensions
        def cast_oidvector(value, _cursor):
            """Convert oidvector to Python array"""
            if value is None:
                return None
            return [int(oid) for oid in value.split(' ')]
        OIDVECTOR = psycopg2.extensions.new_type((30,), 'OIDVECTOR', cast_oidvector)
        psycopg2.extensions.register_type(OIDVECTOR)
  61. 61. Mapping New Python Types into PostgreSQL
        from psycopg2.extensions import adapt, register_adapter, AsIs
        class Point(object):
            def __init__(self, x, y):
                self.x = x
                self.y = y
        def adapt_point(point):
            return AsIs("'(%s, %s)'" % (adapt(point.x), adapt(point.y)))
        register_adapter(Point, adapt_point)
        cur.execute("INSERT INTO atable (apoint) VALUES (%s)",
                    (Point(1.23, 4.56),))
        (from Psycopg documentation)
  62. 62. Connection Pooling With Psycopg
        from psycopg2.pool import SimpleConnectionPool
        pool = SimpleConnectionPool(1, 20, dsn=...)
        dbconn = pool.getconn()
        ...
        pool.putconn(dbconn)
        pool.closeall()
  63. 63. Connection Pooling With Psycopg
        for non-threaded applications:
        from psycopg2.pool import SimpleConnectionPool
        pool = SimpleConnectionPool(1, 20, dsn=...)
        dbconn = pool.getconn()
        ...
        pool.putconn(dbconn)
        pool.closeall()
        for threaded applications:
        from psycopg2.pool import ThreadedConnectionPool
        pool = ThreadedConnectionPool(1, 20, dsn=...)
        dbconn = pool.getconn()
        cursor = dbconn.cursor()
        ...
        pool.putconn(dbconn)
        pool.closeall()
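One risk with the `getconn()`/`putconn()` pairing is that an exception between the two calls leaks the connection. A context manager can guarantee the hand-back. The sketch below is an assumption-laden illustration: `borrowed` is a hypothetical helper that is not part of Psycopg, and `FakePool` is a stand-in exposing only the two pool methods used on the slides, so that the example runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def borrowed(pool):
    """Check a connection out of the pool and always hand it back."""
    dbconn = pool.getconn()
    try:
        yield dbconn
    finally:
        pool.putconn(dbconn)


class FakePool(object):
    """Hypothetical stand-in exposing only getconn() and putconn()."""

    def __init__(self):
        self.checked_out = 0

    def getconn(self):
        self.checked_out += 1
        return object()

    def putconn(self, dbconn):
        self.checked_out -= 1


pool = FakePool()
try:
    with borrowed(pool) as dbconn:
        in_use = pool.checked_out
        raise RuntimeError("query failed")
except RuntimeError:
    pass

print(in_use, pool.checked_out)
```

The same helper should work unchanged with a real pool object, since it relies only on `getconn()` and `putconn()`.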
  64. 64. Connection Pooling With DBUtils
        import psycopg2
        from DBUtils.PersistentDB import PersistentDB
        persist = PersistentDB(psycopg2, dsn=...)
        dbconn = persist.connection()
        cursor = dbconn.cursor()
        ...
        see http://pypi.python.org/pypi/DBUtils/
  65. 65. The Other Stuff • thread safety: can share connections, but not cursors • COPY support: cursor.copy_from(), cursor.copy_to() • large object support: connection.lobject() • 2PC: connection.xid(), connection.tpc_begin(), . . . • query cancel: dbconn.cancel() • notices: dbconn.notices • notifications: dbconn.notifies • asynchronous communication • coroutine support • logging cursor
  66. 66. Part II: PL/Python
  67. 67. Setup • included with PostgreSQL • configure --with-python • apt-get/yum install postgresql-plpython • CREATE LANGUAGE plpythonu; • Python 3: CREATE LANGUAGE plpython3u; • “untrusted”, superuser only
  68. 68. Basic Examples
        CREATE FUNCTION add(a int, b int) RETURNS int
        LANGUAGE plpythonu
        AS $$
        return a + b
        $$;
        CREATE FUNCTION longest(a text, b text) RETURNS text
        LANGUAGE plpythonu
        AS $$
        if len(a) > len(b):
            return a
        elif len(b) > len(a):
            return b
        else:
            return None
        $$;
  69. 69. Using Modules
        CREATE FUNCTION json_to_array(j text) RETURNS text[]
        LANGUAGE plpythonu
        AS $$
        import json
        return json.loads(j)
        $$;
  70. 70. Database Calls
        CREATE FUNCTION clear_passwords() RETURNS int
        LANGUAGE plpythonu
        AS $$
        rv = plpy.execute("UPDATE customers SET password = NULL")
        return rv.nrows()
        $$;
  71. 71. Database Calls With Parameters
        CREATE FUNCTION set_password(username text, password text) RETURNS boolean
        LANGUAGE plpythonu
        AS $$
        plan = plpy.prepare("UPDATE customers SET password = $1 WHERE username = $2", ['text', 'text'])
        # parameters in placeholder order: $1 = password, $2 = username
        rv = plpy.execute(plan, [password, username])
        return rv.nrows() == 1
        $$;
  72. 72. Avoiding Prepared Statements
        CREATE FUNCTION set_password(username text, password text) RETURNS boolean
        LANGUAGE plpythonu
        AS $$
        rv = plpy.execute("UPDATE customers SET password = %s WHERE username = %s" %
            (plpy.quote_nullable(password), plpy.quote_literal(username)))
        return rv.nrows() == 1
        $$;
        (available in 9.1-to-be)
  73. 73. Caching Plans
        CREATE FUNCTION set_password2(username text, password text) RETURNS boolean
        LANGUAGE plpythonu
        AS $$
        if 'myplan' in SD:
            plan = SD['myplan']
        else:
            plan = plpy.prepare("UPDATE customers SET password = $1 WHERE username = $2", ['text', 'text'])
            SD['myplan'] = plan
        rv = plpy.execute(plan, [password, username])
        return rv.nrows() == 1
        $$;
  74. 74. Processing Query Results
        CREATE FUNCTION get_customer_name(username text) RETURNS text
        LANGUAGE plpythonu
        AS $$
        plan = plpy.prepare("SELECT firstname || ' ' || lastname AS name FROM customers WHERE username = $1", ['text'])
        rv = plpy.execute(plan, [username], 1)
        return rv[0]['name']
        $$;
  75. 75. Compare: PL/Python vs. DB-API
        PL/Python:
        plan = plpy.prepare("SELECT ...")
        for row in plpy.execute(plan, ...):
            plpy.info(row["fieldname"])
        DB-API:
        dbconn = psycopg2.connect(...)
        cursor = dbconn.cursor()
        cursor.execute("SELECT ...")
        for row in cursor.fetchall():
            print(row[0])
  76. 76. Set-Returning and Table Functions
        CREATE FUNCTION get_customers(id int) RETURNS SETOF customers
        LANGUAGE plpythonu
        AS $$
        plan = plpy.prepare("SELECT * FROM customers WHERE customerid = $1", ['int'])
        rv = plpy.execute(plan, [id])
        return rv
        $$;
  77. 77. Triggers
        CREATE FUNCTION delete_notifier() RETURNS trigger
        LANGUAGE plpythonu
        AS $$
        if TD['event'] == 'DELETE':
            plpy.notice("one row deleted from table %s" % TD['table_name'])
        $$;
        CREATE TRIGGER customers_delete_notifier AFTER DELETE ON customers
            FOR EACH ROW EXECUTE PROCEDURE delete_notifier();
  78. 78. Exceptions
        CREATE FUNCTION test() RETURNS text
        LANGUAGE plpythonu
        AS $$
        try:
            rv = plpy.execute("SELECT ...")
        except plpy.SPIError as e:
            plpy.notice("something went wrong")
        $$;
        In PostgreSQL versions before 9.1, the transaction is still aborted.
  79. 79. New in PostgreSQL 9.1 • SPI calls wrapped in subtransactions • custom SPI exceptions: subclass per SQLSTATE, .sqlstate attribute • plpy.subtransaction() context manager • support for OUT parameters • quoting functions • validator • lots of internal improvements
  80. 80. The End
