Working with databases in Perl
 


An overview of the main questions/design issues when starting to work with databases in Perl

- choosing a database
- matching DB datatypes to Perl datatypes
- DBI architecture (handles, drivers, etc.)
- steps of DBI interaction : prepare/execute/fetch
- ORM principles and difficulties, ORMs on CPAN
- a few examples with DBIx::DataModel
- performance issues

First given at YAPC::EU::2009 in Lisbon. Updated version given at FPW2011 in Paris and YAPC::EU::2011 in Riga

Usage Rights

© All Rights Reserved

    Working with databases in Perl : Presentation Transcript

    • Working with databases in Perl Tutorial for FPW::2011, Paris [email_address] Département Office
    • Overview
      • intended audience : beginners
        • in Perl
        • in Databases
      • main topics
        • Relational databases
        • Perl DBI basics
        • Advanced Perl DBI
        • Object-Relational Mappings
      • disclaimer
        • didn't have personal exposure to everything mentioned in this tutorial
    • Relational databases
      • RDBMS = Relational Database Management System
    • Relational model
      • Table (rows + columns); operations : filter, projection, join
      • first table (c1, c2, c3) : (1, foo, 1), (2, foo, 2), (3, bar, 1)
      • second table (c3, c4) : (1, xx), (2, yy)
      • join on c3 (c1, c2, c3, c4) : (1, foo, 1, xx), (2, foo, 2, yy), (3, bar, 1, xx)
    • Maybe you don't want a RDBMS
      • Other solutions for persistency in Perl:
          • BerkeleyDB : persistent hashes / arrays
          • Judy : persistent dynamic arrays / hashes
          • Redis : persistent arrays / hashes / sets / sorted sets
          • CouchDB : OO/hierarchical database
          • MongoDB : document-oriented database
          • KiokuDB : persistent objects, front-end to BerkeleyDB / CouchDB / etc.
          • Plain Old File (using for example File::Tabular )
          • KinoSearch : bunch of fields with fulltext indexing
          • LDAP : directory
          • Net::Riak : buckets and keys
        • See http://en.wikipedia.org/wiki/NoSQL
    • Features of RDBMS
      • Relational
      • Indexing
      • Concurrency
      • Distributed
      • Transactions (commit / rollback )
      • Authorization
      • Triggers and stored procedures
      • Internationalization
      • Fulltext
    • Choosing a RDBMS
      • Sometimes there is no choice (enforced by context) !
      • Criteria
        • cost, proprietary / open source
        • volume
        • features
        • resources (CPU, RAM, etc.)
        • ease of installation / deployment / maintenance
        • stored procedures
      • Common choices (open source)
        • SQLite (file-based)
        • mysql
        • Postgres
          • Postgres can have server-side procedures in Perl !
    • Talking to a RDBMS
      • SQL : Structured Query Language, supposedly a standard. Except that
        • the standard is hard to find (not publicly available)
        • vendors rarely implement the full standard
        • most vendors have non-standard extensions
        • it's not only about queries
          • DML : Data Manipulation Language
          • DDL : Data Definition Language
    • Writing SQL
      • "SQL is too low-level, I don't ever want to see it"
      • "SQL is the most important part of my application, I won't let anybody write it for me"
    • Data Definition Language (DDL)
      • CREATE TABLE author (
      • author_id INTEGER PRIMARY KEY,
      • author_name VARCHAR(20),
      • e_mail VARCHAR(20)
      • );
      • CREATE/ALTER/DROP/RENAME
      • DATABASE
      • INDEX
      • VIEW
      • TRIGGER
    • Data Manipulation Language (DML)
      • SELECT author_name, distribution_name
      • FROM author INNER JOIN distribution
      • ON author.author_id = distribution.author_id
      • WHERE distribution_name like 'DBD::%';
      • INSERT INTO author ( author_id, author_name, e_mail )
      • VALUES ( 123, 'JFOOBAR', 'john@foobar.com' );
      • UPDATE author
      • SET e_mail = 'john@foobar.com'
      • WHERE author_id = 3456;
      • DELETE FROM author
      • WHERE author_id = 3456;
    • Best practice : placeholders
      • SELECT author_name, distribution_name
      • FROM author INNER JOIN distribution
      • ON author.author_id = distribution.author_id
      • WHERE distribution_name like ? ;
      • INSERT INTO author ( author_id, author_name, e_mail )
      • VALUES ( ? , ? , ? );
      • UPDATE author
      • SET e_mail = ?
      • WHERE author_id = ? ;
      • DELETE FROM author
      • WHERE author_id = ? ;
      • no type distinction (int/string)
      • statements can be cached
      • avoid SQL injection problems
        • SELECT * FROM foo
        • WHERE val = $x ;
        • $x eq '123; DROP TABLE foo'
      • sometimes other syntax (for ex. $1, $2)
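
The placeholder advice above can be tried out with a minimal sketch like the following. It assumes DBD::SQLite is installed and uses an in-memory database; the second author row and the hostile value are made up for illustration.

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database (assumption: DBD::SQLite is available).
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, '
       . 'author_name VARCHAR(20), e_mail VARCHAR(20))');

# One prepared statement, executed several times with different bind values.
my $insert = $dbh->prepare(
    'INSERT INTO author (author_id, author_name, e_mail) VALUES (?, ?, ?)');
$insert->execute(123, 'JFOOBAR', 'john@foobar.com');
$insert->execute(456, 'JDOE',    'jane@doe.org');   # illustrative row

# A hostile value is passed as data; it is never spliced into the SQL text.
my $hostile = '123; DROP TABLE author';
my ($count) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM author WHERE author_id = ?', undef, $hostile);

# The table is still there, and nothing matched the hostile value.
my ($total) = $dbh->selectrow_array('SELECT COUNT(*) FROM author');
print "rows matching hostile value: $count, rows in table: $total\n";
```

With string interpolation instead of a placeholder, the same value would have become part of the SQL statement itself.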
    • Perl DBI Basics
    • Architecture
      • layers : Database, DBD driver, DBI, Object-Relational Mapper, Perl program
      • TIOOWTDI : There is only one way to do it
      • TAMMMWTDI : There are many, many, many ways to do it
      • TIMTOWTDI : There is more than one way to do it
    • DBD Drivers
        • Databases
          • Adabas DB2 DBMaker Empress Illustra Informix Ingres InterBase MaxDB Mimer Oracle Ovrimos PO Pg PrimeBase QBase Redbase SQLAnywhere SQLite Solid Sqlflex Sybase Unify mSQL monetdb mysql
        • Other kinds of data stores
          • CSV DBM Excel File iPod LDAP
        • Proxy, relay, etc
          • ADO Gofer JDBC Multi Multiplex ODBC Proxy SQLRelay
        • Fake, test
          • NullP Mock RAM Sponge
    • When SomeExoticDB has no driver
      • Quotes from DBI::DBD :
          • " The first rule for creating a new database driver for the Perl DBI is very simple: DON'T! "
          • " The second rule for creating a new database driver for the Perl DBI is also very simple: Don't -- get someone else to do it for you! "
      • nevertheless there is good advice/examples
        • see DBI::DBD
      • Other solution : forward to other drivers
        • ODBC (even on Unix)
        • JDBC
        • SQLRelay
    • DBI API
      • handles
        • the whole package (DBI)
        • driver handle ($drh)
        • database handle ($dbh)
        • statement handle ($sth)
      • interacting with handles
        • object-oriented
          • ->connect(…), ->prepare(…), ->execute(...), …
        • tied hash
          • ->{AutoCommit}, ->{NAME_lc}, ->{CursorName}, …
    • Connecting
      • my $dbh = DBI-> connect ($connection_string);
      • my $dbh = DBI-> connect ($connection_string,
      • $user,
      • $password,
      • { %attributes } );
      • my $dbh = DBI-> connect_cached ( @args );
    • Some dbh attributes
      • AutoCommit
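        • if true, every statement is immediately committed
        • if false, a transaction is implicitly open and must be ended explicitly
          • … # inserts, updates, deletes
          • $dbh->commit(); # or $dbh->rollback();
        • $dbh->begin_work() is for the AutoCommit-on case : it suspends AutoCommit until the next commit or rollback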
      • RaiseError
        • like autodie for standard Perl functions : errors raise exceptions
      • see also
        • PrintError
        • HandleError
        • ShowErrorStatement
      • and also
        • LongReadLen
        • LongTruncOk
        • RowCacheSize
      • hash API : attributes can be set dynamically
        • [ local ] $dbh->{$attr_name} = $val
      • peek at $dbh internals
      • DB<1> x $dbh → {}
      • DB<2> x tied %$dbh → {…}
    • Data retrieval
      • my $sth = $dbh->prepare($sql);
      • $sth->execute(@bind_values);
      • my @columns = @{$sth->{NAME}};
      • while (my $row_aref = $sth->fetch) {
      • … # work with @$row_aref
      • }
      • # or, for statements that return no rows
      • $dbh->do($sql);
      • see also : prepare_cached
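
The prepare / execute / fetch steps above can be sketched end to end as follows. This is only an illustration under assumptions: DBD::SQLite with an in-memory database, and a made-up `distribution` table with three sample rows.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the table and rows are illustrative.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE distribution (distribution_name VARCHAR(40), author_id INTEGER)');
$dbh->do('INSERT INTO distribution VALUES (?, ?)', undef, @$_)
    for ['DBD::SQLite', 1], ['DBD::Pg', 2], ['Moose', 3];

# Step 1 : prepare (the SQL is parsed once)
my $sth = $dbh->prepare(
    'SELECT distribution_name, author_id FROM distribution
      WHERE distribution_name LIKE ? ORDER BY distribution_name');

# Step 2 : execute with bind values
$sth->execute('DBD::%');

# Column names become available through the statement handle
my @columns = @{ $sth->{NAME} };

# Step 3 : fetch row by row
my @names;
while (my $row_aref = $sth->fetch) {
    # DBI reuses the same array for every row, so copy what must be kept
    push @names, $row_aref->[0];
}

print join(', ', @columns), "\n";
print "$_\n" for @names;
```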
    • Other ways of fetching
      • single row
          • fetchrow_array
          • fetchrow_arrayref (a.k.a fetch)
          • fetchrow_hashref
      • lists of rows (with optional slicing)
          • fetchall_arrayref
          • fetchall_hashref
      • prepare, execute and fetch
          • selectall_arrayref
          • selectall_hashref
      • vertical slice
          • selectcol_arrayref
      • little DBI support for cursors
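
A few of the combined "prepare, execute and fetch" methods listed above, as a small sketch. It assumes DBD::SQLite with an in-memory database; the `author` rows are sample data, not taken from the slides.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; rows are illustrative.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name VARCHAR(20))');
$dbh->do('INSERT INTO author VALUES (?, ?)', undef, @$_)
    for [123, 'JFOOBAR'], [456, 'JDOE'];

# List of rows; Slice => {} asks for one hashref per row
my $rows = $dbh->selectall_arrayref(
    'SELECT author_id, author_name FROM author ORDER BY author_id',
    { Slice => {} });

# Vertical slice : just the first column, as an arrayref of values
my $names = $dbh->selectcol_arrayref(
    'SELECT author_name FROM author ORDER BY author_id');

# Hash of rows, keyed by the value of one column
my $by_id = $dbh->selectall_hashref(
    'SELECT author_id, author_name FROM author', 'author_id');

print "first row : $rows->[0]{author_id} $rows->[0]{author_name}\n";
print "names     : @$names\n";
print "author 456: $by_id->{456}{author_name}\n";
```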
    • Advanced Perl DBI
    • Transactions
      • $dbh->{RaiseError} = 1; # errors will raise exceptions
      • eval {
      • $dbh->begin_work(); # will turn off AutoCommit
      • … # inserts, updates, deletes
      • $dbh->commit();
      • };
      • if ($@) {
      • my $err = $@;
      • eval { $dbh->rollback() };
      • my $rollback_result = $@ || "SUCCESS";
      • die "FAILED TRANSACTION : $err"
      • . "; ROLLBACK: $rollback_result";
      • }
      • encapsulated in DBIx::Transaction or ORMs
      • $schema-> transaction ( sub { …} );
      • nested transactions : must keep track of transaction depth
      • savepoint / release : only in DBIx::Class
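
The eval / commit / rollback pattern above can be wrapped in a small helper. The sketch below is one possible shape, not the API of DBIx::Transaction or of any ORM: `in_transaction` is a hypothetical name, and the example assumes DBD::SQLite with an in-memory database.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0, AutoCommit => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name VARCHAR(20))');

# Hypothetical helper : run $code inside a transaction, roll back on error.
sub in_transaction {
    my ($dbh, $code) = @_;
    my $ok = eval {
        $dbh->begin_work;      # suspends AutoCommit until commit/rollback
        $code->($dbh);
        $dbh->commit;
        1;
    };
    return 1 if $ok;
    my $err = $@ || 'unknown error';
    eval { $dbh->rollback };
    my $rollback_result = $@ || 'SUCCESS';
    die "FAILED TRANSACTION : $err; ROLLBACK: $rollback_result\n";
}

# A transaction that succeeds
in_transaction($dbh, sub {
    $_[0]->do('INSERT INTO author VALUES (1, ?)', undef, 'JFOOBAR');
});

# A transaction that fails half-way : the first insert must be undone too
my $failed = !eval {
    in_transaction($dbh, sub {
        $_[0]->do('INSERT INTO author VALUES (2, ?)', undef, 'JDOE');
        $_[0]->do('INSERT INTO author VALUES (1, ?)', undef, 'DUPLICATE'); # key clash
    });
    1;
};

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM author');
print "second transaction failed: ", ($failed ? 'yes' : 'no'),
      "; rows in table: $count\n";
```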
    • Efficiency
      • my $sth = $dbh->prepare(q{
      • SELECT author_id, author_name, e_mail
      • FROM author
      • });
      • my ($id, $name, $e_mail);
      • $sth->execute;
      • $sth->bind_columns(\($id, $name, $e_mail)); # takes references to the variables
      • while ($sth->fetch) {
      • print "author $id is $name at $e_mail\n";
      • }
      • avoids cost of allocating / deallocating Perl variables
      • don't store a reference and reuse it after another fetch
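
A runnable variant of the bind_columns loop, assuming DBD::SQLite with an in-memory database and two made-up author rows. Each fetch overwrites the same three variables, so values that must survive the loop are copied into a string.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; rows are illustrative.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, '
       . 'author_name VARCHAR(20), e_mail VARCHAR(20))');
$dbh->do('INSERT INTO author VALUES (?, ?, ?)', undef, @$_)
    for [123, 'JFOOBAR', 'john@foobar.com'], [456, 'JDOE', 'jane@doe.org'];

my $sth = $dbh->prepare(
    'SELECT author_id, author_name, e_mail FROM author ORDER BY author_id');
$sth->execute;

# Bind one Perl variable per column; \(...) yields a list of references
my ($id, $name, $e_mail);
$sth->bind_columns(\($id, $name, $e_mail));

my @lines;
while ($sth->fetch) {
    # the bound variables now hold the current row
    push @lines, "author $id is $name at $e_mail";
}
print "$_\n" for @lines;
```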
    • Metadata
      • datasources
        • my @sources = DBI-> data_sources ($driver);
      • table_info
        • my $sth = $dbh-> table_info (@search_criteria);
        • while (my $row = $sth->fetchrow_hashref) {
        • print "$row->{TABLE_NAME} : $row->{TABLE_TYPE}\n";
        • }
      • others
        • column_info()
        • primary_key_info()
        • foreign_key_info()
      • many drivers only have partial implementations
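
As a hedged illustration of table_info, the sketch below lists the ordinary tables of an in-memory SQLite database. It assumes DBD::SQLite; the search criteria shown (any catalog, any schema, any table name, type 'TABLE') are one possible choice, and other drivers may behave differently.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; table names are illustrative.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name VARCHAR(20))');
$dbh->do('CREATE TABLE distribution (distrib_id INTEGER PRIMARY KEY, author_id INTEGER)');

# Arguments : catalog, schema, table name pattern, table type
my $sth = $dbh->table_info(undef, undef, '%', 'TABLE');

my %type_of;
while (my $row = $sth->fetchrow_hashref) {
    $type_of{ $row->{TABLE_NAME} } = $row->{TABLE_TYPE};
}
print "$_ : $type_of{$_}\n" for sort keys %type_of;
```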
    • Lost connection
      • manual recover
          • if ($dbh->errstr =~ /broken connection/i) { … }
      • DBIx::RetryOverDisconnects
        • intercepts requests (prepare, execute, …)
        • filters errors
        • attempts to reconnect and restart the transaction
      • some ORMs have their own layer for recovering connections
      • some drivers have their own mechanism
          • $dbh->{mysql_auto_reconnect} = 1;
    • Datatypes
      • NULL ↔ undef
      • INTEGER, VARCHAR, DATE ↔ perl scalar
        • usually DWIM works
        • if needed, can specify explicitly
          • $sth->bind_param($col_num, $value, SQL_DATETIME);
      • BLOB ↔ perl scalar
      • ARRAY (Postgres) ↔ arrayref
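
The NULL / undef correspondence can be checked with a short sketch, again assuming DBD::SQLite and an illustrative `author` row. It also shows a common pitfall: binding undef to `= ?` never matches, because in SQL nothing is equal to NULL, so `IS NULL` is needed.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the row is illustrative.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER, author_name VARCHAR(20), e_mail VARCHAR(20))');

# undef on the Perl side is stored as NULL
$dbh->do('INSERT INTO author VALUES (?, ?, ?)', undef, 1, 'JFOOBAR', undef);

# NULL on the database side comes back as undef
my ($id, $name, $e_mail) = $dbh->selectrow_array(
    'SELECT author_id, author_name, e_mail FROM author');
print "e_mail is ", (defined $e_mail ? "'$e_mail'" : 'undef'), "\n";

# Comparing with a bound undef matches nothing; IS NULL does
my ($n_eq) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM author WHERE e_mail = ?', undef, undef);
my ($n_is) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM author WHERE e_mail IS NULL');
print "matches with '= ?': $n_eq, matches with IS NULL: $n_is\n";
```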
    • Large objects
      • usually : just scalars in memory
      • when reading : control BLOB size
        • $dbh->{LongReadLen} = $max_bytes;
        • $dbh->{LongTruncOk} = 1;
      • when writing : can inform the driver
        • $sth->bind_param($ix, $blob, SQL_BLOB);
      • driver-specific stream API. Ex :
        • Pg : pg_lo_open, pg_lo_write, pg_lo_lseek
        • Oracle : ora_lob_read(…), ora_lob_write(…), ora_lob_append(…)
    • Tracing / profiling
      • $dbh->trace($trace_setting, $trace_where)
        • 0 - Trace disabled.
        • 1 - Trace top-level DBI method calls returning with results or errors.
        • 2 - As above, adding tracing of top-level method entry with parameters.
        • 3 - As above, adding some high-level information from the driver and some internal information from the DBI.
      • $dbh->{Profile} = 2; # profile at the statement level
        • many powerful options
        • see DBI::Profile
    • Stored procedures
      • my $sth = $dbh->prepare($db_specific_sql);
      • # prepare params to be passed to the called procedure
      • $sth-> bind_param (1, $val1);
      • $sth->bind_param(2, $val2);
      • # prepare memory locations to receive the results
      • $sth->bind_param_inout(3, \$result1, $max_len); # takes a reference and a maximum length
      • $sth->bind_param_inout(4, \$result2, $max_len);
      • # execute the whole thing
      • $sth->execute;
    • Object-Relational Mapping (ORM)
    • ORM Principle
      • RDBMS side : rows r1, r2, ... of table1 (c1, c2, c3, ...), linked through c3 to table2 (c3, c4)
      • Application side : objects r1 : class1, r2 : class1, with attributes +c1: String, +c2: String, +c3: class2
    • ORM: What for ?
      • [catalyst list] On Thu, 2006-06-08, Steve wrote:
      • Not intending to start any sort of rancorous discussion,
      • but I was wondering whether someone could illuminate
      • me a little?
      • I'm comfortable with SQL, and with DBI. I write basic
      • SQL that runs just fine on all databases, or more
      • complex SQL when I want to target a single database
      • (ususally postgresql).
      • What value does an ORM add for a user like me?
    • ORM useful for …
      • dynamic SQL
        • navigation between tables
        • generate complex SQL queries from Perl datastructures
        • better than phrasebook or string concatenation
      • automatic data conversions (inflation / deflation)
      • expansion of tree data structures coded in the relational model
      • transaction encapsulation
      • data validation
      • computed fields
      • caching
      • schema deployment
      • See Also : http://lists.scsys.co.uk/pipermail/catalyst/2006-June/008059.html
    • Impedance mismatch
      • SELECT c1, c2 FROM table1
          • missing c3 , so cannot navigate to class2
          • is it a valid instance of class1 ?
      • SELECT * FROM table1 LEFT JOIN table2 ON …
          • what to do with the c4 column ?
          • is it a valid instance of class1 ?
      • SELECT c1, c2, length(c2) AS l_c2 FROM table1
          • no predeclared method in class1 for accessing l_c2
      • (diagram : table1 (c1, c2, c3) and table2 (c3, c4) in the RDBMS; object r1 : class1 with +c1: String, +c2: String, +c3: class2 in RAM)
    • ORM Landscape
      • Leader
        • DBIx::Class (a.k.a. DBIC)
      • Also discussed here
        • DBIx::DataModel
      • Many others
        • Rose::DB, Jifty::DBI, Fey::ORM, ORM, DBIx::ORM::Declarative, Tangram, Coat::Persistent, DBR, DBIx::Sunny, DBIx::Skinny, DBI::Easy, …
    • Model (UML)
      • Artist (1) to (*) CD
      • CD (1) to (*) Track
    • DBIx::Class Schema
      • package MyDatabase::Main;
      • use base qw/DBIx::Class::Schema/;
      • __PACKAGE__->load_namespaces;
      • package MyDatabase::Main::Result::Artist;
      • use base qw/DBIx::Class/;
      • __PACKAGE__->load_components(qw/PK::Auto Core/);
      • __PACKAGE__->table('artist');
      • __PACKAGE__->add_columns(qw/ artistid name /);
      • __PACKAGE__->set_primary_key('artistid');
      • __PACKAGE__->has_many('cds' =>
      • 'MyDatabase::Main::Result::Cd');
      • package ...
      • ...
    • DBIx::Class usage
      • my $schema = MyDatabase::Main
      • ->connect('dbi:SQLite:db/example.db');
      • my @artists = (['Michael Jackson'], ['Eminem']);
      • $schema->populate('Artist', [
      • [qw/name/],
      • @artists,
      • ]);
      • my $rs = $schema->resultset('Track')->search(
      • {
      • 'cd.title' => $cdtitle
      • },
      • {
      • join => [qw/ cd /],
      • }
      • );
      • while (my $track = $rs->next) {
      • print $track->title . "\n";
      • }
    • DBIx::DataModel Schema
      • package MyDatabase;
      • use DBIx::DataModel;
      • DBIx::DataModel->Schema(__PACKAGE__)
      • ->Table(qw/Artist artist artistid/)
      • ->Table(qw/CD cd cdid /)
      • ->Table(qw/Track track trackid /)
      • ->Association([qw/Artist artist 1 /],
      • [qw/CD cds 0..* /])
      • ->Composition([qw/CD cd 1 /],
      • [qw/Track tracks 1..* /]);
    • DBIx::DataModel usage
      • my $dbh = DBI->connect('dbi:SQLite:db/example.db');
      • MyDatabase->dbh($dbh);
      • my @artists = (['Michael Jackson'], ['Eminem']);
      • MyDatabase::Artist->insert(['name'], @artists);
      • my $statement = MyDatabase->join(qw/CD tracks/)->select(
      • -columns => [qw/track.title|trtitle …/],
      • -where => { 'cd.title' => $cdtitle },
      • -resultAs => 'statement', # default : arrayref of rows
      • );
      • while (my $track = $statement->next) {
      • print "$track->{trtitle}\n";
      • }
    • Conclusion
    • Further info
      • Database textbooks
      • DBI manual ( DBI, DBI::FAQ, DBI::Profile )
      • Book : "Programming the DBI"
      • Vendor's manuals
      • ORMs
        • DBIx::Class::Manual
        • DBIx::DataModel
      • mastering databases requires a lot of reading !
    • Bonus slides
    • Names for primary / foreign keys
      • primary : unique; foreign : same name
      • author.author_id ↔ distribution.author_id
          • RDBMS knows how to perform joins ( "NATURAL JOIN" )
      • primary : constant; foreign : unique based on table + column name
      • author.id ↔ distribution.author_id
          • ORM knows how to perform joins (RoR ActiveRecord)
          • SELECT * FROM table1, table2 … : which id ?
      • primary : constant; foreign : just table name
      • author.id ↔ distribution.author
          • $a_distrib->author() : foreign key or related record ?
      • columns for joins should always be indexed
    • Locks and isolation levels
      • Locks on rows
        • shared
          • other clients can also get a shared lock
          • requests for exclusive lock must wait
        • exclusive
          • all other requests for locks must wait
      • Intention locks (on whole tables)
        • Intent shared
        • Intent exclusive
      • Isolation levels
        • read-uncommitted
        • read-committed
        • repeatable-read
        • serializable
      • related SQL statements
        • SELECT … FOR READ ONLY
        • SELECT … FOR UPDATE
        • SELECT … LOCK IN SHARE MODE
        • LOCK TABLE(S) … READ/WRITE
        • SET TRANSACTION ISOLATION LEVEL …
    • Cursors
      • my $sql = "SELECT * FROM SomeTable FOR UPDATE";
      • my $sth1 = $dbh->prepare($sql);
      • $sth1->execute();
      • my $curr = "WHERE CURRENT OF $sth1->{CursorName}";
      • while (my $row = $sth1->fetch) {
      • if (…) {
      • $dbh->do("DELETE FROM SomeTable $curr");
      • } else {
      • my $sth2 = $dbh->prepare(
      • "UPDATE SomeTable SET col = ? $curr");
      • $sth2->execute($new_val);
      • }
      • }
    • Modeling (UML)
      • Author (1) to (*) Distribution
      • Distribution (1) to (*) Module : ► contains
      • Distribution (*) to (*) Module : ► depends on
    • Terminology
      • same diagram, annotated with : class, association, association name, multiplicity, composition
    • Implementation
      • Author : author_id, author_name, e_mail
      • Distribution : distrib_id, distrib_name, d_release, author_id
      • Module : module_id, module_name, distrib_id
      • Dependency : distrib_id, module_id (link table for n-to-n association)
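
The implementation slide can be turned into DDL roughly as follows. This is a sketch under assumptions: DBD::SQLite with an in-memory database, column types chosen for illustration, and made-up sample rows. The `dependency` table plays the role of the link table for the n-to-n "depends on" association.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; types and sample data are illustrative.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });

$dbh->do($_) for (
    'CREATE TABLE author (
        author_id INTEGER PRIMARY KEY, author_name VARCHAR(20), e_mail VARCHAR(20))',
    'CREATE TABLE distribution (
        distrib_id INTEGER PRIMARY KEY, distrib_name VARCHAR(40), d_release DATE,
        author_id INTEGER REFERENCES author(author_id))',
    'CREATE TABLE module (
        module_id INTEGER PRIMARY KEY, module_name VARCHAR(40),
        distrib_id INTEGER REFERENCES distribution(distrib_id))',
    # link table : one row per (distribution, module it depends on)
    'CREATE TABLE dependency (
        distrib_id INTEGER REFERENCES distribution(distrib_id),
        module_id  INTEGER REFERENCES module(module_id),
        PRIMARY KEY (distrib_id, module_id))',
);

$dbh->do('INSERT INTO author VALUES (1, ?, ?)', undef, 'JFOOBAR', 'john@foobar.com');
$dbh->do('INSERT INTO distribution VALUES (?, ?, ?, 1)', undef, @$_)
    for [1, 'Foo-Bar', '2011-06-01'], [2, 'Foo-Baz', '2011-08-15'];
$dbh->do('INSERT INTO module VALUES (?, ?, ?)', undef, @$_)
    for [1, 'Foo::Bar', 1], [2, 'Foo::Baz', 2];
$dbh->do('INSERT INTO dependency VALUES (2, 1)');   # Foo-Baz depends on Foo::Bar

# Navigate the n-to-n association through the link table
my $deps = $dbh->selectcol_arrayref(
    'SELECT m.module_name
       FROM dependency dep
       JOIN distribution d ON d.distrib_id = dep.distrib_id
       JOIN module m       ON m.module_id  = dep.module_id
      WHERE d.distrib_name = ?', undef, 'Foo-Baz');

print "Foo-Baz depends on: @$deps\n";
```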