Gofer
   A scalable stateless proxy for DBI




Tim.Bunce@pobox.com - September 2008
DBI Layer Cake
Application
    DBI
 DBD::Foo
Database API



                 Database
New Layers
 Application          Gofer Transport

     DBI                    DBI
 DBD::Gofer               DBD::Foo
Gofer Transport       Database API



                          Database
Gofer gives you...
                               Few
        Application         round-trips           Gofer Transport

              DBI                                        DBI
        DBD::Gofer                                   DBD::Foo
       Gofer Transport
                             Stateless
                             protocol                           Server-side
                                               Database DBD      caching
Client-side                                     only needed
 caching                                         on server
                              Pluggable
   Retry on     Configure     transports        Flexible and easy to extend
    error        via DSN   ssh, http, etc...         e.g. DBIx::Router
DBD::Proxy vs DBD::Gofer
                              DBD::Proxy   DBD::Gofer
  Supports transactions           ✓         ✗(not yet)
Supports very large results       ✓        ✗(memory)
      Large test suite            ✗             ✓
   Minimal round-trips            ✗             ✓
 Automatic retry on error         ✗             ✓
Modular & Pluggable classes       ✗             ✓
    Tunable via Policies          ✗             ✓
         Scalable                 ✗             ✓
   Connection pooling             ✗             ✓
    Client-side caching           ✗             ✓
    Server-side caching           ✗             ✓
Gofer, logically
•   Gofer is
    - A scalable stateless proxy architecture for DBI
    - Transport independent
    - Highly configurable on client and server side
    - Efficient, in CPU time and minimal round-trips
    - Well tested
    - Scalable
    - Cacheable
    - Simple and reliable
Gofer, structurally
•   Gofer is
    - A simple stateless request/response protocol
    - A DBI proxy driver: DBD::Gofer
    - A request executor module
    - A set of pluggable transport modules
    - An extensible client configuration mechanism


•   Development sponsored by Shopzilla.com
Gofer Protocol
•   DBI::Gofer::Request & DBI::Gofer::Response
•   Simple blessed hashes
    - Request contains all required information to
      connect and execute the requested methods.
    - Response contains results from methods calls,
      including result sets (rows of data).
•   Serialized and transported by transport
    modules like DBI::Gofer::Transport::http
Using DBD::Gofer
•   Via DSN
    - By adding a prefix
    $dsn = “dbi:Driver:dbname”;
    $dsn = “dbi:Gofer:transport=foo;dsn=$dsn”;


•   Via DBI_AUTOPROXY environment variable
    - Automatically applies to all DBI connect calls
    $ export DBI_AUTOPROXY=“dbi:Gofer:transport=foo”;
    - No code changes required!
Gofer Transports
•   DBI::Gofer::Transport::null
    - The ‘null’ transport.
    - Serializes request object,
      transports it nowhere,
      then deserializes and passes it to
      DBI::Gofer::Execute to execute
    - Serializes response object,
      transports it nowhere,
      then deserializes and returns it to caller
  - Very useful for testing.
DBI_AUTOPROXY=“dbi:Gofer:transport=null”
Gofer Transports
•   DBD::Gofer::Transport::stream (ssh)
    - Can ssh to remote system to self-start server
       ssh -xq user@host.domain 
       perl -MDBI::Gofer::Transport::stream 
       -e run_stdio_hex
  - Automatically reconnects if required
  - ssh gives you security and optional compression
DBI_AUTOPROXY=’dbi:Gofer:transport=stream
;url=ssh:user@host.domain’
Gofer Transports
•   DBD::Gofer::Transport::http
    - Sends requests as http POST requests
    - Server typically Apache mod_perl running
      DBI::Gofer::Transport::http
    - Very flexible server-side configuration options
    - Can use https for security
    - Can use web techniques for scaling and high-
      availability. Will support web caching.
DBI_AUTOPROXY=’dbi:Gofer:transport=http
;url=http://example.com/gofer’
Gofer Transports
•   DBD::Gofer::Transport::gearman
    - Distributes requests to a pool of workers
    - Gearman a lightweight distributed job queue
      http://www.danga.com/gearman
    - Gearman is implemented by the same people who
      wrote memcached, perlbal, mogileFS, & DJabberd


DBI_AUTOPROXY=’dbi:Gofer:transport=gearman
;url=http://example.com/gofer’
Pooling via gearman vs http
• I haven’t compared them in use myself yet
+ Gearman may have lower latency
+ Gearman spreads load over multiple machines
  without need for load-balancer
+ Gearman coalescing may be beneficial
+ Gearman async client may work well with POE
- Gearman clients need to be told about servers
• More gearman info http://danga.com/words/
  2007_04_linuxfest_nw/linuxfest.pdf (p61+)
DBD::Gofer
• A proxy driver
• Aims to be as ‘transparent’ as possible
• Accumulates details of DBI method calls
• Delays forwarding request for as long as
  possible
• execute_array() is a single round-trip
• Policy mechanism allows fine-grained tuning
  to trade transparency for speed
DBD::Gofer::Policy::*
•   Three policies supplied: pedantic, classic, and
    rush. Classic is the default.
•   Policies are implemented as classes
•   Currently 22 individual items within a policy
•   Policy items can be dynamic methods
•   Policy is selected via DSN:
DBI_AUTOPROXY=“dbi:Gofer:transport=null
;policy=pedantic”
Round-trips per Policy
                             pedantic classic       rush
$dbh = DBI->connect_cached   connect()     ✓
$dbh->ping                      ✓
                                          if not     if not
$dbh->quote                     ✓        default    default

$sth = $dbh->prepare            ✓
$sth->execute                   ✓          ✓          ✓
$sth->{NUM_OF_FIELDS}
$sth->fetchrow_array
                                                    cached
$dbh->tables                    ✓          ✓       after first
Error Handling
•   DBD::Gofer can automatically retry on failure
DBI_AUTOPROXY=“dbi:Gofer:transport=...;
retry_limit=3”
•   Default behaviour is to retry if
     $request->is_idemponent is true
    - looks at SQL, returns true for most SELECTs
•   Default is retry_limit=0, so disabled
•   You can define your own behaviour:
    DBI->connect(..., {
        go_retry_hook => sub { ... },
    });
Client-side Caching
•   DBD::Gofer can automatically cache results
DBI_AUTOPROXY=“dbi:Gofer:transport=null
;cache=CacheModuleName”
•   Default behaviour is to cache if
     $request->is_idemponent is true
      looks at SQL returns true for most SELECTs
•   Use any cache module with standard interface
      set($k,$v) and get($k)
•   You can define your own behaviour by
    subclassing
Server-side Caching
•   DBI::Gofer::Transport::* can automatically
    cache results
•   Use any cache module with standard interface
    - set($k,$v) and get($k)
•   You can define your own behaviour by
    subclassing
Gofer Caveats
•   Adds latency time cost per network round-trip
    - Can minimize round-trips by fine-tuning a policy
    - and/or using client-side caching
•   State-less-ness has implications...
    - No transactions, AutoCommit only, at the moment.
    - Can’t alter $dbh attributes after connect
    - Can’t use temp tables, locks, and other per-connection
      persistent state, except within a stored procedure
    - Code using last_insert_id needs a (simple) change
    - See the docs for a few other very obscure caveats
An Example
Using Gofer for Connection Pooling,
   Scaling, and High Availability
The Problem
    Apache
               40 worker processes per server
     Worker
     Worker
     Worker
     Worker
     Worker
                                  +vers
     Worker
     Worker
     Worker
     Worker
     Worker
                            150 ser
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker                                         Database
     Worker
     Worker                                          Server
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker

                             =
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Workers                1 overloaded database
-
A Solution
                        A ‘database proxy’ that
    Apache
     Worker
                        does connection pooling
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker                                             Database
     Worker
     Worker                                              Server
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
                                  Holds a few
     Worker
     Worker
     Worker
     Worker                    connections open
     Worker
     Worker
     Worker
     Worker
     Worker
     Workers
                                                    Repeat for all servers
-              Uses them to ser vice DBI requests
An Implementation
                 Standard apache+mod_perl
    Apache
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker                                   DBI + DBD::*
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
                           Apache

     Worker                                                  Database
     Worker
     Worker                                                   Server
     Worker
     Worker
     Worker                         !"#$%#&
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker
     Worker                    DBI::Gofer::Execute and
     Worker
     Worker
     Worker
     Worker                  DBI::Gofer::Transport::http
     Workers
                          modules implement stateless proxy

-       DBD::Gofer with http transport
Load Balance and Cache
                                                                                                                Load balancing
 server                                                                                                         and fail-over




                                                                Apache running DBI Gofer
  with
   40
workers
                                                                                                                         Far fewer db

                                 HTTP Load Balancer and Cache
                                                                                                                         connections
                                                                                           Workers



   x          Many = 6000 db
  150         web                                                                                    Gofer          Database
                   connections
              servers                                                                                Servers         Server
servers


                                                                Apache running DBI Gofer




                                                                                           Workers


                                                                                                               Caching
   -                                          New middle tier
          Caching
Gofer’s Future
•   Caching for http transport
•   Optional JSON serialization
•   Caching in DBD::Gofer (now done)
•   Make state-less-ness of transport optional
    to support transactions
•   Patches welcome!
Future: http caching
• Potential big win
• DBD::Gofer needs to indicate cache-ability
 - via appropriate http headers
• Server side needs to agree
 - and respond with appropriate http headers
• Caching then happens just like for web pages
 - if there’s a web cache between client and server
• Patches welcome!
Future: JSON
• Turns DBI into a web service!
 - Service Oriented Architecture anyone?
• Accessible to anything that can talk JSON
• Clients could be JavaScript, Java, ...
 - and various languages that begin with P ...
• Patches welcome!
Future: Transactions
• State-less-ness could be made optional
 - If transport layer being used agrees
 - Easiest to implement for stream / ssh
• Would be enabled by
    AutoCommit => 0
    $dbh->begin_work
• Patches welcome!
Questions?

DBD::Gofer 200809

  • 1.
    Gofer A scalable stateless proxy for DBI Tim.Bunce@pobox.com - September 2008
  • 2.
    DBI Layer Cake Application DBI DBD::Foo Database API Database
  • 3.
    New Layers Application Gofer Transport DBI DBI DBD::Gofer DBD::Foo Gofer Transport Database API Database
  • 4.
    Gofer gives you... Few Application round-trips Gofer Transport DBI DBI DBD::Gofer DBD::Foo Gofer Transport Stateless protocol Server-side Database DBD caching Client-side only needed caching on server Pluggable Retry on Configure transports Flexible and easy to extend error via DSN ssh, http, etc... e.g. DBIx::Router
  • 5.
    DBD::Proxy vs DBD::Gofer DBD::Proxy DBD::Gofer Supports transactions ✓ ✗(not yet) Supports very large results ✓ ✗(memory) Large test suite ✗ ✓ Minimal round-trips ✗ ✓ Automatic retry on error ✗ ✓ Modular & Pluggable classes ✗ ✓ Tunable via Policies ✗ ✓ Scalable ✗ ✓ Connection pooling ✗ ✓ Client-side caching ✗ ✓ Server-side caching ✗ ✓
  • 6.
    Gofer, logically • Gofer is - A scalable stateless proxy architecture for DBI - Transport independent - Highly configurable on client and server side - Efficient, in CPU time and minimal round-trips - Well tested - Scalable - Cacheable - Simple and reliable
  • 7.
    Gofer, structurally • Gofer is - A simple stateless request/response protocol - A DBI proxy driver: DBD::Gofer - A request executor module - A set of pluggable transport modules - An extensible client configuration mechanism • Development sponsored by Shopzilla.com
  • 8.
    Gofer Protocol • DBI::Gofer::Request & DBI::Gofer::Response • Simple blessed hashes - Request contains all required information to connect and execute the requested methods. - Response contains results from methods calls, including result sets (rows of data). • Serialized and transported by transport modules like DBI::Gofer::Transport::http
  • 9.
    Using DBD::Gofer • Via DSN - By adding a prefix $dsn = “dbi:Driver:dbname”; $dsn = “dbi:Gofer:transport=foo;dsn=$dsn”; • Via DBI_AUTOPROXY environment variable - Automatically applies to all DBI connect calls $ export DBI_AUTOPROXY=“dbi:Gofer:transport=foo”; - No code changes required!
  • 10.
    Gofer Transports • DBI::Gofer::Transport::null - The ‘null’ transport. - Serializes request object, transports it nowhere, then deserializes and passes it to DBI::Gofer::Execute to execute - Serializes response object, transports it nowhere, then deserializes and returns it to caller - Very useful for testing. DBI_AUTOPROXY=“dbi:Gofer:transport=null”
  • 11.
    Gofer Transports • DBD::Gofer::Transport::stream (ssh) - Can ssh to remote system to self-start server ssh -xq user@host.domain perl -MDBI::Gofer::Transport::stream -e run_stdio_hex - Automatically reconnects if required - ssh gives you security and optional compression DBI_AUTOPROXY=’dbi:Gofer:transport=stream ;url=ssh:user@host.domain’
  • 12.
    Gofer Transports • DBD::Gofer::Transport::http - Sends requests as http POST requests - Server typically Apache mod_perl running DBI::Gofer::Transport::http - Very flexible server-side configuration options - Can use https for security - Can use web techniques for scaling and high- availability. Will support web caching. DBI_AUTOPROXY=’dbi:Gofer:transport=http ;url=http://example.com/gofer’
  • 13.
    Gofer Transports • DBD::Gofer::Transport::gearman - Distributes requests to a pool of workers - Gearman a lightweight distributed job queue http://www.danga.com/gearman - Gearman is implemented by the same people who wrote memcached, perlbal, mogileFS, & DJabberd DBI_AUTOPROXY=’dbi:Gofer:transport=gearman ;url=http://example.com/gofer’
  • 14.
    Pooling via gearmanvs http • I haven’t compared them in use myself yet + Gearman may have lower latency + Gearman spreads load over multiple machines without need for load-balancer + Gearman coalescing may be beneficial + Gearman async client may work well with POE - Gearman clients need to be told about servers • More gearman info http://danga.com/words/ 2007_04_linuxfest_nw/linuxfest.pdf (p61+)
  • 15.
    DBD::Gofer • A proxydriver • Aims to be as ‘transparent’ as possible • Accumulates details of DBI method calls • Delays forwarding request for as long as possible • execute_array() is a single round-trip • Policy mechanism allows fine-grained tuning to trade transparency for speed
  • 16.
    DBD::Gofer::Policy::* • Three policies supplied: pedantic, classic, and rush. Classic is the default. • Policies are implemented as classes • Currently 22 individual items within a policy • Policy items can be dynamic methods • Policy is selected via DSN: DBI_AUTOPROXY=“dbi:Gofer:transport=null ;policy=pedantic”
  • 17.
    Round-trips per Policy pedantic classic rush $dbh = DBI->connect_cached connect() ✓ $dbh->ping ✓ if not if not $dbh->quote ✓ default default $sth = $dbh->prepare ✓ $sth->execute ✓ ✓ ✓ $sth->{NUM_OF_FIELDS} $sth->fetchrow_array cached $dbh->tables ✓ ✓ after first
  • 18.
    Error Handling • DBD::Gofer can automatically retry on failure DBI_AUTOPROXY=“dbi:Gofer:transport=...; retry_limit=3” • Default behaviour is to retry if $request->is_idemponent is true - looks at SQL, returns true for most SELECTs • Default is retry_limit=0, so disabled • You can define your own behaviour: DBI->connect(..., { go_retry_hook => sub { ... }, });
  • 19.
    Client-side Caching • DBD::Gofer can automatically cache results DBI_AUTOPROXY=“dbi:Gofer:transport=null ;cache=CacheModuleName” • Default behaviour is to cache if $request->is_idemponent is true looks at SQL returns true for most SELECTs • Use any cache module with standard interface set($k,$v) and get($k) • You can define your own behaviour by subclassing
  • 20.
    Server-side Caching • DBI::Gofer::Transport::* can automatically cache results • Use any cache module with standard interface - set($k,$v) and get($k) • You can define your own behaviour by subclassing
  • 21.
    Gofer Caveats • Adds latency time cost per network round-trip - Can minimize round-trips by fine-tuning a policy - and/or using client-side caching • State-less-ness has implications... - No transactions, AutoCommit only, at the moment. - Can’t alter $dbh attributes after connect - Can’t use temp tables, locks, and other per-connection persistent state, except within a stored procedure - Code using last_insert_id needs a (simple) change - See the docs for a few other very obscure caveats
  • 22.
    An Example Using Goferfor Connection Pooling, Scaling, and High Availability
  • 23.
    The Problem Apache 40 worker processes per server Worker Worker Worker Worker Worker +vers Worker Worker Worker Worker Worker 150 ser Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Database Worker Worker Server Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker = Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Workers 1 overloaded database -
  • 24.
    A Solution A ‘database proxy’ that Apache Worker does connection pooling Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Database Worker Worker Server Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Holds a few Worker Worker Worker Worker connections open Worker Worker Worker Worker Worker Workers Repeat for all servers - Uses them to ser vice DBI requests
  • 25.
    An Implementation Standard apache+mod_perl Apache Worker Worker Worker Worker Worker Worker Worker Worker Worker DBI + DBD::* Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Apache Worker Database Worker Worker Server Worker Worker Worker !"#$%#& Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker DBI::Gofer::Execute and Worker Worker Worker Worker DBI::Gofer::Transport::http Workers modules implement stateless proxy - DBD::Gofer with http transport
  • 26.
    Load Balance andCache Load balancing server and fail-over Apache running DBI Gofer with 40 workers Far fewer db HTTP Load Balancer and Cache connections Workers x Many = 6000 db 150 web Gofer Database connections servers Servers Server servers Apache running DBI Gofer Workers Caching - New middle tier Caching
  • 28.
    Gofer’s Future • Caching for http transport • Optional JSON serialization • Caching in DBD::Gofer (now done) • Make state-less-ness of transport optional to support transactions • Patches welcome!
  • 29.
    Future: http caching •Potential big win • DBD::Gofer needs to indicate cache-ability - via appropriate http headers • Server side needs to agree - and respond with appropriate http headers • Caching then happens just like for web pages - if there’s a web cache between client and server • Patches welcome!
  • 30.
    Future: JSON • TurnsDBI into a web service! - Service Oriented Architecture anyone? • Accessible to anything that can talk JSON • Clients could be JavaScript, Java, ... - and various languages that begin with P ... • Patches welcome!
  • 31.
    Future: Transactions • State-less-nesscould be made optional - If transport layer being used agrees - Easiest to implement for stream / ssh • Would be enabled by AutoCommit => 0 $dbh->begin_work • Patches welcome!
  • 32.