Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
pgpool: Features and
    Development


           Tatsuo Ishii
pgpool Global Development Group
       SRA OSS, Inc. Japan
Agenda
●   Developers
●   History
●   Existing pgpool project
●   Ongoing pgpool-II project
●   Demonstration
Who are we?
 ●   pgpool Global Development Group
     –   Tatsuo Ishii(SRA OSS, Inc. Japan)
     –   Devrim Gunduz(Command...
Developers!
Tomoaki
Communication                                               Kaori
manager                             ...
pgpool-I: the history
 ●   V0.1: Started as a personal project
     (2003/6/27)
 ●   V1.0: Synchronous replication (2004/3...
Why pgpool?
 ●   No general purpose connection pooling
     software was available
     –   Java has its own, but PHP does...
pgpool process architecture

                         pgpool
                     parent process

                        ...
pgpool functionality:
                 connection pooling
 ●   Reduce connection overhead
 ●   Limit maximum number of con...
How does connection pooling
             work?(1)
                                    pgpool

                            ...
How does connection pooling
             work?(2)
                                    pgpool
                             ...
pgpool understands
        frontend/backend protocol
 ●   Pgpool is transparent to both applications
     and PostgreSQL
 ...
Limitations of current
                  implementation
 ●   resetting pooled connection may not be
     perfect(need modi...
pgpool functionality: replication
 ●   Queries are duplicated and sent to
     PostgreSQL servers


                      ...
Dead lock problem
                    master                                       secondary
            session 1     ses...
“Strict” mode in replication
       to avoid deadlock problem
                     Query                        Query     ...
pgpool functionality: load
                 balanace
 ●   SELECT queries are sent to randomly
     chosen backend
 ●   The...
Limitations of current
     implementation in replication
 ●   CURRENT_TIMESTAMP and server- dependent-
     value-returni...
pgpool-II project
 ●   Information-Technology Promotion
     Agency, Japan (IPA: http://www.ipa.go.jp)
     granted projec...
Goal of pgpool-II project
 ●   Implement parallel query processing
 ●   Enhance pgpool
     –   Allow to have more than 2 ...
pgpool-II architecture overview

            Client
                                                                SQL   ...
pgpool-I and pgpool-II mode

                 replication
                                                                ...
Table partitioning control
CREATE TABLE dist_def (
   dbname TEXT,        -- database name
   schema_name TEXT, --schema n...
Simple parallel query
                                               SELECT * FROM accounts
                              ...
Complex query example
SELECT * FROM accounts                    SELECT * dblink('con',
WHERE aid = 1000                   ...
DML
 ●   INSERT recognize partition key value in a
     query and INSERT into appropreate DB
     node
 ●   UPDATE/DELETE ...
INSERT
                                             call                Syetm DB
     INSERT INTO accounts(aid, bid)      ...
Query Cache
 ●   Caches query result
 ●   Caches can be validated by pgpoolAdmin

      CREATE TABLE pgpool_catalog.query_...
Using parallel query and
             replication together

                                  pgpool-II
                  ...
pgpoolAdmin
 ●   Web based pgpool
     management tool
 ●   Apache/PHP/Smarty
 ●   functions
     –   Stopping pgpool
    ...
Benchmarks
 ●   Hitachi Blade Symphony BS1000
     –   10 blades
     –   Xeon 3.0GHz x 1
     –   1GB Mem
     –   UL320 ...
Sequencial Scan case(mid size)
                                                        ●   scale factor = 90
             ...
Index Scan case(mid size)
                                                             ●   scale factor = 90
             ...
Index Scan case(large size)
                                                             ●   scale factor = 900
          ...
complex query case
                                                       ●   scale factor = 900
               PostgreSQL...
pgpool-II restrictions
 ●   SQL restrictions
     –   COPY is not supported
     –   SELECT INTO, INSERT INTO ... SELECT i...
Future plans(TODO)
 ●   More intelligent query rewriting
 ●   Remove SQL restrictions
 ●   Employ two phase commit to keep...
References
 ●   pgpool page
     –   http://pgpool.projects.postgresql.org/
●   pgpool-II page
     –   English page
     ...
Thank you!




2006/07/09 Tronto   Copyright(c) 2006 pgpool DG   38
Upcoming SlideShare
Loading in …5
×

pgpool: Features and Development

3,770 views

Published on

Published in: Technology, Business, Travel
  • Be the first to comment

pgpool: Features and Development

  1. 1. pgpool: Features and Development Tatsuo Ishii pgpool Global Development Group SRA OSS, Inc. Japan
  2. 2. Agenda ● Developers ● History ● Existing pgpool project ● Ongoing pgpool-II project ● Demonstration
  3. 3. Who are we? ● pgpool Global Development Group – Tatsuo Ishii(SRA OSS, Inc. Japan) – Devrim Gunduz(Command Prompt, Inc.) – Yoshiyuki Asaba(SRA OSS, Inc. Japan) – Taiki Yamaguchi(SRA OSS, Inc. Japan) – pgpoo-II development team ● In addition to Tatsuo, Yoshiyuki and Taiki: – Tomoaki Sato, Yoshiharu Mori, Kaori Inaba (all from SRA OSS, Inc. Japan) 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 3
  4. 4. Developers! Tomoaki Communication Kaori manager Project manager System DB Taiki Yoshiharu PCP Query rewriting Query Cache Parallel execution pgpool engin developer Yoshiyuki Tatsuo Project leader Enhance Our Boss pgpool pgpool-I “Green Turtle” developer pgpool developer 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 4
  5. 5. pgpool-I: the history ● V0.1: Started as a personal project (2003/6/27) ● V1.0: Synchronous replication (2004/3) ● V2.0: Load balance (2004/6) ● V3.0: pgpool Global Development Group(2006/2) 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 5
  6. 6. Why pgpool? ● No general purpose connection pooling software was available – Java has its own, but PHP does not... ● No small to mid scale/light weight synchronous replication tool was available 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 6
  7. 7. pgpool process architecture pgpool parent process pre-fork pgpool PostgreSQL child process backend process pgpool.conf 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 7
  8. 8. pgpool functionality: connection pooling ● Reduce connection overhead ● Limit maximum number of connections to the PostgreSQL backend – New incoming connections are queued in the kernel if all pgpool processes are busy 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 8
  9. 9. How does connection pooling work?(1) pgpool PostgreSQL user1/db1 user1/db1 PostgreSQL user1/db1 user1/db1 PostgreSQL user1/db1 user1/db1 PostgreSQL 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 9
  10. 10. How does connection pooling work?(2) pgpool user1/db1 PostgreSQL user2/db2 user2/db2 PostgreSQL user3/db3 user2/db2 user3/db3 PostgreSQL user3/db3 user2/db2 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 10
  11. 11. pgpool understands frontend/backend protocol ● Pgpool is transparent to both applications and PostgreSQL ● Virtually no modifications to applications are needed ● No special APIs are needed ● Can be used with existing language APIs ● Can be more efficient than libpq because of smaller controlling granuality 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 11
  12. 12. Limitations of current implementation ● resetting pooled connection may not be perfect(need modification to applications) – temp tables – need help from PostgreSQL ● SSL is not supported – fall back to non SSL mode ● no pg_hba.conf like IP based authentication 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 12
  13. 13. pgpool functionality: replication ● Queries are duplicated and sent to PostgreSQL servers PostgreSQL query pgpool query PostgreSQL 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 13
  14. 14. Dead lock problem master secondary session 1 session 2 session 1 session 2 t lock wait lock lock wait lock 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 14
  15. 15. “Strict” mode in replication to avoid deadlock problem Query Query PostgreSQL pgpool packet PostgreSQL Wait until PostgreSQL pgpool something returns PostgreSQL PostgreSQL pgpool Query PostgreSQL reply back PostgreSQL pgpool Wait until the result something PostgreSQL returns 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 15
  16. 16. pgpool functionality: load balanace ● SELECT queries are sent to randomly chosen backend ● The ratio for load balancing can be changed PostgreSQL pgpool PostgreSQL 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 16
  17. 17. Limitations of current implementation in replication ● CURRENT_TIMESTAMP and server- dependent- value-returning-functions cannot be replicated ● MD5 and crypt authentication are not supported ● Sequences and SERIAL needs table locking if there are more than 1 connections ● Functions having side effects cannot be load balanced 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 17
  18. 18. pgpool-II project ● Information-Technology Promotion Agency, Japan (IPA: http://www.ipa.go.jp) granted project ● Started in February 2006, expected to release the first version in September 2006 under BSD license ● Features including parallel query and enhancement to pgpool ● Successor to pgpool 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 18
  19. 19. Goal of pgpool-II project ● Implement parallel query processing ● Enhance pgpool – Allow to have more than 2 DB nodes – More precise control using shared memory – Easy to manage ● Control port/protocol ● Detailed statistics on node status ● GUI management tool 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 19
  20. 20. pgpool-II architecture overview Client SQL Query Parser Cache Communication pgpoolAdmin Query pgpool Manager Rewriting Catalog shared Replication/load Parallel PCP command memory balance execution PostgreSQL PCP lib manager engine engine pgpool-II System DB PostgreSQL DB node 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 20
  21. 21. pgpool-I and pgpool-II mode replication parallel query load balance fail over virtually compatible with pgpool pgpoo-I pgpoo-II mode mode 1000 1000 1000 3000 2000 2000 2000 4000 3000 3000 4000 4000 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 21
  22. 22. Table partitioning control CREATE TABLE dist_def ( dbname TEXT, -- database name schema_name TEXT, --schema name table_name TEXT, -- table name col_name TEXT, -- key col name col_list TEXT[], -- col names type_list TEXT[], -- col types dist_def_func TEXT, -- function name PRIMARY KEY (dbname, schema_name, table_name) ); INSERT INTO dist_def VALUES ('y-mori','public','accounts','aid', ARRAY['aid','bid','abalance','filler'], ARRAY['integer','integer','integer','character(84)'],'dist_def_accounts' ); CREATE OR REPLACE FUNCTION dist_def_accounts (val ANYELEMENT) RETURNS INTEGER AS ' SELECT CASE WHEN $1 >= 0 and $1 < 100001 THEN 0 WHEN $1 >= 100001 and $1 < 200001 THEN 1 WHEN $1 >= 200001 and $1 < 300000 THEN 2 END' LANGUAGE SQL; 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 22
  23. 23. Simple parallel query SELECT * FROM accounts WHERE aid = 1000; SELECT * FROM accounts WHERE aid = 1000; PostgreSQL pgpool PostgreSQL SELECT * FROM accounts WHERE aid = 1000; 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 23
  24. 24. Complex query example SELECT * FROM accounts SELECT * dblink('con', WHERE aid = 1000 'SELECT pool_parallel('SELECT * FROM accounts ORDER BY aid; WHERE aid = 1000')') AS foo(...) ORDER BY aid; pgpool System DB PostgreSQL pgpool SELECT pool_parallel('SELECT * FROM accounts WHERE aid = 1000')') SELECT * FROM accounts PostgreSQL WHERE aid = 1000; 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 24
  25. 25. DML ● INSERT recognize partition key value in a query and INSERT into appropreate DB node ● UPDATE/DELETE simply issues the same query to all DB nodes 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 25
  26. 26. INSERT call Syetm DB INSERT INTO accounts(aid, bid) SELECT VALUES(500,100); return dist_def_accounts(500); DB node = 1 INSERT DB node 0 DB node 1 DB node 2 aid = 0-499 aid = 500-999 aid = 1000-1499 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 26
  27. 27. Query Cache ● Caches query result ● Caches can be validated by pgpoolAdmin CREATE TABLE pgpool_catalog.query_cache ( hash TEXT, -- query string(MD5 hashed) query TEXT, -- query string value bytea, -- query result(RowDescription and DataRow packets) dbname TEXT, -- database name create_time TIMESTAMP WITH TIME ZONE, -- cache creation time PRIMARY KEY(hash) ); 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 27
  28. 28. Using parallel query and replication together pgpool-II with parallel query Parallel query layer pgpool-II pgpool-II with replication with replication Replication and load balance layer PostgreSQL PostgreSQL PostgreSQL PostgreSQL 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 28
  29. 29. pgpoolAdmin ● Web based pgpool management tool ● Apache/PHP/Smarty ● functions – Stopping pgpool – Switch over – Monitoring connection pool status – Monitoring process status – Editing configuration file – Query Cache management 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 29
  30. 30. Benchmarks ● Hitachi Blade Symphony BS1000 – 10 blades – Xeon 3.0GHz x 1 – 1GB Mem – UL320 SCSI Disks ● Cent OS 4.3 ● PostgreSQL 8.1.4/pgbench ● Please note that these results are measured on pre-alpha version of pgpool-II and maybe slightly different from the future official release version 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 30
  31. 31. Sequencial Scan case(mid size) ● scale factor = 90 PostgreSQL pgpool-II 1.3 (90M rows) 1.2 1.1 ● SELECT * FROM 1 0.9 accounts WHERE 0.8 0.7 aid = :aid TPS 0.6 0.5 ● pgpool-II is 7 times 0.4 0.3 faster at the best 0.2 0.1 (32 concurrent 0 1 2 4 8 16 32 users) concurrent users 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 31
  32. 32. Index Scan case(mid size) ● scale factor = 90 PostgreSQL pgpool-II 3750 (90M rows) 3500 3250 3000 ● SELECT * FROM 2750 2500 accounts WHERE 2250 2000 aid = :aid TPS 1750 1500 1250 ● pgpool-II is 9 times 1000 750 faster at the best 500 250 (16 concurrent 0 1 2 4 8 16 32 users) concurrent users 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 32
  33. 33. Index Scan case(large size) ● scale factor = 900 PostgreSQL pgpool-II 1300 (900M rows) 1200 1100 ● SELECT * FROM 1000 900 accounts WHERE 800 700 aid = :aid TPS 600 500 ● pgpool-II is 18 400 300 times faster at the 200 100 best (32 0 1 2 4 8 16 32 concurrent users) concurrent users 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 33
  34. 34. complex query case ● scale factor = 900 PostgreSQL pgpool-II 45 (900M rows) 40 ● SELECT a1.abalance 35 FROM accounts as a1 30 ,accounts as a2 25 WHERE a1.aid = :aid1 TPS 20 and a2.aid = :aid2 15 10 ● pgpool-II is faster 5 only when at 8 0 1 2 4 8 concurrent users concurrent users 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 34
  35. 35. pgpool-II restrictions ● SQL restrictions – COPY is not supported – SELECT INTO, INSERT INTO ... SELECT is not supported – Extended protocol is not supported – INSERT requires explicit values if target column is the key for data patitioning – UPDATE with WHERE clause including function call is not supported – CREATE TABLE/ALTER TABLE requires pgpool restarting ● Transactions – If a DB node fails during INSERT/UPDATE/DELETE, data consistency will be broken – SQL commands via dblink will be executed as separate transactions 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 35
  36. 36. Future plans(TODO) ● More intelligent query rewriting ● Remove SQL restrictions ● Employ two phase commit to keep consistency among DB nodes. However this technique can be used only for queries outside transaction block since PREPARE TRANSACTION closes current transaction block ● More intelligent cache validation 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 36
  37. 37. References ● pgpool page – http://pgpool.projects.postgresql.org/ ● pgpool-II page – English page ● http://pgpool.projects.postgresql.org/pgpool-II/en/ – Japanese page ● http://pgpool.projects.postgresql.org/pgpool-II/ja/ 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 37
  38. 38. Thank you! 2006/07/09 Tronto Copyright(c) 2006 pgpool DG 38

×