Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics

19,757 views

Published on

In this talk we introduce Postgres-XL for scaling out PostgreSQL. We cover its architecture, how tables are distributed, and include a sample configuration for a small local test cluster. Finally, we discuss the differences to PostgreSQL and discuss Postgres-XL community building

Published in: Technology

Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics

  1. 1. 1©2014 TransLattice, Inc. All Rights Reserved. 1 Supersized PostgreSQL: Postgres-XL for Scale-out OLTP and Big Data Analytics June 17, 2014 Mason Sharp
  2. 2. 2©2014 TransLattice, Inc. All Rights Reserved. Agenda •  Postgres-XL Background •  Architecture Overview •  Distributing Tables •  Configuring a Local Cluster •  Backup and Recovery •  Differences to PostgreSQL and Postgres-XC •  Community
  3. 3. 3©2014 TransLattice, Inc. All Rights Reserved. Announced May 2014
  4. 4. 4©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL n  Open Source n  Scale-out RDBMS –  Massive Parallel Processing –  OLTP write and read scalability –  Multi-tenant security –  Cluster-wide ACID properties –  Can also act as distributed key-value store n  Mixed workloads, can eliminate need for separate clusters in some cases
  5. 5. 5©2014 TransLattice, Inc. All Rights Reserved. XL Origins Considered open sourcing very early
  6. 6. 6©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Applications and Use Case Examples Postgres-XL Application Use Case Example Business intelligence requiring MPP parallelism Financial services big data analysis of opportunities across service lines OLTP write-intensive workloads Online ad tracking Mixed workload environments Real-time reporting on a transactional system
  7. 7. 8©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Technology •  Data is automatically spread across cluster •  Queries return back data from all nodes in parallel •  The Global Transaction Manager maintains a consistent view of the data across cluster •  Connect to any coordinator
  8. 8. 9©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Features •  Multi Version Concurrency Control •  Referential Integrity* •  Full Text Search •  Geo-spatial Functions and Data Types •  JSON and XML Support •  Distributed Key-Value Store
  9. 9. 10©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Missing Features •  Built in High Availability •  Use external like Corosync/Pacemaker •  Area for future improvement •  “Easy” to re-shard / add nodes •  Some pieces there •  Can however “pre-shard” multiple nodes on the same server or VM •  Some FK and UNIQUE constraints
  10. 10. 11©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Connectors Looks like a PostgreSQL 9.2 database •  C •  C++ •  Perl •  Python •  PHP •  Erlang •  Haskell – Java – Javascript – .NET – Node.js – Ruby – Scala
  11. 11. 13©2014 TransLattice, Inc. All Rights Reserved. Performance Transactional work loads 10 50 100 200 300 400 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 Dell DVD Benchmark StormDB Amazon RDS
  12. 12. 14©2014 TransLattice, Inc. All Rights Reserved. MPP Performance – DBT-1 (TPC-H) Postgres-XL PostgreSQL Postgres-XL
  13. 13. 15©2014 TransLattice, Inc. All Rights Reserved. Key-value Store Comparison Postgres-XL MongoDB
  14. 14. 17©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Architecture •  Based on the Postgres-XC project •  Coordinators •  Connection point for applications •  Parsing and planning of queries •  Data Nodes •  Actual data storage •  Local execution of queries •  Global Transaction Manager (GTM) •  Provides a consistent view to transactions •  GTM Proxy to increase performance
  15. 15. 18©2014 TransLattice, Inc. All Rights Reserved. Coordinators •  Handles network connectivity to the client •  Parse and plan statements •  Perform final query processing of intermediate result sets •  Manages two-phase commit •  Stores global catalog information
  16. 16. 19©2014 TransLattice, Inc. All Rights Reserved. Data Nodes •  Stores tables and indexes •  Only coordinators connects to data nodes •  Executes queries from the coordinators •  Communicates with peer data nodes for distributed joins
  17. 17. 20©2014 TransLattice, Inc. All Rights Reserved. Global Transaction Manager (GTM) •  Handles necessary MVCC tasks •  Transaction IDs •  Snapshots •  Manages cluster wide values •  Timestamps •  Sequences •  GTM Standby can be configured
  18. 18. 21©2014 TransLattice, Inc. All Rights Reserved. GTM Proxy •  Runs alongside Coordinator or Datanode •  Groups requests together •  Obtain range of transaction ids (XIDs) •  Obtain snapshot
  19. 19. 22©2014 TransLattice, Inc. All Rights Reserved. Data Distribution •  Replicated Tables •  Each row is replicated to all data nodes •  Statement based replication •  Distributed Tables •  Each row is stored on one data node •  Available strategies •  Hash •  Round Robin •  Modulo
  20. 20. 23©2014 TransLattice, Inc. All Rights Reserved. Availability •  No Single Point of Failure •  Global Transaction Manager Standby •  Coordinators are load balanced •  Streaming replication for data nodes •  But, currently manual…
  21. 21. 24©2014 TransLattice, Inc. All Rights Reserved. DDL
  22. 22. 25©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE my_table (…) DISTRIBUTE BY HASH(col) | MODULO(col) | ROUNDROBIN | REPLICATION [ TO NODE (nodename[,nodename…])]
  23. 23. 26©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE state_code ( state_code CHAR(2), state_name VARCHAR, : ) DISTRIBUTE BY REPLICATION An exact replica on each node
  24. 24. 27©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE invoice ( invoice_id SERIAL, cust_id INT, : ) DISTRIBUTE BY HASH(invoice_id) Distributed evenly amongst sharded nodes
  25. 25. 28©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE invoice ( invoice_id SERIAL, cust_id INT …. ) DISTRIBUTE BY HASH(invoice_id); CREATE TABLE lineitem ( lineitem_id SERIAL, invoice_id INT … ) DISTRIBUTE BY HASH(invoice_id); Joins on invoice_id can be pushed down to the datanodes
  26. 26. 29©2014 TransLattice, Inc. All Rights Reserved. test2=# create table t1 (col1 int, col2 int) distribute by hash(col1); test2=# create table t2 (col3 int, col4 int) distribute by hash(col3);
  27. 27. 30©2014 TransLattice, Inc. All Rights Reserved. explain select * from t1 inner join t2 on col1 = col3; Remote Subquery Scan on all (datanode_1,datanode_2) -> Hash Join Hash Cond: -> Seq Scan on t1 -> Hash -> Seq Scan on t2 Join push-down. Looks much like regular PostgreSQL
  28. 28. 31©2014 TransLattice, Inc. All Rights Reserved. test2=# explain select * from t1 inner join t2 on col2 = col3; Remote Subquery Scan on all (datanode_1,datanode_2) -> Hash Join Hash Cond: -> Remote Subquery Scan on all (datanode_1,datanode_2) Distribute results by H: col2 -> Seq Scan on t1 -> Hash -> Seq Scan on t2 Will read t1.col2 once and put in shared queue for consumption for other datanodes for joining. Datanode-datanode communication and parallelism
  29. 29. 34©2014 TransLattice, Inc. All Rights Reserved.
  30. 30. 35©2014 TransLattice, Inc. All Rights Reserved. Configuration
  31. 31. 36©2014 TransLattice, Inc. All Rights Reserved.
  32. 32. 37©2014 TransLattice, Inc. All Rights Reserved. Use pgxc_ctl! (demo in a few minutes)
  33. 33. 38©2014 TransLattice, Inc. All Rights Reserved. Otherwise, PostgreSQL-like steps apply: initgtm initdb
  34. 34. 39©2014 TransLattice, Inc. All Rights Reserved. postgresql.conf •  Very similar to regular PostgreSQL •  But there are extra parameters •  max_pool_size •  min_pool_size •  pool_maintenance _timeout •  remote_query_cost •  network_byte_cost •  sequence_range •  pooler_port •  gtm_host, gtm_port •  shared_queues •  shared_queue_size
  35. 35. 40©2014 TransLattice, Inc. All Rights Reserved. Install n  Download RPMs –  http://sourceforge.net/projects/postgres-xl/ files/Releases/Version_9.2rc/ Or n  configure; make; make install
  36. 36. 41©2014 TransLattice, Inc. All Rights Reserved. Setup Environment •  .bashrc •  ssh used, so ad installation bin dir to PATH
  37. 37. 42©2014 TransLattice, Inc. All Rights Reserved. Avoid password with pgxc_ctl •  ssh-keygen –t rsa (in ~/.ssh) •  For local test, no need to copy key, already there •  cat ~/.ssh/id_rsa.pub >> ~/.ssh/ authorized_keys
  38. 38. 43©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl •  Initialize cluster •  Start/stop •  Deploy to node •  Add coordinator •  Add data node / slave •  Add GTM slave
  39. 39. 44©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf •  Let’s create a local test cluster! •  One GTM •  One Coordinator •  Two Datanodes Make sure all are using different ports •  including pooler_port
  40. 40. 45©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf pgxcOwner=pgxl pgxcUser=$pgxcOwner pgxcInstallDir=/usr/postgres-xl-9.2
  41. 41. 46©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf gtmMasterDir=/data/gtm_master gtmMasterServer=localhost gtmSlave=n gtmProxy=n
  42. 42. 47©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf coordMasterDir=/data/coord_master coordNames=(coord1) coordPorts=(5432) poolerPorts=(20010) coordMasterServers=(localhost) coordMasterDirs=($coordMasterDir/coord1)
  43. 43. 48©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf coordMaxWALSenders=($coordMaxWALsernder) coordSlave=n coordSpecificExtraConfig=(none) coordSpecificExtraPgHba=(none)
  44. 44. 49©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf datanodeNames=(dn1 dn2) datanodeMasterDir=/data/dn_master datanodePorts=(5433 5434) datanodePoolerPorts=(20011 20012) datanodeMasterServers=(localhost localhost) datanodeMasterDirs=($datanodeMasterDir/dn1 $datanodeMasterDir/dn2)
  45. 45. 50©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf datanodeMaxWALSenders=($datanodeMaxWalSe nder $datanodeMaxWalSender) datanodeSlave=n primaryDatanode=dn1
  46. 46. 51©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl pgxc_ctl init all Now have a working cluster
  47. 47. 52©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl – Add a second Coordinator Format: pgxc_ctl add coordinator master name host port pooler data_dir pgxc_ctl add coordinator master coord2 localhost 15432 20020 /data/coord_master/coord2
  48. 48. 53©2014 TransLattice, Inc. All Rights Reserved. PostgreSQL Differences
  49. 49. 54©2014 TransLattice, Inc. All Rights Reserved. pg_catalog •  pgxc_node •  Coordinator and Datanode definition •  Currently not GTM •  pgxc_class •  Table replication or distribution info
  50. 50. 55©2014 TransLattice, Inc. All Rights Reserved. Additional Commands •  PAUSE CLUSTER / UNPAUSE CLUSTER •  Wait for current transactions to complete •  Prevent new commands until UNPAUSE •  EXECUTE DIRECT ON (nodename) ‘command’ •  CLEAN CONNECTION •  Cleans connection pool •  CREATE NODE / ALTER NODE
  51. 51. 56©2014 TransLattice, Inc. All Rights Reserved. Noteworthy •  SELECT pgxc_pool_reload() •  pgxc_clean for 2PC •  Cleans prepared transactions •  If committed on one node, must be committed on all •  If aborted on one node, must be aborted on all •  If both committed and aborted output error
  52. 52. 57©2014 TransLattice, Inc. All Rights Reserved. Backup and Recovery
  53. 53. 58©2014 TransLattice, Inc. All Rights Reserved. Backup •  Like PostgreSQL •  pg_dump, pg_dumpall from any Coordinator •  Will pull all data from all nodes in one central location •  No notion yet of parallel pg_dump of individual nodes on node-by-node basis, nor parallel pg_restore
  54. 54. 59©2014 TransLattice, Inc. All Rights Reserved. Cold Backup •  Stop Cluster •  Backup individual data directories on nodes’ file system •  Gzip, compress, rsync off, etc •  Cold restore do the opposite
  55. 55. 60©2014 TransLattice, Inc. All Rights Reserved. Hot Backup •  Similar to PostgreSQL •  Base backup individual nodes •  use streaming replication or WAL archiving •  Backup GTM control file •  Contains XID •  Contains Sequence values •  Periodically written to •  On restart, it will skip ahead some values What about PITR?
  56. 56. 61©2014 TransLattice, Inc. All Rights Reserved. Hot Backup - Barriers •  CREATE BARRIER ‘barrier_name’ •  Waits for currently committing transactions to finish •  Writes a barrier in each node’s WAL for a consistent restoration point •  Currently done manually! •  Could be configured as background process in the future to periodically write In recovery.conf, use recovery_target_barrier
  57. 57. 62©2014 TransLattice, Inc. All Rights Reserved. Similarities and differences compared to Postgres-XC
  58. 58. 63©2014 TransLattice, Inc. All Rights Reserved. Code Merging PostgreSQL Postgres-XC Postgres-XL Postgres-XC Enhancements Postgres-XL Enhancements Planner and Executor are different
  59. 59. 64©2014 TransLattice, Inc. All Rights Reserved. Similarities with Postgres-XC
  60. 60. 65©2014 TransLattice, Inc. All Rights Reserved. Some of the original Postgres-XC developers developed Postgres-XL
  61. 61. 66©2014 TransLattice, Inc. All Rights Reserved. Global Transaction Manager
  62. 62. 67©2014 TransLattice, Inc. All Rights Reserved. Pooler
  63. 63. 68©2014 TransLattice, Inc. All Rights Reserved. pg_catalog
  64. 64. 69©2014 TransLattice, Inc. All Rights Reserved. pgxc_* pgxc_ctl, pgxc_node
  65. 65. 70©2014 TransLattice, Inc. All Rights Reserved. “pgxc” = Postgres-XL Cluster J
  66. 66. 71©2014 TransLattice, Inc. All Rights Reserved. Differences compared to Postgres-XC
  67. 67. 72©2014 TransLattice, Inc. All Rights Reserved. n  Performance n  Multi-tenant security n  Statistics
  68. 68. 73©2014 TransLattice, Inc. All Rights Reserved. Performance Data node ßà Data node Communication –  Avoid Coordinator-centric processing –  Can be many times faster for some queries Pooler needs to be configured for each Data node, not just coordinator –  More connections needed at each node
  69. 69. 74©2014 TransLattice, Inc. All Rights Reserved. Performance Re-parse avoidance In XC, SQL strings are sent to other nodes from Coordinator, where they are reparsed and replanned. In XL, plan once on coordinator, serialize and send over
  70. 70. 75©2014 TransLattice, Inc. All Rights Reserved. Performance Sequences Detect frequently accessed sequences, fetch range of values, avoid fewer round trips to GTM COPY INSERT SELECT
  71. 71. 76©2014 TransLattice, Inc. All Rights Reserved. Multi-tenant Security User can only get own info from global pg_catalog pg_database, pg_user Harder to hack if names of other database objects are unknown
  72. 72. 77©2014 TransLattice, Inc. All Rights Reserved. Statistics storm_stats Basic DML stats by time range pg_stat_statements may be sufficient
  73. 73. 78©2014 TransLattice, Inc. All Rights Reserved. Differences Planner and Executor are different
  74. 74. 79©2014 TransLattice, Inc. All Rights Reserved. Differences GUC n  remote_query_cost n  network_byte_cost n  pool_maintenance_timeout n  sequence_range n  shared_queues n  shared_queue_size
  75. 75. 80©2014 TransLattice, Inc. All Rights Reserved. Looking at Postgres-XL Code n  XC #ifdef PGXC n  XL #ifdef XCP #ifndef XCP
  76. 76. 81©2014 TransLattice, Inc. All Rights Reserved. Community
  77. 77. 82©2014 TransLattice, Inc. All Rights Reserved. Website and Code n  postgres-xl.org n  sourceforge.net/projects/postgres-xl n  Will add mirror for github.com n  Docs –  files.postgres-xl.org/documentation/index.html
  78. 78. 84©2014 TransLattice, Inc. All Rights Reserved. Philosophy n  Stability and bug fixes over features n  Performance-focused n  Open and inclusive –  Welcome input for roadmap, priorities and design –  Welcome others to become core members and contributors
  79. 79. 85©2014 TransLattice, Inc. All Rights Reserved. Planned Community Structure n  Like PostgreSQL –  Core –  Major Contributors –  Minor Contributors –  Advocacy
  80. 80. 87©2014 TransLattice, Inc. All Rights Reserved. Roadmap n  Catch up to PostgreSQL Community -> 9.3, 9.4 n  Make XL a more robust analytical platform –  Distributed Foreign Data Wrappers n  Remove blockers to adoption n  Native High Availability
  81. 81. 88©2014 TransLattice, Inc. All Rights Reserved. Thank You! msharp@translattice.com @mason_db

×