Scaling PostgreSQL with Stado
Who Am I?
• Jim Mlodgenski
  – Founder of Cirrus Technologies
  – Former Chief Architect of EnterpriseDB
  – Co-organizer of NYCPUG
Agenda
•   What is Stado?
•   Architecture
•   Query Flow
•   Scaling
•   Limitations
What is Stado?
• Continuation of GridSQL
• “Shared-Nothing”, distributed data architecture.
   – Leverage the power of multiple commodity
     servers while appearing as a single database
     to the application
• Essentially an open source alternative to
  Greenplum, Netezza, or Teradata
Stado Details
• Designed for Parallel Querying
• Not just “Read-Only”, can execute
  UPDATE, DELETE
• Data Loader for parallel loading
• Standard connectivity via PostgreSQL
  compatible connectors: JDBC, ODBC,
  ADO.NET, libpq (psql)
What Stado Is Not
• A replication solution like Slony or Bucardo
• A high availability solution like Synchronous
  Replication in PostgreSQL 9.1
• A scalable transactional solution like Postgres-XC
• An elastic, eventually consistent NoSQL database
Architecture
• Loosely coupled, shared-nothing architecture
• Data repositories
   – Metadata database
   – Stado database
• Stado processes
   – Central coordinator
   – Agents
Configuration
• Can be configured for multiple logical “nodes” per
  physical server
  – Take advantage of multi-core processors
• Tables may be either replicated or partitioned
  – Replicated tables for static lookup data or
    dimensions
  – Partitioned tables for large fact tables
Partitioning
• Tables may simultaneously use Stado
  Partitioning with Constraint Exclusion
  Partitioning
  – Large queries scan a much smaller subset of
    data by using subtables
  – Since each subtable is also partitioned
    across nodes, they are scanned in parallel
  – Queries execute much faster
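The combination of the two partitioning schemes can be sketched in a few lines of Python. This is only an illustration: the modulo placement stands in for whatever hash Stado actually uses, and the four-node cluster, the `sales_<year>` subtable names, and the sample rows are all made up for the example.

```python
from collections import defaultdict

NUM_NODES = 4  # assumed cluster size, not a Stado default


def node_for(key: int) -> int:
    """Pick a node for a row; modulo is a stand-in for the real hash."""
    return key % NUM_NODES


def subtable_for(year: int) -> str:
    """Name of the per-year subtable (hypothetical naming scheme)."""
    return f"sales_{year}"


def load(rows):
    """Place each (id, year, amount) row on one node, in one subtable."""
    storage = defaultdict(list)  # (node, subtable) -> rows
    for row_id, year, amount in rows:
        storage[(node_for(row_id), subtable_for(year))].append((row_id, year, amount))
    return storage


def total_for_year(storage, year):
    """Scan only the matching subtable on every node, then combine."""
    wanted = subtable_for(year)
    scanned = 0
    total = 0
    for (node, subtable), rows in storage.items():
        if subtable != wanted:
            continue  # skipped, as constraint exclusion would skip it
        scanned += len(rows)
        total += sum(amount for _, _, amount in rows)
    return total, scanned


rows = [(i, 2010 + i % 3, 10) for i in range(120)]
storage = load(rows)
total, scanned = total_for_year(storage, 2011)
print(total, scanned)
```

Only a third of the rows are scanned, and that third is split across all four nodes, which is where the parallelism comes from.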
Creating Tables
• Tables can be partitioned or
  replicated
CREATE TABLE STATE_CODES (
     STATE_CD varchar(2) PRIMARY KEY,
     USPS_CD varchar(2),
     NAME varchar(100),
     GNISIS varchar(8)) REPLICATED;
Creating Tables

CREATE TABLE roads (
  gid integer NOT NULL,
  statefp character varying(2),
  countyfp character varying(3),
  linearid character varying(22),
  fullname character varying(100),
  rttyp character varying(1),
  mtfcc character varying(5),
  the_geom geometry)
PARTITIONING KEY gid ON ALL;
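To see what a partitioning key does, here is a small Python sketch that maps `gid` values to nodes. CRC32 is an assumption chosen because it is deterministic and in the standard library; it is not claimed to be the hash Stado uses, and the four-node cluster is hypothetical.

```python
import zlib
from collections import Counter


def node_for_gid(gid: int, num_nodes: int) -> int:
    """Map a partitioning-key value to a node index.

    CRC32 is an illustrative choice, not Stado's actual hash function.
    """
    return zlib.crc32(str(gid).encode("ascii")) % num_nodes


def distribution(gids, num_nodes: int) -> Counter:
    """Count how many rows each node would receive."""
    return Counter(node_for_gid(g, num_nodes) for g in gids)


counts = distribution(range(1, 100_001), 4)
for node in sorted(counts):
    print(f"node {node}: {counts[node]} rows")
```

A high-cardinality key such as `gid` tends to spread rows across the nodes, whereas a low-cardinality column such as `rttyp` would place most rows on a few nodes.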
Query Optimization
• Cost Based Optimizer
   – Takes into account Row Shipping
     (expensive)
• Looks for joins with replicated tables
   – Can be done locally
• Looks for joins between tables on
  partitioned columns
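The locality rule above can be written as a small predicate. This is a simplified sketch, not Stado's optimizer: the `Table` class and `join_is_local` function are invented for illustration, `road_names` is a hypothetical table, and the rule assumes both partitioned tables use the same hash over the same set of nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str] = None  # None means replicated on every node


def join_is_local(left: Table, left_col: str, right: Table, right_col: str) -> bool:
    """Return True when an equi-join needs no row shipping between nodes."""
    # A replicated table is present on every node, so any join with it is local.
    if left.partition_key is None or right.partition_key is None:
        return True
    # Two partitioned tables join locally only on their partitioning columns.
    return left.partition_key == left_col and right.partition_key == right_col


roads = Table("roads", partition_key="gid")
state_codes = Table("state_codes")                      # replicated
road_names = Table("road_names", partition_key="gid")   # hypothetical table

print(join_is_local(roads, "statefp", state_codes, "state_cd"))  # replicated side
print(join_is_local(roads, "gid", road_names, "gid"))            # both on the key
print(join_is_local(roads, "statefp", road_names, "gid"))        # needs row shipping
```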
Two Phase Aggregation
• SUM
  – SUM(stat1)
  – SUM2(SUM(stat1))
• AVG
  – SUM(stat1) / COUNT(stat1)
  – SUM2 (SUM(stat1)) / SUM2 (COUNT(stat1))
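The two phases can be sketched in Python as follows. The per-node data is made up, and the function names are illustrative rather than Stado internals. The point is that AVG must be rebuilt from per-node sums and counts, because averaging the per-node averages gives a different (wrong) answer when nodes hold different numbers of rows.

```python
def node_partial(values):
    """Phase 1, run on each node: local SUM and COUNT over non-NULL values."""
    present = [v for v in values if v is not None]
    return sum(present), len(present)


def combine_sum(partials):
    """Phase 2 for SUM: add up the per-node sums.

    Simplification: returns 0 for empty input, where SQL would return NULL.
    """
    return sum(s for s, _ in partials)


def combine_avg(partials):
    """Phase 2 for AVG: total of sums divided by total of counts."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical stat1 values held by three nodes (None stands for NULL).
node_data = [[1.0, 2.0, 3.0], [10.0, None], [20.0, 30.0, 40.0, 50.0]]
partials = [node_partial(values) for values in node_data]

print(combine_sum(partials))
print(combine_avg(partials))
```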
Query 1
SELECT sum(st_length_spheroid(the_geom,
         'SPHEROID["GRS_1980",6378137,298.257222101]'))/1609.344
        as interstate_miles
 FROM roads
 WHERE rttyp = 'I';

 interstate_miles
------------------
 84588.5425986619
(1 row)
Query 1: Results
Nodes | Actual (sec)
    1 | 101.2080566
    4 |  25.6410708
    8 |  14.3321144
   12 |   5.4738612
   16 |   4.8214672
[Chart: time (seconds) versus nodes, comparing linear and actual scaling]
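One way to read the Query 1 timings is to convert them into speedup and parallel efficiency relative to the single-node run. The sketch below uses only the measured times from the slide; the function name `scaling_report` is just for illustration.

```python
# Measured Query 1 times from the slide: nodes -> seconds.
timings = {
    1: 101.2080566,
    4: 25.6410708,
    8: 14.3321144,
    12: 5.4738612,
    16: 4.8214672,
}


def scaling_report(timings):
    """Return {nodes: (speedup, efficiency)} relative to the 1-node run."""
    base = timings[1]
    report = {}
    for nodes, seconds in sorted(timings.items()):
        speedup = base / seconds
        report[nodes] = (speedup, speedup / nodes)  # efficiency 1.0 = linear
    return report


report = scaling_report(timings)
for nodes, (speedup, efficiency) in report.items():
    print(f"{nodes:>2} nodes: speedup {speedup:5.2f}x, efficiency {efficiency:4.2f}")
```

An efficiency above 1.0 means the run beat linear scaling. The slides do not say why that happens at 12 and 16 nodes, so any explanation (for example, per-node data fitting in memory) would be a guess.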
Query 2
SELECT s.name as state, c.name as county, a.population, b.road_length,
       a.population/b.road_length as person_per_km
  FROM (SELECT state_cd, county_cd, sum(population) as population
          FROM census_tract
         GROUP BY 1, 2) a,
       (SELECT statefp, countyfp,
               sum(st_length_spheroid(the_geom,
'SPHEROID["GRS_1980",6378137,298.257222101]'))/1000 as road_length
          FROM roads
         GROUP BY 1, 2) b,
       state_codes s, county_codes c
 WHERE a.state_cd = b.statefp
   AND a.county_cd = b.countyfp
   AND a.state_cd = c.state_cd
   AND a.county_cd = c.county_cd
   AND c.state_cd = s.state_cd
 ORDER BY 5 DESC
 LIMIT 20;
        state         |     county      | population |   road_length    |  person_per_km
----------------------+-----------------+------------+------------------+------------------
 New York             | New York        |    1537195 | 1465.35561969273 | 1049.02521909483
 New York             | Kings           |    2465326 | 2785.37685011507 | 885.096032839562
 New York             | Bronx           |    1332650 | 1638.47925579201 | 813.345665066614
 New York             | Queens          |    2229379 | 4343.78066667893 | 513.234707521383
 New Jersey           | Hudson          |     608975 | 1474.86512729116 | 412.902162191933
 California           | San Francisco   |     776733 | 2125.05706617179 |  365.51159607175
 Pennsylvania         | Philadelphia    |    1517550 | 5067.19918355051 | 299.484970894054
 District of Columbia | Washington      |     572059 | 2191.33029860109 | 261.055579054054
 New York             | Richmond        |     443728 | 1758.77468237864 | 252.293829588156
 Massachusetts        | Suffolk         |     689807 | 2805.37242915611 | 245.887851762877
 New Jersey           | Essex           |     793633 | 3359.22581976629 | 236.254733257324
 Virginia             | Alexandria City |     128283 |  577.98117468444 | 221.950135434841
 Puerto Rico          | San Juan        |     434374 | 1994.26820504899 | 217.811224638829
 Virginia             | Arlington       |     189453 | 967.505165121908 | 195.816008874876
 New Jersey           | Union           |     522541 | 2827.74655887522 | 184.790605919029
 Maryland             | Baltimore City  |     651154 | 3707.01218958787 | 175.654669231717
 Puerto Rico          | Catano          |      30071 | 174.765650431886 | 172.064704509654
 Hawaii               | Honolulu        |     876156 |  5098.8482067881 | 171.834101441493
 Puerto Rico          | Toa Baja        |      94085 | 558.532996996738 | 168.450208861249
 Puerto Rico          | Carolina        |     186076 | 1122.20560229076 | 165.812752690026
(20 rows)
Query 2: Results
Nodes | Actual (sec)
    1 | 3983.1002548
    4 | 1007.1235182
    8 |  563.6259202
   12 |  365.152858
   16 |  282.7345952
[Chart: time (seconds) versus nodes, comparing linear and actual scaling]
Scalability
Limitations
• SQL Support
  – Uses its own parser and optimizer
    so:
     • No Window Functions
     • No Stored Procedures
     • No Full Text Search
Transaction Performance
• Single-row INSERT, UPDATE, or DELETE statements are slow
  compared to a single PostgreSQL instance
   – The data must make an additional network trip to be
     committed
   – All partitioned rows must be hashed to be mapped to
     the proper node
   – All replicated rows must be committed to all nodes
• Use “gs-loader” for bulk loading for better performance
High Availability
• No heartbeat or fail-over control in the coordinator
  – High Availability for each PostgreSQL node must be
    configured separately
  – Streaming replication can be ideal for this
• Getting a consistent backup of the entire Stado
  database is difficult
  – Must ensure no transactions are occurring
  – Backup each node separately
Adding Nodes
• Requires Downtime
  – Data must be manually reloaded to partition
    the data to the new node
• With planning, the process can be fast with no
  mapping of data
  – Run multiple PostgreSQL instances on each
    physical server and move the PostgreSQL
    instances to new hardware as needed
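The planning trick above works because the key-to-node mapping depends only on the number of logical nodes, not on where they run. A minimal Python sketch, assuming eight logical nodes, modulo placement as a stand-in for the real hash, and made-up host names:

```python
NUM_LOGICAL_NODES = 8  # fixed up front; changing this would force a reload


def node_for(key: int) -> int:
    """Key -> logical node. Unaffected by which host runs the node."""
    return key % NUM_LOGICAL_NODES


def spread(hosts):
    """Assign logical nodes to physical hosts round-robin."""
    return {node: hosts[node % len(hosts)] for node in range(NUM_LOGICAL_NODES)}


before = spread(["db1", "db2"])               # 4 PostgreSQL instances per host
after = spread(["db1", "db2", "db3", "db4"])  # 2 instances per host

# Instances that must be relocated to the new hardware.
moved = [n for n in range(NUM_LOGICAL_NODES) if before[n] != after[n]]
print(moved)

# No row changes its logical node, so no data is re-partitioned.
print(all(node_for(k) == k % NUM_LOGICAL_NODES for k in range(1000)))
```

Only whole PostgreSQL instances move between servers, and the rows inside each instance stay where they were hashed.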
Summary
• Stado can tremendously improve
  query performance
• Stado can scale linearly as more nodes
  are added
• Stado is open source so if the
  limitations are an issue,
  submit a patch
Download Stado at:
http://stado.us


Jim Mlodgenski
 Email:     jim@cirrusql.com
 Twitter:   @jim_mlodgenski


 NYC PostgreSQL User Group
 http://nycpug.org

More Related Content

What's hot

Join optimization in hive
Join optimization in hive Join optimization in hive
Join optimization in hive
Liyin Tang
 
Data preparation covariates
Data preparation covariatesData preparation covariates
Data preparation covariates
FAO
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
Altinity Ltd
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Altinity Ltd
 
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
Altinity Ltd
 
Developers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman OracleDevelopers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman Oracle
mCloud
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Sergey Petrunya
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word count
Jeff Patti
 
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Insight Technology, Inc.
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
Masayuki Matsushita
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
Databricks
 
Using PostGIS To Add Some Spatial Flavor To Your Application
Using PostGIS To Add Some Spatial Flavor To Your ApplicationUsing PostGIS To Add Some Spatial Flavor To Your Application
Using PostGIS To Add Some Spatial Flavor To Your ApplicationSteven Pousty
 
Introduction To PostGIS
Introduction To PostGISIntroduction To PostGIS
Introduction To PostGIS
mleslie
 
Mysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperabilityMysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperabilitySergey Petrunya
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Vigen Sahakyan
 
Intro To PostGIS
Intro To PostGISIntro To PostGIS
Intro To PostGIS
mleslie
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
Julian Hyde
 
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Databricks
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
Sergey Petrunya
 
Day 6 - PostGIS
Day 6 - PostGISDay 6 - PostGIS
Day 6 - PostGIS
Barry Jones
 

What's hot (20)

Join optimization in hive
Join optimization in hive Join optimization in hive
Join optimization in hive
 
Data preparation covariates
Data preparation covariatesData preparation covariates
Data preparation covariates
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
 
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
 
Developers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman OracleDevelopers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman Oracle
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word count
 
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
 
Using PostGIS To Add Some Spatial Flavor To Your Application
Using PostGIS To Add Some Spatial Flavor To Your ApplicationUsing PostGIS To Add Some Spatial Flavor To Your Application
Using PostGIS To Add Some Spatial Flavor To Your Application
 
Introduction To PostGIS
Introduction To PostGISIntroduction To PostGIS
Introduction To PostGIS
 
Mysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperabilityMysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperability
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Intro To PostGIS
Intro To PostGISIntro To PostGIS
Intro To PostGIS
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
 
Day 6 - PostGIS
Day 6 - PostGISDay 6 - PostGIS
Day 6 - PostGIS
 

Similar to Scaling PostreSQL with Stado

Weakpass - defcon russia 23
Weakpass - defcon russia 23Weakpass - defcon russia 23
Weakpass - defcon russia 23
DefconRussia
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
RISC-V International
 
Ceilometer to Gnocchi
Ceilometer to GnocchiCeilometer to Gnocchi
Ceilometer to Gnocchi
Gordon Chung
 
Quick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map ReduceQuick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map Reduceohkura
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
cookie1969
 
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke HiramaInsight Technology, Inc.
 
LinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your App
LinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your AppLinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your App
LinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your App
Steven Pousty
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
Altinity Ltd
 
I pv6 dhcp
I pv6 dhcpI pv6 dhcp
I pv6 dhcp
eufronio
 
PennNet and MAGPI
PennNet and MAGPIPennNet and MAGPI
PennNet and MAGPI
Shumon Huque
 
Traffic Analyzer for GPRS UMTS Networks (TAN)
Traffic Analyzer for GPRS UMTS Networks (TAN)Traffic Analyzer for GPRS UMTS Networks (TAN)
Traffic Analyzer for GPRS UMTS Networks (TAN)Muhannad Aulama
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
Percona Live UK 2014 Part III
Percona Live UK 2014  Part IIIPercona Live UK 2014  Part III
Percona Live UK 2014 Part III
Alkin Tezuysal
 
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
Indonesia Network Operators Group
 
State of GeoServer 2012
State of GeoServer 2012State of GeoServer 2012
State of GeoServer 2012
Jody Garnett
 
Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨
flyinweb
 
Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201
Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201
Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201
Amazon Web Services
 
Using amazon machine learning to identify trends in io t data technical 201
Using amazon machine learning to identify trends in io t data   technical 201Using amazon machine learning to identify trends in io t data   technical 201
Using amazon machine learning to identify trends in io t data technical 201
Amazon Web Services
 
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
ClickHouse 2018.  How to stop waiting for your queries to complete and start ...ClickHouse 2018.  How to stop waiting for your queries to complete and start ...
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
Altinity Ltd
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
Brendan Gregg
 

Similar to Scaling PostreSQL with Stado (20)

Weakpass - defcon russia 23
Weakpass - defcon russia 23Weakpass - defcon russia 23
Weakpass - defcon russia 23
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
 
Ceilometer to Gnocchi
Ceilometer to GnocchiCeilometer to Gnocchi
Ceilometer to Gnocchi
 
Quick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map ReduceQuick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map Reduce
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama
 
LinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your App
LinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your AppLinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your App
LinuxFest NW - Using Postgis To Add Some Spatial Flavor To Your App
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
I pv6 dhcp
I pv6 dhcpI pv6 dhcp
I pv6 dhcp
 
PennNet and MAGPI
PennNet and MAGPIPennNet and MAGPI
PennNet and MAGPI
 
Traffic Analyzer for GPRS UMTS Networks (TAN)
Traffic Analyzer for GPRS UMTS Networks (TAN)Traffic Analyzer for GPRS UMTS Networks (TAN)
Traffic Analyzer for GPRS UMTS Networks (TAN)
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Percona Live UK 2014 Part III
Percona Live UK 2014  Part IIIPercona Live UK 2014  Part III
Percona Live UK 2014 Part III
 
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
 
State of GeoServer 2012
State of GeoServer 2012State of GeoServer 2012
State of GeoServer 2012
 
Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨
 
Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201
Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201
Using Amazon Machine Learning to Identify Trends in IoT Data - Technical 201
 
Using amazon machine learning to identify trends in io t data technical 201
Using amazon machine learning to identify trends in io t data   technical 201Using amazon machine learning to identify trends in io t data   technical 201
Using amazon machine learning to identify trends in io t data technical 201
 
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
ClickHouse 2018.  How to stop waiting for your queries to complete and start ...ClickHouse 2018.  How to stop waiting for your queries to complete and start ...
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 

More from Jim Mlodgenski

Strategic autovacuum
Strategic autovacuumStrategic autovacuum
Strategic autovacuum
Jim Mlodgenski
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Jim Mlodgenski
 
Oracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakesOracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakes
Jim Mlodgenski
 
Profiling PL/pgSQL
Profiling PL/pgSQLProfiling PL/pgSQL
Profiling PL/pgSQL
Jim Mlodgenski
 
Debugging Your PL/pgSQL Code
Debugging Your PL/pgSQL CodeDebugging Your PL/pgSQL Code
Debugging Your PL/pgSQL Code
Jim Mlodgenski
 
An Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL TriggersAn Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL Triggers
Jim Mlodgenski
 
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and GotchasPostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
Jim Mlodgenski
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
Jim Mlodgenski
 
Postgresql Federation
Postgresql FederationPostgresql Federation
Postgresql Federation
Jim Mlodgenski
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
Jim Mlodgenski
 

More from Jim Mlodgenski (10)

Strategic autovacuum
Strategic autovacuumStrategic autovacuum
Strategic autovacuum
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Oracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakesOracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakes
 
Profiling PL/pgSQL
Profiling PL/pgSQLProfiling PL/pgSQL
Profiling PL/pgSQL
 
Debugging Your PL/pgSQL Code
Debugging Your PL/pgSQL CodeDebugging Your PL/pgSQL Code
Debugging Your PL/pgSQL Code
 
An Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL TriggersAn Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL Triggers
 
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and GotchasPostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
Postgresql Federation
Postgresql FederationPostgresql Federation
Postgresql Federation
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 

Recently uploaded

National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Scaling PostgreSQL with Stado

  • 1. Scaling PostgreSQL with Stado
  • 2. Who Am I? • Jim Mlodgenski – Founder of Cirrus Technologies – Former Chief Architect of EnterpriseDB – Co-organizer of NYCPUG
  • 3. Agenda • What is Stado? • Architecture • Query Flow • Scaling • Limitations
  • 4. What is Stado? • Continuation of GridSQL • “Shared-Nothing”, distributed data architecture. – Leverage the power of multiple commodity servers while appearing as a single database to the application • Essentially... Open Source Greenplum, Netezza or Teradata
  • 5. Stado Details • Designed for Parallel Querying • Not just “Read-Only”, can execute UPDATE, DELETE • Data Loader for parallel loading • Standard connectivity via PostgreSQL compatible connectors: JDBC, ODBC, ADO.NET, libpq (psql)
  • 6. What Stado is not? • A replication solution like Slony or Bucardo • A high availability solution like Synchronous Replication in PostgreSQL 9.1 • A scalable transactional solution like PostgresXC • An elastic, eventually consistent NoSQL database
  • 7. Architecture • Loosely coupled, shared- nothing architecture • Data repositories – Metadata database – Stado database • Stado processes – Central coordinator – Agents
  • 8. Configuration • Can be configured for multiple logical “nodes” per physical server – Take advantage of multi-core processors • Tables may be either replicated or partitioned • Replicated tables for static lookup data or dimensions – Partitioned tables for large fact tables
  • 9. Partitioning • Tables may simultaneously use Stado Partitioning with Constraint Exclusion Partitioning – Large queries scan a much smaller subset of data by using subtables – Since each subtable is also partitioned across nodes, they are scanned in parallel – Queries execute much faster
  • 10. Creating Tables • Tables can be partitioned or replicated
        CREATE TABLE STATE_CODES (
            STATE_CD varchar(2) PRIMARY KEY,
            USPS_CD  varchar(2),
            NAME     varchar(100),
            GNISIS   varchar(8)
        ) REPLICATED;
  • 11. Creating Tables
        CREATE TABLE roads (
            gid      integer NOT NULL,
            statefp  character varying(2),
            countyfp character varying(3),
            linearid character varying(22),
            fullname character varying(100),
            rttyp    character varying(1),
            mtfcc    character varying(5),
            the_geom geometry
        ) PARTITIONING KEY gid ON ALL;
  • 12. Query Optimization • Cost Based Optimizer – Takes into account Row Shipping (expensive) • Looks for joins with replicated tables – Can be done locally – Looks for joins between tables on partitioned columns
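The join rules on the Query Optimization slide can be illustrated with a small Python sketch. This is a simplified rule of thumb, not Stado's actual cost model: the `Table` class, `join_is_local`, and the `road_notes` table are hypothetical names introduced only for illustration, and the sketch assumes both partitioned tables are hashed the same way across the same set of nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str] = None  # None means the table is replicated


def join_is_local(left: Table, left_col: str, right: Table, right_col: str) -> bool:
    """Return True when an equi-join can run on each node without shipping rows.

    Simplified rule (an assumption, not Stado's real optimizer):
    - a replicated table is present on every node, so joining to it is local;
    - two partitioned tables join locally only when each is joined on its
      own partitioning key, so matching rows hash to the same node.
    """
    if left.partition_key is None or right.partition_key is None:
        return True
    return left.partition_key == left_col and right.partition_key == right_col


state_codes = Table("state_codes")                     # replicated, as on the slides
roads = Table("roads", partition_key="gid")            # partitioned on gid, as on the slides
road_notes = Table("road_notes", partition_key="gid")  # hypothetical second table

print(join_is_local(roads, "statefp", state_codes, "state_cd"))  # replicated side
print(join_is_local(roads, "gid", road_notes, "gid"))            # both on partition key
print(join_is_local(roads, "statefp", road_notes, "gid"))        # needs row shipping
```

Any join for which this check fails would require moving rows between nodes, which is the expensive case the optimizer tries to avoid.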
  • 13. Two Phase Aggregation • SUM – SUM(stat1) – SUM2(SUM(stat1)) • AVG – SUM(stat1) / COUNT(stat1) – SUM2(SUM(stat1)) / SUM2(COUNT(stat1))
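As a rough illustration of the two-phase scheme, the Python sketch below computes per-node partial results and then combines them. The partition contents and function names are made up for the example; only the SUM and AVG decomposition comes from the slide.

```python
from typing import List, Tuple

# Hypothetical data: each inner list holds the stat1 values stored on one node.
partitions: List[List[float]] = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0],
    [6.0],
]


def node_partial(rows: List[float]) -> Tuple[float, int]:
    """Phase 1, run on each node: local SUM(stat1) and COUNT(stat1)."""
    return sum(rows), len(rows)


def combine_sum(partials: List[Tuple[float, int]]) -> float:
    """Phase 2 for SUM: add up the per-node sums (SUM2 on the slide)."""
    return sum(s for s, _ in partials)


def combine_avg(partials: List[Tuple[float, int]]) -> float:
    """Phase 2 for AVG: SUM2(SUM(stat1)) / SUM2(COUNT(stat1))."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


partials = [node_partial(p) for p in partitions]
total = combine_sum(partials)
average = combine_avg(partials)

# Averaging the per-node averages gives a different (wrong) answer when
# nodes hold different numbers of rows, which is why AVG ships SUM and COUNT.
average_of_averages = sum(s / c for s, c in partials) / len(partials)

print(total, average, average_of_averages)
```

Only one small (sum, count) pair per node crosses the network, rather than every row.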
  • 14. Query 1
        SELECT sum(st_length_spheroid(the_geom,
                   'SPHEROID["GRS_1980",6378137,298.257222101]'))/1609.344
               as interstate_miles
          FROM roads
         WHERE rttyp = 'I';

         interstate_miles
        ------------------
         84588.5425986619
        (1 row)
  • 15. Query 1 : Results
        [Chart: Time (seconds) vs. Nodes; series: Linear, Actual]
        Nodes | Actual (sec)
        ------+-------------
            1 | 101.2080566
            4 |  25.6410708
            8 |  14.3321144
           12 |   5.4738612
           16 |   4.8214672
  • 16. Query 2
        SELECT s.name as state, c.name as county, a.population, b.road_length,
               a.population/b.road_length as person_per_km
          FROM (SELECT state_cd, county_cd, sum(population) as population
                  FROM census_tract
                 GROUP BY 1, 2) a,
               (SELECT statefp, countyfp,
                       sum(st_length_spheroid(the_geom,
                           'SPHEROID["GRS_1980",6378137,298.257222101]'))/1000 as road_length
                  FROM roads
                 GROUP BY 1, 2) b,
               state_codes s,
               county_codes c
         WHERE a.state_cd = b.statefp
           AND a.county_cd = b.countyfp
           AND a.state_cd = c.state_cd
           AND a.county_cd = c.county_cd
           AND c.state_cd = s.state_cd
         ORDER BY 5 DESC
         LIMIT 20;
  • 17.
        state                | county          | population | road_length      | person_per_km
        ---------------------+-----------------+------------+------------------+------------------
        New York             | New York        |    1537195 | 1465.35561969273 | 1049.02521909483
        New York             | Kings           |    2465326 | 2785.37685011507 | 885.096032839562
        New York             | Bronx           |    1332650 | 1638.47925579201 | 813.345665066614
        New York             | Queens          |    2229379 | 4343.78066667893 | 513.234707521383
        New Jersey           | Hudson          |     608975 | 1474.86512729116 | 412.902162191933
        California           | San Francisco   |     776733 | 2125.05706617179 |  365.51159607175
        Pennsylvania         | Philadelphia    |    1517550 | 5067.19918355051 | 299.484970894054
        District of Columbia | Washington      |     572059 | 2191.33029860109 | 261.055579054054
        New York             | Richmond        |     443728 | 1758.77468237864 | 252.293829588156
        Massachusetts        | Suffolk         |     689807 | 2805.37242915611 | 245.887851762877
        New Jersey           | Essex           |     793633 | 3359.22581976629 | 236.254733257324
        Virginia             | Alexandria City |     128283 |  577.98117468444 | 221.950135434841
        Puerto Rico          | San Juan        |     434374 | 1994.26820504899 | 217.811224638829
        Virginia             | Arlington       |     189453 | 967.505165121908 | 195.816008874876
        New Jersey           | Union           |     522541 | 2827.74655887522 | 184.790605919029
        Maryland             | Baltimore City  |     651154 | 3707.01218958787 | 175.654669231717
        Puerto Rico          | Catano          |      30071 | 174.765650431886 | 172.064704509654
        Hawaii               | Honolulu        |     876156 |  5098.8482067881 | 171.834101441493
        Puerto Rico          | Toa Baja        |      94085 | 558.532996996738 | 168.450208861249
        Puerto Rico          | Carolina        |     186076 | 1122.20560229076 | 165.812752690026
        (20 rows)
  • 18. Query 2 : Results
        [Chart: Time (seconds) vs. Nodes; series: Linear, Actual]
        Nodes | Actual (sec)
        ------+--------------
            1 | 3983.1002548
            4 | 1007.1235182
            8 |  563.6259202
           12 |  365.152858
           16 |  282.7345952
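To put the two results slides side by side, the sketch below converts the reported timings into speedup (single-node time divided by N-node time) and parallel efficiency (speedup divided by node count). The timings are copied from the slides; the helper names are illustrative only.

```python
from typing import Dict

# Timings in seconds, copied from the Query 1 and Query 2 results slides.
query1: Dict[int, float] = {1: 101.2080566, 4: 25.6410708, 8: 14.3321144,
                            12: 5.4738612, 16: 4.8214672}
query2: Dict[int, float] = {1: 3983.1002548, 4: 1007.1235182, 8: 563.6259202,
                            12: 365.152858, 16: 282.7345952}


def speedup(times: Dict[int, float]) -> Dict[int, float]:
    """Speedup relative to the single-node run."""
    base = times[1]
    return {nodes: base / seconds for nodes, seconds in times.items()}


def efficiency(times: Dict[int, float]) -> Dict[int, float]:
    """Speedup divided by node count; 1.0 means perfectly linear scaling."""
    return {nodes: s / nodes for nodes, s in speedup(times).items()}


for label, times in (("Query 1", query1), ("Query 2", query2)):
    print(label)
    for nodes, s in speedup(times).items():
        print(f"  {nodes:>2} nodes: speedup {s:6.2f}, efficiency {s / nodes:5.2f}")
```

With these numbers, Query 1's speedup at 12 and 16 nodes comes out above the node count, while Query 2 stays somewhat below linear at every measured size.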
  • 20. Limitations • SQL Support – Uses its own parser and optimizer so: • No Window Functions • No Stored Procedures • No Full Text Search
  • 21. Transaction Performance • Single-row INSERT, UPDATE, or DELETE statements are slow compared to a single PostgreSQL instance – The data must make an additional network trip to be committed – All partitioned rows must be hashed to be mapped to the proper node – All replicated rows must be committed to all nodes • Use “gs-loader” for bulk loading for better performance
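The per-row cost described on the Transaction Performance slide can be pictured with the following sketch of how a coordinator might choose target nodes for a write. It is an assumption-laden illustration: the slides do not say which hash function Stado uses, so CRC32 stands in for it, and `NODE_COUNT`, `node_for_key`, and `targets_for_write` are hypothetical names.

```python
import zlib
from typing import List, Optional

NODE_COUNT = 4  # hypothetical cluster size


def node_for_key(partition_key: int, node_count: int = NODE_COUNT) -> int:
    """Map a partitioning-key value to a node number (1-based).

    CRC32 is only a stand-in; Stado's real hash function is not
    described in the slides.
    """
    digest = zlib.crc32(str(partition_key).encode("utf-8"))
    return digest % node_count + 1


def targets_for_write(table_kind: str, partition_key: Optional[int] = None) -> List[int]:
    """Return the nodes that must commit a single-row write."""
    if table_kind == "replicated":
        # Every node holds a full copy, so every node must commit the row.
        return list(range(1, NODE_COUNT + 1))
    if table_kind == "partitioned":
        if partition_key is None:
            raise ValueError("partitioned tables need a partitioning-key value")
        # Exactly one node owns the row, found by hashing the key.
        return [node_for_key(partition_key)]
    raise ValueError(f"unknown table kind: {table_kind}")


print(targets_for_write("replicated"))
print(targets_for_write("partitioned", partition_key=42))
```

Either way the row passes through the coordinator before reaching PostgreSQL, which is the extra network trip the slide refers to.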
  • 22. High Availability • No heartbeat or fail-over control in the coordinator – High Availability for each PostgreSQL node must be configured separately – Streaming replication can be ideal for this • Getting a consistent backup of the entire Stado database is difficult – Must ensure that no transactions are occurring – Back up each node separately
  • 23. Adding Nodes • Requires Downtime – Data must be manually reloaded to partition the data to the new node • With planning, the process can be fast with no mapping of data – Run multiple PostgreSQL instances on each physical server and move the PostgreSQL instances to new hardware as needed
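The planning trick on the Adding Nodes slide works because a row's logical node depends only on its key and the number of logical nodes, not on which machine hosts that node. The sketch below shows the idea with made-up values: the node count, host names, and the modulo stand-in for the partitioning hash are all assumptions for illustration.

```python
from typing import Dict

LOGICAL_NODES = 8  # hypothetical: fixed when the database is created


def logical_node(gid: int) -> int:
    """Stand-in for the partitioning hash; depends only on the key and LOGICAL_NODES."""
    return gid % LOGICAL_NODES + 1


# Before: all eight PostgreSQL instances share two servers (hypothetical host names).
placement_before: Dict[int, str] = {
    n: "server-a" if n <= 4 else "server-b" for n in range(1, LOGICAL_NODES + 1)
}

# After: instances 3-4 and 7-8 are moved, unchanged, to new hardware.
placement_after: Dict[int, str] = dict(placement_before)
placement_after.update({3: "server-c", 4: "server-c", 7: "server-d", 8: "server-d"})

for gid in (17, 42, 1001):
    node = logical_node(gid)
    print(f"gid {gid}: logical node {node}, "
          f"{placement_before[node]} -> {placement_after[node]}")
```

Because the row-to-logical-node mapping never changes, no rows have to be rehashed or reloaded; only the placement table (where each instance runs) is updated.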
  • 24. Summary • Stado can tremendously improve query performance • Stado can scale linearly as more nodes are added • Stado is open source, so if the limitations are an issue, submit a patch
  • 25. Download Stado at: http://stado.us • Jim Mlodgenski • Email: jim@cirrusql.com • Twitter: @jim_mlodgenski • NYC PostgreSQL User Group: http://nycpug.org