SlideShare a Scribd company logo
© Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0
Greenplum for PostgreSQL hackers
Heikki Linnakangas
Greenplum
Greenplum is PostgreSQL
● with Massively Parallel Processing bolted on
● with lots of other extra features and addons
Greenplum is an Open Source project
How MPP works?
CREATE TABLE employee
(
employee_id int8,
name text
)
DISTRIBUTED BY (employee_id)
Data distribution
Data distribution
● Data is distributed across all segments by hash
● Or randomly
QD
QE
QE
QE
QD = Query Dispatcher
QE = Query Executor
Server
process
Query Dispatching
Terminology QD/QE
Query Dispatcher (aka master)
● Accept client connections
● Manage distributed transactions
● Create query plans
Query Executor (aka segment)
● Store the data
● Exeute slices of the query
Deployment
● One QD, as many QE nodes as needed
● Multiple QEs on one physical server
○ To utilize Multiple CPU cores
● Gigabit ethernet
● Firewall
QD / QE roles
● Catalogs are the same in all nodes
○ OIDs of objects must match
● gpcheckcat to verify
● Utility mode
PGOPTIONS='-c gp_session_role=utility' psql postgres
Distributed plans
postgres=# explain select * from foo;
QUERY PLAN
-------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
-> Seq Scan on foo
● Query plan is dispatched from QD to QE nodes
● Query results are returned from QEs via a Motion node
More complicated query
postgres=# explain select * from foo, bar where bar.c = foo.a;
QUERY PLAN
--------------------------------------------------------------
Gather Motion 3:1 (slice2; segments: 3)
-> Hash Join (cost=2.44..5.65 rows=2 width=12)
Hash Cond: bar.c = foo.a
-> Seq Scan on bar
-> Hash
-> Broadcast Motion 3:3 (slice1; segments: 3)
-> Seq Scan on foo
Slices
postgres=# explain select * from foo, bar where bar.c = foo.a;
QUERY PLAN
--------------------------------------------------------------
Gather Motion 3:1 (slice2; segments: 3)
-> Hash Join (cost=2.44..5.65 rows=2 width=12)
Hash Cond: bar.c = foo.a
-> Seq Scan on bar
-> Hash
-> Broadcast Motion 3:3 (slice1; segments: 3)
-> Seq Scan on foo
QD
QE segment 1
QE segment 3
QE segment 2
Process
QE backend process
(slice 2)
QE backend process
(slice 2)
QE backend process
(slice 1)
QE backend process
(slice 2)
QE backend process
(slice 1)
QE backend process
(slice 1)
QD backend
process
Interconnect
● Data flows between Motion Sender and
Receiver via the Interconnect
● UDP based protocol
Slice 1Slice 2
QD
Hash
SeqScan
foo
Hash
Join
SeqScan
bar
Redist.
Motion
Send
Redist.
Motion
Recv
Gather
Motion
Send
Gather
Motion
Recv SeqScan
foo
Hash
Hash
Join
SeqScan
bar
Redist.
Motion
Send
Redist.
Motion
Recv
Gather
Motion
Send
SeqScan
foo
Hash
Hash
Join
SeqScan
bar
Redist.
Motion
Send
Redist.
Motion
Recv
Gather
Motion
Send
Executor
node
Planner
Distributed planning
● Prepare a non-distributed plan
● Keep track of distribution keys at each step
● Add Motion nodes to make it distributed
● Different strategies to distribute aggregates
Data flows in one direction
● Nested loops with executor parameters cannot be dispatched
across query slices
○ Need to be materialized
○ The planner heavily discourages Nested loops
○ Hash joins are preferred
● Subplans with parameters need to use seqscans and must be fully
materialized
○ “Skip-level” subplans not supported
ORCA
● Alternative optimizer
● Written in C++
● Shared with Apache HAWQ
● https://github.com/greenplum-db/gporca
● Raw query is converted to XML, sent to ORCA, and converted back
● Slow startup time, better plans for some queries
Distributed transactions
● Every transaction uses two-phase commit
● QD acts as the transaction manager
● Distributed snapshots to ensure consistency
across nodes
Other non-MPP features
Mirroring
● If one QE node goes down, no queries can be run:
postgres=# select * from foo;
ERROR: failed to acquire resources on one or more
segments
DETAIL: could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 40002?
(seg2 127.0.0.1:40002)
Mirroring
● Each node has a mirror node as backup
● GPDB 5:
○ Custom “file replication” between master and mirror QE node
● Revamped in GPDB 6:
○ Uses PostgreSQL WAL replication
Partitioning
CREATE TABLE partition_test
(
col1 int,
col2 decimal,
col3 text
)
distributed by (col1)
partition by list(col2)
(
partition part1 VALUES(1,2,3,4,5,6,7,8,9,10),
partition part2 VALUES(11,12,13,14,15,16,17,18,19,20),
partition part3 VALUES(21,22,23,24,25,26,27,28,29,30),
partition part4 VALUES(31,32,33,34,35,36,37,38,39,40),
default partition def
);
Append-optimized Tables
● Alternative to normal heap tables
● CREATE TABLE … WITH (appendonly=true)
● Optimized for appending in large batches and sequential scans
● Compression
● Column- or row-oriented
Bitmap indexes
● More dense than B-tree for duplicates
● Good for columns with few distinct values
Show me the code!
The Code
https://github.com/greenplum-db/gpdb/
$ ls -l
drwxr-xr-x 5 heikki heikki 4096 Jun 30 11:37 concourse
drwxr-xr-x 2 heikki heikki 4096 Jun 30 11:37 config
-rwxr-xr-x 1 heikki heikki 528544 Jun 30 11:37 configure
-rw-r--r-- 1 heikki heikki 78636 Jun 30 11:37 configure.in
drwxr-xr-x 54 heikki heikki 4096 Jun 30 11:37 contrib
-rw-r--r-- 1 heikki heikki 1547 Jun 30 11:37 COPYRIGHT
drwxr-xr-x 3 heikki heikki 4096 Jun 30 11:37 doc
-rw-r--r-- 1 heikki heikki 8253 Jun 30 11:37 GNUmakefile.in
drwxr-xr-x 8 heikki heikki 4096 Jun 30 11:37 gpAux
drwxr-xr-x 4 heikki heikki 4096 Jun 30 11:37 gpdb-doc
drwxr-xr-x 7 heikki heikki 4096 Jun 30 11:37 gpMgmt
-rw-r--r-- 1 heikki heikki 11358 Jun 30 11:37 LICENSE
-rw-r--r-- 1 heikki heikki 1556 Jun 30 11:37 Makefile
-rw-r--r-- 1 heikki heikki 113093 Jun 30 11:37 NOTICE
-rw-r--r-- 1 heikki heikki 2051 Jun 30 11:37 README.amazon_linux
-rw-r--r-- 1 heikki heikki 1514 Jun 30 11:37 README.debian
-rw-r--r-- 1 heikki heikki 19812 Jun 30 11:37 README.md
-rw-r--r-- 1 heikki heikki 1284 Jun 30 11:37 README.PostgreSQL
drwxr-xr-x 14 heikki heikki 4096 Jun 30 11:37 src
Code layout
Same as PostgreSQL:
● src/, contrib/, configure
Greenplum-specific:
● gpMgmt/
● gpAux/
● concourse/
Catalog tables
● Extra catalog tables for GPDB functionality
○ pg_partition, pg_partitition_rule, gp_distribution_policy, …
● Extra columns in pg_proc, pg_class
○ a Perl script provides defaults for them at build time
Documentation
● doc/ contains the user manual in PostgreSQL
○ All but reference pages removed in Greenplum
● gpdb-doc/ contains the Greenplum manuals
Compile!
./configure
make install
Some features that are optional in PostgreSQL are mandatory in
Greenplum to run the regression tests.
Read the README.md for which flags to use
Initialize a cluster (wrong)
~/gpdb.master$ bin/initdb -D data
The files belonging to this database system will be owned by user "heikki".
This user must also own the server process.
…
Success. You can now start the database server using:
bin/postgres -D data
or
bin/pg_ctl -D data -l logfile start
~/gpdb.master$ bin/postgres -D data
2017-06-30 08:50:05.440267
GMT,,,p16890,th-751686272,,,,0,,,seg-1,,,,,"FATAL","22023","dbid (from -b
option) is not specified or is invalid. This value must be >= 0, or >= -1 in
utility mode. The dbid value to pass can be determined from this server's
entry in the segment configuration; it may be -1 if running in utility
mode.",,,,,,,,"PostmasterMain","postmaster.c",1174,
Creating a cluster
● gpinitsystem – runs initdb on each node
● gpstart / gpstop – runs pg_ctl start/stop on each node
● For quick testing and hacking on your laptop:
make cluster
Regression tests
● Regression test suite in src/test/regress
● Core is the same as in PostgreSQL
● Greatly extended to cover Greenplum-specific features
● Needs a running cluster
make -C src/test/regress installcheck
gpdiff.pl
● In GPDB, rows can arrive from QE nodes in any order
● Post-processing step in installcheck masks out the row-order
differences
gpdiff.pl directives
--start_ignore
DROP TABLE IF EXISTS T_a1 CASCADE;
DROP TABLE IF EXISTS T_b2 CASCADE;
DROP TABLE IF EXISTS T_random CASCADE;
--end_ignore
More tests
src/test/
● isolation/
● isolation2/
● tinc/
Use “make installcheck-world” to run most of these
Yet more tests
● Pivotal runs a Concourse CI instance to run “make
installcheck-world”
● And some additional tests that need special setup
● Scripts in concourse/ directory in the repository
● PRs
● Issues
Development on github
Greenplum mailing lists
Public mailing lists on Google Groups:
● gpdb-dev
● gpdb-users
greenplum.org
● News, events
● Links to the github project, mailing list, Concourse
instance, and more
That’s all, folks!
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018

More Related Content

What's hot

Scaling HDFS for Exabyte Storage@twitter
Scaling HDFS for Exabyte Storage@twitterScaling HDFS for Exabyte Storage@twitter
Scaling HDFS for Exabyte Storage@twitter
lohitvijayarenu
 
Tuning up with Apache Tez
Tuning up with Apache TezTuning up with Apache Tez
Tuning up with Apache Tez
Gal Vinograd
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
DataWorks Summit
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
inside-BigData.com
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
Yousun Jeong
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTraceHBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Databricks
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
DataWorks Summit
 
February 2014 HUG : Hive On Tez
February 2014 HUG : Hive On TezFebruary 2014 HUG : Hive On Tez
February 2014 HUG : Hive On Tez
Yahoo Developer Network
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
inside-BigData.com
 
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
Equnix Business Solutions
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Databricks
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
Nicolas Poggi
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
YugabyteDB
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
lohitvijayarenu
 
Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
Yuval Carmel
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Clusterairbots
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big Data
DataWorks Summit
 

What's hot (20)

Scaling HDFS for Exabyte Storage@twitter
Scaling HDFS for Exabyte Storage@twitterScaling HDFS for Exabyte Storage@twitter
Scaling HDFS for Exabyte Storage@twitter
 
Tuning up with Apache Tez
Tuning up with Apache TezTuning up with Apache Tez
Tuning up with Apache Tez
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTraceHBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
February 2014 HUG : Hive On Tez
February 2014 HUG : Hive On TezFebruary 2014 HUG : Hive On Tez
February 2014 HUG : Hive On Tez
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Cluster
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big Data
 

Similar to Greenplum Overview for Postgres Hackers - Greenplum Summit 2018

Best practices for optimizing Red Hat platforms for large scale datacenter de...
Best practices for optimizing Red Hat platforms for large scale datacenter de...Best practices for optimizing Red Hat platforms for large scale datacenter de...
Best practices for optimizing Red Hat platforms for large scale datacenter de...
Jeremy Eder
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 
Monitoring with Ganglia
Monitoring with GangliaMonitoring with Ganglia
Monitoring with Ganglia
Fastly
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
StreamNative
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XC
Ashutosh Bapat
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
Ricardo Lourenço
 
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco SlotDistributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot
Citus Data
 
Getting data into Rudder
Getting data into RudderGetting data into Rudder
Getting data into Rudder
RUDDER
 
Building the World's Largest GPU
Building the World's Largest GPUBuilding the World's Largest GPU
Building the World's Largest GPU
Renee Yao
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
DLow6
 
Percona XtraDB 集群安装与配置
Percona XtraDB 集群安装与配置Percona XtraDB 集群安装与配置
Percona XtraDB 集群安装与配置
YUCHENG HU
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganeti
kawamuray
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
Stas Kelvich
 
Glusterfs for sysadmins-justin_clift
Glusterfs for sysadmins-justin_cliftGlusterfs for sysadmins-justin_clift
Glusterfs for sysadmins-justin_clift
Gluster.org
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
Mydbops
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Martin Zapletal
 
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
Ontico
 

Similar to Greenplum Overview for Postgres Hackers - Greenplum Summit 2018 (20)

Best practices for optimizing Red Hat platforms for large scale datacenter de...
Best practices for optimizing Red Hat platforms for large scale datacenter de...Best practices for optimizing Red Hat platforms for large scale datacenter de...
Best practices for optimizing Red Hat platforms for large scale datacenter de...
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
Monitoring with Ganglia
Monitoring with GangliaMonitoring with Ganglia
Monitoring with Ganglia
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XC
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco SlotDistributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot
 
Getting data into Rudder
Getting data into RudderGetting data into Rudder
Getting data into Rudder
 
Building the World's Largest GPU
Building the World's Largest GPUBuilding the World's Largest GPU
Building the World's Largest GPU
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
Percona XtraDB 集群安装与配置
Percona XtraDB 集群安装与配置Percona XtraDB 集群安装与配置
Percona XtraDB 集群安装与配置
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganeti
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 
Glusterfs for sysadmins-justin_clift
Glusterfs for sysadmins-justin_cliftGlusterfs for sysadmins-justin_clift
Glusterfs for sysadmins-justin_clift
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
 
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
 

More from VMware Tanzu

Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14
VMware Tanzu
 
What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
VMware Tanzu
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
VMware Tanzu
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
VMware Tanzu
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
VMware Tanzu
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
VMware Tanzu
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
VMware Tanzu
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
VMware Tanzu
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
VMware Tanzu
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
VMware Tanzu
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
VMware Tanzu
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
VMware Tanzu
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
VMware Tanzu
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
VMware Tanzu
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
VMware Tanzu
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
VMware Tanzu
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
VMware Tanzu
 

More from VMware Tanzu (20)

Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14
 
What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 

Recently uploaded

Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
abdulrafaychaudhry
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 

Recently uploaded (20)

Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 

Greenplum Overview for Postgres Hackers - Greenplum Summit 2018

  • 1. © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0 Greenplum for PostgreSQL hackers Heikki Linnakangas
  • 2. Greenplum Greenplum is PostgreSQL ● with Massively Parallel Processing bolted on ● with lots of other extra features and addons Greenplum is an Open Source project
  • 4. CREATE TABLE employee ( employee_id int8, name text ) DISTRIBUTED BY (employee_id) Data distribution
  • 5. Data distribution ● Data is distributed across all segments by hash ● Or randomly
  • 6. QD QE QE QE QD = Query Dispatcher QE = Query Executor Server process Query Dispatching
  • 7. Terminology QD/QE Query Dispatcher (aka master) ● Accept client connections ● Manage distributed transactions ● Create query plans Query Executor (aka segment) ● Store the data ● Exeute slices of the query
  • 8. Deployment ● One QD, as many QE nodes as needed ● Multiple QEs on one physical server ○ To utilize Multiple CPU cores ● Gigabit ethernet ● Firewall
  • 9. QD / QE roles ● Catalogs are the same in all nodes ○ OIDs of objects must match ● gpcheckcat to verify ● Utility mode PGOPTIONS='-c gp_session_role=utility' psql postgres
  • 10. Distributed plans postgres=# explain select * from foo; QUERY PLAN ------------------------------------- Gather Motion 3:1 (slice1; segments: 3) -> Seq Scan on foo ● Query plan is dispatched from QD to QE nodes ● Query results are returned from QEs via a Motion node
  • 11. More complicated query postgres=# explain select * from foo, bar where bar.c = foo.a; QUERY PLAN -------------------------------------------------------------- Gather Motion 3:1 (slice2; segments: 3) -> Hash Join (cost=2.44..5.65 rows=2 width=12) Hash Cond: bar.c = foo.a -> Seq Scan on bar -> Hash -> Broadcast Motion 3:3 (slice1; segments: 3) -> Seq Scan on foo
  • 12. Slices postgres=# explain select * from foo, bar where bar.c = foo.a; QUERY PLAN -------------------------------------------------------------- Gather Motion 3:1 (slice2; segments: 3) -> Hash Join (cost=2.44..5.65 rows=2 width=12) Hash Cond: bar.c = foo.a -> Seq Scan on bar -> Hash -> Broadcast Motion 3:3 (slice1; segments: 3) -> Seq Scan on foo
  • 13. QD QE segment 1 QE segment 3 QE segment 2 Process QE backend process (slice 2) QE backend process (slice 2) QE backend process (slice 1) QE backend process (slice 2) QE backend process (slice 1) QE backend process (slice 1) QD backend process
  • 14. Interconnect ● Data flows between Motion Sender and Receiver via the Interconnect ● UDP based protocol
  • 15. Slice 1Slice 2 QD Hash SeqScan foo Hash Join SeqScan bar Redist. Motion Send Redist. Motion Recv Gather Motion Send Gather Motion Recv SeqScan foo Hash Hash Join SeqScan bar Redist. Motion Send Redist. Motion Recv Gather Motion Send SeqScan foo Hash Hash Join SeqScan bar Redist. Motion Send Redist. Motion Recv Gather Motion Send Executor node
  • 17. Distributed planning ● Prepare a non-distributed plan ● Keep track of distribution keys at each step ● Add Motion nodes to make it distributed ● Different strategies to distribute aggregates
  • 18. Data flows in one direction ● Nested loops with executor parameters cannot be dispatched across query slices ○ Need to be materialized ○ The planner heavily discourages Nested loops ○ Hash joins are preferred ● Subplans with parameters need to use seqscans and must be fully materialized ○ “Skip-level” subplans not supported
  • 19. ORCA ● Alternative optimizer ● Written in C++ ● Shared with Apache HAWQ ● https://github.com/greenplum-db/gporca ● Raw query is converted to XML, sent to ORCA, and converted back ● Slow startup time, better plans for some queries
  • 20. Distributed transactions ● Every transaction uses two-phase commit ● QD acts as the transaction manager ● Distributed snapshots to ensure consistency across nodes
  • 22. Mirroring ● If one QE node goes down, no queries can be run: postgres=# select * from foo; ERROR: failed to acquire resources on one or more segments DETAIL: could not connect to server: Connection refused Is the server running on host "127.0.0.1" and accepting TCP/IP connections on port 40002? (seg2 127.0.0.1:40002)
  • 23. Mirroring ● Each node has a mirror node as backup ● GPDB 5: ○ Custom “file replication” between master and mirror QE node ● Revamped in GPDB 6: ○ Uses PostgreSQL WAL replication
  • 24. Partitioning CREATE TABLE partition_test ( col1 int, col2 decimal, col3 text ) distributed by (col1) partition by list(col2) ( partition part1 VALUES(1,2,3,4,5,6,7,8,9,10), partition part2 VALUES(11,12,13,14,15,16,17,18,19,20), partition part3 VALUES(21,22,23,24,25,26,27,28,29,30), partition part4 VALUES(31,32,33,34,35,36,37,38,39,40), default partition def );
  • 25. Append-optimized Tables ● Alternative to normal heap tables ● CREATE TABLE … WITH (appendonly=true) ● Optimized for appending in large batches and sequential scans ● Compression ● Column- or row-oriented
  • 26. Bitmap indexes ● More dense than B-tree for duplicates ● Good for columns with few distinct values
  • 27. Show me the code!
  • 29. $ ls -l drwxr-xr-x 5 heikki heikki 4096 Jun 30 11:37 concourse drwxr-xr-x 2 heikki heikki 4096 Jun 30 11:37 config -rwxr-xr-x 1 heikki heikki 528544 Jun 30 11:37 configure -rw-r--r-- 1 heikki heikki 78636 Jun 30 11:37 configure.in drwxr-xr-x 54 heikki heikki 4096 Jun 30 11:37 contrib -rw-r--r-- 1 heikki heikki 1547 Jun 30 11:37 COPYRIGHT drwxr-xr-x 3 heikki heikki 4096 Jun 30 11:37 doc -rw-r--r-- 1 heikki heikki 8253 Jun 30 11:37 GNUmakefile.in drwxr-xr-x 8 heikki heikki 4096 Jun 30 11:37 gpAux drwxr-xr-x 4 heikki heikki 4096 Jun 30 11:37 gpdb-doc drwxr-xr-x 7 heikki heikki 4096 Jun 30 11:37 gpMgmt -rw-r--r-- 1 heikki heikki 11358 Jun 30 11:37 LICENSE -rw-r--r-- 1 heikki heikki 1556 Jun 30 11:37 Makefile -rw-r--r-- 1 heikki heikki 113093 Jun 30 11:37 NOTICE -rw-r--r-- 1 heikki heikki 2051 Jun 30 11:37 README.amazon_linux -rw-r--r-- 1 heikki heikki 1514 Jun 30 11:37 README.debian -rw-r--r-- 1 heikki heikki 19812 Jun 30 11:37 README.md -rw-r--r-- 1 heikki heikki 1284 Jun 30 11:37 README.PostgreSQL drwxr-xr-x 14 heikki heikki 4096 Jun 30 11:37 src
  • 30. Code layout Same as PostgreSQL: ● src/, contrib/, configure Greenplum-specific: ● gpMgmt/ ● gpAux/ ● concourse/
  • 31. Catalog tables ● Extra catalog tables for GPDB functionality ○ pg_partition, pg_partitition_rule, gp_distribution_policy, … ● Extra columns in pg_proc, pg_class ○ a Perl script provides defaults for them at build time
  • 32. Documentation ● doc/ contains the user manual in PostgreSQL ○ All but reference pages removed in Greenplum ● gpdb-doc/ contains the Greenplum manuals
  • 33. Compile! ./configure make install Some features that are optional in PostgreSQL are mandatory in Greenplum to run the regression tests. Read the README.md for which flags to use
  • 34. Initialize a cluster (wrong) ~/gpdb.master$ bin/initdb -D data The files belonging to this database system will be owned by user "heikki". This user must also own the server process. … Success. You can now start the database server using: bin/postgres -D data or bin/pg_ctl -D data -l logfile start ~/gpdb.master$ bin/postgres -D data 2017-06-30 08:50:05.440267 GMT,,,p16890,th-751686272,,,,0,,,seg-1,,,,,"FATAL","22023","dbid (from -b option) is not specified or is invalid. This value must be >= 0, or >= -1 in utility mode. The dbid value to pass can be determined from this server's entry in the segment configuration; it may be -1 if running in utility mode.",,,,,,,,"PostmasterMain","postmaster.c",1174,
  • 35. Creating a cluster ● gpinitsystem – runs initdb on each node ● gpstart / gpstop – runs pg_ctl start/stop on each node ● For quick testing and hacking on your laptop: make cluster
  • 36. Regression tests ● Regression test suite in src/test/regress ● Core is the same as in PostgreSQL ● Greatly extended to cover Greenplum-specific features ● Needs a running cluster make -C src/test/regress installcheck
  • 37. gpdiff.pl ● In GPDB, rows can arrive from QE nodes in any order ● Post-processing step in installcheck masks out the row-order differences
  • 38. gpdiff.pl directives --start_ignore DROP TABLE IF EXISTS T_a1 CASCADE; DROP TABLE IF EXISTS T_b2 CASCADE; DROP TABLE IF EXISTS T_random CASCADE; --end_ignore
  • 39. More tests src/test/ ● isolation/ ● isolation2/ ● tinc/ Use “make installcheck-world” to run most of these
  • 40. Yet more tests ● Pivotal runs a Concourse CI instance to run “make installcheck-world” ● And some additional tests that need special setup ● Scripts in concourse/ directory in the repository
  • 42. Greenplum mailing lists Public mailing lists on Google Groups: ● gpdb-dev ● gpdb-users
  • 43. greenplum.org ● News, events ● Links to the github project, mailing list, Concourse instance, and more That’s all, folks!