Greenplum: Building a Postgres Fabric for Large-Scale Analytical Computation
Greenplum Summit at PostgresConf US 2018
Elisabeth Hendrickson and Ivan Novick
19. Embarrassingly Parallel
• Spread out (shard) data across many databases
• Execute queries, in parallel, on every shard: spread the load
• Gather results
• Bonus: hide the complexity from the user
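The shard, execute, and gather steps above can be sketched as a toy in-memory model. This is only an illustration, not how Greenplum itself is implemented: the names `shard_for`, `scatter`, `local_sum`, and `parallel_sum` are hypothetical, plain Python lists stand in for the per-shard databases, and a thread pool stands in for the parallel workers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(key, num_shards):
    # Deterministic hash of the distribution key (assumption: CRC32 is
    # used here only because it is stable across Python processes).
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def scatter(rows, key, num_shards):
    # Spread rows out across shards by hashing the distribution key.
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key], num_shards)].append(row)
    return shards


def local_sum(shard, column):
    # The "query" each shard runs on its own slice of the data.
    return sum(row[column] for row in shard)


def parallel_sum(shards, column):
    # Run the same query on every shard at once, then gather and
    # combine the partial results.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: local_sum(s, column), shards))
    return sum(partials)


rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]
shards = scatter(rows, "id", 4)
total = parallel_sum(shards, "amount")
```

The caller only sees `parallel_sum`; the sharding and gathering stay hidden, which is the "bonus" in the last bullet.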
20. What Makes it Greenplum?
● MPP-aware optimizer (“Orca”)
● Interconnect (UDP-based protocol for communication between nodes)
● Executor “motion”
● Data management tools that take advantage of all that massive parallelism (e.g. gpload, gptransfer)
(Plus a few things Greenplum added independently that show up in later versions of
Postgres, such as a column store and a connection protocol similar to, but different
from, foreign data wrappers. Features like these have made the merge particularly
challenging.)
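To make the executor "motion" bullet concrete, here is a rough sketch of the idea behind a redistribute-style motion: when two tables are sharded on different keys, rows of one table are re-hashed on the join key and moved so that matching rows end up on the same segment, after which each segment can join locally. This is a toy model under stated assumptions, not Greenplum's executor: the function names, the CRC32 hash, and the sample tables are all hypothetical, and real motion happens by streaming tuples between nodes over the interconnect rather than by rebuilding Python lists.

```python
import zlib


def segment_for(value, num_segments):
    # Deterministic hash used to decide which segment owns a value.
    return zlib.crc32(str(value).encode("utf-8")) % num_segments


def distribute(rows, key, num_segments):
    # Place each row on the segment that owns the hash of row[key].
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key], num_segments)].append(row)
    return segments


def redistribute(segments, key):
    # "Motion": re-hash every row on a new key and move it to the
    # segment that owns that hash.
    all_rows = (row for seg in segments for row in seg)
    return distribute(all_rows, key, len(segments))


def local_join(left, right, key):
    # Plain hash join that runs independently inside one segment.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


NUM_SEGMENTS = 3
orders = [{"order_id": i, "customer_id": i % 5} for i in range(20)]
customers = [{"customer_id": c, "name": f"cust{c}"} for c in range(5)]

# orders are sharded by order_id, customers by customer_id
order_segs = distribute(orders, "order_id", NUM_SEGMENTS)
customer_segs = distribute(customers, "customer_id", NUM_SEGMENTS)

# Move orders so they are co-located with their customers, join on
# each segment, then gather the per-segment results.
moved = redistribute(order_segs, "customer_id")
joined = [
    row
    for seg_orders, seg_customers in zip(moved, customer_segs)
    for row in local_join(seg_orders, seg_customers, "customer_id")
]
```

Without the `redistribute` step, an order and its customer would usually sit on different segments and the local joins would miss matches. Deciding when such data movement is needed is the kind of choice an MPP-aware optimizer has to make.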