Implementing Parallelism in PostgreSQL - PGCon 2014


PostgreSQL's architecture is based heavily on the idea that each connection is served by a single backend process, but CPU core counts are rising much faster than CPU speeds, and large data sets can't be processed efficiently by a single process. Adding parallelism to PostgreSQL requires significant architectural changes to many areas of the system, including background workers, shared memory, memory allocation, locking, GUCs, transactions, snapshots, and more.



  1. Implementing Parallelism in PostgreSQL • Robert Haas | PGCon 2014 • © 2013 EDB All rights reserved.
  2. Parallelism: Why? (1) • Between 1996 and 2004, single-threaded CPU performance on the SPECint and SPECfp benchmarks increased by >50% per year. Between 2004 and 2012, it increased by only ~21% per year. • Single-threaded 7-zip performance was only 39% faster on 2 x Intel Xeon L5640 (March 16, 2010; 2.7 GHz, 12 MB cache) than on 4 x AMD Opteron 880 (September 26, 2005; 2.4 GHz, 2 MB cache). That's only 7.7% per year.
  3. Parallelism: Why? (2) • Dell Configuration Tool (as of 2014-05-15): − 2x Intel® Xeon® E7-4890 v2 Processor (2.8 GHz, 37.5 MB Cache, 8.0 GT/s QPI, Turbo, 15 Core, 155W) [add $7,735.68] − 2x Intel® Xeon® E7-8893 v2 Processor (3.4 GHz, 37.5 MB Cache, 8.0 GT/s QPI, Turbo, 6 Core, 155W) [add $8,410.30]
  4. Parallel Query: Inter-Node
     Hash Join
       Join Cond: foo.x = bar.x
       → Seq Scan on foo
           Filter: something_complicated
       → Hash
           → Seq Scan on bar
     • One backend could run the Seq Scan and apply the filter condition; it could then stream the results to another backend to perform the Hash Join.
  5. Parallel Query: Intra-Node
     Hash Join
       Join Cond: foo.x = bar.x
       → Seq Scan on foo
           Filter: something_complicated
       → Hash
           → Seq Scan on bar
     • Multiple backends could cooperate to perform the Seq Scan – or the Hash Join.
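One way multiple backends can cooperate on a single Seq Scan is to hand out heap blocks from a shared counter, so each block is scanned exactly once no matter how fast individual workers run. A toy Python sketch, with a `multiprocessing.Value` standing in for a counter in shared memory; the block count, worker count, and the idea of merely counting blocks are all invented for illustration.

```python
from multiprocessing import Process, Lock, Value, Array

NBLOCKS = 100   # pretend the table has 100 heap blocks

def scan_blocks(next_block, lock, counts, worker_id):
    """Repeatedly claim the next unscanned block; stop when none remain."""
    while True:
        with lock:                       # claim under a shared lock
            blk = next_block.value
            if blk >= NBLOCKS:
                return
            next_block.value = blk + 1
        # "scan" the claimed block here; this sketch just counts it
        counts[worker_id] += 1

if __name__ == "__main__":
    next_block = Value("i", 0, lock=False)   # shared scan position
    lock = Lock()
    counts = Array("i", 3, lock=False)       # blocks scanned per worker
    workers = [Process(target=scan_blocks, args=(next_block, lock, counts, i))
               for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(list(counts), "total:", sum(counts))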
  6. Parallel Maintenance / DDL Commands • CREATE INDEX − Parallel Heap Scan − Parallel Sort • VACUUM − Parallel Heap Scan − Worker Per Index (suggestion from Andres and Heikki)
  7. Architectural Overview (1) • Processes, Not Threads − None of our fundamental subsystems are thread-safe (e.g. palloc/pfree, ereport, syscache, relcache, buffer manager). − Making them thread-safe would add synchronization overhead even in the single-threaded case – and also bugs. • Started by the Postmaster, Not Created via fork() − We can't fork() on Windows, where many of our users are. − Currently, all backends are direct children of the postmaster; it seems preferable to keep it that way.
  8. Architectural Overview (2) • Shared Memory, Not Pipes or Files − Files would cause more system calls and more I/O. − Pipes are a good paradigm, but shared memory is more flexible. − We can use shared memory to emulate a pipe if we need to – see shm_mq. (This also dodges platform dependencies.) • Dynamic Shared Memory, Not the Main Segment − For an application such as parallel sort, we might need a LOT of memory – perhaps a terabyte. We can't pre-reserve that! • Dynamic Shared Memory May Be Mapped at a Different Address in Every Process − There are no good, general techniques for guaranteeing the same mapping address everywhere.
  9. What Do We Need To Build? • Basic Facilities (done in 9.4) • Plumbing (some work done/in progress) • Parallel Environment (a little unpublished work done) • Parallel Execution (some study/thought) • Parallel Planning (no idea yet)
  10. Basic Facilities • Dynamic Background Workers (done in 9.4) • Dynamic Shared Memory (done in 9.4)
  11. Plumbing • DSM Table of Contents (done in 9.4) − I just mapped this dynamic shared memory segment; how do I figure out what it contains? • Message Queueing (done in 9.4) − How does a background worker send tuples, errors, notices, etc. to a user backend? • Error Propagation (working on it) − Common infrastructure to make using message queueing easy. • Shared Memory Allocator (early draft posted) • Shared Hash Table (someday)
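The table-of-contents problem in the first bullet can be sketched as a fixed header at the start of the segment: a magic number plus an array of (key, offset) pairs that any attaching process can read to locate the data structures inside. A minimal Python sketch using `struct` on a `bytearray` standing in for the mapped segment; the magic value, keys, and offsets are all invented.

```python
import struct

TOC_MAGIC = 0x50474D51                 # arbitrary magic for this sketch
HEADER = struct.Struct("<II")          # magic, number of entries
ENTRY = struct.Struct("<QQ")           # 64-bit key, offset into the segment

def toc_create(segment, entries):
    """Write a table of contents at the start of `segment`."""
    HEADER.pack_into(segment, 0, TOC_MAGIC, len(entries))
    off = HEADER.size
    for key, offset in entries.items():
        ENTRY.pack_into(segment, off, key, offset)
        off += ENTRY.size

def toc_lookup(segment, key):
    """Find the offset registered under `key`, as an attaching process would."""
    magic, n = HEADER.unpack_from(segment, 0)
    assert magic == TOC_MAGIC, "not a segment we initialized"
    off = HEADER.size
    for _ in range(n):
        k, v = ENTRY.unpack_from(segment, off)
        if k == key:
            return v
        off += ENTRY.size
    return None

segment = bytearray(1024)              # stand-in for a mapped DSM segment
toc_create(segment, {1: 256, 2: 512})  # e.g. key 1 = tuple queue, key 2 = sort state
print(toc_lookup(segment, 2))          # → 512
```

Since every offset is relative to the start of the segment, the scheme works even when, as the previous slides note, the segment is mapped at a different address in every process.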
  12. Parallel Environment • Make the Background Worker Look Enough Like a Regular User Backend To Do Useful Work − Copy relevant state (e.g. user, database, snapshot). • Useful Work Doesn't Mean Everything − Some operations seem fundamentally unsafe in a parallel context (e.g. calling a user-defined function that sets a GUC). − Some operations could theoretically be made safe, but we might not bother (e.g. setseed() + random()). − Even if we share lots of state, arbitrary user-supplied code can never be safe; we must label unsafe functions.
  13. Parallel Environment: What To Copy • User ID and Database • GUCs • Transaction State • Current and Active Snapshot • Combo CID Hash
  14. Parallel Environment: What To Prohibit • Sequence Operations • Generation of Invalidation Messages • Cursor Operations • Large Object Manipulation • LISTEN/NOTIFY • Access to Temporary Buffers • Prepared Statements
  15. Parallel Environment: Lock Management • Background Workers Can't Rely on the User Backend To Hold Necessary Locks − The user backend might die or be killed before the background worker terminates. • If Background Workers Re-Lock the Same Relations, a Parallel Query Might Self-Deadlock − The user backend locks X; another process queues for a conflicting lock on X; a background worker then tries to re-lock X. • We Probably Need a Concept of Locking Groups Inside the Lock Manager
  16. Parallel Execution • This is the “easy” part. • Parallel sorting algorithms are described in the literature and well understood. • For a parallel sequential scan, workers grab blocks or block ranges in alternation. • Amdahl's Law: if α is the fraction of running time a program spends executing serially, the maximum speedup from parallelism is 1/α.
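The Amdahl's Law bullet can be made concrete: with serial fraction α, N workers give a speedup of 1/(α + (1−α)/N), which approaches 1/α as N grows. A quick Python check, assuming – purely for illustration – that 10% of a parallel sort's runtime (setup, final merge) is serial:

```python
def speedup(alpha, workers):
    """Amdahl's Law: the serial fraction runs once; the rest is split N ways."""
    return 1.0 / (alpha + (1.0 - alpha) / workers)

alpha = 0.1   # assumed 10% serial time - illustrative only
for n in (1, 2, 4, 8, 64):
    print(n, "workers:", round(speedup(alpha, n), 2))
# however many workers we add, the speedup can never exceed 1/alpha = 10
```

Note how quickly the returns diminish: going from 1 to 2 workers nearly doubles throughput, while going from 8 to 64 workers does not even double it again.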
  17. Parallel Query Planning • This is probably hard. • Right now, we do costing based on estimating page access costs (CPU and I/O) and tuple processing costs. • For parallelism, we also need to consider worker startup costs and IPC costs. • A plan that's a little cheaper for me might be much more expensive in total.
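The last bullet can be illustrated with a toy cost model (all constants invented for this sketch): a parallel plan pays a startup cost per worker and an IPC cost per tuple, so its elapsed time can beat the serial plan even while its total resource consumption is worse.

```python
def serial_cost(ntuples, cpu_per_tuple=1.0):
    """Cost of processing every tuple in a single backend."""
    return ntuples * cpu_per_tuple

def parallel_cost(ntuples, workers, cpu_per_tuple=1.0,
                  startup_per_worker=1000.0, ipc_per_tuple=0.1):
    """Toy model: work splits evenly, but every tuple crosses a queue."""
    elapsed = (startup_per_worker                     # wait for workers to start
               + ntuples * cpu_per_tuple / workers    # scan work, divided N ways
               + ntuples * ipc_per_tuple)             # leader drains every tuple
    total = (startup_per_worker * workers             # every worker pays startup
             + ntuples * (cpu_per_tuple + ipc_per_tuple))
    return elapsed, total

ntuples = 100_000
elapsed, total = parallel_cost(ntuples, workers=4)
print("serial:", serial_cost(ntuples), "parallel elapsed:", elapsed,
      "parallel total:", total)
```

With these made-up constants the parallel plan finishes in roughly a third of the serial elapsed time while consuming more total resources – exactly the trade-off a parallel-aware planner has to weigh.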
  18. Any questions? Thanks.