Scaling PostgreSQL on SMP Architectures
Transcript

  • 1. Scaling PostgreSQL on SMP Architectures
    Doug Tolbert, David Strong, Johney Tsai
    {doug.tolbert, david.strong, johney.tsai}@unisys.com
    PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006
  • 2. Unisys Cellular Multi-Processor Architecture
    • Up to 32 CPU sockets
        – In 4- or 8-CPU pods
        – Dual-core supported (multi-core eventually)
        – Dynamically partitionable
    • 32-bit & 64-bit technologies
    • Multilayered caches
    • Large cross-bar memories
        – IA32: 64 GB (with PAE)
        – EM64T, Opteron, Athlon: 256 TB
        – Itanium: 1 PB
        – Theoretical: multi-PB (processor family dependent)
    • Up to 96 PCI slots
        – Lower on newer models
    • Linux: SLES, RedHat
    • Windows: Data Center
    [Diagram: CPU pods with TLC caches, 8 GB MSU memory units, DIBs and PCI slots]
  • 3. Unisys Open And Secure Integrated Stack
  • 4. PostgreSQL ASC Lab Engagement
    • February 2005, Unisys Mission Viejo Facility
        – Simon Riggs, Mark Wong (OSDL) participated
        – PostgreSQL 8.0.1 measured
        – Summary at http://developer.osdl.org/markw/papers/PostgreSQL%20Visit%20Report%20External%20050513.pdf
    • Interesting findings
        – Locking protocol and granularity
        – Context switch “storms” (~80K/sec)
  • 5. PostgreSQL 8.1 Enhancements
    • A few stand out from a performance perspective
        – Improve concurrent access to the shared buffer cache (Tom)
        – Allow index scans to use an intermediate in-memory bitmap (Tom)
        – Improve performance for partitioned tables (Simon)
    • Context switches reduced
        – From ~80K/sec to ~30K/sec
    • Other effects only seen in overall throughput improvements
    • Undertook more complete characterization of 8.1 performance improvements
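The slides quote context-switch rates without saying how they were measured. On Linux, one possible way to get a comparable figure is to sample the cumulative `ctxt` counter in `/proc/stat` twice and divide by the interval. The sketch below assumes that approach; the function names are hypothetical and the tool the authors actually used is not known from the slides.

```python
import time


def parse_ctxt(proc_stat_text):
    """Return the cumulative context-switch count from /proc/stat content."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def switches_per_second(count_start, count_end, seconds):
    """Average context switches per second between two counter samples."""
    return (count_end - count_start) / seconds


def sample_rate(interval=1.0, path="/proc/stat"):
    """Sample the live counter twice (Linux only) and return the rate."""
    with open(path) as f:
        start = parse_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        end = parse_ctxt(f.read())
    return switches_per_second(start, end, interval)


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Figures quoted on the slide: ~80K/sec on 8.0 and ~30K/sec on 8.1.
print(percent_reduction(80_000, 30_000))
```

With the slide's rounded figures, the drop from ~80K/sec to ~30K/sec is a reduction of roughly five eighths.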
  • 6. TPC-C Benchmark
    • Transaction Processing Performance Council (TPC)
        – Defines suite of benchmarks
        – Sole certifier of vendor-submitted benchmark results
        – Specs, results, more info: http://www.tpc.org
    • TPC-C benchmark is a complete order-entry environment
        – Industry standard measure of DBMS OLTP performance across vendors
        – Reports transactions per minute (tpmC) and cost metrics
        – Mix of 5 transactions
            • New Order (selects, updates, inserts)
            • Payment (selects, updates, inserts)
            • Order Status (selects)
            • Stock Level (selects)
            • Delivery (selects, updates, deletes), interactive & deferred variants
    • Data volumes, number of clients scaled by vendor’s tpmC goal
        – See spec for details
    • Current (June 2006) result ranges
        – Best performance: 3,210,540 tpmC @ $5.07/tpmC
        – Lowest performance: 9,347 tpmC
        – Best price/performance: 38,622 tpmC @ $0.99/tpmC
  • 7. Test Application
    [Callout: Ah, these are NOT TPC-C numbers!]
    • DBMS stressor application with well-known characteristics
    • TPCC4J = “TPC-C For Java”
        – Unisys-developed from TPC-C version 5.3 (April 2004)
        – Runs on 1.4.2 JVM (5.0 JVM works)
        – Small application footprint (~72K)
        – Very few garbage collection cycles
        – Client overhead low (7-9% for ~1000 users)
    • Deviates from TPC-C in important ways
        – Non-audited components and results
        – Data entry screens not populated, only SQL statements run
        – User think time eliminated
        – Reports TPS, not tpmC
            • No, you can’t multiply by 60 to get tpmC!
        – No checkpoints during benchmark runs
        – Not run for a minimum of 8 hours
        – Data volumes not scaled
  • 8. Test Configuration
    • Client Driver: Unisys ES3040
        – 4 x 3.0 GHz Intel Xeon MP CPUs (IA32)
        – Hyper threading enabled
        – 4 GB RAM
        – 1 x 36 GB boot drive (10K rpm)
        – 1 Gbit NIC (copper)
        – OS: SLES 9 SP3
    • PostgreSQL: Unisys ES7000/540
        – 32 x 3.0 GHz Intel Xeon MP CPUs (IA32)
        – Hyper threading disabled
        – 16 GB RAM
        – 1 x 36 GB boot drive (10K rpm)
        – 1 Gbit NIC (copper)
        – OS: SLES 9 SP3
    • Storage: EMC CX-200 External Disk
        – 8 x 36 GB drives (15K rpm)
        – RAID 10 configured (striped mirrors)
        – Small data volume: 50 TPC-C warehouses, ~5.4 GB (4.7 GB data + 700 MB indexes)
        – No I/O issues in benchmarking
  • 9. Scaling -- Results
    [Chart: transactions per second (0-200) vs. number of CPUs (1-32), one series each for 8.0.7 and 8.1.2]
    Transactions per second (TPS) by CPU count:
        CPUs    8.0.7    8.1.2
           1    29.13    31.42
           2    38.12    55.61
           4    39.99    86.4
           8    36.14   103.78
          12    28.28   117.83
          16    27.32   153.42
          20    26.35   144.39
          24    26.02   129.71
          28    23.72   118.7
          32    20.82   111.33
    Users: 2 x CPUs (8.0.7), 6 x CPUs (8.1.2)
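The TPS figures on this slide can be turned into speedup and efficiency numbers relative to the 1-CPU run. The sketch below is one way to do that; comparing against ideal linear speedup is a framing added here for illustration, not something the slides do, and the function names are hypothetical. Note that the two versions were run with different user counts, so the curves are not strictly like-for-like.

```python
CPUS = [1, 2, 4, 8, 12, 16, 20, 24, 28, 32]

# TPS values copied from the slide 9 table.
TPS = {
    "8.0.7": [29.13, 38.12, 39.99, 36.14, 28.28,
              27.32, 26.35, 26.02, 23.72, 20.82],
    "8.1.2": [31.42, 55.61, 86.4, 103.78, 117.83,
              153.42, 144.39, 129.71, 118.7, 111.33],
}


def speedup(version):
    """Throughput at each CPU count divided by the 1-CPU throughput."""
    base = TPS[version][0]
    return [t / base for t in TPS[version]]


def efficiency(version):
    """Speedup divided by CPU count (1.0 would be ideal linear scaling)."""
    return [s / n for s, n in zip(speedup(version), CPUS)]


def peak(version):
    """Return (cpu_count, tps) at the highest measured throughput."""
    i = max(range(len(CPUS)), key=lambda k: TPS[version][k])
    return CPUS[i], TPS[version][i]


for version in TPS:
    cpus, tps = peak(version)
    print(f"{version}: peak {tps} TPS at {cpus} CPUs")
    for n, s, e in zip(CPUS, speedup(version), efficiency(version)):
        print(f"  {n:>2} CPUs: speedup {s:5.2f}, efficiency {e:5.2f}")
```

On these numbers 8.0.7 peaks at 4 CPUs and 8.1.2 at 16 CPUs, where its speedup over a single CPU is a little under 5x.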
  • 10. Scaling – Three Issues
    [Chart and data table repeated from slide 9, annotated with three labels:]
    • Software Architecture
    • Hardware Architecture
    • Performance vs. Scalability
  • 11. Performance vs. Scalability
    • Performance and Scalability are not the same thing
        – But are closely related
        – Must be REALLY careful comparing anecdotal reports
    • Performance comparable on equal configurations
        – In this data set, the only equal configuration is 1 CPU
        – 8.1.2 exhibits a small (~8%) improvement over 8.0.7
    • Scalability measures throughput as resources are added
        – 8.1.2 uses incremental resources more effectively than 8.0.7
        – At 16 CPUs, 8.1.2 is 5.6 times better than 8.0.7
    • Fair to conclude that 8.1.2 scales effectively to 16 CPUs
        – At least on this application and platform combination
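The two ratios quoted on this slide (~8% at 1 CPU, 5.6x at 16 CPUs) can be reproduced from the slide 9 TPS table. A minimal sketch, with a hypothetical helper name:

```python
# TPS by CPU count, copied from the slide 9 table.
TPS_807 = {1: 29.13, 2: 38.12, 4: 39.99, 8: 36.14, 12: 28.28,
           16: 27.32, 20: 26.35, 24: 26.02, 28: 23.72, 32: 20.82}
TPS_812 = {1: 31.42, 2: 55.61, 4: 86.4, 8: 103.78, 12: 117.83,
           16: 153.42, 20: 144.39, 24: 129.71, 28: 118.7, 32: 111.33}


def version_ratio(cpus):
    """8.1.2 throughput divided by 8.0.7 throughput at one CPU count."""
    return TPS_812[cpus] / TPS_807[cpus]


# Single-CPU comparison: the only equal configuration in the data set.
print(f"1 CPU:   {100 * (version_ratio(1) - 1):.1f}% improvement")

# 16-CPU comparison: where 8.1.2 peaks.
print(f"16 CPUs: {version_ratio(16):.1f}x")
```

The 16-CPU ratio mixes two effects: 8.1.2 climbing to its peak and 8.0.7 having already fallen below its own 1-CPU throughput. That is the slide's point about not conflating performance with scalability.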
  • 12. Hardware Architecture
    • Cache invalidations
        – Handled quickly when on-cell
            • No visible degradation up to 8 CPUs
        – Slower across cells
            • Cross-cell interactions required, but slower
            • Responsible for dip at 12 CPUs
                – Processing power unbalanced across cells (8 + 4)
            • Recovers by 16 CPUs
                – Processing power balanced across cells (8 + 8)
                – Reduces chance of cache misses
    • Cross-cell cache invalidations increase quickly when data structures are poorly aligned with cache line size
    • Every large-scale computing platform, regardless of vendor, will have some sort of hardware related behavior that needs to be considered in performance and scalability work
    [Diagram: ES7000/540 cell with RAM, memory controller, shared cache and 8 CPUs]
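The alignment point above can be made concrete with a little arithmetic: a record whose size does not divide the cache line evenly will sometimes straddle two lines, so a write to it can invalidate both. The sketch below assumes a 64-byte line and a 24-byte record purely for illustration; neither figure comes from the slides, and the helper names are hypothetical.

```python
CACHE_LINE = 64  # bytes; an assumed value, the slides give no line size


def lines_touched(offset, size, line=CACHE_LINE):
    """Number of cache lines covered by `size` bytes starting at `offset`."""
    first = offset // line
    last = (offset + size - 1) // line
    return last - first + 1


def padded_size(size, line=CACHE_LINE):
    """Round `size` up to the next multiple of the cache line size."""
    return -(-size // line) * line


def straddlers(count, size, line=CACHE_LINE):
    """How many of `count` back-to-back records span more than one line."""
    return sum(1 for i in range(count)
               if lines_touched(i * size, size, line) > 1)


record = 24  # hypothetical record size in bytes
print(straddlers(8, record))               # unpadded, packed records
print(straddlers(8, padded_size(record)))  # each record padded to a line
```

Padding every record to a full line trades memory for isolation: no record straddles a boundary, and no two records share a line, which also avoids one CPU's write invalidating a neighbouring record cached by another CPU.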
  • 13. Software Architecture
    • Degradation above 16 CPUs
        – Continued decline implies systemic origin
        – Characteristic of software algorithms or data structures reaching inherent scalability limits
    • Causes unclear, but something important is wrong
    • Possibilities include
        – The Usual Suspects: lock mgmt, buffer mgmt, etc.
        – Lack of Repeatable Read isolation level
            • A TPC-C requirement
            • Elevation to Serializable hurts overall throughput
        – See To Do List (a later slide)
  • 14. It’s not just TPCC4J!
    • Another industry standard, transactional Java benchmark
        – ES7000 8x
        – 8.0: 34 TPS
        – 8.1: 544 TPS
    • Different benchmark structure
    • Different hardware platform
        – No cache issues, CPUs on one cell
    • Even more dramatic improvement (16x!)
  • 15. Other Useful Tooling
    • gprof (standard Linux utility)
    [Callout: A Billion calls!]
    [Chart of function call counts; functions shown: FunctionCall2, pfree, int4eq, DLMoveToFront, BufferGetBlockNumber, enlargeStringInfo, _bt_checkkeys, appendBinaryStringInfo, _bt_step, _bt_compare, AllocSetAlloc, LWLockRelease, int4le, LWLockAcquire, int4ge, check_stack_depth, MemoryContextAlloc, MemoryContextAllocZeroAligned, btint4cmp, hash_search, _bt_next, AllocSetFree]
  • 16. Other Useful Tooling
    • VTune
        – Intel low-level performance monitor
            • http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
        – High frequency functions, by clock ticks
            • HeapTupleSatisfiesSnapshotData
            • LWLockAcquire
            • XLogInsert
            • PinBuffer
            • hash_search
    [Callout: HeapTupleSatisfiesSnapshotData??]
    • oprofile
        – Results pending
  • 17. To Do List
    • In-line some high frequency functions
    • Speed up CRC algorithm by rearranging loop structure
    • Improve data structure alignment to memory pages and cache lines
        – Possibly dynamically based on processor id and settings
    • Re-type byte structures to words for newer Intel processors
    • Investigate efficiency of bitmaps on newer Intel processors
    • Residual spin-lock implementation issues
    • Reduce query plan discards (very adventurous!)
    • Improve statistics infrastructure and real-time monitoring
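The list does not say which CRC is meant or how its loop would be rearranged. As a generic illustration of what restructuring a CRC loop can look like, the sketch below compares a bit-at-a-time CRC-32 with a table-driven version that replaces the inner 8-iteration loop with one lookup per byte. This is a textbook technique using the standard reflected CRC-32 polynomial, not the change the authors had in mind; the function names are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, the one zlib uses


def crc32_bitwise(data):
    """CRC-32 computed one bit at a time (8 inner iterations per byte)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def _make_table():
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ (POLY if c & 1 else 0)
        table.append(c)
    return table


TABLE = _make_table()


def crc32_table(data):
    """CRC-32 with the inner bit loop replaced by one table lookup."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


sample = b"123456789"
print(hex(crc32_bitwise(sample)), hex(crc32_table(sample)),
      hex(zlib.crc32(sample)))
```

Both functions produce the same checksum as `zlib.crc32`; the table version does the per-bit work once, up front, instead of once per input byte.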
  • 18. Conclusions
    • Scalability dramatically improved in 8.1 release
        – Scales to 16 CPUs (on test environment)
        – Context switches down
        – Supported by multiple benchmarks
    • Minor single CPU performance improvement
    • Hardware architecture can affect scalability
        – Important consideration for high-capacity, high-throughput applications
        – Systems of all sizes moving to multi-core processor chips
    • Substantial scalability and performance challenges remain
        – TPC-C required feature content missing (repeatable read)
        – Algorithm scalability limits exist (especially > 16 CPUs)