Scaling PostgreSQL on SMP Architectures
Doug Tolbert, David Strong, Johney Tsai
{doug.tolbert, david.strong, johney.tsai}@unisys.com




                      PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006   Page 1
Unisys Cellular Multi-Processor Architecture
• Up to 32 CPU sockets
   – In 4- or 8-CPU pods
   – Dual-core supported
      • Multi-core eventually
   – Dynamically partitionable
• 32-bit & 64-bit technologies
• Multilayered caches
• Large cross-bar memories
   – IA32: 64 GB (with PAE)
   – EM64T, Opteron, Athlon: 256 TB
   – Itanium: 1 PB
   – Theoretical: Multi-PB
      • Processor family dependent
• Up to 96 PCI slots
   – Lower on newer models
• Linux: SLES, RedHat
• Windows: Data Center

[Diagram: system layout showing CPU, TLC, MSU (8 GB each), DIB, and PCI components]



Unisys Open And Secure Integrated Stack




PostgreSQL ASC Lab Engagement

• February 2005, Unisys Mission Viejo Facility
  – Simon Riggs, Mark Wong (OSDL) participated
  – PostgreSQL 8.0.1 measured
  – Summary at
    http://developer.osdl.org/markw/papers/PostgreSQL%20Visit%20Report%20External%20050513.pdf


• Interesting findings
  – Locking protocol and granularity
  – Context switch “storms” (~80K/sec)




PostgreSQL 8.1 Enhancements

• A few stand out from a performance perspective
  – Improve concurrent access to the shared buffer cache (Tom)
  – Allow index scans to use an intermediate in-memory bitmap (Tom)
  – Improve performance for partitioned tables (Simon)

• Context switches reduced
  – From ~80K/sec to ~30K/sec
• Other effects only seen in overall throughput improvements
• Undertook more complete characterization of 8.1 performance
  improvements
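
The context-switch rates quoted above (~80K/sec falling to ~30K/sec) are the kind of figure Linux exposes as a cumulative counter. The slides do not say which tool collected them. As a minimal sketch, assuming a Linux host, the `ctxt` line of `/proc/stat` can be sampled twice and differenced. The function names below are hypothetical.

```python
import time


def parse_ctxt(proc_stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat content."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def switches_per_second(before: int, after: int, elapsed_s: float) -> float:
    """Average context switches per second between two counter samples."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (after - before) / elapsed_s


def sample_rate(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Sample the live counter twice (Linux only) and return the rate."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return switches_per_second(before, after, time.monotonic() - start)
```

On a running system, `sample_rate()` gives a system-wide figure, similar in spirit to the `cs` column reported by `vmstat`.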




TPC-C Benchmark
• Transaction Processing Performance Council (TPC)
   – Defines suite of benchmarks
   – Sole certifier of vendor-submitted benchmark results
   – Specs, results, more info: http://www.tpc.org
• TPC-C benchmark is a complete order-entry environment
   – Industry standard measure of DBMS OLTP performance across vendors
   – Reports transactions per minute (tpmC) and cost metrics
   – Mix of 5 transactions
       •   New Order      (selects, updates, inserts)
       •   Payment        (selects, updates, inserts)
       •   Order Status   (selects)
       •   Stock Level    (selects)
       •   Delivery       (selects, updates, deletes), interactive & deferred variants

• Data volumes, number of clients scaled by vendor’s tpmC goal
   – See spec for details
• Current (June 2006) result ranges
   – Best performance: 3,210,540 tpmC @ $5.07/tpmC
   – Lowest performance: 9,347 tpmC
   – Best price/performance: 38,622 tpmC @ $0.99/tpmC



Test Application
[Callout: Ah, these are NOT TPC-C numbers!]
• DBMS stressor application with well-known characteristics
• TPCC4J = “TPC-C For Java”
   –   Unisys-developed from TPC-C version 5.3 (April 2004)
   –   Runs on 1.4.2 JVM (5.0 JVM works)
   –   Small application footprint (~72K)
   –   Very few garbage collection cycles
   –   Client overhead low (7-9% for ~1000 users)
• Deviates from TPC-C in important ways
   –   Non-audited components and results
   –   Data entry screens not populated, only SQL statements run
   –   User think time eliminated
   –   Reports TPS, not tpmC
        • No, you can’t multiply by 60 to get tpmC!
   – No checkpoints during benchmark runs
   – Not run for a minimum of 8 hours
   – Data volumes not scaled

Test Configuration
• Client Driver: Unisys ES3040
   – 4 x 3.0 GHz Intel Xeon MP CPUs (IA32)
   – Hyper-threading enabled
   – 4 GB RAM
   – 1 x 36 GB boot drive (10K rpm)
   – 1 Gbit NIC (copper)
   – OS: SLES 9 SP3
• PostgreSQL: Unisys ES7000/540
   – 32 x 3.0 GHz Intel Xeon MP CPUs (IA32)
   – Hyper-threading disabled
   – 16 GB RAM
   – 1 x 36 GB boot drive (10K rpm)
   – 1 Gbit NIC (copper)
   – OS: SLES 9 SP3
• Storage: EMC CX-200 External Disk
   – 8 x 36 GB drives (15K rpm)
   – RAID 10 configured (striped mirrors)
   – Small data volume: 50 TPC-C warehouses, ~5.4 GB (4.7 GB data + 700 MB indexes)
   – No I/O issues in benchmarking




Scaling -- Results
Transactions per second (TPS) by number of CPUs

[Chart: TPS (y-axis, 0 to 200) vs. CPUs (x-axis, 1 to 32) for 8.0.7 and 8.1.2]

TPS         1       2       4       8      12      16      20      24      28      32   Users
8.0.7   29.13   38.12   39.99   36.14   28.28   27.32   26.35   26.02   23.72   20.82   2 x CPUs
8.1.2   31.42   55.61   86.40  103.78  117.83  153.42  144.39  129.71  118.70  111.33   6 x CPUs




Scaling – Three Issues
[Chart: the TPS vs. CPUs chart from the Results slide (same 8.0.7 and 8.1.2 data), annotated with three issues: Performance vs. Scalability, Hardware Architecture, Software Architecture]




Performance vs. Scalability

• Performance and Scalability are not the same thing
  – But are closely related
  – Must be REALLY careful comparing anecdotal reports
• Performance is comparable only on equal configurations
  – In this data set, the only equal configuration is 1 CPU
  – 8.1.2 exhibits a small (~8%) improvement over 8.0.7 (31.42 vs. 29.13 TPS)
• Scalability measures how throughput changes as resources are added
  – 8.1.2 uses incremental resources more effectively than 8.0.7
  – At 16 CPUs, 8.1.2 delivers about 5.6 times the throughput of 8.0.7 (153.42 vs. 27.32 TPS)
• Fair to conclude that 8.1.2 scales effectively to 16 CPUs
  – At least on this application and platform combination
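
The two ratios quoted above can be reproduced from the TPS table on the Results slide. The following is a minimal sketch with hypothetical helper names. "Speedup" here means throughput relative to the same release at 1 CPU, and "efficiency" is speedup divided by CPU count. These are conventional definitions, not ones the slides spell out.

```python
# TPS figures copied from the Results slide.
CPUS = [1, 2, 4, 8, 12, 16, 20, 24, 28, 32]
TPS = {
    "8.0.7": [29.13, 38.12, 39.99, 36.14, 28.28, 27.32, 26.35, 26.02, 23.72, 20.82],
    "8.1.2": [31.42, 55.61, 86.40, 103.78, 117.83, 153.42, 144.39, 129.71, 118.70, 111.33],
}


def speedup(version):
    """Throughput at each CPU count relative to the same version at 1 CPU."""
    base = TPS[version][0]
    return [tps / base for tps in TPS[version]]


def efficiency(version):
    """Speedup divided by CPU count (1.0 would be perfect linear scaling)."""
    return [s / n for s, n in zip(speedup(version), CPUS)]


def ratio_at(cpus):
    """8.1.2 throughput divided by 8.0.7 throughput at a given CPU count."""
    i = CPUS.index(cpus)
    return TPS["8.1.2"][i] / TPS["8.0.7"][i]


if __name__ == "__main__":
    for i, n in enumerate(CPUS):
        print(f"{n:>2} CPUs  8.0.7 x{speedup('8.0.7')[i]:.2f}  "
              f"8.1.2 x{speedup('8.1.2')[i]:.2f}  ratio {ratio_at(n):.2f}")
```

With these figures, `ratio_at(1)` is about 1.08 and `ratio_at(16)` is about 5.6, matching the slide. By the same arithmetic, 8.1.2 at 16 CPUs runs at about 4.9 times its own 1-CPU throughput, while 8.0.7 peaks at about 1.4 times at 4 CPUs.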

Hardware Architecture
[Diagram: ES7000/540 cell: RAM, memory controller, shared cache, and 8 CPUs]

• Cache invalidations
   – Handled quickly when on-cell
      • No visible degradation up to 8 CPUs
   – Slower across cells
      • Cross-cell interactions required, but slower
      • Responsible for dip at 12 CPUs
         – Processing power unbalanced across cells (8 + 4)
      • Recovers by 16 CPUs
         – Processing power balanced across cells (8 + 8)
         – Reduces chance of cache misses

• Cross-cell cache invalidations increase quickly when data
  structures are poorly aligned with cache line size
• Every large-scale computing platform, regardless of vendor,
  will have some sort of hardware related behavior that needs to
  be considered in performance and scalability work
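
The alignment point can be made concrete with a little arithmetic. The sketch below assumes a 64-byte cache line, which is only an illustrative value because the slides do not give the line size of the ES7000/540. The helper names are hypothetical. A structure that straddles a line boundary touches more lines than it needs to, and each extra line is one more that may have to be invalidated on another cell.

```python
CACHE_LINE = 64  # bytes; assumed for illustration, not taken from the slides


def lines_touched(offset: int, size: int, line: int = CACHE_LINE) -> int:
    """Number of cache lines covered by `size` bytes starting at `offset`."""
    if size <= 0:
        return 0
    first = offset // line
    last = (offset + size - 1) // line
    return last - first + 1


def align_up(offset: int, line: int = CACHE_LINE) -> int:
    """Smallest multiple of `line` that is greater than or equal to offset."""
    return (offset + line - 1) // line * line


def padded_size(size: int, line: int = CACHE_LINE) -> int:
    """Size rounded up so consecutive array elements never share a line."""
    return align_up(size, line)
```

For example, a 32-byte structure at offset 48 spans two lines under this assumption, while the same structure moved to `align_up(48)` (offset 64) fits in one. Padding each element with `padded_size` trades memory for fewer lines shared between CPUs.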


Software Architecture

• Degradation above 16 CPUs
  – Continued decline implies systemic origin
  – Characteristic of software algorithms or data structures
    reaching inherent scalability limits
• Causes unclear, but something important is wrong
• Possibilities include
  – The Usual Suspects: lock mgmt, buffer mgmt, etc.
  – Lack of Repeatable Read isolation level
     • A TPC-C requirement
     • Elevation to Serializable hurts overall throughput
  – See To Do List (a later slide)


It’s not just TPCC4J!

• Another industry standard, transactional Java benchmark
  – ES7000 8x
  – 8.0: 34 TPS
  – 8.1: 544 TPS
• Different benchmark structure
• Different hardware platform
  – No cache issues, CPUs on one cell
• Even more dramatic improvement (16x!)




Other Useful Tooling

• gprof (standard Linux utility)
   – Slide callout: “A billion calls!”

    FunctionCall2                     pfree
    int4eq                            DLMoveToFront
    BufferGetBlockNumber              enlargeStringInfo
    _bt_checkkeys                     appendBinaryStringInfo
    _bt_step                          _bt_compare
    AllocSetAlloc                     LWLockRelease
    int4le                            LWLockAcquire
    int4ge                            check_stack_depth
    MemoryContextAlloc                MemoryContextAllocZeroAligned
    btint4cmp                         hash_search
    _bt_next
    AllocSetFree




Other Useful Tooling

• VTune
   – Intel low-level performance monitor
      • http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
   – High frequency functions, by clock ticks
      • HeapTupleSatisfiesSnapshotData (flagged on the slide with a “??” callout)
      • LWLockAcquire
      • XLogInsert
      • PinBuffer
      • hash_search

• oprofile
   – Results pending



To Do List
• In-line some high frequency functions
• Speed up CRC algorithm by rearranging loop structure
• Improve data structure alignment to memory pages and cache
  lines
   – Possibly dynamically based on processor id and settings
• Re-type byte structures to words for newer Intel processors
• Investigate efficiency of bitmaps on newer Intel processors
• Residual spin-lock implementation issues
• Reduce query plan discards (very adventurous!)
• Improve statistics infrastructure and real-time monitoring
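
The slides do not say how the CRC loop would be rearranged. As one illustration only (a textbook technique swapped in here, not necessarily what the authors had in mind), a bit-at-a-time CRC-32 can be restructured to process one byte per iteration using a precomputed 256-entry table. The sketch uses the common reflected polynomial 0xEDB88320 so it can be checked against Python's `zlib.crc32`. It makes no claim that this matches the CRC variant used inside PostgreSQL.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (the one zlib uses)


def crc32_bitwise(data: bytes) -> int:
    """Reference version: inner loop runs 8 times per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def _build_table():
    """Precompute the 8-step inner loop for every possible byte value."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _build_table()


def crc32_table(data: bytes) -> int:
    """Restructured version: one table lookup replaces the inner bit loop."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    sample = b"123456789"
    print(hex(crc32_bitwise(sample)), hex(crc32_table(sample)), hex(zlib.crc32(sample)))
```

Both functions return the same value as `zlib.crc32` for the same input. The table version does the same work with one lookup per byte instead of eight shift-and-test steps.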



Conclusions
• Scalability dramatically improved in 8.1 release
  – Scales to 16 CPUs (on test environment)
  – Context switches down
  – Supported by multiple benchmarks
• Minor single CPU performance improvement
• Hardware architecture can affect scalability
  – Important consideration for high-capacity, high-throughput
    applications
  – Systems of all sizes moving to multi-core processor chips
• Substantial scalability and performance challenges remain
  – TPC-C required feature content missing (repeatable read)
  – Algorithm scalability limits exist (especially > 16 CPUs)


                  PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006   Page 18

More Related Content

More from elliando dias

Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 
From Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn IntroductionFrom Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn Introductionelliando dias
 
FleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in ClojureFleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in Clojureelliando dias
 
Clojure and The Robot Apocalypse
Clojure and The Robot ApocalypseClojure and The Robot Apocalypse
Clojure and The Robot Apocalypseelliando dias
 

More from elliando dias (20)

Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 
From Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn IntroductionFrom Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn Introduction
 
FleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in ClojureFleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in Clojure
 
Clojure and The Robot Apocalypse
Clojure and The Robot ApocalypseClojure and The Robot Apocalypse
Clojure and The Robot Apocalypse
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Scaling PostgreSQL on SMP Architectures

  • 1. Scaling PostgreSQL on SMP Architectures
      Doug Tolbert, David Strong, Johney Tsai
      {doug.tolbert, david.strong, johney.tsai}@unisys.com
      PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006
  • 2. Unisys Cellular Multi-Processor Architecture
      • Up to 32 CPU sockets
          – In 4- or 8-CPU pods
          – Dual-core supported
              • Multi-core eventually
          – Dynamically partitionable
      • 32-bit & 64-bit technologies
      • Multilayered caches
      • Large cross-bar memories
          – IA32: 64 GB (with PAE)
          – EM64T, Opteron, Athlon: 256 TB
          – Itanium: 1 PB
          – Theoretical: Multi-PB
              • Processor family dependent
      • Up to 96 PCI slots
          – Lower on newer models
      • Linux: SLES, RedHat
      • Windows: Data Center
      [Diagram: CPU pods, each with a TLC, attached to 8 GB MSU memory units, with DIBs connecting PCI slots]
  • 3. Unisys Open And Secure Integrated Stack
  • 4. PostgreSQL ASC Lab Engagement
      • February 2005, Unisys Mission Viejo Facility
          – Simon Riggs, Mark Wong (OSDL) participated
          – PostgreSQL 8.0.1 measured
          – Summary at http://developer.osdl.org/markw/papers/PostgreSQL%20Visit%20Report%20External%20050513.pdf
      • Interesting findings
          – Locking protocol and granularity
          – Context switch “storms” (~80K/sec)
  • 5. PostgreSQL 8.1 Enhancements
      • A few stand out from a performance perspective
          – Improve concurrent access to the shared buffer cache (Tom)
          – Allow index scans to use an intermediate in-memory bitmap (Tom)
          – Improve performance for partitioned tables (Simon)
      • Context switches reduced
          – From ~80K/sec to ~30K/sec
      • Other effects only seen in overall throughput improvements
      • Undertook more complete characterization of 8.1 performance improvements
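The slides do not say how the context-switch rates were measured. As one possible way to reproduce such a figure on Linux, the sketch below assumes the cumulative `ctxt` counter in `/proc/stat` and divides the difference between two snapshots by the sampling interval. The function names `read_ctxt`, `ctxt_rate`, and `sample` are illustrative, not from the presentation.

```python
import time


def read_ctxt(stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no 'ctxt' line found")


def ctxt_rate(before: str, after: str, seconds: float) -> float:
    """Context switches per second between two /proc/stat snapshots."""
    return (read_ctxt(after) - read_ctxt(before)) / seconds


def sample(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Take two snapshots 'interval' seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return ctxt_rate(before, after, interval)
```

Calling `sample()` while a benchmark is running gives a system-wide rate, so it includes switches from every process on the machine, not only PostgreSQL backends.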
  • 6. TPC-C Benchmark
      • Transaction Processing Performance Council (TPC)
          – Defines suite of benchmarks
          – Sole certifier of vendor-submitted benchmark results
          – Specs, results, more info: http://www.tpc.org
      • TPC-C benchmark is a complete order-entry environment
          – Industry standard measure of DBMS OLTP performance across vendors
          – Reports transactions per minute (tpmC) and cost metrics
          – Mix of 5 transactions
              • New Order (selects, updates, inserts)
              • Payment (selects, updates, inserts)
              • Order Status (selects)
              • Stock Level (selects)
              • Delivery (selects, updates, deletes), interactive & deferred variants
      • Data volumes, number of clients scaled by vendor’s tpmC goal
          – See spec for details
      • Current (June 2006) result ranges
          – Best performance: 3,210,540 tpmC @ $5.07/tpmC
          – Lowest performance: 9,347 tpmC
          – Best price/performance: 38,622 tpmC @ $0.99/tpmC
  • 7. Test Application
      [Callout: Ah, these are NOT TPC-C numbers!]
      • DBMS stressor application with well-known characteristics
      • TPCC4J = “TPC-C For Java”
          – Unisys-developed from TPC-C version 5.3 (April 2004)
          – Runs on 1.4.2 JVM (5.0 JVM works)
          – Small application footprint (~72K)
          – Very few garbage collection cycles
          – Client overhead low (7-9% for ~1000 users)
      • Deviates from TPC-C in important ways
          – Non-audited components and results
          – Data entry screens not populated, only SQL statements run
          – User think time eliminated
          – Reports TPS, not tpmC
              • No, you can’t multiply by 60 to get tpmC!
          – No checkpoints during benchmark runs
          – Not run for a minimum of 8 hours
          – Data volumes not scaled
  • 8. Test Configuration
      • Client Driver: Unisys ES3040
          – 4 x 3.0 GHz Intel Xeon MP CPUs (IA32)
          – Hyper-threading enabled
          – 4 GB RAM
          – 1 x 36 GB boot drive (10K rpm)
          – 1 Gbit NIC (copper)
          – OS: SLES 9 SP3
      • PostgreSQL: Unisys ES7000/540
          – 32 x 3.0 GHz Intel Xeon MP CPUs (IA32)
          – Hyper-threading disabled
          – 16 GB RAM
          – 1 x 36 GB boot drive (10K rpm)
          – 1 Gbit NIC (copper)
          – OS: SLES 9 SP3
      • Storage: EMC CX-200 External Disk
          – 8 x 36 GB drives (15K rpm)
          – RAID 10 configured (striped mirrors)
          – Small data volume: 50 TPC-C warehouses, ~5.4 GB (4.7 GB data + 700 MB indexes)
          – No I/O issues in benchmarking
  • 9. Scaling – Results
      [Chart: transactions per second (0 to 200) vs. number of CPUs (1 to 32), one series each for 8.0.7 and 8.1.2]
      TPS by number of CPUs:
      CPUs    1       2       4       8       12      16      20      24      28      32      Users
      8.0.7   29.13   38.12   39.99   36.14   28.28   27.32   26.35   26.02   23.72   20.82   2 x CPUs
      8.1.2   31.42   55.61   86.4    103.78  117.83  153.42  144.39  129.71  118.7   111.33  6 x CPUs
  • 10. Scaling – Three Issues
      [Chart: the slide 9 TPS chart and data table repeated, annotated with three issues]
      • Performance vs. Scalability
      • Hardware Architecture
      • Software Architecture
  • 11. Performance vs. Scalability
      • Performance and Scalability are not the same thing
          – But are closely related
          – Must be REALLY careful comparing anecdotal reports
      • Performance comparable on equal configurations
          – In this data set, the only equal configuration is 1 CPU
          – 8.1.2 exhibits a small (~8%) improvement over 8.0.7
      • Scalability measures throughput as resources are added
          – 8.1.2 uses incremental resources more effectively than 8.0.7
          – At 16 CPUs, 8.1.2 is 5.6 times better than 8.0.7
      • Fair to conclude that 8.1.2 scales effectively to 16 CPUs
          – At least on this application and platform combination
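The ~8% and 5.6x figures on slide 11 can be checked against the TPS table on slide 9. The sketch below recomputes them and also derives speedup relative to 1 CPU and per-CPU efficiency. Those two are standard scalability measures but are an addition here, not something the slides report, and the helper names are illustrative.

```python
# TPS values copied from the slide 9 table.
CPUS = [1, 2, 4, 8, 12, 16, 20, 24, 28, 32]
TPS = {
    "8.0.7": [29.13, 38.12, 39.99, 36.14, 28.28, 27.32, 26.35, 26.02, 23.72, 20.82],
    "8.1.2": [31.42, 55.61, 86.4, 103.78, 117.83, 153.42, 144.39, 129.71, 118.7, 111.33],
}


def tps(version: str, cpus: int) -> float:
    """Measured TPS for a PostgreSQL version at a given CPU count."""
    return TPS[version][CPUS.index(cpus)]


def ratio(cpus: int) -> float:
    """8.1.2 throughput relative to 8.0.7 on the same CPU count."""
    return tps("8.1.2", cpus) / tps("8.0.7", cpus)


def speedup(version: str, cpus: int) -> float:
    """Throughput relative to the same version on 1 CPU."""
    return tps(version, cpus) / tps(version, 1)


def efficiency(version: str, cpus: int) -> float:
    """Speedup per CPU; 1.0 would be perfect linear scaling."""
    return speedup(version, cpus) / cpus


def peak(version: str) -> tuple:
    """(CPU count, TPS) at which a version's throughput is highest."""
    best = max(TPS[version])
    return CPUS[TPS[version].index(best)], best


if __name__ == "__main__":
    print(f"1 CPU:   8.1.2 / 8.0.7 = {ratio(1):.2f}")
    print(f"16 CPUs: 8.1.2 / 8.0.7 = {ratio(16):.1f}")
    for version in TPS:
        print(version, "peaks at", peak(version))
```

On this data 8.0.7 peaks at 4 CPUs and 8.1.2 at 16 CPUs, which matches the slide's conclusion that 8.1.2 scales to 16 CPUs on this application and platform.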
  • 12. Hardware Architecture
      • Cache invalidations
          – Handled quickly when on-cell
              • No visible degradation up to 8 CPUs
          – Slower across cells
              • Cross-cell interactions required, but slower
              • Responsible for dip at 12 CPUs
                  – Processing power unbalanced across cells (8 + 4)
              • Recovers by 16 CPUs
                  – Processing power balanced across cells (8 + 8)
                  – Reduces chance of cache misses
      • Cross-cell cache invalidations increase quickly when data structures are poorly aligned with cache line size
      • Every large-scale computing platform, regardless of vendor, will have some sort of hardware related behavior that needs to be considered in performance and scalability work
      [Diagram: ES7000/540 cell with RAM, memory controller, shared cache, and 8 CPUs]
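To illustrate the alignment point on slide 12: a structure that straddles a cache-line boundary occupies more lines than its size requires, so a write to it can invalidate more lines in other CPUs' caches. The sketch below is plain arithmetic under an assumed 64-byte line size. The slides do not state the ES7000's line size, and the helper names are hypothetical.

```python
CACHE_LINE = 64  # bytes; an assumed line size, not taken from the slides


def lines_touched(offset: int, size: int, line: int = CACHE_LINE) -> int:
    """Number of cache lines covered by 'size' bytes starting at 'offset'."""
    first = offset // line
    last = (offset + size - 1) // line
    return last - first + 1


def align_up(offset: int, line: int = CACHE_LINE) -> int:
    """Round 'offset' up to the next multiple of the cache-line size."""
    return (offset + line - 1) // line * line


if __name__ == "__main__":
    # A 32-byte structure placed at byte offset 48 spans two lines.
    print(lines_touched(48, 32))
    # Moved to the next line boundary, the same structure fits in one line.
    print(lines_touched(align_up(48), 32))
```

Padding to a line boundary trades memory for fewer shared lines, which is the kind of change the "improve data structure alignment" item on the To Do List slide points at.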
  • 13. Software Architecture
      • Degradation above 16 CPUs
          – Continued decline implies systemic origin
          – Characteristic of software algorithms or data structures reaching inherent scalability limits
      • Causes unclear, but something important is wrong
      • Possibilities include
          – The Usual Suspects: lock mgmt, buffer mgmt, etc.
          – Lack of Repeatable Read isolation level
              • A TPC-C requirement
              • Elevation to Serializable hurts overall throughput
          – See To Do List (a later slide)
  • 14. It’s not just TPCC4J!
      • Another industry standard, transactional Java benchmark
          – ES7000 8x
          – 8.0: 34 TPS
          – 8.1: 544 TPS
      • Different benchmark structure
      • Different hardware platform
          – No cache issues, CPUs on one cell
      • Even more dramatic improvement (16x!)
  • 15. Other Useful Tooling
      • gprof (standard Linux utility)
      [Callout: A Billion calls!]
      Functions shown in the profile:
          – FunctionCall2
          – pfree
          – int4eq
          – DLMoveToFront
          – BufferGetBlockNumber
          – enlargeStringInfo
          – _bt_checkkeys
          – appendBinaryStringInfo
          – _bt_step
          – _bt_compare
          – AllocSetAlloc
          – LWLockRelease
          – int4le
          – LWLockAcquire
          – int4ge
          – check_stack_depth
          – MemoryContextAlloc
          – MemoryContextAllocZeroAligned
          – btint4cmp
          – hash_search
          – _bt_next
          – AllocSetFree
  • 16. Other Useful Tooling
      • VTune
          – Intel low-level performance monitor
              • http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
          – High frequency functions, by clock ticks
              • HeapTupleSatisfiesSnapshotData
              • LWLockAcquire
              • XLogInsert
              • PinBuffer
              • hash_search
      [Callout: HeapTupleSatisfiesSnapshotData??]
      • oprofile
          – Results pending
  • 17. To Do List
      • In-line some high frequency functions
      • Speed up CRC algorithm by rearranging loop structure
      • Improve data structure alignment to memory pages and cache lines
          – Possibly dynamically based on processor id and settings
      • Re-type byte structures to words for newer Intel processors
      • Investigate efficiency of bitmaps on newer Intel processors
      • Residual spin-lock implementation issues
      • Reduce query plan discards (very adventurous!)
      • Improve statistics infrastructure and real-time monitoring
  • 18. Conclusions
      • Scalability dramatically improved in 8.1 release
          – Scales to 16 CPUs (on test environment)
          – Context switches down
          – Supported by multiple benchmarks
      • Minor single CPU performance improvement
      • Hardware architecture can affect scalability
          – Important consideration for high-capacity, high-throughput applications
          – Systems of all sizes moving to multi-core processor chips
      • Substantial scalability and performance challenges remain
          – TPC-C required feature content missing (repeatable read)
          – Algorithm scalability limits exist (especially > 16 CPUs)