An In-Depth Look at SAP SQL Anywhere Performance Features

This presentation examines the internal mechanism behind SQL Anywhere’s self-management and self-tuning functionality. Topics covered will illustrate how the various performance features, such as server cache management, self-healing statistics and dynamic multiprogramming level, work in concert to provide a robust data management solution in zero-administration environments.

  1. (c) 2015 Independent SAP Technical User Group Annual Conference, 2015
      SQL Anywhere Performance Features
      Jason Hinsperger, Product Manager, SAP
  2. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • Query processing in SQL Anywhere
      • SQL Anywhere performance
      • Sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
  3. Design Goals of SQL Anywhere
      • Ease of administration
      • Comprehensive yet comprehensible tools
      • Good out-of-the-box performance
      • “Embeddability” features => self-tuning
      • Many environments have no DBAs
      • Cross-platform support
      • Interoperability
      A Holistic Approach to Autonomic Database Management
  4. Autonomic Database Management
      • Self-managing/self-configuring
      • Self-tuning/self-adapting
      • Self-healing: monitoring and correcting/advising on problems
      • Self-protecting
      • Ease of administration; goal: zero (manual) administration
      • Design, automation and management tools: index consultant, application profiler, …
  5. Why Is Self-Management Important?
      • In a word: complexity
      • Application development is becoming more complex: new development paradigms such as ORM toolkits, distributed computation with synchronization amongst database replicas, and so on
      • Databases are now ubiquitous in IT because they solve a variety of difficult problems
      • Yet most companies continue to own and manage a variety of different DBMS products, which increases administrative costs
      • Ubiquity brings scale, in several ways
      • To keep TCO constant, one must improve the productivity of each developer
  6. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • SQL Anywhere performance
      • Sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
  7. Query Processing in SQL Anywhere
      [Diagram: the query processing pipeline. Stage labels: SQL, Scan, Parse, Parse Tree, Semantic Transformations, Pre-Optimization, Join Enumeration, Post-Optimization, QOG Build, DFO Build, Prepare, Cursor, Open, Execute, Close.]
  8. SQL Anywhere Performance
      SQL Anywhere is designed to get good performance with very little tuning. Many auto-tuning and self-management capabilities are designed to adapt:
      • Self-managing buffer pool: size and contents
      • Dynamic tuning of the multiprogramming level
      • Automatic statistics gathering, monitoring and healing
      • Self-tuning query optimization
      • Query optimization bypass for simple statements
      • Intra-query parallelism
      • Adaptive query execution
      • I/O intelligence for certain operations
      • Cache warming on startup and to steady state
  9. SQL Anywhere Performance, BUT…
      • Auto-tuning and self-management capabilities are designed to adapt to: hardware (CPU, I/O and memory of the machine), the queries being requested, and application logic and concurrency attributes
      • SQL Anywhere will adapt to different deployment environments, BUT some adaptations may produce unacceptable performance, e.g., low-memory execution strategies
      • Many things can be done at development time to improve the performance of application and database interactions: capacity planning, performance analysis and improvements, scalability testing
  10. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • SQL Anywhere performance
      • Reading data – sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
  11. How Fast Is SQL Anywhere?
      A typical conversation with regard to performance:
      C: “So, how fast is SQL Anywhere?”
      S: “Well, it depends on a variety of factors: your database design, number of concurrent users, hardware, server cache contents, etc.”
      C: “I understand those things are important, but can’t you just tell me how many rows the server can fetch per second?”
      S: “Well, in this test, we can fetch between 80 rows and 30 million rows per second.”
      C: “What? That makes no sense. My application can’t get anywhere near 30 million rows per second. Something must be broken. Can you fix it?”
      S: “Well, it depends …”
  12. How Many Rows Can We Read per Second?
      Rows read per second on a Z820 server (256GB, 32 threads, SSD):
      Access method                 Thousands of rows/s
      Seq scan, cold                   8,161
      Seq scan, hot                   31,272
      Non-clustered index 1%, cold       125
      Non-clustered index 1%, hot      5,083
      Clustered index 10%, cold        1,111
      Clustered index 10%, hot         5,616
      One-row statement                 0.08
  13. How Many Rows Can We Read per Second? (cont.)
      Rows read per second on a T520 laptop (8GB, 4 threads, HDD):
      Access method                 Thousands of rows/s
      Seq scan, cold                     733
      Seq scan, hot                      704
      Non-clustered index 1%, cold         1
      Non-clustered index 1%, hot      3,933
      Clustered index 10%, cold          108
      Clustered index 10%, hot         3,163
      One-row statement                 5.10
  14. Cold Cache Performance on Two Hosts
      • I/O dominates cold-cache performance
      • On HDD, sequential access is much faster
      • Clustering of indexes is very important
      • Buffer size affects how many pages are re-read
      • SSD has much better performance: excellent throughput and random seeks
      • CPU speed/number is important when I/O is fast
      Chart values (units not stated on the slide):
      Host                                  Seq Scan  NonCluIX 0.1%  NonCluIX 1%  CluIX 1%  CluIX 10%
      T520: laptop, 8GB, 4-thread, HDD          81.9            0.1        492.0       3.5       55.4
      Z820: server, 256GB, 32-thread, SSD        7.4            0.0          4.8       0.4        5.4
  15. Warm Cache Performance on Two Hosts
      • When data is in cache, CPU is the major factor
      • Parallelism is available but has overheads
      • Clock speed is an important factor
      • Buffer pool contents have a huge impact
      Chart values (units not stated on the slide):
      Host                                  Seq Scan  NonCluIX 0.1%  NonCluIX 1%  CluIX 1%  CluIX 10%
      T520: laptop, 8GB, 4-thread, HDD          85.3            0.0          0.2       0.2        1.9
      Z820: server, 256GB, 32-thread, SSD        1.9            0.0          0.1       0.2        1.1
  16. Access Methods
      [Diagram: a full table scan compared with an index scan through the table's index.]
  17. Deciding on the Access Method
      Full table scan:
      • Reads all pages in a table => unnecessary I/O
      • Processes all rows => more CPU
      • But it benefits from sequential I/O
      Index scan:
      • Reads only required pages
      • Processes only required rows => less CPU
      • Suffers from random I/O
      • Needs to read index pages in addition to table pages
      • Might need to re-fetch the same table pages
      • When selectivity is large enough, it might need to read all of the table's pages
      [Chart: runtime vs. selectivity (%, 0 to 100) for an index scan and a full table scan; the curves cross at the selectivity break-even point.]
  18. Factors for Choosing Between Index and Table Scan
      • Selectivity: larger selectivity => table scan; small selectivity => index scan
      • Row size (the number of rows per page): a larger row size shifts the break-even point toward the right (the index scan performs better)
      • Cache contents: with more of the table in cache, more reads are satisfied from the cache
      • Available memory: more available memory shifts the break-even point toward the right (the index scan performs better)
      What about I/O parallelism?
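The break-even reasoning in slides 17 and 18 can be made concrete with a toy cost model. This is a sketch under stated assumptions, not SQL Anywhere's cost model: it assumes a cold cache, an unclustered index where every qualifying row costs one random page read, and hypothetical per-page costs `seq_page_cost` and `random_page_cost`. Setting the two costs equal gives a break-even selectivity of `io_parallelism * seq_page_cost / (rows_per_page * random_page_cost)`, which moves right when there are fewer rows per page (larger rows) or more overlapped I/O, the same directions the slides describe.

```python
def break_even_selectivity(rows_per_page: float,
                           seq_page_cost: float,
                           random_page_cost: float,
                           io_parallelism: float = 1.0) -> float:
    """Selectivity (0..1) at which a toy index scan costs as much as a table scan.

    Hypothetical model (cold cache, unclustered index):
      table scan: (rows / rows_per_page) * seq_page_cost
      index scan: selectivity * rows * random_page_cost / io_parallelism
    The row count cancels out when the two costs are set equal.
    """
    s = (io_parallelism * seq_page_cost) / (rows_per_page * random_page_cost)
    # 1.0 means the index scan wins at every selectivity in this model.
    return min(s, 1.0)


def cheaper_access_method(selectivity: float, rows_per_page: float,
                          seq_page_cost: float, random_page_cost: float,
                          io_parallelism: float = 1.0) -> str:
    threshold = break_even_selectivity(rows_per_page, seq_page_cost,
                                       random_page_cost, io_parallelism)
    return "index scan" if selectivity < threshold else "table scan"


if __name__ == "__main__":
    # Hypothetical HDD-like costs: a random read costs 50x a sequential read.
    for rpp in (1, 33, 500):
        for par in (1, 32):
            s = break_even_selectivity(rpp, 1.0, 50.0, par)
            print(f"rows/page={rpp:>3} parallel I/O={par:>2}: break-even {s:.4%}")
```

A real optimizer also caps index-scan page reads at the table size and credits pages already in the buffer pool; the toy omits both, which is why calibration against the actual hardware matters.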
  19. Parallel Index Scan in SAP SQL Anywhere
      [Diagram: an index whose leaf nodes are scanned in parallel.]
  20. HDD – Parallel Index Scan Moves Break-Even Point
      [Chart: time (seconds) vs. selectivity (0% to 0.19%) on HDD for series IS, PIS32, FTS and PFTS32 (index scan and full table scan, non-parallel and parallel), marking the non-parallel and parallel break-even points.]
  21. SSD – Break-Even Point Moves Further to the Right
      [Chart: time (seconds) vs. selectivity (0% to 5.8%) on SSD for series IS, PIS32, FTS and PFTS32, marking the non-parallel and parallel break-even points.]
  22. Shift in Break-Even Point
      Break-even selectivity by rows per page (RPP); NP = non-parallel, P = parallel:
      RPP    NP-HDD    P-HDD    NP-SSD    P-SSD
      1      0.55%     1.4%     8%        48%
      33     0.02%     0.05%    0.4%      2.1%
      500    0.0045%   0.005%   0.15%     0.5%
      • On SSD, the magnitude of the shift in the selectivity break-even point is significantly higher
      • So the query optimizer needs to be aware of the impact of parallel I/O
      • Otherwise, we end up with non-optimal execution plans up to ~20 times worse than optimal
      Takeaway: consider ALTER DATABASE CALIBRATE SERVER
      • If you know the database will run on one configuration, consider calibrating
      • If you have little control over the disk configuration, the default calibration does the best it can
  23. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • SQL Anywhere performance
      • Reading data – sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
  24. Elements of a Database System
      [Diagram: a client process sends statements and parameters over the network to the server process, which returns status and results; the server holds the buffer pool and the database.]
  25. Anatomy of a Single Statement
      [Diagram: client and server exchange Prepare, Describe, Open/Execute and Close requests over the network; the client forms the SQL from input, reads the results and formats the output.]
  26. Concurrent Requests
      [Diagram: clients C1, C2, …, CN sending requests to a single server.]
  27. SQL Anywhere Scheduler
  28. Task Scheduling
      • The database server uses a worker-per-request architecture: a worker pool and a request queue
      • Each worker picks up and completes one request at a time; there is no guarantee that the same worker will service the same connection
      • A small pool of workers executes requests
      • The scheduler dynamically assigns work across workers (cooperative multitasking)
      • Unassigned requests wait for an available thread
      • The server supports dynamic intra-query parallelism; the degree of parallelism varies based on available resources
      • Pool size establishes the multiprogramming level (SA default: 20)
      • Priorities can be set on connections; a priority adjusts the number of time slices that any given request will get
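The worker-pool-plus-request-queue arrangement on slide 28 can be illustrated with standard Python threads. This is a generic sketch, not SQL Anywhere's scheduler: it does not model cooperative multitasking, time slices or priorities, and the function `run_worker_pool` and the (connection id, payload) request shape are hypothetical. It shows two properties from the slide: the pool size bounds how many requests are in flight, and a connection's requests are not tied to one worker.

```python
import queue
import threading
from collections import defaultdict


def run_worker_pool(requests, pool_size):
    """Serve (connection_id, payload) requests with a fixed pool of worker threads.

    Returns {connection_id: set of worker names that served it}.
    """
    pending = queue.Queue()
    for request in requests:
        pending.put(request)

    served_by = defaultdict(set)
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                # Each worker picks up one request at a time from the shared queue.
                connection_id, payload = pending.get_nowait()
            except queue.Empty:
                return  # Queue drained: this worker is done.
            sum(range(payload))  # Stand-in for executing the request.
            with lock:
                served_by[connection_id].add(name)

    workers = [threading.Thread(target=worker, args=(f"worker-{i}",))
               for i in range(pool_size)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return dict(served_by)


if __name__ == "__main__":
    # Three connections, ten requests each, served by a pool of four workers.
    reqs = [(conn, 10_000) for conn in range(3) for _ in range(10)]
    for conn, names in sorted(run_worker_pool(reqs, pool_size=4).items()):
        print(conn, sorted(names))
```

Which workers end up serving each connection varies from run to run, which is exactly the "no affinity" point on the slide.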
  29. Worker-per-Request Architecture
      How do we choose the size of the worker pool?
      • A large worker pool:
          • Increases the concurrency level of the server
          • Increases contention on server resources
          • Increases the working set size of the server
      • A smaller worker pool:
          • Underutilizes hardware resources
          • Limits the concurrency level of the workload
          • Risks a server hang when no workers are available to handle outstanding requests
  30. Dynamic Worker Pool Management
      • Dynamically adjust the size of the worker pool based on workload throughput monitoring and the number of pending requests
      • Benefits of dynamic MPL:
          • One less parameter for DBAs to worry about
          • Improved server throughput for different workloads
          • Better handling of changes in the workload's transaction mix
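Slide 30 says the pool size is adjusted from throughput monitoring and the number of pending requests, but does not describe the control policy. The sketch below uses a simple hill-climbing rule as a stand-in (an assumption, not SQL Anywhere's algorithm): keep moving the pool size in the same direction while throughput improves, reverse when it drops, and shrink when nothing is waiting. The bounds and step size are hypothetical; only the starting size of 20 comes from slide 28.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MplController:
    """Toy hill-climbing controller for worker pool size (hypothetical policy)."""
    size: int = 20        # Starting size; 20 is the default MPL quoted on slide 28.
    min_size: int = 4     # Hypothetical lower bound.
    max_size: int = 80    # Hypothetical upper bound.
    step: int = 2         # Workers added or removed per adjustment.
    direction: int = 1    # +1 = growing, -1 = shrinking.
    last_throughput: Optional[float] = None

    def adjust(self, throughput: float, pending_requests: int) -> int:
        """Feed one monitoring interval's measurements; return the new pool size."""
        if self.last_throughput is not None and throughput < self.last_throughput:
            # The previous move lowered throughput, so reverse course.
            self.direction = -self.direction
        if pending_requests == 0:
            # Nothing is waiting for a worker; extra workers would sit idle.
            self.direction = -1
        self.size = max(self.min_size,
                        min(self.max_size, self.size + self.direction * self.step))
        self.last_throughput = throughput
        return self.size
```

A production governor would also smooth noisy throughput samples before reacting; this sketch reacts to every interval.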
  31. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • SQL Anywhere performance
      • Reading data – sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
  32. Dynamic Memory Management
      • A SQL Anywhere server grows and shrinks the buffer pool as necessary to accommodate both the database server load and the physical memory requirements of other applications
      • Enabled by default on all supported platforms
      • The user can set lower and upper bounds and the initial size
  33. Dynamic Memory Management – Adjust Buffer Pool Size
      • Basic idea: match the buffer pool size to SA's working set as determined by the operating system, plus the OS free pool
      • Feedback control loop
      [Diagram: feedback loop between the buffer pool governor, the buffer pool manager and the operating system. Signals: buffer miss rate, OS working set size, amount of free physical memory, database file sizes, adjusted memory target, grow/shrink amount, new buffer pool size.]
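One iteration of the feedback loop on slide 33 might look like the function below. The inputs are the signals named on the slide; how they are combined is an assumption made for illustration: the target is the OS working set plus free physical memory, capped at the database file size, held steady when the miss rate is already negligible, approached gradually, and clamped to the user-set bounds. The 1% miss-rate cutoff and the 64 MB step are hypothetical.

```python
def next_buffer_pool_size(current_mb: float, miss_rate: float,
                          os_working_set_mb: float, free_physical_mb: float,
                          db_files_mb: float, lower_mb: float, upper_mb: float,
                          max_step_mb: float = 64.0) -> float:
    """One iteration of a toy buffer-pool feedback loop (hypothetical policy)."""
    # Memory target: what the OS currently lets the server keep resident,
    # plus the memory nobody else is using.
    target = os_working_set_mb + free_physical_mb
    # Caching more than the database files could fill is pointless
    # (this toy ignores heap pages, which also live in the pool).
    target = min(target, db_files_mb)
    if target > current_mb and miss_rate < 0.01:
        # Almost every page request already hits: do not grab more memory.
        target = current_mb
    # Move gradually toward the target, then respect the user-set bounds.
    delta = max(-max_step_mb, min(max_step_mb, target - current_mb))
    return max(lower_mb, min(upper_mb, current_mb + delta))
```

Limiting each step keeps the loop stable when the OS working-set figure fluctuates between samples.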
  34. SQL Anywhere Memory Management
      • Single heterogeneous buffer pool with few predefined limits
      • The buffer pool comprises:
          • Table and index data pages
          • Checkpoint log pages
          • Bitmap pages
          • Heap pages (data structures for query execution plans, optimization graphs, connection structures, stored procedures, triggers)
          • Free (available) pages
      • All page frames are the same size
      • Fully contained memory manager: self-managed memory footprint
  35. Cache Warming
      • Startup cache warming:
          • Record the pages referenced during the “startup period”
          • Read these pages in on future startups
          • Meant to quickly load data needed for the first few requests
      • Steady-state cache warming (should be included in V17):
          • Record an approximation of the steady state of the cache
          • After startup warming is done, load the pages expected to be needed in the background
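Startup cache warming as described on slide 35 is a record-and-replay scheme. The sketch below is a hypothetical illustration: `StartupPageRecorder` and `warm_cache` are invented names, the startup window is counted in page references rather than time, and the buffer pool is a plain dictionary of page id to page bytes. It records each distinct page once, in first-reference order, and preloads only pages not already cached.

```python
from typing import Callable, Dict, List, Set


class StartupPageRecorder:
    """Records distinct pages referenced during a startup window (hypothetical)."""

    def __init__(self, startup_window: int):
        # Window measured in page references here; the slide speaks of a
        # "startup period", which a real server would more likely measure in time.
        self.startup_window = startup_window
        self.references_seen = 0
        self.pages: List[int] = []
        self._seen_pages: Set[int] = set()

    def on_page_reference(self, page_id: int) -> None:
        if self.references_seen >= self.startup_window:
            return  # Startup period over: stop recording.
        self.references_seen += 1
        if page_id not in self._seen_pages:
            self._seen_pages.add(page_id)
            self.pages.append(page_id)  # Keep first-reference order.


def warm_cache(buffer_pool: Dict[int, bytes], saved_pages: List[int],
               read_page: Callable[[int], bytes]) -> int:
    """Preload saved pages on the next startup; return how many were read."""
    loaded = 0
    for page_id in saved_pages:
        if page_id not in buffer_pool:
            buffer_pool[page_id] = read_page(page_id)
            loaded += 1
    return loaded
```

Replaying pages in first-reference order means the pages the first requests need are likely to arrive first.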
  36. Cache Contents Estimation
      • Every table and index maintains a count of pages currently in cache; it is incremented/decremented when pages are read/evicted
      • The cost model estimates how many disk reads are needed:
          • Estimates the number of distinct pages referenced by the plan
          • Estimates how many are likely already in the buffer pool
          • Estimates how many of those read multiple times will remain in the buffer pool
      Takeaway: consider buffer pool contents when evaluating performance
      • Consider flushing or warming the cache before experiments to stabilize its state
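The per-table page counter and the "how many are already in the buffer pool" estimate on slide 36 can be sketched as follows. The class and function names are hypothetical, and the estimate assumes cached pages are spread uniformly over the table, so each referenced page is a hit with probability cached/total. The third estimate on the slide (re-reads of pages evicted mid-query) is not modeled.

```python
class CachedPageCounter:
    """Per-table (or per-index) count of pages resident in the buffer pool."""

    def __init__(self, total_pages: int):
        self.total_pages = total_pages
        self.cached_pages = 0

    def on_page_read(self) -> None:
        self.cached_pages += 1   # Page brought into the buffer pool.

    def on_page_evicted(self) -> None:
        self.cached_pages -= 1   # Page dropped from the buffer pool.


def estimate_disk_reads(counter: CachedPageCounter, distinct_pages: int) -> float:
    """Expected disk reads for a plan touching `distinct_pages` distinct pages.

    Assumes (hypothetically) that cached pages are spread uniformly over the
    table. Ignores re-reads of pages evicted while the query runs.
    """
    if counter.total_pages == 0:
        return float(distinct_pages)
    hit_fraction = min(1.0, counter.cached_pages / counter.total_pages)
    return distinct_pages * (1.0 - hit_fraction)
```

Because the estimate depends on cache state, the same query can legitimately get a different plan after the cache is flushed, which is the reason for the takeaway about stabilizing the cache before experiments.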
  37. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • SQL Anywhere performance
      • Reading data – sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
  38. SQL Anywhere Query Optimizer
      • SA optimizes requests each time they are executed, taking the server context into account
      • The optimization process includes both heuristic and cost-based rewrites
      • No hard limits: tested with 500 quantifiers in a single block
      • Advantages:
          • Plans are responsive to the server environment, buffer pool contents/size and data skew
          • No need to administer ‘packages’ (pre-optimized SQL)
      • Optimization effort adapts to the expected query cost and the benefit of optimization:
          • Simple statements bypass the optimizer
          • Cheap but complex statements use the plan cache
          • The optimizer considers multiple join enumeration approaches depending on the expected benefit
  39. Bypassing the Query Optimizer
      Single-table queries without “complications” bypass the optimizer:
      • If they have a specific form (select * from T where pk = value), a single “bypass cache” plan is used
      • If there is only one reasonable plan (the WHERE clause specifies a unique row), “bypass heuristic” is used
      • Otherwise, “bypass costed” compares alternative indexes and a sequential scan
      • A subset are “bypass costed simple”, where we can skip trying predicate optimizations or semantic transforms
      • If the bypass optimizer finds a plan costing more than 5 seconds, it re-optimizes with the full optimizer
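The decision order on slide 39 can be written as a small classifier. This is a hypothetical sketch: `StatementShape` and its boolean flags stand in for whatever the parser actually determines, and the 5-second threshold is applied only to the costed paths, since those are the ones that produce a cost estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatementShape:
    """Hypothetical summary of a statement, as a parser might produce it."""
    single_table: bool
    has_complications: bool        # e.g. joins, subqueries, views (hypothetical flag)
    star_select_by_pk: bool        # SELECT * FROM T WHERE pk = value
    identifies_unique_row: bool    # WHERE clause pins down one row
    needs_rewrites: bool           # predicate optimizations / semantic transforms


def choose_optimizer_path(stmt: StatementShape,
                          bypass_cost_seconds: Optional[float] = None) -> str:
    if not stmt.single_table or stmt.has_complications:
        return "full optimizer"
    if stmt.star_select_by_pk:
        return "bypass cache"
    if stmt.identifies_unique_row:
        return "bypass heuristic"
    # Costed bypass: compare alternative indexes against a sequential scan.
    if bypass_cost_seconds is not None and bypass_cost_seconds > 5.0:
        return "full optimizer"   # Expensive plan: re-optimize with the full optimizer.
    return "bypass costed" if stmt.needs_rewrites else "bypass costed simple"
```

Ordering the checks from cheapest to most expensive means the most common point lookups never pay for costing at all.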
  40. Plan Caching and Auto-Parameterization
      • Access plans for queries in stored procedures/triggers/events are cached and reused for future executions
      • Plans undergo a ‘training period’ during which plan variance is determined
      • If there is no variance (even without variable values), the plan is cached and reused
      • The query is periodically re-optimized on a logarithmic scale to ensure the plan does not become sub-optimal
      • Improvements in V16 and V17 avoid plan caching when it degrades performance
      Takeaway: do not set max_plans_cached=0
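The training period and logarithmic re-optimization on slide 40 can be modeled as below. This is a hypothetical sketch, not SQL Anywhere's plan cache: `optimize(values)` is an invented callable that returns a plan, `optimize(None)` stands for optimizing without the variables' values, the training length of 5 executions is made up, and "logarithmic scale" is interpreted as re-optimizing whenever the execution count is a power of two.

```python
from typing import Any, Callable, Optional


class CachedStatement:
    """Toy plan cache with a training period and log-scale re-optimization."""

    def __init__(self, optimize: Callable[[Optional[Any]], str], training_runs: int = 5):
        # optimize(values) returns a plan; optimize(None) builds a plan without
        # the variables' values. Both the callable and the run count are hypothetical.
        self.optimize = optimize
        self.training_runs = training_runs
        self.executions = 0
        self.variance_seen = False
        self.cached_plan: Optional[str] = None

    def plan_for(self, values: Any) -> str:
        self.executions += 1
        if self.cached_plan is None:
            plan = self.optimize(values)
            if self.executions <= self.training_runs:
                # Training: does knowing the values ever change the plan?
                if self.optimize(None) != plan:
                    self.variance_seen = True
                if self.executions == self.training_runs and not self.variance_seen:
                    self.cached_plan = plan
            return plan
        n = self.executions
        if n & (n - 1) == 0:
            # Re-optimize whenever the execution count is a power of two,
            # so re-optimizations become rarer as the statement keeps running.
            self.cached_plan = self.optimize(None)
        return self.cached_plan
```

After training, a stable statement pays for optimization only at executions 8, 16, 32, and so on, while a value-sensitive statement is never cached and keeps getting a fresh plan.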
  41. Adaptive Query Processing
      • Alternative access plans can be executed if actual intermediate result sizes are poorly estimated; the server switches to an alternative plan automatically at run time
      • Low-memory strategies are used when buffer pool utilization is high
      • The access plan is parallelized when doing so is advantageous:
          • The degree of parallelism is determined based on cost during the enumeration process
          • Work is partitioned independently of worker pool size
          • Plans are largely self-tuning with respect to the degree of parallelism
          • This prevents starvation of query fragments when the number of available workers is less than optimal for some period
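"Work is partitioned independently of worker pool size" (slide 41) means the number of partitions is fixed when the plan is built, and however many workers turn up at run time simply share those partitions. The sketch below illustrates that idea with hypothetical helpers: `partition_rows` splits a row range into near-equal pieces, and `rows_per_worker` simulates workers taking partitions in turn (a round-robin simplification of workers pulling from a shared queue at equal speed).

```python
from typing import List, Tuple


def partition_rows(total_rows: int, partitions: int) -> List[Tuple[int, int]]:
    """Split [0, total_rows) into `partitions` contiguous, near-equal ranges."""
    base, extra = divmod(total_rows, partitions)
    ranges, start = [], 0
    for i in range(partitions):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def rows_per_worker(ranges: List[Tuple[int, int]], available_workers: int) -> List[int]:
    """Simulate workers taking partitions in turn; return rows processed by each."""
    done = [0] * available_workers
    for i, (lo, hi) in enumerate(ranges):
        done[i % available_workers] += hi - lo
    return done
```

With 8 partitions, one available worker processes all of them sequentially and five workers share them unevenly, but in both cases every row is processed; no fragment waits for a specific number of workers to become free.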
  42. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • SQL Anywhere performance
      • Reading data – sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
  43. Automatic Statistics Management
      • Self-tuning column histograms on both base and temporary tables; statistics are updated on the fly automatically
      • Join histograms are built for intermediate result analysis during the optimization process; they are not persisted
      • The server maintains persistent index statistics in real time
      • Index sampling during optimization:
          • If there is no histogram or it reports “no confidence”
          • If there is an index covering two or more predicates (better than combining single-column estimates)
  44. Column Histograms
      • Updated in real time with the results of predicate evaluation and update DML statements
      • By default, statistics are computed during the execution of every DML request
      • Histograms are computed automatically on LOAD TABLE or CREATE INDEX statements
      • Can be created/dropped explicitly if necessary
      • Retained by default across unload/reload
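Slide 44 says histograms are updated with the results of predicate evaluation. One generic way to do that, in the spirit of self-tuning histograms from the research literature and not SQL Anywhere's actual implementation, is sketched below: when a range predicate's true row count is known, the estimation error is spread over the buckets the range overlaps, in proportion to each bucket's contribution to the estimate. `FeedbackHistogram` and its equi-width layout are assumptions made for illustration.

```python
class FeedbackHistogram:
    """Equi-width histogram refined from observed predicate results (generic sketch)."""

    def __init__(self, lo: float, hi: float, buckets: int, total_rows: float):
        self.lo = lo
        self.width = (hi - lo) / buckets
        # No information yet: assume rows are spread uniformly.
        self.counts = [total_rows / buckets] * buckets

    def _overlap(self, i: int, a: float, b: float) -> float:
        """Fraction of bucket i covered by the range [a, b)."""
        bucket_lo = self.lo + i * self.width
        bucket_hi = bucket_lo + self.width
        return max(0.0, min(b, bucket_hi) - max(a, bucket_lo)) / self.width

    def estimate(self, a: float, b: float) -> float:
        return sum(c * self._overlap(i, a, b) for i, c in enumerate(self.counts))

    def observe(self, a: float, b: float, actual_rows: float) -> None:
        """Feed back the true row count of a predicate evaluated on [a, b)."""
        estimated = self.estimate(a, b)
        if estimated == 0:
            return  # No basis for distributing the error in this simple scheme.
        error = actual_rows - estimated
        for i, c in enumerate(self.counts):
            share = c * self._overlap(i, a, b) / estimated
            # Each overlapping bucket absorbs error in proportion to its contribution.
            self.counts[i] = max(0.0, c + error * share)
```

Buckets outside the observed range are left untouched, so feedback from one query refines only the part of the distribution it actually saw.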
  45. Motivation for Self-Healing Statistics
      • The quality of self-tuned statistics can degrade arbitrarily:
          • They can get out of sync in the face of rollbacks
          • Statistics generation looks at data once, out of order; the goal of self-tuning is not to be perfect
          • They can get out of sync in the face of severe data skew
          • Self-tuning may not be able to “keep up” on busy servers
      • The system needs to monitor and correct itself
  46. Self-Healing Statistics
      • An internal system of background server processes with low overhead to the engine and query execution
      • Statistics governor:
          • Categorizes and records estimation errors during query processing
          • Self-monitors the “quality” of statistics as they are used
          • Self-heals “poor” statistics
          • Removes “bad” statistics
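The statistics governor on slide 46 records estimation errors and flags poor statistics. The sketch below shows one way to do that bookkeeping, under assumptions: it measures each estimate with the q-error (the larger of estimated/actual and actual/estimated, a common metric for cardinality estimates), and the cutoffs `max_q_error`, `min_uses` and `bad_fraction` are hypothetical values, not SQL Anywhere settings.

```python
from collections import defaultdict
from typing import Dict, List


def q_error(estimated: float, actual: float) -> float:
    """Symmetric ratio error; 1.0 means a perfect estimate."""
    est, act = max(estimated, 1.0), max(actual, 1.0)  # Avoid division by zero.
    return max(est / act, act / est)


class StatisticsGovernor:
    """Records estimation errors per column and flags poor statistics (toy policy)."""

    def __init__(self, max_q_error: float = 10.0, min_uses: int = 3,
                 bad_fraction: float = 0.5):
        self.max_q_error = max_q_error      # Hypothetical "bad estimate" cutoff.
        self.min_uses = min_uses            # Don't judge a column on one query.
        self.bad_fraction = bad_fraction    # Share of bad estimates that triggers healing.
        self.uses: Dict[str, int] = defaultdict(int)
        self.bad: Dict[str, int] = defaultdict(int)

    def record(self, column: str, estimated_rows: float, actual_rows: float) -> None:
        self.uses[column] += 1
        if q_error(estimated_rows, actual_rows) > self.max_q_error:
            self.bad[column] += 1

    def columns_to_heal(self) -> List[str]:
        return sorted(col for col, n in self.uses.items()
                      if n >= self.min_uses and self.bad[col] / n >= self.bad_fraction)
```

Requiring several uses before judging a column keeps a single unusual query from triggering an expensive statistics rebuild.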
  47. SQL Anywhere Solution
      • Statistics flusher:
          • Unloads unused statistics from memory
          • Advises on the health of column statistics
          • Advises on column statistics usage
          • Advises on whether to create or drop statistics
          • Runs every 30 minutes
      • Statistics cleaner:
          • Triggered by the flusher process to fix statistics that cannot be fixed otherwise
          • Keeps track of the table IDs where bad statistics are found
          • Runs with background priority
  48. Fixing Statistics
      Several methods are used to automatically improve the quality of statistics:
      • Piggyback off user queries:
          • Exploit access plans that see a large portion of the table
          • Perform in-line statistics collection during query execution
          • Replace or fix in situ
      • Recreate from indexes:
          • Fallback mechanism for piggybacking
          • Use a shallow index scan to recreate the histogram
      • Perform a sampled table scan:
          • If the table column does not have an index, we must scan the table to get the statistics
          • Read a random sample of a small number of table pages
      • Detect pathological situations and prevent self-healing, or even drop histograms
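The sampled-table-scan fallback on slide 48 reads a random sample of pages rather than the whole table. The sketch below, with a hypothetical function name and an equi-depth histogram chosen for illustration, samples whole pages, sorts the values found on them, and takes bucket boundaries at evenly spaced quantiles of the sample. Because it samples pages rather than individual rows, values stored together on a page are sampled together, which can bias results when data is clustered.

```python
import random
from typing import List, Sequence


def sampled_equi_depth_bounds(pages: Sequence[Sequence[int]], sample_pages: int,
                              buckets: int, seed: int = 0) -> List[int]:
    """Equi-depth bucket boundaries estimated from a random sample of pages."""
    rng = random.Random(seed)
    # Read a random subset of table pages instead of the whole table.
    chosen = rng.sample(range(len(pages)), min(sample_pages, len(pages)))
    values = sorted(v for i in chosen for v in pages[i])
    if not values:
        return []
    # Boundary k sits at the k/buckets quantile of the sampled values.
    return [values[min(len(values) - 1, k * len(values) // buckets)]
            for k in range(1, buckets)]
```

Sampling all pages degenerates to an exact equi-depth histogram, which is a convenient sanity check for the sampling code.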
  49. Agenda
      • Review SQL Anywhere design goals
      • Why self-management is important
      • SQL Anywhere performance
      • Reading data – sequential scans vs index scans
      • Multiprogramming level
      • Cache management
      • Adaptive query execution
      • Statistics management
      • Conclusion
  50. Conclusions
      • How fast is SQL Anywhere? “It depends” is the right answer!
      • The optimizer coordinates changing data from multiple sources in real time in order to provide and maintain the best performance it can at that point in time
      • But it is not perfect!
      Specific takeaways:
      • Consider ALTER DATABASE CALIBRATE SERVER
      • Consider buffer pool contents when evaluating performance
      • Do not set max_plans_cached=0
  51. Questions?
      Jason Hinsperger, jason.hinsperger@sap.com
