Forecasting database performance



An introduction to database performance forecasting and capacity analysis for newcomers.

Published in: Technology, Education

  1. 1. Forecasting Database Performance<br /> Du Shenglin <br />June 6th 2011<br />
  2. 2. What Does Capacity Analysis Do?<br />Managers ask:<br /> “Can our database survive next year, or the new promotion program?”<br />Performance tuning != capacity planning<br />A DBA is not quite the same as a capacity analyst<br />
  3. 3. What Does Capacity Analysis Do?<br />How much headroom do we still have for further growth, and how many days can we hold out without adding or upgrading hardware?<br />What is the cost or impact to the site of adding or changing application code?<br />What kind of platform/database/OS should we use for newly introduced applications?<br />How do we survive sudden performance deviations?<br />
  4. 4. Agenda<br />Resource<br />Model and Theory<br />Response Time Analysis<br />Steps to do Capacity Analysis<br />Case Study<br />
  5. 5. Resource <br />Site Level <br /> Machine, License, database, Storage, Manpower…<br />System Level <br /> CPU, IO, Memory, Disk, Network, Kernel Settings<br />Database Level <br /> Latch, Enqueue, Lock, Physical IO, Logical IO…<br />
  6. 6. Modeling - Making the Complex Simple <br />The world is much too complex for us to understand. <br />Mathematical model<br />Queuing theory<br />Linear modeling<br />Regression analysis<br />Utilization <br />Baseline<br />A model is never perfect and never 100% precise<br />
  7. 7. Linear Modeling<br />
  8. 8. Linear Regression – Scalability<br />
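The slide's chart is not preserved in this transcript. As a minimal sketch of the idea, assuming a roughly linear relationship between workload and CPU utilization, an ordinary least-squares line can be fitted and then extrapolated to a utilization limit. The sample numbers and the names `fit_line` and `max_load_at` are hypothetical and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples: executions per second vs. CPU utilization (%).
execs_per_sec = [100, 200, 300, 400, 500]
cpu_util_pct = [12.0, 22.0, 32.0, 42.0, 52.0]

slope, intercept = fit_line(execs_per_sec, cpu_util_pct)


def max_load_at(util_limit_pct):
    """Workload at which the fitted line reaches the given utilization."""
    return (util_limit_pct - intercept) / slope


print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"load at 75% CPU: {max_load_at(75):.0f} exec/s")
```

A straight line is only trustworthy inside the measured range; once queuing sets in, response time grows non-linearly.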
  9. 9. The Response Time Curve<br />Response Time=Service Time + Queue Time<br />
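The curve itself is not preserved here. A common way to sketch it is the single-server M/M/1 queue, where queue time is St * U / (1 - U). The slides do not say which queuing model was used, so the model choice and the function name `response_time` are assumptions.

```python
def response_time(service_time, utilization):
    """Rt = St + Qt for a single-server M/M/1 queue (an assumed model)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    queue_time = service_time * utilization / (1 - utilization)
    return service_time + queue_time


# Response time stays near the service time at low utilization
# and climbs steeply as utilization approaches 100%.
for u in (0.1, 0.5, 0.8, 0.9, 0.95):
    print(f"U={u:.2f}  Rt={response_time(10.0, u):.1f} ms")
```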
  10. 10. Queuing Theory<br />A request remains in the queue until it is its turn to be serviced <br />Commonly a FIFO or priority queue<br />Queue length<br />Wait times and wait events<br />CPU queue and IO queue<br />
  11. 11. Response Time Drill Down<br />Response Time=Service Time + Queue Time<br />Rt =St + Qt<br />Service time (St): CPU usr+sys, memory access, disk transfer, network transfer<br />Queue time (Qt): CPU queue, memory queue, disk queue, network queue<br />
  12. 12. Utilization and Headroom<br />Headroom is the usable resource that is still available<br />- Total capacity minus peak utilization and margin<br />- Applies to CPU, RAM, network, disk and OS<br />- Can be very complex to determine; it depends on the resource and the workload<br />
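The headroom definition above can be sketched as simple arithmetic. The function names, the sample figures and the assumption of steady linear growth are hypothetical; real growth is rarely this regular.

```python
def headroom(total_capacity, peak_utilization, margin):
    """Headroom = total capacity - peak utilization - safety margin."""
    return total_capacity - peak_utilization - margin


def days_until_exhausted(headroom_now, growth_per_day):
    """Days left if peak utilization grows linearly (an assumption)."""
    if growth_per_day <= 0:
        return float("inf")
    return headroom_now / growth_per_day


# Hypothetical figures, all in percent of total CPU capacity.
left = headroom(total_capacity=100, peak_utilization=60, margin=20)
print(f"headroom: {left}%")
print(f"days left at +0.25%/day: {days_until_exhausted(left, 0.25):.0f}")
```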
  13. 13. CPU Capacity Measurements<br />CPU utilization is defined as busy time divided by elapsed time for each CPU<br />CPU time = CPU Queue + CPU usr+sys<br />Processes wait on a run queue, causing high load averages, then run on a CPU in user and system mode. <br />More CPUs reduce queue wait. Faster CPUs reduce usr+sys time.<br />
  14. 14. CPU Capacity Measurements<br />U = λ*St/M, where λ is the arrival rate, St is the CPU service time per request and M is the number of CPUs<br />CPU Utilization = Arrival rate * cpu_time_exec(us) / POWER(10,6) / number_of_CPU<br />CPU Utilization = buffer_gets * buffer_gets_time_per_exec(us) / POWER(10,6) / number_of_CPU<br />The same formula can be used in many cases<br />
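The utilization formula can be checked with a few lines of code. This is a sketch with hypothetical inputs; the parameter names mirror the slide, and the arrival rate is assumed to be in executions per second.

```python
def cpu_utilization(arrival_rate_per_sec, cpu_time_per_exec_us, number_of_cpu):
    """U = arrival rate * service time / number of CPUs."""
    service_time_sec = cpu_time_per_exec_us / 10**6  # microseconds to seconds
    return arrival_rate_per_sec * service_time_sec / number_of_cpu


# Hypothetical workload: 2000 exec/s, 1500 us of CPU each, on 8 CPUs.
u = cpu_utilization(2000, 1500, 8)
print(f"CPU utilization: {u:.1%}")
```

Solving the same formula for the arrival rate gives the workload that a target utilization can sustain.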
  15. 15. The Response Time Curve - Multiple CPU<br />
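The multi-CPU curve is not preserved either. One standard way to sketch it is the M/M/m queue with the Erlang C formula, which shows why more CPUs delay the point where queuing begins. The model choice and the function names are assumptions, not something stated in the slides.

```python
from math import factorial


def erlang_c(servers, offered_load):
    """Probability that an arriving request has to queue (Erlang C)."""
    rho = offered_load / servers
    if not 0 <= rho < 1:
        raise ValueError("per-server utilization must be in [0, 1)")
    top = offered_load ** servers / factorial(servers) / (1 - rho)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom


def response_time_multi_cpu(service_time, utilization, servers):
    """Rt = St + Qt for an M/M/m queue at a given per-CPU utilization."""
    offered_load = utilization * servers
    p_wait = erlang_c(servers, offered_load)
    queue_time = p_wait * service_time / (servers * (1 - utilization))
    return service_time + queue_time


# Same 10 ms service time and 80% utilization, different CPU counts.
for m in (1, 2, 4, 8):
    print(f"{m} CPU(s): Rt={response_time_multi_cpu(10.0, 0.8, m):.2f} ms")
```

With one server this reduces to the single-queue M/M/1 result St / (1 - U).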
  16. 16. IO Response Time Profile<br />RAM: 60 ns<br />HDD: 5-10 ms<br />SSD: 100-500 µs<br />IO service time includes 3 components:<br /> - Access (seek) time: the time it takes to move the heads to the desired track<br /> - Rotation time: the time it takes to locate the desired sector on the track<br /> - Transfer time: the time it takes to read/write the data<br />Access time constitutes 70% of the service time<br />
  17. 17. IO Average Wait Time<br />db file sequential read: less than 15 ms<br />log file sync: less than 4 ms<br />
  18. 18. Trend Capacity Measurements<br />SQL executions increase -> traffic growth<br />Buffer gets per execution fluctuate -> normal buffer contention<br />Buffer gets per execution increase gradually -> SQL efficiency change<br />Only CPU increases -> system overhead increase, latch spinning, etc.<br />Find the common patterns across daily, weekly and yearly executions<br />
  21. 21. Daily Executions<br />
  22. 22. Week-over-Week (WoW) Executions<br />
  23. 23. Year-over-Year (YoY) Executions<br />
  24. 24. Data Collection<br />What kind of data to collect<br />When to collect the data<br />Where to put the data<br />How often to collect the data<br />How long to keep the data<br />How to interpret and present the data<br /> A picture is worth a thousand words <br /> Scripting and automation are necessary<br />
  25. 25. Capacity Monitoring at the Database Level<br />Peak executions <br />Sessions<br />Shared pool usage<br />LIO/exec<br />PIO/exec<br />CPU_time/LIO<br />Redo size<br />Free memory<br />Commits<br />Disk space usage<br />…<br />
  26. 26. Risk Mitigation Strategies<br />A capacity analyst is not only a DBA<br />Tuning/fixing the issue: the DBA's task or the SA's?<br />Balancing the existing workload<br />Upgrading and buying more CPU capacity <br />Splitting and sharding<br />
  27. 27. Steps to Take for Capacity Analysis<br />1. Determine the question<br />2. Gather workload data<br /> - What, how and how often<br />3. Characterize the workload data<br /> - Map and interpret the data <br />4. Develop and use an appropriate model<br /> - Present your data as graphs<br />5. Validate the forecast<br />6. Forecast<br />
  28. 28. Case Study #1 – Delete Performance<br />Questions:<br />How much data can we delete every day?<br />Can the delete job catch up during off-peak time?<br />How many threads can we use for the delete?<br />What is the main cost of the delete job?<br />
  29. 29. Case Study #1 – Delete Performance<br />Delete throughput is determined by the IO response time and the physical IOs per row: rows/sec = (1 / IO response time) / PIO_Per_row.<br />
SNAP_TIME         EXEC_PER_SEC  LIO Per Exec  PIO Per Exec  Rows Per Exec<br />
----------------  ------------  ------------  ------------  -------------<br />
2011/02/10 15:49           .04      10194.44       2323.02           1000<br />
2011/02/10 16:04           .05      10200.82          2322           1000<br />
2011/02/10 16:19           .06      10198.03        1967.9          999.9<br />
2011/02/10 16:34           .06      10201.81       1985.98           1000<br />
2011/02/10 16:49           .06      10194.11       2088.38          999.9<br />
Estimate with a 6 ms IO response time: (1 / 6 ms) / (2323 / 1000) = 1000000 / 6 / 2323 ≈ 71 rows/sec<br />
The real case:<br />deletion started at: 2011-02-10 15:38:45<br />rows to delete: 1232177<br />rows deleted: 1232171<br />deletion ended at: 2011-02-10 20:47:51<br />
1232171/((TO_DATE('2011-02-10 20:47:51','YYYY-MM-DD HH24:MI:SS')-TO_DATE('2011-02-10 15:38:45','YYYY-MM-DD HH24:MI:SS'))*24*60*60)<br />= 66.4386391 rows/sec<br />
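The estimate and the measured rate can be reproduced with a short script. This sketch assumes that the "6m" in the slide's arithmetic means a 6 ms average IO response time and that a single thread issues the IOs one after another; the function name `estimated_rows_per_sec` is hypothetical.

```python
from datetime import datetime


def estimated_rows_per_sec(io_response_time_sec, pio_per_row):
    """Rows per second when every physical IO is waited on in sequence."""
    return (1 / io_response_time_sec) / pio_per_row


# First snapshot: 2323.02 physical IOs per execution of 1000 rows.
estimate = estimated_rows_per_sec(0.006, 2323.02 / 1000)

# Measured run, using the start and end timestamps from the slide.
started = datetime(2011, 2, 10, 15, 38, 45)
ended = datetime(2011, 2, 10, 20, 47, 51)
elapsed_sec = (ended - started).total_seconds()
actual = 1232171 / elapsed_sec

print(f"estimated: {estimate:.1f} rows/sec")
print(f"actual:    {actual:.1f} rows/sec over {elapsed_sec:.0f} s")
```

The forecast lands within about 10% of the measured rate, which is the kind of validation step the capacity analysis process calls for.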
  30. 30. Case Study #2 – Using MySQL<br />What capacity analysis should we do to evaluate MySQL?<br />MySQL version<br />Machine <br />OS<br />Local disk/SSD<br />Kernel configuration<br />MySQL parameters<br />mysqlMySQL InnoDB setup best practice.doc<br />
  31. 31. Answers: What Capacity Analysis Needs to Do<br />Measure the capacity of the site correctly and accurately.<br />Predict the growth of the site and identify future performance problems.<br />Define what balance means and find a strategy to maintain a dynamic balance.<br />Analyze the impact of system-level changes.<br />Identify dangerous performance deviations.<br />