Performance
Instrumentation
beyond what
you do now

Cary Millsap
cary.millsap@method-r.com

Percona Performance Conference...
Introductions




                2
Cary Millsap

     carymillsap.blogspot.com

     cary_millsap



                                 3
1986

1989




1999




2008




       4
1986

              1989



 Software
 Developer
              1999
and

Performance
Analyst
              2008




      ...
5
Method R Corporation
http://method-r.com




                       6
What we do at Method R Corporation…


• Write code for you
• Troubleshoot performance problems
• Teach you how to do what ...
Thinking clearly
about
performance




                   8
Performance is HARD




                      9
“Our users say that
  everything is slow, but I
don’t know where to begin.”



                              10
“Our users are complaining,
but all our dials are green.”



                                11
A story.




           12
In the beginning...


   (1989: Oracle 6.0.26)
                           13
“Tuning” was…




                14
bstat.sql
    ...
 estat.sql
report.txt

             15
16
V$PARAMETER       sar
             V$DB_OBJECT_CACHE
      ps             iostat
                   V$OPEN_CURSOR
      V$...
People looked for “bad
      numbers.”




                         17
Inefficiencies.




                   18
But how can you know what
causes a specific task to be
           slow?



                              19
20
21
It's
latches




          21
It's
          I/O
  It's
latches




                 21
It's
          I/O
  It's              It's
latches          always I/O




                        ...
It's
          It's
                 bad SQL
          I/O
  It's              It's
latches          always I/
           ...
It's
          It's                  It's
                 bad SQL
                              always
          I/O
    ...
It's
        It's                  It's
               bad SQL
                            always
        I/O
            ...
It's
       It's                It's
              bad SQL
                         always
       I/O
                    ...
My problem…




              22
How can you possibly

know          that?



                       23
Reminded me of…




                  24
25
vailroger.googlepages.com/orionconstellation
You do see it...

    Right?


                   26
27
vailroger.googlepages.com/orionconstellation
27
vailroger.googlepages.com/orionconstellation
But who says
        that
is what you have to see?



                           28
29
29
Why not?




           30
Performance is hard.




                       31
A good pilot makes it look
easy.

                 —Van R. Millsap
                         1936–2004




                ...
Performance is EASY




                      33
How?




       34
It’s the

         user’s
       experience
              that matters.


                          35
36
A user’s performance experience
   consists of two elements…




                                  37
1.   a task
2.    time

              38
Task




       39
The things we used to “computerize”…
tasks.
http://olathe.lib.ks.us/images/Image/Computer%20User.jpg




                 ...
A task is a business unit of work.


• Post to the General Ledger
• Enter an order
• Look up a book by author




        ...
Tasks can nest.




                            Posting



                  PO   AP   AR        …   FA




              ...
Tasks can nest.


• Print Addresses is a task




                                        Posting



                     ...
Tasks can nest.


• Print Addresses is a task
• Print Address #42 is a
  (sub)task



                                    ...
Tasks can nest.


• Print Addresses is a task
• Print Address #42 is a
  (sub)task



                                    ...
Tasks can nest.


• Print Addresses is a task
• Print Address #42 is a
  (sub)task

• Often, a program is a task
         ...
Tasks can nest.


• Print Addresses is a task
• Print Address #42 is a
  (sub)task

• Often, a program is a task
• Often, ...
Tasks are it.




   Business people don’t care
   about the “system” except
 through execution of the tasks
  that make u...
Tasks are it.




      Tasks     are what
       system owners care
             about.

                            44
Time




       45
Performance is about time.




                               46
How fast: “Daddy, can your car go 500
miles?”
He meant “500 miles per hour.”
To talk about performance (speed), you have t...
Two ways to measure
  performance…




                      48
49
tasks per time




                 49
tasks per time
  (that’s throughput)




                        49
tasks per time
  (that’s throughput)




                        49
tasks per time
  (that’s throughput)




time per task



                        49
tasks per time
   (that’s throughput)




time per task
  (that’s response time)




                           49
Throughput and response time…




                                50
Throughput and response time…


• Throughput (X)
 – The tasks-per-time way
 – Number of task executions completed in a giv...
Throughput and response time…


• Throughput (X)
 – The tasks-per-time way
 – Number of task executions completed in a giv...
Throughput and response time…


• Throughput (X)
  – The tasks-per-time way
  – Number of task executions completed in a g...
51
X = 1/R




          51
X = 1/R




          51
X = 1/R

 (kind of)




             51
Average throughput is the inverse of average response
time.




                                                    52
Average throughput is the inverse of average response
time.




               X = 1,000 txn/sec?




                    ...
Average throughput is the inverse of average response
time.




               X = 1,000 txn/sec?

 Then R = (1 sec)/(1,00...
53
…Adding load to create
  higher throughput
changes     response time.



                             53
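The X = 1/R arithmetic from the preceding slides can be checked in a few lines. This is a minimal sketch, not something from the talk: the function names and the sample durations are made up for illustration, and the relationship only holds for averages of tasks run one after another (the "kind of" caveat).

```python
def avg_response_time(throughput_per_sec: float) -> float:
    """Average response time (sec/task) implied by an average throughput (tasks/sec)."""
    if throughput_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / throughput_per_sec


def avg_throughput(response_times_sec: list) -> float:
    """Average throughput (tasks/sec) for task executions run serially."""
    total = sum(response_times_sec)
    if total <= 0:
        raise ValueError("total elapsed time must be positive")
    return len(response_times_sec) / total


# X = 1,000 txn/sec implies R = 0.001 sec/txn.
print(avg_response_time(1000.0))

# Four serial executions with hypothetical measured durations (seconds).
print(avg_throughput([0.001, 0.002, 0.001, 0.004]))
```

Once load is added and executions overlap or queue, the measured R of each execution changes, so a throughput figure alone no longer tells you what any one user experienced.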
…Which leads to a whole ’nother conversation I’d love
         to have with you some other time.




                     ...
Sequence Diagram




                   55
A simple way to view response time is with
a UML sequence diagram.




                                 RA




http://www....
More complicated systems have nested levels of
suppliers and consumers.




                            RA       RB




ht...
The tiers represent the way your system is
constructed.




                         RUser




http://www.websequencediagr...
This sequence diagram shows the complicated
interactions among consumers and suppliers.




                   RUser




h...
The sequence diagram is a good conceptual tool.



                               60
But when you need to analyze thousands of calls,
you need something else.




                                            ...
Profile




         62
A profile is a complete account of a task’s response
time.

  Response time # Calls R/call           Call name
    (seconds...
You’ve done this before,
   if you’ve ever used…
              gcc -pg …; gprof …
      java -prof …; java ProfilerViewer …...
Profile


• Full account of response time   • Contributions as %R
  – Spanning (sum ≮ R)            • Duration per call
   ...
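The profile attributes listed above (a full account of R, contributions as a percentage of R, duration per call, sorted by descending R) can be sketched as a small aggregation over call-by-call records. This is an illustrative sketch only: the record format, the function name `flat_profile`, and the sample numbers are assumptions, not the tooling used in the talk.

```python
from collections import defaultdict


def flat_profile(calls, response_time):
    """Aggregate (call_name, duration_sec) records into a flat profile.

    Returns rows of (name, total_sec, pct_of_R, n_calls, sec_per_call),
    sorted by descending total. Time not covered by any recorded call is
    reported as 'unaccounted-for', so the rows always sum to response_time.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, duration in calls:
        totals[name] += duration
        counts[name] += 1

    rows = [(name, totals[name], counts[name]) for name in totals]
    gap = response_time - sum(totals.values())
    if abs(gap) > 1e-9:
        rows.append(("unaccounted-for", gap, 0))

    rows.sort(key=lambda row: row[1], reverse=True)
    return [
        (name, total, 100.0 * total / response_time, n, total / n if n else None)
        for name, total, n in rows
    ]


# Hypothetical call-by-call data for one task whose measured R was 1.00 second.
calls = [("db execute", 0.30), ("network wait", 0.50), ("db execute", 0.10)]
profile = flat_profile(calls, response_time=1.00)
for row in profile:
    print(row)
```

Keeping an explicit unaccounted-for row is what makes the profile a complete account: if the instrumentation misses time, the gap shows up as a line item instead of silently disappearing.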
Response Time




                66
To optimize throughput, you must analyze response time.

                              67
(Proof)




          68
(Proof)




You cannot optimize X for a task that’s inefficient.




                                                      ...
(Proof)




You cannot optimize X for a task that’s inefficient.




                                                      ...
(Proof)




   You cannot optimize X for a task that’s inefficient.

You cannot measure a task’s efficiency without measurin...
(Proof)




   You cannot optimize X for a task that’s inefficient.

You cannot measure a task’s efficiency without measurin...
(Proof)




   You cannot optimize X for a task that’s inefficient.

You cannot measure a task’s efficiency without measurin...
The universal experience of
programmers who have been
using measurement tools has
been that their intuitive
guesses fail.
...
(Programmers aren’t very good at
guessing where their code spends time.)




                                      70
To optimize performance (throughput or response time), people need profiles.



...
Performance is EASY




                      72
Performance is easy if you can

stop guessing where your code is
                 slow.




                              ...
When you have profiles for task response times, performance problems cannot hide from you.


      ...
Some surprising things I’ve
 learned by measuring R…




                              75
Disk I/O is often less
               important
          than people think.
http://carymillsap.blogspot.com/2009/04/cary-...
Common performance problems:




                               77
Common performance problems:



           CPU




                               77
Common performance problems:



           CPU




                               77
Common performance problems:



           CPU

      Network I/O



                               77
Common performance problems:



           CPU

      Network I/O



                               77
Common performance problems:



           CPU

      Network I/O

Software serialization
                               77
The point…




             78
Your problems have nothing to do with experiences I’ve had.

So measure.


                             79
Finding what you
need to see




                   80
How are you supposed to create these profiles?



                          81
You have to insist on seeing
where time goes for any task
you think is important.



                               82
To drill down, you need
  call-by-call data.
  (NOT data about aggregations of calls.)




                               ...
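A short sketch of why aggregated data is not enough for drill-down: two sets of calls can share the same count, total, and mean while behaving very differently call by call. The durations below are invented for illustration.

```python
from statistics import mean

# Two hypothetical sets of 10 call durations (seconds) with the same total.
uniform = [0.1] * 10            # every call takes about the same time
skewed = [0.01] * 9 + [0.91]    # one call dominates the response time


def summarize(durations):
    """Aggregate view of a set of calls: what a summary report would show."""
    return {
        "calls": len(durations),
        "total": round(sum(durations), 6),
        "mean": round(mean(durations), 6),
    }


# The aggregate views are identical...
print(summarize(uniform))
print(summarize(skewed))

# ...but only the call-by-call data reveals the single slow call.
print(max(uniform), max(skewed))
```

With only the aggregate, both cases read as "10 calls, 0.1 second each", which points at the wrong fix for the skewed case.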
In Oracle, we do it with a feature called extended SQL
tracing.

• For Developers: Making
  Friends with the Oracle
  Data...
The stuff you need…




                      85
Feature (attribute)         Oracle         MySQL   App tier
Task identification             y
Call-by-call coverage        ...
Recap




        87
Here’s what I hope
you take away today…




                       88
Performance is about
  time and tasks.




                       89
If you’re interested in performance, then

read Goldratt’s The Goal.




                                            90
91
Don’t guess; you’re probably wrong.




                                      91
Don’t guess; you’re probably wrong.



   Measure response time
before you optimize anything.




                        ...
Don’t guess; you’re probably wrong.



   Measure response time
before you optimize anything.


      Insist on it.

     ...
Performance is easy
        (and fun!)
when code measures its own
     time and tasks.


                             92
93

Performance Instrumentation Beyond What You Do Now

  1. 1. Performance Instrumentation beyond what you do now Cary Millsap cary.millsap@method-r.com Percona Performance Conference Santa Clara, California 9:00a–9:55a Thursday 23 April 2009 1
  2. 2. Introductions 2
  3. 3. Cary Millsap carymillsap.blogspot.com cary_millsap 3
  4. 4. 1986 1989 1999 2008 4
  5. 5. 1986 1989 Software Developer 1999 and Performance Analyst 2008 4
  6. 6. 5
  7. 7. Method R Corporation http://method-r.com 6
  8. 8. What we do at Method R Corporation… • Write code for you • Troubleshoot performance problems • Teach you how to do what we do • Write software tools that make your work easier 7
  9. 9. Thinking clearly about performance 8
  10. 10. Performance is HARD 9
  11. 11. “Our users say that everything is slow, but I don’t know where to begin.” 10
  12. 12. “Our users are complaining, but all our dials are green.” 11
  13. 13. A story. 12
  14. 14. In the beginning... (1989: Oracle 6.0.26) 13
  15. 15. “Tuning” was… 14
  16. 16. bstat.sql ... estat.sql report.txt 15
  17. 17. 16
  18. 18. V$PARAMETER sar V$DB_OBJECT_CACHE ps iostat V$OPEN_CURSOR V$SESSTAT netstat V$FIXED_VIEW_DEFINITION V$LATCH nfsstat V$TRANSACTION V$PROCESS V$FILESTAT V$LOCK vmstat V$SQL V$SESSION V$SYSSTAT V$SQLTEXT V$SESS_IO V$LIBRARYCACHE V$ROLLSTAT V$ROWCACHE V$WAITSTAT pstat V$TIMER 16
  19. 19. People looked for “bad numbers.” 17
  20. 20. Inefficiencies. 18
  21. 21. But how can you know what causes a specific task to be slow? 19
  22. 22. 20
  23. 23. 21
  24. 24. It's latches 21
  25. 25. It's I/O It's latches 21
  26. 26. “It's I/O” “It's latches” “It's always I/O” 21
  27. 27. “It's bad SQL” “It's I/O” “It's latches” “It's always I/O” 21
  28. 28. “It's bad SQL” “It's I/O” “It's always bad SQL” “It's latches” “It's always I/O” 21
  29. 29. “It's bad SQL” “It's I/O” “It's always bad SQL” “It's latches” “It's always I/O” “There's not enough memory” 21
  30. 30. “It's bad SQL” “It's I/O” “It's always bad SQL” “It's latches” “It's always I/O” “There's not enough memory” “There's never enough memory” 21
  31. 31. My problem… 22
  32. 32. How can you possibly know that? 23
  33. 33. Reminded me of… 24
  34. 34. 25 vailroger.googlepages.com/orionconstellation
  35. 35. You do see it... Right? 26
  36. 36. 27 vailroger.googlepages.com/orionconstellation
  37. 37. 27 vailroger.googlepages.com/orionconstellation
  38. 38. But who says that is what you have to see? 28
  39. 39. 29
  40. 40. 29
  41. 41. Why not? 30
  42. 42. Performance is hard. 31
  43. 43. A good pilot makes it look easy. —Van R. Millsap 1936–2004 32
  44. 44. Performance is EASY 33
  45. 45. How? 34
  46. 46. It’s the user’s experience that matters. 35
  47. 47. 36
  48. 48. A user’s performance experience consists of two elements… 37
  49. 49. 1. a task 2. time 38
  50. 50. Task 39
  51. 51. The things we used to “computerize”… tasks. http://olathe.lib.ks.us/images/Image/Computer%20User.jpg 40
  52. 52. A task is a business unit of work. • Post to the General Ledger • Enter an order • Look up a book by author 41
  53. 53. Tasks can nest. Posting PO AP AR … FA 42
  54. 54. Tasks can nest. • Print Addresses is a task Posting PO AP AR … FA 42
  55. 55. Tasks can nest. • Print Addresses is a task • Print Address #42 is a (sub)task Posting PO AP AR … FA 42
  56. 56. Tasks can nest. • Print Addresses is a task • Print Address #42 is a (sub)task Posting PO AP AR … FA 42
  57. 57. Tasks can nest. • Print Addresses is a task • Print Address #42 is a (sub)task • Often, a program is a task Posting PO AP AR … FA 42
  58. 58. Tasks can nest. • Print Addresses is a task • Print Address #42 is a (sub)task • Often, a program is a task • Often, a tiny part of a Posting program is a task PO AP AR … FA 42
  59. 59. Tasks are it. Business people don’t care about the “system” except through execution of the tasks that make up their business. 43
  60. 60. Tasks are it. Tasks are what system owners care about. 44
  61. 61. Time 45
  62. 62. Performance is about time. 46
  63. 63. How fast: “Daddy, can your car go 500 miles?” He meant “500 miles per hour.” To talk about performance (speed), you have to talk about time. 47
  64. 64. Two ways to measure performance… 48
  65. 65. 49
  66. 66. tasks per time 49
  67. 67. tasks per time (that’s throughput) 49
  68. 68. tasks per time (that’s throughput) 49
  69. 69. tasks per time (that’s throughput) time per task 49
  70. 70. tasks per time (that’s throughput) time per task (that’s response time) 49
  71. 71. Throughput and response time… 50
  72. 72. Throughput and response time… • Throughput (X) – The tasks-per-time way – Number of task executions completed in a given duration • “orders/second” 50
  73. 73. Throughput and response time… • Throughput (X) – The tasks-per-time way – Number of task executions completed in a given duration • “orders/second” 50
  74. 74. Throughput and response time… • Throughput (X) – The tasks-per-time way – Number of task executions completed in a given duration • “orders/second” • Response time (R) – The time-per-task way – Elapsed duration of an execution of a given task • “seconds/order” 50
  75. 75. 51
  76. 76. X = 1/R 51
  77. 77. X = 1/R 51
  78. 78. X = 1/R (kind of) 51
  79. 79. Average throughput is the inverse of average response time. 52
  80. 80. Average throughput is the inverse of average response time. X = 1,000 txn/sec? 52
  81. 81. Average throughput is the inverse of average response time. X = 1,000 txn/sec? Then R = (1 sec)/(1,000 txn) = .001 sec/txn But… 52
  82. 82. 53
  83. 83. …Adding load to create higher throughput changes response time. 53
  84. 84. …Which leads to a whole ’nother conversation I’d love to have with you some other time. 54
  85. 85. Sequence Diagram 55
  86. 86. A simple way to view response time is with a UML sequence diagram. RA http://www.websequencediagrams.com 56
  87. 87. More complicated systems have nested levels of suppliers and consumers. RA RB http://www.websequencediagrams.com 57
  88. 88. The tiers represent the way your system is constructed. RUser http://www.websequencediagrams.com 58
  89. 89. This sequence diagram shows the complicated interactions among consumers and suppliers. RUser http://www.websequencediagrams.com 59
  90. 90. The sequence diagram is a good conceptual tool. 60
  91. 91. But when you need to analyze thousands of calls, you need something else. 61
  92. 92. Profile 62
  93. 93. A profile is a complete account of a task’s response time.

      Response time (seconds)   # Calls   R/call (seconds)   Call name
           0.769     50.3%        5,003           0.000154   unaccounted-for between dbcalls
           0.393     25.7%        5,010           0.000078   SQL*Net message from client
           0.381     24.9%        5,013           0.000076   CPU service, execute calls
           0.090      5.9%           11           0.008194   CPU service, prepare calls
           0.027      1.8%            1           0.027396   log file sync
           0.008      0.5%        5,010           0.000002   SQL*Net message to client
           0.000      0.0%            9           0.000000   CPU service, fetch calls
          –0.138     –9.1%        5,031          –0.000028   unaccounted-for within dbcalls
           1.530    100.0%                                   Total

      63
  94. 94. You’ve done this before, if you’ve ever used…
      gcc -pg …; gprof …
      java -prof …; java ProfilerViewer …
      perl -d:DProf …; dprofpp …
      dbms_monitor.session_trace_enable(…); p5prof …
      64
  95. 95. Profile
      • Full account of response time
        – Spanning (sum ≮ R)
        – Non-overlapping (sum ≯ R)
      • Sorted by descending R
      • Useful dimension
        – Flat profile
        – Call graph
      • Contributions as %R
      • Duration per call
        – Mean, minimum, maximum, …
        – Skew
      • Drill-down
        – Individual call level of detail
        – Maybe even deeper
      65
  96. 96. Response Time 66
  97. 97. To optimize throughput, you must analyze response time. 67
  98. 98. (Proof) 68
  99. 99. (Proof) You cannot optimize X for a task that’s inefficient. 68
  100. 100. (Proof) You cannot optimize X for a task that’s inefficient. 68
  101. 101. (Proof) You cannot optimize X for a task that’s inefficient. You cannot measure a task’s efficiency without measuring its R. 68
  102. 102. (Proof) You cannot optimize X for a task that’s inefficient. You cannot measure a task’s efficiency without measuring its R. 68
  103. 103. (Proof) You cannot optimize X for a task that’s inefficient. You cannot measure a task’s efficiency without measuring its R. Therefore, to optimize X, you must first analyze R. 68
  104. 104. The universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. —Donald Knuth 69
  105. 105. (Programmers aren’t very good at guessing where their code spends time.) 70
  106. 106. To optimize performance (throughput or response time), people need profiles. 71
  107. 107. Performance is EASY 72
  108. 108. Performance is easy if you can stop guessing where your code is slow. 73
  109. 109. When you have profiles for task response times, performance problems cannot hide from you. 74
  110. 110. Some surprising things I’ve learned by measuring R… 75
  111. 111. Disk I/O is often less important than people think. http://carymillsap.blogspot.com/2009/04/cary-on-joel-on-ssd.html 76
  112. 112. Common performance problems: 77
  113. 113. Common performance problems: CPU 77
  114. 114. Common performance problems: CPU 77
  115. 115. Common performance problems: CPU Network I/O 77
  116. 116. Common performance problems: CPU Network I/O 77
  117. 117. Common performance problems: CPU Network I/O Software serialization 77
  118. 118. The point… 78
  119. 119. Your problems have nothing to do with experiences I’ve had. So measure. 79
  120. 120. Finding what you need to see 80
  121. 121. How are you supposed to create these profiles? 81
  122. 122. You have to insist on seeing where time goes for any task you think is important. 82
  123. 123. To drill down, you need call-by-call data. (NOT data about aggregations of calls.) 83
  124. 124. In Oracle, we do it with a feature called extended SQL tracing. • For Developers: Making Friends with the Oracle Database for Fast, Scalable Applications – Cary Millsap http://method-r.com/downloads/doc_details/10-for-developers-making-friends-with-the-oracle-database-cary-millsap • Optimizing Oracle Performance – Cary Millsap with Jeff Holt 84
  125. 125. The stuff you need… 85
  126. 126.

      Feature (attribute)       Oracle             MySQL   App tier
      Task identification       y
      Call-by-call coverage     98%+
      DB call begin sequence    partly derivable
      DB call begin time        partly derivable
      DB call end time          y
      DB call context info      y
      OS call begin sequence    partly derivable
      OS call begin time        derivable
      OS call end time          y
      OS call context info      y
      Call SQL context          y
      Call CPU (sys mode)       -
      Call CPU (usr mode)       -
      Call CPU (total)          y
      SQL execution plans       y

      86
  127. 127. Recap 87
  128. 128. Here’s what I hope you take away today… 88
  129. 129. Performance is about time and tasks. 89
  130. 130. If you’re interested in performance, then read Goldratt’s The Goal. 90
  131. 131. 91
  132. 132. Don’t guess; you’re probably wrong. 91
  133. 133. Don’t guess; you’re probably wrong. Measure response time before you optimize anything. 91
  134. 134. Don’t guess; you’re probably wrong. Measure response time before you optimize anything. Insist on it. 91
  135. 135. Performance is easy (and fun!) when code measures its own time and tasks. 92
  136. 136. 93
