Performance Monitoring in the Cloud - Gluecon 2011

Talk at GlueCon 2011 on Performance Monitoring and the Cloud

Topics
What is performance monitoring?
How does the cloud change things?
What should developers do?
The ideal operations dashboard

Published in: Technology, Business

Notes
  • web #s - tps, webapp load time, txn per CU, MTTF, etc (2:20)
  • monitor ’cause you want to know what’s going on, so you can do the right thing (3:20)
  • fix something = geeking out, typing into shell windows, increase buffers, change heap size, nuke an outlier; design something = add a caching layer, decompose an app into services (4:30)
  • This becomes one first-order purpose of performance monitoring - to inform and enable your auto-scaling process - and to keep a check on it. Incidentally you also don’t fine-tune for particular hardware - you launch an instance, run your app and then try it on a different one. And you don’t need a giant headroom cushion. Timescale compression happens at different levels (6:20)
  • was really hard to accidentally incur massive overage charges in traditional model - approvals, finance boards, etc in the way (7:40)
  • example of payments svcs - they care about being able to send a payment online, having the money leave their acct and go to the other party - they don’t care about CPU on your servers, or bandwidth on your ISLs. services not servers (9:40)
  • app metrics = tps, q2q latency, etc - very useful, under your control; server metrics = almost useless with virtualization - vmstat/iostat no longer what they were (11:42)
  • handling unanticipated conditions is an extra-credit exercise (14:32)
  • under exceptional circumstances, other data can be shown (16:10)

Performance Monitoring in the Cloud - Gluecon 2011

    1. Performance Monitoring in the Cloud - Paul Guth, Technical Operations
    2-3. Agenda
        • Performance and Monitoring and Performance Monitoring
        • How THE CLOUD changes things
        • What you should do (you = cloud developers)
        • What I’d like to see
        • We are NOT going to talk about using the cloud to do performance testing
        (Footer on every slide: Gluecon - 2011, Cloudscaling - Paul Guth)
    4. What
    5-8. What is Performance?
        • Numbers
            • speed - rate (184.7 mph)
            • time per unit work (0-60 in 4.1s, 3:04.0min lightning lap)
            • efficiency (23 mpg)
            • stability (1.00g skidpad)
            • internals (550hp, 510 lb-ft)
            • throughput (4 seats, 13.4 cu ft trunk)
        • Numbers aren’t everything
            • RWD, live rear axle, 56/44 f/r, airbags, LATCH, ABS - also matter
    9-11. What is Monitoring?
        • Observing system state through measurements (metrics)
        • Why? There’s more than one purpose.
            • Detect problems for immediate action
                • Oh noes! Response time just doubled!
            • Provide data for diagnosing problems
                • WTH changed in the last ten minutes?
            • Inform decisions for long-term action
                • What is the current constraint on total throughput?
                • Forecasts for demand are Y by Christmas
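
The "response time just doubled" case can be sketched in a few lines. This is only an illustration, not something from the talk: the class name `ResponseTimeWatch`, the window of recent samples, and the use of the median as a baseline are all assumptions chosen for the example.

```python
from collections import deque
from statistics import median


class ResponseTimeWatch:
    """Hypothetical detector: flags a sample at least `factor` times the recent median."""

    def __init__(self, window=20, factor=2.0):
        self.samples = deque(maxlen=window)  # recent response times, in ms
        self.factor = factor

    def observe(self, response_ms):
        """Record one sample and return True if it looks like a sudden jump."""
        baseline = median(self.samples) if self.samples else None
        jumped = baseline is not None and response_ms >= self.factor * baseline
        self.samples.append(response_ms)
        return jumped


watch = ResponseTimeWatch(window=5)
readings = [100, 105, 98, 102, 210]          # made-up response times in ms
alerts = [watch.observe(ms) for ms in readings]
print(alerts)
```

The same stored samples serve the other two purposes on the slide: they are the data you look at when diagnosing what changed, and the history you use for longer-term decisions.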
    12-21. What Do We Do In Old IT?
        • Immediate actions triggered by performance monitoring include:
            • Activate standby capacity
            • Turn off features
            • Throttle incoming demand
            • “Fix something”
        • Longer term actions include:
            • Deploying new infrastructure (or removing unneeded)
            • “Design something”
        • Common Theme: Capacity
    22. The Cloud
    23-27. Something Cloudy This Way Comes
        Adding capacity is:

        Traditional IT                          The Cloud
        • Expensive                             • Low marginal cost
        • Has a long lead time                  • Quick
        • Non-reversible                        • Reversible
        • Cheaper when done in large batches    • Same price in small batches
        • Requires capex outlay                 • All opex

        What this means is you can add capacity as an immediate activity, when it used to be a long-term activity. In fact, you can automate it.
    28-34. Paradise - The End?
        • Wait, all is not perfect
        • Sometimes adding capacity is not the right answer
            • Some problems autoscale to infinity
            • Fixing efficiency may be required
        • Adding capacity is cheap but not free
            • Spin up 10k new instances in a day and your controller will want an explanation
            • Transparently balance cost vs benefit
        • You still need some buffer
        • Your automation needs limits
            • Just Say No to Skynet
            • “Auto-scale me if you want to live!”
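
One way to read "your automation needs limits" is as a scaling decision that is clamped before it is acted on. The sketch below is a guess at what such a guard might look like; `ScalingLimits`, `plan_capacity`, and every threshold in it are hypothetical names and numbers, not anything the talk specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingLimits:
    """Hypothetical guard rails for an auto-scaler."""
    min_instances: int = 2          # the buffer you still need
    max_instances: int = 20         # hard ceiling on what automation may run
    max_step: int = 4               # largest change per decision, up or down
    max_hourly_cost: float = 10.0   # budget guard, in dollars per hour


def plan_capacity(current, total_load, load_per_instance, price_per_hour, limits):
    """Return (desired_instances, needs_human) for one scaling decision."""
    # Instances needed to carry the observed load at the target per-instance load.
    wanted = math.ceil(total_load / load_per_instance)
    # Never move more than max_step instances in a single decision.
    step = max(-limits.max_step, min(limits.max_step, wanted - current))
    # The ceiling is whichever is lower: the instance cap or what the budget allows.
    affordable = int(limits.max_hourly_cost // price_per_hour)
    ceiling = min(limits.max_instances, affordable)
    desired = max(limits.min_instances, min(ceiling, current + step))
    # Demand beyond the ceiling is a problem that "autoscales to infinity":
    # stop adding capacity and ask a person to look at efficiency instead.
    needs_human = wanted > ceiling
    return desired, needs_human


print(plan_capacity(4, 1000, 100, 0.5, ScalingLimits()))    # (8, False)
print(plan_capacity(4, 100000, 100, 0.5, ScalingLimits()))  # (8, True)
```

The second call shows the point of the slide: the scaler still takes a bounded step, but it also reports that the demand cannot be met by capacity alone.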
    35. What’s a Cloud Developer to Do?
    36-41. What Do You Do?
        • What to measure?
            • iops, cpu util, memfree, queue latency? USELESS! (*)
        • First measure what your customers care about
            • What do they pay you for?
            • Response time, load time, functional correctness
        • Monitor services (from customer perspective), not servers
        • Monitor your cost as well (COGS) - costs more variable now
        (*) NOTE: Not actually useless
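
A customer-perspective check can be as small as "run the transaction the customer pays for, confirm the result is right, and time it". The following is a minimal sketch under that assumption; `check_service` and `cost_per_transaction` are illustrative helpers invented here, and the transaction is passed in as a plain callable so no real service API is implied.

```python
import time


def check_service(transaction, expected, max_seconds):
    """Run one customer-style transaction and judge correctness and speed."""
    start = time.perf_counter()
    try:
        result = transaction()
    except Exception as exc:  # any failure is a failure from the customer's side
        return {"ok": False, "reason": f"error: {exc}",
                "seconds": time.perf_counter() - start}
    elapsed = time.perf_counter() - start
    if result != expected:
        return {"ok": False, "reason": "wrong result", "seconds": elapsed}
    if elapsed > max_seconds:
        return {"ok": False, "reason": "too slow", "seconds": elapsed}
    return {"ok": True, "reason": "", "seconds": elapsed}


def cost_per_transaction(hourly_spend, transactions_per_hour):
    """Rough COGS signal: dollars spent per completed transaction."""
    if transactions_per_hour == 0:
        return None  # nothing completed, so the ratio is undefined
    return hourly_spend / transactions_per_hour


# Stand-in for "send a payment and see it arrive"; a real check would call the service.
status = check_service(lambda: "payment received", "payment received", max_seconds=1.0)
print(status["ok"], cost_per_transaction(12.0, 6000))
```

Tracking the cost figure next to the service check reflects the slide's point that cost is now a variable worth monitoring, not a fixed purchase.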
    42-48. Thought Experiment
        • 0300 Sunday
        • CPU Utilization on your cluster increases to 100%
        • External service monitoring shows no problems
        • What do you do?
            • Go back to sleep!
            • Please investigate on Monday, you’re probably wasting money
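
The thought experiment amounts to a routing rule: page someone only when customers are affected, and turn everything else into work for business hours. A sketch of that rule, with the function name `triage` and the 90% threshold chosen purely for illustration:

```python
def triage(customer_checks_ok, cpu_percent, cpu_threshold=90):
    """Decide how loudly to react to a monitoring signal (hypothetical policy)."""
    if not customer_checks_ok:
        return "page"    # customers are affected: wake someone up
    if cpu_percent >= cpu_threshold:
        return "ticket"  # probably wasted money: investigate on Monday
    return "ignore"


# 0300 Sunday: CPU at 100%, external service monitoring shows no problems.
print(triage(customer_checks_ok=True, cpu_percent=100))  # ticket
```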
    49. Monitoring Hierarchy
        • Customer Services
        • Application Metrics
        • Server Metrics
    50-55. Other Things to Do
        • Record religiously all data around resource calls
        • Put in monitoring/metrics from the start!
        • Make it trivial (for devs) to record metrics, and incentivize them
            • collectd, graphite, etc
            • Build it into your framework/platform of choice
        • Make sure this monitoring scales out automatically when new instances appear and is retained when instances disappear
            • As much of this monitoring as possible should be at the cluster, not instance level
        • Use much more care when figuring out what to alert about - start with just customer services
            • False positives can be killers
            • Use data for diagnosis first, then learn when to alert
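
As an example of making metrics trivial to record, the sketch below formats a sample in Graphite's Carbon plaintext format (`path value timestamp`, conventionally sent to TCP port 2003) and names the metric after the cluster rather than the instance, so the series survives when instances come and go. The host name `graphite.example.internal`, the metric names, and both helper functions are assumptions made for illustration.

```python
import socket
import time


def graphite_line(cluster, metric, value, timestamp=None):
    """Format one sample in Carbon's plaintext format: 'path value timestamp'."""
    ts = int(time.time() if timestamp is None else timestamp)
    # The path is keyed by cluster, not by instance, so it outlives any one server.
    return f"{cluster}.{metric} {value} {ts}\n"


def send_metric(line, host="graphite.example.internal", port=2003, timeout=2.0):
    """Send one line to a Carbon plaintext listener; never raise into the app."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(line.encode("ascii"))
        return True
    except OSError:
        return False  # losing a metric should not break the request being served


line = graphite_line("checkout", "orders.latency_ms", 42, timestamp=1300000000)
print(line, end="")
# send_metric(line)  # uncomment once a Carbon listener is reachable at the host above
```

Wrapping this in whatever framework the team already uses is what keeps the cost of adding a metric close to zero for developers.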
    56-62. Other Other Things to Do
        • Treat dependencies as if they’re vitally important!
            • Have model. Have API. Use model API
            • Have tools to visualize the model
            • Leverage it in your other tools
        • Automate handling all anticipated conditions
        • Learn learn learn
            • Measure customer experience!
            • Test in production (learn from production at least)
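
"Have model. Have API." could start as something as simple as a dependency map plus one query that other tools can call. The service names and the `impacted_by` function below are hypothetical, meant only to show the shape of such a model.

```python
from collections import deque

# Hypothetical model: each service maps to the things it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "catalog"],
    "payments": ["bank-gateway", "db-primary"],
    "catalog": ["db-primary", "cache"],
}


def impacted_by(failed, depends_on):
    """Return every service that depends on `failed`, directly or indirectly."""
    # Invert the edges so we can walk from a dependency up to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(impacted_by("db-primary", DEPENDS_ON))
```

A dashboard or alerting tool could use the same query to answer "which customer services does this failure touch?" instead of each tool keeping its own copy of the dependencies.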
    63. What The World Needs Now
    64-65. The Ideal Monitoring Dashboard
        • OK
        • $/sec (in): 1,000
        • $/sec (out): 1,000
    66-72. Dashboards
        • Actionable data!
            • Long lists of stuff that’s OK are useless
            • “Event consoles” are useless - I care about current state, not what happened five minutes ago
            • Too much data is worse than no data
        • At top-level, show only customer services
        • Drill-down to what you want
        • Filter easily to narrow in
            • Save the trees, have a search box
        • Add arbitrary time-series data to any chart - including changelogs, business metrics
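
The top-level rule (show only customer services, and only the ones that need attention) can be expressed as a filter over current state. The record layout, the status strings, and the `top_level_view` function here are assumptions for the sake of a small example.

```python
def top_level_view(services, search=""):
    """Return (summary, rows): customer-facing services that are not OK."""
    # Long lists of things that are OK are noise, so keep only what needs action.
    attention = [s for s in services
                 if s["customer_facing"] and s["status"] != "OK"]
    summary = "OK" if not attention else f"{len(attention)} service(s) need attention"
    # The search box narrows the rows shown; it never hides the overall summary.
    needle = search.lower()
    rows = [s for s in attention if needle in s["name"].lower()]
    return summary, rows


SERVICES = [  # made-up current state
    {"name": "send-payment", "customer_facing": True, "status": "OK"},
    {"name": "account-login", "customer_facing": True, "status": "SLOW"},
    {"name": "batch-worker-17", "customer_facing": False, "status": "DOWN"},
]

summary, rows = top_level_view(SERVICES)
print(summary, [r["name"] for r in rows])
```

The non-customer-facing worker is deliberately absent from the top level; it would appear only after drilling down from a service it affects.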
    73. One Size Fits One
        • Use different interfaces for different purposes and/or different audiences
    74. Summary
        • Performance Monitoring and Capacity Management are joined at the hip
        • The Cloud enables automated, immediate capacity fixes
        • The price of automation is eternal vigilance
        • Monitor your customer-facing services first
        • Make it so easy to collect metrics that you’ll have tons and tons of them
        • Magic dashboard make Paul happy!
    75. Thank You!
        • g < at > cloudscaling d0t com
        • @pgutheb
