Keynote presentation from the CMG Conference explaining the challenges in management and how the monitoring and business visibility provided by modern APM tools are critical to business execution
7. 74% of users will leave a
website if it doesn’t load
in under 5 seconds
8. - Goldsmiths, App Attention Span
86% of users have uninstalled
at least one mobile app
after just 1 use due to
performance problems
9. 81% of buyers will pay more for a
better customer experience…
- Forbes, Customer Experience Index
…but only 1% of customers
feel their expectations are being met
- Forbes, Customer Experience Index
12. How fast is fast enough?
• Performance is key to a great user experience
– Under 100ms is perceived as reacting instantaneously
– A 100ms to 300ms delay is perceptible
– 1 second is about the limit for the user's flow of thought to stay
uninterrupted
• Users expect a site to load in 2 seconds
• After 3 seconds, 40% will abandon your site.
– 10 seconds is about the limit for keeping the user's attention
• Modern applications spend more time in the browser than on the
server-side
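The response-time limits on this slide can be expressed as a small classifier. This is a minimal Python sketch: the function name `perceived_speed` and the band labels are illustrative, and because the slide does not say how the range between 300ms and 1 second is perceived, the sketch assumes it belongs to the "flow of thought uninterrupted" band.

```python
def perceived_speed(response_ms: float) -> str:
    """Map a response time in milliseconds to the perception bands on the slide."""
    if response_ms < 0:
        raise ValueError("response time cannot be negative")
    if response_ms < 100:
        return "instantaneous"                    # under 100ms
    if response_ms <= 300:
        return "perceptible delay"                # 100ms to 300ms
    if response_ms <= 1_000:
        return "flow of thought uninterrupted"    # assumed band, up to 1 second
    if response_ms <= 10_000:
        return "flow broken, attention kept"      # up to 10 seconds
    return "attention lost"                       # beyond 10 seconds


for sample_ms in (80, 250, 900, 4_000, 12_000):
    print(f"{sample_ms:>6} ms -> {perceived_speed(sample_ms)}")
```

A classifier like this could label RUM samples so that dashboards report user perception rather than raw milliseconds.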
22. No end-to-end perspective. No situational awareness.
Long time to troubleshoot and resolve issues
Reactive problem identification
[Diagram: time to resolution for a “Checkout is Slow” report: L1 Troubleshoot, Escalate, L2 Troubleshoot, Escalate, War Room, Resolution]
27. What is APM?
• Real user experience
• Synthetic availability monitoring
• Code level visibility
• Transaction tracing
• Metric collection from associated components
• Analytics of collected data
• Other:
– You may have some APM functions in network tools, but they fail to
meet all criteria.
28. What is not APM?
• Server monitoring
– Application instance monitoring can provide some application
metrics, but none are detailed
• Network monitoring
• Storage monitoring
• Infrastructure specific metric collection
31. Monitor the end user experience
• Real User Monitoring vs Synthetic Monitoring
– Synthetic tests provide 24/7 assurance
– RUM provides insights into actual users
• Mobile device segmentation
• Unexpected behavior/trends
• Real User Monitoring
– Navigation Timing API
– Resource Timing API
– User Timing API
– JavaScript errors
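On the collection side, a RUM beacon built from the Navigation Timing API can be reduced to a handful of phase durations. The following is a minimal Python sketch, assuming the browser posts the standard `PerformanceNavigationTiming` attributes as a JSON object with millisecond values. The function name `summarize_navigation`, the output keys, and the sample numbers are all hypothetical.

```python
import json


def summarize_navigation(beacon: dict) -> dict:
    """Derive phase durations (ms) from Navigation Timing attributes."""
    return {
        "dns_ms": beacon["domainLookupEnd"] - beacon["domainLookupStart"],
        "connect_ms": beacon["connectEnd"] - beacon["connectStart"],
        # Time spent waiting for the first byte after the request was sent.
        "server_wait_ms": beacon["responseStart"] - beacon["requestStart"],
        "download_ms": beacon["responseEnd"] - beacon["responseStart"],
        # Everything the browser does after the response has arrived.
        "front_end_ms": beacon["loadEventEnd"] - beacon["responseEnd"],
        "page_load_ms": beacon["loadEventEnd"] - beacon["startTime"],
    }


# Made-up beacon payload, for illustration only.
sample_beacon = json.loads("""
{
  "startTime": 0,
  "domainLookupStart": 5, "domainLookupEnd": 25,
  "connectStart": 25, "connectEnd": 60,
  "requestStart": 62, "responseStart": 210, "responseEnd": 260,
  "loadEventEnd": 1900
}
""")

summary = summarize_navigation(sample_beacon)
for phase, duration in summary.items():
    print(f"{phase:>15}: {duration} ms")
```

Splitting the load time this way shows whether a slow page is a server-side or a browser-side problem.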
32. Moving from reactive to proactive
• Automatic discovery of environment and application changes
– New APIs, transactions, services, clouds
• Leverage analytics to be smarter about using the data you
already have
– System Logs, Metrics from events and infrastructure stats
– Transactions with request parameters + User state from
cookies/sessions
• Performance monitoring isn’t just about the tech
– Visibility into business impact, e.g. alerting when revenue is down
35. Moving from reactive to proactive
• Resolving before the red = fixing in the yellow
• Intelligent anomaly detection across end-user, application,
database, server metrics
– Automatically calculates dynamic baselines for all of your metrics,
which, based on actual usage, define what is "normal" for each metric
– Smart alerting based on any deviation from the baselines
• Understand trends and patterns in failures - automatically learn
from the past
– Understand which issues are the most impactful to resolve
– Oftentimes external services, which offer limited visibility, are the root cause
• Enforce SLAs
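The dynamic-baseline idea on this slide can be sketched in a few lines. This is not how any particular APM product computes baselines. It assumes a simple rolling mean and standard deviation over recent samples and flags values more than three standard deviations from the mean. The names `RollingBaseline`, `window`, and `threshold` are illustrative.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Learn what is "normal" for one metric from its recent history."""

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.samples = deque(maxlen=window)   # most recent observations
        self.threshold = threshold            # allowed deviation, in std devs

    def observe(self, value: float) -> bool:
        """Return True if value deviates from the baseline, then record it."""
        anomalous = False
        if len(self.samples) >= 5:            # wait for a little history first
            baseline = mean(self.samples)
            spread = stdev(self.samples)
            # Guard against a perfectly flat history (spread of zero).
            anomalous = abs(value - baseline) > self.threshold * max(spread, 1e-9)
        # Simplification: anomalous values also enter the history.
        self.samples.append(value)
        return anomalous


baseline = RollingBaseline(window=30, threshold=3.0)
response_times_ms = [100, 104, 98, 101, 99, 103, 97, 102, 100, 10_100]
flags = [baseline.observe(value) for value in response_times_ms]
print(flags)
```

Only the final sample, the jump to 10,100 ms, is flagged. A production baseline would also need to account for seasonality such as time of day and day of week.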
37. Leading companies invest in performance
• Etsy = Kale = StatsD + Skyline + Oculus (stats collection + anomaly
detection/correlation)
• Netflix = PCP + Vector + Servo + Atlas (dashboards, data collection,
root cause analysis)
• Twitter = Zipkin (distributed tracing)
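As an illustration of how lightweight this kind of stats collection can be, the sketch below encodes metrics in the StatsD plain-text format (`name:value|type`) and sends them over UDP. The metric names are hypothetical, and the host and port are assumptions for a locally running daemon. This is a hand-rolled sketch, not Etsy's client code.

```python
import socket


def statsd_line(name: str, value: float, metric_type: str) -> bytes:
    """Encode one metric in the StatsD plain-text format: name:value|type."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


timer = statsd_line("checkout.response_time", 320, "ms")   # timing in milliseconds
counter = statsd_line("checkout.errors", 1, "c")            # counter increment
print(timer, counter)
# send_metric(timer) would emit the datagram to a StatsD daemon, if one is listening.
```

Because UDP is connectionless, instrumentation like this adds very little overhead to the application being measured.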
38. Key takeaways
• Treat performance as a feature
– Create a performance budget with milestones, speed index, page speed
– Capacity plan and load test the server-side
– Optimize and performance test the client-side
• Monitor performance in development and production
– Instrument everything
– Measure the difference of every change
– Understand how failures impact performance
• Make monitoring critical and test in your continuous delivery process
• Connect the biz/dev/ops performance perspectives to align on
business impact metrics and KPIs
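A performance budget becomes enforceable once it is checked automatically in the delivery pipeline. The sketch below assumes the budget and the measured values are plain dictionaries. The metric names, limits, and the `check_budget` function are hypothetical, and in practice the measurements would come from whatever test tooling the team already runs.

```python
def check_budget(budget: dict, measured: dict) -> list:
    """Return human-readable violations; an empty list means within budget."""
    violations = []
    for metric, limit in budget.items():
        value = measured.get(metric)
        if value is None:
            violations.append(f"{metric}: no measurement")
        elif value > limit:
            violations.append(f"{metric}: {value} exceeds budget {limit}")
    return violations


# Illustrative numbers only.
budget = {"speed_index": 1000, "page_load_ms": 2000, "page_weight_kb": 800}
measured = {"speed_index": 950, "page_load_ms": 2400, "page_weight_kb": 760}

violations = check_budget(budget, measured)
for line in violations:
    print("BUDGET VIOLATION", line)
# A CI job could fail the build whenever the list is non-empty.
```

Running a check like this on every build is one way to measure the difference of every change.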
Would you pay an extra $1 to jump the queue?
Or to improve processing time / performance?
These statistics highlight the magnitude of the growth opportunity before us. What if you just increased the percentage of consistently happy customers by 5%? For any company, large or small, that would be a game-changer in terms of revenue and profit.
It’s clear – more emphasis will be on the experiences a company delivers to create a competitive advantage.
SOURCES:
AppDynamics App Attention Span
Read paragraph #2 - http://www.forbes.com/sites/christinecrandell/2013/01/21/customer-experience-is-it-the-chicken-or-egg/
http://www.walkerinfo.com/customers2020/
https://econsultancy.com/blog/10936-site-speed-case-studies-tips-and-tools-for-improving-your-conversion-rate/
https://econsultancy.com/blog/66121-improving-the-multichannel-customer-experience/
http://www.websitemagazine.com/images/blog/RadwareSiteSpeed.png
Uptime is critical. Performance is an advantage. Enterprises require fault-tolerance.
Amazon has performed experiments showing that for every 100ms delay, its sales decreased by 1%.
Yahoo found that a one-second additional server delay resulted in a 2.8% decrease in revenue and an almost two-second increase in time to click.
Microsoft has performed experiments showing that for every 100ms speed increase it was able to improve its revenue by 0.6% as a direct result.
Google has performed experiments showing that slowing down the search results page by between 100ms and 400ms impacts the number of searches done per user by −0.2% to −0.6%.
—
Gartner: How Performance Affects User Experience and Your Bottom Line, and What to Do About It
Published: 8 September 2014. Analyst(s): Magnus Revang, Ray Valdes, Jonah Kowall
—
1 A. Beaujon, "Washington Post Announces Plans to Hire Bloggers, Redesign Site," Poynter, 29 January 2014.
2 Between 500 and 600 Gartner client interactions a year on user experience.
3 G. Linden, "Slides From My Talk at Stanford," Geeking With Greg, 4 December 2006.
4 M. Goldin, "Amazon Dominated Online Retail Sales in 2013," Mashable, 8 May 2014.
5 S. Stefanov, "Don't Make Me Wait! or Building High-Performance Web Applications," 19 August 2009.
6 R. Kohavi, A. Deng, R. Longbotham and Y. Xu, "Seven Rules of Thumb for Web Site Experimenters," To appear in KDD 2014.
7 J. Brutlag, "Speed Matters," Google Research Blog, 24 June 2009.
8 D. Barton, "Decoding Google's Revenue," Southern Web, 22 July 2013.
9 A. Nassar, "Performance-Based Design: Linking Performance to Business Metrics," Velocity, the O'Reilly conference, 23 June 2009.
10 Blue Triangle Technologies relayed this information to Gartner.
11 A. Bouch, A. Kuchinsky, N. Bhatti, "Quality Is in the Eye of the Beholder: Meeting Users' Requirements for Internet Quality of Service," HP Laboratories Palo Alto, January 2000.
12 B.J. Fogg, T. Kameda, J. Boyd, J. Marshall, R. Sethi, M. Sockol, and T. Trowbridge, "Stanford-Makovsky Web Credibility Study 2002: Investigating What Makes Web Sites Credible Today," Stanford University, 2002.
13 Akamai
14 J. Ramsay, "A Psychological Investigation of Long Retrieval Times on the World Wide Web," ScienceDirect, 23 June 1998.
15 Y. Skadberg and J. Kimmel, "Visitors' Flow Experience While Browsing a Web Site: Its Measurement, Contributing Factors and Consequences," ScienceDirect, 5 July 2003.
16 A. Jain and M. Tikir, "Is the Web Getting Faster?," Google Analytics Blog, 15 April 2013.
17 HTTP Archive
18 HTTP Archive
19 "Enterprise Software: Why the User Experience Matters," Deloitte CIO Journal, 10 September 2012.
It is really about the user's perception of performance. Slow checkout, anyone? Users lose
faith quickly. It is even worse on mobile.
http://larahogan.me/design/
Akamai’s study shows us some very strong facts about perceived performance, like:
47% of people expect a web page to load in 2 seconds or less.
40% will abandon a web page if it takes more than 3 seconds to load.
52% of online shoppers claim that quick page loads are important for their loyalty to a site.
14% will start shopping at a different site if page loads are slow, 23% will simply stop shopping.
64% of shoppers who are dissatisfied with their site visit will go somewhere else to shop next time.
http://www.akamai.com/dl/reports/Site_Abandonment_Final_Report.pdf
http://timkadlec.com/2014/11/performance-budget-metrics/
http://danielmall.com/articles/how-to-make-a-performance-budget/
http://www.nngroup.com/articles/response-times-3-important-limits/
Card, S. K., Robertson, G. G., and Mackinlay, J. D. (1991). The information visualizer: An information workspace. Proc. ACM CHI'91 Conf. (New Orleans, LA, 28 April-2 May), 181-188.
Miller, R. B. (1968). Response time in man-computer conversational transactions. Proc. AFIPS Fall Joint Computer Conference Vol. 33, 267-277.
Myers, B. A. (1985). The importance of percent-done progress indicators for computer-human interfaces. Proc. ACM CHI'85 Conf. (San Francisco, CA, 14-18 April), 11-17.
In the early 2000s, application architectures were fairly simple: a monolithic 3-tier architecture in which a user request resulted in a call to an application server and then a query to some backend database.
Over time, the application architectures and operating environments have grown in complexity. While these shifts have been good for application developer productivity and agility, they have made modern applications more difficult to manage.
The shifts that have had the most impact on IT Operations & App Support teams include:
SOA: Service Oriented Architecture
Cloud Capacity: Usage of Cloud Capacity from providers like Amazon EC2 and private clouds
Big Data: Surge in data volumes popularizing Big Data and NoSQL technologies such as Hadoop, Cassandra and MongoDB
Mobile: In addition, businesses are looking at iOS and Android devices as new channels to market
Agile: And to complicate things even further, more frequent code release cycles with the adoption of agile development
[BUILD BUSINESS TRANSACTION IS THE ONLY CONSTANT]
All of these technologies have created the perfect storm for operations and development teams trying to manage the performance and availability of their applications, because of the high rate of change these teams are facing. To add to this challenge, legacy monitoring approaches weren’t built to support these environments.
Throughout this change and all future change, the only constant is the Business Transaction, which is the main unit of measurement within AppDynamics.
“And this is reality. This is a real customer's application.”
Either just show 1 or flick through 2 or 3 flow maps quickly and stop on one to talk about.
This is reality.
It's an actual customer application.
It's obviously a very complex environment, but this is what applications look like today.
What you are looking at is a map of all the transactions that are flowing through that app (or, for some examples, I would say it is just a single transaction).
In the past, customers drew a diagram like this manually, and it was out of date as soon as it was finished.
Here we auto-discover this environment by mapping the transactions as they flow through the application automatically.
I'll tell you a bit more about how we do this in a moment (leaving some intrigue on the table).
DO NOT NAME CUSTOMERS HERE!
77% of the time at least 5+ people hours needed
Image : http://bit.ly/1FUSQl4
Image Courtesy of Docklandsboy: http://bit.ly/1tMnHcy
Too Many Graphs, Too Much Time Wasted
A typical NOC has a wall that looks like this. It is extremely inefficient, since you are staring at loads of data, graphs, and dashboards. Engineers love this stuff, but it's not digestible. People are inundated with alerts, emails, and pages. Cutting this down to what matters should be a focus, but finding the right tools and analytics is a challenge today. Most web-scale shops build their own tools, often cobbled together with very primitive underpinnings and capabilities. The problem with commercial tools is that the cost gets too high for many organizations, although others do invest in them.
Moving to monitoring systems instead of servers
Applying data science and statistics across operational information
New ways to explore complex system data
Bringing together metrics and events for a unified look at your system
Image : http://bit.ly/1FUUMKi
Customers of AppDynamics understand this impact, and in real time.
Here is an example dashboard taken from a US eCommerce customer of AppDynamics, highlighting the real-time correlation between Application Errors, Response Time, and the Revenue generated by one of the critical Business Transactions.
<CLICK>
At approximately 18:30 we can clearly see that a significant event has occurred.
<CLICK>
The Application Response time has jumped from 100 ms up to 10.1 seconds (roughly a 100x increase).
<CLICK>
And at the same time we can see the revenue being generated dropped from $65k per minute down to $12k.
This dashboard shows the real-time business impact of poor performance, enabling everyone within the organization to plan, troubleshoot and remediate in the most appropriate way.
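The figures quoted for this dashboard can be checked with a little arithmetic. The sketch below uses only the numbers in these notes (100 ms to 10.1 seconds, $65k to $12k per minute). The variable names are illustrative.

```python
# Values quoted in the notes for the 18:30 event.
before_ms, after_ms = 100, 10_100
before_revenue, after_revenue = 65_000, 12_000   # dollars per minute

slowdown = after_ms / before_ms
lost_per_minute = before_revenue - after_revenue
drop_pct = 100 * lost_per_minute / before_revenue

print(f"Response time: {slowdown:.0f}x slower")
print(f"Revenue: ${lost_per_minute:,} lost per minute ({drop_pct:.1f}% drop)")
```

Put this way, every minute the incident lasts costs about $53k, which is the kind of number that lets business and operations teams prioritize together.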
AppDynamics is proven in enterprise production environments and can support applications with thousands of nodes or significant transaction throughput.
Here are some of our largest deployments.
ExactTarget deployed across 5,000 servers in just 30 days, and Orbitz deployed in just 15 days, which shows how easily AppDynamics scales across your organization.
https://codeascraft.com/2013/06/11/introducing-kale/
https://github.com/etsy/statsd
https://github.com/etsy/oculus
https://github.com/etsy/skyline
http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
Vector is an open source on-host performance monitoring framework which exposes hand picked high resolution system and application metrics to every engineer’s browser. Having the right metrics available on-demand and at a high resolution is key to understand how a system behaves and correctly troubleshoot performance issues.
Vector provides a simple way for users to visualize and analyze system and application-level metrics in near real-time. It leverages the battle tested open source system monitoring framework, Performance Co-Pilot (PCP), layering on top a flexible and user-friendly UI. - http://pcp.io/
http://techblog.netflix.com/2013/12/announcing-suro-backbone-of-netflixs.html
http://techblog.netflix.com/2014/01/improving-netflixs-operational.html
Real-time Event Management System (SURO)
Event Stream Aggregation and Dashboard (Hystrix and Turbine)
Configuration Management using Asgard, Edda and MyEdda
Continuous Optimization using Conformity Monkey and Janitor Monkey
Netflix Ice: Cloud Spend and Usage Analytics (FinOps)