Object, Measure Thyself
Greg Opaczewski – Orbitz Worldwide
Michael Ducy – BMC Software
Open Source
• ERMA Project :
http://launchpad.net/erma
• Graphite Project :
http://launchpad.net/graphite
Complex Environment
$10.8 Billion in Gross Bookings in 2007
Myths of Instrumentation
• No Time For Instrumentation
• No Value ($) in Instrumentation
• Instrumentation Causes Bugs
Myth: No Time For
Instrumentation
ERMA
Extremely Reusable Monitoring API
TransactionMonitor monitor =
    new TransactionMonitor("HotelService.purchase");
try {
    response = hotelSupplier.reserve(hotel);
    monitor.succeeded();
} catch (ServiceException e) {
    monitor.failedDueTo(e);
    throw e;
} finally {
    monitor.done();
}
ERMA
Self-Instrumentation by:
• Hooks – Interceptors and Listeners
• Abstraction – Abstract the details away
from developers
• AOP – Aspect Oriented Programming
Frameworks - Hooks
• Spring Framework
Frameworks - Abstraction
Self-Instrumentation by:
• Aspect Oriented Programming (AOP)
<aop:config>
  <aop:aspect id="transactionMonitorActionAspect"
      ref="transactionMonitorActionAdvice">
    <aop:pointcut id="transactionMonitorActionPointcut"
        expression="target(org.springframework.webflow.execution.Action)
        and args(context)"/>
    <aop:around pointcut-ref="transactionMonitorActionPointcut"
        method="invoke"/>
  </aop:aspect>
</aop:config>
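The speaker notes for this slide say that Spring creates a dynamic proxy for each Action at startup, so each monitored call costs one extra method invocation. A minimal sketch of that idea, using only the JDK's `java.lang.reflect.Proxy`, might look like the following. It is not ERMA or Spring code: `ProxyMonitoringSketch`, its `Action` interface, and the `RECORDED` list are hypothetical names chosen for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain JDK proxy, not the Spring AOP or ERMA implementation.
final class ProxyMonitoringSketch {

    /** Hypothetical stand-in for a web flow action. */
    interface Action {
        String execute(String context);
    }

    /** One entry per monitored call, e.g. "Action.execute:succeeded" (not thread-safe). */
    static final List<String> RECORDED = new ArrayList<>();

    /** Wraps target in a proxy that records the outcome and latency of each call. */
    @SuppressWarnings("unchecked")
    static <T> T monitored(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            String outcome = "failed";
            try {
                Object result = method.invoke(target, args);
                outcome = "succeeded";
                return result;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                RECORDED.add(type.getSimpleName() + "." + method.getName() + ":" + outcome);
                System.out.println(method.getName() + " " + outcome + " in " + micros + " us");
            }
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        Action search = monitored(Action.class, context -> "results for " + context);
        System.out.println(search.execute("ORD-LHR"));
    }
}
```

The business object passed to `monitored` contains no monitoring code at all, which is the property the AOP configuration above is after.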
Myth: No Time For
Instrumentation
Myth: No Value ($) in
Instrumentation
Event Aggregation
Event Aggregation
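The speaker notes describe the event processor as computing summary statistics (average, standard deviation, percent failed) and raising alarms when aggregated values cross a threshold. A rough sketch of that kind of aggregation is shown below. The class and method names are hypothetical, the running variance uses Welford's method as a swapped-in technique, and nothing here is taken from the actual ERMA event processor.

```java
// Illustrative only: hypothetical names, not the ERMA event processor.
final class EventAggregatorSketch {
    private long count;
    private long failures;
    private double mean; // running mean of latency in milliseconds
    private double m2;   // running sum of squared deviations (Welford's method)

    /** Folds one completed monitor event into the running summary. */
    void record(double latencyMillis, boolean failed) {
        count++;
        if (failed) {
            failures++;
        }
        double delta = latencyMillis - mean;
        mean += delta / count;
        m2 += delta * (latencyMillis - mean);
    }

    double average() {
        return mean;
    }

    /** Population standard deviation of the recorded latencies. */
    double standardDeviation() {
        return count == 0 ? 0.0 : Math.sqrt(m2 / count);
    }

    double percentFailed() {
        return count == 0 ? 0.0 : 100.0 * failures / count;
    }

    /** True when either threshold is exceeded; a real system would raise an alarm here. */
    boolean exceeds(double maxAverageMillis, double maxPercentFailed) {
        return average() > maxAverageMillis || percentFailed() > maxPercentFailed;
    }

    public static void main(String[] args) {
        EventAggregatorSketch stats = new EventAggregatorSketch();
        stats.record(100, false);
        stats.record(200, false);
        stats.record(300, false);
        stats.record(400, true);
        System.out.println("avg=" + stats.average()
                + " stddev=" + stats.standardDeviation()
                + " fail%=" + stats.percentFailed());
    }
}
```

Keeping only running totals means the aggregator never has to store individual events, which suits a processor that runs outside the instrumented application.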
Storage and Visualization: Graphite
Graphite
Graphite
Graphite Demo
Value to the Business
• Fixing Production Problems Fast
• Capacity Planning
• Business Product teams rely on ERMA
data
Myth: No Value ($) in
Instrumentation
Myth: Instrumentation Causes
Bugs
Avoid Boilerplate
@Monitored
public interface HotelService {
void purchase(Itinerary itinerary);
void cancel(Itinerary itinerary);
}
Avoid Boilerplate
public interface HotelService {
@Monitored(includeArguments = true)
void purchase(Itinerary itinerary);
void cancel(Itinerary itinerary);
}
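To make the declarative style on these two slides concrete, here is a small sketch of how an annotation can drive monitoring. It is an assumption-laden illustration: the `Monitored` annotation below is a hypothetical runtime annotation defined inside the sketch, and it is honored through a JDK dynamic proxy at run time, whereas the speaker notes say ERMA applies its annotations at build time. `Itinerary` is replaced by `String` to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: this @Monitored is a hypothetical runtime annotation, not ERMA's.
final class AnnotationMonitoringSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Monitored {
        boolean includeArguments() default false;
    }

    interface HotelService {
        @Monitored(includeArguments = true)
        void purchase(String itinerary);

        void cancel(String itinerary); // not annotated, so not monitored
    }

    /** Business implementation with no monitoring code in it. */
    static final class NoOpHotelService implements HotelService {
        @Override public void purchase(String itinerary) { }
        @Override public void cancel(String itinerary) { }
    }

    /** Names of monitored calls, with arguments appended when requested. */
    static final List<String> EVENTS = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T instrument(Class<T> type, T target) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type},
                (proxy, method, args) -> {
                    // A method-level annotation wins; otherwise fall back to the type level.
                    Monitored marker = method.getAnnotation(Monitored.class);
                    if (marker == null) {
                        marker = method.getDeclaringClass().getAnnotation(Monitored.class);
                    }
                    if (marker != null) {
                        String event = method.getName();
                        if (marker.includeArguments()) {
                            event += Arrays.toString(args);
                        }
                        EVENTS.add(event);
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause();
                    }
                });
    }

    public static void main(String[] args) {
        HotelService service = instrument(HotelService.class, new NoOpHotelService());
        service.purchase("ORD-LHR");
        service.cancel("ORD-LHR");
        System.out.println(EVENTS);
    }
}
```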
Uncovers Bugs
• Allows you to base line across builds
• MASF and SPC
• Event Pattern Monitoring
Base Lining
• Compare present performance vs.
historical performance
• Validate testing via theoretical models
MASF and SPC
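As a rough illustration of the Statistical Process Control idea, the sketch below derives three-sigma control limits from historical samples and checks whether a new observation falls inside them. This is a deliberately basic check under stated assumptions (a single baseline, sample standard deviation, fixed three-sigma band); a full MASF model is more elaborate, and `ControlLimitSketch` is a hypothetical name, not part of ERMA or Graphite.

```java
// Illustrative only: a basic three-sigma check, far simpler than a full MASF model.
final class ControlLimitSketch {
    private final double mean;
    private final double sigma;

    /** Builds control limits from historical (baseline) measurements. */
    ControlLimitSketch(double[] baseline) {
        if (baseline.length < 2) {
            throw new IllegalArgumentException("need at least two baseline samples");
        }
        double sum = 0.0;
        for (double value : baseline) {
            sum += value;
        }
        mean = sum / baseline.length;
        double squares = 0.0;
        for (double value : baseline) {
            squares += (value - mean) * (value - mean);
        }
        sigma = Math.sqrt(squares / (baseline.length - 1)); // sample standard deviation
    }

    double lowerLimit() {
        return mean - 3 * sigma;
    }

    double upperLimit() {
        return mean + 3 * sigma;
    }

    /** True when the observation lies within the historical control limits. */
    boolean withinLimits(double observed) {
        return observed >= lowerLimit() && observed <= upperLimit();
    }

    public static void main(String[] args) {
        ControlLimitSketch limits =
                new ControlLimitSketch(new double[] {100, 110, 90, 105, 95});
        System.out.println("limits: " + limits.lowerLimit() + " .. " + limits.upperLimit());
        System.out.println("120 ms within limits? " + limits.withinLimits(120));
        System.out.println("130 ms within limits? " + limits.withinLimits(130));
    }
}
```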
Need for Abstraction
abstraction
Webapp
Travel Business Services
Switching Services
Transaction Services
Suppliers
Event Pattern Monitoring
wl|httpIn.shop.search.air.redirect_searchFailure
wl|AirSearchExecuteAction.search
wl|com.orbitz.ojf.OJFClient.getInternal
wl|jiniOut_ShopService_createResultSet
tbs-shop|jiniIn_ShopService_createResultSet
tbs-shop|jiniOut_LowFareSearchService_execute
air-search|jiniIn_LowFareSearchService_execute
air-search|com.orbitz.afo.lib.SearchFilter
air-search|com.orbitz.afo.lib.LowFareSearchServiceImpl.execute
air-search|jiniOut_AirportLookupService_findLocationByIATACode
market|jiniIn_LocationService|DbPoolExhaustedException
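The speaker notes explain that event patterns like the one above come from a per-thread stack of monitors: each new monitor becomes a child of the monitor currently on top of the stack. A minimal sketch of that mechanism, together with a walk that finds the lowest-level failure, could look like this. All names (`MonitorStackSketch`, `Node`, `start`, `done`, `rootCause`) are hypothetical and do not reflect ERMA's actual MonitoringEngine API; the sketch also stays within one JVM and ignores the cross-application propagation the notes mention.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative only: hypothetical names, not ERMA's MonitoringEngine.
final class MonitorStackSketch {

    /** One monitored event and the events that started while it was active. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        boolean failed;

        Node(String name) {
            this.name = name;
        }
    }

    /** Each application thread keeps its own stack of active monitors. */
    private static final ThreadLocal<Deque<Node>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Starts a monitor and links it to the monitor currently on top of the stack. */
    static Node start(String name) {
        Node node = new Node(name);
        Deque<Node> stack = STACK.get();
        Node parent = stack.peek();
        if (parent != null) {
            parent.children.add(node);
        }
        stack.push(node);
        return node;
    }

    /** Finishes a monitor; monitors must finish in reverse order of starting. */
    static void done(Node node) {
        Deque<Node> stack = STACK.get();
        if (stack.peek() != node) {
            throw new IllegalStateException("monitors must finish in LIFO order");
        }
        stack.pop();
    }

    /** Follows failed monitors downward; returns the deepest failed name, or null. */
    static String rootCause(Node node) {
        if (!node.failed) {
            return null;
        }
        for (Node child : node.children) {
            String cause = rootCause(child);
            if (cause != null) {
                return cause;
            }
        }
        return node.name;
    }

    public static void main(String[] args) {
        Node web = start("wl|AirSearchExecuteAction.search");
        Node shop = start("tbs-shop|jiniIn_ShopService_createResultSet");
        Node market = start("market|jiniIn_LocationService");
        market.failed = true;
        done(market);
        shop.failed = true;
        done(shop);
        web.failed = true;
        done(web);
        System.out.println(web.name + " failed due to " + rootCause(web));
    }
}
```

Reporting the top-level monitor together with the deepest failed one gives a single alarm that names both the customer impact and the likely origin.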
Myth: Instrumentation Causes
Bugs
Final Thought
Performance monitoring is easy when the
objects practically measure themselves.
Thank You
• Special thanks to:
– Fellow Co-Authors – Matthew O’Keefe and
Stephen Mullins
– Neil Gunther – Mentoring and Candid Editorial
Review
– Lead Graphite Developer – Chris Davis
Websites
• ERMA Project :
http://launchpad.net/erma
• Graphite Project :
http://launchpad.net/graphite
?
michael@ducy.org
gopaczewski@orbitz.com

Editor's Notes

  1. MD/GAO Hello, I’m Mike Ducy… Hello, I’m Greg Opaczewski, a Tech Lead at Orbitz Worldwide. I’m part of a development team named Operations Architecture. We develop site health and performance monitoring tools, primarily for operations teams, but also for development teams that need to know how their applications are performing in production.
  2. GO I’m excited to say that two of the major technologies in the Orbitz monitoring platform are now open source software. I encourage you to check out the project sites on Launchpad at the URLs listed on the screen. We welcome any feedback you might have for the projects. We will display these again at the end of the presentation as well.
  3. MD SOA/distributed architectures create problems for administration, support and development teams. Instrumentation of the various applications can provide valuable insights into how they interact and can ease the administration headaches. Orbitz Worldwide (OWW) operates dozens of applications running on hundreds of servers connected in a multi-layered Jini network. In this kind of environment it can be difficult to obtain consistent, uniform instrumentation at all application process boundaries. It is also not ideal to require each and every application development team to become experts at leveraging an instrumentation API and monitoring tools.
  4. GO The Orbitz technology platform is very large and the business has grown as well. Orbitz WorldWide operates websites around the world in over a dozen locales. These point of sale and internationalization variables can make service operations even more challenging due to the number of key metrics that must be monitored. In 2007 the sales of over $10 Billion in travel products were dependent on the health of our technology platform. Therefore, OWW has made substantial investments in technology to detect problems early and minimize mean-time-to-repair.
  5. MD It can be difficult for a technology organization to commit to providing the level of application instrumentation required to effectively monitor availability, reliability and performance. In this presentation we will examine several myths and explain how our technology overcame them. We’re huge fans of the Mythbusters show on the Discovery Channel; hopefully we have some fans in the audience as well.
  6. GO Some believe that applying monitoring code has to be a time-consuming process. In many approaches, every method call has to be wrapped with instrumentation code. Additionally, standards need to be defined for how the instrumentation will be applied consistently across a system. This obviously requires additional effort on the part of the development teams as well as the technical leaders responsible for ensuring the standards are being followed. At Orbitz we’ve observed that instrumentation (and monitoring concerns in general) are often the last concerns of developers. Naturally, most or all of the development cycle is spent producing code for new features.
  7. We addressed this need to make the process of applying instrumentation simple by creating the Extremely Reusable Monitoring API (ERMA). ERMA consists of an API used for instrumenting Java applications and a library used to process the data produced by the instrumentation. This separation of concerns makes it easy for developers to apply the instrumentation without needing to be concerned with the details of how the data will be consumed.
  8. GO Monitor objects in ERMA are Plain Old Java Objects (POJOs). To instrument a transaction, you construct a TransactionMonitor. Upon construction a stopwatch is started, which is used to measure latency. The code to be monitored is surrounded with try/catch/finally blocks. If the business code executes without exception, succeeded is invoked on the TransactionMonitor. However, if an exception is caught, it is recorded through the failedDueTo method. In the finally block, done is invoked in order to stop the stopwatch and pass the Monitor to the MonitoringEngine for processing. This is the handoff point to the processors implemented in the ERMA library.
  9. GO So what I’ve shown you on the previous slide is ERMA applied explicitly, wrapped around the business logic by a developer. The API is simple enough to use on its own. But we wanted to make the application of monitoring even easier. So we have implemented several techniques of self-instrumentation in order to achieve monitoring of the business objects with a minimal amount of effort.
  10. GO We use the Spring Framework throughout our system. Spring is a popular open-source framework in the Java development community. Spring MVC and Spring Web Flow (SWF) are used in the web application architecture. Both of these frameworks provide hooks that can be used for monitoring. For example, there is a HandlerInterceptor interface in Spring MVC that we’ve implemented and configured such that each and every web request is intercepted. ERMA is applied to these requests in a consistent and reusable manner. Spring Web Flow acts as the controller: it allows you to define flows between components such as actions and views in a webapp. Web Flow provides the FlowExecutionListener. By implementing this listener interface, we provide detailed metrics on how users are interacting with these flows in production.
  11. GO Abstraction is another important technique we have used. Orbitz applications are networked together using Jini technology. Jini provides for dynamic service discovery and remote invocation. In a service oriented architecture, applications need a way to find out where the services they depend on are running. Jini provides this as well as the ability to add and remove services from the network seamlessly. We created the Orbitz Jini Framework (OJF) in order to abstract away the details of our Jini service network from end developers. The abstraction layer contains a FilterChain facility that we have leveraged for monitoring. ERMA filters are executed both on the client and server side for each and every request. Because OJF is a shared library used consistently across our system, all developers get monitoring of remote method calls for free.
  12. GO In the absence of hooks in the form of APIs that can be leveraged for monitoring, Aspect Oriented Programming (AOP) is another good option for providing reusable monitoring code. Spring provides integration with the popular AspectJ AOP framework. We have implemented an ERMA aspect that applies monitoring to all Action component invocations with just a few lines of reusable XML configuration. Spring creates a dynamic proxy for each Action object once at startup, and overhead at runtime is minimal as just one extra method invocation through the proxy is involved.
  13. GO As a result of these techniques for applying reusable instrumentation, a developer at Orbitz needs to spend almost no time at all to get basic monitoring coverage. The frameworks that we use were instrumented by a small group of platform developers, and many other development teams benefit without the need to spend any additional development time. So we have BUSTED this myth of no time for instrumentation.
  14. MD From a standard ROI perspective, instrumentation does not provide real dollars back for the money invested in its development. The value provided is often in reduced downtime, better understanding of code performance, better understanding of code dependencies and interactions of systems, and opportunities to increase application performance and enhance the customer experience. While from a long-term perspective these enhancements can provide increased revenue, the return is not as immediate as that of implementing something like a new feature.
  15. MD
  16. MD The ERMA-instrumented applications send monitoring data back to the Event Processor engine. The data is sent by a background thread in the instrumented application, which prevents latency from being introduced for the other incoming remote service calls. Because event processing is done outside of the instrumented application, it adds little latency there. The event processor aggregates and summarizes the various metrics, computing summary statistics (average, standard deviation, % fail, % success, etc.), and sends these metrics over to Graphite for storage and visualization. The event processor is also capable of sending SNMP alarms when aggregated data points exceed certain thresholds (e.g. latency is high, or the rate of failures is high).
  17. MD Graphite consists of several components. The 2 primary components are Carbon and the Web Application. The Carbon component is responsible for reading data into the system and storing it in fixed size database files (similar to RRD files). The web application then reads these files to graphically represent the data for the end user.
  18. MD The Graphite composer interface allows you to browse various metrics available for reporting in a hierarchical tree. When a metric is selected, a graph of that metric’s data is drawn in the composer interface. The user can manipulate the graph by selecting size, duration of the data to be graphed, as well as other elements.
  19. MD The Graphite Command Line Interface allows a user to draw graphs in individual windows. These windows can be arranged and sized within the browser window. The window layout can also be saved, which allows a user to create a “dashboard” of commonly used graphs.
  20. MD
  21. MD Value is not in the instrumentation itself, but in the data that the instrumentation provides. Gartner estimates that an hour of downtime costs an organization $42,000 on average. Instrumentation data can help reduce the length of outages by making it easier for operators to locate the problem (via SNMP alarms), and through the tools used to visualize the data.
  22. MD Instrumentation provides a Return On Investment by maximizing the ROI of the applications that are monitored.
  23. GO Another myth that we’d like to address is the belief that instrumentation only causes bugs. Boilerplate code often used to apply instrumentation makes code harder to read and maintain. This has a direct effect on developer productivity. It also gives developers an argument to not add the instrumentation at all. We use several techniques that allow our developers to avoid the need to write boilerplate code. We provide reusable, well-tested instrumentation packaged in libraries and applied via hooks, abstraction and AOP as described previously.
  24. GO Another good option to avoid boilerplate code with ERMA is Annotations. This feature, supported with Java 5 and above, applies instrumentation at build time and requires no ERMA code to be mixed with business code. The example shown here will apply an ERMA TransactionMonitor to each method in this service.
  25. GO This example will wrap a TransactionMonitor around only the purchase method. Setting includeArguments to true will include method parameters in the monitor object as an attribute. The nice thing about this approach is how cleanly separated the business code is from the monitoring. It is simply declarative monitoring versus intrusive instrumentation.
  26. MD Our use of ERMA has introduced very few bugs. In fact, far more bugs have been uncovered using the ERMA data.
  27. MD Instrumentation data allows you to baseline your current application performance against historical data. You can also use instrumentation data to build theoretical models to help verify that testing tools are correctly measuring application performance.
  28. MD Historical instrumentation data can be used to build models based on Multivariate Adaptive Statistical Filtering and Statistical Process Control. This allows you to determine if your current application is performing within historical bounds and if something has changed.
  29. GO For a large system there is a need to provide an abstraction for monitoring so that developers, operators and business analysts can all share the same language for describing system functionality. Example abstractions from our domain are “hotel search”, “air purchase”, “package selection”, etc. ERMA has some unique design features that enable detailed monitoring put in the context of these abstractions. ERMA assembles hierarchies of events transparently within its MonitoringEngine component. It does this by maintaining a stack of Monitors for each application thread. Whenever a new monitor is created during request processing, a parent-child relationship is introduced with the Monitor previously on top of the stack. At the completion of request processing, the result is a tree data structure that can be analyzed to find event patterns. Our Jini framework passes monitoring data back and forth, allowing these event patterns to even span the boundaries of all applications involved in servicing a user request. As a result, we can accelerate root cause analysis by delivering alarms to our operations teams that contain both the low-level root cause of a problem and the impact to our customer.
  30. GO What you are looking at here is an example of an ERMA event pattern captured from an air search request in our system. We present these patterns to the operator in such a way that it is obvious where an exception originated and how it bubbled up through the stack. Yellow represents any monitor that has recorded a failure and red represents the lowest-level failure. We use this information to zero in on the application and component that is contributing most significantly to a site issue. An example alarm that may be sent to our operations center based on this data would read “Air search is failing at 80% due to a market application DbPoolExhaustedException.” So it is very clear as to the top-level impact (air search is failing) and it points to the underlying issue (likely that there are no available database connections). Before we implemented this approach, an alarm would be generated for every failure in this pattern and our operators would be left trying to figure out the bigger picture. The improved alarms help ensure proper development resources are engaged quickly when support teams are troubleshooting production issues. They also help to prioritize action on alarm conditions by making clear the impact to our customers. ERMA patterns also enable you to drill down into latency metrics in order to see which components are contributing the most to latency. We are working on a user interface that will make it easier to visualize this kind of data, for example by generating dynamic UML sequence diagrams based on the runtime behavior of the system.
  31. GO So in our experience many more bugs are uncovered using the data produced by instrumentation than are caused by it. Bugs that would otherwise be difficult or impossible to diagnose without the instrumentation. So the myth that instrumentation only causes bugs is busted.
  32. MD ??? GO We have been pragmatic in the design of our monitoring platform and tools. We have acknowledged that developers are focused on implementing new features and site improvements, so monitoring of all core metrics is already in place. These include JVM and machine-level statistics such as CPU, memory and threads, as well as resource pools such as database connections and network connections to external suppliers. Frameworks contain monitoring of our business services and detailed monitoring of every request into the web application. Lastly, we have invested in tools that allow us to get tremendous value out of the instrumentation. These tools translate detailed metric data into an improved customer experience.
  33. GAO I want to thank CMG for allowing us to share our story with everyone here. I too want to thank Neil Gunther again for all of his help in putting the paper and this presentation together.
  34. MD Both ERMA and Graphite have been open sourced by Orbitz Worldwide and the teams welcome your feedback and contributions.