Presented at That Conference 2017 (https://www.thatconference.com/sessions/session/11460).
Monitoring systems are like any other large code project: they need maintenance and the occasional refactor. Doing it right means knowing where you're going, and knowing where you're going also means knowing how to get the project approved. Let me help you with that.
Dynatrace: The untouchables - the Dynatrace offering here and now (Dynatrace)
It's almost impossible to keep up with the rate of innovation that our global R&D teams deliver, and 2017 has been one for the record books. In this session, a collection of our 'untouchable' tech geniuses serve up a rapid-fire rundown of what's hot right now in Dynatrace.
Barbri: barbri's journey from on-prem to cloud, featuring auto-remediation wi... (Laura Stack)
More than three years into our journey, Dynatrace AI has provided a path full of lessons. Moving our application environment from an on-prem data center to a cloud-only environment taught us lessons we will share with you: the importance of ease of use, sizing challenges, and the automation required to run, not merely crawl, in the cloud. Auto-remediation is no longer a buzzword; by leveraging Salt, it became reality for us.
Critical online success factors with Dynatrace (DynatraceANZ)
This webinar discusses three critical success factors for online success: event readiness, third party readiness, and mobile channel readiness. It emphasizes the importance of load testing websites, understanding the impact of third party content, and optimizing websites and apps for mobile. The webinar provides examples and recommendations for monitoring and improving performance across these three factors.
Customer case - Dynatrace Monitoring Redefined (Michel Duruel)
One of the largest airlines in the world chose Dynatrace; here is the customer case.
Including:
Vision and Goal / Challenges / Requirements / Why Dynatrace is Unique / ROI and TCO / Rollout Status / Solution Screenshots
Dynatrace redefined monitoring with AI-powered 3rd-generation APM: user experience monitoring and continuous improvement; cloud-native, full stack, auto-everything, end-to-end; easiest to implement, use, and maintain.
This document discusses making automated full-stack monitoring a platform feature using BOSH. It recommends uploading the Dynatrace BOSH release, updating the runtime configuration, and deploying to automatically monitor all components and containers. This provides full-stack visibility into performance and availability across development, test, and production environments.
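The deployment steps described above can be sketched as a BOSH runtime-config addon. This is a hedged illustration only: the release name, job name, and property keys are assumptions for the sketch, not the actual Dynatrace BOSH release manifest.

```yaml
# Illustrative runtime-config addon (names and versions are assumed):
# after uploading the release (e.g. `bosh upload-release dynatrace-oneagent.tgz`),
# an addon like this attaches the monitoring job to every VM BOSH deploys.
releases:
- name: dynatrace-oneagent     # assumed release name
  version: "1.0.0"             # placeholder version
addons:
- name: dynatrace
  jobs:
  - name: dynatrace-oneagent   # assumed job name within the release
    release: dynatrace-oneagent
    properties:
      dynatrace:
        environmentid: YOUR-ENVIRONMENT-ID   # placeholder credential
        apitoken: YOUR-API-TOKEN             # placeholder credential
```

Applying it with `bosh update-runtime-config runtime.yml` and redeploying would then roll the agent out across development, test, and production environments alike.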
The Digital Experience Report: Best of the Web 2016 (Dynatrace)
The Digital Revolution is transforming the way every industry engages with customers, making the user experience a critical competitive differentiator.
From retail to banking to media, business results and reputation depend on consistently delivering good performance. So who are the Best of the Web?
Dynatrace reveals the companies in retail, banking, insurance, brokerage, news media, air travel, and hotels who exceeded their peers in web performance, availability, and user experience in 2015.
• Learn which companies earned the distinction “Best of the Web”.
• Discover unique industry performance standards.
• Hear the best practices that enabled leaders to outpace their competitors.
• Analyze user experience trends – how did 2015 compare with prior results?
Learn what it takes to deliver the best digital performance to customers!
Cloud-Native Workshop New York - Dynatrace (VMware Tanzu)
Dynatrace is an enterprise digital performance management company and leader in application performance monitoring. It supports over 8,000 customers including 75% of the Fortune 500. The document discusses how Dynatrace has evolved with application development from traditional applications to modern cloud-native and microservices architectures. It highlights how Dynatrace complements Pivotal Cloud Foundry by providing full-stack visibility, automated problem detection and resolution, and faster deployment of applications on PCF through its BOSH add-on.
Paypal, Barbri: Lost in the cloud? Top challenges facing CIOs in a cloud nati... (Dynatrace)
At the heart of this panel-led discussion sits a new piece of global research that looks at the big issues faced by today's technical leaders. From growing IT complexity, siloed monitoring, and limited resources to the rapid pace of new tech adoption, this discussion will hit all the hot topics while also serving up lessons for everyone.
SAP: How SAP fully automates the provisioning and operations of its dynatrace... (Dynatrace)
SAP Cloud Platform is using Dynatrace Managed for internal monitoring of its infrastructure, services, and SAP's own applications all across the world. The multi-IaaS strategy of SAP Cloud Platform takes it beyond SAP's own data centers to multiple IaaS providers. SAP's high standards for governance and data privacy demand that monitoring data sits side by side with the systems it is collected from, which means that there are Dynatrace Managed clusters beside most SAP Cloud Platform landscapes. These Dynatrace clusters are installed, updated, and managed the same way as all other components of SAP Cloud Platform, including validation pipelines that distinguish between development, testing, pre-production, and production landscapes. Join us to gain insight into how to automatically deploy and operate Dynatrace clusters at scale, reducing manual labor and focusing on the unique insights that Digital Performance Management delivers.
Experian: Dynatrace real time feedback changed the development culture at exp... (Dynatrace)
At Experian, we unlock the power of data to create opportunities for consumers, businesses, and society. Our customers rely on us for a positive user experience as we help them make decisions at some of the most crucial times in their professional lives. The faster the application process, the better the user experience; when we do not process fast enough, it affects not only our customers' financial bottom line but their reputation. Since Dynatrace was introduced into the SDLC, this added tool in our toolbox has completely changed our engineering and business culture. The operations teams have benefited enormously from its real-time usage, performance, and quality feedback to the application development teams. By allowing Experian to find and react faster, and in a more automated fashion, we are able to ensure the health of our running applications while also delivering better quality and improved performance in our application releases. In this session we learn how real user monitoring data is leveraged within Experian, how we monitor 3rd-party REST providers, and how we enforce our own as well as 3rd-party SLAs across the DevOps pipeline!
Virgin Money: Virgin Money's quest for digital performance perfection (Dynatrace)
With more than 3.2 million customers and a vastly complex tech landscape, Virgin Money's IT team faces huge pressure to provide the ultimate digital banking experience. In this candid Q&A session, Andy Lofthouse will dive into the company's journey from alert storm and countless hours of problem hunting, to rapid release cycles and precise digital experience insights, which has saved the company inordinate amounts of time and money.
Best Practices for Continuous Delivery in Financial Services (Dynatrace)
The document discusses best practices for continuous delivery in financial services. It advocates for starting with the customer experience rather than the application, preventing problems before they impact customers. Gap-free data across the entire delivery chain is essential. The goal is enabling DevOps rather than just operations. Dynatrace provides unified user and application insights to help optimize spend, improve release quality, and keep up with changing customer expectations.
New Farming Methods in the Epistemological Wasteland of Application Security (James Wickett)
Over the years, application security (appsec) has made progress, but it has also made some considerable missteps. Appsec focuses almost solely on developer awareness and secure development training as remediation. This isn't sustainable and arguably does little good. There is a better way, but we have to separate ourselves from the core assumptions that got us here. Let's journey together to find old truths and better approaches.
We will explore ways to make a change for the better across all levels of the development lifecycle, but we will focus on security testing early on in the development process. From this session, you will learn pragmatic approaches and tooling that will affect your development processes and delivery pipelines. You will walk away with code examples and tools that you can put into practice right away for security and rugged testing.
http://lascon.org
http://lascon2015.sched.org/event/175e3c828095386b2fa0fc660b2502a3
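In the spirit of the early-pipeline security testing the session advocates, here is a minimal, hypothetical "rugged" check. The required header set and the `missing_security_headers` helper are illustrative assumptions; a real test would fetch live response headers from the service under test.

```python
# Hypothetical pipeline gate: assert that an HTTP response carries a
# baseline set of security headers. The required set is illustrative.
REQUIRED_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
}

def missing_security_headers(headers):
    """Return the required security headers that are absent or wrong."""
    return [
        name for name, expected in REQUIRED_HEADERS.items()
        if headers.get(name) != expected
    ]

# A response missing X-Frame-Options would fail the gate:
sample = {"X-Content-Type-Options": "nosniff"}
print(missing_security_headers(sample))  # ['X-Frame-Options']
```

Run as part of CI, a non-empty result fails the build, shifting the security check left instead of waiting for a late-stage audit.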
This document discusses Go-Jek's approach to reliability at scale. It outlines four iterations of improving reliability: (1) initializing infrastructure with CI/CD, monitoring, and microservices; (2) improving incident handling with reporting, reviews, and defining response processes; (3) enhancing monitoring and alerting with routing, categorization, and embedded SREs; and (4) engineering for failures through techniques like containerization, circuit breakers, and reducing dependencies. The document concludes by noting reliability is iterative and emphasizes allowing mistakes to improve. Go-Jek has grown significantly from 4 products in 2015 to 18 products today while scaling infrastructure from 100 to over 8,000 instances across datacenters.
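The circuit-breaker technique mentioned above can be sketched in a few lines. This is a generic illustration under assumed thresholds, not Go-Jek's implementation:

```python
import time

class CircuitBreaker:
    """Minimal illustrative circuit breaker. After `max_failures`
    consecutive errors the circuit opens and calls fail fast until
    `reset_timeout` seconds pass, protecting a struggling backend."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Wrapping calls to a flaky dependency in `cb.call(...)` makes the caller fail fast while the circuit is open, instead of piling up timeouts against a backend that is already in trouble.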
Zurich: Monitoring a sales force-based insurance application using dynatrace ... (Dynatrace)
Digital transformation drives us at Zurich North America (ZNA) to provide a universal desktop for insurance underwriting. The system uses a common Salesforce front-end to access numerous legacy applications. With high demands on performance and usability, monitoring is key but not easy. In this session we will look at ZNA's monitoring strategy and why we picked a layered approach, including in-depth legacy application monitoring, operational monitoring of the Salesforce Lightning front-end, and end-to-end business process monitoring.
Case Study: Citrix Adopts DevOps Principles to Gain Efficiency and Speed Soft... (CA Technologies)
Excited by the promise of DevOps and Continuous Delivery principles, Citrix turned to CA Release Automation to get them started. Learn how Citrix was able to speed application deployment times by 80% and address key pain points with their manual and scripted processes, while working to shift their culture to better embrace DevOps principles.
For more information, please visit http://cainc.to/Nv2VOe
PPT Presented at Morton's Steak House in San Francisco. It covers the Monitoring Redefined message as well as how Dynatrace transformed to maintain market share in the new world.
apidays LIVE New York 2021 - Microservice Authorization with Open Policy Agen... (apidays)
The document discusses using Open Policy Agent (OPA) for microservice application authorization. It describes the new authorization challenges of moving from monoliths to microservices, and how OPA can be used to enforce consistent authorization policies across microservices through a service mesh. It provides examples of how OPA policies can be used for user authorization, service authorization, and context-aware authorization.
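A minimal policy sketch of the kind described, in OPA's Rego language. The package name, input fields, and rules here are illustrative assumptions, not policies from the talk:

```rego
package httpapi.authz

# Deny by default; grant access only when a rule below matches.
default allow = false

# User authorization: a user may GET their own record.
allow {
    input.method == "GET"
    input.path == ["records", input.user]
}

# Service authorization: an assumed billing service may read invoices.
allow {
    input.method == "GET"
    input.path[0] == "invoices"
    input.caller == "billing-service"
}
```

In a service-mesh deployment, a sidecar would pass each request's method, path, and caller identity to OPA as `input` and enforce the resulting `allow` decision, giving every microservice the same authorization logic without embedding it in application code.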
Motadata - Unified Product Suite for IT Operations and Big Data Analytics (novsela)
Motadata is a unified IT infrastructure monitoring, log and flow management, and IT service management platform. It offers operational insights into your IT infrastructure and its performance, and is designed to identify and resolve complex problems faster, ensuring 100% uptime of all business-critical components. Motadata enables you to make more informed business decisions by offering complete visibility into the health and key performance indicators (KPIs) of IT services. It helps reduce CAPEX, offers agility to resolve issues faster, is compatible with hybrid ecosystems, and integrates easily with existing and future platforms.
In summary, with Motadata, Mindarray Systems offers the perfect solution needed to confidently handle the challenges of today’s increasingly complex business operations and IT infrastructure management.
For more information: nov.sela@gmail.com
Dynatrace: Davis - HoloLens - AI update - Cloud announcements - Self-driving IT (Dynatrace)
Dynatrace announced new features for their AI assistant Davis including notifications for Amazon Echo and Slack. They also discussed plans to further integrate Davis with Dynatrace search capabilities. The company announced a new Innovator Program providing hardware, workshops and early access to new features for 15 participants paying an annual $25k fee. Finally, they demonstrated a new integration with Microsoft HoloLens and discussed how Davis is built on Dynatrace APIs to provide multimodal interfaces for the future.
The document discusses the Government Digital Service's Performance Platform. It provides an overview of the platform and what it can do, including automating data collection, simplifying data presentation, and enabling evidence-based decision making. Case studies are presented showing how different government services have engaged with the Performance Platform, and the benefits they have experienced, such as increased efficiency and compliance. Advice is provided on how services can connect to the platform and have their own customized dashboards built.
NEW LAUNCH! Introducing AWS IoT Analytics - IOT214 - re:Invent 2017 (Amazon Web Services)
The document discusses AWS IoT Analytics and provides an overview of its components and capabilities. It describes how AWS IoT Analytics can be used to collect, preprocess, enrich, store, analyze and visualize IoT device data at scale. Examples are given of how various customers like Valmet and iDevices are using AWS IoT Analytics for applications like predictive maintenance, product optimization, and gaining business insights.
Join us for a closer look at new IT analytics solutions from CA that will help you reduce costs and optimize the customer experience by increasing resource utilization, reducing system outages and allowing for better capacity planning of mainframe resources. See how you can perform root cause analysis in addition to correlating and analyzing data from multiple IT sources to provide better management understanding and real-time prediction of system performance conflicts while lowering MTTR and enabling more efficient mainframe operations. Take part in this highly interactive session, learn how customer-driven agile development capabilities are being prioritized and be a part of shaping the future of new IT analytics innovations at CA.
For more information, please visit http://cainc.to/Nv2VOe
CWIN17 telford api management, practical implementation experience - david ru... (Capgemini)
This document summarizes a presentation on API management, shift left and right testing strategies, and lessons learned. The presentation discusses managing APIs throughout the development lifecycle, from API-first design to continuous integration and testing in production environments. It promotes testing approaches like test-driven development (TDD) to enable rapid delivery and monitoring APIs in production to rapidly resolve issues. The presentation concludes that recognizing APIs as enabling business change, adopting API-first architectural strategies, and embracing both shift left and right testing are important to effective API management.
Meetup 27/6/2018: AIOps to support the challenges of a smart city (Digipolis Antwerpen)
• Natural language interface to all automation
• Autonomous cloud: self-driving operations with AI
• Security: autonomous detection and response
Together, these initiatives will transform IT and enable autonomous clouds.
DEM04 Fearless: From Monolith to Serverless with Dynatrace (Amazon Web Services)
When you break your monolith into components, services, or functions, you must understand where and how to break your existing code base and architecture into smaller units so that it scales, performs, and is easy to operate. In this session, Andreas Grabner, technical AWS advocate, shows you how Dynatrace redefined its architecture. He discusses the migration capabilities Dynatrace engineers built into their product and explains how the lessons learned can help you fearlessly transition from monolith to serverless. This session is brought to you by AWS Partner, Dynatrace.
DEM09 [Repeat] Fearless: From Monolith to Serverless with Dynatrace (Amazon Web Services)
Dynatrace is a monitoring platform that can help companies migrate from monolithic architectures to microservices and serverless architectures. It uses AI to automatically map dependencies, detect where to split up monoliths, validate performance and scalability at each step, and provide automated root cause analysis. Dynatrace monitoring and APIs help optimize architectures, automate deployments, and enable self-healing throughout the migration process.
MuleSoft Surat Meetup#39 - Pragmatic API-Led Connectivity (Jitendra Bafna)
This document provides an overview of a MuleSoft meetup on API-led connectivity. It includes introductions of the organizers and agenda. The agenda discusses designing RESTful reusable APIs, how API-led fits into architecture, and an example use case. It also covers when system APIs may be useful, such as to address security issues, improve error handling, reduce complexity, and improve third party APIs. The document emphasizes that core or business APIs are the essential layer and other layers like system or process APIs are optimizations that need only be built if necessary.
MuleSoft Surat Meetup#39 - Pragmatic API Led ConnectivityJitendra Bafna
This document provides an overview of a MuleSoft meetup on API-led connectivity. It includes introductions of the organizers and agenda. The agenda discusses designing RESTful reusable APIs, how API-led fits into architecture, and an example use case. It also covers when system APIs may be useful, such as to address security issues, improve error handling, reduce complexity, and improve third party APIs. The document emphasizes that core or business APIs are the essential layer and other layers like system or process APIs are optimizations that need only be built if necessary.
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
Discover the innovative features and strategic vision that keep WSO2 an industry leader. Explore the exciting 2024 roadmap of WSO2 API management, showcasing innovations, unified APIM/APK control plane, natural language API interaction, and cloud native agility. Discover how open source solutions, microservices architecture, and cloud native technologies unlock seamless API management in today's dynamic landscapes. Leave with a clear blueprint to revolutionize your API journey and achieve industry success!
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Amazon Web Services
Amazon Kinesis makes it easy to speed up the time it takes for you to get valuable, real-time insights from your streaming data. In this session, we walk through the most popular applications that customers implement using Amazon Kinesis, including streaming extract-transform-load, continuous metric generation, and responsive analytics. Our customer Autodesk joins us to describe how they created real-time metrics generation and analytics using Amazon Kinesis and Amazon Elasticsearch Service. They walk us through their architecture and the best practices they learned in building and deploying their real-time analytics solution.
NFL and Forwood Safety Deploy Business Analytics at Scale with Amazon QuickSi...Amazon Web Services
The document discusses how Forwood Safety and the National Football League use Amazon QuickSight for business analytics. Forwood Safety uses QuickSight to analyze safety verification data from millions of checks to identify potentially fatal issues. The NFL uses QuickSight for its Next Gen Stats to provide real-time player tracking data to media, teams, and broadcasters. QuickSight provides benefits like easy scalability, no server management, and pay per use pricing.
Microservics, serverless and real time; Building blocks of the modern data pi...Manisha Sule
The document discusses modern approaches for building data pipelines, including microservices, serverless computing, and real-time analytics. It describes how microservices break large applications into independent services, and the advantages this provides. Serverless computing is defined as an event-driven execution model where cloud providers manage infrastructure. Real-time analytics involves processing and analyzing data as it becomes available. Examples of AWS and Google Cloud services for implementing real-time analytics are also provided.
The document discusses IBM's BPM, API Management, and Application Performance Management solutions. It provides pricing and licensing details for IBM Process Center and Process Server for BPM, as well as rental and on-premise pricing models for IBM DataPower Gateway, API Management, and Application Monitoring solutions. It also identifies potential areas of opportunity and provides an overview of IBM's Application Performance Management SaaS offering.
Predicting Startup Market Trends based on the news and social media - Albert ...GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
Nowadays, one tweet can have impact on the value of the company or cryptocurrency. It becomes important for companies to be able to know everything what's happening in the market, especially for startups or when entering the new market. The presentation is about presenting the complex platform used for creating and verifying the strategy for a startup from the Wellbeing market. We go through web scraping-based data ingestion to ElasticSearch, NLP pipelines to understand what people write and what is the possible future of each market predicted by PySpark job.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Similar to That Conference 2017: Refactoring your Monitoring (20)
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
3. Today’s Climb
● Overview
● Your monitoring stack
● Deciding what to monitor
● The monitoring project-plan
● Extra: Humane on-call rotations
@sysadm1138 | That Conference 2017
5. This is your stack. Really
● Polling Engine
● Reporting Engine
● User Interface
● Aggregation Engine
● Alerting Engine
● API
● Humans
● Policy Engine
6. Scheduled Tasks & PowerShell
A scheduler runs scripts on a schedule. The scripts emit email or update a database.
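Even a scheduler-plus-scripts stack hits most of the boxes. A minimal sketch of one such scheduled script (in Python rather than PowerShell for brevity; the metric, threshold, and database name are all made up for illustration):

```python
import shutil
import sqlite3
import time

def check_disk(path="/"):
    """One poll: free-space percentage for a filesystem."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total

def record(db, metric, value):
    """The 'update a database' half: append one sample."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples (ts REAL, metric TEXT, value REAL)")
    db.execute("INSERT INTO samples VALUES (?, ?, ?)",
               (time.time(), metric, value))
    db.commit()

if __name__ == "__main__":
    # The 'scheduler' half is cron or Task Scheduler invoking this script.
    db = sqlite3.connect("monitoring.db")
    free_pct = check_disk()
    record(db, "disk_free_pct", free_pct)
    # Crude inline alerting; in the full stack the Policy Engine owns this threshold.
    if free_pct < 10:
        print(f"ALERT: only {free_pct:.1f}% disk free")
```

The weakness this slide implies is visible right in the sketch: polling, aggregation, alerting, and policy are all tangled into one script.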
10. Aggregation Engine
Turns raw data into useful data.
● Summarizes over time (think RRDTool)
● Does stats (min/max/%-tile) on incoming stream.
● Summarizes over system/rack/datacenter
No one (except possibly Google) keeps full-granularity monitoring logs forever and ever in a trivially queryable way. Too expensive, and you don’t usually care about 2 years ago.
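The time-summarization idea can be sketched in a few lines. This is an illustrative toy, not any particular tool’s implementation; the bucket size and the choice of 95th percentile are assumptions:

```python
from statistics import quantiles

def rollup(samples, bucket_seconds=300):
    """Summarize (timestamp, value) samples into fixed time buckets,
    keeping min/max/95th-percentile instead of every raw point."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(int(ts // bucket_seconds), []).append(value)
    out = []
    for bucket, values in sorted(buckets.items()):
        # quantiles() needs at least two points; fall back for singletons
        p95 = quantiles(values, n=100)[94] if len(values) > 1 else values[0]
        out.append({
            "start": bucket * bucket_seconds,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "p95": p95,
        })
    return out
```

Each bucket keeps five numbers instead of hundreds of raw samples, which is the whole trade: cheap long-term storage for loss of fine detail.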
11. Alerting Engine
Bothering humans in realtime!
● May do analytics.
● May be threshold-based, or trigger on very sophisticated conditions.
● Scripts that send email every time.
● Scripts that drop notices in group-chat.
● Night-operator calling the Systems Engineers.
● PagerDuty.
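A toy version of the threshold-based path, with made-up rule and channel shapes (real engines evaluate far richer conditions):

```python
def evaluate(value, rule):
    """Threshold-based alerting: fire when the value crosses the rule's bound."""
    if rule["op"] == ">":
        return value > rule["threshold"]
    return value < rule["threshold"]

def dispatch(alarm, channels):
    """Fan the alarm out: email scripts, group-chat notices, a pager service..."""
    return [channel(alarm) for channel in channels]

# Hypothetical wiring: channels are plain callables here.
rule = {"metric": "disk_free_pct", "op": "<", "threshold": 10}
if evaluate(4.2, rule):
    dispatch("disk_free_pct is 4.2 (< 10)",
             [lambda a: f"email: {a}", lambda a: f"chat: {a}"])
```

The interesting design question is entirely in the rule table: which rules exist, and who each channel bothers, belongs to the Policy Engine, not this code.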
12. Reporting Engine
Bothering humans on a lag!
● Long-term trends
● Capacity analysis
● Growth tracking
● Full-bore big-data analytics
● SLA pass/fail reporting
● Track user behaviors across features
● BA building reports for executives
13. API
Programmatic interfaces into your monitoring system.
● Build feedback systems
● Manage policy-engine details
● Could be your CM system
Good monitoring systems have APIs. APIs make them easier to integrate with, and integration is usage.
14. User Interface
How humans interface with it.
A monitoring system with a bad user-interface is a bad monitoring system.
- Jamie Riedesel, lots of times
I’ve seen things.
15. User Interface
To access a previous job’s monitoring system:
1. Open a browser.
2. Log in using 2-factor to our SSL-VPN.
3. Connect to RDP using same password as VPN.
4. Open another browser.
5. Hit Monitoring site.
6. Using non SSO-ed password, log in.
7. See what’s going on.
16. Policy Engine
This defines the behavior of each stage of the stack.
Configured as part of the User Interface and API.
17. Policy Engine + Polling Engine
● How often are things polled?
○ Every 10s, 1m, 2m, 5m, 1d?
● Does polling get paused for maintenance-windows?
● What data gets reported to the Aggregation Engine?
18. Policy Engine + Aggregation Engine
● How long do you keep data at all?
● How long do you keep full granularity data?
● How long do you keep summarized data?
● Where do you keep full granularity data?
● Where do you keep summarized data?
● How do you summarize data?
○ Time? System? Location?
● Do maintenance windows affect any of this?
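As a concrete example of answering the retention questions, Graphite (one of the tools named later for capacity monitoring) encodes this policy in storage-schemas.conf. The metric pattern and retention values below are illustrative only:

```ini
# storage-schemas.conf: per-metric retention policy (illustrative values)
[app_metrics]
pattern = ^app\.
# keep 10-second samples for a day, 1-minute rollups for a week,
# and 10-minute rollups for a year
retentions = 10s:1d,1m:7d,10m:1y
```

Each frequency:history pair is a Policy Engine answer to "how long do you keep full-granularity vs. summarized data" for metrics matching the pattern.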
19. Policy Engine + Alerting Engine
● Which alarms merit bothering humans?
● Which alarms merit automatic fixing?
● Which alarms can be ignored?
● How do maintenance-windows impact alarms?
● What escalation policies are in place?
20. Policy Engine + Reporting Engine
● Do reports get automatically generated?
● What reports are viewable on-demand?
● What reports are defined?
● Are ad-hoc reports possible?
● Who gets automatically generated reports?
● What trends are we looking for?
21. That’s cleared up!
● Polling Engine
● Reporting Engine
● User Interface
● Aggregation Engine
● Alerting Engine
● API
● Humans
● Policy Engine
22. Deciding What To Monitor
PLANNING THE APPROACH
23. Different Kinds of Monitoring
Granularity and goals differ from type to type. Be aware of these as you build your system.
● Performance Monitoring
● Operational Monitoring
● Capacity Monitoring
● SLA Monitoring
24. Performance Monitoring
Granularity: Very high (10s, 1s, or even sub-second)
Duration: As-needed
Response: Realtime
Tools: Procmon, Wireshark, strace, perf, Performance Monitor, gdb
Typically done as part of debugging, troubleshooting, and profiling activities. Granularity is much higher than operational monitoring. Typically, results are reviewed in near realtime and not persisted long.
25. Operational Monitoring
Granularity: Medium (1m, 2m, 5m, 10m, 1h, etc)
Duration: Continuous
Response: Rapid
Tools: Dell OpenManage, HP Operations Manager, Cisco OpManager, NetApp
What most people think of when you say monitoring (but they’re wrong). This type of monitoring catches the health of your infrastructure and is not directly related to the services it provides. Think disk replacements, switch failures, and tornados.
26. This one is easy
OPERATIONAL MONITORING
The SLA for this is: our infrastructure can support the delivery of our products and services.
● Switch failures.
● Disk failures.
● Blade-chassis failures.
● UPS failures.
● PSU / PDU failures.
● Compliance failures.
27. Capacity Monitoring
Granularity: Low (1h, 1d, 1w, 1m)
Duration: Continual or occasional
Response: Slow
Tools: Grafana, Kibana, Graphite, Nagios, Excel
Monitoring the capacity of your system to do work. Lead times can be quite long for some replacements (SAN arrays), and capacity can be budgetary more than hardware, especially in cloud contexts.
28. How much do I need, and when do I need it?
CAPACITY MONITORING
Every product or service uses consumables. This is where you track them:
● Disk-space
● Cloud budget
● Overtime allowance
● P1 incident usage
● SmartHands budget
29. Service Level Agreement Monitoring
Granularity: Medium to Low
Duration: Continual
Response: Rapid and Slow
Tools: Everything
Monitoring to detect whether or not you’re meeting your SLA for a given service or services. Where most monitoring really exists.
30. This one is complicated
SERVICE LEVEL AGREEMENT MONITORING
How your product or service is supposed to perform. Not just executives care about SLAs.
SLA: Service Level Agreement
SLO: Service Level Objectives
SLI: Service Level Indicators
We’ll get into these.
31. What if we don’t have SLAs? That’s like… commitment. We avoid that around here!
32. What if we don’t have SLAs? That’s like… commitment. We avoid that around here!
Yes, you have an SLA.
No, really. You do.
33. DE FACTO SERVICE LEVEL AGREEMENT
The service is up when our users need it to be.
And if it isn’t, they’re allowed to slag us on Twitter.
34. DE FACTO SERVICE LEVEL AGREEMENT
The service is up when our users need it to be.
And if it isn’t, they’re allowed to slag us on Twitter.
In short: 100% uptime, or your reputation will be hauled through the meat-grinder.
35. DEFINED SERVICE LEVEL AGREEMENT
We promise X availability, on penalty of Y things, outside of Q maintenance periods. Planned outages will have no less than Z days notice...
Less likely to end up as a meme on Twitter. This can be a 100% internal-only document!
36. DEFINITIONS
Service Level Agreement (SLA): An agreement, written in Human; or sometimes Lawyer. Sets goalposts, defines penalties (if any), defines terms.
Service Level Objective (SLO): A set of objectives, written in Engineer. Technical definition of the goalposts in the SLA.
Service Level Indicator (SLI): Something that tells you whether or not you’re meeting your SLO.
37. SLOs - SERVICE LEVEL OBJECTIVES
SLA: The service is up 99.99% of the time, not including scheduled maintenance.
38. SLOs - SERVICE LEVEL OBJECTIVES
SLA: The service is up 99.99% of the time, not including scheduled maintenance.
● The settings page renders in under 10 seconds.
● The site returns HTTP-200 from Europe within 2 seconds.
● Branch-office ADC01 can reach the service.
● 98%-tile end-to-end request time is not more than 3 seconds.
● The SSL certificate is valid and chains to our CA.
● The text, “Welcome to Example Co,” is on the main page.
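Each SLO like the ones above eventually becomes a concrete check in the Polling Engine. A hedged sketch of two of them; check_http_slo, the injected fetch callable, and the page text are illustrative, not any real library’s API:

```python
import time

def check_http_slo(fetch, url, max_seconds=2.0):
    """SLO check: fetch(url) must return HTTP 200 within max_seconds.
    `fetch` is injected so this works with urllib, requests, or a test stub."""
    start = time.monotonic()
    status = fetch(url)
    elapsed = time.monotonic() - start
    return {"ok": status == 200 and elapsed <= max_seconds,
            "status": status, "seconds": round(elapsed, 3)}

def check_page_text(body, expected="Welcome to Example Co"):
    """SLO check: the expected text is on the main page."""
    return expected in body
```

Checks like these, run from the right vantage points (e.g. a probe in Europe for the 2-second SLO), are exactly the SLIs the next slides get into.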
39. SLOs - SERVICE LEVEL OBJECTIVES: HasDCDoneSomethingStupidToday.com
SLA: The site is up 99.99% of the time, not including scheduled maintenance.
SLO:
● Site is reachable.
● The site is showing the right content.
● Scheduled maintenance is tracked.
40. SLOs - SERVICE LEVEL OBJECTIVES: University Print Services
SLA: Printing is available in Computer Labs 99.99% of the time, outside of scheduled closures and maintenance.
SLO:
● Every Computer Lab has at least one working printer with paper.
● Printers service only the central print queues.
● The swipe-card terminal in Computer Labs must work for the printers to be considered ‘working’.
● Printers do not work if they can’t talk to the payment processor.
41. SLIs - SERVICE LEVEL INDICATORS: Specific monitorables!
SLO: The settings page renders in under 10 seconds.
SLI:
● Logins work.
● Page render-time from same data-center.
● Page render-time from Europe.
● Database disk-queue length.
42. SLIs - SERVICE LEVEL INDICATORS: Specific monitorables!
SLO: 98%-tile end-to-end request time is not more than 3 seconds.
SLI:
● Time-to-process for all requests.
● Request processing was functional no more than 30 seconds ago.
● 10-minute 98th-percentile request-time average.
● 10-minute 50th-percentile request-time average.
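The 98th-percentile-over-10-minutes SLI above can be sketched as a rolling window. PercentileSLI and the nearest-rank percentile method are assumptions for illustration; real systems would stream this through the Aggregation Engine:

```python
from collections import deque

class PercentileSLI:
    """Rolling 98th-percentile of request times over a fixed time window."""
    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, seconds), in arrival order

    def observe(self, ts, seconds):
        """Record one request, evicting samples older than the window."""
        self.samples.append((ts, seconds))
        while self.samples and self.samples[0][0] < ts - self.window:
            self.samples.popleft()

    def p98(self):
        values = sorted(s for _, s in self.samples)
        if not values:
            return None
        # nearest-rank percentile: index of the 98th-percentile value
        idx = max(0, int(len(values) * 0.98) - 1)
        return values[idx]

    def breaching(self, limit=3.0):
        """True when the SLI currently violates the 3-second SLO bound."""
        p = self.p98()
        return p is not None and p > limit
```

Note how the SLI tolerates a few slow requests by design: two outliers in a hundred do not breach a 98th-percentile objective, which is precisely why percentiles beat averages for this SLO.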
43. DEFINITIONS
Service Level Agreement (SLA): An agreement, written in Human; or sometimes Lawyer. Sets goalposts, defines penalties (if any), defines terms.
Service Level Objective (SLO): A set of objectives, written in Engineer. Technical definition of the goalposts in the SLA.
Service Level Indicator (SLI): Something that tells you whether or not you’re meeting your SLO.
44. DEFINITIONS
Alarm: Informing humans of failing SLI/SLOs in realtime.
Report: Eventually informing humans of failing SLI/SLOs.
Which humans do you bother for each SLI/SLO? Only you can figure that out!
45. GOOD ALARMS
Specific: Must tell me something specific is wrong.
Alarms that require a human to log in to figure out what is actually wrong, if anything is, are bad alarms.
FYI alarms lead to high cognitive load and decrease worker satisfaction.
46. GOOD ALARMS
Actionable: Must be something I can directly fix.
Getting alarmed for things you can’t fix is a great road to burnout. These are especially great at 3:19 AM.
The failure mode is teaching people that some alarms can be ignored safely. Eventually, they’ll ignore the wrong one. This is bad.
47. GOOD ALARMS
Format Agnostic: Don’t be a dick about format.
If a team wants full HTML with links to runbooks and wiki-pages, let ’em.
If a team wants the entire alert to fit into their iPhone lock-screen, let ’em.
Better, allow both!
50. Step 0: Get Approval For The Project
● If it’s just you, that’s easy! Do it.
● A good monitoring product is used by many people.
○ Get buy-in from not just IT, but sales, support, etc.
● Pitch to the business case, not process improvement for your department.
○ We will reduce customer churn by enabling our CSMs.
○ We will improve our reaction time to reputation-impacting events.
○ This will increase buy-in from other departments, enabling our IT goals.
51. Step 1: Figure out high-level needs (SLA)
● If you have a written one? Great! Work backwards from that.
● If you have an unwritten one, ask people to see what they think it is.
○ Play 20-questions with higher-level execs on the impacts of down-time and service degradations.
○ Point out the de facto SLA, see how they react.
○ Point out we don’t need to publish the SLA to our customers, but can have one internally.
● If you have microservices, each service will need its own SLA.
52. Step 2: Figure out concrete definitions (SLO)
● Now that you have an SLA, or many SLAs, do the analysis to determine what ‘up’ and ‘responsive’ mean in a concrete way.
● Ask other people to get involved. Involvement keeps the project rolling.
● This is an opportunity for education with business leaders.
53. Step 3: Figure out specific monitorables (SLI)
● Take your SLO list and figure out how to monitor for each.
● You may need to monitor new things.
● You may be able to stop monitoring/alarming some other things.
● Magic happens: your first opportunity to turn off existing alarms!
54. Step 4: Figure out how to monitor those things
● Some of this may already exist. If so, cool.
● Some may need to be monitored in a different way.
● Some may need to be monitored for the first time.
● This defines how the Polling Engine works.
● Build new engines if you need to.
● Poll direct measurements where you can; try not to use proxy measurements.
55. Step 5: Decide on your aggregation techniques
● Some of this may already exist. If so, cool.
● Perhaps you don’t need to keep data as long as you thought.
● Perhaps you need to keep high granularity data longer than you thought.
● Perhaps you need to start tracking things like percentiles and standard-deviations.
● This defines how the Aggregation Engine works.
56. Step 6: Alert Definition (Operational & SLA monitoring)
● Some of this may already exist. If so, cool.
● Figure out who needs to know what and how fast they need to know it.
● One-person shop? Easy!
● Ops team of 80? There will be meetings.
○ Work with each group individually.
○ Be flexible with requirements in each.
○ Don’t force communications-format standards without good cause.
○ Ensure the alarms are specific and actionable.
57. Step 7: Report Definition (Capacity & SLA monitoring)
● Some of this may already exist. If so, cool.
● Figure out how to write the pass/fail report for your SLAs.
● Determine what kind of response-times are needed to address SLA risks.
● Determine what kind of response-times are needed for capacity risks.
● Determine who gets what.
58. Step 8: Periodic Review
● Run the system for a while.
● Come back 3 months, 6 months later and ask questions.
○ How are the alarms working for you?
○ What changes do you think need to be made?
○ What new things have shown up?
● Especially important for departments that haven’t been attached to a monitoring system before.
59. Step 0: Get approval
Step 1: Figure out high level needs (Service Level Agreement)
Step 2: Turn that into concrete definitions (Service Level Objectives)
Step 3: Figure out specific monitorables (Service Level Indicators)
Step 4: Decide how to monitor it (Polling Engine)
Step 5: Determine aggregation requirements (Aggregation Engine)
Step 6: Define Alerts (Operational and SLA monitoring)
Step 7: Define Reports (Capacity and SLA monitoring)
Step 8: Periodic Review
61. Step 9: Post-Incident Review Questions (Project Maintenance)
1. Did the monitoring system see the problem?
2. Did we react to the monitoring system, or humans?
3. Is it worth our time to catch this problem in the monitoring system?
4. What changes do we need to make, including to alerts, to deal with this in the future?