An introductory talk on Chaos Engineering, featuring Chaos Toolkit and ChaosIQ, which provide Chaos for Cloud Native Microservices.
The live streamed video of the talk being given at WorldPay is available on Twitter: https://www.pscp.tv/w/1DXGyEzMrRWGM?t=9
From Chaos to Verification at Expedia Group, London Russell Miles
Chaos engineering delivers evidence of system weakness; system verification helps chaos engineering bring context and business value so that you can make better decisions about where to focus your resources to improve a system's reliability.
This talk was given by Russ Miles, CEO of ChaosIQ, at the London Chaos and Resilience Engineering meetup on 28/01/2020
Choose your own adventure Chaos Engineering - QCon NYC 2017 Nora Jones
#6 Top Rated Talk for QCon New York 2017 on how to get started with Chaos Engineering. Provides both high-level talk on the practice of Chaos Engineering and pointed advice on best practices from bringing Chaos Engineering to Jet.com and working on Chaos Engineering at Netflix.
Trust and Confidence through Chaos Keynote for W-JAX Munich 2018 Russell Miles
Keynote delivered for W-JAX in Munich in November 2018 on how you can use Chaos Engineering as part of establishing your own Resilience Engineering capability.
Break stuff - Confessions of a misguided chaos engineer Russell Miles
In this talk I walk through the many unfortunate mistakes people make when adopting chaos engineering. Sharing the pain, so you can hopefully avoid it.
Chaos Engineering: Why the World Needs More Resilient Systems C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2luk9iS.
Tammy Butow shares her experiences using chaos engineering to build resilient systems, when they couldn’t build their systems from scratch. Filmed at qconlondon.com.
Tammy Butow is a Principal SRE at Gremlin where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Previously, she led SRE teams at Dropbox responsible for Databases and Storage systems used by over 500 million customers.
General overview of what "Chaos Engineering" is, the "perturbation models" currently available, and the benefits of Chaos Engineering to Customers, Business and Tech.
Chaos Engineering when you're not Netflix Martez Reed
This document discusses chaos engineering and how organizations that are not Netflix can implement it. It begins with defining chaos engineering as experimenting on systems to build confidence in their ability to withstand turbulent conditions. It then discusses why Netflix uses chaos engineering due to their large scale microservices architecture. While most organizations are not the size of Netflix, the document outlines how chaos engineering can still be beneficial by challenging common assumptions about architectures and validating system resilience. It provides examples of chaos engineering experiments and tools that can be used to implement chaos engineering.
Security incident response is a reactive and chaotic exercise. What if it were possible to flip the scenario on its head? Security focused chaos engineering takes the approach of advancing the security incident response apparatus by reversing the postmortem and preparation phases. Contrary to Purple Team or Red Team game days, Security Chaos Engineering does not use threat actor tactics, techniques and procedures. It develops teams through unique configuration, cyber threat and user error scenarios that challenge responders to react to events outside their playbooks and comfort zones.
Security Chaos Engineering allows incident response and product teams to derive new information about the state of security within their distributed systems that was previously unknown. Within this new paradigm of instrumentation, where we proactively conduct “Pre-Incident” vs. “Post-Incident” reviews, we are now able to measure more accurately how effective our security incident response teams, tools, skills, and procedures are during the mania of the Incident Response function.
In this session Aaron Rinehart, the mind behind the first Open Source Security Chaos Engineering tool ChaoSlingr, will introduce how Security Chaos Engineering can be applied to create highly secure, performant, and resilient distributed systems.
The document is a presentation about infrastructure automation for the cloud. It discusses how infrastructure is changing with the rise of cloud computing and how this necessitates new approaches like treating infrastructure as code. It advocates for techniques like configuration management, version controlling all components, building from source code, enabling one step deployments, continuous monitoring, and integrating development and operations teams through a DevOps culture and shared processes. The overall goal is to enable agile infrastructure that allows for business agility and a faster time to market.
The document discusses concepts related to game days and chaos engineering on AWS. It provides examples of chaos experiments that can be conducted, such as resource exhaustion, network unreliability, and datastore saturation. It also discusses tools for chaos engineering like Chaos Toolkit and Simian Army. The goal of game days and chaos engineering is to test system resilience by simulating failures and disasters to gain insights on how to improve system reliability.
This is a presentation I gave to 100+ people at Rev1 Ventures in Columbus, OH. The presentation was about how to define DevOps. Like any new concept, there are multiple and sometimes competing definitions. I've found that implementations of DevOps can change but there are some very common anti-patterns. Lastly, I talk about how we implement DevOps at Bold Penguin.
GameDay - Achieving resilience through Chaos Engineering DiUS
http://dius.com.au/resources/game-day/
Agility has brought us iterative software development, independent feature teams, nimble architectures and distributed, scalable infrastructure. But how do you maintain confidence in these systems in the face of this emergent complexity and fast paced change? The answer is to anticipate and practice failure!
In this session we explore GameDays, a collaborative exercise where teams safely introduce chaos into their systems, in order to make them better.
DevOpsDays PGH: How to Fail With One Weird Trick Pete Cheslock
1. The document is a series of tweets by Pete Cheslock about DevOps failures and successes.
2. Cheslock discusses how organizational structure, priorities, and culture can enable or prevent effective DevOps practices.
3. He emphasizes building strong, collaborative teams over job titles and encourages learning from failures as an organization rather than assigning blame.
AllDayDevOps: DevSecOps & Chaos Engineering: Knowing the Unknown Aaron Rinehart
This document discusses how DevSecOps and chaos engineering can be used to test systems and build confidence in their ability to withstand turbulent conditions. It provides an overview of how a large healthcare company faces challenges due to its size, complexity, and diverse technology portfolio. The document advocates using chaos engineering experiments to gain objective understanding of security, validate security incident response plans, discover new insights about security tools and processes, and build a learning culture around security. It acknowledges that systems are becoming more unpredictable and difficult for humans to understand, and that chaos engineering can help determine how security defenses actually work.
Adventures in a Microservice world at REA Group evanbottcher
This document discusses lessons learned from working in a microservices environment. Some key points include:
- Create an isolated development environment for each service that is as simple as possible.
- The team that builds a service should also be responsible for supporting it.
- Be cautious about shared code and resources between services to avoid coupling.
- Create templates or patterns for common service types to standardize development.
- Consider the costs and benefits of different types of automated tests.
- Putting services into production quickly provides the best testing environment.
- Assign long-term responsibility and support for each service through "service custodianship."
- Establish delivery engineering practices to continuously improve deployment processes.
Why We Can't Have Nice Things, A Tale of Woe and a Hope For the Future Pete Cheslock
Watch this talk here: https://vimeo.com/129822165
DevOpsDays Austin Talk.
Computers are hard, and security is even harder. Let's discuss things to do when you have a dedicated Infosec team, and tools you can use when you don't.
Business Continuity for Humans: Keeping Your Business Running When Your Peopl... Rundeck
This document discusses enabling business continuity when employees are unavailable by focusing on adaptive capacity. It recommends decentralizing platforms, communication, and knowledge through approaches like cloud-native engineering, modern communication tools, and runbook automation. Runbook automation involves capturing expert knowledge in automated runbooks to standardize responses and allow anyone to handle incidents. The document advocates testing capabilities regularly through everyday operations to prepare for disruptions and becoming a learning organization that treats incidents as opportunities. The goal is to move beyond legacy business continuity strategies that may be undermined by increasing complexity and change.
Everyone has a plan until... Automacon16 Pete Cheslock
The document discusses the challenges of automation and scaling systems over time. It notes that initial plans for automation often fail to account for the complexities of scaling up over the long run. Several technologies and tools are listed that have been used in automation efforts. The document advocates for a mindset of continuous improvement rather than premature optimization, and accepting that environments and tools will change over time.
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
Despite the growing abundance of powerful tools, building and deploying machine-learning frameworks into production continues to be a major challenge, in both science and industry. I'll present some particular pain points and cautions for practitioners as well as recent work addressing some of the nagging issues. I advocate for a systems view, which, when expanded beyond the algorithms and codes to the organizational ecosystem, places some interesting constraints on the teams tasked with development and stewardship of ML products.
About: Dr. Joshua Bloom is an astronomy professor at the University of California, Berkeley where he teaches high-energy astrophysics and Python for data scientists. He has published over 250 refereed articles largely on time-domain transient events and telescope/insight automation. His book on gamma-ray bursts, a technical introduction for physical scientists, was published recently by Princeton University Press. He is also co-founder and CTO of wise.io, a startup based in Berkeley. Josh has been awarded the Pierce Prize from the American Astronomical Society; he is also a former Sloan Fellow, Junior Fellow at the Harvard Society, and Hertz Foundation Fellow. He holds a PhD from Caltech and degrees from Harvard and Cambridge University.
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A... DevOpsDays Tel Aviv
One of the critical factors for development velocity is software correctness. Our ability to develop and ship new features fast is bounded by our ability to validate several aspects of the change:
- Does the feature meet the requirements?
- How does the feature affect existing code, and how can it affect the production environment?
With continuous codebase growth and new features being added, our productivity naturally decreases, and our need to improve the guarantees for quality and correctness increases.
In this talk, I’ll focus on testing environments: why developers need a self-serve platform to create a fully functioning environment on demand, how such environments should be managed, and how one can restore part of the lost velocity. I’ll cover an internal system we use at AppsFlyer called ‘Namespaces’ that addresses the issue with the help of Mesos / Marathon, Docker, Traefik, and Consul.
What makes a “good” service is a moving target. Technologies and requirements change over time. It can be impossible to ensure that none of your services have been left behind.
The Service ScoreCard approach is to have a small check for each service initiative we have. A check can be anything measurable: deployment frequency, whether everyone on the on-call team has a phone, or whether the service runs the latest version of the JVM.
The Service ScoreCard gives each service a grade from 'F' to 'A+', based on passing or failing the list of checks. As soon as anyone sees a service's grade slipping, everyone rallies to improve it.
We can then set up rules based on the grades, such as “Only B and above services can deploy 24 / 7”, “moratorium on services without an A+” or “No SRE support for services below C grade”.
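As a rough illustration of the grading idea described above, the following Python sketch turns a list of pass/fail checks into a letter grade and applies one grade-based rule. The talk does not say how grades are computed, so the percentage thresholds, the `Check` type, the field names in the service dictionary and the `LATEST_JVM` value are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical values: the abstract does not define these.
LATEST_JVM = 21
GRADE_ORDER = ["F", "D", "C", "B", "A", "A+"]
# Minimum percentage of passing checks needed for each grade (assumed).
GRADE_THRESHOLDS = [(100, "A+"), (90, "A"), (80, "B"), (70, "C"), (60, "D")]


@dataclass
class Check:
    """A small, measurable pass/fail check applied to one service."""
    name: str
    passes: Callable[[Dict], bool]


def grade_service(service: Dict, checks: List[Check]) -> str:
    """Grade a service by the percentage of checks it passes."""
    if not checks:
        raise ValueError("at least one check is required")
    passed = sum(1 for check in checks if check.passes(service))
    percent = 100 * passed // len(checks)
    for minimum, grade in GRADE_THRESHOLDS:
        if percent >= minimum:
            return grade
    return "F"


def at_least(grade: str, minimum: str) -> bool:
    """True if `grade` is the same as or better than `minimum`."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(minimum)


CHECKS = [
    Check("deploys at least weekly", lambda s: s["deploys_per_week"] >= 1),
    Check("on-call team all have phones",
          lambda s: s["oncall_with_phone"] == s["oncall_size"]),
    Check("runs the latest JVM", lambda s: s["jvm_version"] == LATEST_JVM),
]

if __name__ == "__main__":
    service = {"deploys_per_week": 5, "oncall_with_phone": 4,
               "oncall_size": 4, "jvm_version": 11}
    grade = grade_service(service, CHECKS)
    # Rule from the abstract: only B and above may deploy 24 / 7.
    print(grade, "can deploy 24/7:", at_least(grade, "B"))
```

Keeping each check as a small independent function means a new service initiative can be added to the scorecard by appending one entry to the list.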
Albert Witteveen - With Cloud Computing Who Needs Performance Testing TEST Huddle
EuroSTAR Software Testing Conference 2013 presentation on With Cloud Computing Who Needs Performance Testing by Albert Witteveen.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
Practical Chaos Engineering will show how to start running chaos experiments in your infrastructure and will try to guide you through the principles of chaos.
The practical implementation of Continuous Delivery at Etsy, and how it enables the engineering team to build features quickly, refactor and change architecture, and respond to problems in production.
Presented at GOTO Aarhus 2012.
Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.
http://www.etsy.com/careers
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop... DataWorks Summit
Dr. Elephant is a self-serve performance tuning tool for Hadoop that was created by LinkedIn to address the challenges their engineers faced in optimizing Hadoop performance. It automatically monitors completed Hadoop jobs to collect diagnostic information and identifies performance issues. It provides a dashboard and search interface for users to analyze job performance and get help tuning jobs. The goal is to help every user get the best performance without imposing a heavy time burden for learning or troubleshooting.
H2O.ai basic components and model deployment pipeline presented. Benchmark for scalability, speed and accuracy of machine learning libraries for classification presented from https://github.com/szilard/benchm-ml.
Building Reactive Applications With Akka And Java Tu Pham
Reactive applications share four key traits: responsive, resilient, message-driven, and elastic. The Akka framework supports building reactive systems using the actor model which represents processing, storage, and communication as actors that receive messages asynchronously. Actors can send messages to other known actors, create new actors, and handle the next received message. This allows reactive applications to scale dynamically to meet demand and tolerate failures through message passing and supervision hierarchies.
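The three things an actor can do in response to a message (send messages to actors it knows, create new actors, and decide how to handle the next message) can be sketched in a few lines of Python. This is not Akka and does not use its API: `ActorSystem`, `spawn`, `send` and the behavior functions below are hypothetical names for a single-threaded toy, and it leaves out supervision and real asynchrony.

```python
from collections import deque


class ActorSystem:
    """Toy single-threaded actor runtime (illustrative, not Akka)."""

    def __init__(self):
        self._behaviors = {}    # address -> current behavior function
        self._queue = deque()   # pending (address, message) pairs
        self._next_address = 0

    def spawn(self, behavior):
        """Create a new actor and return its address."""
        address = self._next_address
        self._next_address += 1
        self._behaviors[address] = behavior
        return address

    def send(self, address, message):
        """Enqueue a message; it is handled later, never inline."""
        self._queue.append((address, message))

    def run(self):
        """Deliver queued messages one at a time until none remain."""
        while self._queue:
            address, message = self._queue.popleft()
            behavior = self._behaviors[address]
            next_behavior = behavior(self, address, message)
            if next_behavior is not None:
                # The actor chose how to handle its next message.
                self._behaviors[address] = next_behavior


def counter(total):
    """Behavior holding a running total as immutable state."""
    def behavior(system, self_address, message):
        kind, payload = message
        if kind == "add":
            return counter(total + payload)         # new behavior
        if kind == "report":
            system.send(payload, ("total", total))  # message a known actor
        return None
    return behavior


def collector(results):
    """Behavior that records every message it receives."""
    def behavior(system, self_address, message):
        results.append(message)
    return behavior


def parent(reply_to):
    """Behavior that creates a child actor for each batch of numbers."""
    def behavior(system, self_address, message):
        child = system.spawn(counter(0))            # create a new actor
        for number in message:
            system.send(child, ("add", number))
        system.send(child, ("report", reply_to))
    return behavior


if __name__ == "__main__":
    results = []
    system = ActorSystem()
    sink = system.spawn(collector(results))
    root = system.spawn(parent(sink))
    system.send(root, [1, 2, 3])
    system.run()
    print(results)
```

Because state lives only in the behavior closure and changes only between messages, no locks are needed, which is the property that lets real actor runtimes distribute actors across threads and machines.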
You are already the Duke of DevOps: you have mastered CI/CD, you have feature teams that include ops skills, and your TTM rocks! But you have some difficulty scaling it: you have quality issues and your QoS is at risk. You are quick to adopt practices that increase flexibility of development and velocity of deployment. An urgent question follows on the heels of these benefits: how much confidence can we have in the complex systems that we put into production? Let’s talk about the next hype of DevOps: SRE, error budgets, continuous quality, observability, Chaos Engineering.
This document discusses building full stack data analytics applications using Apache Kafka and Apache Spark. It provides an overview of agile data science principles and methodologies. It also outlines various tools that can be used in the data pipeline and stack, such as Apache Spark, Apache Kafka, MongoDB, Elasticsearch, and d3.js. It discusses considerations for data structure and access patterns, as well as climbing the data value pyramid from raw data to higher order insights.
DOES14 - Scott Prugh - CSG - DevOps and Lean in Legacy EnvironmentsGene Kim
10 Techniques for Flow & Continuous Delivery
Startups are continually evangelizing DevOps to be able to reduce risk, hasten feedback and deploy 1000’s of times a day. But what about the rest of the world that comes from Waterfall, Mainframes, Long Release Cycles and Risk Aversion? Learn how one company went from 480 day lead times and 6 month releases to 3 month releases with high levels of automation and increased quality across disparate legacy environments. We will discuss how Optimizing People & Organizations, Increasing the Rate of Learning, Deploying Innovative Tools and Lean System Thinking can help large scale enterprises increase throughput while decreasing cost and risk.
WinOps Conf 2016 - Michael Greene - Release PipelinesWinOps Conf
There are benefits to be gained when patterns and practices from developer techniques are applied to operations. Notably, a fully automated solution where infrastructure is managed as code and all changes are automatically validated before reaching production. This is a process shift that is recognized among industry innovators. For organizations already leveraging these processes, it should be clear how to leverage Microsoft platforms. For organizations that are new to the topic, it should be clear how to bring this process to your environment and what it means to your organizational culture. This presentation explains the components of a Release Pipeline for configuration as code, the value to operations, and solutions that are used when designing a new Release Pipeline architecture.
We all know not to poke at alien life forms on another planet, right? But what about metrics: do you know how to pick, measure and draw conclusions from them? In this talk we will cover various Site Reliability Engineering topics, such as SLIs and SLOs, while we explore real-life examples of defining and implementing metrics in a system, using Prometheus, an open-source system monitoring and alerting platform, to demonstrate implementation. Let's get back to some real science.
White Paper On ConCurrency For PCMS Application ArchitectureShahzad
This document discusses various approaches to implementing optimistic and pessimistic concurrency in different technologies like .NET, ASP.NET, NHibernate, and LINQ to SQL. It provides code examples and explanations of how to configure optimistic concurrency checks in database queries and handle concurrency violations. Sections cover topics like implementing optimistic concurrency for ADO.NET data adapters, ASP.NET, NHibernate mapping, and LINQ to SQL. Pessimistic concurrency is also briefly introduced along with references for further reading.
Fault-tolerant and thus distributed applications are hard, but Mesos is the ultimate breeding ground for them. Rule number one for designing such beasts is to use well-known, tried and tested, and in particular proven methods and tools as building blocks.
Video recording of the talk: https://www.youtube.com/watch?v=6O_Wuc9FUXg&index=1&list=PLbzoR-pLrL6pLSHrXSg7IYgzSlkOh132K
Similar to muCon 2017 - Build Confidence in your System with Chaos Engineering (20)
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Paul Brebner
Closing talk for the Performance Engineering track at Community Over Code EU (Bratislava, Slovakia, June 5 2024) https://eu.communityovercode.org/sessions/2024/why-apache-kafka-clusters-are-like-galaxies-and-other-cosmic-kafka-quandaries-explored/ Instaclustr (now part of NetApp) manages 100s of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs. vertical scalability, and predicting Kafka performance using metrics, modelling and regression techniques. These insights are relevant to Kafka developers and operators.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfVALiNTRY360
Salesforce Healthcare CRM, implemented by VALiNTRY360, revolutionizes patient management by enhancing patient engagement, streamlining administrative processes, and improving care coordination. Its advanced analytics, robust security, and seamless integration with telehealth services ensure that healthcare providers can deliver personalized, efficient, and secure patient care. By automating routine tasks and providing actionable insights, Salesforce Healthcare CRM enables healthcare providers to focus on delivering high-quality care, leading to better patient outcomes and higher satisfaction. VALiNTRY360's expertise ensures a tailored solution that meets the unique needs of any healthcare practice, from small clinics to large hospital systems.
For more info visit us https://valintry360.com/solutions/health-life-sciences
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
How Can Hiring A Mobile App Development Company Help Your Business Grow?ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Project Management: The Role of Project Dashboards.pdfKarya Keeper
Project management is a crucial aspect of any organization, ensuring that projects are completed efficiently and effectively. One of the key tools used in project management is the project dashboard, which provides a comprehensive view of project progress and performance. In this article, we will explore the role of project dashboards in project management, highlighting their key features and benefits.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Drona Infotech is a premier mobile app development company in Noida, providing cutting-edge solutions for businesses.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
Malibou Pitch Deck For Its €3M Seed Roundsjcobrien
French start-up Malibou raised a €3 million Seed Round to develop its payroll and human resources management platform for VSEs and SMEs. The financing round was led by investors Breega, Y Combinator, and FCVC.
The most important new features of Oracle 23c for DBAs and developers. For more detail, see the video on my YouTube channel: https://youtu.be/XvL5WtaC20A
8 Best Automated Android App Testing Tool and Framework in 2024.pdfkalichargn70th171
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
15. "It's just one of these cases where Mars is going to give us a new deal, and we're going to have to play the cards we get, not the ones we want."
Jim Erickson / Project Manager at NASA for Mars Rover missions
21. One might rephrase “calculation and correction of error” as “recognition of and response to difference”.
Jeff Sussna / Designing Delivery: Rethinking IT in the Digital Service Economy
50. What does normal look like? Your steady state
{
  "probes": {
    "steady": {
      "title": "All services must be healthy before we begin",
      "layer": "application",
      "type": "python",
      "module": "chaosk8s.probes",
      "func": "all_microservices_healthy"
    }
  }
}
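A declaration like this names a Python module and function instead of embedding code. As a rough sketch of how such a declaration could be resolved and called: the helper `run_probe` and the stand-in declaration below are hypothetical, use only the standard library, and are not Chaos Toolkit's actual implementation.

```python
import importlib


def run_probe(declaration):
    """Import the module named in a probe declaration and call its function."""
    module = importlib.import_module(declaration["module"])
    func = getattr(module, declaration["func"])
    # "arguments" is optional, as in the steady-state probe on the slide
    return func(**declaration.get("arguments", {}))


# Stand-in declaration (assumption): it targets the standard library so the
# sketch runs without a cluster; a real experiment would name chaosk8s.probes.
demo_probe = {
    "title": "Stand-in probe for illustration",
    "type": "python",
    "module": "os.path",
    "func": "basename",
    "arguments": {"p": "/var/log/app.log"},
}

print(run_probe(demo_probe))
```

Keeping the probe as data means the experiment file stays declarative, and the same runner can call any importable function.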
51. Add sources of information with probes
"probes": {
  "close": {
    "title": "Fetch the CPU usage for our service",
    "layer": "application",
    "type": "python",
    "module": "chaosprometheus.probes",
    "func": "query",
    "arguments": {
      "query": "process_cpu_seconds_total{job='websvc'}",
      "when": "2 minutes ago"
    }
  }
}
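To make the probe arguments more concrete, here is a minimal sketch of how a query and a relative time could be turned into a request against Prometheus' instant-query endpoint (`/api/v1/query`, which accepts `query` and `time` parameters). The function `build_query_url`, the server address, and the fixed "now" are assumptions for illustration; the slides do not show how the `chaosprometheus` extension does this internally.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_query_url(base_url, query, minutes_ago, now=None):
    """Build an instant-query URL for Prometheus' /api/v1/query endpoint."""
    now = now or datetime.now(timezone.utc)
    when = now - timedelta(minutes=minutes_ago)
    # Prometheus accepts a Unix timestamp for the "time" parameter
    params = urlencode({"query": query, "time": when.timestamp()})
    return f"{base_url}/api/v1/query?{params}"


url = build_query_url(
    "http://localhost:9090",  # assumed Prometheus address
    "process_cpu_seconds_total{job='websvc'}",
    minutes_ago=2,
    # fixed reference time so the example is reproducible
    now=datetime(2017, 10, 6, 17, 37, 33, tzinfo=timezone.utc),
)
print(url)
```

The sketch only builds the URL; sending the request and interpreting the result are left out.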
52. Set the condition for change in normality
"action": {
  "title": "Let's max out the CPU of a node",
  "layer": "application",
  "type": "python",
  "module": "chaosgremlin.actions",
  "func": "attack",
  "secrets": "gremlin",
  "arguments": {
    "command": {
      "type": "cpu"
    },
    "target": {
      "type": "Random"
    }
  }
}
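The pieces so far (a steady-state probe and an action that changes conditions) suggest a simple flow: confirm normality, introduce the change, then check normality again. The following toy sketch shows that flow only; `run_experiment` and the in-memory `system` are hypothetical stand-ins, not the Chaos Toolkit runner or a real attack.

```python
def run_experiment(steady_probe, action):
    """Check the steady state, apply a change, then check the steady state again."""
    if not steady_probe():
        return "steady state not met; experiment not started"
    action()
    if steady_probe():
        return "steady state held"
    return "steady state failed; weakness found"


# Toy system (assumption): one flag stands in for the real service under test.
system = {"healthy": True}


def service_healthy():
    return system["healthy"]


def degrade_service():
    # Stand-in for a real action such as a CPU attack on a node
    system["healthy"] = False


result = run_experiment(service_healthy, degrade_service)
print(result)
```

Checking the steady state before acting matters: if the system is already unhealthy, the experiment would teach nothing about the change itself.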
53. Before learning
$ chaos run experiment.json
[2017-10-06 17:37:33 INFO] Running experiment: System is resilient to provider's failures
[2017-10-06 17:37:33 INFO] Observing steady state: All services must be healthy before we begin
[2017-10-06 17:37:33 INFO] Steady State succeeded
[2017-10-06 17:37:33 INFO] Observing steady state: Before we kill it, our microservice should be alive
[2017-10-06 17:37:33 INFO] Steady State succeeded
[2017-10-06 17:37:33 INFO] Observing action: Let's stop our provider
[2017-10-06 17:37:33 INFO] Action succeeded
[2017-10-06 17:37:33 INFO] Observing close state: All services must be healthy before we begin
[2017-10-06 17:37:33 INFO] Close State succeeded
[2017-10-06 17:37:33 INFO] Observing steady state: Consumer should respond as if nothing
[2017-10-06 17:37:44 ERROR] Steady State failed: {"timestamp":1507304264100,"status":500,"error":"Internal Server Error","exception":"feign.RetryableException","message":"connect timed out executing GET http://my-provider-service:8080/","path":"/invokeConsumedService"}
[2017-10-06 17:37:44 INFO] Experiment is now complete
54. Respond to the non-functional force of change
Do not merely correct the error
55. Adaptation
$ chaos run experiment.json
[2017-10-06 17:40:25 INFO] Running experiment: System is resilient to provider's failures
[2017-10-06 17:40:25 INFO] Observing steady state: All services must be healthy before we begin
[2017-10-06 17:40:25 INFO] Steady State succeeded
[2017-10-06 17:40:25 INFO] Observing steady state: Before we kill it, our microservice should be alive
[2017-10-06 17:40:26 INFO] Steady State succeeded
[2017-10-06 17:40:26 INFO] Observing action: Let's stop our provider
[2017-10-06 17:40:26 INFO] Action succeeded
[2017-10-06 17:40:26 INFO] Observing close state: All services must be healthy before we begin
[2017-10-06 17:40:26 INFO] Close State succeeded
[2017-10-06 17:40:26 INFO] Observing steady state: Consumer should respond as if nothing
[2017-10-06 17:40:30 INFO] Steady State succeeded
[2017-10-06 17:40:30 INFO] Experiment is now complete
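The slides do not say what changed in the consumer between the failing run and this passing one. One common adaptation, shown here purely as an assumption, is to give the consumer a fallback so that a provider timeout no longer surfaces as an error. All function names below are hypothetical.

```python
def call_provider(provider_up):
    """Stand-in for the remote call; raises when the provider is stopped."""
    if not provider_up:
        raise TimeoutError("connect timed out")
    return "live response"


def consumer_before(provider_up):
    # No handling: a provider failure surfaces as a consumer failure.
    return call_provider(provider_up)


def consumer_after(provider_up, fallback="cached response"):
    # One possible adaptation: degrade gracefully instead of failing.
    try:
        return call_provider(provider_up)
    except TimeoutError:
        return fallback


print(consumer_after(provider_up=False))
```

With the provider stopped, `consumer_before` raises while `consumer_after` still answers, which is the kind of difference the second run's final steady-state check would detect.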