This document discusses rethinking the application of Site Reliability Engineering (SRE) principles to IT Service Management (ITSM). It notes that while SRE has innovative concepts like focusing on eliminating toil and defining service level objectives, the name "SRE" implies a specialist developer role and may not apply to many ITSM contexts. It also cautions that most organizations are not at the scale of Google and a separate production operations team may not be efficient. The document proposes rebranding the SRE concept for ITSM with a new name and identifying new ways to apply key SRE principles to reduce toil and improve reliability within common ITSM frameworks and practices.
Getting started with Site Reliability Engineering (SRE)Abeer R
"Getting started with Site Reliability Engineering (SRE): A guide to improving systems reliability at production"
This is an intro guide to share some of the common concepts of SRE to a non-technical audience. We will look at both technical and organizational changes that should be adopted to increase operational efficiency, ultimately benefiting for global optimizations - such as minimize downtime, improve systems architecture & infrastructure:
- improving incident response
- Defining error budgets
- Better monitoring of systems
- Getting the best out of systems alerting
- Eliminating manual, repetitive actions (toils) by automation
- Designing better on-call shifts/rotations
How to design the role of the Site Reliability Engineer (who effectively works between application development teams and operations support teams)
SRE (service reliability engineer) on big DevOps platform running on the clou...DevClub_lv
SRE (service reliability engineer). The talk is to explain the SRE philosophy and the principles of production engineering and operations in clouds.
(Language – English)
Pavlo is ADOP (Accenture DevOps Platform) Service Reliability Team Lead, SRE practitioner. Has more then 18 years of IT experience in Ops and Dev.
Adopting Kubernetes for production has huge impacts on operations at all levels. We present our pattern for formalizing cluster operations as a separate role from infrastructure and application operations, and explore the impact on the role of the SRE.
<p>From <a href="https://en.wikipedia.org/wiki/Site_reliability_engineering" target="_blank">Wikipedia</a>: Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to operations whose goals are to create ultra-scalable and highly reliable software systems.<p>
<p>Over the past year Acquia has built their own SRE team to help their products and services scale with the demand of our growing number of customers. We wish to share our experience so that others are enabled to do the same and reap the rewards.</p>
<p>This presentation will discuss how the SRE team came about at Acquia, what achievements we have made so far, and the lessons we have learned along the way. We will then show the steps on how to introduce SRE to your workplace so you can deliver more reliable and scalable services to your customers! We will specifically cover:</p>
<ul>
<li>SRE's basic concepts and history from Google</li>
<li>The management support you will need to get started</li>
<li>Introducing the idea of service level objectives and error budgets</li>
<li>Operational Responsibility Assessments as a tool to measure risk</li>
<li>Creating a Launch Readiness Checklist to standardize and improve product launches</li>
<li>Finding ideal candidates for your SRE team</li></ul>
<p>The intended audience are software engineers, system administrators, and managers that have a desire to improve how they do their work and how their products/services perform.</p>
How to bootstrap an SRE team into your company. How to hire them, what to have them work on and how to interact with them as a team. Finally some thought on general practices to consider before your SREs arrive. There are also kitten pictures.
Site Reliability Engineering (SRE) - Tech Talk by Keet SugathadasaKeet Sugathadasa
When it comes to Site Reliability Engineering, short for SRE, the resources available online are only limited to the books published by Google themselves. They do share some useful case studies that will help us understand what SRE is, and how to understand the concepts given in it, but they do not clearly explain how to build your own SRE team for your organization. The concept of SRE was cooked fresh within the walls of Google and later released to the general public as a practice for anyone to follow.
In this presentation I would like to give a brief introduction to SRE and why it is important to any Software Engineering organization. This is based on my experiences and learnings from leading a Site Reliability Engineering team for leading organizations in the US and Norway.
This presentation was conducted by me as a Tech Talk as an Associate Technical Lead at Creative Software Sri Lanka.
Getting started with Site Reliability Engineering (SRE)Abeer R
"Getting started with Site Reliability Engineering (SRE): A guide to improving systems reliability at production"
This is an intro guide to share some of the common concepts of SRE to a non-technical audience. We will look at both technical and organizational changes that should be adopted to increase operational efficiency, ultimately benefiting for global optimizations - such as minimize downtime, improve systems architecture & infrastructure:
- improving incident response
- Defining error budgets
- Better monitoring of systems
- Getting the best out of systems alerting
- Eliminating manual, repetitive actions (toils) by automation
- Designing better on-call shifts/rotations
How to design the role of the Site Reliability Engineer (who effectively works between application development teams and operations support teams)
SRE (service reliability engineer) on big DevOps platform running on the clou...DevClub_lv
SRE (service reliability engineer). The talk is to explain the SRE philosophy and the principles of production engineering and operations in clouds.
(Language – English)
Pavlo is ADOP (Accenture DevOps Platform) Service Reliability Team Lead, SRE practitioner. Has more then 18 years of IT experience in Ops and Dev.
Adopting Kubernetes for production has huge impacts on operations at all levels. We present our pattern for formalizing cluster operations as a separate role from infrastructure and application operations, and explore the impact on the role of the SRE.
<p>From <a href="https://en.wikipedia.org/wiki/Site_reliability_engineering" target="_blank">Wikipedia</a>: Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to operations whose goals are to create ultra-scalable and highly reliable software systems.<p>
<p>Over the past year Acquia has built their own SRE team to help their products and services scale with the demand of our growing number of customers. We wish to share our experience so that others are enabled to do the same and reap the rewards.</p>
<p>This presentation will discuss how the SRE team came about at Acquia, what achievements we have made so far, and the lessons we have learned along the way. We will then show the steps on how to introduce SRE to your workplace so you can deliver more reliable and scalable services to your customers! We will specifically cover:</p>
<ul>
<li>SRE's basic concepts and history from Google</li>
<li>The management support you will need to get started</li>
<li>Introducing the idea of service level objectives and error budgets</li>
<li>Operational Responsibility Assessments as a tool to measure risk</li>
<li>Creating a Launch Readiness Checklist to standardize and improve product launches</li>
<li>Finding ideal candidates for your SRE team</li></ul>
<p>The intended audience are software engineers, system administrators, and managers that have a desire to improve how they do their work and how their products/services perform.</p>
How to bootstrap an SRE team into your company. How to hire them, what to have them work on and how to interact with them as a team. Finally some thought on general practices to consider before your SREs arrive. There are also kitten pictures.
Site Reliability Engineering (SRE) - Tech Talk by Keet SugathadasaKeet Sugathadasa
When it comes to Site Reliability Engineering, short for SRE, the resources available online are only limited to the books published by Google themselves. They do share some useful case studies that will help us understand what SRE is, and how to understand the concepts given in it, but they do not clearly explain how to build your own SRE team for your organization. The concept of SRE was cooked fresh within the walls of Google and later released to the general public as a practice for anyone to follow.
In this presentation I would like to give a brief introduction to SRE and why it is important to any Software Engineering organization. This is based on my experiences and learnings from leading a Site Reliability Engineering team for leading organizations in the US and Norway.
This presentation was conducted by me as a Tech Talk as an Associate Technical Lead at Creative Software Sri Lanka.
An overview of Google's Site Reliability Engineering with a view toward possible incorporation in the IEEE P2675 DevOps security standard. (Creative Commons with credit.)
Independently from the DevOps movement but starting from the same problems, Google developed its own strategy defining a new specific role called SRE (Site Reliability Engineer). This introduction tries to explain the history and the concept of this methodology and to compare it with the DevOps manifesto to understand what does it mean to adopt DevOps and what does it mean to be an SRE and what the two things are sharing and where they diverge.
How Small Team Get Ready for SRE (public version)Setyo Legowo
How Urbanindo small team engineering team implement Site Reliability Engineering (SRE) in their daily work life and why we choose SRE instead of ordinary DevOps.
Managing a team and project are quite synonymous. Especially, teams require effective distribution of responsibility / roles. Once that is setup, a proper process guides people to make progress. All this fits into a product lifecycle, which is essential to develop the right product, in the right way, and deliver it at the right time.
Cloud Native Engineering with SRE and GitOpsWeaveworks
Site reliability engineering (SRE), a model championed by Google, is a software engineering approach to IT operations. For companies striving to become cloud native and adopting modern tools such as Kubernetes, SRE best practices are crucial for success.
In this webinar, Brice, one of our seasoned Customer Reliability Engineers will show how to design a fail-proof Kubernetes platform using tried and tested SRE and GitOps methods.
He will share best practices on:
Increasing performance and ensuring scalability
Managing incident responses through disaster recovery
Designing for High Availability in Kubernetes
Achieving 360 visibility and alerts for your platform
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud MigrationFloyd DCosta
Capgemini Cloud Assessment offers a methodology and a roadmap for Cloud migration to reduce decision risks, promote rapid user adoption and lower TCO of IT investments. It leverages pre-built accelerators such as ROI calculators, risk models and portfolio analyzers and provides three powerful deliverables in just six to eight weeks:
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...Tori Wieldt
How do you make DevOps magic when you aren’t Google? This talk will help whether you’re still figuring out how to create a site reliability practice at your company or you’re trying to improve the processes and habits of an existing SRE team.
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
In any software organization, stability & innovation are always at loggerheads - the faster you move, the more things will break. This talk defines what SRE org looks like at high-tech organizations (Google, Uber).
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...ITSM Academy, Inc.
Presenter: Perry Statham
SRE Squad Leader with IBM Cloud DevOps Services
In this presentation, the IBM DevOps Services SRE team will give a brief introduction to Site Reliability Engineering, then show how they adopted its principals in their existing enterprise organization.
Cloud First does not mean you should jump into cloud without a well-constructed plan. The most successful organizations build the foundational capabilities first and create a roadmap that aligns people, technology, platforms, security, governance and culture. Many CIO’s want the benefits associated with cloud technologies but don’t necessarily know where to start and what questions to ask. This session will provide real world examples of the transformation process, challenges and successes that organizations have experienced in their cloud journey. We will define a low risk strategy and approach to achieve success in the cloud quickly. We will describe the sequencing requirements, milestones, and foundational services that need to be in place on Day 1. And finally, we will reveal the most critical and often overlooked component of your transformation strategy. Sponsored by Smartronix.
AIOps is becoming imperative to the management of today’s complex IT systems and their ability to support changing business conditions. This slide explains the role that AIOps can and will play in the enterprise of the future, how the scope of AIOps platforms will expand, and what new functionality may be deployed.
Watch the webinar here. https://www.moogsoft.com/resources/aiops/webinar/aiops-the-next-five-years
This talk explains a proven approach to assessment SRE practices for an organization. The approach uses a 9 pillar model and 7 step transformation blueprint to determine current state of SRE practices and to set a roadmap to improve SRE practices towards industry best practices.
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...Splunk
With the acceleration of customer and business demands, site reliability engineers and IT Ops analysts now require operational visibility into their entire architecture, something that traditional APM tools, dev logging tools, and SRE tools aren’t equipped to provide. Observability enables you to inspect and understand your IT stack on premises and in the cloud(s); It’s no longer about whether your system works (monitoring), but being able to task why it is not working? (Observability). This presentation will outline key steps to take to move from monitoring to observability.
In an IT context companies struggling to increase profits and often view IT as a necessary evil: one that consumes resources rather contributes to the bottom line. These organizations often don’t see value in data collection analysis or benchmarking either. However, IT can be a significant contributor when IT decisions are made after measuring and estimating both cost and return. IT data collection analysis and benchmarking continue to improve the cost of IT systems and help make decisions regarding where to spend money to stop the bleeding. As such, repeatable processes for estimating cost, schedule and risk will be addressed along with the “iron triangle” of software. The Iron Triangle looks at issues of cost, schedule, scope and quality and helps determine what must give when client increases scope, reduces schedule or reduces budget.
Additionally this presentation will address the risk adjusted Total Cost of Ownership and return an IT investment along with its consistency with long-range investment and business strategy of an organization measured against risk and key technical and performance parameters and technical debt.Finally, since this presentation will address the overriding business concerns: how much value does this software contribute to the business and is the best place to spend the money. (IT Confidence 2013, Rio de Janeiro (Brazil))
An overview of Google's Site Reliability Engineering with a view toward possible incorporation in the IEEE P2675 DevOps security standard. (Creative Commons with credit.)
Independently from the DevOps movement but starting from the same problems, Google developed its own strategy defining a new specific role called SRE (Site Reliability Engineer). This introduction tries to explain the history and the concept of this methodology and to compare it with the DevOps manifesto to understand what does it mean to adopt DevOps and what does it mean to be an SRE and what the two things are sharing and where they diverge.
How Small Team Get Ready for SRE (public version)Setyo Legowo
How Urbanindo small team engineering team implement Site Reliability Engineering (SRE) in their daily work life and why we choose SRE instead of ordinary DevOps.
Managing a team and project are quite synonymous. Especially, teams require effective distribution of responsibility / roles. Once that is setup, a proper process guides people to make progress. All this fits into a product lifecycle, which is essential to develop the right product, in the right way, and deliver it at the right time.
Cloud Native Engineering with SRE and GitOpsWeaveworks
Site reliability engineering (SRE), a model championed by Google, is a software engineering approach to IT operations. For companies striving to become cloud native and adopting modern tools such as Kubernetes, SRE best practices are crucial for success.
In this webinar, Brice, one of our seasoned Customer Reliability Engineers will show how to design a fail-proof Kubernetes platform using tried and tested SRE and GitOps methods.
He will share best practices on:
Increasing performance and ensuring scalability
Managing incident responses through disaster recovery
Designing for High Availability in Kubernetes
Achieving 360 visibility and alerts for your platform
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud MigrationFloyd DCosta
Capgemini Cloud Assessment offers a methodology and a roadmap for Cloud migration to reduce decision risks, promote rapid user adoption and lower TCO of IT investments. It leverages pre-built accelerators such as ROI calculators, risk models and portfolio analyzers and provides three powerful deliverables in just six to eight weeks:
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...Tori Wieldt
How do you make DevOps magic when you aren’t Google? This talk will help whether you’re still figuring out how to create a site reliability practice at your company or you’re trying to improve the processes and habits of an existing SRE team.
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
In any software organization, stability & innovation are always at loggerheads - the faster you move, the more things will break. This talk defines what SRE org looks like at high-tech organizations (Google, Uber).
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...ITSM Academy, Inc.
Presenter: Perry Statham
SRE Squad Leader with IBM Cloud DevOps Services
In this presentation, the IBM DevOps Services SRE team will give a brief introduction to Site Reliability Engineering, then show how they adopted its principals in their existing enterprise organization.
Cloud First does not mean you should jump into cloud without a well-constructed plan. The most successful organizations build the foundational capabilities first and create a roadmap that aligns people, technology, platforms, security, governance and culture. Many CIO’s want the benefits associated with cloud technologies but don’t necessarily know where to start and what questions to ask. This session will provide real world examples of the transformation process, challenges and successes that organizations have experienced in their cloud journey. We will define a low risk strategy and approach to achieve success in the cloud quickly. We will describe the sequencing requirements, milestones, and foundational services that need to be in place on Day 1. And finally, we will reveal the most critical and often overlooked component of your transformation strategy. Sponsored by Smartronix.
AIOps is becoming imperative to the management of today’s complex IT systems and their ability to support changing business conditions. This slide explains the role that AIOps can and will play in the enterprise of the future, how the scope of AIOps platforms will expand, and what new functionality may be deployed.
Watch the webinar here. https://www.moogsoft.com/resources/aiops/webinar/aiops-the-next-five-years
This talk explains a proven approach to assessment SRE practices for an organization. The approach uses a 9 pillar model and 7 step transformation blueprint to determine current state of SRE practices and to set a roadmap to improve SRE practices towards industry best practices.
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...Splunk
With the acceleration of customer and business demands, site reliability engineers and IT Ops analysts now require operational visibility into their entire architecture, something that traditional APM tools, dev logging tools, and SRE tools aren’t equipped to provide. Observability enables you to inspect and understand your IT stack on premises and in the cloud(s); It’s no longer about whether your system works (monitoring), but being able to task why it is not working? (Observability). This presentation will outline key steps to take to move from monitoring to observability.
In an IT context companies struggling to increase profits and often view IT as a necessary evil: one that consumes resources rather contributes to the bottom line. These organizations often don’t see value in data collection analysis or benchmarking either. However, IT can be a significant contributor when IT decisions are made after measuring and estimating both cost and return. IT data collection analysis and benchmarking continue to improve the cost of IT systems and help make decisions regarding where to spend money to stop the bleeding. As such, repeatable processes for estimating cost, schedule and risk will be addressed along with the “iron triangle” of software. The Iron Triangle looks at issues of cost, schedule, scope and quality and helps determine what must give when client increases scope, reduces schedule or reduces budget.
Additionally this presentation will address the risk adjusted Total Cost of Ownership and return an IT investment along with its consistency with long-range investment and business strategy of an organization measured against risk and key technical and performance parameters and technical debt.Finally, since this presentation will address the overriding business concerns: how much value does this software contribute to the business and is the best place to spend the money. (IT Confidence 2013, Rio de Janeiro (Brazil))
Servicedesk and IT Support Show, London 2017
Audio from the presentation:
https://soundcloud.com/diversifieduk/sits-2017-is-devops-really-changing-it-support-jon-hall-bmc
Leveraging Failure to Succeed in DevOpsSteve Brown
DevOps is typically perceived as a way to avoid failure; however, failures are steps in the right direction. Learning from failures and turning the DevOps practice into one that will lead you toward even greater success – better, faster.
Making sense of microservices, service mesh, and serverlessChristian Posta
As companies move to become digital, we can get sidetracked and distracted by some of the changes in the technology landscape. Ideally we will be harnessing technology to solve the problems we have and leverage it to deliver software faster and safer. In this talk, I'll we'll take a look at some new technology trends in the open-source communities and when and how to use them.
This session will explore both the strategic and practical aspects of how the DBA role is impacted with the adoption of DevOps. Firsthand experiences from a DBA and the organizational learning gained during these transitions will be featured during the presentation.
How to Drive Maximum Business Value from IT Investments with the Flow FrameworkTasktop
When organizations connect and measure the impact of their IT investments on the business’ strategy, business and technology leaders are able to partner more effectively and accelerate the delivery of real business value.
But to do so, these teams must be seamlessly aligned throughout the delivery pipeline, able to quickly identify and resolve bottlenecks, and work together to optimize their processes end-to-end. Flow Metrics enable organizations to measure what matters in software delivery, optimizing the flow of business value from ideation to operation and turning IT from a project-oriented cost center to a profit-generating product operating model.
Key Takeaways:
Discover how to align business and IT leaders to optimize value delivery using the Flow Framework™
Learn how to use Flow Metrics to expose bottlenecks and reveal opportunities to improve time-to-market, responsiveness to customers, and quality
Understand how to create a seamless end-to-end flow of business value in large-scale application delivery using Blueprint Storyteller and Tasktop Hub
How AI is transforming DevOps | Calidad InfotechCalidad Infotech
DevOps is a remarkable asset to start-ups. The growing technology over the last two decades has made it easier to build & scale all sizes of businesses & organizations. In this fast-paced growing technology world, DevOps has paved its way with its innovative & effective tools & practices that have turned out to be a… Continue reading.. https://calidadinfotech.com/devops-services
More and more organizations are turning to DevOps as a way of working together to improve the efficiency and quality of software delivery and start adding more value to the business. But what exactly is DevOps and what does it mean for you and your organization?
Join Microsoft Data Platform MVP Kendra Little to discover:
• What is DevOps and what benefits can it offer your organization?
• Who in your organization should be involved in DevOps?
• Why should your organization adopt DevOps?
• How can your organization start implementing DevOps?
This joint webinar with Neebula Systems CTO Ariel Gordon and DBmaestro CTO Yaniv Yehuda highlights the critical features in best practices and tools that are required to address the new challenges to your organization.
First debrief of the Outcomes of the Owasp Summit 2017 (with keynote slides and photos)
Full details at https://owaspsummit.org/
Outcomes at https://owaspsummit.org/Outcomes/
DevOps Beyond the Buzzwords: What it Means to Embrace the DevOps LifestyleMark Heckler
Session presented at CodeMash 2016.
DevOps is a hot topic, but it’s a bit ambiguous. What do developers really need to know about DevOps? What is it? What ISN’T it? What difference does it make? We’ll start by examining “DevOps”, what it means to embrace it, and the various personnel involved. We consider the potential benefits associated with a DevOps approach and the risks associated with adopting it…and with not adopting it. We take a quick look at some of the tools and platforms that can be used to implement a productive DevOps environment, including (but not limited to):
* Continuous Integration/Continuous Delivery (CI/CD) software
* Infrastructure Build Automation tools
* Virtualization, Containerization, and Cloud options
Finally, we run a live scenario using several of the tools discussed to demonstrate the key components of a DevOps-committed lifestyle. We start from nothing, using available tooling to build the target platform in a scriptable, repeatable fashion…then demonstrate effective use of CI/CD software for a more streamlined and effective software build/test/deploy cycle…and finally show how containers and cloud services form the foundation upon which everything else is built.
The number of connected devices is growing at an accelerated pace. We developers must have the knowledge & skills to help make that happen. But how? As device deployments and data collected grow exponentially, DevOps is the answer to fast, consistent, and sane systems, organizations, and developers. This session will provide a brief-but-thorough examination of key DevOps tenets and how they apply to large-scale deployments of small-scale devices and the platforms that tie them together. A live-coding demo will convert these concepts from ideas to implementations.
AMIS 25: DevOps Best Practice for Oracle SOA and BPMMatt Wright
DevOps and Cloud are transforming the software release process, one which spans multiple teams across development and operations (including testing, infrastructure management), into a collaborative process, with all teams working together to deliver solutions into production faster.
This session details how to implement a continuous delivery process for Oracle SOA/BPM projects, both on-premise and in the cloud, which transform the release process into an automated, reliable, high quality delivery pipeline that that deliver projects faster, with less risk and less cost.
It details the processes and best practices that need to be established, how to use tools to automate and govern the build, deployment and configuration of code from our first initial environment through to production.
1. Learn how DevOps and Continuous Delivery can stream-line the delivery of integration / bpm projects into production.
2. Learn how DevOps plus the Cloud service can accelerate the implementation of on-premise Oracle SOA .
3. Learn best practice for implementing DevOps or Continuous Delivery for Oracle SOA projects on-cloud and on-premise.
4. How to use tools to automate and govern the build, deployment and configuration of code from dev through to production
5. How to leverage the Cloud for Dev and Test, and the benefits this provides.
Similar to Rethinking Site Reliability Engineering for ITSM - SDI virtual event "New Ways of Working", May 2020 (20)
BMC Engage 2015: Optimizing Service Desk Interactions with Knowledge ManagementJon Stevens-Hall
BMC Engage presentation on Knowledge Management and Knowledge Centered Service, which also introduced some upcoming functionality to support KCS (delivered shortly afterwards in Smart IT).
"Aligning DevOps and IT Support in the Enterprise, Through Intelligent Swarming" - Presented at Devopsdays Edinburgh 2017 as an "Ignite" (5 minute) talk
devopsdays Stockholm Ignite talk: Aligning DevOps with Enterprise-scale custo...Jon Stevens-Hall
Devopsdays Stockholm Ignite talk.
Summary: As DevOps grows within enterprises, products begin to be assimilated into mainstream user support processes, and DevOps teams find themselves in the traditional 3-tier support team structure.
Problems arise because of fundamental inconsistencies between DevOps philosophy and the 3-tier structure.
This presentation proposes Swarming as a better option, enabling DevOps to align more seamlessly into the enterprise, and ensuring that support issues have a minimal impact on innovation.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Monitoring Java Application Security with JDK Tools and JFR Events
Rethinking Site Reliability Engineering for ITSM - SDI virtual event "New Ways of Working", May 2020
1. Rethinking Site Reliability
Engineering for ITSM
Principal Product Manager (BMC Software) and ITIL® 4 Contributing Author
SDI “New Ways of Working” virtual event: 7th May 2020
Jon Stevens-Hall
2. A brief introduction to SRE
• First published 2016
• Free online: https://landing.google.com/sre/books/
“SRE is what happens when you ask a software engineer
to design an operations team”
• Some SREs at Google are software engineers hired in
the standard way
• Others were close to being recruited as a software
engineer, and brought other infra skills
• SREs help software developers run their services but
also expect accountability from them.
@jonstevenshall
3. Me
• ITSM veteran – 23 years
• Principal Product Manager with BMC software, for
our enterprise ITSM toolset, BMC Helix ITSM
• ITIL® 4 contributing author
• Strong focus on DevOps and new ways of working
• Asked by SDI to talk about SRE today
Twitter: @JonStevensHall
• https://www.bmc.com/blogs/author/jonhall/
• https://medium.com/@jonhall_
@jonstevenshall
4. Eliminating Toil
Subjects covered in the SRE handbook
Risk
Service Level
Objectives
Monitoring
Distributed Systems
Automation
Release Engineering
Alerting
On-Call
Handling Incidents
Postmortem SchedulingHuman Welfare
Communication
Risk
Service Level
Objectives
Eliminating Toil
@jonstevenshall
5. Defining Toil
AutomatableRepetitiveManual Tactical
Devoid of
enduring
value
Linearly
scaling
• SRE teams can only spend a maximum of 50% on work
considered to be “Toil”.
• Google’s SRE handbook describes “toil” as work which is:
@jonstevenshall
6. Service Level Objective
Availability =
Successful requests
Total requests
Availability =
Uptime
Uptime + Downtime
SLOs are not SLAs. Not all Google services have SLAs, but some do.
SLOs define appropriate technical bounds for service performance.
“99.9% of RPC calls will
complete in less than 100 ms"
“The service will be available
99.99% of the time”
@jonstevenshall
7. Error Budget
Defines how unreliable the service is allowed to be,
within a given time period.
100%
Required performance (SLO target)
Remaining error budget
Outages to-date
8. “An outage is no longer a bad thing.
It is an expected part of the process of innovation,
and an occurrence that development and SRE teams
manage rather than fear”
Outages are okay, within error budget tolerances
@jonstevenshall
9. Error Budget
99.9% SLO 0.05% outages already
Error budget remaining is 0.05%
Hence an additional outage which consumes 0.02% of the quarter’s
availability would cost 40% of the remaining error budget.
10. Balancing innovation and reliability
• Large budget: developers can take
more risks
• Budget nearly drained: developers tend
to slow down
• Budget used up: releases temporarily
halted
Error Budget shapes ways of working
Martin Vorel via Libreshot
@jonstevenshall
11. How does SRE align to ITSM as we know it?
“Site Reliability Engineering (SRE) is an
emerging IT service management
(ITSM) framework”
(First line of the report, published 1st April 2020)
@jonstevenshall
12. Will SRE replace ITSM as we know it?
“As a service management alternative, SRE
updates traditional ITSM activities with
innovative and self-organizing concepts such as…
• Management to service level objectives
• Error budgets
• Toil reduction
• Release engineering
• Monitoring/observability
• Embracing risk”
Jayne Groll, DevOps.com, 2020
@jonstevenshall
13. A counterpoint: Most of us are not Google.
“If you think spinning up an ops team and calling it
SRE is ‘an implementation of DevOps’ you’ve
swallowed the worst poison pill the DevOps talk
circuit can deal to you.”
• Unless you are Google scale, a separate
production operations team is inefficient
• DevOps showed that “throw over the wall
from Dev to Ops” was wrong
• Throwing over the wall from Dev to SRE
doesn’t improve that
The Agile Admin – Blog (2018).
@jonstevenshall
14. One size doesn’t fit all
“Toyota opens its factory doors to us again and
again, but I imagine Toyota’s leaders may also be
shaking their heads and thinking, ‘Sure, come
have a look. But why are you so interested in the
solutions we develop for our specific problems?
Why do you never study how we go about
developing those solutions?’”
Mike Rother – Toyota Kata (2009)
@jonstevenshall
16. SRE is already an established profession
Job advert for ”disruptive Fintech”, retrieved May 2020
@jonstevenshall
17. ITIL® 4 maps SRE principles to 14 practices
• Availability management
• Capacity and performance management
• Change enablement
• Incident management
• Infrastructure and platform management
• Monitoring and event management
• Problem management
• Service design
• Software development and management
• Deployment management
• Organizational change management
• Release management
• Service configuration management
• Service validation and testing
(but at a very high level)
@jonstevenshall
18. • Current service desk performance: 98% call pickup, on a target of 90%
• Agents fully utilised on calls
Example of an “SRE-like” change for IT support
Before:
• Schedule a proportion of each agent’s time to be away from calls
• Use that time for proactive reduction of toil (e.g. building self service
offerings, developing knowledge content)
• The previous 8% over-performance in call pickup is spent as “error budget”
Applying SRE thinking:
@jonstevenshall
19. • Level of change approval determined by perceived risk of change type
• Pressure from DevOps to remove approval barriers (particularly CAB)
SRE applied to Change Enablement
Before:
• Dynamically assess approval level based on confidence level in team
• Implementation success rate of previous changes
• Issues caused by previous changes
• Adherence to guardrails in the DevOps toolchain
• Minimal intervention, applied more smartly
Applying SRE thinking:
@jonstevenshall
20. • Site Reliability Engineering is a bad name for many aspects of ITSM.
• SRE is already defined (and recruited) as a specialist operations
developer role.
• Some key, novel themes have great potential but need redefinition
for much of ITSM.
Summary of issues
@jonstevenshall
21. • “Rebrand” the SRE concept for ITSM: New name, new thinking.
• Identify new ways to apply key novel principles such as reliability and toil
reduction.
• Invest in ITSM initiatives such as KCS which share good principles with SRE.
Proposal
@jonstevenshall