When it comes to Site Reliability Engineering (SRE), the resources available online are largely limited to the books published by Google. They share useful case studies that help explain what SRE is and how its concepts work, but they do not clearly explain how to build an SRE team for your own organization. SRE originated inside Google and was later released to the general public as a practice anyone can follow.
In this presentation I give a brief introduction to SRE and explain why it matters to any software engineering organization. It is based on what I learned while leading Site Reliability Engineering teams for organizations in the US and Norway.
I gave this presentation as a Tech Talk while working as an Associate Technical Lead at Creative Software Sri Lanka.
2. Presented By
Keet Malin Sugathadasa
Associate Tech Lead at Cognite
More than 3 years of experience in various roles related to Software Engineering
Contributor to NPM and Stack Overflow
Research interests: Cyber Security, Cloud Computing, Distributed Computing.
3. AGENDA
• What is Site Reliability Engineering (SRE)
• The 5 Pillars of SRE
• SLOs, SLIs, SLAs
• Error Budgets
• Toil
• Ensuring successful operations of a production system
4. What is DevOps
Just as Agile came in to close the gap between business analysts (BA) and developers (Dev), DevOps closed the gap between Dev and Ops.
5. What is SRE?
• DevOps is a community-built set of practices, a culture;
• SRE, in contrast, was developed inside Google as its secret sauce.
9. Reduce Silos
• SRE teams share ownership of production with developers
• SRE teams get involved in development at very early stages
• But products may not start with SRE support at first. When a product is onboarded, the following items get checked:
• System architecture and interservice dependencies
• Instrumentation, metrics, and monitoring
• Emergency response
• Capacity planning
• Change management
• Performance: availability, latency, and efficiency
11. Blameless Postmortems
• When things have actually gone wrong, whose fault is it?
• Answer: Nobody's. It is the system's fault. It allowed people to act that way!
• Ask WHY, not WHO!
If nobody is blamed, people open up, and then the chain of root causes comes to light.
12. Agility [Devs] vs Stability [Ops]
• What is availability?
• Clear definitions
• How available do you want to be?
• Clear numerical indicators
• What should be done when availability is not met?
13. SLI - SLO - SLA: Service Level what?
Service Level Indicator (SLI): A metric aggregated over time (for example the 90th percentile or the median)
• Batch throughput
• Failures per request
• Is the ratio of errors to the total number of requests received in the last 5 minutes < 1%?
• Request latency
• Is the average latency of requests in the last 5 minutes < 300ms?
• Is the 90th percentile of request latency in the last 5 minutes < 300ms?
Service Level Objective (SLO): The value the SLI needs to meet
• Is the answer to the indicator above YES 99.9% of the time?
• Monitor the SLIs over a long period before deciding on this number
Service Level Agreement (SLA): A legal agreement
• The level of reliability I promise, and what I will do if I do not deliver it
• Usually based on SLOs, but it is a business agreement
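The relationship between an SLI and an SLO can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring implementation: the `Window` class, the function names, and the choice to treat an idle window as healthy are all assumptions made for the example. The 1% error threshold and the 99.9% target are the numbers used on the slide.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts for one five-minute window (hypothetical shape)."""
    total: int
    errors: int


def error_ratio_sli(window: Window) -> float:
    """SLI: ratio of failed requests to total requests in the window."""
    if window.total == 0:
        return 0.0  # assumption: a window with no traffic counts as healthy
    return window.errors / window.total


def slo_compliance(windows: list[Window], max_error_ratio: float = 0.01) -> float:
    """Fraction of windows in which the error-ratio SLI stayed below the threshold."""
    if not windows:
        raise ValueError("need at least one window")
    good = sum(1 for w in windows if error_ratio_sli(w) < max_error_ratio)
    return good / len(windows)


def meets_slo(windows: list[Window], target: float = 0.999) -> bool:
    """SLO: the indicator must answer YES in at least `target` of all windows."""
    return slo_compliance(windows) >= target
```

The SLI is the raw measurement per window, while the SLO is the bar that the measurement has to clear over a long period. An SLA would sit on top of this as a contract and is not something code can decide.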
14.
15. Risk and availability
• 100% availability is impossible.
• Each 9 you add to the SLO increases your cost
• Each 9 you add costs you some comfort
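To make the price of each extra 9 concrete, the short Python sketch below converts a number of nines into the downtime that target tolerates per year. The function names and the 365-day year are assumptions made for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def availability_for_nines(nines: int) -> float:
    """Two nines -> 0.99, three nines -> 0.999, and so on."""
    return 1 - 10 ** (-nines)


def allowed_downtime_minutes(nines: int, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Unavailability that the availability target tolerates over the period."""
    return period_minutes * 10 ** (-nines)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes of downtime per year")
```

Every added 9 cuts the tolerated downtime by a factor of ten: three nines leave roughly 8.8 hours per year, while five nines leave a little over 5 minutes.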
16. Error Budgets
• Once you decide on the SLO, you get X minutes during which the service is allowed to be unavailable.
• X is your Error Budget
• Once you have used up that budget, you cannot release new features until it recovers
• Both under-spending AND over-spending the budget are bad.
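The error budget arithmetic can be sketched as follows. This is an illustrative calculation only: the function names, the 30-day period in the usage note, and the rule that releases stop once the remaining budget reaches zero are assumptions that mirror the bullets above rather than a prescribed policy.

```python
def error_budget_minutes(slo: float, period_minutes: int) -> float:
    """Minutes of unavailability the SLO tolerates over the period (the X on the slide)."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1 - slo) * period_minutes


def budget_remaining(slo: float, period_minutes: int, downtime_minutes: float) -> float:
    """Budget still unspent; a negative value means the SLO has been violated."""
    return error_budget_minutes(slo, period_minutes) - downtime_minutes


def can_release(slo: float, period_minutes: int, downtime_minutes: float) -> bool:
    """Assumed release gate: new features ship only while budget remains."""
    return budget_remaining(slo, period_minutes, downtime_minutes) > 0
```

With a 99.9% SLO over 30 days (43,200 minutes) the budget is about 43.2 minutes. After 10 minutes of downtime `can_release` still allows releases; after 50 minutes it does not.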
19. Gradual change
• Updates should be pushed as canaries, not as bulk version changes
• Smaller code changes mean a shorter mean time to recover on failure
• The rate of change depends on the SLO you have selected
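One way to picture a canary rollout is as a loop over growing traffic percentages that stops as soon as the new version looks unhealthy. The sketch below is a simplified model, not a deployment tool: `canary_rollout`, the `observe_error_ratio` callback, the stage percentages, and the 1% threshold are all hypothetical and stand in for whatever your deployment system and monitoring actually provide.

```python
from typing import Callable, Sequence


def canary_rollout(
    stages: Sequence[int],
    observe_error_ratio: Callable[[int], float],
    max_error_ratio: float = 0.01,
) -> bool:
    """Walk through traffic percentages; return False (roll back) on the first unhealthy stage.

    `observe_error_ratio(percent)` is a hypothetical hook that shifts `percent`
    of the traffic to the new version and reports the error ratio it observed.
    """
    for percent in stages:
        if observe_error_ratio(percent) >= max_error_ratio:
            return False  # stop early: only `percent` of the traffic was exposed
    return True  # every stage stayed healthy, so the rollout is complete
```

Because a failure is caught while only a small share of traffic is exposed, less of the error budget is spent than with a bulk version change. A stricter SLO would argue for a lower threshold or more stages.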
21. Toil
Toil is the manual, repetitive work tied to running a service in production (work which can be automated)
22. Toil & Toil budget
SREs actively measure toil. The toil budget should be around 30% to 50%.
If toil is not kept within its budget, it easily grows to fill 100% of the time.
But a small amount of toil is not harmful:
• Automation might be harder than the manual work
• It helps newcomers orient themselves
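Measuring toil against a budget comes down to a simple ratio. The sketch below assumes that hours are tracked per engineer or per team; the function names and the default of 50% (the upper end of the range on the slide) are choices made for the example.

```python
def toil_fraction(toil_hours: float, total_hours: float) -> float:
    """Share of working time spent on manual, repetitive production work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


def within_toil_budget(toil_hours: float, total_hours: float, budget: float = 0.5) -> bool:
    """True while measured toil stays at or below the agreed budget."""
    return toil_fraction(toil_hours, total_hours) <= budget
```

For example, 12 hours of toil in a 40-hour week is 30% and fits the budget, while 30 hours is 75% and signals that automation work should take priority.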