SRE (site reliability engineering). The talk explains the SRE philosophy and the principles of production engineering and operations in the cloud.
(Language – English)
Pavlo is the ADOP (Accenture DevOps Platform) Service Reliability Team Lead and an SRE practitioner. He has more than 18 years of IT experience in Ops and Dev.
5. SRE: WHEN OPERATIONS IS DESIGNED BY SOFTWARE ENGINEERS
Modern Product Development requires more functionality introduced more frequently, creating more complexity and more support activities.
Site reliability engineering (SRE) is part of the solution: a discipline that incorporates aspects of software engineering and applies them to operations, with the goal of creating ultra-scalable and highly reliable software systems.
Applying principles of computer science and engineering to the design and development of highly available systems
Proactively finding ways to make systems more scalable, reliable, and efficient until systems reach “desired reliability targets”
Spanning a broad portfolio of software (applications, databases, cloud services) and hardware (network, data-center) assets
SREs are engineers... focused on reliability... while operating services
14. SRE MEASUREMENTS
• Service Level Agreement (SLA)
  • Defines the service availability for a customer and the penalties for breaking that availability
  • Question: what happens if the SLOs aren’t met?
  • SRE usually not involved
• Service Level Indicator (SLI)
  • Metrics over time which inform about the health of a service
  • Examples: request latency, error rate, system throughput, availability
• Service Level Objective (SLO)
  • Agreed-upon bounds for how often SLIs must be met
  • Examples: SLI ≤ target; lower bound ≤ SLI ≤ upper bound
• SLIs and SLOs are the prescriptive way in which SRE practices the DevOps principle of "measure everything". Implementing SLOs also forces collaboration between product owners and systems operators, adhering to the DevOps principle of "break down organizational barriers".
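The two SLO forms listed on this slide (SLI ≤ target, and lower bound ≤ SLI ≤ upper bound) can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the names `SLO`, `is_met`, `error_rate`, and `compliance`, and the 300 ms latency threshold, are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SLO:
    """Bounds an SLI must stay within; either bound may be omitted.

    Hypothetical helper type for illustration only.
    """
    lower: Optional[float] = None
    upper: Optional[float] = None

    def is_met(self, sli: float) -> bool:
        """Return True when lower <= sli <= upper for the bounds that are set."""
        if self.lower is not None and sli < self.lower:
            return False
        if self.upper is not None and sli > self.upper:
            return False
        return True


def error_rate(total_requests: int, failed_requests: int) -> float:
    """Error-rate SLI: fraction of requests that failed (0.0 when idle)."""
    if total_requests == 0:
        return 0.0
    return failed_requests / total_requests


def compliance(slo: SLO, sli_samples: Sequence[float]) -> float:
    """Fraction of SLI samples that met the SLO ("how often" it was met)."""
    if not sli_samples:
        raise ValueError("at least one SLI sample is required")
    met = sum(1 for sample in sli_samples if slo.is_met(sample))
    return met / len(sli_samples)


if __name__ == "__main__":
    # Assumed example: request latency in milliseconds, SLI <= 300 ms.
    latency_slo = SLO(upper=300.0)
    samples = [120.0, 250.0, 310.0, 280.0]
    print(compliance(latency_slo, samples))
```

With the sample values above, three of the four latency measurements are within the bound, so the compliance figure is the fraction that a product owner and an operator would compare against the agreed objective.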
16. SRE Effectiveness?
Given their breadth of scope, it becomes important to define performance and success metrics upon which the SRE is evaluated
EXAMPLES
• SLA Compliance
• System Compliance Profile
• MTTR
• Problem or Bug Age
• Incident to Unique Root-Cause Ratio
• Toil to Overall Effort Ratio
• Service Performance (e.g. Page Load Times, Network Latency, etc.)
• Infrastructure & Cloud Efficiency
• Service Provisioning Cycle Time
• Service Automation Ratio
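Three of the metrics named on this slide (MTTR, incident to unique root-cause ratio, and toil to overall effort ratio) are simple enough to compute directly. The sketch below is a hedged illustration: the `Incident` record and its fields are hypothetical, MTTR is taken here as the mean of resolve time minus open time, and real incident data would come from whatever ticketing system is in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record used only for this example."""
    opened: datetime
    resolved: datetime
    root_cause: str


def mttr(incidents: Sequence[Incident]) -> timedelta:
    """Mean time to resolve: average of (resolved - opened)."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((i.resolved - i.opened for i in incidents), timedelta())
    return total / len(incidents)


def incident_to_root_cause_ratio(incidents: Sequence[Incident]) -> float:
    """Incidents per unique root cause; values above 1 mean repeat causes."""
    if not incidents:
        raise ValueError("at least one incident is required")
    unique_causes = {i.root_cause for i in incidents}
    return len(incidents) / len(unique_causes)


def toil_ratio(toil_hours: float, total_hours: float) -> float:
    """Share of overall effort spent on toil."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


if __name__ == "__main__":
    start = datetime(2018, 1, 1, 9, 0)
    sample = [
        Incident(start, start + timedelta(minutes=30), "disk-full"),
        Incident(start, start + timedelta(minutes=60), "disk-full"),
        Incident(start, start + timedelta(minutes=90), "bad-deploy"),
    ]
    print(mttr(sample), incident_to_root_cause_ratio(sample), toil_ratio(13.0, 40.0))
```

A ratio well above 1 for incidents per root cause suggests the same underlying problem keeps recurring, which is usually a stronger signal for engineering work than the raw incident count.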
17. SRE Getting Started
Think big… start small… scale fast.
Talent, Organization and Culture
Align on the portfolio of Product Development services available and identify health indicators
• Web app, MW services, SAP HR, etc.
Identify product owner & end-customer
• What reliability expectations do they have? (availability, latency, etc.)
• What indicators and mechanisms do you use to measure health today?
Identify value potential & path forward
• Dependency on other services
• Are reliability expectations realistic?
• Does the team have the right telemetry in place to measure E2E health?
• Is the team empowered and skilled to make changes to improve reliability?
Set the Strategy | Begin Implementation | Transform the Organization
Select Product Development services that have the greatest need for reliability to the business: availability, stability, and performance
Consider agility and viability constraints
• Service criticality (maintenance state/EOL, critical to core, strategic)
• Telemetry state, productivity tools
• Organizational considerations: consolidating multi-tiered groups into a single multi-modal group
• Human capital strategy
• Dependency on other services
Assemble a pilot SRE team for the identified Product Development service, define the operating model and productivity measures, and start running
Reflect on pilot team achievements across health and reliability metrics: MTTR, availability, performance, incident ratios
• Trend data over 30, 60, 90 days
• Analyze SRE backlog: big rock projects
• Talk about toil (work humans don’t wish to do)
• Fine-tune, continue to improve
Initiate broader assessment and selection of Product Development services based on business need and viability (functional and technical)
Strategize SRE specialties based on the nature of the Product Development service portfolio to scale while remaining lean
• SRE for custom web applications
• SRE for storage infrastructure
• SRE for all packaged back-office solutions
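The 30, 60, and 90 day trend review mentioned above for the pilot team can be sketched as trailing-window averages over a daily metric. This is a minimal illustration under stated assumptions: the function name `trailing_averages`, the choice of a daily incident count as the metric, and the sample values are all hypothetical.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, Mapping, Optional, Sequence


def trailing_averages(
    daily: Mapping[date, float],
    as_of: date,
    windows: Sequence[int] = (30, 60, 90),
) -> Dict[int, Optional[float]]:
    """Average a daily metric over the last N days, for each N in windows.

    A window covers the N calendar days ending on (and including) as_of.
    Windows with no data points map to None.
    """
    result: Dict[int, Optional[float]] = {}
    for days in windows:
        window_start = as_of - timedelta(days=days)
        values = [v for d, v in daily.items() if window_start < d <= as_of]
        result[days] = mean(values) if values else None
    return result


if __name__ == "__main__":
    today = date(2018, 6, 30)
    # Assumed data: 4 incidents per day earlier on, 1 per day in the last 30 days.
    history = {
        today - timedelta(days=i): (1.0 if i < 30 else 4.0)
        for i in range(90)
    }
    print(trailing_averages(history, today))
```

Comparing the shorter window against the longer ones shows direction: in the sample data the 30 day average is lower than the 60 and 90 day averages, which would indicate the incident rate is improving.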
CONSIDER IMPLEMENTATION OF SRE AS A CULTURAL JOURNEY
19. You can mobilize your ADOP toolset in less than 48 hours with 3 easy steps through our self-service portal
What ADOP Can Do for You
DevOps processes on the ADOP integrated tooling environment have been known to reduce delivery costs substantially
The platform supports projects of all sizes, both enterprise-scale and smaller, with a low-cost, flexible subscription model
ADOP includes ready-to-go pipelines and infrastructure automation branded cartridges for hundreds of technologies
ADOP supports both Agile and Waterfall projects by driving increased productivity and quality and lower risk
ADOP: ACCENTURE DEVOPS PLATFORM
20. WHAT CAN YOU DO WITH ADOP?
The platform is designed around technology extensions and re-usable components called cartridges, which further accelerate DevOps enablement.
Document and Manage Project Scope
Track Project Progress
Build Code Artifacts and Products
Deploy your Code to Any Environment
Test your System
Enforce Security Policy
“Install” Accenture Best Practices
21. ADOP/Enterprise History
2014, 2015, 2016, 2017-2018
Launched Managed Jira within
ALM Factory in Hoff Data Centre
Re-platforming to AWS cloud
ADLM merges with ADOP
ADOP CI/CD Offering
Accenture DevOps Platform
Projects using CI/CD: 215+ on 300+ Masters
560+ Clients supported by ADOP SaaS
Confluence
21.5K in last 3 months
Jira
45K+ Total Users
11K+ Active in last 3 months
27M+ LOC Analysis total
17K+ Jenkins Jobs weekly
Cloud: 4 Cloud Accounts, 330 EC2, 1000 Containers, 300TB data, 500+ Security groups
Accenture Security Compliant
600+ Tickets processed monthly
PaaS capabilities
Self Service Capabilities
BY THE NUMBERS…
Editor's Notes
If you’ve not heard the term Site Reliability Engineering (SRE), it’s worth exploring. The term originates from Google, which invented it as a role name around 14(!) years ago as part of reinventing its approach to IT Operations.
“There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations. We keep on turning and making new combinations indefinitely; but they are the same old pieces of colored glass that have been in use through all the ages.” - Mark Twain, a Biography
In truth
We emphasize the Dev part of the lifecycle
This is another crack at it
Renaissance of Ops Arch
Like New IT – not all of it is that new, but it’s helpful to have a reason to change to adopt it and avoid “why weren’t you doing this before?”
DevOps is a loose, generic set of principles and SRE is an advanced, explicit implementation. Andrew CS
Named, defined, actually measured
Quarterly surveys of Google’s SREs show that the average time spent toiling is about 33%, so we do much better than our overall target of 50%.
Best in class agile tools
Mobilisation expertise
24/7 support
ADOP/E consulting/solutioning offering
Solutioning service offering
Most valuable product – we
There is nothing in Accenture that comes close to ADOP
Grassroots internal business: roots that are watered by the basics: good ideas, great ideas, hard work, and people who are just really damn good at what they do.
Be proud, be very proud of what you are.
You are the best-in-class, cutting-edge central engine that generates millions of USD in revenue across the globe, continuously and sustainably.
You are the team that gets thousands of our people in hundreds of cities across the globe to do better work, and maybe even lets them get home on time to live their lives. When we, Accenture, do better work, that means the whole world does better work: people in the medical industry searching for and delivering cures, people in media getting better information to people in farther reaches, people in the resources industry powering the nations, people in governments, people in groceries, people all over the world doing everything. Make no mistake, it is true that DevOps and modern engineering is more than technology; it is a new paradigm. We are the pioneers showing the rest of the world the right way to get work done in the modern computing age. Our excellence is their excellence. Be proud, and be wary: like it or not, every one of us in this room must rise to this occasion and this opportunity. We must embody the progressive spirit of New IT: teamwork, collaboration, relentless dedication to improvement.
That is what this week is about: how do we come together as a team and rise to this expectation? I want to set the expectation now that all of you actively engage this week with open yet critical minds, as well as mutual respect. So to get things started, let’s all introduce ourselves. I know many of you already know each other, but let’s give everyone their moment here, and maybe we learn something new.