Performance Engineering Masterclass: Efficient Automation with the Help of SRE Concepts

Henrik Rexed

Henrik Rexed
● Cloud Native Advocate
● 15+ years of Performance engineering
● Owner of
Producer of

What will you learn in this session?
● The notion of SLI/SLO
● The importance of observability
● Introduction to Keptn
● Demo

Continuous Performance Testing
● Dev: Test a Component
● Integration Testing: Test a System
● Application Testing: Test Real World
Continuous Testing Embedded in CI/CD Pipelines

Performance testing is often a false automation
Several hours, days:
Build → Deploy to "Test" → Run Test in "Test" → Manual Approval → Promote to the next stage


Performance is not only about response time!
● Response time
● Resources
● Network
● Cost
● Scalability
● Errors

Solution: Use SRE methodology

Why do we need SRE?
■ Developers were focused on innovation and agility
■ Operations was focused on stability
■ SRE was created to make sure that we build reliable services and avoid conflict between Developers and Operations

SLIs/SLOs help to build reliability targets
■ Product owners define the objectives for each service at a very early stage
■ SLIs/SLOs help to set targets for:
  ■ Availability
  ■ Performance
  ■ More
■ SLIs/SLOs help to detect issues before our end-users do
Diagram labels: Developer, Operation, Production, speed, availability
SRE Mantra

SLI
Service Level Indicator (SLI)
A key metric for understanding the health of a service
SLI = (Good Events / Valid Events) × 100 %
Example: HTTP Request Latency
SLI = (# of HTTP requests with <= 5 sec response time / Total # of requests) × 100 %
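The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function name `latency_sli` and the sample response times are assumptions made up for the example.

```python
def latency_sli(response_times_s, threshold_s=5.0):
    """Return the share (in %) of requests answered within threshold_s."""
    if not response_times_s:
        # No valid events: the ratio is undefined rather than 0 % or 100 %.
        raise ValueError("no valid events, the SLI is undefined")
    good_events = sum(1 for t in response_times_s if t <= threshold_s)
    return 100.0 * good_events / len(response_times_s)


# Hypothetical response times (seconds) collected during a test run.
samples = [0.2, 0.4, 1.3, 4.9, 5.0, 6.1, 0.8, 7.5, 0.3, 0.6]
sli = latency_sli(samples)
print(f"SLI = {sli:.1f} %")
```

With these samples, 8 of the 10 requests finish within 5 seconds, so the SLI is 80 %.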

SLO
Service Level Objective (SLO)
An objective/target we set against an SLI
Example: Request latency will be <= 5 secs for 99% of requests
Gauge from 0 % to 100 %: the SLI (# of HTTP requests with <= 5 sec response time) is compared against the SLO mark at 99 %
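Checking an SLO is then a comparison between the measured SLI and the target. The sketch below assumes the 99 % latency objective from the example; the request counts and the function name `meets_slo` are hypothetical.

```python
def meets_slo(sli_percent, slo_percent=99.0):
    """True when the measured SLI reaches the objective."""
    return sli_percent >= slo_percent


# Hypothetical counts: requests with <= 5 sec response time vs. all requests.
good_requests = 9_930
total_requests = 10_000
sli = 100.0 * good_requests / total_requests

print(f"SLI {sli:.1f} % vs SLO 99 %: {'met' if meets_slo(sli) else 'missed'}")
```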

Error budget
One minus the availability target. SREs and Devs work within the error budget.
An SLO of 99% is equal to an error budget of 1%
Example: with 1 000 000 requests in 30 days, 1% of 1 000 000 = 10 000 requests may fail within the 30 days
Error Budget + Burn Down: How fast are we using our budget?
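The error budget arithmetic and the burn-down question can be worked through as follows. This is a sketch under stated assumptions: the 99 % SLO, the 1 000 000 requests and the 30-day window come from the slide, while the number of failed requests so far, the elapsed days and both function names are invented for illustration.

```python
def error_budget(slo_percent, total_requests):
    """Number of requests that may fail in the window without breaking the SLO."""
    return round((100.0 - slo_percent) / 100.0 * total_requests)


def burn_rate(bad_so_far, budget, elapsed_days, window_days=30):
    """Budget consumption relative to elapsed time.

    1.0 means the budget lasts exactly until the end of the window;
    above 1.0 it runs out early.
    """
    consumed_fraction = bad_so_far / budget
    elapsed_fraction = elapsed_days / window_days
    return consumed_fraction / elapsed_fraction


budget = error_budget(99.0, 1_000_000)  # 1 % of 1 000 000 requests
rate = burn_rate(bad_so_far=4_000, budget=budget, elapsed_days=6)
print(f"budget = {budget} requests, burn rate = {rate:.1f}x")
```

In this hypothetical case, 40 % of the budget is gone after 20 % of the window, so the budget is burning twice as fast as it should.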

Remove toil
Typical SRE day: Operations 50%, Dev 50%

How can we take advantage of the SRE mantra in Performance Engineering?

To build SLIs we need measurements

Observability pillars
Logs, Events, Metrics, Traces

The CNCF Landscape
https://landscape.cncf.io/

The reality…
https://twitter.com/dastbe/status/1303858170155081728

Open Observability

Prometheus
Metric provider

Prometheus architecture
● Prometheus Server: scrapes metrics from kube-state-metrics, Node exporter and cAdvisor
● Alertmanager
● PromQL
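To turn Prometheus data into an SLI, a PromQL expression can be sent to the server's HTTP API (`/api/v1/query`). The sketch below uses only the Python standard library. The base URL `http://localhost:9090`, the `job` label value and the sample payload are assumptions for illustration; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Extract (labels, value) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each sample carries its value as [timestamp, "value as string"].
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def query_prometheus(base_url, promql, timeout_s=10):
    """Run the query against a live server (not called in this sketch)."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout_s) as resp:
        return parse_instant_vector(json.load(resp))


# Response shaped like an instant-vector result, with made-up values.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"job": "carts"}, "value": [1700000000.0, "42"]}],
    },
}
url = build_query_url("http://localhost:9090", 'go_goroutines{job="carts"}')
print(url)
print(parse_instant_vector(sample_payload))
```

Separating URL building and response parsing from the network call keeps the logic testable without a running Prometheus server.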

Prometheus is a standard
• Database: CouchDB, MySQL, Oracle, PostgreSQL, MongoDB, …
• Hardware: Netgear, Windows, IBM Z, Nvidia, … etc
• Broker: MQ, Kafka, MQTT, RabbitMQ, … etc
• Storage: Tivoli, Hadoop, NetApp, ScaleIO
• Other: Jira, Jenkins, GitHub, Fluentd, Nagios, … etc

Automate
SRE Mantra

Modern DevOps
● CI/CD
● Production
● Staging

Operations at scale
• Complex configuration
• Repeated tasks
• Manual integrations
Expensive maintenance

Open, event- & data-driven automation for DevOps & SREs
● Makes data-driven decisions based on SLOs (Service Level Objectives)
● Event-driven task orchestration for Multi-Stage Delivery … and Day 1 & Day 2 Operations (Canaries, Remediation …)
● Connects to any existing delivery, test, notification, ticketing, config mgmt … tool, through open event standard subscriptions
● Connects to any Observability Platform to query metrics (SLIs)
Tasks shown in the diagram: Deploy, Test, Validate, Assess, Rollback, Release, Scale, Escalate

Keptn: SLO-Driven Automation for DevOps & SREs
You (Dev/Ops/SRE):
● bring your configuration: shipyard, SLI/SLO, runbook, workload
● pick your use case: SLO-Quality Gates, Progressive Delivery, Auto-Remediation
● connect your tools
SRE Automation: automates configuration and provides self-service for Monitoring, Delivery, Reliability, Remediation through event-driven process orchestration based on Declaration, GitOps, SLOs, Standards

SLOs for Data Driven Decisions at the Core of Keptn Orchestration
sli.yaml (Dynatrace)

    indicators:
      error_rate: "builtin:service.errors.total.count:merge(0):avg"
      count_dbcalls: "calc:service.toptestdbcalls:merge(0):sum"
      jvm_memory: "builtin:tech.jvm.memory.pool.committed:merge(0):sum"

slo.yaml (SLI Provider independent)

    objectives:
      - sli: error_rate
        pass:
          - criteria:
              - "<=1"    # We expect a max error rate of 1%
      - sli: jvm_memory  # No criteria: reported for information only
      - sli: count_dbcalls
        pass:
          - criteria:
              - "<=+2%"  # We allow a 2% increase in DB Calls between builds
        warning:
          - criteria:
              - "<=10"   # We expect no more than 10 DB Calls per TX
    total_score:
      pass: "90%"
      warning: "75%"

sli.yaml (Prometheus)

    indicators:
      http_requests_total_sucess: http_requests_total{status="success"}
      go_routines: go_goroutines{job="$SERVICE-$PROJECT-$STAGE"}
SLI Providers: Query SLIs based on sli.yaml and return individual values (*get-sli*)
Lighthouse Service: Retrieves SLIs and compares them against SLOs (*evaluation*)
Example SLI values:
count_dbcalls: 5
jvm_memory: 360MB
error_rate: 4.3%
sli_x: value for X
sli_y: value for Y
...
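The comparison of SLI values against SLO criteria can be illustrated with a small Python sketch. It is not Keptn's Lighthouse implementation. It makes several assumptions: criteria are simplified to flat lists of strings, the DB-call criterion is read as "<=+2%" (as its comment suggests), every objective has the same weight, a warning counts for half the points, and the previous build's values are invented.

```python
import operator
import re

_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
        ">": operator.gt, "=": operator.eq}
_CRITERION = re.compile(r"^(<=|>=|<|>|=)([+-]?)(\d+(?:\.\d+)?)(%?)$")


def check(criterion, value, previous=None):
    """Evaluate one criterion such as "<=1" (absolute) or "<=+2%" (relative)."""
    match = _CRITERION.match(criterion.strip())
    if match is None:
        raise ValueError(f"unsupported criterion: {criterion!r}")
    op, sign, number, percent = match.groups()
    target = float(number)
    if sign:  # a sign marks a target relative to the previous build
        if previous is None:
            raise ValueError("relative criterion needs a previous value")
        delta = previous * target / 100.0 if percent else target
        target = previous + delta if sign == "+" else previous - delta
    return _OPS[op](value, target)


def evaluate(objectives, values, previous, pass_pct=90.0, warn_pct=75.0):
    """Score objectives: pass = 1 point, warning = 0.5, fail = 0."""
    scored = [o for o in objectives if "pass" in o]  # others are informational
    if not scored:
        raise ValueError("no objective with pass criteria")
    points = 0.0
    for obj in scored:
        value, prev = values[obj["sli"]], previous.get(obj["sli"])
        if all(check(c, value, prev) for c in obj["pass"]):
            points += 1.0
        elif obj.get("warning") and all(check(c, value, prev) for c in obj["warning"]):
            points += 0.5
    score = 100.0 * points / len(scored)
    result = "pass" if score >= pass_pct else "warning" if score >= warn_pct else "fail"
    return score, result


objectives = [
    {"sli": "error_rate", "pass": ["<=1"]},
    {"sli": "jvm_memory"},
    {"sli": "count_dbcalls", "pass": ["<=+2%"], "warning": ["<=10"]},
]
values = {"error_rate": 4.3, "jvm_memory": 360, "count_dbcalls": 5}
previous = {"count_dbcalls": 5}  # hypothetical value from the previous build
score, result = evaluate(objectives, values, previous)
print(score, result)
```

With the example values from the slide, the error rate of 4.3% misses its objective while the DB calls stay within the allowed increase, so only half of the points are reached.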

Automate Approval through SLI/SLO-based Quality Gates
Without Quality Gate: Build → Deploy to "Test" → Run Test in "Test" → Manual Approval (~30-60min) → Promote to "Staging"
With Quality Gate: Build → Deploy to "Test" → Run Test in "Test" w/ Tagging → Trigger Quality Gate → Wait for Result (~1min) → Promote to "Staging"
Quality Gate: Pull SLIs for the testing time frame, then validate SLOs
● Rt(p95) < 500ms
● #ofSQLs <= 5
● cpu(max) < 80%
● Java GC < 2%
● ...
SLI & SLO Result: success, Score: 85/100
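The idea of replacing the manual approval with an automated gate can be sketched as a function that checks the four example objectives and returns a promote/hold decision. The thresholds come from the slide; the function names, the nearest-rank percentile method and all measurements are assumptions for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct as an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def quality_gate(response_times_ms, sql_per_tx, cpu_max_pct, gc_pct):
    """Return (promote, details) for the example objectives."""
    checks = {
        "Rt(p95) < 500ms": percentile(response_times_ms, 95) < 500,
        "#ofSQLs <= 5": sql_per_tx <= 5,
        "cpu(max) < 80%": cpu_max_pct < 80,
        "Java GC < 2%": gc_pct < 2,
    }
    return all(checks.values()), checks


# Hypothetical measurements pulled for the test time frame.
times_ms = [100 + 10 * i for i in range(20)]  # 100 ms .. 290 ms
promote, details = quality_gate(times_ms, sql_per_tx=4, cpu_max_pct=65.0, gc_pct=1.2)
for name, ok in details.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
print('Promote to "Staging"' if promote else "Hold the build")
```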

Keptn is extendable
https://artifacthub.io/packages/search?ts_query_web=Keptn

Keptn integrates with other solutions
CLI / REST API

Demo

Quality Gate: Keptn, Prometheus and K6
SLO Evaluation &
Monitoring
Prometheus
Integration
Service
Hipster-shop
Run load test and evaluate

Is it observable
■ If you are looking for educational content on
Observability, check out:
Is It Observable

Keep in touch!
Henrik Rexed
Cloud Native Advocate
Dynatrace
Henrik.rexed@dynatrace.com
@hrexed
