20. STACK & VALUE STREAM MAP
[Diagram: value stream map across the stack – GoCD, Kubernetes, Git, InVision, JIRA, Docker Registry, Spring Config Server – showing value-add vs. non-value-add time per stage (roughly 0.5-5 hrs, 0.5-5 days, 40 min; ~1-5 days overall) through staging, pre-prod, and prod]
35. UI TEST – SINGLE PAGE APP
[Diagram: UI rendering and UI integration tests run against the API Gateway / BFF through the public API, with App 1 and App 2 replaced by mocks]
36. [Diagram: pipeline stages CI -> STAGING -> PRE PROD -> PROD. CI runs unit, integration, and component tests; STAGING runs UI integration, UI rendering, end-to-end, and fault-injection tests against a 3rd-party mock]
37. [Diagram: the same pipeline; STAGING still tests against a 3rd-party mock, and the real 3rd-party system joins in PRE PROD]
40. [Diagram: the pipeline extended into PROD, which adds real integration, CDC, and synthetic transactions on top of the staged tests]
41. [Diagram: the full picture. CI: unit, integration, and component tests; STAGING: end-to-end, UI integration, UI rendering, and fault injection against a 3rd-party mock; PRE PROD: the real 3rd-party system; PROD: real integration, CDC, and synthetic transactions against the real 3rd-party system]
50. TOLERANT READER & MAGNANIMOUS WRITER
"Be conservative in what you do, be liberal in what you accept from others."
-- Jon Postel
51. LOWERING RISK
• Use each stage to lower risk
• Test appropriate and dedicated concerns
• Deploy as often as possible
• Accept reasonable risk
• Be able to recover & rollback
• Things will fail, so do QA in production
A few people each build different pieces of functionality, and all want to give users a great experience
Every change to anything can potentially break things
More moving parts: more risks
Larger systems – more live users – more things to lose
More deployments: more chances to break
Developing software in a non-breaking way
Traditionally, dev and ops were different people
Tension: fast delivery <-> stability
Large systems are often justified by a large number of users
The change train is inevitable
It's coming, or your business is dying
Old way: comprehensive testing & infrequent releases
Old way of perceiving operations
Roles like Deployment Manager
Misconception: this means even higher risk
The reason for DevOps
Opportunity cost = the value you miss out on by not taking an action (e.g. releasing the feature)
"Time is money" – Benjamin Franklin
Validation of your assumptions and value propositions
More deployments = smaller deployments = less risk & more practice
Shape it
Value Proposition
How we retain speed while managing risk
Retain speed
Minimize risk
Disrupting the business without disrupting users
> How am I making sure my stuff works in prod? (incompatibility, but other aspects as well)
How to bring things live
How to make sure it works
How to realize it's not working -> because you can't prevent all problems, you'd better make sure you notice when things go down
Its size and complexity impose a big burden on us:
Slow
Loading / deploy times
Test times
Side effects
Higher time to market
Becomes 'bad' legacy
Non-replaceability / upgradability
Higher integration cost
Loosely coupled services
Much more volatile
Different lifecycles: some change very often, some don't change at all anymore
Dependency management: integration not at compile time, but at run time
First topic: I want to discuss what a pipeline should look like, the main requirements on new CD systems, and a few patterns
Value stream mapping: how the artifact flows through the pipeline to live
Lean: Poka-yoke (pronounced "poka yoke") is a Japanese term that means "mistake-proofing" or "inadvertent error prevention" (lean manufacturing)
Lower the risk of an error, or make it obvious, so that it won't reach the customer
Each stage provides increasing confidence, usually at the cost of extra time. Early stages can find most problems, yielding faster feedback, while later stages provide slower and more thorough probing.
These can include performance, security, or usability issues.
The pipeline connects all roles: dev, ops, QA, UX, security champion, etc.
Keep it as lean as possible: master-branch development (we don't want that branching complexity)
The cross-functional team is end-to-end responsible
Orchestrates the build, test, and deploy actions
Pipeline and artifacts as first-class citizens
Only build packages/artifacts once
GoCD
LambdaCD
Jenkins 2
Value stream mapping
Only build packages/artifacts once
-> Docker, rkt, S3, Artifactory
No semantic versioning -> build numbers
Communicate with references to the artifact
Along with the build, it also needs instructions to deploy it
No tracking of incompatibilities – manage them manually
Pipeline as code
Setting up the pipeline is part of the task
The pipeline needs to be part of the code
Scripts need to live in the repo
https://blog.spinnaker.io/codifying-your-spinnaker-pipelines-ea8e9164998f
https://medium.com/@pavanbelagatti/declarative-continuous-deployment-pipelines-9d30807f7d0d
https://dzone.com/articles/declarative-continuous-deployment-pipelines
https://github.com/spinnaker/dcd-spec
https://www.gocd.org/2017/07/10/gocd-vs-spinnaker/
https://www.upguard.com/blog/articles/declarative-vs.-imperative-models-for-configuration-management
Env variable injection
Spring Cloud Config Server
Netflix Archaius
ZooKeeper
HashiCorp Consul
etcd
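For illustration, a minimal sketch of environment-driven configuration in a Spring Boot service, in the spirit of the tools above (the class name DatabaseSettings and the DB_URL variable are hypothetical):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

// Hypothetical settings bean: the URL is resolved from the DB_URL
// environment variable (or any other Spring property source, such as
// a Spring Cloud Config Server), with a local default for development.
@Component
public class DatabaseSettings {

    @Value("${DB_URL:jdbc:postgresql://localhost:5432/app}")
    private String url;

    public String url() {
        return url;
    }
}
```

The same binary then runs unchanged in every environment; only the injected values differ.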
No snowflake servers:
Abstraction layer over deployment actions
Spinnaker
Kubernetes
Apache Mesos
Docker Swarm
Creating and destroying environments should be abstracted
Deploy the same way to every environment—including development. This way, we test the deployment process many, many times before it gets to production, and again, we can eliminate it as the source of any problems.
Stateless services
& services need to keep the code
Waste (lean manufacturing): waiting, transportation
Waiting, testing, merging, dead code
-> no branches, direct deploy
No independence
Distributed monolith (deployment-wise)
A single service should be as independent as possible
- independent evolution
- independent deployment
Independent evolution, no lockstep
Brings speed but also more risks
Independent value -> independent evolution
Avoid lockstep (explicit or implicit)
Small changes (speed)
Keep it simple (deployment coordination / distributed monolith)
Problems: integration later
Also async
Deploying, testing, merging, waiting
Automate everything above
Make it work!
Integration not at build time, but at run time
What's different:
Higher level of automation
More integration aspects
Find more mistakes, more easily, in earlier stages
Loose coupling, no lockstep deployment, independent evolution -> enables speed
Cone of risk: harden the software through the journey
- With different kinds of tests and responsibilities
- On more and more realistic environments
- Fewer mocks through the pipeline
The difference: the amount of interface/network code compared to domain logic is higher
Key difference:
Interfaces and network connections are more prominent and need to be actively taken care of. They are also part of your own product.
Test responsibilities separately
By instantiating the full microservice in-memory, using in-memory test doubles and datastores, it is possible to write component tests that do not touch the network whatsoever.
This can lead to faster test execution times and minimises the number of moving parts, reducing build complexity.
However, it also means that the artifact being tested has to be altered for testing purposes to allow it to start up in a 'test' mode. Dependency injection frameworks can help to achieve this by wiring the application differently based on configuration provided at start-up time.
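As a hedged sketch of that start-up-time wiring, assuming Spring-style dependency injection (OrderRepository and the 'test' profile name are illustrative):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

// Illustrative port that the service depends on.
interface OrderRepository {
    void save(String id, String order);
    Optional<String> find(String id);
}

@Configuration
class ComponentTestWiring {

    // Only wired when the service starts in 'test' mode
    // (e.g. SPRING_PROFILES_ACTIVE=test): an in-memory datastore,
    // so component tests never touch the network.
    @Bean
    @Profile("test")
    OrderRepository inMemoryOrders() {
        Map<String, String> store = new ConcurrentHashMap<>();
        return new OrderRepository() {
            public void save(String id, String order) { store.put(id, order); }
            public Optional<String> find(String id) { return Optional.ofNullable(store.get(id)); }
        };
    }
}
```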
Like regular single-service development – maybe with more emphasis on interfaces
Real machines (or virtual ones)
Containerization
Integration
Configuration
Rendering (against many devices?)
Technical 3rd-party systems
Does containerization work?
Service Discovery
DNS
Configuration management
DB connection
Integration with mocked 3rd-party systems
Test harness = more control over data
Even when it's timing out or giving an erroneous response, I want my service to keep working (a sketch of such a harness follows below)
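A minimal test-harness sketch, assuming WireMock as the stub server (port, paths, and delay values are illustrative): it forces the 3rd-party stand-in to error out or stall, so we can verify our service degrades gracefully.

```java
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.WireMockServer;

public class ThirdPartyHarness {

    public static void main(String[] args) {
        WireMockServer thirdParty = new WireMockServer(8089);
        thirdParty.start();

        // Erroneous response: the 3rd party answers with a 500.
        thirdParty.stubFor(get(urlEqualTo("/rates"))
                .willReturn(aResponse().withStatus(500)));

        // Timeout-like behavior: a response delayed past our client timeout.
        thirdParty.stubFor(get(urlEqualTo("/inventory"))
                .willReturn(aResponse().withStatus(200).withFixedDelay(10_000)));

        // Point the service under test at http://localhost:8089 and
        // assert that it still works instead of failing hard.
    }
}
```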
Two Concerns:
- Tests basic integration (end to end)
- Tests UI functionality / rendering (UI)
- API Gateway? Mock it
- Limited end-to-end testing
Test the test (prerequisites)
UI tests
-> blocking operation? Takes a long time
Rolling updates help -> should work in production
Idempotent testing
Test snapshots
Create test data flexibly
Test State:
First integration on the cluster
Running services against each other (SCS only)
Test harnesses for 3rd-party systems
UI Tests:
Test Connection
- You've tested that your part works on technology similar to production
Now testing:
Real Integration
Monitoring
Constant Running
Realistic Environment
Comparing a test and a production environment to each other is like comparing a zoo to nature [..]. They may have similarities, but the differences are plentiful.
3rd Stage:
Real Integration with 3rd-party systems
CDCs (consumer-driven contracts; see the sketch after the links below)
Test Monitoring - Synthetic Transactions
http://www.gartner.com/id=2272416
https://blog.appdynamics.com/product/synthetic-vs-real-user-monitoring-a-response-to-gartner/
Chaos Monkey?
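As one hedged illustration of a CDC check (tools like Pact automate this; the endpoint and field names here are hypothetical): the consumer publishes the fields it actually reads, and the provider's build verifies its response still carries them.

```java
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class OrderContractCheck {

    // The fields the checkout consumer actually reads from /orders/{id}.
    static final List<String> CONSUMER_FIELDS = List.of("id", "status", "total");

    public static void main(String[] args) throws Exception {
        // In a real pipeline this payload would come from the provider
        // service running in CI; it is inlined here for the sketch.
        String providerResponse =
                "{\"id\":42,\"status\":\"OPEN\",\"total\":99.5,\"newField\":\"ignored\"}";
        JsonNode body = new ObjectMapper().readTree(providerResponse);

        for (String field : CONSUMER_FIELDS) {
            if (!body.has(field)) {
                throw new AssertionError("Contract broken: missing field '" + field + "'");
            }
        }
        System.out.println("Contract holds for " + CONSUMER_FIELDS);
    }
}
```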
JSON stream processing -> not visible until live
China data example
Generate Traffic
Monitor Errors
Assert on Returns
Monitor no errors / alerting (a synthetic-transaction sketch follows below)
Smoke Tests
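A sketch of such a synthetic transaction using the plain JDK 11 HTTP client (the URL, interval, and alerting hook are illustrative assumptions): generate traffic on a schedule, assert on the returns, and alert on errors.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SyntheticCheckout {

    private static final HttpClient http = HttpClient.newHttpClient();

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Generate traffic: run the probe once a minute, forever.
        scheduler.scheduleAtFixedRate(SyntheticCheckout::probe, 0, 1, TimeUnit.MINUTES);
    }

    static void probe() {
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://shop.example.com/health/checkout"))
                    .timeout(Duration.ofSeconds(5))
                    .build();
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());

            // Assert on returns: anything but 200 counts as a failed transaction.
            if (response.statusCode() != 200) {
                alert("Synthetic checkout failed with status " + response.statusCode());
            }
        } catch (Exception e) {
            alert("Synthetic checkout failed: " + e.getMessage());
        }
    }

    static void alert(String message) {
        // Stand-in for a real alerting integration (PagerDuty, Slack, ...).
        System.err.println("[ALERT] " + message);
    }
}
```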
We still don't know if the new version works with the systems deployed in live. They could potentially be months old.
That's why we deploy as often as possible (multiple times a day) to avoid release management.
It might be safer to actively track compatible versions and test specific configurations, but
Optimism does not scale
Ask "what if" questions
Self-healing systems
http://blog.fosketts.net/2011/07/06/defining-failure-mttr-mttf-mtbf/
https://assertible.com/blog/testing-and-monitoring-in-production-your-qa-is-incomplete-without-it#plan-to-recover-from-failures,-not-prevent-them
Things always go wrong in production, but this doesn't have to be a bad thing.
A richer understanding of the real issues
Learn new ways to improve quality
Monitoring
Alerting
http://www.neotys.com/blog/7-ways-to-build-a-robust-testing-in-production-practice/
Either integrating synthetically
Or running test cases in prod
White box helps you identify problems before they have impact (and helps debugging)
Black box: sometimes the last resort (Otto story)
The rules that catch real incidents most often should be as simple, predictable, and reliable as possible.
Data collection, aggregation, and alerting configuration that is rarely exercised (e.g., less than once a quarter for some SRE teams) should be up for removal.
- Router can be internal or external
- Reducing risks
First attempt: short-circuiting (circuit breakers)
Mutation testing
Chaos Monkey
Graceful degradation
If something happens: am I able to see it?
Chaos Engineering
http://principlesofchaos.org/
Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
Hypothesize that this steady state will continue in both the control group and the experimental group.
Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
Vary Real-world Events
Chaos variables reflect real-world events. Prioritize events either by potential impact or estimated frequency. Consider events that correspond to hardware failures like servers dying, software failures like malformed responses, and non-failure events like a spike in traffic or a scaling event. Any event capable of disrupting steady state is a potential variable in a Chaos experiment.
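A toy skeleton of the experiment loop described above (steady state, hypothesis, injected variable, comparison), with the metric source and failure injection left as stubs; all names are illustrative.

```java
import java.util.function.DoubleSupplier;

public class ChaosExperiment {

    public static void main(String[] args) {
        // Steady state: a measurable output indicating normal behavior,
        // e.g. the request success rate from the monitoring system.
        DoubleSupplier successRate = ChaosExperiment::fetchSuccessRate;

        double control = successRate.getAsDouble();

        // Introduce a variable reflecting a real-world event.
        injectFailure();

        double experimental = successRate.getAsDouble();

        // Try to disprove the hypothesis that steady state continues.
        if (Math.abs(control - experimental) > 0.01) {
            System.err.println("Hypothesis disproved: " + control + " vs " + experimental);
        } else {
            System.out.println("Steady state held under the injected failure.");
        }
    }

    static double fetchSuccessRate() { return 0.999; } // stub: read from metrics
    static void injectFailure() { /* stub: kill an instance, add latency, ... */ }
}
```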
Database compatibility
Independent database evolution
A writer that can anticipate and forgive some of the mistakes that less tolerant readers will inevitably commit
Reader:
Only read what you need
Ignore other fields
Writer:
Never introduce breaking changes
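A minimal tolerant-reader sketch using Jackson (the class and field names are illustrative): the reader binds only the fields it needs and ignores everything else, so a magnanimous writer can add fields without breaking it.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

// Only read what you need; unknown fields are silently ignored.
@JsonIgnoreProperties(ignoreUnknown = true)
class OrderView {
    public long id;
    public String status;
}

public class TolerantReaderDemo {
    public static void main(String[] args) throws Exception {
        // The writer added 'discountCode' in a newer, non-breaking version.
        String payload = "{\"id\":42,\"status\":\"OPEN\",\"discountCode\":\"XMAS\"}";
        OrderView order = new ObjectMapper().readValue(payload, OrderView.class);
        System.out.println(order.id + " " + order.status);
    }
}
```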