Creating an Effective Developer Experience for Cloud-Native Apps
Daniel Bryant
@danielbryantuk | @datawireio
“Developer Experience”
@danielbryantuk
Independent Technical Consultant, Product Architect at Datawire
Previously: Academic, software developer (from startups to gov),
architect, consultant, CTO, trainer, conference tourist…
Leading change through technology and teams
DevEx 101
Developer Experience (DevEx) is about...
“...reducing engineering friction between creating a hypothesis, to
delivering an observable experiment (or business value) in production”
- Adrian Trenaman (SVP Engineering, HBC)
https://www.infoq.com/news/2017/07/remove-friction-dev-ex
DevEx isn’t new, but it is important
● Lead time
● Deployment frequency
● Mean time to restore (MTTR)
● Change fail percentage
● Rapid provisioning
● Basic monitoring
● Rapid app deployment
https://martinfowler.com/bliki/MicroservicePrerequisites.html
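The first four metrics above can be computed from a simple deployment log. The following is a minimal sketch, not a definitive implementation: the `Deployment` record shape and the `delivery_metrics` function are hypothetical names, and MTTR is assumed here to be measured from the failing deployment to the moment service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deployments, window_days):
    """Summarise the four delivery metrics over a reporting window."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: time to restore runs from the failing deploy to restoration.
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "lead_time_hours": mean(lt.total_seconds() for lt in lead_times) / 3600,
        "deploys_per_day": len(deployments) / window_days,
        "mttr_hours": (
            mean(rt.total_seconds() for rt in restore_times) / 3600
            if restore_times else None
        ),
        "change_fail_pct": 100 * len(failures) / len(deployments),
    }


if __name__ == "__main__":
    t0 = datetime(2019, 5, 1, 9, 0)
    history = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True,
                   restored_at=t0 + timedelta(hours=5)),
    ]
    print(delivery_metrics(history, window_days=1))
```

Tracking these numbers over time, rather than as a one-off snapshot, is what shows whether a change to the workflow or platform actually reduced friction.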
DevEx: Three Components
DevEx,
Workflow,
Platforms
The Ideal Workflow
“Progressive Delivery”
https://redmonk.com/jgovernor/2018/08/06/towards-progressive-delivery/
https://launchdarkly.com/blog/progressive-delivery-a-history-condensed/
https://speakerdeck.com/stilkov/microservices-patterns-and-antipatterns-1
Decentralised Biz/Product Teams;
Centralised Specialists and Platform
Team Z: Prototyping
Team A: Mission-Critical
Team T: Production Phase
https://twitter.com/kelseyhightower/status/851935087532945409
https://www.infoq.com/news/2017/06/paved-paas-netflix
https://www.infoq.com/news/2018/07/shopify-kubernetes-paas
Should I Build a PaaS on k8s?
Fundamental questions
Do you understand your domain?
Is your problem domain complex?
Do you have product/market fit?
Question
Is your solution event-driven (and simple)?
Should you be adding value elsewhere?
SOLID K8s: Open for Extension...
● Kubernetes becoming de facto CoaaS (the new cloud broker?)
○ Lots of hosted options
● Know the extension points
○ Custom Resources & Controllers
○ Operators
○ operatorhub.io (kudos to Red Hat)
● Extension enables custom workflow
○ “Kubernetes Custom Resource, Controller and Operator Development Tools”
https://github.com/weaveworks/flux
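Custom resources and controllers rest on a reconcile pattern: a resource declares the desired state, and a controller repeatedly compares it with what is actually running and acts to close the gap. The sketch below illustrates only that pattern in plain Python. It does not use the Kubernetes API, and `PreviewEnvSpec`, `reconcile`, and `apply` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreviewEnvSpec:
    """Desired state, as a custom resource might declare it (hypothetical kind)."""
    name: str
    replicas: int


def reconcile(desired, observed):
    """Return the actions needed to move observed state towards desired state.

    `desired` is a list of PreviewEnvSpec; `observed` maps name -> running replicas.
    """
    actions = []
    wanted = {spec.name: spec.replicas for spec in desired}
    for name, replicas in sorted(wanted.items()):
        current = observed.get(name)
        if current is None:
            actions.append(("create", name, replicas))
        elif current != replicas:
            actions.append(("scale", name, replicas))
    # Anything running that is no longer declared gets removed.
    for name in sorted(observed):
        if name not in wanted:
            actions.append(("delete", name, 0))
    return actions


def apply(actions, observed):
    """Apply actions to an in-memory stand-in for the cluster."""
    state = dict(observed)
    for verb, name, replicas in actions:
        if verb == "delete":
            state.pop(name)
        else:
            state[name] = replicas
    return state


if __name__ == "__main__":
    desired = [PreviewEnvSpec("feature-a", 2), PreviewEnvSpec("feature-b", 1)]
    observed = {"feature-a": 1, "old-branch": 3}
    plan = reconcile(desired, observed)
    print(plan)
    print(apply(plan, observed))
```

Because reconciliation is idempotent, running it again once the states match yields no actions, which is what makes the pattern a reasonable base for a custom workflow.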
How quickly do you need feedback?
Question
https://mitchdenny.com/the-inner-loop/
https://bit.ly/2RXfokz
Automate Inner Dev Loop
https://blog.hasura.io/draft-vs-gitkube-vs-helm-vs-ksonnet-vs-metaparticle-vs-skaffold-f5aa9561f948
https://codeengineered.com/blog/2018/kubernetes-helm-related-tools/
● Draft
○ Automates “inner loop” build-push-deploy
○ Utilises Helm
● Gitkube
○ Automates build-push-deploy
○ Provides a Heroku / CF-like experience
● Skaffold
○ Automates build-push-deploy
○ Watches source code
○ Provides “dev” and “run” (CD) modes
● Tilt
○ Automates “inner loop” build-test-deploy
● Garden
○ Automates local build-push-test-deploy
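The tools above share one core idea: watch the source tree and re-run build-push-deploy whenever something changes. The following sketch shows that idea in isolation, assuming content hashes are used to detect changes. It is not how any of the listed tools is implemented, and `snapshot`, `changed_files`, and `run_inner_loop_once` are hypothetical helper names; the steps are plain callables standing in for real build, push, and deploy commands.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file under root to a hash of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(root).rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """Files that were added, removed, or modified between two snapshots."""
    return sorted(
        name for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )


def run_inner_loop_once(root, previous, steps):
    """Run the pipeline steps if anything under root changed.

    `steps` is an ordered list of (label, callable) pairs standing in for
    build, push and deploy; each callable receives the changed file list.
    Returns the new snapshot and the labels of the steps that ran.
    """
    current = snapshot(root)
    changes = changed_files(previous, current)
    ran = []
    if changes:
        for label, step in steps:
            step(changes)
            ran.append(label)
    return current, ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "app.py").write_text("print('v1')")
        steps = [(label, lambda files, label=label: print(label, files))
                 for label in ("build", "push", "deploy")]
        state, _ = run_inner_loop_once(workdir, {}, steps)   # first run: everything is new
        state, ran = run_inner_loop_once(workdir, state, steps)
        print("second run:", ran)                            # nothing changed
```

A real tool would call this in a loop (or subscribe to file-system events) and would typically rebuild only the images affected by the changed files.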
Automate Inner Dev Loop
● Helm (*)
○ Package manager for k8s
○ Deploy and manage (ready-made) charts
● Ksonnet
○ Define k8s manifests in jsonnet
○ Create composable config/services
● Telepresence (*)
○ Enables local-to-prod development
(*) CNCF projects
Develop and test services locally, or
within the cluster (or both)?
● Working locally has many advantages
○ Reduce ops cost of multi-cluster
● However, some systems are simply too
large to run locally (for integration tests)
● Local/remote container dev tools like
Telepresence and Squash allow a hybrid approach
Question
“Bring the cloud to you” “Put you in the cloud”
How do you want to verify your system?
● Pre-prod testing in distributed systems
○ Dealing with complex adaptive systems
○ Probabilistic guarantee of “correctness”
https://medium.com/@copyconstruct/testing-microservices-the-sane-way-9bb31d158c16
Question
https://skillsmatter.com/skillscasts/13773-london-java-community-april
● Traffic shaping/splitting is powerful
○ Canarying
○ Shadowing
https://github.com/weaveworks/flagger
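Canarying and shadowing can both be expressed as small routing decisions. The sketch below is an illustration under stated assumptions rather than how Flagger or any proxy implements them: `bucket`, `choose_backend`, and `handle` are hypothetical names, canary assignment is assumed to be sticky per request id via hashing, and the shadow call is made inline for simplicity where a real proxy would mirror asynchronously.

```python
import hashlib


def bucket(request_id, buckets=100):
    """Deterministically map a request id to a bucket in [0, buckets)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_backend(request_id, canary_percent):
    """Canarying: send roughly canary_percent of requests to the canary."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket(request_id) < canary_percent else "stable"


def handle(request_id, stable, shadow=None):
    """Shadowing: serve from stable, optionally mirroring to a shadow backend.

    The shadow response is discarded so users never see it, and shadow
    failures are swallowed so they cannot affect the live response.
    """
    if shadow is not None:
        try:
            shadow(request_id)
        except Exception:
            pass
    return stable(request_id)


if __name__ == "__main__":
    ids = [f"req-{i}" for i in range(1000)]
    to_canary = sum(choose_backend(r, 10) == "canary" for r in ids)
    print(f"{to_canary} of {len(ids)} requests routed to the canary")
    print(handle("req-1", stable=lambda r: f"stable handled {r}",
                 shadow=lambda r: f"shadow handled {r}"))
```

Hashing on a stable identifier keeps a given user or request on the same version across retries, which makes canary results easier to interpret than purely random splitting.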
The Importance of L7 (and Envoy)
● “Service-mesh all the things”?
● Old pattern, new technology
○ Allows fine-grained release
● Many control planes for Envoy
○ Ambassador
○ Gloo
○ Istio
○ Consul Connect
https://www.infoq.com/articles/ambassador-api-gateway-kubernetes
https://www.youtube.com/watch?v=o1MJi54_R4o&list=PLj6h78yzYM2PpmMAnvpvsnR4c27wJePh3&index=179
https://www.infoq.com/articles/api-gateway-service-mesh-app-modernisation/
Canary gotchas (and mitigations)
● Observability is a prerequisite
○ Service Level Indicators (SLIs)
○ Service Level Objectives (SLOs)
○ Key Performance Indicators (KPIs)
● Needs high volume of diverse
(representative) traffic
● Take care with side effects
● Focus on “golden signals”
○ Latency, traffic, errors, saturation
○ Okay to initially “eyeball” data
○ Create actionable alerts
● Load test (with flag header)
● Run synthetic transactions
● Service virtualisation (Hoverfly)
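The gotchas above suggest a simple gate: do not judge a canary until it has seen enough traffic, then compare its golden signals against an objective and against the stable version. The following is a minimal sketch of such a gate. `WindowStats` and `canary_verdict` are hypothetical names, and the thresholds are placeholder assumptions rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Golden-signal summary for one version over one observation window."""
    requests: int          # traffic
    errors: int            # errors
    p99_latency_ms: float  # latency

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(stable, canary, *, min_requests=500,
                   max_error_rate=0.01, max_latency_ratio=1.2):
    """Return "promote", "rollback" or "wait" for a canary.

    All thresholds are illustrative defaults, not recommendations.
    """
    if canary.requests < min_requests:
        return "wait"      # too little traffic to judge
    if canary.error_rate > max_error_rate:
        return "rollback"  # breaches the error-rate objective
    if canary.p99_latency_ms > stable.p99_latency_ms * max_latency_ratio:
        return "rollback"  # noticeably slower than the stable version
    return "promote"


if __name__ == "__main__":
    stable = WindowStats(requests=10_000, errors=20, p99_latency_ms=100.0)
    canary = WindowStats(requests=1_000, errors=2, p99_latency_ms=105.0)
    print(canary_verdict(stable, canary))
```

The "wait" outcome is the code-level counterpart of the traffic-volume gotcha: with too few requests, load tests or synthetic transactions are needed before any verdict is meaningful.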
Observability UX
Do you want to implement “guard rails”
for your development teams?
● Larger teams often want to provide
comprehensive guard rails
● Startups and SMEs may instead value
team independence
● Hybrid? Offer platform, but allow service
teams freedom and responsibility
https://blog.openshift.com/multiple-deployment-methods-openshift/
Question
Build vs buy
https://www.infoq.com/news/2019/03/airbnb-kubernetes-workflow
https://docs.openshift.com/container-platform/3.9/dev_guide/openshift_pipeline.html
Security guard rails
Getting Started:
Where to Focus
Decentralised Biz/Product Teams;
Centralised Specialists and Platform
Team Z: Prototyping
Team A: Mission-Critical
Team T: Production Phase
Some thoughts on where to focus...
● Dev and test
○ Prototype: Local / hybrid
○ Production: Hybrid / local / staged
○ Mission critical: Local / (hybrid) staged
● Release
○ Prototype: Canary (synthetic shadow)
○ Production: Canary / pre-prod test
○ Mission critical: Pre-prod test / canary
● Guide rails
○ Prototype: “YOLO”
○ Production: Limited
○ Mission critical: Strong
● Where to focus?
○ Prototype: Inner development loop & CI/CD
○ Production: Observability and scaffolding (codifying best practices)
○ Mission critical: Observability, debugging and “recreatability” (environment & data)
Conclusion
In Summary
The developer experience is primarily about minimising the friction from having
an idea, through dev/test and release, to delivering observable business value
How you construct your ‘platform’ impacts the developer experience greatly
You must intentionally curate the experience of: local development, continuous
delivery, release control, observability, debuggability, and more...
Thanks for Listening!
Questions, comments, thoughts…
db@datawire.io
@danielbryantuk

muCon 2019: "Creating an Effective Developer Experience for Cloud-Native Apps"

Editor's Notes

  • #3 How you construct your ‘platform’ impacts the developer experience greatly
  • #6 ...is minimising the distance between a good idea and production
  • #34 Do you want to implement “guide rails” for your development teams?

    Larger teams and enterprises often want to provide comprehensive guide rails for development teams; these constrain the workflow and toolset being used. Doing this has many advantages: it reduces friction when moving engineers across projects, and it makes it easier to create integrated debug tooling and auditing. The key trade-off is limited flexibility when a workflow is needed for exceptional circumstances, such as when a project requires a custom build and deployment or differing test tooling. Red Hat’s OpenShift and Pivotal Cloud Foundry offer PaaS-es that are popular within many enterprise organizations.

    Startups and small/medium enterprises (SMEs) may instead value team independence, where each team chooses the most appropriate workflow and developer tooling for them. My colleague, Rafael Schloming, has spoken about the associated benefits and challenges at QCon San Francisco: Patterns for Microservice Developer Workflows and Deployment. Teams embracing this approach often operate a Kubernetes cluster via a cloud vendor, such as Google’s GKE or Azure’s AKS, and utilize a combination of vendor services and open-source tooling.

    A hybrid approach, such as that espoused by Netflix, is to provide a centralized platform team and approved/managed tooling, but allow any service team the freedom to implement their own workflow and associated tooling, which they will also have the responsibility for managing. My summary of Yunong Xiao’s QCon New York talk provides more insight into the ideas: The “Paved Road” PaaS for Microservices at Netflix. This hybrid approach is the style we favor at Datawire, and we are building open-source tooling to support this.