One of the critical factors for development velocity is software correctness. Our ability to develop and ship new features fast is bounded by our ability to validate several aspects of each change:
* Does the feature meet the requirements?
* How does the feature affect existing code, and how could it affect the production environment?
As the codebase keeps growing and new features are added, our productivity naturally decreases, and our need for stronger guarantees of quality and correctness increases.
In this talk, I’ll focus on testing environments: why developers need a self-serve platform to create a fully functioning environment on demand, how such environments should be managed, and how one can restore part of the lost velocity. I’ll cover ‘Namespaces’, an internal system we use at AppsFlyer that addresses the issue with the help of Mesos / Marathon, Docker, Traefik, and Consul.
About
“AppsFlyer is the world's leading mobile attribution & marketing analytics platform, helping app marketers around the world make better decisions.”
● 120+ R&D
● 100s of (micro-)services
● 1000s of servers
● 10s of deployments / day
● 80B+ events / day
● 85K+ apps (using the SDK)
● 1000s of partners
● $17B in media spend measured
Shared [Dev | Test] Environments
● Easy - when it’s too big for the laptop
● “Low” maintenance overhead
● Every developer maintains only their team’s stack
● Similarity to production
● Fuzzy version control - is it my version?
● Stability - who owns it? Who deploys new versions?
● State is mutable - shared database, shared state.
● Mutability!!!!!!!!!!!!
Pets vs. Cattle
Source: https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
Key Principles of Environment Creation
● Definition: environment as a human-readable schema (JSON / YAML)
● Interaction: API-driven (UI / CLI enhanced), self-serve (!)
● Safety: isolation (!!)
● Usability: composable
Namespaces (to the rescue!)
● Environment == JSON file
● Self-serve, based on an API
● Isolation between environments
● A building block
It’s a platform for creating testing environments easily, without dealing with infrastructure.
It is based on two main concepts:
● Services (e.g., any micro-service)
● Resources (e.g., Kafka, MySQL)
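To make “Environment == JSON file” concrete, here is a minimal sketch of a schema built from the two concepts above, services and resources. The field names (`image`, `kind`, `version`) and the example values are hypothetical, not the actual Namespaces schema; the point is only that a plain, human-readable document can round-trip to typed objects.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Service:
    # Hypothetical fields: any micro-service, deployed from a Docker image.
    name: str
    image: str
    version: str


@dataclass
class Resource:
    # Hypothetical fields: an infrastructure dependency such as Kafka or MySQL.
    name: str
    kind: str
    version: str


@dataclass
class Environment:
    name: str
    services: List[Service] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses, producing plain
        # dicts and lists that the json module can serialize.
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(text: str) -> "Environment":
        raw = json.loads(text)
        return Environment(
            name=raw["name"],
            services=[Service(**s) for s in raw.get("services", [])],
            resources=[Resource(**r) for r in raw.get("resources", [])],
        )


# Hypothetical environment: one service plus the cache it depends on.
env = Environment(
    name="devopsdays-ns",
    services=[Service(name="helloworld", image="helloworld", version="1.0.0")],
    resources=[Resource(name="memcached", kind="memcached", version="1.6")],
)

if __name__ == "__main__":
    print(env.to_json())
```

Keeping services and resources as separate lists mirrors the two concepts on the slide and lets the platform treat them differently (for example, when exposing ports).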
Namespaces API* (REST)
● -X POST @env.json /namespace (CREATE)
● -X GET /namespace/{name}/schema (READ)
● -X PUT @env.json /namespace (UPDATE)
● -X DELETE /namespace/{name} (DELETE)
Fully self-serve: developers / QAs invoke the API to manage their environments.
* And more (/status, /refresh, /logs, etc.)
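A minimal client sketch for the four CRUD endpoints listed above, using only the Python standard library. `BASE_URL` is a hypothetical host, and the assumption that responses are JSON is ours. Request builders are kept separate from `send()` so they can be inspected or tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical base URL: the talk does not say where the API is hosted.
BASE_URL = "http://namespaces.example.internal"


def _request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the Namespaces API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def _quote(name: str) -> str:
    # Escape the namespace name so it is safe inside a URL path segment.
    return urllib.parse.quote(name, safe="")


def create(env: dict) -> urllib.request.Request:
    return _request("POST", "/namespace", env)


def read_schema(name: str) -> urllib.request.Request:
    return _request("GET", f"/namespace/{_quote(name)}/schema")


def update(env: dict) -> urllib.request.Request:
    return _request("PUT", "/namespace", env)


def delete(name: str) -> urllib.request.Request:
    return _request("DELETE", f"/namespace/{_quote(name)}")


def send(req: urllib.request.Request) -> dict:
    # Performs the HTTP call; urlopen raises urllib.error.HTTPError on 4xx/5xx.
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = resp.read()
    return json.loads(payload) if payload else {}
```

Usage would look like `send(create({"name": "devopsdays-ns", ...}))` from a laptop, a CI job, or any other script.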
Communication & Service Discovery
http://helloworld.devopsdays-ns.msp.af.com
For every namespace, we deploy an additional container called “mspproxy” (Traefik + custom code) that routes communication between services.
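The proxy’s job can be modeled as host-based routing: the `Host` header encodes both the service and the namespace, and each pair maps to the containers currently running. This is a simplified sketch of that lookup, not Traefik’s configuration or mspproxy’s code; the routing table contents and backend address are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Parent domain taken from the example URL on the slide.
DOMAIN = "msp.af.com"


def parse_host(host: str) -> Optional[Tuple[str, str]]:
    """Split '<service>.<namespace>.msp.af.com[:port]' into (namespace, service)."""
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + DOMAIN
    if not hostname.endswith(suffix):
        return None
    labels = hostname[: -len(suffix)].split(".")
    if len(labels) != 2 or not all(labels):
        return None
    service, namespace = labels
    return namespace, service


def route(host: str, table: Dict[Tuple[str, str], List[str]]) -> List[str]:
    """Return the backend addresses for a Host header, or [] if unknown."""
    key = parse_host(host)
    if key is None:
        return []
    return table.get(key, [])


# Hypothetical routing table: (namespace, service) -> "host:port" backends,
# where each port is the random host port the container was given.
TABLE = {("devopsdays-ns", "helloworld"): ["10.0.1.7:31244"]}
```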
Isolation (by DNS)
How can we make sure services / resources communicate only within their own environment?
resolv.conf FTW!
Source: https://en.wikipedia.org/wiki/Resolv.conf
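One way resolv.conf can provide this isolation is a per-container search list that puts the container’s own namespace first. The sketch below renders such a file; the shape of the search list is our assumption about how this could work, and the nameserver address is hypothetical.

```python
def render_resolv_conf(namespace: str, nameserver: str, domain: str = "msp.af.com") -> str:
    """Render an /etc/resolv.conf that makes short names resolve inside `namespace`."""
    return "\n".join([
        f"nameserver {nameserver}",
        # Search list: the container's own namespace first, then the parent
        # domain, so 'helloworld.scaladays-ns' can still reach another namespace.
        f"search {namespace}.{domain} {domain}",
        # ndots:1 is the resolver default; spelled out here for clarity.
        "options ndots:1",
    ]) + "\n"
```

With Docker, the same settings could also be requested through the `--dns` and `--dns-search` flags of `docker run` instead of writing the file directly.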
Isolation (by DNS)
Services / resources communicate via a “short name”:
$ curl http://helloworld/ => http://helloworld.devopsdays-ns.msp.af.com
$ telnet memcached 11211 => memcached.devopsdays-ns.msp.af.com
Reminiscent of the K8s Service model.
Source: https://kubernetes.io/docs/concepts/services-networking/service/
What happens if you want to talk to a different namespace? Just add the environment name as a suffix:
$ curl http://helloworld.scaladays-ns => http://helloworld.scaladays-ns…
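Both behaviors above follow from how a stub resolver applies its search list. The sketch is a simplified model of glibc’s documented rules: a name with at least `ndots` dots is tried as-is first, otherwise the search domains are tried first. It runs against a hypothetical set of DNS records and assumes a search list of the container’s namespace followed by the parent domain.

```python
from typing import List, Optional, Set


def candidates(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Order in which a stub resolver tries names (simplified glibc model)."""
    if name.endswith("."):
        # A trailing dot marks a fully qualified name: no search list.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [name] + expanded
    # Too few dots: try the search list first, then the name as-is.
    return expanded + [name]


def resolve(name: str, search: List[str], records: Set[str], ndots: int = 1) -> Optional[str]:
    """Return the first candidate present in `records`, mimicking DNS lookups."""
    for candidate in candidates(name, search, ndots):
        if candidate in records:
            return candidate
    return None


SEARCH = ["devopsdays-ns.msp.af.com", "msp.af.com"]
RECORDS = {
    "helloworld.devopsdays-ns.msp.af.com",
    "memcached.devopsdays-ns.msp.af.com",
    "helloworld.scaladays-ns.msp.af.com",
}
```

Here `resolve("helloworld", SEARCH, RECORDS)` stays inside the caller’s namespace, while `resolve("helloworld.scaladays-ns", SEARCH, RECORDS)` falls through to the parent domain and reaches the other environment.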
Composability (Building Block)
Being API-driven by design means it can be invoked anywhere you want:
● CI / CD pipelines
● Selenium / UI builds
● Scheduled jobs
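Because an environment is just an API call, a CI job can treat it as a scoped resource: create it before the tests and always delete it afterwards. The sketch uses a hypothetical in-memory client and a hypothetical naming scheme (`ci-<build>-ns`); a real job would call the Namespaces API instead.

```python
from contextlib import contextmanager
from typing import Iterator, Set


class FakeNamespacesClient:
    """Hypothetical in-memory stand-in for a Namespaces API client."""

    def __init__(self) -> None:
        self.live: Set[str] = set()

    def create(self, env: dict) -> None:
        self.live.add(env["name"])

    def delete(self, name: str) -> None:
        self.live.discard(name)


@contextmanager
def ephemeral_namespace(client, env: dict) -> Iterator[str]:
    """Create a namespace for the duration of a block and always tear it down."""
    client.create(env)
    try:
        yield env["name"]
    finally:
        # Runs on success and on failure, so failed builds don't leak environments.
        client.delete(env["name"])


def ci_job(client, build_id: str) -> bool:
    # Hypothetical CI step: one environment per build, named after the build.
    env = {"name": f"ci-{build_id}-ns", "services": [], "resources": []}
    with ephemeral_namespace(client, env) as ns:
        still_up = ns in client.live  # tests would run against `ns` here
    return still_up
```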
Infrastructure Utilization
● Most of the nodes are running on spot instances.
● Containers are exposed on random ports; the proxy handles communication.
● For resources (e.g., Memcached), we expose their original port (e.g., memcached:11211).
● The same port can’t be bound twice on a single machine (hmm, a proxy on port 80?)
- It is possible to use an overlay network with Mesos.
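The port constraint is easiest to see as a placement problem: services can take any free port because the proxy tracks where they landed, but a resource that keeps its original port (e.g., Memcached on 11211) can run at most once per host. A sketch with hypothetical host names; the 31000-32000 range is an assumption chosen to match what Mesos agents offer by default, not a value from the talk.

```python
import random
from typing import List, Optional, Set, Tuple


class Host:
    """A cluster node and the host ports already bound on it."""

    def __init__(self, name: str, port_range: Tuple[int, int] = (31000, 32000)) -> None:
        self.name = name
        self.port_range = port_range  # assumed range for randomly assigned ports
        self.used: Set[int] = set()

    def bind_random(self, rng: random.Random) -> int:
        # Services: any free port in the range; the proxy tracks where it landed.
        lo, hi = self.port_range
        free = [p for p in range(lo, hi + 1) if p not in self.used]
        if not free:
            raise RuntimeError(f"no free ports left on {self.name}")
        port = rng.choice(free)
        self.used.add(port)
        return port


def place_fixed_port(hosts: List[Host], port: int) -> Optional[str]:
    """Resources keep their original port, so each host fits at most one per port."""
    for host in hosts:
        if port not in host.used:
            host.used.add(port)
            return host.name
    return None  # every host already binds this port: scheduling fails
```

With two hosts, a third Memcached on 11211 cannot be placed anywhere, which is exactly the limitation an overlay network (one IP per container) removes.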
Service Discovery
● Marathon provides an event bus for all deployment events, but:
- Messages can be delayed
- Events can be lost
● Solution: scheduled sync + event subscription.
- “mspproxy”
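The “scheduled sync + event subscription” approach can be sketched as a routing table with two inputs: events apply incremental changes quickly, and a periodic full sync replaces the table with the scheduler’s current state, repairing whatever delayed or lost events got wrong. The event format here is hypothetical and much simpler than Marathon’s.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (namespace, service)


class RoutingTable:
    """Backends per (namespace, service), fed by events and corrected by periodic sync."""

    def __init__(self) -> None:
        self.backends: Dict[Key, Set[str]] = {}

    def apply_event(self, event: dict) -> None:
        # Hypothetical, simplified event shape; not Marathon's real event schema.
        key = (event["namespace"], event["service"])
        if event["type"] == "up":
            self.backends.setdefault(key, set()).add(event["address"])
        elif event["type"] == "down":
            self.backends.get(key, set()).discard(event["address"])
            if not self.backends.get(key):
                self.backends.pop(key, None)

    def full_sync(self, snapshot: Dict[Key, Set[str]]) -> None:
        # Scheduled sync: the full state listed from the scheduler wins,
        # repairing any drift caused by delayed or lost events.
        self.backends = {key: set(addrs) for key, addrs in snapshot.items() if addrs}
```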
Data Replication
● Multiple databases.
● Database owners created a job that creates a snapshot from production.
- Without private / sensitive data.
- For big storages, 1/X (X >= 10) of the data can be enough.
● The snapshot revision can be injected as part of the JSON.
- By wrapping the database container with a custom entrypoint script.
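A sketch of the two snapshot rules above: drop private columns and keep roughly 1/X of the rows. Column names are hypothetical. Sampling by hashing a key, rather than picking rows at random, is our own choice: it keeps the same keys in every snapshot run and across tables that share the key, so joins still work on the reduced data.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Set

# Hypothetical column names treated as private / sensitive.
SENSITIVE: Set[str] = {"email", "ip_address"}


def keep_row(key: str, one_in: int) -> bool:
    """Deterministically keep roughly 1 out of every `one_in` keys."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0


def snapshot_rows(rows: Iterable[Dict], key_field: str, one_in: int = 10,
                  sensitive: Set[str] = SENSITIVE) -> Iterator[Dict]:
    """Yield a sampled copy of `rows` with sensitive columns removed."""
    if one_in < 1:
        raise ValueError("one_in must be >= 1")
    for row in rows:
        if keep_row(str(row[key_field]), one_in):
            yield {col: val for col, val in row.items() if col not in sensitive}
```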
Debuggability
● Dynamic environments mean resource (mem / CPU) constraints.
- Out of memory.
● The env. is not stable - whose side is the problem? DevOps?
- Out of memory.
● Logs? Metrics?
- Yes, please.
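Since out-of-memory kills dominate both bullets, a clear error message goes a long way. `docker inspect` reports `State.OOMKilled` and `State.ExitCode` for a container; the hypothetical helper below turns those two values into something a developer can act on without asking DevOps.

```python
import signal


def explain_exit(exit_code: int, oom_killed: bool) -> str:
    """Turn a container's exit status into an actionable message."""
    if oom_killed:
        return (f"killed by the kernel OOM killer (exit code {exit_code}): "
                "raise the container's memory limit or reduce its usage")
    if exit_code == 0:
        return "exited normally"
    if exit_code > 128:
        # Death by signal is conventionally reported as 128 + signal number.
        sig = exit_code - 128
        try:
            sig_name = signal.Signals(sig).name
        except ValueError:
            sig_name = f"signal {sig}"
        return f"terminated by {sig_name} (exit code {exit_code})"
    return f"application failed with exit code {exit_code}: check its logs"
```

Exit code 137 (128 + SIGKILL) without the OOM flag usually means something else killed the process, such as the scheduler, which points the investigation in a different direction.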
Another Deployment Tool
● Yes and no.
- Ideally, we can leverage our existing deployment tool.
- The interface should be different.
● Creating a [test/dev] environment != production deployment.
- No need for canary / blue-green deployments.
- Fewer validations, more to the point.
Summary
● Static environments are OK. Dynamic environments are great.
● Simplify as much as possible. Self-serve & API-driven by design.
● The KPI is developer / QA happiness. They aren’t happy? Not good enough.
● Keep it layered. Building blocks are better than one over-engineered solution (:cough: K8s :cough:)
● Nobody cares about your Docker problems. Make it work. Observability and clear error messages are key.