‘Feature is DONE!’ is a statement typically shared by an overjoyed engineer declaring that a feature they implemented has been approved by a test engineer. But is it ‘Done, Done’? Engineering enterprise cloud services requires teams to satisfy the ‘ilities’ requirements. If you are wondering what they are and how engineering teams tackle and satisfy them, this session will give you an insight into the work and investment needed for a feature to be ‘Done’ in the enterprise cloud world.
7. Availability
#ISSLearningFest
• Deployments must not impact availability.
• Service availability must adhere to
SLAs/SLOs.
• Must not propagate the failure of a single
non-critical component.
• Implement region independence.
• Proper flow control between components
and their dependencies
• Resource consumption fairness for
components susceptible to noisy
neighbors
• Async messaging among loosely coupled
components to improve fault tolerance
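One way to keep the failure of a non-critical component from propagating is to degrade gracefully and serve a fallback value. The sketch below is a minimal illustration under assumptions: `call_non_critical`, `recommendations_down`, and `render_page` are hypothetical names, not part of any real service.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_non_critical(fetch: Callable[[], T], fallback: T) -> T:
    """Return fetch() or, if the non-critical dependency fails, the fallback."""
    try:
        return fetch()
    except Exception:
        # The dependency is non-critical, so its failure is absorbed here
        # instead of failing the whole request.
        return fallback


def recommendations_down() -> list:
    # Hypothetical non-critical dependency that is currently unavailable.
    raise ConnectionError("recommendation service unavailable")


def render_page() -> dict:
    # The critical part of the response is still served.
    recs = call_non_critical(recommendations_down, fallback=[])
    return {"order_status": "confirmed", "recommendations": recs}
```

A real service would usually also log the failure and emit a metric so that the degraded state stays visible to operators.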
8. Durability
• More important than availability
• Low-latency synchronous replication across
regions should be employed
• Idempotency tokens must be durably
persisted
• Backup and restore are clearly defined and
regularly tested
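The idempotency-token bullet can be illustrated with a small sketch that records each token in a database before acknowledging the result. It assumes SQLite as the store; the table name `idempotency` and the helpers `open_store` and `apply_once` are hypothetical. The default `:memory:` path is only for demonstration; a file path (or a replicated database) would be needed for actual durability.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the token store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS idempotency ("
        "token TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def apply_once(conn: sqlite3.Connection, token: str,
               operation: Callable[[], str]) -> str:
    """Run operation at most once per token; replay the stored result otherwise."""
    row = conn.execute(
        "SELECT result FROM idempotency WHERE token = ?", (token,)
    ).fetchone()
    if row is not None:
        return row[0]  # Token already seen: return the recorded result.
    result = operation()
    with conn:  # Commits on success, rolls back on error.
        conn.execute(
            "INSERT INTO idempotency (token, result) VALUES (?, ?)",
            (token, result),
        )
    return result
```

This sketch does not guard against two concurrent requests with the same token, and the operation and the token write are not atomic; a production design would typically commit both in one transaction.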
9. Reliability
• Service recovers from failures without
human intervention, short of a complete
disaster
• No cyclical dependencies among
operationally significant dependencies
• Disaster recovery must be documented
and routinely tested
• Features are well hardened against
‘emergent performance degradation’
• Recovering from corrupted or deleted
state
• Mean time between failures should not
decrease significantly with system size
• Mean time to recovery should not increase
significantly with system size
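The rule against cyclical dependencies can be checked automatically if the dependency graph is available as data. The following is a minimal sketch using depth-first search; the function name `find_cycle` and the example service names are assumptions for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    IN_PROGRESS, DONE = 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in deps.get(node, []):
            if state.get(nxt) == IN_PROGRESS:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in deps:
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical example: auth -> config -> storage -> auth is a cycle.
graph = {"auth": ["config"], "config": ["storage"], "storage": ["auth"]}
print(find_cycle(graph))
```

Running such a check in the build pipeline is one possible way to stop a cycle from being introduced unnoticed.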
10. Scaling and Performance
• Bottlenecks of each component within the
system are well understood
• Dashboards and alarms signal when action
must be taken to scale a component
• Cost-efficient infrastructure
• Availability and durability are maintained or
improve as demand increases
• Use isolated pools for capacity
• Predictable performance at peak load
• Automated performance tests are part of
continuous deployment
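An automated performance test in a deployment pipeline often reduces to a latency gate: measure, compute a high percentile, and fail the stage if it exceeds a budget. The sketch below assumes a 99th-percentile gate; the helper names `measure_latencies`, `p99`, and `within_budget` and the budget values are hypothetical.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies(fn: Callable[[], object], runs: int = 200) -> List[float]:
    """Call fn repeatedly and return the wall-clock latency of each call in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def p99(samples: List[float]) -> float:
    """99th percentile: the last of the 99 cut points that split data into 100 parts."""
    return statistics.quantiles(samples, n=100)[98]


def within_budget(samples: List[float], budget_seconds: float) -> bool:
    """True if the 99th-percentile latency stays within the budget."""
    return p99(samples) <= budget_seconds
```

A pipeline stage could then call `within_budget(measure_latencies(handler), 0.25)` and fail the deployment when it returns `False`; the 0.25 second budget here is only a placeholder.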
11. Security
• Customer-facing APIs follow API best
practices (e.g. TLS 1.2)
• Secrets are stored ephemerally, only
accessible by processes that require
them
• Features remain secure at all times
• Communication encrypted in transit
• Guidance on Security compliance and
algorithmic standards
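As an illustration of the TLS 1.2 example, a client can refuse older protocol versions when it builds its TLS context. This is a minimal sketch using Python's standard `ssl` module; the function name `make_client_context` is hypothetical.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client TLS context that rejects anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse to negotiate protocol versions below TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The resulting context can be passed to standard-library clients, for example as the `context` argument of `urllib.request.urlopen`.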
12. Operability
• Human-based operational load scales
sub-linearly with your deployment
footprint.
• Human operators have a documented,
audited procedure for obtaining access to
the environment.
• The system is observable.
• Build canaries to continuously test the
service
• Timeouts and retries are dynamically
configured
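Dynamically configured timeouts and retries can be sketched as a retry loop that re-reads its settings on every attempt, so that an operator can change them without a redeploy. The config keys (`max_attempts`, `base_delay_s`, `timeout_s`) and the function `call_with_retries` are assumptions made for this example.

```python
import random
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[float], T],
    get_config: Callable[[], Dict[str, float]],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(timeout), retrying with exponential backoff and full jitter."""
    attempt = 0
    while True:
        cfg = get_config()  # Re-read on every attempt so changes apply immediately.
        attempt += 1
        try:
            return operation(cfg["timeout_s"])
        except Exception:
            # A real client would retry only errors known to be retryable.
            if attempt >= cfg["max_attempts"]:
                raise
            # Full jitter: random delay up to base * 2^(attempt - 1).
            sleep(random.uniform(0, cfg["base_delay_s"] * 2 ** (attempt - 1)))
```

Here `get_config` could read from any live configuration source; passing `sleep` as a parameter keeps the loop easy to test without real delays.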
14. Testability
• Support testing in a given test context
• Keep testability of service high to find faults
easily and in an automated manner
• Leverage the deployment and release model
15. Give Us Your Feedback
Day 1 Programme