NICTA Copyright 2012 From imagination to impact

Challenges in Practicing High Frequency Releases in Cloud Environments

Talk at RELENG 2014
Full paper: http://www.nicta.com.au/pub?doc=7925
The continuous delivery trend is dramatically shortening release cycles from months to hours. Applications with high frequency releases often rely heavily on automated deployment tools that use cloud infrastructure APIs. We report results from experiments on reliability issues of cloud infrastructure and on the trade-offs between heavily-baked and lightly-baked images. Our experiments were based on the Amazon Web Services (AWS) OpsWorks APIs and the configuration management tool Chef. Based on these experiments, we propose error handling practices that can be included in tailor-made continuous deployment facilities.
More related information is in our DevOps book: http://www.ssrg.nicta.com.au/projects/devops_book/

  1. Challenges in Practicing High Frequency Releases in Cloud Environments
     Liming Zhu, Donna Xu, Xiwei Xu, An Binh Tran, Ingo Weber, Len Bass
     NICTA/UNSW
     http://slideshare.net/limingzhu
  2. NICTA (National ICT Australia)
     • Australia’s National Centre of Excellence in Information and Communication Technology
     • Five Research Labs:
       – ATP: Australian Technology Park, Sydney
       – NRL: UNSW, Sydney
       – CRL: ANU, Canberra
       – VRL: Uni. Melbourne
       – QRL: Uni. Queensland and QUT
     • 700 staff including 270 PhD students
     • Research Groups
       – Software Systems Research Group (SSRG)
         • ssrg.nicta.com.au
       – Machine Learning, Optimisation, Networks, Computer Vision
  3. Challenge: High Frequency Releases/Changes
     • Significantly shorter release cycles and DevOps
       – Continuous delivery/deployment
         • from months at scheduled downtime to hours at all times
     • Cloud uncertainty during provisioning/deployment
       – Heavy reliance on Cloud APIs; indirect control
       – Other “sporadic” operations: cron jobs/backup/reconfig...
       – Our focus: error detection/diagnosis during continuous “changes”
         • Anomaly detection/monitoring designed for normal operation does not work
     • One solution: machine image as build artifact?
       – Heavily-baked vs. lightly-baked? Immutable server?
  4. Heavily-Baked vs. Lightly-Baked
     • Heavily-baked approach
       + No server drift, consistent, more reliable?
       – Image preparation time for any minor release
       – Image sprawl
       – Image consistency among teams
         • coordination, golden image, image inheritance...
     • Lightly-baked approach
       + Highly dynamic, config-as-service, less restarting...
       – Less reliable due to runtime dependence on external services (e.g. repo, configuration services...)?
       – Drifting, outcome validation, race conditions...
  5. Motivating Example: Rolling Upgrade
     • Used in large-scale web operations
       – Have 100+ servers in cloud with version 1 software
       – Upgrade 10 servers at a time to version 2 software
     • Can take a long time to complete, with errors during the operation
       – Provisioning failure, logical failures, instance failure
       – Other interfering operations
     • Heavily-baked vs. lightly-baked
       – Past experiences: Netflix Asgard with heavily-baked
       – AWS OpsWorks:
         • DevOps automation + life cycle events + abstraction
         • Heavily-baked + built-in recipe vs. lightly-baked + custom recipe
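
The rolling upgrade on this slide (100+ servers, upgraded 10 at a time) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Asgard or OpsWorks implementation: `upgrade_one` is a hypothetical callback standing in for whatever cloud API call performs the upgrade of a single server, and the server names are made up.

```python
from typing import Callable, Iterator, List, Sequence


def batches(servers: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` servers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(servers), size):
        yield list(servers[start:start + size])


def rolling_upgrade(
    servers: Sequence[str],
    upgrade_one: Callable[[str], None],  # hypothetical per-server upgrade hook
    batch_size: int = 10,
) -> List[str]:
    """Upgrade servers one batch at a time and return those that failed.

    A failure on one server is recorded and the upgrade moves on, so a single
    provisioning or instance failure does not abort the whole operation.
    """
    failed: List[str] = []
    for group in batches(servers, batch_size):
        for server in group:
            try:
                upgrade_one(server)
            except Exception:
                failed.append(server)
    return failed
```

Returning the failed servers rather than raising leaves the decision about retrying or replacing them to the caller, which is where the error handling tactics discussed later in the talk would plug in.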
  6. Observations 1/3
  7. Observations 2/3
  8. Observations 3/3
  9. Solutions for Better Reliability/Predictability
     • Ad hoc tactics to reduce tails
       – Inspired by Jeff Dean’s “The Tail at Scale” CACM article
       – Retry with alternative options
         • stop-restart, replace, deploy without restart
       – Fail fast
         • Track status time against the 95th percentile to fail fast
       – Asynchronous waves for upgrading granularity >1
     • Validate intermediary outcomes
       – Inside machine:
         • Chef Mini-test; test cases in production monitoring
       – Outside machine:
         • Process-Oriented Dependability (POD)
         • Assertion checking and conformance checking
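
Two of the tactics above, failing fast on the 95th percentile and retrying with alternative options, might look something like the following sketch. It makes several assumptions that are not in the slides: the nearest-rank percentile method, the idea that `history` holds previously observed durations for the same step, and the shape of the `actions` list are all hypothetical choices for illustration.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample (assumed method)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def should_fail_fast(elapsed: float, history: Sequence[float],
                     pct: float = 95.0) -> bool:
    """True when a step has already run longer than the chosen percentile
    of its past durations, so waiting further is unlikely to pay off."""
    return elapsed > percentile(history, pct)


def try_alternatives(
    actions: Sequence[Tuple[str, Callable[[], bool]]],
) -> Optional[str]:
    """Run alternative recovery actions in order.

    Each action returns True on success. The name of the first action that
    succeeds is returned, or None if all of them fail.
    """
    for name, action in actions:
        if action():
            return name
    return None
```

A caller could poll `should_fail_fast` while a server sits in one status and, once it returns true, hand the server to `try_alternatives` with actions such as stop-restart, replace, and deploy without restart.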
  10. Process-Oriented Dependability (POD)
      • Offline: treat operations as processes
        – Process discovered automatically from logs/scripts
          • Log line clustering and process mining
        – Expected step outcomes specified as assertions
      • Online: use process context
        – Process context: process/instance/step ids, expected states
        – Errors are detected by examining logs and monitoring data
          • Assertion evaluation using monitoring facilities or directly
          • Compliance checking against expected processes
        – Detected errors are further diagnosed for (root) causes
          • Examining a fault tree to locate potential root causes
          • Performing more diagnostic tests and on-demand assertions
      X. Xu, L. Zhu, et al., “POD-Diagnosis: Error Diagnosis of Sporadic Operations on Cloud Applications,” 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2014.
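
The idea of specifying expected step outcomes as assertions and evaluating them with the process context can be illustrated with a small sketch. This is not the POD implementation: the `Assertion` type, the step id `launch_instances`, and the `running_instances` field are hypothetical names chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Observed state for one step, e.g. gathered from logs or monitoring data.
State = Dict[str, object]


@dataclass(frozen=True)
class Assertion:
    step_id: str                    # process step the assertion belongs to
    description: str                # human-readable expected outcome
    check: Callable[[State], bool]  # True when the outcome holds


def evaluate(step_id: str, observed: State,
             assertions: Sequence[Assertion]) -> List[str]:
    """Return descriptions of the assertions for `step_id` that do not hold."""
    return [
        a.description
        for a in assertions
        if a.step_id == step_id and not a.check(observed)
    ]


# Hypothetical assertion: after launching, 10 instances should be running.
EXPECTED = [
    Assertion(
        step_id="launch_instances",
        description="10 instances are running",
        check=lambda state: state.get("running_instances") == 10,
    ),
]
```

Keying each assertion by step id is what lets the online phase evaluate only the checks relevant to the step the process is currently in; a non-empty result would then be passed on for diagnosis.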
  11. Example: Rolling Upgrade Using Asgard
      [Diagram. Components: Operator, Asgard, Log data, Process Mining Service, Discovered Model, Error Detection Service. Edge labels: Read by, Controls, Outputs, Generates. Phases: Offline, Online. Steps in the discovered model: Create Snapshot, Check AZs, Create instance from snapshot, Create AMI from instance, Evaluate AMI.]
      The Error Detection Service has two methods for detecting errors:
      • Assertion Checking
      • Conformance Checking
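
Conformance checking against a discovered model can be sketched as replaying the logged steps over a transition table. The step names below come from the diagram on this slide, but the order of the transitions is an assumption (the diagram's arrows did not survive in this transcript), and the `START` marker and function name are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical model: allowed next steps for each step. The ordering is an
# assumption for illustration, not the model actually mined from Asgard logs.
MODEL: Dict[str, Set[str]] = {
    "START": {"Create Snapshot"},
    "Create Snapshot": {"Check AZs"},
    "Check AZs": {"Create instance from snapshot"},
    "Create instance from snapshot": {"Create AMI from instance"},
    "Create AMI from instance": {"Evaluate AMI"},
    "Evaluate AMI": set(),
}


def first_violation(
    trace: Sequence[str],
    model: Dict[str, Set[str]],
    start: str = "START",
) -> Optional[Tuple[int, str]]:
    """Replay `trace` against `model`.

    Returns (index, step) for the first step the model does not allow at
    that point, or None when the whole trace conforms.
    """
    current = start
    for index, step in enumerate(trace):
        if step not in model.get(current, set()):
            return index, step
        current = step
    return None
```

A trace that skips a step or contains an unknown log event is flagged at the first point of deviation, which gives the diagnosis stage a concrete step to start from.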
  12. Summary
      • Lightly vs. heavily-baked for high frequency releases
      • Solutions for unreliable processes
        – Some tactics to reduce long tails
          • fail fast, alternative actions, asynchronous waves…
        – Validate intermediary outcomes
          • Inside machine: Chef Mini-test; test cases in production monitoring
          • Outside machine: Process-Oriented Dependability (POD)
            – Assertion checking and conformance checking
      • Currently integrating with monitoring and alerting
      • We need industry help and collaboration
        – Logs, trials, feedback, case study as book chapters
      Book: http://www.ssrg.nicta.com.au/projects/devops_book/
      Contact: Liming.Zhu@nicta.com.au
