issues with the use of canaries in upgrade

Published in: Technology

Transcript

  • 1. NICTA Copyright 2012 From imagination to impact Issues with the use of Canaries Len Bass
  • 2. Goal of Presentation
      • For you to understand that there are interesting problems associated with canaries.
      • For me to be sure that I understand how canaries are used in practice.
      • For me to get feedback about appropriate testing results.
  • 3. Various Upgrade Strategies
      • How many at once?
          – One at a time (rolling upgrade)
          – Groups at a time (staged upgrade, e.g. canaries)
          – All at once (big flip)
      • What happens to old versions?
          – Replaced en masse
          – Maintained for some period for compatibility purposes
      • This talk focuses on the canary strategy and examines some issues associated with canaries.
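The "how many at once?" choice on this slide can be sketched as a batching function. This is only an illustration: the function name `plan_batches`, the strategy labels, and the default `group_size` are hypothetical and do not come from the talk.

```python
from typing import List


def plan_batches(instances: List[str], strategy: str, group_size: int = 2) -> List[List[str]]:
    """Split service instances into upgrade batches (hypothetical helper).

    rolling  -> one instance per batch
    staged   -> group_size instances per batch (the first batch plays the canary role)
    big_flip -> every instance in a single batch
    """
    if strategy == "rolling":
        size = 1
    elif strategy == "staged":
        if group_size < 1:
            raise ValueError("group_size must be at least 1")
        size = group_size
    elif strategy == "big_flip":
        size = max(len(instances), 1)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Slice the instance list into consecutive chunks of the chosen size.
    return [instances[i:i + size] for i in range(0, len(instances), size)]
```

With a staged strategy, the first batch would be upgraded and observed before any later batch is touched; how long to observe it is the question the rest of the talk raises.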
  • 4. Context
      • Deep service dependency hierarchy (may be 70 deep)
      • Upgrading one service in this hierarchy
      • Need to consider both the service and its clients
      • [Figure from Netflix Tech Blog]
  • 5. Current state of major internet provider
      • Each service has an owner.
      • Every service instance is instrumented.
      • When a canary is deployed, the service owner examines monitoring data (next slide) and uses judgment to decide when to move to production.
      • Canary testing is currently based on functionality. There is no stress testing of canaries.
      • Research question: what scientific criteria can be used to judge when to go into production?
  • 6. Netflix Monitoring Sequence
      • Client outbound (start/end)
      • Network (start/end)
      • Service network (inbound start/end)
      • Service processing (start/end)
      • Service outbound (start/end)
      • Network (start/end)
      • Client inbound (start/end)
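One way a service owner might use the start/end pairs in this sequence is to turn them into per-stage durations and look for the slowest stage. The sketch below assumes each stage is reported as a `(start, end)` timestamp pair in milliseconds; the stage keys and function names are hypothetical, and the slide does not describe the actual monitoring format.

```python
from typing import Dict, Tuple

# Stage keys follow the order of the slide; the names themselves are assumptions.
STAGES = [
    "client_outbound",
    "network_request",
    "service_inbound",
    "service_processing",
    "service_outbound",
    "network_response",
    "client_inbound",
]


def stage_durations(spans: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return end - start for every stage, rejecting inverted timestamps."""
    durations = {}
    for stage in STAGES:
        start, end = spans[stage]
        if end < start:
            raise ValueError(f"{stage}: end precedes start")
        durations[stage] = end - start
    return durations


def slowest_stage(spans: Dict[str, Tuple[float, float]]) -> str:
    """Name the stage that contributed the most time to the request."""
    durations = stage_durations(spans)
    return max(durations, key=durations.get)
```

Comparing these durations between canary instances and old-version instances is one possible input to the owner's judgment; the talk does not say which comparisons are actually made.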
  • 7. Common upgrade strategy
      • Require all versions to be backward compatible with previous versions.
      • Require changes associated with a new version to be software switchable.
      • Clients of a service must be version aware in order to know whether to use new functionality.
      • Once all instances have been upgraded to the new version, send a signal to turn on the changes both in the new version and in its clients.
      • When using canaries, turn on the changes only for a subset of services and their clients.
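A minimal sketch of "software switchable" changes, assuming a simple boolean switch per instance. The class `ServiceInstance`, the constant `NEW_VERSION`, and the function `switch_on` are hypothetical names chosen for illustration, not part of any system described in the talk.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

NEW_VERSION = 2  # hypothetical version number of the upgraded service


@dataclass
class ServiceInstance:
    name: str
    version: int
    new_features_on: bool = False  # upgraded code ships with its changes switched off

    def handle(self, client_wants_new: bool) -> str:
        """Serve new behaviour only if the code is upgraded, switched on, and asked for."""
        if client_wants_new and self.version >= NEW_VERSION and self.new_features_on:
            return "new"
        return "old"  # backward-compatible path, always available


def switch_on(instances: Iterable[ServiceInstance],
              canary_names: Optional[Set[str]] = None) -> None:
    """Send the 'turn on' signal to every upgraded instance, or only to the named canaries."""
    for inst in instances:
        if inst.version >= NEW_VERSION and (canary_names is None or inst.name in canary_names):
            inst.new_features_on = True
```

Calling `switch_on(instances, {"a"})` corresponds to the canary case on the slide (a subset only), while `switch_on(instances)` corresponds to turning the changes on everywhere once all instances are upgraded.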
  • 8. Canary Issues
      • Canaries are a form of live testing: a new version is put into limited production to test its correctness.
      • Issues
          – How long are new versions tested to determine correctness?
              • Period based: for some period of time
              • Load based: under some utilization assumptions
              • Result based: until some criterion is met
          – How are clients of the new version chosen, and how is this choice enforced?
          – How are the canaries deployed?
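The three kinds of stopping rule named on this slide (period based, load based, result based) can be written as simple predicates. The fields of `CanaryStats` and every threshold are assumptions made for illustration; the talk treats the choice of criteria as an open question.

```python
from dataclasses import dataclass


@dataclass
class CanaryStats:
    elapsed_seconds: float  # how long the canary has been in limited production
    requests_served: int    # how much load it has seen
    errors: int             # how many of those requests failed


def period_based(stats: CanaryStats, min_seconds: float) -> bool:
    """Done once the canary has run for a fixed period."""
    return stats.elapsed_seconds >= min_seconds


def load_based(stats: CanaryStats, min_requests: int) -> bool:
    """Done once the canary has served an assumed amount of load."""
    return stats.requests_served >= min_requests


def result_based(stats: CanaryStats, max_error_rate: float, min_requests: int = 1) -> bool:
    """Done once the observed error rate is within a threshold (one possible criterion)."""
    if stats.requests_served < min_requests:
        return False  # not enough evidence to judge yet
    return stats.errors / stats.requests_served <= max_error_rate
```

Error rate is only one example of a result-based criterion; the slide does not say which criterion should be used.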
  • 9. General Picture
      • [Figure: clients, a top-level load balancer, and two second-level load balancers in front of servers for Version A and servers for Version B]
      • Client
          – Version aware
          – Must know about new versions in order to take advantage of new functionality
          – May be implicitly version aware based on, e.g., cluster
          – Version unaware clients will only use old functionality, and these can be served by any server since services are backward compatible.
      • In addition:
          – Load variation may trigger elasticity rules.
          – Deciding whether to load new version or old version raises other issues.
  • 10. More Detail on Upgrade Process
      • Canaries are deployed and allowed to run for a period without turning on new features.
      • This is to test backward compatibility.
      • Once canaries pass this test, the new features are turned on.
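The two-step process on this slide can be sketched as a small state machine. The `ROLLED_BACK` state and the `passed` flag are assumptions added for illustration; the slide only describes the path where the canaries pass the backward-compatibility test.

```python
from enum import Enum


class Phase(Enum):
    COMPATIBILITY = "canary running with new features off"
    FEATURES_ON = "canary running with new features on"
    ROLLED_BACK = "canary withdrawn"  # assumption: the slide does not cover failure


def next_phase(phase: Phase, passed: bool) -> Phase:
    """Advance the canary one step given the outcome of the current phase's checks."""
    if phase is Phase.COMPATIBILITY:
        # New features are switched on only after backward compatibility is confirmed.
        return Phase.FEATURES_ON if passed else Phase.ROLLED_BACK
    if phase is Phase.FEATURES_ON:
        # When to leave this phase for full production is the talk's open question,
        # so a passing canary simply stays here in this sketch.
        return Phase.FEATURES_ON if passed else Phase.ROLLED_BACK
    return Phase.ROLLED_BACK
```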
  • 11. Question 1: how are client messages routed?
      • Three cases:
          1. Clients are separated, a priori, into those using the new version and those not.
          2. Messages are routed arbitrarily by the load balancer, and those received by the new version of the service cause the client to be designated as using the new version.
          3. All services are capable of acting as the old version or the new version and choose based on the version of the message they receive. (This seems contrary to the canary strategy.)
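Case 2 above can be sketched as a router that picks servers arbitrarily until a client happens to reach a new-version server, and then records the designation. The class name `StickyCanaryRouter` is hypothetical, and routing a designated client only to Version B servers afterwards is an assumption about what "designated" implies; the slide does not spell this out.

```python
import random
from typing import Dict, Optional, Set


class StickyCanaryRouter:
    """Hypothetical load balancer for case 2: arbitrary routing, then designation."""

    def __init__(self, servers: Dict[str, str], rng: Optional[random.Random] = None):
        self.servers = servers            # server name -> "A" (old) or "B" (new)
        self.designated: Set[str] = set() # clients designated as using Version B
        self.rng = rng or random.Random()

    def route(self, client_id: str) -> str:
        if client_id in self.designated:
            # Assumption: a designated client is kept on new-version servers.
            candidates = [s for s, v in self.servers.items() if v == "B"]
        else:
            candidates = list(self.servers)
        server = self.rng.choice(candidates)
        if self.servers[server] == "B":
            self.designated.add(client_id)
        return server
```

Case 1 would replace the random first contact with a fixed, pre-assigned `designated` set.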
  • 12. More Questions
      2. After turning on new functionality, how does one decide that the canaries have been sufficiently functionally tested for the fixed set of clients?
          – Are there results from the testing community that pertain here? I don't know.
          – After answering this question, one can add additional clients to those being routed to Version B until the metric available(?) from the testing community passes some threshold.
      3. How can one perform stress testing in a live environment?
          – We are examining a metric called "Performance Nonscalability Likelihood" for its applicability.
  • 13. Summary
      • We have identified the problem of determining when canary testing is adequate as one that could use more rigor.
      • There are multiple strategies for connecting new version clients to new version services.
      • Outstanding questions are:
          – How long before all of the benefit of using canaries has been realized and the new functionality can be turned on?
          – How is stress testing performed?
  • 14. Questions/comments
      • Len.bass@nicta.com.au
