Zero Downtime Migrations at Scale

Do you already have a system that serves user traffic, and has it become so popular that it is hitting scaling limits? It is probably time to upgrade its architecture or move its data to a more scalable database. Learn how to do this upgrade with zero downtime and no user-visible effects in my talk!

Published in: Software

Zero Downtime Migrations at Scale

  1. Zero Downtime Migrations at Scale. Aysylu Greenberg ∽ April 28, 2018 ∽ Medellín. Photo by Iván Erre Jota / CC BY-SA
  2. Aysylu Greenberg, @aysylu22
  5-6. Data Migration; Architecture Migration
  8. "We will be performing scheduled site maintenance on Saturday from 3 am to 5 am!"
  10. New data storage and architecture: Zero-Downtime Migration for Backups
  12-17. Considerations Before Migration
      ● Scale: O(100M) of users with 1B+ backups, representing 1T+ objects
      ● No sharing, no search, no folders
      ● Mainly write traffic, read traffic is rare and high priority
      ● Spanner: global strong consistency, SQL
      ● Backups can get large
  18-19. So how do you migrate all this data? copy/paste... Photo by Richard Evea / CC BY-SA 2.0
  20. Zero-Downtime Migration for Backups: Data migration & Architecture migration
  21-36. So how do you migrate all this data?
      Data migration
      ● Dual writes: Are we storing data correctly?
      ● Backfill data: Do we understand all client behavior and adapt the data correctly?
      ● Learn the new storage: New storage mechanism or schema?
      ● Migrate slowly: Validate, validate, validate. Resource constraints? Scale migration.
      Architecture migration
      ● Dual writes: No effect on latency and error rates!
      ● Prove the stack: Is the response from the new system same as from the old?
      ● Harden the system: How to get it to production readiness to serve full load?
      ● Roll out slowly: Scale carefully & proactively
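The dual-write, backfill, and prove-the-stack steps on the slides above can be illustrated with a minimal sketch. It is not the speaker's implementation: `DictStore`, `MigratingBackupStore`, and the counter names are hypothetical, and an in-memory dict stands in for both the old storage and the new one (Spanner, in the talk). The assumption is that the old store remains the source of truth until the new one has been proven.

```python
class DictStore:
    """Hypothetical in-memory stand-in for a storage backend."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class MigratingBackupStore:
    """Old store is the source of truth; the new store is exercised in shadow."""

    def __init__(self, old, new):
        self.old = old
        self.new = new
        self.shadow_write_errors = 0  # visibility into migration state
        self.read_mismatches = 0

    def write(self, key, value):
        # Dual write: only the old store decides success or failure for the user.
        self.old.put(key, value)
        try:
            self.new.put(key, value)
        except Exception:
            # A failing new store must not change user-visible error rates.
            self.shadow_write_errors += 1

    def backfill(self, keys):
        # Copy pre-existing rows, skipping keys that a dual write has already
        # populated, so older data never overwrites newer data. A real system
        # would need a conditional or transactional insert here; this check
        # is not safe against concurrent writers.
        copied = 0
        for key in keys:
            if self.new.get(key) is None:
                self.new.put(key, self.old.get(key))
                copied += 1
        return copied

    def read(self, key):
        # Prove the stack: serve the old answer, compare the new one in shadow.
        expected = self.old.get(key)
        try:
            if self.new.get(key) != expected:
                self.read_mismatches += 1
        except Exception:
            self.read_mismatches += 1
        return expected
```

In this sketch the mismatch and error counters are the signal for when it is safe to move on: reads would switch to the new store only after the counters stay at zero under production traffic.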
  37-42. FOCUS ON INTERMEDIATE STATE
      ● Prepare to write code for intermediate state
        >>> Quality of code corresponds to the expected lifetime
      ● Migrate backends first
      ● Invest into visibility into system & migration state
  43-47. ROLL OUT INCREMENTALLY
      ● Validate scalability while affecting fewest users
      ● Decouple launch of services
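One common way to roll out incrementally while affecting the fewest users is to assign each user a stable bucket and raise a percentage threshold over time. The sketch below assumes that approach; the slides do not say how the rollout was gated. The function names, the salt value, and the 10,000-bucket granularity are all hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, salt: str = "backup-migration") -> int:
    """Map a user to a stable bucket in [0, 10000) using a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def uses_new_stack(user_id: str, percent: float) -> bool:
    """True if the user falls inside the current rollout percentage (0-100)."""
    # Buckets are stable, so raising `percent` only adds users: anyone
    # enrolled at 1% is still enrolled at 5%.
    return rollout_bucket(user_id) < percent * 100


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (0.1, 1, 10, 100):
        enrolled = sum(uses_new_stack(u, percent) for u in users)
        print(f"{percent}% rollout -> {enrolled} of {len(users)} users")
```

Passing a different `salt` for each service would give each service its own independent ramp, which is one possible way to decouple launches.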
  48. VALIDATE & PRACTICE ROLLOUT
  55. FOCUS ON INTERMEDIATE STATE; ROLL OUT INCREMENTALLY; VALIDATE & PRACTICE ROLLOUT
  56. Gratitude: Steve Clark, Thomas Escobar, Matt Welsh, Ranjodh Mathial, Tatiana Marquez