
PWL Denver: Copysets


The paper describes an interesting approach to data replication that allows finer control over both the probability of data loss and the amount of data lost when such an event occurs. In addition, we'll discuss a technique for moving randomization from runtime to initialization while achieving the same benefits. After discussing the paper's contributions, we'll turn to pragmatic aspects of this approach.


  1. Copysets: Reducing the Frequency of Data Loss in Cloud Storage Aysylu Greenberg Papers We Love Denver April 27, 2017
  2. Welcome, Papers We Love Denver!
  3. Aysylu Greenberg @aysylu22 paperswelove.org
  4. Today • Random replication • Copyset Replication • Copyset Replication with Scatter Width • Pragmatic aspects
  5. RANDOM REPLICATION: Overview & Tradeoffs
  6. Random Replication (R = 3, N = 9)
  7. Random Replication: Correlated Failures
  8. Recovery from Data Loss • The fixed cost of restoring lost data is high • Better to lose more data, but less often • Increasing R is expensive
  9. Random Replication: Tradeoff • {small amount & high frequency} data loss ↔ {large amount & low frequency} data loss
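A small Monte Carlo sketch of the correlated-failure problem the slides describe (the function name and parameter defaults are mine, not from the talk): with R = 3 and N = 9, a few hundred chunks already occupy nearly every possible replica set, so almost any simultaneous failure of R nodes destroys all copies of some chunk.

```python
import random

def random_replication_loss_prob(n_nodes=9, r=3, n_chunks=1000,
                                 n_failed=3, trials=200, seed=42):
    """Estimate the probability that a simultaneous failure of
    n_failed nodes destroys all R replicas of at least one chunk,
    when every chunk picks its replica set uniformly at random."""
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        # Each chunk independently picks a random set of R nodes.
        copysets = [frozenset(rng.sample(range(n_nodes), r))
                    for _ in range(n_chunks)]
        down = set(rng.sample(range(n_nodes), n_failed))
        # Data is lost if some chunk's entire replica set is down.
        if any(cs <= down for cs in copysets):
            losses += 1
    return losses / trials

print(random_replication_loss_prob())  # ≈ 1.0
```

With 1000 chunks spread over only C(9, 3) = 84 possible replica sets, essentially every failure triple is "fatal" to some chunk, which is the {small amount & high frequency} end of the tradeoff.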
  10. COPYSET REPLICATION: Intuition
  11. Copyset Replication (R = 3, N = 9)
  12. Copyset Replication (R = 3, N = 9, S = 2)
  13. Recovery from Node Failure • Simpler recovery than random replication: only R − 1 other nodes hold the failed node's data • Higher load on a small number of nodes
  14. Copyset Replication with S = 2: Tradeoff • {small amount & high frequency} data loss ↔ {large amount & low frequency} data loss
  15. SCATTER WIDTH: Tuning choices
  16. Copyset Replication with S = 2 (R = 3, N = 9)
  17. Copyset Replication with S = 4 (R = 3, N = 9)
  18. Copyset Replication with S = 4 (R = 3, N = 9) [diagram: nodes 1–9]
  19.–22. Copyset Replication with S = 4: Permutation Phase [diagrams: nodes 1–9 permuted and chopped into copysets of size R]
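The permutation phase shown on these slides can be sketched as follows (a simplified illustration, assuming N is divisible by R; the paper's full scheme also guards against permutations that would give a node overlapping copyset neighbors, which is omitted here):

```python
import random

def permutation_phase(n_nodes, r, s, seed=0):
    """Sketch of Copyset Replication's permutation phase: create
    P = S / (R - 1) random permutations of the nodes and chop each
    one into consecutive groups of R; those groups are the copysets."""
    assert s % (r - 1) == 0, "S must be a multiple of R - 1"
    rng = random.Random(seed)
    copysets = []
    for _ in range(s // (r - 1)):
        perm = list(range(n_nodes))
        rng.shuffle(perm)
        # Chop the permutation into consecutive groups of R nodes.
        copysets.extend(frozenset(perm[i:i + r])
                        for i in range(0, n_nodes, r))
    return copysets

# R = 3, N = 9, S = 4 -> two permutations, six copysets;
# each node lands in exactly S / (R - 1) = 2 of them.
copysets = permutation_phase(9, 3, 4)
print(len(copysets))  # 6
```

At replication time, all the randomization has already happened: a chunk's replicas simply go to one of the (few) precomputed copysets containing its primary node.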
  23. Tuning Scatter Width • Set by the system designer to control the parallelism of data recovery • Controls the load placed on each individual node during recovery
  24. Copyset Replication Scatter Width: Tradeoffs • {small amount & high frequency} data loss ↔ {large amount & low frequency} data loss
  25.–27. Scatter Width: Tuning Choices • Random replication: scatter width of N − 1, lots of replica sets • Copyset Replication: S << N • To reduce the frequency of data loss, minimize the number of distinct copysets
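The quantity being minimized can be made concrete with a quick count (function names are mine; this compares the schemes at the talk's parameters):

```python
from math import comb  # Python 3.8+

def copysets_copyset_replication(n, r, s):
    """Copyset Replication: S / (R - 1) permutations, each
    contributing N / R copysets."""
    return (s // (r - 1)) * (n // r)

def copysets_random_replication(n, r):
    """Random replication eventually uses nearly every possible
    R-subset of the N nodes as a replica set."""
    return comb(n, r)

print(copysets_copyset_replication(9, 3, 4))  # 6
print(copysets_random_replication(9, 3))      # 84
```

Fewer distinct copysets means fewer failure combinations that can destroy all replicas of some chunk, which is exactly how the scheme trades frequency of loss against amount lost.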
  28. FROM IDEAS TO PRACTICE: Pragmatic aspects
  29. Pragmatic Aspects • Move randomization to the permutation stage • Low overhead on normal operations • Near optimal and fast • Supporting dynamic systems while maintaining guarantees is tricky → chainsets (http://hackingdistributed.com/2014/02/14/chainsets/) • Tiered Replication: https://www.usenix.org/conference/atc15/technical-session/presentation/cidon
  30. Copysets: Reducing the Frequency of Data Loss in Cloud Storage Aysylu Greenberg Papers We Love Denver April 27, 2017
