Causality Based Versioning


Slides for CMPS229



  1. 1. Causality-Based Versioning • Kiran-Kumar Muniswamy-Reddy and David A. Holland • Slides by the authors and Aleatha Parker-Wood • Tuesday, June 1, 2010
  2. 2. Versioning • Already popular • Saves backup “versions” of files as they change • Two flavors: versioning (event-based) and snapshotting (time-based) • Snapshots: WAFL, Venti... • Versioning: Elephant, VersionFS...
  3. 3. Why Version/Snapshot? • Disaster recovery is baked into the file system • “Oops, I needed that...” • “Oops, I didn’t mean to click that virus...” • “Oops, that new driver patch broke everything...” • Maintains backup files from which you can recover (without going offsite)
  4. 4. Causality • Depends on time (to cause Y, X must come before it) • Unidirectional (if X causes Y, Y cannot cause X) • Defined in terms of data flow • A reads B ⇒ B causes A • A writes B ⇒ A causes B • Examples: PASS, intrusion detection systems (BackTracker, Taser...)
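
The two data-flow rules on the Causality slide can be sketched as a small function. This is only an illustration: the function name `causal_edge`, the tuple layout, and the sample trace are assumptions made here, not something taken from the slides or from PASS.

```python
from typing import List, Tuple

def causal_edge(actor: str, op: str, target: str) -> Tuple[str, str]:
    """Return a (cause, effect) pair for one data-flow event."""
    if op == "read":
        # A reads B => B causes A
        return (target, actor)
    if op == "write":
        # A writes B => A causes B
        return (actor, target)
    raise ValueError(f"unsupported operation: {op}")

# Hypothetical trace: a process reads one file and writes another.
trace = [("cat", "read", "a.txt"), ("cat", "write", "b.txt")]
edges: List[Tuple[str, str]] = [causal_edge(*event) for event in trace]
print(edges)
```

Each pair reads "cause, effect", so the read puts the file first and the write puts the process first.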
  5. 5. Why Causality? • Track propagation of data • Find out what files were modified by what processes • Reconstruct the scene of the crime
  6. 6. Causality-Based Versioning • Decide when to version using causal relationships between two files • Has the advantages of versioning file systems and snapshots • Eases recovery from corruption, viruses, and user mistakes • In addition, creates causal links between files • Easier to decide what to restore • Sort of like transactions on steroids
  7. 7. Applications • Intrusion recovery • System configuration management • IP compliance • Reproduction of research results
  8. 8. A Scenario... • Apache split-logfile vulnerability • Vulnerability in Apache 1.3 • Allows an attacker to overwrite any file with a .log extension • Let’s look at the current versioning options...
  9. 9-14. [Figures: the scenario under the current versioning options; diagram content not recoverable]
  15. 15. The Goal • One of these has too much information • The other not enough • Can we leverage causality to create just enough versions?
  16. 16. Creating Just Enough Versions • Building on top of the Provenance Aware Storage System (PASS) • Two options • Cycle Avoidance • Graph Finesse
  17. 17. How PASS works • Translates system calls to provenance records (reads and writes become edges in a dependency graph) • Maintains provenance for transient objects such as pipes and processes, and creates virtual objects as needed • Analyzes the records to ensure there are no cyclic dependencies between objects • Causality-based versioning extends the analysis phase
  18. 18. The big idea • Cycles are violations of causality • The creation of a cycle is an indicator that this is an interesting event • We can prevent cycles by creating a new version every time a cycle is about to occur
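
To make "a cycle is about to occur" concrete, here is a minimal sketch that checks, before a dependency is recorded, whether it would close a cycle in an unversioned dependency graph. The graph layout (`deps` maps an object to the set of objects it depends on) and the names `depends_on` and `would_create_cycle` are assumptions for illustration, not the PASS implementation.

```python
from typing import Dict, Set

def depends_on(deps: Dict[str, Set[str]], start: str, target: str) -> bool:
    """True if `start` is `target` or transitively depends on it."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, ()))
    return False

def would_create_cycle(deps: Dict[str, Set[str]], obj: str, new_dep: str) -> bool:
    """Recording 'obj depends on new_dep' closes a cycle if new_dep
    already depends on obj."""
    return depends_on(deps, new_dep, obj)

# Hypothetical history: the process has read the file (proc depends on file).
deps = {"proc": {"file"}}
# The process now writes the same file: file would depend on proc.
cycle_on_write_back = would_create_cycle(deps, "file", "proc")
cycle_on_other_file = would_create_cycle(deps, "other", "proc")
print(cycle_on_write_back, cycle_on_other_file)
```

A read followed by a write to the same file is the simplest case in which the check fires, which is the point at which a new version would be created.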
  19. 19-47. [Figures: step-by-step dependency graph example; diagram content not recoverable]
  48. 48. Version-On-Write? • We could remove cycles using Version-On-Write • Every read creates a new version of the process • Every write creates a new version of the file • But this results in 8 versions • Huge management overhead
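
The Version-On-Write rule can be sketched as a counter over an event trace. The trace below is hypothetical and is not the example from the slides, so its version counts are not meant to match the figure of 8 quoted above; the function name and tuple layout are also assumptions.

```python
from typing import Dict, List, Tuple

def version_on_write(trace: List[Tuple[str, str, str]]) -> Dict[str, int]:
    """Return the current version number of every object after the trace.

    Every object starts at version 1 when first seen. A read creates a
    new version of the reading process; a write creates a new version
    of the written file.
    """
    versions: Dict[str, int] = {}
    for actor, op, target in trace:
        versions.setdefault(actor, 1)
        versions.setdefault(target, 1)
        if op == "read":
            versions[actor] += 1
        elif op == "write":
            versions[target] += 1
        else:
            raise ValueError(f"unsupported operation: {op}")
    return versions

# Hypothetical trace: a process alternates reads of `a` and writes of `b`.
trace = [
    ("proc", "read", "a"),
    ("proc", "write", "b"),
    ("proc", "read", "a"),
    ("proc", "write", "b"),
]
result = version_on_write(trace)
print(result)
```

Even four events produce several versions, which illustrates why the slide calls the management overhead huge.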
  49. 49. Cycle Avoidance Algorithm • Uses local information about the object • Create a new version of an object whenever a new ancestor is added • Different versions are considered to be “new” ancestors • Not every write causes a new version
  50. 50. The Algorithm • Assume new data: A1 depends on B2 • If B is not in A’s dependencies, create a new version of A • Else if B is already in A’s dependencies: • If B2 is in dependencies, discard (no new information) • If B3 is in dependencies, discard (no new causality) • If B1 is in dependencies, create new version of A
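
The rules on the two Cycle Avoidance slides can be sketched as follows. This is a reading of the slide text, not the authors' code: the class name, the return strings, and the choice to carry the ancestor table over to the new version are all assumptions.

```python
from typing import Dict

class CycleAvoidance:
    """Per-object (local) bookkeeping only; no global graph search."""

    def __init__(self) -> None:
        self.version: Dict[str, int] = {}               # object -> current version
        self.ancestors: Dict[str, Dict[str, int]] = {}  # object -> {ancestor: newest version recorded}

    def add_dependency(self, a: str, b: str, b_version: int) -> str:
        """Record 'the current version of a depends on b at b_version'."""
        self.version.setdefault(a, 1)
        known = self.ancestors.setdefault(a, {})
        seen = known.get(b)
        if seen is not None and b_version <= seen:
            # Same version: no new information. Older version: no new causality.
            return "discard"
        # Either b is a new ancestor, or only an older version of b was
        # recorded: create a new version of a. (Assumption: the ancestor
        # table is kept for the new version.)
        known[b] = b_version
        self.version[a] += 1
        return "new-version"

ca = CycleAvoidance()
steps = [
    ca.add_dependency("A", "B", 2),  # B is a new ancestor
    ca.add_dependency("A", "B", 2),  # same version already recorded
    ca.add_dependency("A", "B", 1),  # a newer version is already recorded
    ca.add_dependency("A", "B", 3),  # only an older version is recorded
]
print(steps, ca.version["A"])
```

Because the decision uses only A's own ancestor table, it is cheap, but it can create versions that a global cycle check would show to be unnecessary.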
  51. 51-65. [Figures: Cycle Avoidance applied to the example; diagram content not recoverable]
  66. 66. Graph Finesse • As before: A1 depends on B2 • If B2 is already in A’s history, discard • Otherwise, check for a path from B2 to A1 • If yes, we have a cycle. Make a new version of A1 • Otherwise, add A1 → B2 to the dependency graph
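
A minimal sketch of the Graph Finesse steps, using (name, version) pairs as graph nodes. It makes two assumptions the slide does not state: "A's history" is read as the dependencies already recorded for A's current version, and a new version is made to depend on the previous version of the same object. Class and method names are illustrative.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, int]  # (object name, version)

class GraphFinesse:
    def __init__(self) -> None:
        self.version: Dict[str, int] = {}
        self.deps: Dict[Node, Set[Node]] = {}  # node -> nodes it depends on

    def node(self, name: str) -> Node:
        """Current versioned node for an object (version 1 when first seen)."""
        return (name, self.version.setdefault(name, 1))

    def _reaches(self, start: Node, target: Node) -> bool:
        """True if a dependency path leads from start to target."""
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(self.deps.get(n, ()))
        return False

    def add_dependency(self, a: str, b_node: Node) -> str:
        a_node = self.node(a)
        if b_node in self.deps.get(a_node, set()):
            return "discard"                      # already recorded
        if self._reaches(b_node, a_node):         # path B -> A: cycle
            self.version[a] += 1
            new_node = (a, self.version[a])
            # Assumption: the new version depends on its predecessor.
            self.deps[new_node] = {a_node, b_node}
            return "new-version"
        self.deps.setdefault(a_node, set()).add(b_node)
        return "edge-added"

gf = GraphFinesse()
r_read = gf.add_dependency("proc", gf.node("file"))    # proc reads file
r_write = gf.add_dependency("file", gf.node("proc"))   # proc writes file
r_again = gf.add_dependency("file", gf.node("proc"))   # proc writes file again
print(r_read, r_write, r_again, gf.node("file"))
```

The graph search costs more than Cycle Avoidance's local check, but a new version is created only when a real cycle would otherwise form.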
  67. 67-77. [Figures: Graph Finesse applied to the example; diagram content not recoverable]
  78. 78. Evaluation • Run-time overhead • Space overhead • Recovery costs • All results are the average of 5 runs • Less than 5% standard deviation
  79. 79. Workloads used • Linux compile (CPU intensive) • Postmark (I/O intensive) • Applying patches with Mercurial (developer workload) • BLAST protein sequencing (scientific workload)
  80. 80. Algorithms used • Without causal data: • Ext2: Baseline (Lasagna, Harvard’s versioning FS, on top of ext2) • VER: Plain open-close versioning • With causal data: • OC: Open-Close • CA: Cycle-Avoidance • GF: Graph-Finesse • ALL: Version on every write
  81. 81-104. [Figures: evaluation charts and tables; values not recoverable]
  105. 105. Conclusions • Both algorithms require less time and space than Version-On-Write • Both algorithms offer finer-grained control than Open-Close • Graph-Finesse creates fewer unnecessary versions • Cycle-Avoidance has overhead comparable to Open-Close
  106. 106. Expanding on it • Not just good for disaster recovery • Search • Social network analysis
