Causality Based Versioning

1. Causality-Based Versioning • Kiran-Kumar Muniswamy-Reddy and David A. Holland • Slides by the authors and Aleatha Parker-Wood • Tuesday, June 1, 2010

2. Versioning • Already popular • Saves backup “versions” of files as they change • Two flavors: versioning (event based) and snapshotting (time based) • Snapshots: WAFL, Venti... • Versioning: Elephant, VersionFS...

3. Why Version/Snapshot? • Disaster recovery is baked into the file system • “Oops, I needed that...” • “Oops, I didn’t mean to click that virus...” • “Oops, that new driver patch broke everything...” • Maintains backup files that you can recover from (without going offsite)

4. Causality • Depends on time (to cause Y, X must come before it) • Uni-directional (if X causes Y, Y cannot cause X) • Defined in terms of data flow • A reads B ⇒ B causes A • A writes B ⇒ A causes B • Used by PASS and intrusion detection systems (BackTracker, Taser...)
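
The two data-flow rules on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not code from PASS or the intrusion detection systems named above: the `(actor, op, target)` event format, the function names, and the sample file and process names are all assumptions made for the example. The events are processed in time order, so an object written before its writer touched the source is not marked as affected.

```python
def causal_edges(events):
    """Turn (actor, op, target) events into (cause, effect) pairs.

    Hypothetical event format: op is "read" or "write".
    """
    edges = []
    for actor, op, target in events:
        if op == "read":
            edges.append((target, actor))   # A reads B  => B causes A
        elif op == "write":
            edges.append((actor, target))   # A writes B => A causes B
        else:
            raise ValueError(f"unknown operation: {op}")
    return edges


def tainted_by(events, source):
    """Return every object that data from `source` may have flowed into.

    Events are assumed to be listed in the order they happened, so
    causality only flows forward in time.
    """
    tainted = {source}
    for actor, op, target in events:
        if op == "read" and target in tainted:
            tainted.add(actor)
        elif op == "write" and actor in tainted:
            tainted.add(target)
    return tainted


# Hypothetical trace: the first write happens before "backup" has
# read anything derived from evil.req, so old.tar stays clean.
events = [
    ("backup", "write", "old.tar"),
    ("httpd", "read", "evil.req"),
    ("httpd", "write", "access.log"),
    ("backup", "read", "access.log"),
    ("backup", "write", "new.tar"),
]
affected = tainted_by(events, "evil.req")
```

Here `affected` contains `new.tar` but not `old.tar`, which is the kind of distinction that helps decide what to restore.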

5. Why Causality? • Track propagation of data • Find out what files were modified by what processes • Reconstruct the scene of the crime

6. Causality-Based Versioning • Decide when to version using causal relationships between two files • Has advantages of versioning file systems or snapshots • Eases recovery from corruption, viruses, and user mistakes • In addition, creates causal links between files • Easier to decide what to restore • Sort of like transactions on steroids

7. Applications • Intrusion recovery • System configuration management • IP compliance • Reproduction of research results

8. A Scenario... • Apache split-logfile vulnerability • Vulnerability in Apache 1.3 • Allows an attacker to overwrite any file with a .log extension • Let’s look at the current versioning options...

9-14. [Figure slides; diagrams not recoverable from the extracted text]

15. The Goal • One of these has too much information • The other not enough • Can we leverage causality to create just enough versions?

16. Creating Just Enough Versions • Building on top of the Provenance Aware Storage System (PASS) • Two options • Cycle Avoidance • Graph Finesse

17. How PASS works • Translates system calls to provenance records (reads and writes become edges in a dependency graph) • Maintains provenance for transient objects such as pipes and processes, and creates virtual objects as needed • Analyzes the records to ensure there are no cyclic dependencies between objects • Causality-based versioning extends the analysis phase

18. The big idea • Cycles are violations of causality • The creation of a cycle is an indicator that this is an interesting event • We can prevent cycles by creating a new version every time a cycle is about to occur
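
To make the idea concrete, here is a minimal sketch of detecting a cycle in a dependency graph with a depth-first search. The dictionary layout (each object mapped to the set of objects it depends on) and the names `P`, `A`, `A1`, `A2` are assumptions for illustration, not the representation PASS uses. The example contrasts a process that reads and then writes the same unversioned file with the same activity once the write goes to a new version.

```python
def has_cycle(deps):
    """Return True if the dependency graph contains a cycle.

    deps maps each object to the set of objects it depends on.
    Standard three-colour depth-first search.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY                      # on the current DFS path
        for parent in deps.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GREY:                   # back edge: cycle found
                return True
            if state == WHITE and visit(parent):
                return True
        color[node] = BLACK                     # fully explored
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(deps))


# Process P reads file A, then writes it. Without versions,
# P depends on A and A depends on P: a causality violation.
unversioned = {"P": {"A"}, "A": {"P"}}

# If the write creates a new version A2, the graph stays acyclic.
versioned = {"P": {"A1"}, "A2": {"P"}}
```

With these inputs, `has_cycle(unversioned)` is true and `has_cycle(versioned)` is false.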

19-47. [Figure slides; diagrams not recoverable from the extracted text]

48. Version-On-Write? • We could remove cycles using Version-On-Write • Every read creates a new version of the process • Every write creates a new version of the file • But this results in 8 versions • Huge management overhead
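
A rough sketch of Version-On-Write shows why the version count grows with every operation. The event format, the convention that every object starts at version 1, and the function name are assumptions for the example; the trace below is made up and is not the example from the slides.

```python
def version_on_write(events):
    """Apply Version-On-Write to a list of (process, op, file) events.

    Every read creates a new version of the process and every write
    creates a new version of the file. Returns the dependency edges
    as ((object, version), (ancestor, version)) pairs and the final
    version number of each object.
    """
    current = {}
    edges = []

    def cur(obj):
        # Assumption: an object is at version 1 when first seen.
        current.setdefault(obj, 1)
        return (obj, current[obj])

    def bump(obj):
        current[obj] = current.get(obj, 1) + 1
        return (obj, current[obj])

    for proc, op, file in events:
        if op == "read":
            source = cur(file)
            edges.append((bump(proc), source))   # new process version
        elif op == "write":
            source = cur(proc)
            edges.append((bump(file), source))   # new file version
        else:
            raise ValueError(f"unknown operation: {op}")
    return edges, current


# Hypothetical trace: a process alternately reads and writes one file.
trace = [("P", "read", "A"), ("P", "write", "A"),
         ("P", "read", "A"), ("P", "write", "A")]
vow_edges, vow_versions = version_on_write(trace)
```

Each of the four events creates a version, so no edge can ever point back at an existing node, but the number of versions equals the number of operations.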

49. Cycle Avoidance Algorithm • Uses local information about the object • Create a new version of an object whenever a new ancestor is added • Different versions are considered to be “new” ancestors • Not every write causes a new version

50. The Algorithm • Assume new data: A1 depends on B2 • If B is not in A’s dependencies, create a new version of A • Else if B is already in A’s dependencies: • If B2 is in dependencies, discard (no new information) • If B3 is in dependencies, discard (no new causality) • If B1 is in dependencies, create new version of A
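
The rules on this slide can be sketched as a small class that keeps, for each object, only the newest version of each ancestor it has seen. This is a reading of the slide, not the authors' implementation: the class and method names are hypothetical, objects are assumed to start at version 1, and a new version is assumed to inherit the ancestor table of the version it replaces.

```python
class CycleAvoidance:
    """Sketch of the Cycle Avoidance rules using only local information."""

    def __init__(self):
        self.version = {}   # object -> current version number
        self.known = {}     # object -> {ancestor: newest version seen}
        self.edges = []     # ((object, version), (ancestor, version))

    def current(self, obj):
        # Assumption: an object is at version 1 when first seen.
        return self.version.setdefault(obj, 1)

    def add_dependency(self, a, b, b_ver=None):
        """Record that the current version of `a` depends on `b`.

        b_ver defaults to b's current version. Returns "discard" or
        "new version".
        """
        if b_ver is None:
            b_ver = self.current(b)
        self.current(a)
        seen = self.known.setdefault(a, {})
        if b in seen and seen[b] >= b_ver:
            # Same version: no new information.
            # Newer version already known: no new causality.
            return "discard"
        # B is a new ancestor, or only an older version of B is known.
        self.version[a] += 1
        seen[b] = b_ver
        self.edges.append(((a, self.version[a]), (b, b_ver)))
        return "new version"


# Hypothetical trace: P reads A once, then writes B three times.
ca = CycleAvoidance()
actions = [
    ca.add_dependency("P", "A"),   # P reads A
    ca.add_dependency("B", "P"),   # P writes B
    ca.add_dependency("B", "P"),   # P writes B again
    ca.add_dependency("B", "P"),   # and again
]
```

Only the first write to `B` creates a version; the repeated writes are discarded because `B` already lists that version of `P` as an ancestor. Version-On-Write would have versioned `B` on every write.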

51-65. [Figure slides; diagrams not recoverable from the extracted text]

66. Graph Finesse • As before: A1 depends on B2 • If B2 is already in A’s history, discard • Otherwise, check for a path from B2 to A1 • If there is one, we have a cycle: make a new version of A • Otherwise, add the edge A1 → B2 to the dependency graph
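
A possible sketch of these steps keeps the whole dependency graph in memory and searches it before adding each edge. Several details are assumptions rather than facts from the slides: "already in A's history" is read as "already recorded as a direct dependency of A's current version", a new version is given an edge to the version it replaces, and the class and method names are hypothetical.

```python
class GraphFinesse:
    """Sketch of Graph Finesse: version only when an edge would close a cycle."""

    def __init__(self):
        self.version = {}   # object -> current version number
        self.deps = {}      # (object, version) -> set of (object, version)

    def node(self, obj):
        # Assumption: an object is at version 1 when first seen.
        return (obj, self.version.setdefault(obj, 1))

    def reaches(self, start, goal):
        """Return True if `goal` is reachable from `start` along dependency edges."""
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            if n == goal:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(self.deps.get(n, ()))
        return False

    def add_dependency(self, a, b):
        """Record that the current version of `a` depends on that of `b`."""
        a_node, b_node = self.node(a), self.node(b)
        if b_node in self.deps.get(a_node, set()):
            return "discard"                      # already recorded
        if self.reaches(b_node, a_node):
            # The edge a -> b would close a cycle: version `a` instead.
            self.version[a] += 1
            self.deps[self.node(a)] = {a_node, b_node}
            return "new version"
        self.deps.setdefault(a_node, set()).add(b_node)
        return "add edge"


# Hypothetical trace: P reads A, writes A, reads A, writes A.
gf = GraphFinesse()
gf_actions = [gf.add_dependency(x, y)
              for x, y in [("P", "A"), ("A", "P"), ("P", "A"), ("A", "P")]]
```

On this trace the first read adds a plain edge without versioning anything, so `P` ends at version 2 rather than 3. The cost is a graph search on each new record instead of a local table lookup.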

67-77. [Figure slides; diagrams not recoverable from the extracted text]

78. Evaluation • Run-time overhead • Space overhead • Recovery costs • All results are the average of 5 runs • Less than 5% standard deviation

79. Workloads used • Linux compile (CPU intensive) • Postmark (I/O intensive) • Applying patches with Mercurial (developer workload) • BLAST protein sequencing (scientific workload)

80. Algorithms used • Without causal data: • Ext2: Baseline (Lasagna, Harvard’s versioning FS, on top of ext2) • VER: Plain open-close versioning • With causal data: • OC: Open-close • CA: Cycle-Avoidance • GF: Graph Finesse • ALL: Version on every write

81-104. [Figure slides: evaluation result charts; values not recoverable from the extracted text]

105. Conclusions • Both algorithms require less time and space than Version-On-Write • Both algorithms offer finer-grained control than Open-Close • Graph-Finesse creates fewer unnecessary versions • Cycle-Avoidance has overhead comparable to Open-Close

106. Expanding on it • Not just good for disaster recovery • Search • Social network analysis
