Camouflage: Automated Anonymization of Field Data (ICSE 2011)

Published in: Technology, Health & Medicine

  1. 1. CAMOUFLAGE: AUTOMATED ANONYMIZATION OF FIELD DATA. James Clause (University of Delaware), Alessandro Orso (Georgia Institute of Technology)
  6. 6. THE BIG PICTURE
      • Apple crash reporter • Windows error reporting • Ubuntu Apport • Gnome BugBuddy • Mozilla / Google Breakpad • many others
      • Chilimbi and colleagues ’09 • Elbaum and Diep ’05 • Hilbert and Redmiles ’00 • Liblit and colleagues ’05 • Pavlopoulou and Young ’99 • many others
  12. 12. PRIVACY CONCERNS: Handling concerns in practice
      • Ignore them
      • Privacy policies
      • Collect limited amounts of information
        • less likely to be sensitive
        • can rely on user checking
  16. 16. PRIVACY CONCERNS
      Unfortunately: [Figure: register values, stack dumps, branch profiles, path profiles, and test cases, arranged along two axes labeled "Usefulness" and "Privacy concerns"]
      GOAL: Enable the collection of detailed information while reducing or eliminating privacy concerns.
  17. 17. OUTLINE
      • Intuition
      • Castro and colleagues’ technique
      • Our improvements
        • Path condition relaxation
        • Breakable input conditions
      • Evaluation
      • Related work
      • Conclusions and future work
  21. 21. INTUITION
      [Figure: within the input domain, the set of inputs that cause F contains the set of inputs that satisfy F’s path condition. The sensitive input (I) that causes F lies in that set, as does an anonymized input (I’) that also causes F.]
  22. 22. CASTRO AND COLLEAGUES’ TECHNIQUE (PATH CONDITION GENERATION)
      Path condition: set of constraints on a program’s inputs that encode the conditions necessary for a specific path to be executed.
  34. 34. CASTRO AND COLLEAGUES’ TECHNIQUE (PATH CONDITION GENERATION)

          boolean foo(int x, int y, int z) {
            if (x <= 5) {
              int a = x * 2;
              if (y + a > 10) {
                if (z == 0) {
                  return true;
                }
              }
            }
            return false;
          }

      Inputs (sensitive): x = 5, y = 3, z = 0
      Symbolic State: x→i1, y→i2, z→i3, a→i1*2
      Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
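The walk-through above can be approximated by hand. The following is a minimal sketch, assuming we instrument `foo` manually instead of running a symbolic execution engine; the class name `PathConditionSketch` and the method `pathCondition` are hypothetical and do not come from the talk. Each branch taken on the concrete inputs appends the matching constraint over the symbolic inputs i1, i2, i3.

```java
import java.util.ArrayList;
import java.util.List;

class PathConditionSketch {
    // Executes the logic of foo on concrete inputs and records, for every
    // branch outcome, the constraint on the symbolic inputs (x -> i1,
    // y -> i2, z -> i3) that forces the same outcome.
    static List<String> pathCondition(int x, int y, int z) {
        List<String> pc = new ArrayList<>();
        if (x <= 5) {
            pc.add("i1 <= 5");
            int a = x * 2; // symbolic state: a -> i1*2
            if (y + a > 10) {
                pc.add("i2+i1*2 > 10");
                if (z == 0) {
                    pc.add("i3 == 0");
                } else {
                    pc.add("i3 != 0");
                }
            } else {
                pc.add("i2+i1*2 <= 10");
            }
        } else {
            pc.add("i1 > 5");
        }
        return pc;
    }

    public static void main(String[] args) {
        // The sensitive inputs from the slide: x = 5, y = 3, z = 0.
        System.out.println(String.join(" && ", pathCondition(5, 3, 0)));
    }
}
```

A real implementation would derive these constraints automatically from the executed bytecode; the hand-written strings here only mirror the three clauses shown on the slide.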
  37. 37. CASTRO AND COLLEAGUES’ TECHNIQUE (CHOOSING ANONYMIZED INPUTS)
      Path condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
      → Constraint Solver →
      i1 == 5, i2 == 3, i3 == 0
  39. 39. OUR IMPROVEMENTS
      • Increase the number of possible choices for I’
      • Choose I’ such that it is as different as possible from I
  44. 44. PATH CONDITION RELAXATION
      [Figure: the sensitive input (I) that causes F, shown within the input domain]
  50. 50. PATH CONDITION RELAXATION
      1. Array inequalities  2. Switch statements  3. Multi-clause conditionals  4. Array reads
      Example (array inequalities): x.equals(y); with x = abc, y = abd
      Traditional: x0 == y0 ∧ x1 == y1 ∧ x2 != y2
      Relaxed: x0 != y0 ∨ x1 != y1 ∨ x2 != y2
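To make the difference between the two constraints concrete, here is a small sketch, assuming equal-length character arrays; the class and method names are hypothetical. The traditional constraint pins the comparison to the first mismatch position observed in the original run, while the relaxed constraint accepts a mismatch at any position.

```java
class ArrayInequalityRelaxation {
    // Traditional constraint: equal before the observed mismatch position,
    // different at that position (e.g. x0 == y0, x1 == y1, x2 != y2).
    static boolean traditional(char[] x, char[] y, int firstMismatch) {
        for (int i = 0; i < firstMismatch; i++) {
            if (x[i] != y[i]) {
                return false;
            }
        }
        return x[firstMismatch] != y[firstMismatch];
    }

    // Relaxed constraint: the arrays differ in at least one position
    // (x0 != y0 or x1 != y1 or x2 != y2). Assumes equal lengths.
    static boolean relaxed(char[] x, char[] y) {
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] y = "abd".toCharArray();
        char[] candidate = "xyz".toCharArray();
        // "xyz" shares no prefix with "abd": rejected by the traditional
        // constraint, accepted by the relaxed one.
        System.out.println(traditional(candidate, y, 2));
        System.out.println(relaxed(candidate, y));
    }
}
```

Under the relaxed form the solver may replace the whole string, including the prefix that the traditional form would force to stay equal to `y`.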
  56. 56. PATH CONDITION RELAXATION
      1. Array inequalities  2. Switch statements  3. Multi-clause conditionals  4. Array reads

          switch(x) {
            case 1:
              ...
              break;
            case 3:
            case 5:
              ...
              break;
            default:
              ...
          }

      Input: x = 5
      Traditional: x == 5
      Relaxed: x == 5 ∨ x == 3
  60. 60. PATH CONDITION RELAXATION
      1. Array inequalities  2. Switch statements  3. Multi-clause conditionals  4. Array reads
      switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... }
      Input: x = 10
      Traditional: x == 10
      Relaxed: x != 1 ∧ x != 3 ∧ x != 5
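Both switch examples follow the same idea: any value that reaches the same arm of the switch is an acceptable replacement. A minimal sketch under that reading follows; `arm`, `traditional`, and `relaxed` are hypothetical helper names, and the arm numbering is an arbitrary choice made for this illustration.

```java
class SwitchRelaxation {
    // Identifies which arm of the slide's switch statement a value reaches:
    // 0 for case 1, 1 for the shared case 3 / case 5 block, 2 for default.
    static int arm(int x) {
        switch (x) {
            case 1:
                return 0;
            case 3:
            case 5:
                return 1;
            default:
                return 2;
        }
    }

    // Traditional constraint: the candidate must equal the observed value.
    static boolean traditional(int observed, int candidate) {
        return candidate == observed;
    }

    // Relaxed constraint: the candidate only has to reach the same arm.
    static boolean relaxed(int observed, int candidate) {
        return arm(candidate) == arm(observed);
    }

    public static void main(String[] args) {
        System.out.println(relaxed(5, 3));   // same block as case 5
        System.out.println(relaxed(10, 42)); // both fall through to default
        System.out.println(relaxed(10, 3));  // different arm
    }
}
```

For x = 5 this accepts 3 as well as 5, and for x = 10 it accepts every value other than 1, 3, and 5, which matches the relaxed constraints on the two slides.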
  61. 61. OUR IMPROVEMENTS
      • Increase the number of possible choices for I’
      • Choose I’ such that it is as different as possible from I
  63. 63. BREAKABLE INPUT CONDITIONS
      Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
      → Constraint Solver →
      i1 == 5, i2 == 3, i3 == 0 (identical to the original inputs 5, 3, 0 of foo(int x, int y, int z))
  68. 68. BREAKABLE INPUT CONDITIONS
      Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
      Breakable Input Condition: i1 != 5 ∧ i2 != 3 ∧ i3 != 0
      → Constraint Solver →
      i1 == 4, i2 == 10, i3 == 0 (the clause i3 != 0 is broken because the path condition requires i3 == 0)
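The interplay between the two conditions can be sketched with a brute-force search standing in for the constraint solver. This is an assumption made for illustration only: it treats the path condition as mandatory and keeps the candidate that satisfies the most breakable clauses, within a small hypothetical search range. All names are invented for this sketch.

```java
class BreakableInputConditions {
    // Hard constraint: the path condition from the slide.
    static boolean pathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    // Counts how many breakable clauses (candidate value != original value)
    // hold for a candidate.
    static int satisfiedBreakable(int[] original, int i1, int i2, int i3) {
        int count = 0;
        if (i1 != original[0]) count++;
        if (i2 != original[1]) count++;
        if (i3 != original[2]) count++;
        return count;
    }

    // Exhaustive search over [lo, hi]^3: returns the first candidate that
    // satisfies the path condition and the largest number of breakable
    // clauses, or null if no candidate satisfies the path condition.
    static int[] anonymize(int[] original, int lo, int hi) {
        int[] best = null;
        int bestScore = -1;
        for (int i1 = lo; i1 <= hi; i1++) {
            for (int i2 = lo; i2 <= hi; i2++) {
                for (int i3 = lo; i3 <= hi; i3++) {
                    if (!pathCondition(i1, i2, i3)) {
                        continue;
                    }
                    int score = satisfiedBreakable(original, i1, i2, i3);
                    if (score > bestScore) {
                        bestScore = score;
                        best = new int[] {i1, i2, i3};
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] result = anonymize(new int[] {5, 3, 0}, -20, 20);
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```

The search returns some input that differs from the original in i1 and i2 while keeping i3 at 0; it need not be the particular solution (4, 10, 0) shown on the slide, since any solution with two satisfied breakable clauses is equally good under this scoring.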
  76. 76. ASSUMPTIONS
      1. The failure f is observable and can be detected with an assertion.
         ‣ common to all debugging techniques; holds in most, if not all, cases.
      2. Any input that satisfies the path condition results in f.
         • Non-determinism
           ‣ common to all debugging techniques; requires a deterministic replay mechanism
         • Implicit checks (e.g., division by zero)
           ‣ likely that they do not involve relevant inputs
           ‣ make implicit checks explicit (e.g., 100/x → assert x != 0)
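The last bullet can be illustrated with a short sketch; the class and method names are hypothetical. The slide writes the explicit check as an assertion; this sketch uses an explicit branch that throws instead, because Java's `assert` statement is disabled unless the JVM is started with `-ea`. Either way, the zero check becomes a visible branch on the input that a path condition can record.

```java
class ExplicitCheckSketch {
    // Original form: the zero check happens implicitly inside the division.
    static int implicitCheck(int x) {
        return 100 / x;
    }

    // Rewritten form: the same check as an explicit branch on x, so the
    // constraint x != 0 (or x == 0 on the failing path) is visible.
    static int explicitCheck(int x) {
        if (x == 0) {
            throw new ArithmeticException("x must not be zero");
        }
        return 100 / x;
    }

    public static void main(String[] args) {
        System.out.println(explicitCheck(4));
    }
}
```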
  81. 81. EVALUATION
      1. Feasibility: Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce a failure?
      2. Strength: How much information about the original inputs is revealed?
      3. Effectiveness: Are the anonymized inputs safe to send to developers?
      4. Improvement: Does the use of path condition relaxation and breakable input conditions provide any benefits over the basic approach?
  85. 85. PROTOTYPE IMPLEMENTATION
      Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
      Breakable Input Condition: i1 != 5 ∧ i2 != 3 ∧ i3 != 0
      → Constraint Solver →
      i1 == 4, i2 == 10, i3 == 0
      [Figure labels on this pipeline: Java Pathfinder (twice), Yices, Ruby scripts, Executable inputs]
  88. 88. SUBJECTS
      • Columba: 1 fault • htmlparser: 1 fault • Printtokens: 2 faults • NanoXML: 16 faults (20 faults, total)
      Select sensitive failure-inducing inputs
      • 170 total inputs
      • manually generated or included with subject
      • 100 bytes to 5MB in size
      (Assume all of each input is potentially sensitive)
  90. 90. RQ1: FEASIBILITY
      [Figure: chart of average execution time (s) and average solver time (s) per fault (columba, htmlparser, printtokens1 and 2, nanoxml1 through nanoxml16); axis ticks run 0 to 600 and 0 to 20]
      Inputs can be anonymized in a reasonable amount of time (easily done overnight)
  95. 95. RQ2: STRENGTH
      Average % Bits Revealed: measures how many inputs satisfy the path condition (many satisfying inputs: little information revealed; few satisfying inputs: lots of information revealed)
      Average % Residue: measures how much of the anonymized input is identical to the original input
      Example (I and I’): AAAAAAsecretAAAAAA...AAAAAA and BBBBBBsecretBBBBBB...BBBBBB (lots of information revealed)
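The slide does not give a formula for residue, so the following is only one possible reading: a position-by-position comparison that reports the share of the original input's characters that survive unchanged at the same offset. The class and method names are hypothetical, and the actual metric used in the evaluation may be defined differently.

```java
class ResidueSketch {
    // Percentage of positions in the original input whose character is
    // identical in the anonymized input. Positions past the end of the
    // anonymized input count as changed. Assumption: position-wise
    // comparison; the slides do not specify the exact computation.
    static double percentResidue(String original, String anonymized) {
        if (original.isEmpty()) {
            return 0.0;
        }
        int limit = Math.min(original.length(), anonymized.length());
        int same = 0;
        for (int i = 0; i < limit; i++) {
            if (original.charAt(i) == anonymized.charAt(i)) {
                same++;
            }
        }
        return 100.0 * same / original.length();
    }

    public static void main(String[] args) {
        // Shortened version of the slide's example: only "secret" survives.
        System.out.println(percentResidue("AAAAAAsecretAAAAAA", "BBBBBBsecretBBBBBB"));
    }
}
```

In the shortened example, 6 of 18 characters are unchanged, so a third of the input counts as residue even though every padding character was replaced, which is the situation the slide flags as revealing.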
  97. 97. RQ2: STRENGTH
      [Figure: chart of average % bits revealed and average % residue per fault (columba, htmlparser, printtokens1 and 2, nanoxml1 through nanoxml16); both axes run 0 to 100]
      Anonymized inputs reveal, on average, between 60% (worst case) and 2% (best case) of the information in the original inputs
  98. 98. RQ3: EFFECTIVENESS (NANOXML)

          <!DOCTYPE Foo [
             <!ELEMENT Foo (ns:Bar)>
             <!ATTLIST Foo
                 xmlns CDATA #FIXED http://nanoxml.n3.net/bar
                 a     CDATA #REQUIRED>
             <!ELEMENT ns:Bar (Blah)>
             <!ATTLIST ns:Bar
                 xmlns:ns CDATA #FIXED http://nanoxml.n3.net/bar>
             <!ELEMENT Blah EMPTY>
             <!ATTLIST Blah
                 x    CDATA #REQUIRED
                 ns:x CDATA #REQUIRED>
          ]>
          <!-- comment -->
          <Foo a=very b=secret c=stuff>vaz
             <ns:Bar>
                 <Blah x="1" ns:x="2"/>
             </ns:Bar>
          </Foo>
  99. 99. RQ3: EFFECTIVENESS (NANOXML) <!DOCTYPE [   <! >   <!ATTLIST        #FIXED         >   <!E >   <!ATTLIST        #FIXED >   <!E >   <!ATTLIST        #        : # >]><!-- -->< = = = >   < : >       < =" " : =" "/>   </ :
  100. 100. RQ3: EFFECTIVENESS (COLUMBA)

          Wayne,Bartley,Bartley,Wayne,wbartly@acp.com,,
          Ronald,Kahle,Kahle,Ron,ron.kahle@kahle.com,,
          Wilma,Lavelle,Lavelle,Wilma,,lavelle678@aol.com,
          Jesse,Hammonds,Hammonds,Jesse,,hamj34@comcast.com,
          Amy,Uhl,Uhl,Amy,uhla@corp1.com,uhla@gmail.com,
          Hazel,Miracle,Miracle,Hazel,hazel.miracle@corp2.com,,
          Roxanne,Nealy,Nealy,Roxie,,roxie.nearly@gmail.com,
          Heather,Kane,Kane,Heather,kaneh@corp2.com,,
          Rosa,Stovall,Stovall,Rosa,,sstoval@aol.com,
          Peter,Hyden,Hyden,Pete,,peteh1989@velocity.net,
          Jeffrey,Wesson,Wesson,Jeff,jwesson@corp4.com,,
          Virginia,Mendoza,Mendoza,Ginny,gmendoza@corp4.com,,
          Richard,Robledo,Robledo,Ralph,ralphrobledo@corp1.com,,
          Edward,Blanding,Blanding,Ed,,eblanding@gmail.com,
          Sean,Pulliam,Pulliam,Sean,spulliam@corp2.com,,
          Steven,Kocher,Kocher,Steve,kocher@kocher.com,,
          Tony,Whitlock,Whitlock,Tony,,tw14567@aol.com,
          Frank,Earl,Earl,Frankie,,,
          Shelly,Riojas,Riojas,Shelly,srojas@corp6.com,,

  101. 101. RQ3: EFFECTIVENESS (COLUMBA) , , , , ,,, , , , ,,, , , ,, ,, , , ,, ,, , , , , ,, , , , ,,, , , ,, ,, , , , ,,, , , ,, ,, , , ,, ,, , , , ,,, , , , ,,, , , , ,,, , , ,, ,, , , , ,,, , , , ,,, , , ,, ,, , , ,,,
  104. 104. RQ3: EFFECTIVENESS (HTMLPARSER)

          <?xml version="1.0" encoding="UTF-8" ?>
          <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
          <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
          <head><title>james clause @ gatech | home</title>
          <style type="text/css" media="screen" title=""><!--/*--><![CDATA[<!--*/body {margin: 0px;.../*]]>*/--></style></head>
          <body>...</body>

      The portions of the inputs that remain after anonymization tend to be structural in nature and therefore are safe to send to developers
  107. 107. RQ4: IMPROVEMENT
      [Figure: chart of % improvement in bits revealed and % improvement in residue per fault (columba, htmlparser, printtokens1 and 2, nanoxml1 through nanoxml16); both axes run 0 to 100]
      Inputs anonymized using our improvements reveal an average of 30% fewer bits of information and 40% less residue. (With only a marginal increase in time.)
  117. 117. RELATED WORK
      • Castro and colleagues ’08
      • Broadwell and colleagues ’03
        • Uses dynamic taint analysis to identify and remove sensitive information in crash dumps.
      • Wang and colleagues ’08
        • Uses a combination of dynamic taint analysis, symbolic execution, and Q/A with a client machine to construct anonymized inputs
      • Data set anonymization techniques (e.g., k-anonymization)
        • Budi and colleagues ’11
        • Grechanik and colleagues ’11
      • Dynamic symbolic execution techniques
  121. 121. FUTURE WORK
      • Additional quality metrics that:
        • consider additional aspects of privacy loss
        • consider the relative sensitivity of different inputs
        • are intuitive and easy to use
      • Conduct additional (human) studies
        • additional (larger) subjects
      • Investigate the combination of anonymization and minimization
  124. 124. SUMMARY
      1. An approach for automatically anonymizing failure-inducing inputs
         • extends Castro and colleagues’ technique through the novel concepts of path condition relaxation and breakable input conditions
      2. An empirical evaluation that demonstrates, for the subjects considered, our approach is:
         • feasible: generates anonymized inputs in < 10 minutes
         • effective: anonymized inputs did not contain sensitive information
         • an improvement over the state-of-the-art
  125. 125. QUESTIONS?
