Camouflage: Automated Anonymization of Field Data (ICSE 2011)
 

Usage Rights: CC Attribution-NonCommercial-NoDerivs License

    Presentation Transcript

    • CAMOUFLAGE: AUTOMATED ANONYMIZATION OF FIELD DATA
        James Clause, University of Delaware
        Alessandro Orso, Georgia Institute of Technology
    • THE BIG PICTURE
        • Apple crash reporter, Windows error reporting, Ubuntu Apport, Gnome BugBuddy, Mozilla / Google Breakpad, many others
        • Chilimbi and colleagues ’09, Elbaum and Diep ’05, Hilbert and Redmiles ’00, Liblit and colleagues ’05, Pavlopoulou and Young ’99, many others
    • PRIVACY CONCERNS
        Handling concerns in practice:
        • Ignore them
        • Privacy policies
        • Collect limited amounts of information
            • less likely to be sensitive
            • can rely on user checking
    • PRIVACY CONCERNS
        Unfortunately, usefulness and privacy concerns both grow with the detail of the collected data: register values, stack dumps, branch profiles, path profiles, test cases.
        GOAL: Enable the collection of detailed information while reducing or eliminating privacy concerns.
    • OUTLINE
        • Intuition
        • Castro and colleagues’ technique
        • Our improvements
            • Path condition relaxation
            • Breakable input conditions
        • Evaluation
        • Related work
        • Conclusions and future work
    • INTUITION
        • Input domain
        • Inputs that cause F
        • Inputs that satisfy F’s path condition
        • Sensitive input (I) that causes F
        • Anonymized input (I’) that also causes F
    • CASTRO AND COLLEAGUES’ TECHNIQUE (PATH CONDITION GENERATION)
        Path condition: set of constraints on a program’s inputs that encode the conditions necessary for a specific path to be executed.
    • CASTRO AND COLLEAGUES’ TECHNIQUE (PATH CONDITION GENERATION)
            boolean foo(int x, int y, int z) {
                if (x <= 5) {
                    int a = x * 2;
                    if (y + a > 10) {
                        if (z == 0) {
                            return true;
                        }
                    }
                }
                return false;
            }
        Inputs (sensitive): 5 3 0
        Symbolic State: x→i1, y→i2, z→i3, a→i1*2
        Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
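The walkthrough above can be sketched as a hand-instrumented version of foo that records one constraint per branch taken. This is only an illustration: the class name PathConditionSketch, the pathCondition list, and the string form of the constraints are assumptions, and a real tool would derive the constraints automatically through symbolic execution rather than by manual edits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: foo from the slides, instrumented by hand so that each
// branch appends its constraint (over symbolic inputs i1, i2, i3) to a list.
class PathConditionSketch {
    // Constraints collected along the most recently executed path.
    static final List<String> pathCondition = new ArrayList<>();

    static boolean foo(int x, int y, int z) {
        pathCondition.clear();
        // Symbolic state: x -> i1, y -> i2, z -> i3
        if (x <= 5) {
            pathCondition.add("i1 <= 5");
            int a = x * 2; // symbolic state: a -> i1*2
            if (y + a > 10) {
                pathCondition.add("i2+i1*2 > 10");
                if (z == 0) {
                    pathCondition.add("i3 == 0");
                    return true;
                }
                pathCondition.add("i3 != 0");
            } else {
                pathCondition.add("i2+i1*2 <= 10");
            }
        } else {
            pathCondition.add("i1 > 5");
        }
        return false;
    }

    public static void main(String[] args) {
        foo(5, 3, 0); // the sensitive input from the slides
        // prints "i1 <= 5 && i2+i1*2 > 10 && i3 == 0"
        System.out.println(String.join(" && ", pathCondition));
    }
}
```

Running it on the sensitive input 5 3 0 yields the same three conjuncts as the slide's path condition.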
    • CASTRO AND COLLEAGUES’ TECHNIQUE (CHOOSING ANONYMIZED INPUTS)
        Path condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
        Constraint Solver output: i1 == 5, i2 == 3, i3 == 0
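As a rough illustration of the solving step, the sketch below replaces the constraint solver with a brute-force search over a small integer range. The class BruteForceSolverSketch, the method names, and the bounded domain are assumptions made for the example; an actual solver works symbolically and gives no guarantee about which satisfying assignment it returns (the slide shows it returning the original input).

```java
// Hypothetical stand-in for a constraint solver: enumerate a small domain and
// return the first assignment that satisfies the slide's path condition.
class BruteForceSolverSketch {
    // Path condition from the slides: i1 <= 5 AND i2 + i1*2 > 10 AND i3 == 0.
    static boolean satisfiesPathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    // Returns the first solution found in [lo, hi]^3, or null if there is none.
    static int[] solve(int lo, int hi) {
        for (int i1 = lo; i1 <= hi; i1++) {
            for (int i2 = lo; i2 <= hi; i2++) {
                for (int i3 = lo; i3 <= hi; i3++) {
                    if (satisfiesPathCondition(i1, i2, i3)) {
                        return new int[] {i1, i2, i3};
                    }
                }
            }
        }
        return null;
    }
}
```

Any assignment that passes satisfiesPathCondition, including the original 5 3 0, drives foo down the same path.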
    • OUR IMPROVEMENTS
        • Increase the number of possible choices for I’
        • Choose I’ such that it is as different as possible from I
    • PATH CONDITION RELAXATION
        • Input domain
        • Sensitive input (I) that causes F
    • PATH CONDITION RELAXATION
        1. Array inequalities   2. Switch statements   3. Multi-clause conditionals   4. Array reads
        Array inequalities: x.equals(y); with inputs abc and abd
        Traditional: x0 == y0 ∧ x1 == y1 ∧ x2 != y2
        Relaxed: x0 != y0 ∨ x1 != y1 ∨ x2 != y2
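To see how much the relaxed constraint widens the choice of anonymized inputs, the sketch below counts the 3-character strings over a tiny alphabet that satisfy each form. It assumes, purely for illustration, that y is fixed to "abd" and only x is an input; the class ArrayInequalitySketch, the alphabet, and the method names are hypothetical.

```java
import java.util.function.BiPredicate;

// Hypothetical sketch comparing the traditional and relaxed constraints for a
// failed x.equals(y) on 3-character strings.
class ArrayInequalitySketch {
    static final char[] ALPHABET = {'a', 'b', 'c', 'd'};

    // Traditional: mirrors the executed comparison exactly (match, match, mismatch).
    static boolean traditional(String x, String y) {
        return x.charAt(0) == y.charAt(0)
            && x.charAt(1) == y.charAt(1)
            && x.charAt(2) != y.charAt(2);
    }

    // Relaxed: any position may differ, which is all that "not equal" requires.
    static boolean relaxed(String x, String y) {
        return x.charAt(0) != y.charAt(0)
            || x.charAt(1) != y.charAt(1)
            || x.charAt(2) != y.charAt(2);
    }

    // Counts the candidate values of x (over ALPHABET^3) accepted by the constraint.
    static int count(BiPredicate<String, String> constraint, String y) {
        int n = 0;
        for (char c0 : ALPHABET) {
            for (char c1 : ALPHABET) {
                for (char c2 : ALPHABET) {
                    String x = "" + c0 + c1 + c2;
                    if (constraint.test(x, y)) {
                        n++;
                    }
                }
            }
        }
        return n;
    }
}
```

With this alphabet, count(ArrayInequalitySketch::traditional, "abd") is 3 while count(ArrayInequalitySketch::relaxed, "abd") is 63, out of 64 candidates.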
    • PATH CONDITION RELAXATION
        2. Switch statements
            switch (x) {
                case 1:
                    ...
                    break;
                case 3:
                case 5:
                    ...
                    break;
                default:
                    ...
            }
        x = 5: Traditional: x == 5; Relaxed: x == 5 ∨ x == 3
        x = 10: Traditional: x == 10; Relaxed: x != 1 ∧ x != 3 ∧ x != 5
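The two switch examples can be expressed as predicates over candidate values of x. This is a minimal sketch for the specific switch on the slide; SwitchRelaxationSketch and its methods are hypothetical names, and the case labels are hard-coded rather than extracted from the program.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: constraints on x for the slide's switch statement
// (case 1 / cases 3 and 5 share a body / default).
class SwitchRelaxationSketch {
    // Traditional: the anonymized value must equal the observed value.
    static IntPredicate traditional(int observed) {
        return v -> v == observed;
    }

    // Relaxed: any value that reaches the same switch body is acceptable.
    static IntPredicate relaxed(int observed) {
        switch (observed) {
            case 1:
                return v -> v == 1;
            case 3:
            case 5:
                return v -> v == 3 || v == 5;
            default:
                return v -> v != 1 && v != 3 && v != 5;
        }
    }
}
```

For an observed value of 10, relaxed(10) accepts every integer except 1, 3, and 5, whereas traditional(10) accepts only 10.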
    • OUR IMPROVEMENTS
        • Increase the number of possible choices for I’
        • Choose I’ such that it is as different as possible from I
    • BREAKABLE INPUT CONDITIONS
        Original input to foo(int x, int y, int z): 5 3 0
        Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
        Constraint Solver output from the path condition alone: i1 == 5, i2 == 3, i3 == 0 (identical to the original input)
        Breakable Input Condition: i1 != 5 ∧ i2 != 3 ∧ i3 != 0
        Constraint Solver output with the breakable input condition: i1 == 4, i2 == 10, i3 == 0 (i3 != 0 is broken because the path condition requires i3 == 0)
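One way to picture breakable input conditions is as soft constraints: the path condition must hold, and among its solutions the solver prefers the one that differs from the original input in as many positions as possible. The brute-force sketch below follows that reading; BreakableConditionSketch, its methods, and the bounded search domain are assumptions, not the prototype's actual mechanism.

```java
// Hypothetical sketch: treat "i_k != original_k" as soft constraints and pick
// the path-condition solution that satisfies as many of them as possible.
class BreakableConditionSketch {
    // Hard constraint (path condition from the slides).
    static boolean satisfiesPathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    // Number of breakable conditions satisfied, i.e. positions that differ from the original.
    static int satisfiedBreakable(int[] candidate, int[] original) {
        int n = 0;
        for (int k = 0; k < original.length; k++) {
            if (candidate[k] != original[k]) {
                n++;
            }
        }
        return n;
    }

    // Searches [lo, hi]^3 and keeps the first solution with the highest score.
    static int[] anonymize(int[] original, int lo, int hi) {
        int[] best = null;
        int bestScore = -1;
        for (int i1 = lo; i1 <= hi; i1++) {
            for (int i2 = lo; i2 <= hi; i2++) {
                for (int i3 = lo; i3 <= hi; i3++) {
                    if (!satisfiesPathCondition(i1, i2, i3)) {
                        continue;
                    }
                    int[] candidate = {i1, i2, i3};
                    int score = satisfiedBreakable(candidate, original);
                    if (score > bestScore) {
                        best = candidate;
                        bestScore = score;
                    }
                }
            }
        }
        return best;
    }
}
```

For the original input {5, 3, 0}, the best achievable score is 2: the third value has to stay 0, so only i1 != 5 and i2 != 3 can be honored, as in the slide's answer 4 10 0.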
    • ASSUMPTIONS
        1. The failure f is observable and can be detected with an assertion.
            ‣ common to all debugging techniques; holds in most, if not all, cases.
        2. Any input that satisfies the path condition results in f.
            • Non-determinism
                ‣ common to all debugging techniques; requires a deterministic replay mechanism
            • Implicit checks (e.g., division by zero)
                ‣ likely that they do not involve relevant inputs
                ‣ make implicit checks explicit (e.g., 100/x → assert x != 0)
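The last point, making implicit checks explicit, can be illustrated with the slide's 100/x example. In the sketch below, the division-by-zero check that the runtime would otherwise perform silently is written as an ordinary branch, so it contributes a constraint like any other conditional. The class ImplicitCheckSketch, the scale method, and the recorded constraint strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the implicit division-by-zero check in 100 / x is made
// explicit so that it shows up as a constraint on the symbolic input i1.
class ImplicitCheckSketch {
    static final List<String> pathCondition = new ArrayList<>();

    static int scale(int x) {
        // Explicit form of the check the runtime performs implicitly.
        if (x == 0) {
            pathCondition.add("i1 == 0");
            throw new ArithmeticException("/ by zero");
        }
        pathCondition.add("i1 != 0");
        return 100 / x;
    }
}
```

Without the explicit branch, a path condition recorded for scale(4) would say nothing about x, and an anonymized input with x == 0 could fail in a different way than the original.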
    • EVALUATION
        1. Feasibility: Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce a failure?
        2. Strength: How much information about the original inputs is revealed?
        3. Effectiveness: Are the anonymized inputs safe to send to developers?
        4. Improvement: Does the use of path condition relaxation and breakable input conditions provide any benefits over the basic approach?
    • PROTOTYPE IMPLEMENTATION
        • Java Pathfinder: Path Condition (i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0) and Breakable Input Condition (i1 != 5 ∧ i2 != 3 ∧ i3 != 0)
        • Yices: Constraint Solver (i1 == 4, i2 == 10, i3 == 0)
        • Ruby scripts: executable inputs
    • SUBJECTS
        • Columba: 1 fault
        • htmlparser: 1 fault
        • Printtokens: 2 faults
        • NanoXML: 16 faults
        (20 faults, total)
        Select sensitive failure-inducing inputs
        • 170 total inputs
        • manually generated or included with subject
        • 100 bytes to 5MB in size
        (Assume all of each input is potentially sensitive)
    • RQ1: FEASIBILITY
        [Chart: average execution time (s) and average solver time (s) for columba, htmlparser, printtokens1-2, and nanoxml1-16]
        Inputs can be anonymized in a reasonable amount of time (easily done overnight)
    • RQ2: STRENGTH
        • Average % Bits Revealed: measures how many inputs satisfy the path condition (many satisfying inputs: little information revealed; few satisfying inputs: lots of information revealed)
        • Average % Residue: measures how much of the anonymized input is identical to the original input
            Example (I and I’): AAAAAAsecretAAAAAA...AAAAAA and BBBBBBsecretBBBBBB...BBBBBB (lots of information revealed)
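A simple reading of the residue metric is the percentage of positions at which the anonymized input still carries the original character. The sketch below computes that for two equal-length strings; the position-wise comparison, the class ResidueSketch, and the method percentResidue are assumptions, and the definition used in the evaluation may differ in detail.

```java
// Hypothetical sketch of a residue-style metric: percentage of positions at
// which the anonymized input is identical to the original input.
class ResidueSketch {
    static double percentResidue(String original, String anonymized) {
        if (original.length() != anonymized.length()) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
        if (original.isEmpty()) {
            return 0.0;
        }
        int same = 0;
        for (int k = 0; k < original.length(); k++) {
            if (original.charAt(k) == anonymized.charAt(k)) {
                same++;
            }
        }
        return 100.0 * same / original.length();
    }
}
```

For the shortened pair "AAAAAAsecretAAAAAA" and "BBBBBBsecretBBBBBB", 6 of 18 positions match, so the residue is one third, and the matching positions are exactly the word "secret".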
    • RQ2: STRENGTH
        [Chart: average % bits revealed and average % residue for columba, htmlparser, printtokens1-2, and nanoxml1-16]
        Anonymized inputs reveal, on average, between 60% (worst case) and 2% (best case) of the information in the original inputs
    • RQ3: EFFECTIVENESS (NANOXML)
        Original input:
            <!DOCTYPE Foo [
               <!ELEMENT Foo (ns:Bar)>
               <!ATTLIST Foo
                   xmlns CDATA #FIXED http://nanoxml.n3.net/bar
                   a     CDATA #REQUIRED>
               <!ELEMENT ns:Bar (Blah)>
               <!ATTLIST ns:Bar
                   xmlns:ns CDATA #FIXED http://nanoxml.n3.net/bar>
               <!ELEMENT Blah EMPTY>
               <!ATTLIST Blah
                   x    CDATA #REQUIRED
                   ns:x CDATA #REQUIRED>
            ]>
            <!-- comment -->
            <Foo a=very b=secret c=stuff>vaz
               <ns:Bar>
                   <Blah x="1" ns:x="2"/>
               </ns:Bar>
            </Foo>
        Anonymized input:
            <!DOCTYPE [
               <! >
               <!ATTLIST
                    #FIXED
                     >
               <!E >
               <!ATTLIST
                    #FIXED >
               <!E >
               <!ATTLIST
                    #
                    : # >
            ]>
            <!-- -->
            < = = = >
               < : >
                   < =" " : =" "/>
               </ :
    • RQ3: EFFECTIVENESS (COLUMBA)
        Original input:
            Wayne,Bartley,Bartley,Wayne,wbartly@acp.com,,
            Ronald,Kahle,Kahle,Ron,ron.kahle@kahle.com,,
            Wilma,Lavelle,Lavelle,Wilma,,lavelle678@aol.com,
            Jesse,Hammonds,Hammonds,Jesse,,hamj34@comcast.com,
            Amy,Uhl,Uhl,Amy,uhla@corp1.com,uhla@gmail.com,
            Hazel,Miracle,Miracle,Hazel,hazel.miracle@corp2.com,,
            Roxanne,Nealy,Nealy,Roxie,,roxie.nearly@gmail.com,
            Heather,Kane,Kane,Heather,kaneh@corp2.com,,
            Rosa,Stovall,Stovall,Rosa,,sstoval@aol.com,
            Peter,Hyden,Hyden,Pete,,peteh1989@velocity.net,
            Jeffrey,Wesson,Wesson,Jeff,jwesson@corp4.com,,
            Virginia,Mendoza,Mendoza,Ginny,gmendoza@corp4.com,,
            Richard,Robledo,Robledo,Ralph,ralphrobledo@corp1.com,,
            Edward,Blanding,Blanding,Ed,,eblanding@gmail.com,
            Sean,Pulliam,Pulliam,Sean,spulliam@corp2.com,,
            Steven,Kocher,Kocher,Steve,kocher@kocher.com,,
            Tony,Whitlock,Whitlock,Tony,,tw14567@aol.com,
            Frank,Earl,Earl,Frankie,,,
            Shelly,Riojas,Riojas,Shelly,srojas@corp6.com,,
        Anonymized input:
            , , , , ,,, , , , ,,, , , ,, ,, , , ,, ,, , , , , ,, , , , ,,, , , ,, ,, , , , ,,, , , ,, ,, , , ,, ,, , , , ,,, , , , ,,, , , , ,,, , , ,, ,, , , , ,,, , , , ,,, , , ,, ,, , , ,,,
    • RQ3: EFFECTIVENESS (HTMLPARSER)
            <?xml version="1.0" encoding="UTF-8" ?>
            <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
            <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
            <head>
            <title>james clause @ gatech | home</title>
            <style type="text/css" media="screen" title="">
            <!--/*--><![CDATA[<!--*/
            body {margin: 0px;
            ...
            /*]]>*/-->
            </style>
            </head>
            <body>
            ...
            </body>
        The portions of the inputs that remain after anonymization tend to be structural in nature and therefore are safe to send to developers
    • RQ4: IMPROVEMENT
        [Chart: % improvement in bits revealed and % improvement in residue for columba, htmlparser, printtokens1-2, and nanoxml1-16]
        Inputs anonymized using our improvements reveal an average of 30% fewer bits of information and 40% less residue (with only a marginal increase in time).
    • RELATED WORK
        • Castro and colleagues ’08
        • Broadwell and colleagues ’03
            • Uses dynamic taint analysis to identify and remove sensitive information in crash dumps.
        • Wang and colleagues ’08
            • Uses a combination of dynamic taint analysis, symbolic execution, and Q/A with a client machine to construct anonymized inputs
        • Data set anonymization techniques (e.g., k-anonymization)
            • Budi and colleagues ’11
            • Grechanik and colleagues ’11
        • Dynamic symbolic execution techniques
    • FUTURE WORK
        • Additional quality metrics that:
            • consider additional aspects of privacy loss
            • consider the relative sensitivity of different inputs
            • are intuitive and easy to use
        • Conduct additional (human) studies
            • additional (larger) subjects
        • Investigate the combination of anonymization and minimization
    • SUMMARY
        1. An approach for automatically anonymizing failure-inducing inputs
            • extends Castro and colleagues’ technique through the novel concepts of path condition relaxation and breakable input conditions
        2. An empirical evaluation that demonstrates, for the subjects considered, our approach is:
            • feasible: generates anonymized inputs in < 10 minutes
            • effective: anonymized inputs did not contain sensitive information
            • an improvement over the state-of-the-art
    • QUESTIONS?