Dytan: A Generic Dynamic Taint Analysis Framework (ISSTA 2007)
Usage Rights: CC Attribution-NonCommercial-NoDerivs License


Dytan: A Generic Dynamic Taint Analysis Framework (ISSTA 2007) Presentation Transcript

  • 1. Dytan: A Generic Dynamic Taint Analysis Framework. James Clause, Wanchun (Paul) Li, and Alessandro Orso. College of Computing, Georgia Institute of Technology. Partially supported by: NSF awards CCF-0541080 and CCR-0205422 to Georgia Tech, DHS and US Air Force Contract No. FA8750-05-2-0214.
  • 2-5. Dynamic taint analysis (aka dynamic information-flow analysis). [Animated figure: data items labeled A, B, C, and Z, with taint tags 1, 2, and 3.]
  • 6-11. Dynamic tainting applications
    • Information policy enforcement: ensure classified information does not leak outside the system (e.g., Vachharajani et al. 04, McCamant and Ernst 06)
    • Attack detection / prevention: detect / prevent attacks such as SQL injection, buffer overruns, stack smashing, and cross-site scripting (e.g., Suh et al. 04, Newsome and Song 05, Halfond et al. 06, Kong et al. 06, Qin et al. 06)
    • Testing: coverage metrics, test data generation heuristics, ... (e.g., Masri et al. 05, Leek et al. 07)
    • Data lifetime / scope: track how long sensitive data, such as passwords or account numbers, remain in the application (e.g., Chow et al. 04)
  • 12-14. Motivation. [Diagram: several separate boxes, each pairing an "Ad-hoc taint analysis implementation" with its own "Results".]
  • 15-18. Motivation
    [Diagram: Configuration, Dytan Generic Framework, Custom Dynamic Taint Analysis, Results]
    • Flexible
    • Easy to use
    • Accurate
  • 19. Outline
    ✓ Motivation & overview
    • Framework (Dytan)
      • flexibility
      • ease of use
      • accuracy
    • Empirical evaluation
    • Conclusions
  • 20-25. Framework: flexibility. The configuration has three parts:
    • Taint sources: which data to tag, and how to tag it
    • Propagation policy: how tags should be propagated at runtime
    • Taint sinks: where and how tags should be checked
  • 26-31. Taint sources
    What to tag: identify what program data should be assigned tags
      • Variables (local or global)
      • Function parameters
      • Function return values
      • Data from an input stream (network, filesystem, keyboard, ...)
      • Specific input stream (141.195.121.134:80, a.txt, ...)
    How to tag: describe how tags should be assigned for identified data
      • Single tag
      • One tag per source
      • Multiple tags per source
      • ...
  • 32-36. Taint sources (example). What to tag: a.txt. How to tag: single tag. [Figure: all data read from a.txt carries the same tag: 1 1 1 1 1 1]
  • 37-38. Taint sources (example). What to tag: a.txt. How to tag: multiple tags. [Figure: each piece of data read from a.txt carries its own tag: 1 2 3 4 5 ... n]
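The two tagging schemes in the a.txt example can be sketched in a few lines of Python. This is only an illustration, not Dytan's implementation: the function names `tag_single` and `tag_per_byte` are hypothetical, and tagging at byte granularity is an assumption made for the sketch.

```python
from typing import FrozenSet, List

TagSet = FrozenSet[int]


def tag_single(data: bytes, tag: int = 1) -> List[TagSet]:
    """Single tag: every byte from the source gets the same tag."""
    return [frozenset({tag}) for _ in data]


def tag_per_byte(data: bytes, first_tag: int = 1) -> List[TagSet]:
    """Multiple tags: byte i gets its own tag, first_tag + i."""
    return [frozenset({first_tag + i}) for i in range(len(data))]


# Hypothetical stand-in for the contents of a.txt.
contents = b"secret"

single = tag_single(contents)      # one shared tag for all six bytes
multiple = tag_per_byte(contents)  # tags 1..6, one per byte
```

A single tag is enough to answer "did this value come from a.txt?", while per-byte tags also identify which part of the file a value came from, at the cost of tracking more tags.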
  • 39-48. Propagation policy. [Figure: data items A, B, and C with tags 1, 2, and 3]
    Affecting data: data that affects the outcome of a statement through
      • Data dependencies
      • Control dependencies
    A policy can consider both or only data dependencies.
    Mapping function: define how tags associated with affecting data should be combined
      • Union
      • Max
      • ...
  • 49-60. Propagation policy (example). Code: if(X) { C = A + B; } where A has tag 1, B has tag 2, and X has tag 3.
    • Affecting data: data dependence ✔. Mapping function: union ✔. C is assigned tags 1 and 2.
    • Affecting data: data dependence ✔ and control dependence ✔. Mapping function: max ✔. C is assigned tag 3.
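The two policy configurations in this example can be reproduced with a small Python sketch. It assumes A carries tag 1, B carries tag 2, and X carries tag 3, as the slide figure suggests; `propagate`, `union`, and `max_tag` are hypothetical names chosen for illustration and are not part of Dytan.

```python
from typing import Callable, FrozenSet, Iterable, List

TagSet = FrozenSet[int]


def union(tag_sets: Iterable[TagSet]) -> TagSet:
    """Mapping function: combine all tags of the affecting data."""
    result = set()
    for tags in tag_sets:
        result |= tags
    return frozenset(result)


def max_tag(tag_sets: Iterable[TagSet]) -> TagSet:
    """Mapping function: keep only the largest tag, if any."""
    combined = union(tag_sets)
    return frozenset({max(combined)}) if combined else frozenset()


def propagate(data_deps: List[TagSet], control_deps: List[TagSet],
              use_control: bool,
              mapping: Callable[[Iterable[TagSet]], TagSet]) -> TagSet:
    """Select the affecting data, then apply the mapping function."""
    affecting = list(data_deps)
    if use_control:
        affecting.extend(control_deps)
    return mapping(affecting)


# if (X) { C = A + B; } with assumed tags A=1, B=2, X=3.
A, B, X = frozenset({1}), frozenset({2}), frozenset({3})

c_data_union = propagate([A, B], [X], use_control=False, mapping=union)
c_both_max = propagate([A, B], [X], use_control=True, mapping=max_tag)
```

Keeping the choice of affecting data separate from the mapping function mirrors the two columns on the slide: either one can be swapped without touching the other.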
  • 61-67. Taint sinks
    Where to check: location in the program to perform a check
      • Function entry / exit
      • Statement type
      • Specific program point
    What to check: the data whose tags should be checked
      • Variables
      • Function parameters
      • Function return value
    How to check: set of conditions to check and a set of actions to perform if the conditions are not met
      • Validate presence of tags (exit or log)
      • Ensure absence of tags (exit or log)
      • ...
  • 68-73. Taint sinks (example)
    cmd = read(file);          [tag 2]
    args = read(socket);       [tag 3]
    cmd = trim(cmd + args);
    ...
    tok[] = parse(cmd);
    exec(tok[0], tok[1]);
    Where / what to check: function: exec, param: 0
    How to check: validate presence of: 2; validate absence of: 3
    Result: ✘ (param 0 carries tags 2 and 3)
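The sink check in this example can be sketched as follows. The sketch assumes that data read from the file carries tag 2, data read from the socket carries tag 3, and the sink requires tag 2 and forbids tag 3 on parameter 0 of exec; that reading of the slide figure is an interpretation. `check_sink` is a hypothetical helper, not Dytan's API.

```python
from typing import FrozenSet

TagSet = FrozenSet[int]


def check_sink(tags: TagSet, must_have: TagSet, must_not_have: TagSet) -> bool:
    """Return True when all required tags are present and no forbidden tag is."""
    return must_have <= tags and not (must_not_have & tags)


file_tags = frozenset({2})     # cmd = read(file)
socket_tags = frozenset({3})   # args = read(socket)

# cmd = trim(cmd + args): under a union policy, cmd now carries both tags.
cmd_tags = file_tags | socket_tags

# tok[] = parse(cmd): assume, conservatively, that tok[0] keeps all of cmd's tags.
tok0_tags = cmd_tags

# Sink: function exec, param 0; presence of tag 2, absence of tag 3.
passed = check_sink(tok0_tags, must_have=frozenset({2}),
                    must_not_have=frozenset({3}))
```

Here `passed` is False because socket data reached the command name, which corresponds to the ✘ on the slide. A real sink would then perform the configured action (exit or log).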
  • 74-76. Framework: ease of use. Provide two ways to configure the framework:
    • Basic
      • Select sources, propagation policies, and sinks from a set of predefined options
      • XML-based configuration
    • Advanced
      • Suitable for more esoteric applications
      • Extend OO implementation
  • 77. Framework: accuracy
    • Dytan operates at the binary level
      • considers the actual program semantics
      • transparently handles libraries
    • Dytan accounts for both data- and control-flow dependencies
  • 78-86. Framework: accuracy. The most common source of inaccuracy is incorrectly identifying the information produced and consumed by a statement. Two common examples:
    • Implicit operands
      add %eax, %ebx // A = A + B
      produced: %eax, %eflags
    • Address generators
      add %eax, [%ebx] // A = A + *B
      consumed: %eax, [%ebx], %ebx
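The two examples can be made concrete by writing out the produced and consumed sets for each form of the instruction. The sketch below follows the slide's operand order (destination first) and models only this one instruction; `Effects`, `add_reg_reg`, and `add_reg_mem` are hypothetical names, and the memory-operand form is an assumed reading of the address generator example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Effects:
    consumed: FrozenSet[str]  # locations whose tags flow into the result
    produced: FrozenSet[str]  # locations that receive the combined tags


def add_reg_reg(dst: str, src: str) -> Effects:
    """add dst, src: dst = dst + src. The flags register is an implicit output."""
    return Effects(consumed=frozenset({dst, src}),
                   produced=frozenset({dst, "eflags"}))


def add_reg_mem(dst: str, addr_reg: str) -> Effects:
    """add dst, [addr_reg]: dst = dst + memory at addr_reg.

    The address register is consumed too, because it selects which
    memory cell is read (the address generator).
    """
    memory_cell = f"[{addr_reg}]"
    return Effects(consumed=frozenset({dst, memory_cell, addr_reg}),
                   produced=frozenset({dst, "eflags"}))


implicit = add_reg_reg("eax", "ebx")
address = add_reg_mem("eax", "ebx")
```

A policy that omits `eflags` from the produced set, or the address register from the consumed set, corresponds to the "no IM" and "no AG" variants compared later in the evaluation.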
  • 87. Outline
    ✓ Motivation & overview
    ✓ Framework
      ✓ flexibility
      ✓ ease of use
      ✓ accuracy
    • Empirical evaluation
    • Conclusions
  • 88. Empirical evaluation
    • RQ1: Can Dytan be used to (easily) implement existing dynamic taint analyses?
    • RQ2: How do inaccurate propagation policies affect the analysis results?
    • In addition: discussion on performance
  • 89. RQ1: flexibility. Goal: show that Dytan can be used to (easily) implement existing dynamic taint analyses
    • Selected two techniques:
      • Overwrite attack detection [Qin et al. 04]
      • SQL injection detection [Halfond et al. 06]
    • Used Dytan to re-implement both techniques
      • Measure implementation time
      • Validate against the original implementation
  • 90. RQ1: results
    • Implementation time:
      • Overwrite attack detection: < 1 hour
      • SQL injection detection: < 1 day
    • Comparison with original implementations:
      • Successfully stopped the same attacks as the original implementations
  • 91-94. RQ2: accuracy impact. Goal: measure the effect of inaccurate propagation policies on analysis results
    • Selected two subjects:
      • Gzip (75kb w/o libraries)
      • Firefox (850kb w/o libraries)
    • Use Dytan to taint program inputs and measure the amount of heap data tainted at program exit
    • Compare Dytan against inaccurate policies
      • no implicit operands (no IM)
      • no address generators (no AG)
      • no implicit operands, no address generators (no IM, no AG)
  • 95. RQ2: results. [Bar chart: vertical axis from 0% to 100%; groups: Firefox (1 page), Firefox (3 pages), Gzip; series: Dytan, No IM, No AG, No IM and no AG]
  • 96-99. Performance
    • Measured for gzip:
      • ≈30x for data flow
      • ≈50x for data and control flow
    • High overhead, but...
      • In line with existing implementations
      • Designed for experimentation
      • Favors flexibility over performance
      • Implementation can be further optimized
  • 100. Related work
    • Existing dynamic tainting approaches [Suh et al. 04, Newsome and Song 05, Halfond et al. 06, Kong et al. 06, ...]
      • Ad-hoc
    • Other dynamic taint analysis frameworks [Xu et al. 06 and Lam and Chiueh 06]
      • Focused on security applications
      • Single taint mark
      • No control-flow propagation
      • Operate at the source code level
  • 101. Conclusions
    • Dytan
      • a general framework for dynamic tainting
      • allows for instantiating and experimenting with different dynamic taint analysis approaches
    • Initial evaluation
      • flexible
      • easy to use
      • accurate
  • 102. Future directions
    • Tool release (documentation, code cleanup): http://www.cc.gatech.edu/~clause/dytan/ (pre-release on request)
    • Optimization (general and specific)
    • Applications
      • Memory protection
      • Debugging
  • 103. Questions? http://www.cc.gatech.edu/~clause/dytan/