Static and Dynamic Code Analysis (presentation)

  1. 1. Improving Software Reliability via Static and Dynamic Analysis. Tao Xie, Automated Software Engineering Group, Department of Computer Science, North Carolina State University, http://ase.csc.ncsu.edu/
  2. 2. Group Overview <ul><li>Inputs: </li></ul><ul><li>Current funding support </li></ul><ul><ul><li>NSF CyberTrust (3 yrs), NSF SoD (3 yrs), ARO (3 yrs), NIST supplement, IBM Faculty Award, Microsoft Research, ABB Research </li></ul></ul><ul><li>Collaboration with agencies and industry </li></ul><ul><ul><li>NIST, NASA, DOE Lab, Army division, Microsoft Research, IBM Rational, ABB Research </li></ul></ul><ul><li>Current student team </li></ul><ul><ul><li>6 Ph.D. students, 1 M.S. student, 5 probation-staged grad students </li></ul></ul>
  3. 3. Group Overview cont. <ul><li>Outputs: </li></ul><ul><li>Research around two major themes: </li></ul><ul><ul><li>Automated Software Testing; Mining Software Engineering Data </li></ul></ul><ul><li>Industry impact </li></ul><ul><ul><li>We found that 90% of the tests generated by Parasoft Jtest 4.5 were redundant [ASE 04] </li></ul></ul><ul><ul><li>Agitar AgitarOne used a technique similar to our Jov [ASE 03] </li></ul></ul><ul><ul><li>MSR and NASA adopted the Symstra technique [TACAS 05] </li></ul></ul><ul><ul><li>MSR Pex adopted our recent techniques </li></ul></ul><ul><li>Research publications </li></ul><ul><ul><li>2008: TOSEM, ICSE, 3*ASE, SIGMETRICS, ISSRE, ICSM, SRDS, ACSAC, … </li></ul></ul><ul><ul><li>2007: ICSE, FSE, 4*ASE, WWW, ICSM, … </li></ul></ul><ul><ul><li>… </li></ul></ul>
  4. 4. Major Research Collaboration Areas <ul><li>Mining textual SE data </li></ul><ul><li>Mining program code data </li></ul><ul><li>Automated testing </li></ul>
  5. 5. Mining Textual SE data <ul><li>Bug reports [ICSE 08] </li></ul><ul><ul><li>Detecting duplicate bug reports </li></ul></ul><ul><ul><li>Classifying bug reports </li></ul></ul><ul><li>API documentation </li></ul><ul><li>Project documentation </li></ul>
  6. 6. Two duplicate bug reports in Firefox - using only natural language information may fail <ul><li>Bug-260331: After closing Firefox, the process is still running. Cannot reopen Firefox after that, unless the previous process is killed manually </li></ul><ul><li>Bug-239223: (Ghostproc) – [Meta] firefox.exe doesn't always exit after closing all windows; session-specific data retained </li></ul>
  7. 7. Two non-duplicate bug reports in Firefox - using only execution information may fail <ul><li>Bug-244372: "Document contains no data" message on continuation page of NY Times article </li></ul><ul><li>Bug-219232: random "The Document contains no data." Alerts </li></ul><ul><li>Proposed solution [ICSE 08]: mining both textual information of bug reports and execution information of their failing tests </li></ul>
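The idea of combining textual and execution information can be sketched as follows. This is a minimal illustration, not the [ICSE 08] technique itself: it assumes a plain Jaccard similarity on summary words and on sets of executed methods, mixed with a hypothetical `text_weight`. The summaries are shortened from the two slides above; the method names in the traces are invented for illustration.

```python
import re

def tokens(text):
    """Lowercase word tokens of a bug-report summary."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_similarity(report_a, report_b, text_weight=0.5):
    """Weighted mix of textual similarity and execution-trace similarity."""
    text_sim = jaccard(tokens(report_a["summary"]), tokens(report_b["summary"]))
    exec_sim = jaccard(set(report_a["trace"]), set(report_b["trace"]))
    return text_weight * text_sim + (1 - text_weight) * exec_sim

# Traces are hypothetical: they stand in for methods executed by failing tests.
ghost_a = {"summary": "After closing Firefox, the process is still running",
           "trace": ["Window.close", "App.shutdown", "Session.save"]}
ghost_b = {"summary": "firefox.exe doesn't always exit after closing all windows",
           "trace": ["Window.close", "App.shutdown", "Session.save", "Profile.lock"]}
no_data = {"summary": "Document contains no data message on continuation page",
           "trace": ["Http.fetch", "Cache.lookup", "Dialog.alert"]}

score_duplicates = combined_similarity(ghost_a, ghost_b)
score_unrelated = combined_similarity(ghost_a, no_data)
```

In this toy data the two duplicate reports share few words but most of their trace, so the execution term lifts their combined score well above that of the unrelated pair.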
  8. 8. Classification of Bug Reports <ul><li>Bugs related to security issues </li></ul><ul><li>Bugs related to design problems </li></ul><ul><li>Bugs related to insufficient unit testing </li></ul><ul><li>… </li></ul><ul><li>Manually label a subset of bug reports with their categories </li></ul><ul><li>Apply classification algorithms on unlabeled bug reports to predict their categories </li></ul><ul><li>Benefit: reduce manual labeling efforts </li></ul>
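The label-a-subset-then-predict workflow can be sketched with a small multinomial naive Bayes classifier. The choice of naive Bayes is an assumption made for illustration (the slide does not name an algorithm), and the labeled reports below are invented examples.

```python
import math
import re
from collections import Counter, defaultdict

def words(text):
    """Lowercase alphabetic tokens of a report."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_reports):
    """Count words and documents per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_reports:
        word_counts[category].update(words(text))
        doc_counts[category] += 1
    return word_counts, doc_counts

def predict(model, text):
    """Return the category with the highest Laplace-smoothed log score."""
    word_counts, doc_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category, counts in word_counts.items():
        score = math.log(doc_counts[category] / total_docs)
        denominator = sum(counts.values()) + len(vocabulary)
        for w in words(text):
            score += math.log((counts[w] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Manually labeled subset (hypothetical reports).
labeled = [
    ("buffer overflow allows remote code execution", "security"),
    ("password stored in plain text in config file", "security"),
    ("module coupling makes plugin interface hard to extend", "design"),
    ("class hierarchy duplicates logic across layers", "design"),
]
model = train(labeled)
predicted = predict(model, "remote attacker can read stored password")
```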
  9. 9. Example API Docs <ul><li>javax.resource.cci.Connection </li></ul><ul><li>createInteraction(): “Creates an interaction associated with this connection” → action-resource pair: create-connection </li></ul><ul><li>getMetaData(): “Gets the information on the underlying EIS instance represented through an active connection” → action-resource pair: get-connection </li></ul><ul><li>close(): “Initiates close of the connection handle at the application level” → action-resource pair: close-connection </li></ul>
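A minimal keyword-based sketch of extracting action-resource pairs from these method descriptions. The verb table `ACTION_WORDS` and the rule "resource = class simple name, if the description mentions it" are assumptions made for this example; a real approach would need proper natural-language processing.

```python
import re

# Hypothetical verb table: surface form in the doc text -> canonical action.
ACTION_WORDS = {"creates": "create", "gets": "get", "close": "close"}

def action_resource_pair(class_name, description):
    """Return 'action-resource' if the description mentions both, else None."""
    resource = class_name.rsplit(".", 1)[-1].lower()
    doc_words = re.findall(r"[a-z]+", description.lower())
    if resource not in doc_words:
        return None
    for word in doc_words:
        if word in ACTION_WORDS:
            return f"{ACTION_WORDS[word]}-{resource}"
    return None

docs = {
    "createInteraction": "Creates an interaction associated with this connection",
    "getMetaData": "Gets the information on the underlying EIS instance "
                   "represented through an active connection",
    "close": "Initiates close of the connection handle at the application level",
}
pairs = {method: action_resource_pair("javax.resource.cci.Connection", text)
         for method, text in docs.items()}
```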
  10. 10. Mining Properties from API Docs
  11. 11. Potential Collaboration Ideas on Text Mining <ul><li>Documents submitted by device manufacturers are written in natural language and are too numerous or too long for manual inspection </li></ul><ul><li>Classification problem </li></ul><ul><ul><li>Train learning tools with some labeled documents </li></ul></ul><ul><li>Clustering problem </li></ul><ul><ul><li>Without labeling, group documents based on similarity </li></ul></ul><ul><li>Selection problem </li></ul><ul><ul><li>Similar to duplicate bug report detection </li></ul></ul>
  12. 12. Potential Collaboration Ideas on Text Mining – Possible Examples <ul><li>Extract safety-related requirements from documents → manually extract some and then tools recommend some more based on manually extracted ones </li></ul><ul><li>Classify incident reports (e.g., with ontology) → manually classify some and then tools recommend categories for the rest </li></ul><ul><li>Detect correlations among incident reports → similar to duplicate bug report detection </li></ul><ul><li>Other pre-market textual documents </li></ul><ul><li>Other post-market textual documents </li></ul><ul><li>… </li></ul>
  13. 13. Major Research Collaboration Areas <ul><li>Mining textual SE data </li></ul><ul><li>Mining program code data </li></ul><ul><li>Automated testing </li></ul>
  14. 14. Motivation
  15. 15. Problem <ul><li>Software system verification: given properties, verification tools can be used to detect whether the system violates the properties </li></ul><ul><ul><li>Example: malloc return check </li></ul></ul><ul><li>However, these properties often do not exist </li></ul><ul><ul><li>Who writes these properties? </li></ul></ul><ul><ul><li>How often are these properties written? </li></ul></ul><ul><ul><li>How often are these properties known? </li></ul></ul><ul><li>Objective: Mine API properties for static verification from the API client code in existing system code bases </li></ul>
  16. 16. Artifacts in Code Mining <ul><li>Data : usage info from various code locations of using APIs such as malloc, seteuid, and execl </li></ul><ul><li>Patterns : sequencing constraints among collected API invocation sequences and condition checks </li></ul><ul><li>Anomalies : violations of these patterns as potential defects </li></ul>
  17. 17. Approach Overview <ul><li>1. Extract: extract external APIs from the input system (which contains internal APIs and external APIs) </li></ul><ul><li>2. Trace/Search (MOPS): for each external API, trace/search source files that use it in existing system code bases (1, 2, …, N) </li></ul><ul><li>3. Analyze: analyze collected traces/files to extract usage info around APIs, such as <cond, API 1> </li></ul><ul><li>4. Mine: mine frequent usage patterns around APIs as API properties </li></ul><ul><li>5. Verify: verify the input system against these properties to detect bugs (detected violations are reported as bugs) </li></ul>
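The mine and verify steps can be sketched as below, assuming the earlier steps have already reduced each API call site to a set of observed facts. The tuple layout, the fact strings, the `min_support` threshold, and all file locations except `main.c:71` (which appears in the deck's fopen example) are hypothetical; the sketch uses simple frequency counting rather than whatever mining algorithm the actual tools use.

```python
from collections import Counter

def mine_properties(usages, min_support=0.8):
    """Return (api, fact) pairs that hold at >= min_support of that API's call sites."""
    sites_per_api = Counter()
    fact_counts = Counter()
    for api, _location, facts in usages:
        sites_per_api[api] += 1
        for fact in set(facts):
            fact_counts[(api, fact)] += 1
    return {(api, fact) for (api, fact), n in fact_counts.items()
            if n / sites_per_api[api] >= min_support}

def find_violations(usages, properties):
    """Report (location, missing fact) for call sites lacking a mined property."""
    return sorted((location, fact)
                  for api, location, facts in usages
                  for (prop_api, fact) in properties
                  if prop_api == api and fact not in facts)

# Usage info around fopen collected from several code bases (illustrative).
usages = [
    ("fopen", "a.c:10", {"check(ret != NULL)"}),
    ("fopen", "b.c:22", {"check(ret != NULL)"}),
    ("fopen", "c.c:31", {"check(ret != NULL)", "log"}),
    ("fopen", "d.c:48", {"check(ret != NULL)"}),
    ("fopen", "main.c:71", set()),
]
properties = mine_properties(usages)
violations = find_violations(usages, properties)
```

Here the return-value check is frequent enough to become a property, the one-off `log` fact is not, and the unchecked call site is flagged as a potential neglected-condition defect.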
  18. 18. Example Target Defect Types <ul><li>Neglected-condition defects </li></ul><ul><li>Error-handling defects </li></ul><ul><li>Exception-handling defects </li></ul><ul><li>These defect types can result in </li></ul><ul><ul><li>Critical security, robustness, reliability issues </li></ul></ul><ul><ul><li>Performance degradation </li></ul></ul><ul><ul><ul><li>Example: Failure to release a resource may decrease the performance </li></ul></ul></ul>
  19. 19. Mined Neglected Condition <ul><li>From GRASS open source GIS project </li></ul>Developer confirmed: “I believe this issue has uncovered a bug: the pointer returned by the fopen() call isn't checked at all. The code responsible for this particular issue is surprisingly short, to make it a good example on how not to write the code” $ nl -ba main.c ... 71 fp = fopen("dumpfile", "w"); 72 BM_file_write(fp, map); 73 fclose(fp); ... $
  20. 20. Mined Patterns of Error Handling <ul><li>From Red Hat 9.0 routed-0.17-14 </li></ul>Error-check specifications; multiple-API specifications (close() should be called after socket()). If violated, defects are detected.
  21. 21. Mined Patterns of Exception Handling: resource creation → resource manipulation → resource cleanup. If resource cleanup is missing, defects are detected.
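A multiple-API specification such as "close() should be called after socket()" can be checked against call sequences. The sketch below assumes each program path has already been flattened into a list of `(call, handle)` events, which is a simplification of what a static verifier does over all paths; the handle names and both example paths are invented.

```python
def unclosed_sockets(trace):
    """Return handles created by socket() that are never passed to close().

    `trace` is a list of (call, handle) events in execution order.
    """
    open_handles = []
    for call, handle in trace:
        if call == "socket":
            open_handles.append(handle)
        elif call == "close" and handle in open_handles:
            open_handles.remove(handle)
    return open_handles

# Normal path: the socket is closed.
good_path = [("socket", "s1"), ("bind", "s1"), ("close", "s1")]
# Hypothetical error path: early return after bind fails, without close().
error_path = [("socket", "s2"), ("bind", "s2")]
```

Error-handling defects typically show up on paths like `error_path`, where the cleanup call is skipped only when an earlier call fails.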
  22. 22. Potential Collaboration Ideas on Code Mining <ul><li>Address problems similar to ones targeted by FDA’s previous work on “Static Analysis of Medical Device Software using CodeSonar” by Jetley, Jones, and Anderson </li></ul><ul><li>Benefits of our new techniques </li></ul><ul><ul><li>Don’t require the code to be compilable (using partial program analysis) </li></ul></ul><ul><ul><li>Don’t require properties to be manually written down </li></ul></ul><ul><ul><li>Can accumulate knowledge (API usages) within or across devices or manufacturers (or even open source world) </li></ul></ul><ul><ul><li>May ask manufacturers to submit API usages (if not code itself?) </li></ul></ul>
  23. 23. Potential Collaboration Ideas on Code Mining cont. <ul><li>Our tool development status </li></ul><ul><li>Neglected condition bugs: tools for Java and C are ready; tool for C# is being developed </li></ul><ul><li>Error-handling bugs: tool for C is ready </li></ul><ul><li>Exception-handling bugs: tool for Java is ready and tool for C# is being developed </li></ul><ul><li>Working on tools for framework reuse bugs </li></ul>
  24. 24. Major Research Collaboration Areas <ul><li>Mining textual SE data </li></ul><ul><li>Mining program code data </li></ul><ul><li>Automated testing </li></ul>
  25. 25. Dynamic Symbolic Execution <ul><li>Dynamic symbolic execution combines static and dynamic analysis: </li></ul><ul><li>Execute program multiple times with different inputs </li></ul><ul><ul><li>build abstract representation of execution path on the side </li></ul></ul><ul><ul><li>plug in concrete results of operations which cannot be reasoned about symbolically </li></ul></ul><ul><li>Use constraint solver to obtain new inputs </li></ul><ul><ul><li>solve constraint system that represents an execution path not seen before </li></ul></ul>
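The execute, record, and solve loop can be illustrated with a toy example. This is only a sketch under strong assumptions: the program under test records its own branch decisions (real tools instrument the code), every branch is a byte-equality test, and the "constraint solver" is a one-line assignment rather than an SMT solver. The byte values are the ones shown in the deck's example; `program` and `explore` are hypothetical names.

```python
MAGIC = [206, 202, 239, 190]  # byte values from the deck's example inputs

def program(a, path):
    """Program under test; records (index, expected_byte, branch_taken) per branch."""
    for i, expected in enumerate(MAGIC):
        taken = a[i] == expected
        path.append((i, expected, taken))
        if not taken:
            return "ok"
    return "bug"

def explore(max_runs=10):
    """Run, record the path condition, negate its last branch, solve, repeat."""
    inputs = [0, 0, 0, 0]  # initially, choose arbitrary inputs
    history = []
    for _ in range(max_runs):
        path = []
        result = program(inputs, path)
        history.append((list(inputs), result))
        index, expected, taken = path[-1]
        if taken:
            break  # the last branch was already taken; no new path to flip here
        # "Solve" the negated last constraint a[index] == expected. The earlier
        # constraints stay satisfied because the other bytes are left unchanged.
        inputs = list(inputs)
        inputs[index] = expected
    return history

history = explore()
```

Each run uncovers one more branch, so five executions suffice to reach the deepest path, which random byte inputs would be very unlikely to hit.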
  26. 26. Whole-program, white-box code analysis. Loop: Test Inputs → Run Test and Monitor → Execution Path → Record Path Condition → Known Paths → Choose an Uncovered Path → Constraint System → Solve → Test Inputs. Initially, choose arbitrary inputs. Result: small test suite, high code coverage. Finds only real bugs; no false warnings.
  27. 27. Whole-program, white-box code analysis (same loop). Initial arbitrary test inputs: a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 0; …
  28. 28. Whole-program, white-box code analysis (same loop). Recorded path condition: … ⋀ magicNum != 0x95673948
  29. 29. Whole-program, white-box code analysis (same loop). Known path: … ⋀ magicNum != 0x95673948. Chosen uncovered path: … ⋀ magicNum == 0x95673948
  30. 30. Whole-program, white-box code analysis (same loop). Solved test inputs: a[0] = 206; a[1] = 202; a[2] = 239; a[3] = 190;
  31. 31. Whole-program, white-box code analysis (same loop). Result: small test suite, high code coverage. Finds only real bugs; no false warnings.
  32. 32. Potential Collaboration Ideas on Automated Testing <ul><li>Address problems similar to ones targeted by FDA’s previous work on “Static Analysis of Medical Device Software using CodeSonar” by Jetley, Jones, and Anderson </li></ul><ul><li>Benefits of our new techniques (also in contrast to existing testing techniques) </li></ul><ul><ul><li>No false positives. Each reported issue is a REAL one </li></ul></ul><ul><ul><li>Much more powerful than existing commercial tools (Parasoft C#Test, Parasoft Jtest, Agitar AgitarOne, …) </li></ul></ul>
  33. 33. Potential Collaboration Ideas on Automated Testing cont. <ul><li>Our tool development status </li></ul><ul><li>Most mature/powerful for C# testing (built around MSR Pex by collaborating with MSR Researchers) </li></ul><ul><li>Java testing tools based on NASA Java Pathfinder and jCUTE </li></ul><ul><li>C testing tools based on Crest and Splat </li></ul>
  34. 34. Potential Collaboration Ideas on Automated Testing cont. <ul><li>Regression test generation/differential testing: Given two versions, try to find test inputs to show different behavior </li></ul><ul><ul><li>Possible idea 1: given a buggy version and claimed fixed version submitted by manufacturers, generate test inputs to show different behaviors </li></ul></ul><ul><ul><li>Possible idea 2: change impact analysis on models or code submitted by manufacturers </li></ul></ul><ul><li>Use code mining to find targets to violate by testing </li></ul><ul><ul><li>Address false positive issues </li></ul></ul>
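Differential testing between a buggy version and a claimed fix can be sketched as follows. Both versions are invented toy functions, and the inputs come from a small exhaustive enumeration rather than from the dynamic symbolic execution the group's tools would use to generate them.

```python
import itertools

def buggy_abs_diff(x, y):
    """Hypothetical buggy version: returns a negative value when x < y."""
    return x - y

def fixed_abs_diff(x, y):
    """Hypothetical claimed fix: absolute difference of x and y."""
    return x - y if x >= y else y - x

def find_behavior_differences(old, new, candidate_inputs):
    """Return (args, old_output, new_output) for inputs where the versions disagree."""
    differences = []
    for args in candidate_inputs:
        old_out, new_out = old(*args), new(*args)
        if old_out != new_out:
            differences.append((args, old_out, new_out))
    return differences

# Enumerate all pairs over a small range as candidate test inputs.
candidates = list(itertools.product(range(-2, 3), repeat=2))
differences = find_behavior_differences(buggy_abs_diff, fixed_abs_diff, candidates)
```

Every reported input is a concrete witness of changed behavior, which a reviewer can inspect to judge whether the fix changes exactly what it claims to change.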
  35. 35. Other Research Areas <ul><li>Mining program execution to aid program understanding, debugging, … </li></ul><ul><li>Mining version histories </li></ul><ul><li>Security policy testing </li></ul><ul><li>Attack generation </li></ul><ul><li>Design testing </li></ul><ul><li>Web app/service testing </li></ul><ul><li>DB app testing </li></ul><ul><li>Performance testing </li></ul><ul><li>… </li></ul>
  36. 36. Major Research Collaboration Areas <ul><li>Mining textual SE data </li></ul><ul><li>Mining program code data </li></ul><ul><li>Automated testing </li></ul>