Static and Dynamic Code Analysis (presentation)
Transcript

  • 1. Improving Software Reliability via Static and Dynamic Analysis Tao Xie , Automated Software Engineering Group Department of Computer Science North Carolina State University http://ase.csc.ncsu.edu/
  • 2. Group Overview
    • Inputs:
    • Current funding support
      • NSF CyberTrust (3 yrs), NSF SoD (3 yrs), ARO (3 yrs), NIST supplement, IBM Faculty Award, Microsoft Research, ABB Research
    • Collaboration with agencies and industry
      • NIST, NASA, DOE Lab, Army division, Microsoft Research, IBM Rational, ABB Research
    • Current student team
      • 6 Ph.D. students, 1 M.S. student, 5 probation-staged grad students
  • 3. Group Overview cont.
    • Outputs:
    • Research around two major themes:
      • Automated Software Testing; Mining Software Engineering Data
    • Industry impact
      • We found Parasoft Jtest 4.5 generated 90% redundant tests [ASE 04]
      • Agitar AgitarOne used a similar technique as our Jov [ASE 03]
      • MSR and NASA adopted Symstra technique [TACAS 05]
      • MSR Pex adopted our recent techniques
    • Research publications
      • 2008: TOSEM, ICSE, 3*ASE, SIGMETRICS, ISSRE, ICSM, SRDS, ACSAC, …
      • 2007: ICSE, FSE, 4*ASE, WWW, ICSM, …
  • 4. Major Research Collaboration Areas
    • Mining textual SE data
    • Mining program code data
    • Automated testing
  • 5. Mining Textual SE data
    • Bug reports [ICSE 08]
      • Detecting duplicate bug reports
      • Classifying bug reports
    • API documentation
    • Project documentation
  • 6. Two duplicate bug reports in Firefox - using only natural language information may fail
    • Bug-260331: After closing Firefox, the process is still running. Cannot reopen Firefox after that, unless the previous process is killed manually
    • Bug-239223: (Ghostproc) – [Meta] firefox.exe doesn't always exit after closing all windows; session-specific data retained
  • 7. Two non-duplicate bug reports in Firefox - using only execution information may fail
    • Bug-244372: "Document contains no data" message on continuation page of NY Times article
    • Bug-219232: random "The Document contains no data." Alerts
    • Proposed solution [ICSE 08]: mining both textual information of bug reports and execution information of their failing tests
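The combination above can be sketched as a weighted similarity score. This is only an illustration of the idea, not the ICSE 08 model: the similarity measures, the weight, and the function names in the execution traces are all made up for this sketch.

```python
# Illustrative duplicate-report score that combines textual similarity
# of summaries with overlap of functions covered by the failing runs.

def tokens(text):
    return set(text.lower().split())

def jaccard(a, b):
    # set overlap in [0, 1]
    return len(a & b) / len(a | b) if a | b else 0.0

def duplicate_score(report1, report2, w_text=0.5):
    # report = (summary text, set of functions hit by the failing run)
    text_sim = jaccard(tokens(report1[0]), tokens(report2[0]))
    exec_sim = jaccard(report1[1], report2[1])
    return w_text * text_sim + (1 - w_text) * exec_sim

# Hypothetical data modeled on the Firefox reports above; the covered
# function names are invented.
r1 = ("firefox process still running after close",
      {"nsAppShutdown", "ProcessExit"})
r2 = ("firefox.exe doesn't always exit after closing all windows",
      {"nsAppShutdown", "ProcessExit"})
score = duplicate_score(r1, r2)
```

The two summaries share almost no words, so a purely textual measure scores them low; the identical execution footprint pulls the combined score up, which is exactly the case slide 6 argues for.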
  • 8. Classification of Bug Reports
    • Bugs related to security issues
    • Bugs related to design problems
    • Bugs related to insufficient unit testing
    • Manually label a subset of bug reports with their categories
    • Apply classification algorithms on unlabeled bug reports to predict their categories
    • Benefit: reduce manual labeling efforts
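The label-then-predict workflow can be sketched with a small multinomial Naive Bayes classifier; this is one standard choice, not necessarily the algorithm the group uses, and the labeled report summaries and category names below are invented examples.

```python
# Train on a handful of manually labeled report summaries, then
# predict categories for unlabeled reports (Laplace-smoothed
# multinomial Naive Bayes over bag-of-words features).
import math
from collections import Counter, defaultdict

def train(labeled):
    word_counts = defaultdict(Counter)   # category -> word frequencies
    cat_counts = Counter()               # category -> document count
    vocab = set()
    for text, cat in labeled:
        cat_counts[cat] += 1
        for w in text.lower().split():
            word_counts[cat][w] += 1
            vocab.add(w)
    return word_counts, cat_counts, vocab

def predict(model, text):
    word_counts, cat_counts, vocab = model
    total = sum(cat_counts.values())
    best, best_lp = None, -math.inf
    for cat in cat_counts:
        lp = math.log(cat_counts[cat] / total)          # log prior
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[cat][w] + 1) / denom)  # Laplace
        if lp > best_lp:
            best, best_lp = cat, lp
    return best

labeled = [
    ("buffer overflow allows remote attack", "security"),
    ("heap overflow in parser", "security"),
    ("null pointer crash missing unit test", "testing"),
    ("uncovered branch missing test case", "testing"),
]
model = train(labeled)
category = predict(model, "stack overflow exploit")
```

Only the four training reports are labeled by hand; every remaining report gets a predicted category, which is the manual-effort reduction the slide claims.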
  • 9. Example API Docs
    • javax.resource.cci.Connection
    • createInteraction(): “Creates an interaction associated with this connection” → action-resource pair: create-connection
    • getMetaData(): “Gets the information on the underlying EIS instance represented through an active connection” → action-resource pair: get-connection
    • close(): “Initiates close of the connection handle at the application level” → action-resource pair: close-connection
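A crude heuristic for producing such pairs: take the leading verb of the doc sentence as the action and the interface's resource name as the resource. The hand-written verb table below is a stand-in for the real linguistic analysis (e.g., part-of-speech tagging) such tools apply.

```python
# Heuristic extraction of (action, resource) pairs from API doc
# sentences; the verb-normalization table is a simplification.

VERB_BASE = {"creates": "create", "gets": "get",
             "initiates": "initiate", "closes": "close"}

def action_resource(doc_sentence, resource):
    first = doc_sentence.strip().split()[0].lower()
    action = VERB_BASE.get(first, first)
    # "Initiates close of ..." -> the real action is "close"
    if action == "initiate" and "close" in doc_sentence.lower():
        action = "close"
    return (action, resource)

pairs = [
    action_resource("Creates an interaction associated with this connection",
                    "connection"),
    action_resource("Gets the information on the underlying EIS instance "
                    "represented through an active connection", "connection"),
    action_resource("Initiates close of the connection handle at the "
                    "application level", "connection"),
]
```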
  • 10. Mining Properties from API Docs
  • 11. Potential Collaboration Ideas on Text Mining
    • Documents submitted by device manufacturers are in natural language and are too numerous or too long for manual inspection
    • Classification problem
      • Train learning tools with some labeled documents
    • Clustering problem
      • Without labeling, group documents based on similarity
    • Selection problem
      • Similar to duplicate bug report detection
  • 12. Potential Collaboration Ideas on Text Mining – Possible Examples
    • Extract safety-related requirements from documents → manually extract some, then tools recommend more based on the manually extracted ones
    • Classify incident reports (e.g., with an ontology) → manually classify some, then tools recommend categories for the rest
    • Detect correlations among incident reports → similar to duplicate bug report detection
    • Other pre-market textual documents
    • Other post-market textual documents
  • 13. Major Research Collaboration Areas
    • Mining textual SE data
    • Mining program code data
    • Automated testing
  • 14. Motivation
  • 15. Problem
    • Software system verification: given properties, verification tools can detect whether the system violates them
      • Example: checking that malloc's return value is tested
    • However, these properties often do not exist
      • Who writes these properties?
      • How often are these properties written?
      • How often are these properties known?
    • Objective: mine API properties for static verification from the API client code in existing system code bases
  • 16. Artifacts in Code Mining
    • Data: usage info from the various code locations that use APIs such as malloc, seteuid, and execl
    • Patterns: sequencing constraints among collected API invocation sequences and condition checks
    • Anomalies: violations of these patterns, reported as potential defects
  • 17. Approach Overview
    • 1. Extract: extract external APIs from the input system
    • 2. Trace/search: trace/search source files that use each external API in existing system code bases
    • 3. Analyze: analyze the collected traces/files to extract usage info around the APIs (e.g., <cond, API> pairs)
    • 4. Mine: mine frequent usage patterns around the APIs as API properties
    • 5. Verify: verify the input system against these properties (e.g., with MOPS) to detect violations as bugs
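Steps 4 and 5 can be sketched as follows. This is a deliberately simplified model of what the real tools collect: each call site is reduced to a hypothetical (api, has_check) observation, and a pattern becomes a property when enough call sites follow it.

```python
# Mine "check the result of this API" properties from per-call-site
# observations, then flag the call sites that violate them.
from collections import Counter

def mine_properties(usages, min_support=0.8):
    """usages: list of (api, has_check) observations, one per call site."""
    totals, checked = Counter(), Counter()
    for api, has_check in usages:
        totals[api] += 1
        checked[api] += has_check
    # promote a check to a property if most call sites perform it
    return {api for api in totals
            if checked[api] / totals[api] >= min_support}

def find_violations(usages, properties):
    # anomalies: call sites that omit a mined mandatory check
    return [i for i, (api, has_check) in enumerate(usages)
            if api in properties and not has_check]

# Hypothetical observations: 9 of 10 malloc call sites check the
# result, so the check is mined as a property; printf is never checked.
usages = [("malloc", True)] * 9 + [("malloc", False)] \
       + [("printf", False)] * 5
props = mine_properties(usages)
bugs = find_violations(usages, props)
```

The single unchecked malloc call site is reported as a potential defect; the support threshold is what keeps common-but-not-universal idioms like ignoring printf's result from becoming false properties.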
  • 18. Example Target Defect Types
    • Neglected-condition defects
    • Error-handling defects
    • Exception-handling defects
    • These defect types can result in
      • Critical security, robustness, reliability issues
      • Performance degradation
        • Example: failure to release a resource may degrade performance
  • 19. Mined Neglected Condition
    • From the GRASS open source GIS project
    Developer confirmed: “I believe this issue has uncovered a bug: the pointer returned by the fopen() call isn't checked at all. The code responsible for this particular issue is surprisingly short, to make it a good example on how not to write the code”
    $ nl -ba main.c
    ...
       71          fp = fopen("dumpfile", "w");
       72          BM_file_write(fp, map);
       73          fclose(fp);
    ...
    $
  • 20. Mined Patterns of Error Handling
    • From Red Hat 9.0 routed-0.17-14
    • Error-check specifications and multiple-API specifications (e.g., close() should be called after socket()); if violated, defects are detected
  • 21. Mined Patterns of Exception Handling
    • Pattern: resource creation → resource manipulation → resource cleanup
    • If resource cleanup is missing on some path, defects are detected
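The three-phase pattern, and why checkers insist on it, can be shown in a few lines: cleanup must run on every path, including the exceptional one, which in Python means try/finally (or a context manager). The event log here is only instrumentation for the sketch.

```python
# Resource creation -> manipulation -> cleanup, with cleanup
# guaranteed on both the normal and the exceptional path.
events = []

def use_resource(fail=False):
    events.append("create")        # resource creation
    try:
        events.append("use")       # resource manipulation
        if fail:
            raise RuntimeError("manipulation failed")
    finally:
        events.append("cleanup")   # resource cleanup runs on all paths

use_resource()                     # normal path
try:
    use_resource(fail=True)        # exceptional path
except RuntimeError:
    pass
```

A version without the finally block would skip the cleanup event on the exceptional path; that missing edge in the control flow is exactly what the mined exception-handling patterns flag.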
  • 22. Potential Collaboration Ideas on Code Mining
    • Address problems similar to ones targeted by FDA’s previous work on “Static Analysis of Medical Device Software using CodeSonar” by Jetley, Jones, and Anderson
    • Benefits of our new techniques
      • Don’t require the code to be compilable (using partial program analysis)
      • Don’t require properties to be manually written down
      • Can accumulate knowledge (API usages) within or across devices or manufacturers (or even open source world)
      • May ask manufacturers to submit API usages (if not code itself?)
  • 23. Potential Collaboration Ideas on Code Mining cont.
    • Our tool development status
    • Neglected condition bugs: tools for Java and C are ready; tool for C# is being developed
    • Error-handling bugs: tool for C is ready
    • Exception-handling bugs: tool for Java is ready and tool for C# is being developed
    • Working on tools for framework reuse bugs
  • 24. Major Research Collaboration Areas
    • Mining textual SE data
    • Mining program code data
    • Automated testing
  • 25. Dynamic Symbolic Execution
    • Dynamic symbolic execution combines static and dynamic analysis:
    • Execute program multiple times with different inputs
      • build abstract representation of execution path on the side
      • plug in concrete results of operations which cannot be reasoned about symbolically
    • Use constraint solver to obtain new inputs
      • solve constraint system that represents an execution path not seen before
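The loop above can be sketched as a toy concolic driver. This is not Pex: the program under test has a single hard-to-reach branch guarded by a magic number, the "solver" handles only that one equality constraint, and every name is invented for illustration (real engines record full path conditions and discharge them to an SMT solver).

```python
# Toy concolic loop: execute with concrete inputs while recording each
# branch outcome, then negate the last branch outcome to steer the
# next run down an unexplored path.
MAGIC = 0x95673948

def program(x, trace):
    # The hard-to-reach path a random tester would almost never hit.
    if x == MAGIC:
        trace.append(True)       # branch taken
        return "bug!"
    trace.append(False)          # branch not taken
    return "ok"

def solve(want_taken):
    # Stand-in for a constraint solver: satisfy x == MAGIC or x != MAGIC.
    return MAGIC if want_taken else MAGIC + 1

def concolic():
    worklist = [0]               # initially, choose an arbitrary input
    seen, results = set(), []
    while worklist:
        x = worklist.pop()
        trace = []
        results.append((x, program(x, trace)))
        path = tuple(trace)
        seen.add(path)
        flipped = path[:-1] + (not path[-1],)   # negate last branch
        if flipped not in seen:
            worklist.append(solve(not path[-1]))
    return results
```

Two runs suffice: the arbitrary input 0 records the constraint `x != MAGIC`, negating it yields `x == MAGIC`, and solving that drives execution into the buggy branch, mirroring slides 27–30.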
  • 26. Whole-program, white-box code analysis: run the test and monitor it, record the path condition, choose an uncovered path, solve the resulting constraint system, and use the solution as new test inputs. Result: small test suite, high code coverage; finds only real bugs, no false warnings
  • 27. Initially, choose arbitrary inputs: a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 0; …
  • 28. Recorded path condition: … ⋀ magicNum != 0x95673948
  • 29. Negate the last constraint to target an uncovered path: … ⋀ magicNum == 0x95673948
  • 30. Solve for new inputs: a[0] = 206; a[1] = 202; a[2] = 239; a[3] = 190;
  • 31. Repeat until no uncovered paths remain
  • 32. Potential Collaboration Ideas on Automated Testing
    • Address problems similar to ones targeted by FDA’s previous work on “Static Analysis of Medical Device Software using CodeSonar” by Jetley, Jones, and Anderson
    • Benefits of our new techniques (also in contrast to existing testing techniques)
      • No false positives. Each reported issue is a REAL one
      • Much more powerful than existing commercial tools (Parasoft C#Test, Parasoft Jtest, Agitar AgitarOne, …)
  • 33. Potential Collaboration Ideas on Automated Testing cont.
    • Our tool development status
    • Most mature/powerful for C# testing (built around MSR Pex by collaborating with MSR Researchers)
    • Java testing tools based on NASA Java Pathfinder and jCUTE
    • C testing tools based on Crest and Splat
  • 34. Potential Collaboration Ideas on Automated Testing cont.
    • Regression test generation/differential testing: Given two versions, try to find test inputs to show different behavior
      • Possible idea 1: given a buggy version and claimed fixed version submitted by manufacturers, generate test inputs to show different behaviors
      • Possible idea 2: change impact analysis on models or code submitted by manufacturers
    • Use code mining to find targets to violate by testing
      • Address false positive issues
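Possible idea 1 above can be sketched as a search for a distinguishing input: run both versions on generated inputs and report the first one where behaviors diverge. Both versions below are made-up stand-ins, and a real setup would generate inputs via dynamic symbolic execution rather than enumeration.

```python
# Differential testing of a buggy version vs. a claimed-fixed version:
# find an input on which the two behave differently.

def version_buggy(x):
    return x // 2 if x != 7 else -1     # hypothetical bug at x == 7

def version_fixed(x):
    return x // 2                       # the claimed fix

def find_difference(inputs):
    for x in inputs:
        if version_buggy(x) != version_fixed(x):
            return x                    # witness of different behavior
    return None                         # no divergence found

witness = find_difference(range(20))
```

The returned witness both confirms that the fix changes behavior and serves as a regression test for the manufacturer's submission.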
  • 35. Other Research Areas
    • Mining program execution to aid program understanding, debugging, …
    • Mining version histories
    • Security policy testing
    • Attack generation
    • Design testing
    • Web app/service testing
    • DB app testing
    • Performance testing
  • 36. Major Research Collaboration Areas
    • Mining textual SE data
    • Mining program code data
    • Automated testing
