Improving Software Reliability via Mining Software Engineering Data
IEEE Computer Society Distinguished Visitors Program

Published in: Education, Technology
Transcript

  • 1. Improving Software Reliability via Mining Software Engineering Data
    Tao Xie, Department of Computer Science, North Carolina State University, Raleigh, USA
    http://www.csc.ncsu.edu/faculty/xie
    Joint work with Suresh Thummalapenta
  • 2. Main Goal: Mining Software Engineering Data
    – Transform static record-keeping SE data into active data
    – Make SE data actionable by uncovering hidden patterns and trends
    – Data sources shown: mailings, Bugzilla, code repository, execution traces, CVS
  • 3. Mining Software Engineering Data
    – Software engineering data: code bases, change history, program states, structural entities, bug reports/NL, …
    – Data mining techniques
    – Software engineering tasks: programming, defect detection, testing, debugging, maintenance, …
    https://sites.google.com/site/asergrp/dmse
  • 4. Mining Software Engineering Data
    – Software engineering data: code bases, change history, program states, structural entities, bug reports/NL, …
    – Data mining techniques
    – Software engineering tasks: programming, defect detection, testing, debugging, maintenance, …
  • 5. Motivation
    Programmers commonly reuse APIs of existing frameworks or libraries
    – Advantages: high development productivity
    – Challenges: complexity and lack of documentation
    – Consequences:
      • Programmers spend more effort understanding APIs
      • Programmers introduce defects in API client code
    – Solution: mine API properties as common patterns across API client code
  • 6. Solution-Driven vs. Problem-Driven
    – Solution-driven: "Where can I apply X miner?"
      • Basic mining algorithms, e.g., association rule mining, frequent itemset mining, …
    – Problem-driven: "What patterns do we really need?"
      • Advanced mining algorithms, e.g., frequent partial order mining [ESEC/FSE 07]
      • New/adapted mining algorithms, e.g., [ICSE 09], [ASE 09]
  • 7. Mining -> Searching + Mining
    – Traditional approaches: mine patterns from a few code repositories (1, 2, …, N), e.g., Eclipse, Linux, …
      • Often lack sufficient relevant data points (e.g., API call sites)
    – Our new approaches: search, then mine patterns
      • A code search engine draws on many code repositories, e.g., open source code on the web
  • 8. Agenda
    – Motivation
    – Mining Sequence Association Rules (CAR-Miner) [ICSE 09]: Detecting Exception-Handling Defects
    – Mining Alternative Patterns (Alattin) [ASE 09]: Detecting Neglected Condition Defects
    – Conclusion
  • 9. Exception Handling
    – APIs throw exceptions when runtime errors occur
      • Example: the Session API of the Hibernate framework throws HibernateException
    – APIs expect client applications to implement recovery actions after exceptions occur
      • Example: the Hibernate Session API expects the client application to roll back open uncommitted transactions after HibernateException occurs
    – Failure to handle exceptions results in fatal issues, e.g., a database lock won't be released if the transaction is not rolled back
  • 10. Problem Addressed by CAR-Miner
    Use exception-handling specifications to detect violations as defects
    – Problem: specifications are often not documented
    – Solution: mine specifications from existing API client code
    – Challenges:
      • Limited data points: only from a few code bases -> searching + mining
      • Limited expressiveness: not sufficient to characterize common exception-handling behaviors: why?
  • 11. Example
    Scenario 1:
      1.1: ...
      1.2: OracleDataSource ods = null; Session session = null;
           Connection conn = null; Statement statement = null;
      1.3: logger.debug("Starting update");
      1.4: try {
      1.5:   ods = new OracleDataSource();
      1.6:   ods.setURL("jdbc:oracle:thin:scott/tiger@192.168.1.2:1521:catfish");
      1.7:   conn = ods.getConnection();
      1.8:   statement = conn.createStatement();
      1.9:   statement.executeUpdate("DELETE FROM table1");
      1.10:  conn.commit(); }
      1.11: catch (SQLException se) {
      1.13:   logger.error("Exception occurred"); }
      1.14: finally {
      1.15:   if (statement != null) { statement.close(); }
      1.16:   if (conn != null) { conn.close(); }
      1.17:   if (ods != null) { ods.close(); } }
      1.18: }
    – Defect: no rollback is done when SQLException occurs (missing "conn.rollback()")
    – Requires a specification such as "Connection should be rolled back when a connection is created and SQLException occurs"
    – Q: Does every connection instance have to be rolled back when SQLException occurs?
  • 12. Example (cont.)
    Scenario 2:
      2.1: Connection conn = null;
      2.2: Statement stmt = null;
      2.3: BufferedWriter bw = null; FileWriter fw = null;
      2.3: try {
      2.4:   fw = new FileWriter("output.txt");
      2.5:   bw = new BufferedWriter(fw);
      2.6:   conn = DriverManager.getConnection("jdbc:pl:db", "ps", "ps");
      2.7:   stmt = conn.createStatement();
      2.8:   ResultSet res = stmt.executeQuery("SELECT Path FROM Files");
      2.9:   while (res.next()) {
      2.10:    bw.write(res.getString(1));
      2.11:  }
      2.12:  res.close();
      2.13: } catch (IOException ex) { logger.error("IOException occurred");
      2.14: } finally {
      2.15:   if (stmt != null) stmt.close();
      2.16:   if (conn != null) conn.close();
      2.17:   if (bw != null) bw.close();
      2.18: }
    Scenario 1 (with the rollback added at 1.12):
      1.1: ...
      1.2: OracleDataSource ods = null; Session session = null;
           Connection conn = null; Statement statement = null;
      1.3: logger.debug("Starting update");
      1.4: try {
      1.5:   ods = new OracleDataSource();
      1.6:   ods.setURL("jdbc:oracle:thin:scott/tiger@192.168.1.2:1521:catfish");
      1.7:   conn = ods.getConnection();
      1.8:   statement = conn.createStatement();
      1.9:   statement.executeUpdate("DELETE FROM table1");
      1.10:  conn.commit(); }
      1.11: catch (SQLException se) {
      1.12:   if (conn != null) { conn.rollback(); }
      1.13:   logger.error("Exception occurred"); }
      1.14: finally {
      1.15:   if (statement != null) { statement.close(); }
      1.16:   if (conn != null) { conn.close(); }
      1.17:   if (ods != null) { ods.close(); } }
      1.18: }
    – Specification: "Connection creation => Connection rollback"
    – Satisfied by Scenario 1 but not by Scenario 2
    – But Scenario 2 has no defect
  • 13. Example (cont.)
    – Simple association rules of the form "FCa => FCe" are not expressive enough
    – More general association rules (sequence association rules) are required, such as (FCc1 FCc2) Λ FCa => FCe1, where
      FCc1 -> Connection conn = OracleDataSource.getConnection()
      FCc2 -> Statement stmt = Connection.createStatement()
      FCa -> stmt.executeUpdate()
      FCe1 -> conn.rollback()
  • 14. Example (cont.)
    – Simple association rules of the form "FCa => FCe" are not expressive enough
    – More general association rules (sequence association rules) are required, such as (FCc1 FCc2) Λ FCa => FCe1, where
      FCc1 -> Connection conn = OracleDataSource.getConnection()
      FCc2 -> Statement stmt = Connection.createStatement()
      FCa -> stmt.executeUpdate() //Triggering Action
      FCe1 -> conn.rollback()
  • 15. Example (cont.)
    – Simple association rules of the form "FCa => FCe" are not expressive enough
    – More general association rules (sequence association rules) are required, such as (FCc1 FCc2) Λ FCa => FCe1, where
      FCc1 -> Connection conn = OracleDataSource.getConnection()
      FCc2 -> Statement stmt = Connection.createStatement()
      FCa -> stmt.executeUpdate()
      FCe1 -> conn.rollback() //Recovery Action
  • 16. Example (cont.)
    – Simple association rules of the form "FCa => FCe" are not expressive enough
    – More general association rules (sequence association rules) are required, such as (FCc1 FCc2) Λ FCa => FCe1, where
      FCc1 -> Connection conn = OracleDataSource.getConnection()
      FCc2 -> Statement stmt = conn.createStatement() //Context
      FCa -> stmt.executeUpdate()
      FCe1 -> conn.rollback()
  • 17. CAR-Miner Approach
    Goal: check whether the input application has any exception-related defects
    Pipeline: input application -> classes and functions -> open source projects on the web (1, 2, …, N) -> exception-flow graphs -> static traces -> sequence association rules -> violations
    Steps:
    – Extract the classes and functions reused
    – Issue queries and collect relevant code examples, e.g., "lang:java java.sql.Statement executeUpdate"
    – Construct exception-flow graphs
    – Collect static traces
    – Mine static traces
    – Detect violations
  • 18. CAR-Miner Approach (pipeline diagram): input application -> classes and functions -> open source projects on the web (1, 2, …, N) -> exception-flow graphs -> static traces -> sequence association rules -> violations
  • 19. Exception-Flow-Graph Construction
    – Based on a previous algorithm [Sinha&Harrold TSE 00]
    – Figure legend: solid line: normal execution path; dashed line (----): exceptional execution path
  • 20. Exception-Flow-Graph Construction
    – Prevent infeasible edges using a sound static analysis [Robillard&Murphy FSE 99]
  • 21. CAR-Miner Approach (pipeline diagram): input application -> classes and methods -> open source projects on the web (1, 2, …, N) -> exception-flow graphs -> static traces -> sequence association rules -> violations
  • 22. Static Trace Generation
    – Collect static traces with the actions taken when exceptions occur
    – A static trace for Node 7: "4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17"
  • 23. Static Trace Generation
    A static trace for Node 7: "4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17"
    It includes 3 sections:
    – Normal function-call sequence (4 -> 5 -> 6)
    – Function call (7)
    – Exception function-call sequence (15 -> 16 -> 17)
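
The three sections of a static trace can be sketched in Java as follows. This is only an illustration: the class name `TraceSections` and its `split` method are hypothetical and not part of CAR-Miner, and node numbers stand in for function calls as on the slide.

```java
import java.util.List;

// Hypothetical helper: splits a static trace around its triggering call.
final class TraceSections {
    final List<Integer> context;   // normal function-call sequence before the trigger
    final int trigger;             // the function call that may throw
    final List<Integer> recovery;  // exception function-call sequence after the trigger

    private TraceSections(List<Integer> context, int trigger, List<Integer> recovery) {
        this.context = context;
        this.trigger = trigger;
        this.recovery = recovery;
    }

    static TraceSections split(List<Integer> trace, int trigger) {
        int at = trace.indexOf(trigger);
        if (at < 0) {
            throw new IllegalArgumentException("trigger not in trace: " + trigger);
        }
        return new TraceSections(
                List.copyOf(trace.subList(0, at)),
                trigger,
                List.copyOf(trace.subList(at + 1, trace.size())));
    }

    public static void main(String[] args) {
        // The slide's trace for Node 7.
        TraceSections s = split(List.of(4, 5, 6, 7, 15, 16, 17), 7);
        System.out.println(s.context + " | " + s.trigger + " | " + s.recovery);
    }
}
```

For the slide's trace, the context is 4, 5, 6 and the recovery is 15, 16, 17.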
  • 24. Trace Post-Processing
    Identify and remove unrelated function calls using data dependency
    "4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17"
      4: FileWriter fw = new FileWriter("output.txt")
      5: BufferedWriter bw = new BufferedWriter(fw)
      ...
      7: Statement stmt = conn.createStatement()
      ...
    Filtered sequence: "6 -> 7 -> 15 -> 16"
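
A rough Java sketch of this filtering step follows. It assumes a much simpler notion of data dependency than a real analysis would use: two calls count as related if they touch a common variable, directly or transitively. The class `TraceFilter` and the variable sets in `scenario2()` are hypothetical and were read by hand from Scenario 2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: variable sharing as a stand-in for data dependency.
final class TraceFilter {
    /** Keeps the calls that share a variable, directly or transitively, with the trigger. */
    static List<Integer> filter(Map<Integer, Set<String>> varsByCall, int trigger) {
        Set<String> relatedVars = new HashSet<>(varsByCall.get(trigger));
        Set<Integer> kept = new HashSet<>();
        kept.add(trigger);
        boolean changed = true;
        while (changed) {  // repeat until no further call becomes related
            changed = false;
            for (Map.Entry<Integer, Set<String>> call : varsByCall.entrySet()) {
                if (!kept.contains(call.getKey())
                        && !Collections.disjoint(relatedVars, call.getValue())) {
                    kept.add(call.getKey());
                    relatedVars.addAll(call.getValue());
                    changed = true;
                }
            }
        }
        List<Integer> filtered = new ArrayList<>();
        for (Integer id : varsByCall.keySet()) {  // preserve trace order
            if (kept.contains(id)) filtered.add(id);
        }
        return filtered;
    }

    /** Variables touched by each node of the slide's trace (assumed, from Scenario 2). */
    static Map<Integer, Set<String>> scenario2() {
        Map<Integer, Set<String>> m = new LinkedHashMap<>();
        m.put(4, Set.of("fw"));
        m.put(5, Set.of("bw", "fw"));
        m.put(6, Set.of("conn"));
        m.put(7, Set.of("stmt", "conn"));
        m.put(15, Set.of("stmt"));
        m.put(16, Set.of("conn"));
        m.put(17, Set.of("bw"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(filter(scenario2(), 7));
    }
}
```

Under these assumptions the file-writer calls (4, 5, 17) drop out and the nodes 6, 7, 15, 16 remain, matching the filtered sequence on the slide.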
  • 25. CAR-Miner Approach (pipeline diagram): input application -> classes and methods -> open source projects on the web (1, 2, …, N) -> exception-flow graphs -> static traces -> sequence association rules -> violations
  • 26. Static Trace Mining
    – Handle the traces of each function call (triggering function call) individually
    – Input: two sequence databases with a one-to-one mapping
      • normal function-call sequences (context)
      • exception function-call sequences (recovery)
    – Objective: generate sequence association rules of the form (FCc1 ... FCcn) Λ FCa => FCe1 ... FCen, where (FCc1 ... FCcn) is the context, FCa the trigger, and FCe1 ... FCen the recovery
  • 27. Mining Problem Definition
    – Input: two sequence databases with a one-to-one mapping
    – Objective: get association rules of the form FC1 FC2 ... FCm -> FE1 FE2 ... FEn, where {FC1, FC2, ..., FCm} ∈ SDB1 (context) and {FE1, FE2, ..., FEn} ∈ SDB2 (recovery)
    – Existing association rule mining algorithms cannot be directly applied to multiple sequence databases
  • 28. Mining Problem Solution
    – Annotate the sequences to generate a single combined database
    – Apply a frequent subsequence mining algorithm [Wang and Han, ICDE 04] to get frequent sequences
    – Transform the mined sequences into sequence association rules, e.g., (3 10) Λ FCa => (2 8), with context (3 10), trigger FCa, and recovery (2 8)
    – Rank rules based on the support assigned by the frequent subsequence mining algorithm
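
One way to picture the annotate-and-transform steps is the Java sketch below. The annotation scheme is an assumption, since the slide does not say how sequences are annotated: context IDs get a `c` prefix and recovery IDs an `e` prefix so that both can live in one sequence and be separated again after mining. The frequent subsequence mining itself is not shown, and the class `AnnotatedTraces` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of combining two mapped sequences and splitting a mined one.
final class AnnotatedTraces {
    /** Merges a context sequence and its mapped recovery sequence into one annotated sequence. */
    static List<String> combine(List<Integer> context, List<Integer> recovery) {
        List<String> combined = new ArrayList<>();
        for (int id : context) combined.add("c" + id);   // assumed prefix for context calls
        for (int id : recovery) combined.add("e" + id);  // assumed prefix for recovery calls
        return combined;
    }

    /** Returns the IDs in a mined sequence that carry the given annotation prefix. */
    static List<Integer> section(List<String> minedSequence, String prefix) {
        List<Integer> ids = new ArrayList<>();
        for (String item : minedSequence) {
            if (item.startsWith(prefix)) ids.add(Integer.parseInt(item.substring(prefix.length())));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Pretend the miner returned this frequent sequence for trigger FCa.
        List<String> mined = combine(List.of(3, 10), List.of(2, 8));
        System.out.println(section(mined, "c") + " AND FCa => " + section(mined, "e"));
    }
}
```

Because the traces of each triggering call are mined separately, the trigger itself does not need to appear in the combined sequence.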
  • 29. CAR-Miner Approach (pipeline diagram): input application -> classes and methods -> open source projects on the web (1, 2, …, N) -> exception-flow graphs -> static traces -> sequence association rules -> violations
  • 30. Violation Detection
    Analyze each call site of the triggering call FCa
    – Step 1: Extract the context call sequence "CC1 CC2 ... CCm" from the beginning of the function to the call site of FCa
    – Step 2: If CC1 CC2 ... CCm is a super-sequence of FCc1 ... FCcn, report any missing function calls of {FCe1 ... FCen} in any exception path
    API client: (CC1 CC2 ... CCm) Λ FCa => missing any?
    API rule: (FCc1 ... FCcn) Λ FCa => FCe1 ... FCen (context, trigger, recovery)
    The client context is checked against the rule context with isSuperSeqOf
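
The two steps can be sketched in Java as below. This is an illustrative reading of the slide, not CAR-Miner's implementation: calls are plain strings, the class `ViolationDetector` is hypothetical, and a single exception path is checked at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the super-sequence check and the missing-recovery report.
final class ViolationDetector {
    /** True if pattern occurs within sequence in order, not necessarily contiguously. */
    static boolean isSuperSequenceOf(List<String> sequence, List<String> pattern) {
        int matched = 0;
        for (String call : sequence) {
            if (matched < pattern.size() && call.equals(pattern.get(matched))) matched++;
        }
        return matched == pattern.size();
    }

    /** Recovery calls of the rule that are absent from one exception path of the call site. */
    static List<String> missingRecovery(List<String> callSiteContext, List<String> exceptionPath,
                                        List<String> ruleContext, List<String> ruleRecovery) {
        List<String> missing = new ArrayList<>();
        if (!isSuperSequenceOf(callSiteContext, ruleContext)) {
            return missing;  // the rule's context does not hold here, so nothing to report
        }
        for (String call : ruleRecovery) {
            if (!exceptionPath.contains(call)) missing.add(call);
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> ruleContext = List.of("OracleDataSource.getConnection", "Connection.createStatement");
        List<String> ruleRecovery = List.of("Connection.rollback");
        // Scenario 1 without the rollback: calls before executeUpdate, then the exception path.
        List<String> context = List.of("OracleDataSource.<init>", "OracleDataSource.setURL",
                "OracleDataSource.getConnection", "Connection.createStatement");
        List<String> exceptionPath = List.of("Logger.error", "Statement.close",
                "Connection.close", "OracleDataSource.close");
        System.out.println(missingRecovery(context, exceptionPath, ruleContext, ruleRecovery));
    }
}
```

For the defective Scenario 1 this reports `Connection.rollback` as missing. A call site whose context lacks `OracleDataSource.getConnection`, as in Scenario 2, yields no report because the rule's context is not matched.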
  • 31. Evaluation
    Research questions:
    1. Do the mined rules represent real rules?
    2. Do the detected violations represent real defects?
    3. Does CAR-Miner perform better than WN-miner [Weimer and Necula, TACAS 05]?
    4. Do the sequence association rules help detect new defects?
  • 32. Subjects
    – Internal info: classes and methods belonging to the app
    – External info: classes and methods used by the app
    – Code examples: number of files collected through the code search engine
  • 33. RQ1: Real Rules
    Do the mined rules represent real rules?
    – Real rules: 55% (Total: 294)
    – Usage patterns: 3%
    – False positives: 43%
  • 34. RQ1: Distribution of Real Rules for Axion
    – Distribution of rules based on the ranks assigned by CAR-Miner
    – The number of false positives is quite low among the rules ranked 1 to 60
  • 35. RQ2: Detected Violations
    Do the detected violations represent real defects?
    – Total number of defects: 160
    – New defects not found by the WN-miner approach: 87
  • 36. RQ2: Status of Detected Violations
    HsqlDB developers responded to the first 10 reported defects
    – Accepted 7 defects
    – Rejected 3 defects
    – Reason given by HsqlDB developers for the rejected defects: "Although it can throw exceptions in general, it should not throw with HsqlDB, So it is fine"
  • 37. RQ3: Comparison with WN-miner
    Does CAR-Miner perform better than WN-miner?
    – Found 224 new rules and missed 32 rules
    – CAR-Miner detected most of the rules mined by WN-miner
    – Two major factors:
      • Sequence association rules
      • Increase in the data scope
  • 38. RQ4: New Defects by Sequence Association Rules
    Do the sequence association rules detect new defects?
    – Detected 21 new real defects among all applications
  • 39. Agenda
    – Motivation
    – Mining Sequence Association Rules (CAR-Miner) [ICSE 09]: Detecting Exception-Handling Defects
    – Mining Alternative Patterns (Alattin) [ASE 09]: Detecting Neglected Condition Defects
    – Conclusion
  • 40. Large Number of False Positives
    – Existing approaches produce a large number of false positives
    – One major observation:
      • Programmers often write code in different ways to achieve the same task
      • Some ways are more frequent than others
    – Diagram: existing approaches mine patterns from the frequent ways only, then detect violations against the mined patterns, so infrequent ways are flagged
  • 41. Example: java.util.Iterator.next()
    Code Sample 1:
      PrintEntries1(ArrayList<String> entries) {
        …
        Iterator it = entries.iterator();
        if (it.hasNext()) {
          String last = (String) it.next();
        }
        …
      }
    Code Sample 2:
      PrintEntries2(ArrayList<String> entries) {
        …
        if (entries.size() > 0) {
          Iterator it = entries.iterator();
          String last = (String) it.next();
        }
        …
      }
    java.util.Iterator.next() throws NoSuchElementException when invoked on a list without any elements
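
A small runnable Java version of the two guards, for readers who want to try them. The class and method names (`IteratorGuards`, `firstViaHasNext`, `firstViaSize`, `unguardedThrows`) are made up for this sketch; only `Iterator.hasNext()`, `Iterator.next()`, and `List.size()` come from the slide.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo of the two equivalent guards around Iterator.next().
final class IteratorGuards {
    /** Guard used by Code Sample 1: check hasNext() before next(). */
    static String firstViaHasNext(List<String> entries) {
        Iterator<String> it = entries.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null;
    }

    /** Guard used by Code Sample 2: check size() before next(). */
    static String firstViaSize(List<String> entries) {
        if (entries.size() > 0) {
            Iterator<String> it = entries.iterator();
            return it.next();
        }
        return null;
    }

    /** Without any guard, next() on an empty list throws NoSuchElementException. */
    static boolean unguardedThrows(List<String> entries) {
        try {
            entries.iterator().next();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        System.out.println(firstViaHasNext(empty));
        System.out.println(firstViaSize(empty));
        System.out.println(unguardedThrows(empty));
    }
}
```

Both guards protect the call equally well, which is why a miner that only learns the `hasNext()` form would wrongly flag the `size()` form.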
  • 42. Example: java.util.Iterator.next()
    Code Sample 1:
      PrintEntries1(ArrayList<String> entries) {
        …
        Iterator it = entries.iterator();
        if (it.hasNext()) {
          String last = (String) it.next();
        }
        …
      }
    Code Sample 2:
      PrintEntries2(ArrayList<String> entries) {
        …
        if (entries.size() > 0) {
          Iterator it = entries.iterator();
          String last = (String) it.next();
        }
        …
      }
    – 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243)
    – Mined pattern from existing approaches: "boolean check on return of Iterator.hasNext before Iterator.next"
  • 43. Example: java.util.Iterator.next()
    – More general patterns (alternative patterns) are required: P1 or P2
      P1: boolean check on return of Iterator.hasNext before Iterator.next
      P2: boolean check on return of ArrayList.size before Iterator.next
    – These cannot be mined by existing approaches, since alternative P2 is infrequent
    Code Sample 1:
      PrintEntries1(ArrayList<String> entries) {
        …
        Iterator it = entries.iterator();
        if (it.hasNext()) {
          String last = (String) it.next();
        }
        …
      }
    Code Sample 2:
      PrintEntries2(ArrayList<String> entries) {
        …
        if (entries.size() > 0) {
          Iterator it = entries.iterator();
          String last = (String) it.next();
        }
        …
      }
  • 44. Our Solution: ImMiner Algorithm
    – Mines alternative patterns of the form P1 or P2
    – Based on the observation that infrequent alternatives such as P2 are frequent among the code examples that do not support P1
    – 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243)
      • P2 is infrequent among the entire 1243 code examples
      • P2 is frequent among the code examples not supporting P1
  • 45. Alternative Patterns
    ImMiner mines three kinds of alternative patterns of the general form "P1 or P2"
    – Balanced: all alternatives (both P1 and P2) are frequent
    – Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent; represented as "P1 or P^2"
    – Single: only one alternative
  • 46. ImMiner Algorithm
    – Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively
    – An input database with the following APIs for Iterator.next()
    (Tables on the slide: input database; mapping of IDs to APIs)
  • 47. ImMiner Algorithm: Frequent Alternatives
    – Apply frequent itemset mining (min_sup 0.5) to the input database
    – Frequent item: 1
    – P1: boolean check on the return of Iterator.hasNext() before Iterator.next()
  • 48. ImMiner: Infrequent Alternatives of P1
    – Split the input database into two databases: positive (PSD) and negative (NSD)
    – Mine patterns that are frequent in NSD and infrequent in PSD
    – Reason: only such patterns serve as alternatives for P1
    – Alternative pattern P2: "const check on the return of ArrayList.size() before Iterator.next()"
    Alattin applies the ImMiner algorithm to detect neglected conditions
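
The split can be illustrated with a short Java sketch. Several things here are assumptions rather than details from the slides: the class `AlternativeSplit` is hypothetical, each code example is reduced to the set of checks it performs, PSD is taken to be the examples supporting P1 and NSD the rest, the 6 `size()` examples are assumed not to overlap with the 1218 `hasNext()` examples, and the remaining 19 examples are assumed to perform neither check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of splitting a database on P1 and measuring P2's support.
final class AlternativeSplit {
    /** Fraction of transactions in db that contain item. */
    static double support(List<Set<String>> db, String item) {
        if (db.isEmpty()) return 0.0;
        long hits = db.stream().filter(t -> t.contains(item)).count();
        return (double) hits / db.size();
    }

    /** Transactions that do (containing = true) or do not (containing = false) contain item. */
    static List<Set<String>> select(List<Set<String>> db, String item, boolean containing) {
        List<Set<String>> out = new ArrayList<>();
        for (Set<String> t : db) {
            if (t.contains(item) == containing) out.add(t);
        }
        return out;
    }

    /** A database with the counts quoted on the slides (overlap assumptions noted above). */
    static List<Set<String>> slideCounts() {
        List<Set<String>> db = new ArrayList<>();
        for (int i = 0; i < 1218; i++) db.add(Set.of("hasNext"));
        for (int i = 0; i < 6; i++) db.add(Set.of("size"));
        for (int i = 0; i < 19; i++) db.add(Set.of());
        return db;
    }

    public static void main(String[] args) {
        List<Set<String>> db = slideCounts();
        List<Set<String>> psd = select(db, "hasNext", true);   // supports P1
        List<Set<String>> nsd = select(db, "hasNext", false);  // does not support P1
        System.out.println("P2 overall: " + support(db, "size"));
        System.out.println("P2 in PSD:  " + support(psd, "size"));
        System.out.println("P2 in NSD:  " + support(nsd, "size"));
    }
}
```

Under these assumptions P2 is supported by 6 of 1243 examples overall but by 6 of the 25 examples in NSD, which is the jump in support that the algorithm relies on. The threshold that makes P2 count as frequent in NSD is not given on the slide.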
  • 49. Neglected Conditions
    – Neglected conditions refer to
      • Missing conditions that check the arguments or receiver of the API call before the API call
      • Missing conditions that check the return or receiver of the API call after the API call
    – One of the primary reasons for many fatal issues
      • Security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]
  • 50. Evaluation
    Research questions:
    1. Do alternative patterns exist in real applications?
    2. What percentage of false positives is eliminated (with little or no increase in false negatives) in detected violations?
  • 51. Subjects
    Two categories of subjects:
    – 3 Java default API libraries
    – 3 popular open source libraries
    #Samples: number of code examples collected from Google code search
  • 52. RQ1: Balanced and Imbalanced Patterns
    What percentage of the patterns in real apps are balanced or imbalanced?
    – Balanced patterns: 0% to 30% (average: 9.69%)
    – Imbalanced patterns:
      • 30% to 100% (average: 65%) for Java default API libraries
      • 0% to 9.5% (average: 5%) for open source libraries
    – Explanation: Java default API libraries provide more alternative ways of writing code than open source libraries
  • 53. RQ2: False Positives and False Negatives
    What percentage of false positives is eliminated (with little or no increase in false negatives)?
    Applied mined patterns ("P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j") in three modes:
    – Existing mode: "P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j" -> P1, P2, ..., Pi
    – Balanced mode: "P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j" -> "P1 or P2 or ... or Pi"
    – Imbalanced mode: "P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j" -> "P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j"
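
One possible reading of the three modes, sketched in Java. The interpretation is an assumption: in existing mode each frequent alternative is treated as its own required pattern, in balanced mode any one frequent alternative suffices, and in imbalanced mode any frequent or infrequent alternative suffices. The class `DetectionModes` and the labels `P1`, `P2`, `A1` are placeholders.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of how a call site is judged under each mode.
final class DetectionModes {
    /** Existing mode: every frequent pattern is checked on its own. */
    static boolean violatesExisting(Set<String> checks, List<String> frequent) {
        return !checks.containsAll(frequent);
    }

    /** Balanced mode: satisfying any one frequent alternative is enough. */
    static boolean violatesBalanced(Set<String> checks, List<String> frequent) {
        return Collections.disjoint(checks, frequent);
    }

    /** Imbalanced mode: satisfying any frequent or infrequent alternative is enough. */
    static boolean violatesImbalanced(Set<String> checks, List<String> frequent,
                                      List<String> infrequent) {
        return Collections.disjoint(checks, frequent) && Collections.disjoint(checks, infrequent);
    }

    public static void main(String[] args) {
        List<String> frequent = List.of("P1", "P2");
        List<String> infrequent = List.of("A1");
        Set<String> onlyP2 = Set.of("P2");  // call site guarded by a frequent alternative
        Set<String> onlyA1 = Set.of("A1");  // call site guarded by an infrequent alternative
        System.out.println(violatesExisting(onlyP2, frequent));
        System.out.println(violatesBalanced(onlyP2, frequent));
        System.out.println(violatesBalanced(onlyA1, frequent));
        System.out.println(violatesImbalanced(onlyA1, frequent, infrequent));
    }
}
```

Under this reading, a call site guarded only by P2 is flagged in existing mode but not in balanced mode, and one guarded only by A^1 stops being flagged only in imbalanced mode. That is consistent with each successive mode removing more false positives.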
  • 54. RQ2: False Positives and False Negatives
    Existing mode vs. balanced mode

                        Existing mode          Balanced mode
    Application         Defects  False pos.    Defects  False pos.  % of reduction  False neg.
    Java Util                37         104         37         104            0              0
    Java Transaction         51         105         51         105            0              0
    Java SQL                 56         143         56          90        37.06              0
    BCEL                      2          14          2           8        42.86              0
    HSqlDB                    1           0          1           0            0              0
    Hibernate                10           9         10           8        11.11              0
    AVERAGE/TOTAL                                                         15.17              0

    Balanced mode reduced false positives by 15.17% without any increase in false negatives
  • 55. RQ2: False Positives and False Negatives
    Existing mode vs. imbalanced mode

                        Existing mode          Imbalanced mode
    Application         Defects  False pos.    Defects  False pos.  % of reduction  False neg.
    Java Util                37         104         36          74        28.85              1
    Java Transaction         51         105         47          76        27.62              4
    Java SQL                 56         143         53          81        43.36              3
    BCEL                      2          14          2           6        57.14              0
    HSqlDB                    1           0          1           0            0              0
    Hibernate                10           9         10           8        11.11              0
    AVERAGE/TOTAL                                                         28.01              8

    Imbalanced mode reduced false positives by 28% with a quite small increase in false negatives
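
The "% of reduction" column appears to be the relative drop in false positives from existing mode, with the AVERAGE row being the plain mean of the six per-application percentages. That reading is inferred from the numbers rather than stated on the slides. A small Java check, with a hypothetical class name `FalsePositiveReduction`:

```java
import java.util.Locale;

// Hypothetical check of the "% of reduction" arithmetic.
final class FalsePositiveReduction {
    /** Percentage by which false positives dropped; 0 when there were none to begin with. */
    static double percentReduction(int before, int after) {
        if (before == 0) return 0.0;  // e.g., the HSqlDB row
        return 100.0 * (before - after) / before;
    }

    /** Plain mean of the per-application reductions. */
    static double averageReduction(int[] before, int[] after) {
        double sum = 0.0;
        for (int i = 0; i < before.length; i++) {
            sum += percentReduction(before[i], after[i]);
        }
        return sum / before.length;
    }

    public static void main(String[] args) {
        // False positives in existing mode vs. balanced mode, in table order.
        int[] existing = {104, 105, 143, 14, 0, 9};
        int[] balanced = {104, 105, 90, 8, 0, 8};
        System.out.println(String.format(Locale.ROOT, "Java SQL: %.2f",
                percentReduction(143, 90)));
        System.out.println(String.format(Locale.ROOT, "Average:  %.2f",
                averageReduction(existing, balanced)));
    }
}
```

With the balanced-mode figures this gives 37.06 for Java SQL and 15.17 for the average, matching the table.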
  • 56. Conclusion
    – Problem-driven methodology by identifying
      • new problems, patterns
      • mining algorithms, defects
    – CAR-Miner [ICSE 09]: mining sequence association rules of the form (FCc1 ... FCcn) Λ FCa => (FCe1 ... FCen) (context, trigger, recovery) -> reduce false negatives
    – Alattin [ASE 09]: mining alternative patterns classified into three categories: balanced, imbalanced, and single, of the form "P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j" -> reduce false positives
  • 57. Other Selected Work on Mining SE Data
    – API/Trace mining
      • MAPO: mining call sequences for code reuse [ECOOP 09]
      • MSeqGen: mining call seqs for test gen [ESEC/FSE 09]
      • MAM: mining API mapping for lang migration [ICSE 10]
      • Iterative mining of resource-releasing specs [ASE 11]
      • StackMine: mining callstack traces [ICSE 12]
      • INDICATOR: mining parameters dependency [WWW 13]
    – Text mining
      • Mining bug reports@Cisco for security ones [MSR 10]
      • Mining bug reports+exec traces for duplicates [ICSE 08]
      • Mining API docs for defect detection [ASE 09, ICSE 12]
      • Mining requirements for policy extraction [FSE 12]
    T. Xie, S. Thummalapenta, D. Lo, and C. Liu. Data Mining for Software Engineering. IEEE Computer, August 2009.
  • 58. Thank You. Questions? https://sites.google.com/site/asergrp/
  • 59. Alattin Approach
    Goal: detect neglected conditions in the application under analysis
    Pipeline: application under analysis -> classes and methods -> open source projects on the web (1, 2, …, N) -> pattern candidates -> alternative patterns -> violations
    Steps:
    – Extract the classes and methods reused
    – Phase 1: Issue queries and collect relevant code samples, e.g., "lang:java java.util.Iterator next"
    – Phase 2: Generate pattern candidates
    – Phase 3: Mine alternative patterns
    – Phase 4: Detect neglected conditions statically
