
A Machine Learning approach to predict Software Defects


Published in: Education

  1. Escalation Prediction on Defects Database
     Project author: Chetan Hireholi, 01FM14ESE006
     Guide: Dr. K. V. Subramaniam
  2. Problem statement
     ◦ Determine what leads to an escalation by interpreting the defects corpus of the customer support cases.
     ◦ Alert on an escalation based on the nature of the defects, correlate escalations with the defects discovered by customers, and find the trigger point that leads to an escalation.
  3. Data Source
     ◦ The Incident Database: contains the customer support cases.
     ◦ The CRs Database: an internally used database that details the cases which were Change Requests (CRs).
  4. Data Cleansing
     The data in the Incidents and CRs databases had many discrepancies (e.g. rows out of order, special characters in the date field, and multiple spellings of the same company name, viz. Boeing, Boeing Inc.). Tools such as OpenRefine and Microsoft Excel helped remove these discrepancies.
     [Chart: Incident Database, escalation counts. Green: 3831, Red: 125, Yellow: 329]
  5. Understanding the workflow
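The slides name OpenRefine and Excel for this step. As a rough illustration only, the same kind of clean-up could be sketched in Python. The suffix list and both function names below are hypothetical, chosen to cover the "Boeing" vs "Boeing Inc." example from the slide; they are not taken from the project.

```python
import re

# Hypothetical list of legal suffixes; the slide only mentions "Boeing" vs "Boeing Inc."
LEGAL_SUFFIXES = {"inc", "ltd", "gmbh", "ag", "llc", "corp", "company", "co"}

def normalize_company(name: str) -> str:
    """Collapse spelling variants of a company name to one canonical key."""
    # Lowercase, then turn punctuation and underscores into spaces.
    words = re.sub(r"[\W_]+", " ", name.lower()).split()
    # Drop a leading article.
    if words and words[0] == "the":
        words = words[1:]
    # Drop trailing legal suffixes, but never the last remaining word.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

def clean_date_field(raw: str) -> str:
    """Remove stray characters from a date string, keeping digits and common separators."""
    return re.sub(r"[^0-9/\-:. ]", "", raw).strip()

if __name__ == "__main__":
    for variant in ("Boeing", "Boeing Inc.,", "THE BOEING COMPANY"):
        print(variant, "->", normalize_company(variant))
    print(clean_date_field("#2015-03-14*"))
```

Grouping rows by the normalized key rather than the raw name is what lets the variants be counted as one customer.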
  6. Algorithms
     1. J48 Decision Tree
        ◦ Attributes selected: Escalation, Expectation, Modules, Severity.
        ◦ [Chart: J48 Decision Tree, correctly vs incorrectly classified instances; values shown: 20.779 and 70.22]
     2. Naïve Bayes (RED & YELLOW corpus)
        a. Attributes selected: Escalation, Expectation, Modules, Severity.
        b. Probability distribution for:
           i. RED escalation: 0.242 (24.2%)
           ii. YELLOW escalation: 0.758 (75.8%)
           iii. When the escalation is RED, it is more likely that the Severity is URGENT, with probability 0.449 (44.9%)
           iv. When the escalation is YELLOW, it is more likely that the Severity is HIGH, with probability 0.634 (63.4%)
     3. Simple K-Means method
        a. Cluster 1 formed: YELLOW, Investigate Issue & Hotfix required, Installation, High
        b. Cluster 2 formed: RED, Investigate Issue, Installation, High
     Motivation for textual analysis: the discussion between the client and the developer is captured in the ‘Comments’ attribute of the Incidents Database. Analyzing it can unearth additional information about the defects (viz. what triggered the escalation, the initial escalation of a defect, the nature of the client, etc.). This led to the use of R for text mining.
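The Naïve Bayes figures above are class priors and per-class conditional probabilities. The sketch below shows how such estimates can be computed by plain relative frequency (no smoothing) and combined into a posterior. The four toy records are invented for illustration and do not come from the Incidents Database; the function names are likewise hypothetical.

```python
from collections import Counter, defaultdict

def class_priors(labels):
    """P(class), estimated as the relative frequency of each label."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}

def conditional(records, labels, attribute):
    """P(attribute = value | class), estimated by relative frequency."""
    per_class = defaultdict(Counter)
    for record, label in zip(records, labels):
        per_class[label][record[attribute]] += 1
    return {
        c: {v: n / sum(cnt.values()) for v, n in cnt.items()}
        for c, cnt in per_class.items()
    }

def posterior(record, records, labels, attributes):
    """Naive Bayes posterior: prior times the product of per-attribute conditionals."""
    scores = {}
    for c, p in class_priors(labels).items():
        for a in attributes:
            p *= conditional(records, labels, a)[c].get(record[a], 0.0)
        scores[c] = p
    total = sum(scores.values())
    return {c: (s / total if total else 0.0) for c, s in scores.items()}

if __name__ == "__main__":
    # Toy data, invented for this example only.
    records = [
        {"Severity": "URGENT", "Modules": "Installation"},
        {"Severity": "HIGH", "Modules": "Installation"},
        {"Severity": "HIGH", "Modules": "Perf"},
        {"Severity": "URGENT", "Modules": "Perf"},
    ]
    labels = ["RED", "YELLOW", "YELLOW", "YELLOW"]
    print(class_priors(labels))
    print(conditional(records, labels, "Severity"))
    print(posterior(records[0], records, labels, ["Severity", "Modules"]))
```

Without smoothing, a value never seen with a class gives that class a zero score, which is why practical implementations usually add a small count to every cell.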
  7. Text Mining using R
     Why R over NLTK (Python)?
     ◦ Easy to code, abundant packages
     ◦ Faster pre-processing of the text
     Workflow:
     ◦ Mine the e-mail dump
     ◦ Create the corpus (RED, YELLOW & GREEN)
     ◦ Pre-process the text (remove punctuation, stop words, numbers, noise)
     ◦ Apply the ‘tm’ package for text mining the corpus
     ◦ Extract graphs and word clouds of the trigger points which are causing escalations
  8. Results from Text mining
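The project did this step in R with the ‘tm’ package. Purely to illustrate the pre-processing and frequency-counting stages, here is a small Python stand-in using only the standard library. The stop-word list is a tiny hypothetical sample and the function names are assumptions, not part of the original work.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real run would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for", "on", "this", "we", "it"}

def preprocess(text):
    """Lowercase, strip numbers and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)         # remove numbers
    text = re.sub(r"[^\w\s]|_", " ", text)   # remove punctuation
    return [word for word in text.split() if word not in STOP_WORDS]

def term_frequencies(documents):
    """Count how often each remaining term occurs across a corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts

if __name__ == "__main__":
    corpus = ["Please escalate: the agent crashed 3 times!", "Please help with installation"]
    print(term_frequencies(corpus).most_common(3))
```

Running this once per corpus (RED, YELLOW, GREEN) yields the per-corpus term counts from which frequency plots or word clouds can be drawn.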
  9. Final escalation state = GREEN; observations made prior to RED
     [Figure annotations: most frequently used words; the affected module]
  10. Final escalation state = GREEN; observations made prior to YELLOW
      [Figure annotations: aiding words / prefix-postfix; most frequently used words; words with highest frequency mined]
  11. Final escalation state = YELLOW; observations made prior to RED (only 4 cases)
      [Figure annotation: developer who is associated with the bug/incident]
  12. Final escalation state = RED; observations made prior to RED (incidents jumped to RED from YELLOW state)
      [Figure annotations: most frequently used words; the affected module]
  13. Observations made on RED corpus (the whole RED escalated dump)
      The term “escalation” used along with “please” and “support” indicates that the escalation is RED or will get converted to RED.
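One simple way to turn this observation into an alert is a co-occurrence check on each comment. The sketch below is an assumption about how such a rule might look, not the project's method; the function name is hypothetical and it matches exact words only (so "escalations" would not count).

```python
import re

# Terms the slide reports as co-occurring in the RED corpus.
RED_TRIGGER_TERMS = ("escalation", "please", "support")

def has_red_trigger_terms(comment: str) -> bool:
    """Return True when all three trigger terms appear as whole words in the comment."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return all(term in words for term in RED_TRIGGER_TERMS)

if __name__ == "__main__":
    print(has_red_trigger_terms("Please treat this as an escalation, support has been slow."))
    print(has_red_trigger_terms("Please check the logs."))
```

A rule like this only flags candidates for review; the slides report a tendency in the corpus, not a guaranteed outcome.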
  14. Observations made on GREEN corpus (the whole GREEN escalated dump)
      The use of “please” is not frequent, which in turn indicates that there are not many RED escalations in the incident history.
      [Chart: Escalation count on the defect dump. Green: 3831, Red: 125, Yellow: 329]
  15. Other observations made on Incidents
      For RED cases, the average number of days for a case to get escalated:
      ◦ Where SEVERITY is URGENT: 13.56 days
      ◦ Where SEVERITY is HIGH: 25.29 days
      ◦ Where SEVERITY is MEDIUM: 19.66 days
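Averages like these come from grouping cases by severity and taking the mean of the days between opening and escalation. A minimal sketch, assuming each case is available as a (severity, opened, escalated) tuple of dates; that record layout and the sample dates are hypothetical.

```python
from collections import defaultdict
from datetime import date

def mean_days_to_escalation(cases):
    """cases: iterable of (severity, opened, escalated), where the last two are date objects."""
    totals = defaultdict(lambda: [0, 0])  # severity -> [sum of days, number of cases]
    for severity, opened, escalated in cases:
        totals[severity][0] += (escalated - opened).days
        totals[severity][1] += 1
    return {severity: days / count for severity, (days, count) in totals.items()}

if __name__ == "__main__":
    # Invented sample cases, for illustration only.
    sample = [
        ("URGENT", date(2015, 1, 1), date(2015, 1, 11)),
        ("URGENT", date(2015, 2, 1), date(2015, 2, 21)),
        ("HIGH", date(2015, 1, 1), date(2015, 1, 31)),
    ]
    print(mean_days_to_escalation(sample))
```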
  16. Analyzing Incidents: Customers vs Escalations
      RHEINENERGIE, HEWLETT PACKARD, DEUTSCHE BUNDESBANK: highest number of RED escalations
      [Chart: RED escalation count per customer; the three highest bars are 4, 3 and 3, all others are 2 or 1]
  17. Analyzing Incidents: Modules vs Escalations
      Total RED escalations: 125/6433. The chart shows the modules with the highest number of escalations.
      Ops - Action Agent (opcacta) & Installation: highest number of RED escalations
      [Chart: RED escalation count per module; the two highest bars are 31 and 12]
  18. Analyzing Incidents: S/w release vs Escalations
      S/w release    Count of ESCALATION
      8.6            28
      11.14          12
      11.02          12
      11.03          11
      11             11
      11.11          10
      11.13          9
      8.60.501       7
      11.12          6
      11.04          6
      11.01          6
      11.1           3
      unknown        3
      8.53 patch     1
      Grand Total    125
  19. Analyzing Incidents: OS vs Escalations
      [Chart: RED escalation count per OS; the highest bar is 83, all others are 4 or fewer]
  20. Analyzing Incidents: Developer vs Escalations
      prasad.m.k_hp.com: handled a high number of escalations
      [Chart: escalation count per developer; the highest bar is 29]
  21. Analyzing CR data
      Total escalations in CR:
      ESCALATION     Count
      N              10219
      Showstopper    75
      Y              93
      Grand Total    10387
      Note: For Defects or CRs (QCCR), Showstopper is marked for defects that are must-fixes or that need an immediate fix for a release.
  22. Analyzing CRs: Customers vs Escalations
      TATA CONSULTANCY SERVICES LTD: highest “Showstopper” escalations
      Allegis, NORTHROP GRUMMAN, PepperWeed: highest escalations
      [Chart: “Showstopper” and “Y” escalation counts per customer; the highest bar is 2, all others are 1]
  23. Analyzing CRs: Modules vs Escalations
      Ops - Monitor Agent (opcmona) & Installation: highest “Showstopper” escalations
      Installation & Lcore - Other: highest escalations
      [Chart: “Showstopper” and “Y” escalation counts per module]
  24. Analyzing CRs: S/w release vs Escalations
      Release 11: highest number of “Showstopper” and “Y” escalations
      [Chart: “Showstopper” and “Y” escalation counts per software release]
  25. Analyzing CRs: OS vs Escalations
      Windows (version number not clear): highest number of escalations, both “Showstopper” and “Y”
      [Chart: “Showstopper” and “Y” escalation counts per OS]
      Note: Submitters of CRs tend to fill in the OS field as they see fit. Some choose the exact version where the issue was seen or reported; others choose only a high-level value. No strict rules were observed.
  26. Analyzing CRs: Developer vs Escalations
      swati.sinha_hp.com: handled the highest number of Showstopper escalations
      umesh.sharoff_hp.com: handled the highest number of escalations
      [Chart: “Showstopper” and “Y” escalation counts per developer]
  27. Company behavior analysis: RHEINENERGIE (had maximum RED escalations)
      28 incident cases. Patterns observed:
      ◦ 6 RED escalations (6/28); 21.43% chance that an incident logged will be a RED escalation
      ◦ Most reported modules:
        ◦ Ops - Monitor Agent (opcmona) (7 nos.); 3 of them were RED escalated
        ◦ Installation (6 nos.)
        ◦ Perf - Collector (3 nos.)
      ◦ Average number of days to handle a single incident: 73.5 days
      ◦ Number of incidents which moved to CR: 15; 53.57% of the incidents move to CRs
        ◦ All 6 RED escalations moved to CR
        ◦ 8 GREEN escalations moved to CR
        ◦ 1 YELLOW escalation moved to CR
  28. Company behavior analysis: APPLE INC
      27 incident cases. Patterns observed:
      ◦ No RED escalations ever
      ◦ Mostly contains GREEN escalations (19/27); 70.37% chance that an incident logged will be a GREEN escalation
      ◦ Most reported modules:
        ◦ Ops Monitor Agent (4 nos.)
        ◦ Perf Collector (3 nos.)
        ◦ Installation, Ops - Action Agent, Ops - Ops Agent, Perf Other (2 nos. each)
      ◦ Average number of days to handle a single incident: 463.777 days
      ◦ Number of incidents which moved to CR: 10; 37.04% of the incidents move to CRs
  29. Company behavior analysis: BOEING
      33 incident cases. Patterns observed:
      ◦ 1 RED escalation
      ◦ Mostly contains GREEN escalations (31/33); 93.94% chance that an incident logged will be a GREEN escalation
      ◦ Most reported modules:
        ◦ Installation (7 nos.)
        ◦ Perf Collector, Other (5 nos.)
        ◦ Perf GlancePlus (4 nos.)
        ◦ Perf ARM (RED escalation); 3% chance that an incident will be a RED escalation
      ◦ Average number of days to handle a single incident: 399.322 days
      ◦ Number of incidents which moved to CR: 22; 66.67% of the incidents move to CRs
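The "chance" figures in the three company profiles are simple shares of a company's incident count. A small sketch of that arithmetic, using the counts stated on the slides; the helper name is hypothetical.

```python
def share_percent(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`, rounded to two decimals."""
    return round(100 * part / total, 2)

if __name__ == "__main__":
    # Counts as stated on the slides.
    print("RHEINENERGIE RED share:   ", share_percent(6, 28))
    print("RHEINENERGIE moved to CR: ", share_percent(15, 28))
    print("APPLE INC GREEN share:    ", share_percent(19, 27))
    print("APPLE INC moved to CR:    ", share_percent(10, 27))
    print("BOEING GREEN share:       ", share_percent(31, 33))
    print("BOEING moved to CR:       ", share_percent(22, 33))
```

With only about 30 incidents per company, each such percentage moves by several points when a single incident changes category, so these shares are rough indicators rather than stable probabilities.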
  30. Other observations made on Incidents
      DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE do not match.
      [Chart: DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE plotted for incidents 1 to 121; y-axis from -400 to 500]
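A mismatch like this can be listed row by row before deciding which field to trust. The sketch below assumes each incident is a dict holding both fields as numbers and that the two are expected to agree; the function name, the tolerance parameter and the sample rows are hypothetical.

```python
def find_mismatches(rows, tolerance=0):
    """Return (index, a, b) for rows whose two duration fields differ by more than `tolerance`."""
    mismatches = []
    for index, row in enumerate(rows):
        a = row["DIFFERENCE_INITIAL_CLOSED"]
        b = row["DAYS_SUPPORT_TO_CPE"]
        if abs(a - b) > tolerance:
            mismatches.append((index, a, b))
    return mismatches

if __name__ == "__main__":
    # Invented rows, for illustration only.
    rows = [
        {"DIFFERENCE_INITIAL_CLOSED": 12, "DAYS_SUPPORT_TO_CPE": 12},
        {"DIFFERENCE_INITIAL_CLOSED": 40, "DAYS_SUPPORT_TO_CPE": -5},
        {"DIFFERENCE_INITIAL_CLOSED": 7, "DAYS_SUPPORT_TO_CPE": 9},
    ]
    print(find_mismatches(rows, tolerance=1))
```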
