Adaptive Bug Prediction By Analyzing Software History

Dissertation Defense: Presentation Transcript

  • Adaptive Bug Prediction By Analyzing Project History Dissertation Defense, Aug 21, 2006 Sunghun Kim <hunkim@cs.ucsc.edu> University of California, Santa Cruz
  • Motivation
    • Finding bugs is difficult
      • Presenting bug prediction algorithms
    • Project history (SCM) is available
      • Leveraging project history for bug prediction
    • Each software project has unique bug properties
      • Proposing adaptive rather than static bug prediction approaches
  • Motivation - Finding bugs is difficult
    • “Software bugs are costing the US economy an estimated $59.6 billion each year” – US federal study by RTI (2002)
    • “50% of development time is typically spent on testing and debugging” – (Beizer 1990)
    • “Debugging, testing, and verification costs 50%-70% of the total software development cost” – (Hailpern and Santhanam 2002)
  • Motivation - Finding bugs is difficult
    • Seven debugging steps (Zeller 2006)
      • Track the problem
      • Reproduce the failure
      • Automate and simplify the test case
      • Find possible infection origins
      • Focus on the most likely origins
      • Isolate the infection chain
      • Correct the defect (bug)
  • Motivation - Project history is available
    • Software projects today record their development history using Software Configuration Management (SCM) tools
    • SCM records bug occurrence and the corresponding bug fix experiences
    • SCM data is not fully leveraged for bug prediction
  • Motivation – Unique bug properties
    • A bug prediction model cannot be generalized
    • “Predictors are accurate only when obtained from the same or similar projects.” (Nagappan, Ball et al. 2006)
    • Horizontal and vertical bugs (Kim, Pan et al. 2006)
    A horizontal bug:
      if (bar==null) { System.out.println(bar.foo); }
    A vertical bug (JEditTextArea.java):
      at transaction 86:   - setSelectedText(" ");   + insertTab();
      at transaction 114:  - setSelectedText(" ");   + insertTab();
  • “Adaptive bug predictors by leveraging software history”
    • Bug Cache
    • Change Classification
  • Talk Overview
    • Terminology
    • Creating corpus (training and test data sets)
    • Bug cache/Change classification
      • Basic idea
      • Hypothesis/Goals
      • Algorithm
      • Evaluation methods
      • Experiment results
      • Summary and future work
    • Related work
    • Conclusions and contributions
  • Terminology – What is a bug (Zeller 2006)?
    • This pointer, being null, is a bug
      • An incorrect program state
    • This software crashes; this is a bug
      • An incorrect program execution
    • This line 11 is buggy
      • An incorrect program code
    A bug:  if (bar==null) { System.out.println(bar.foo); }    A fix:  if (bar!=null) { System.out.println(bar.foo); }
  • Terminology
    • File change (change) : Software development proceeds by changing files that are usually stored in an SCM system. A file change is an instance of a file modification stored in an SCM system.
    • Version : During software evolution, files are changing and have many instances. The term version is used to identify different instances of the same file after a change. For example, ‘ foo ’ file version 1 and ‘ foo ’ file version 2 indicate two different instances of the ‘ foo ’ file after changes.
    • Commit : Submitting changes to an SCM system is defined as a commit. A commit often includes more than one file change. Developers usually write a change log message when they commit.
    • Change log : When developers perform a commit, they write a brief message that describes the purpose of the change, including which files are being modified and why. Each commit has a change log message, and the complete set of change log messages is known as the files’ change log. Change log messages can be analyzed to characterize the type of change that was made, such as a bug fix change.
    • Revision (Transaction) : Groups of file changes in one commit are called a revision or transaction. Revisions are usually in chronological order; for example, revision 1 is a change group prior to revision 2. Some SCM systems, such as CVS, use the term ‘revision’ to denote a version of a file; for example, each changed file in CVS has its own revisions. Other SCM systems, such as Subversion, use ‘revision’ to denote a group of changes in one commit. In this dissertation, a revision or a transaction is a group of file changes in one commit.
    • Bug (defect) : A bug is zero or more lines of source code whose inclusion (or omission) causes anomalous software behavior. The terms defect and fault are also widely used to describe anomalous software behavior [20, 66]. The terms bug, defect, and fault have slightly different meanings when used in different domains or by different researchers. In this dissertation, the term bug describes the actual source lines that cause anomalous software behavior. The following sentence shows the context and meaning of the term bug as used in this dissertation:
    • “A bug in line #39 in Foo.c causes this application to hang”
  • Terminology
    • Fix : When developers locate bugs, they are required to repair the bug to remove the anomalous software behavior. Developers remove buggy source code and replace it with correct code. The act of removing bug(s) is called a fix, typically enacted by changing buggy source code.
    • Fix change : A change that includes one or more fixes is called a fix change. Since a fix change removes buggy code and replaces it with correct code, the removed code is considered to be buggy code.
    • Bug introducing change (buggy change) : The change that initially introduces a bug is called a bug introducing change.
    • Change delta: In a file change, the changed lines are called change delta. The deleted lines in the old file are called deleted delta, and added lines in the new file are called added delta.
    • Hunk : In a file change, contiguous changed lines in a single region are called a hunk. A file change delta may include more than one hunk, since the changed lines in a delta may be sparse.
    • Bug hunk : If a change is a fix, the deleted or modified part of source code in the old file is considered buggy and is called a bug hunk.
    • Fix hunk : If a change is a fix, the added or modified part of the source code in the new file is considered a fix and is called a fix hunk.
    • Feature (factor) : In this dissertation, a feature means a property of a software change and is used for classifying changes. For example, the author, commit date, and a keyword in the deleted delta are features of changes. In the software engineering literature, the term feature usually means a distinct software functionality, but in the machine learning literature, a feature is a factor or properties of instances in a training or testing corpus [2].
    • Feature engineering : A collection of techniques to select good (predictive) features and extract them from change instances.
  • Commits, Transactions & Configurations (diagram): individual CVS file commits, each with a log message (“Added feature X”, “Fixed null ptr bug”, “Modified button text”, “Added feature Y”), are grouped into transactions and configurations
  • Kenyon Processing (diagram): Extract: automated configuration extraction from the SCM repository or filesystem; Compute: fact extraction (metrics, static analysis); Save: persist gathered metrics & facts into the Kenyon repository (RDBMS/Hibernate); Analyze: query the DB and add new facts with analysis software (e.g., IVA)
  • Creating Corpus - Retrieving bug fix changes
    • As developers make changes, they record a reason along with the change (in the change log message)
    • When developers fix a bug in the software, they tend to record log messages with some variation of the words “fixed” or “bug”
      • “Fixed null pointer bug”
    • It is possible to mine the change history of a software project to uncover these bug-fix changes (a minimal sketch of this keyword heuristic follows this list)
    • That is, we retrospectively recover those changes that developers have marked as containing a bug fix
      • We assume they are not lying
    • This heuristic is widely used in the literature (Mockus and Votta 2000; Cubranic and Murphy 2003; Fischer, Pinzger et al. 2003)
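The keyword heuristic above is simple enough to sketch. The following is a minimal, hypothetical illustration; the CommitLog record and the exact keyword/regular-expression set are assumptions made for this sketch, and the dissertation's actual keyword lists and bug-id matching may differ.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the log-message heuristic for retrieving bug-fix
// changes. A commit is flagged as a fix if its log message mentions a
// variation of "fix" or "bug", or references an issue id such as "#567".
public class FixChangeFinder {

    record CommitLog(String revision, String message) {}

    private static final Pattern FIX_KEYWORDS =
            Pattern.compile("\\b(fix(e[sd])?|bugs?|defect|patch)\\b|#\\d+",
                            Pattern.CASE_INSENSITIVE);

    static boolean looksLikeFix(CommitLog log) {
        return FIX_KEYWORDS.matcher(log.message()).find();
    }

    public static void main(String[] args) {
        List<CommitLog> history = List.of(
                new CommitLog("r85", "Added feature X"),
                new CommitLog("r86", "Fixed null ptr bug"),
                new CommitLog("r114", "Bug #567 fixed"));

        for (CommitLog log : history) {
            if (looksLikeFix(log)) {
                System.out.println(log.revision() + " is a bug-fix change");
            }
        }
    }
}
```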
  • Creating Corpus – Identifying bug-introducing changes (diagrams): in the development history of foo.java, the SCM log message “Bug #567 fixed” marks a bug-fix change, made after Bug #567 was entered into the issue tracking system (the bug is finally observed and recorded)
  • Tracing back from the lines removed by the fix identifies the earlier software change that introduced the bug; the revisions of Foo (rev 1 through rev 6) that introduced those lines are labeled “buggy”, and the remaining revisions are labeled “clean” (a small sketch of this back-tracing step follows)
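To make the back-tracing step concrete, here is a minimal sketch. It assumes a hypothetical per-line blame map (the kind of information cvs annotate or svn blame would provide) instead of a real SCM API; the actual identification algorithms (Śliwerski, Zimmermann et al. 2005; Kim, Zimmermann et al. 2006) handle diffs, renames, and many more corner cases.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: given the line numbers deleted by a bug-fix change,
// the revisions that last modified those lines are labeled bug-introducing
// ("buggy"); all other revisions of the file remain "clean".
public class BugIntroducingFinder {

    static Set<String> buggyRevisions(Map<Integer, String> blameOfOldVersion,
                                      Set<Integer> linesDeletedByFix) {
        Set<String> buggy = new HashSet<>();
        for (int line : linesDeletedByFix) {
            String rev = blameOfOldVersion.get(line);
            if (rev != null) {
                buggy.add(rev);   // this revision introduced a line the fix removed
            }
        }
        return buggy;
    }

    public static void main(String[] args) {
        // foo.java just before the fix: line 11 was last changed in rev 3.
        Map<Integer, String> blame = Map.of(10, "rev 2", 11, "rev 3", 12, "rev 5");
        Set<Integer> deletedByFix = Set.of(11);
        System.out.println("Bug-introducing: " + buggyRevisions(blame, deletedByFix));
    }
}
```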
  • Creating Corpus – Analyzed projects
    Project                Language        Software type               SCM
    Apache HTTP 1.3 (A1)   C               HTTP server                 SVN
    Bugzilla (BUG)         Perl            Bug tracker                 CVS
    Columba (COL)          Java            Mail client                 CVS
    Gaim (GAI)             C/C++           Instant messenger           CVS
    GForge (GFO)           PHP             Collaborative development   CVS
    Jedit (JED)            Java            Editor                      CVS
    Mozilla (MOZ)          C/C++/JS/XML    Web browser                 CVS
    Eclipse JDT (ECL)      Java            Java Development/IDE        CVS
    Plone (PLO)            Python          Content management          SVN
    PostgreSQL (POS)       C/C++           DBMS                        CVS
    Scarab (SCA)           Java            Bug tracker                 SVN
    Subversion (SVN)       C               SCM                         SVN
  • Bug Cache
  • Basic idea
    • Set up a cache (a list)
    • Put software entity (file/function/method) names that likely have future bugs into the cache
    • The entity list in the cache is used for bug prediction
    Diagram: entities 1-9 make up the entire software; entities 1 and 9 are currently in the bug cache, which developers consult for bug prediction
  • Hypothesis
    • Bug occurrences are local, not random
    • A small cache (about 10% of entities) can predict most future bugs (about 70%)
  • Goal: Reduce Costs, Improve Quality
    • If we can predict the location of bugs before they occur:
      • Inspect only a part (10%) of software
      • Focus additional software QA efforts on those locations
        • Software inspections, additional test cases
      • Inform developers when they are working on “risky” software
        • Developer can create additional unit test cases
        • Reflect on common causes of error
  • Bug Localities
    • Temporal
      • If an entity introduced a bug recently, it will tend to introduce another bug soon
    • Changed entity
      • If an entity has changed recently, it will tend to introduce bugs soon
    • New Entity
      • If a new entity has been added recently, it will introduce bugs soon
    • Spatial
      • If an entity introduced a bug recently, “nearby” (in the sense of logical coupling) entities will tend to introduce new bugs soon
    • Update bug cache based on the bug localities
  • Bug Cache Operation/Evaluation
    • Initial pre-load of the bug cache
      • The largest entities (by LOC) in the initial project revision form the initial cache state
    • Compute hits and misses
      • Observe the locations of bug-introducing changes at revision n .
      • If a bug occurs in an entity that is currently in the bug cache (revision 1 to n-1), it is a cache hit. Otherwise it is a miss.
      • A cache hit can be viewed as a successful prediction of the bug.
  • Bug Cache Operation/Evaluation
    • For a cache miss
      • Fetch the entity (temporal locality) and nearby entities (spatial locality) into the cache
    • Pre-fetch
      • Add to the cache entities that, between revisions n-1 and n, have been:
        • Created (new entity locality)
        • Modified (changed-entity)
    • Cache replacement
      • Use a cache replacement policy to make room for new entries
  • Cache Replacement Policies
    • Least recently used (LRU)
      • Unloads the entity whose bug was least recently found (hit)
      • Intuition: bugs have strong temporal locality
    • Change count weighted (CHANGE)
      • Unloads entity with least number of total changes
      • Intuition: many changes yield lots of bugs
    • Bug count weighted (BUG)
      • Unloads entity with fewest number of total bugs
      • Intuition: a file/entity with many bugs will continue to have bugs
  • Cache Operation/Evaluation (diagram): for each bug-introducing change, the changed entity is checked against the bug cache; a miss fetches the entity into the cache (e.g., entities 1, 9, 2), a hit counts as a successful prediction, and when the cache is full the replacement policy evicts an entry to make room (e.g., when fetching entity 3). The bug cache is the bug-prone entity list. A minimal code sketch of this loop follows.
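A minimal sketch of the cache loop, assuming an LRU replacement policy via Java's LinkedHashMap; the entity names and the capacity are illustrative, and the CHANGE- and BUG-count-weighted policies from the previous slide would need a different eviction rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative bug cache: check each bug-introducing change against the
// cache, count hits (successful predictions) and misses, fetch on a miss,
// and evict the least-recently-hit entity when the cache is full (LRU).
public class BugCache {

    private final int capacity;   // e.g., ~10% of all entities
    private int hits, misses;

    // accessOrder=true makes iteration order "least recently accessed first",
    // so removeEldestEntry implements LRU eviction.
    private final LinkedHashMap<String, Boolean> cache =
            new LinkedHashMap<>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> e) {
                    return size() > capacity;
                }
            };

    BugCache(int capacity) { this.capacity = capacity; }

    // Called for every entity touched by a bug-introducing change.
    void onBugIntroducingChange(String entity) {
        if (cache.get(entity) != null) {       // get() refreshes LRU order on a hit
            hits++;                            // successful prediction
        } else {
            misses++;
            cache.put(entity, Boolean.TRUE);   // fetch (temporal locality)
            // pre-fetch of nearby / new / recently changed entities would go here
        }
    }

    double hitRate() { return hits + misses == 0 ? 0 : (double) hits / (hits + misses); }

    public static void main(String[] args) {
        BugCache cache = new BugCache(2);
        for (String entity : new String[] {"1", "9", "1", "2", "9", "2", "3"}) {
            cache.onBugIntroducingChange(entity);
        }
        System.out.printf("hit rate = %.2f%n", cache.hitRate());
    }
}
```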
  • Results - File Level Predictive accuracy at file level, 10% cache size
  • Results - Function/Method Level Predictive accuracy at function/method level, 10% cache size
  • Cached entity LOC / total LOC at a 10% cache size: about 16~33% of total LOC
  • Various Cache Size and Hit Rates (no pre-fetch)
  • Cache Replacement: at the file level, LRU or BUG; at the function/method level, mostly BUG
  • Summary and Future Work
    • A cache holding 10% of entities covers
      • 73-95% of future bugs at the file level
      • 49-68% of future bugs at the function/method level
    • Future work
      • Caches at different granularity levels simultaneously
      • IDE integration
  • Future Work – Example IDE integration
  • Change Classification
  • When do we introduce bugs? (development history of Foo: rev 1, rev 2, rev 3, rev 4, …)
  • Basic Idea: given Foo rev 1 through rev n-1, predict whether the next change (rev n) introduces a bug
  • Hypothesis
    • Each file change is classifiable as buggy or clean
      • Clean changes and buggy changes have their own unique properties (features)
      • Machine learning algorithms can be used to
        • learn unique features from changes
        • classify changes
  • Goal: Reduce Costs, Improve Quality
    • If we can predict whether a given change has a bug:
      • Focus additional software quality assurance (QA) efforts on those changes
        • Software inspections, additional test cases
      • Inform developers as they check-in their code that they have likely just made an error
      • Inform developers even during an edit session
  • Change Classification
    • Extract changes from SCM and label each change as buggy or clean (creating corpus)
    • Extract features from each change
    • Train a classifier using machine learning algorithms and labeled data set
      • Naïve Bayes
      • Support Vector Machine (SVM)
    • Classify unknown changes (test set) and evaluate the classifier
  • Black-box View of Machine Learning
    • Supervised machine learning
    • Input : Training data set (instances)
      • Label
      • Set of features ( F)
    • Output : predicted label using set of features
    Diagram: train using I = {label, F}; given an unknown F, predict its label
  • Male/Female Example
    label: M/F    features: height, weight
      male     6.5   160
      female   5.6   120
      female   5.8   130
      male     6.0   170
      ?        6.1   150   (unknown instance to classify)
    Diagram: plotting weight against height separates the male and female training instances, and the unknown instance is labeled by the region it falls in; a borderline instance (e.g., height 6.1, weight 165) is harder to classify. A small nearest-neighbor sketch of this example follows.
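As a concrete version of the black-box view, here is a tiny self-contained 1-nearest-neighbor classifier on the toy data above. The distance scaling is an arbitrary choice for this sketch; the dissertation uses Naive Bayes and SVM classifiers, not nearest neighbor.

```java
// Learn from labeled (height, weight) instances, then predict the label of
// an unseen instance with a 1-nearest-neighbor rule.
public class MaleFemaleExample {

    record Person(double height, double weight, String label) {}

    static String predict(Person[] training, double height, double weight) {
        Person nearest = null;
        double best = Double.MAX_VALUE;
        for (Person p : training) {
            // Weight is divided by 10 so both features contribute comparably.
            double d = Math.hypot(p.height() - height, (p.weight() - weight) / 10.0);
            if (d < best) { best = d; nearest = p; }
        }
        return nearest.label();
    }

    public static void main(String[] args) {
        Person[] training = {
            new Person(6.5, 160, "male"),
            new Person(5.6, 120, "female"),
            new Person(5.8, 130, "female"),
            new Person(6.0, 170, "male"),
        };
        // The unlabeled instance from the slide: height 6.1, weight 150.
        System.out.println(predict(training, 6.1, 150));   // prints "male" here
    }
}
```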
  • Extracting Features (example change): /src/jdt/core/ast/ASTParser.java, Rev 10 → Rev 11; Author: hunkim; Check-in time: March 23, 2006 11:30 AM; Log message: “Added foo function”
  • Extracting Features
    • Use everything except for the kitchen sink (a small feature-extraction sketch follows this list)
      • Every word in the program text (rev. n ) and delta (change between n and n+1 )
        • Variables, program keywords, numbers, operands
        • Bag of words
      • File names/directory names
      • Complexity metrics
        • All metrics that Understand for C/Java can compute
      • All words in SCM change logs
      • Change metadata
        • Author, date/time of change
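A minimal sketch of turning one change into bag-of-words features; the input strings are invented for illustration, and complexity metrics plus change metadata (author, date/time) would be appended as additional features.

```java
import java.util.Map;
import java.util.TreeMap;

// Every token in the added delta, deleted delta, change log message, and
// file path becomes a (prefixed) bag-of-words feature with its count.
public class ChangeFeatureExtractor {

    static Map<String, Integer> bagOfWords(String prefix, String text) {
        Map<String, Integer> bag = new TreeMap<>();
        for (String token : text.split("[^A-Za-z0-9_]+")) {
            if (!token.isEmpty()) {
                bag.merge(prefix + ":" + token.toLowerCase(), 1, Integer::sum);
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        Map<String, Integer> features = new TreeMap<>();
        features.putAll(bagOfWords("add", "insertTab();"));                // added delta
        features.putAll(bagOfWords("del", "setSelectedText(\" \");"));     // deleted delta
        features.putAll(bagOfWords("log", "Fixed null pointer bug"));      // change log
        features.putAll(bagOfWords("path", "/src/jdt/core/ast/ASTParser.java"));
        features.forEach((f, n) -> System.out.println(f + " = " + n));
    }
}
```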
  • Projects Analyzed
    Project   Revisions   Period            Clean changes   Buggy changes   % buggy   # of features
    A1        500-1000    10/1996-01/1997     566             134            23.6      11,445
    BUG       500-1000    03/2000-08/2001     149             417            73.7      10,148
    COL       500-1000    05/2003-09/2003   1,270             530            29.4      17,411
    GAI       500-1000    08/2000-03/2001     742             451            37.8       9,281
    GFO       500-1000    01/2003-03/2004     339             334            49.6       8,996
    JED       500-750     08/2002-03/2003     626             377            37.5      13,879
    MOZ       500-1000    08/2003-08/2004     395             169            29.9      13,648
    ECL       500-750     10/2001-11/2001     592              67            10.1      16,192
    PLO       500-1000    07/2002-02/2003     457             112            19.6       6,127
    POS       500-1000    11/1996-02/1997     853             273            24.2      23,247
    SCA       500-1000    06/2001-08/2001     358             366            50.5       5,710
    SVN       500-1000    01/2002-03/2002   1,925             288            13.0      14,856
    Total     N/A         N/A               8,272           3,518            33.2*    150,940
  • Extracting Features
  • Training Classifiers
    • Use common machine learning techniques
      • Naïve Bayes
      • Support Vector Machine (SVM)
      • Implementations from the WEKA toolkit
        • A standard toolkit containing reasonable implementations of many machine learning techniques (a minimal training sketch follows)
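A minimal training sketch using the WEKA toolkit; this assumes the WEKA 3.7+ Java API, the feature values and labels are invented, and weka.classifiers.functions.SMO could be substituted for the SVM experiments.

```java
import java.util.ArrayList;
import java.util.List;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Train a Naive Bayes classifier on two made-up bag-of-words count features
// and a buggy/clean class label, then classify an unlabeled change.
public class TrainChangeClassifier {

    public static void main(String[] args) throws Exception {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("log:fixed"));        // count of "fixed" in the log
        attrs.add(new Attribute("add:inserttab"));    // count in the added delta
        attrs.add(new Attribute("class", List.of("clean", "buggy")));

        Instances data = new Instances("changes", attrs, 0);
        data.setClassIndex(data.numAttributes() - 1);

        // Three labeled training changes (feature counts are illustrative).
        data.add(new DenseInstance(1.0, new double[] {1, 0, 1}));  // buggy
        data.add(new DenseInstance(1.0, new double[] {0, 1, 0}));  // clean
        data.add(new DenseInstance(1.0, new double[] {0, 0, 0}));  // clean

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);

        // Classify an unlabeled change (the class value slot is ignored).
        DenseInstance unknown = new DenseInstance(1.0, new double[] {1, 0, 0});
        unknown.setDataset(data);
        double idx = nb.classifyInstance(unknown);
        System.out.println("predicted: " + data.classAttribute().value((int) idx));
    }
}
```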
  • 10-fold cross validation: the corpus is split into 10 folds (1-10); each fold serves once as the test set while the remaining nine folds form the training set, and results are averaged over the 10 runs (a cross-validation sketch follows)
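And a sketch of the 10-fold cross-validation step with WEKA's Evaluation class (again assuming the WEKA 3.7+ API); "changes.arff" is a hypothetical file holding the labeled change corpus, with the buggy/clean class as the last attribute.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Evaluate a change classifier with stratified 10-fold cross-validation.
public class CrossValidateChanges {

    public static void main(String[] args) throws Exception {
        Instances corpus = DataSource.read("changes.arff");   // hypothetical corpus file
        corpus.setClassIndex(corpus.numAttributes() - 1);

        Evaluation eval = new Evaluation(corpus);
        // Train on 9 folds, test on the held-out fold, repeated 10 times.
        eval.crossValidateModel(new NaiveBayes(), corpus, 10, new Random(1));

        int buggy = corpus.classAttribute().indexOfValue("buggy");
        System.out.printf("accuracy  = %.1f%%%n", eval.pctCorrect());
        System.out.printf("recall    = %.2f%n", eval.recall(buggy));
        System.out.printf("precision = %.2f%n", eval.precision(buggy));
    }
}
```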
  • Evaluation
    • 4 possible outcomes from using a classifier
      • Classifying a buggy change as buggy ( n b->b )
      • Classifying a buggy change as clean ( n b->c )
      • Classifying a clean change as clean ( n c->c )
      • Classifying a clean change as buggy ( n c->b )
    • Accuracy: (n b->b + n c->c) / (n b->b + n b->c + n c->c + n c->b)
    • Recall: n b->b / (n b->b + n b->c)
    • Precision: n b->b / (n b->b + n c->b)
    (a small worked example follows)
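A small worked check of these formulas, using illustrative confusion counts rather than results from the dissertation.

```java
// Compute accuracy, recall, and precision from the four outcome counts.
public class ChangeClassifierMetrics {

    public static void main(String[] args) {
        int nBB = 60;    // buggy change classified as buggy
        int nBC = 40;    // buggy change classified as clean
        int nCC = 260;   // clean change classified as clean
        int nCB = 60;    // clean change classified as buggy

        double accuracy  = (double) (nBB + nCC) / (nBB + nBC + nCC + nCB);
        double recall    = (double) nBB / (nBB + nBC);
        double precision = (double) nBB / (nBB + nCB);

        System.out.printf("accuracy=%.2f recall=%.2f precision=%.2f%n",
                          accuracy, recall, precision);   // 0.76, 0.60, 0.50 here
    }
}
```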
  • Bug Predictive Accuracy
    • Accuracy is mostly in the 70% range, with 3 projects better and 1 worse
    • Recall and precision are generally in the 50-60% range
  • Eclipse/Mozilla Accuracy by Feature Combination (A = added delta, D = deleted delta, F = directory/file name, L = change log, N = new source code, M = metadata, C = complexity metrics, ~X = all but X)
  • Average Accuracy by Feature Combination (same legend)
  • Best Accuracy Feature Combination (same legend)
  • Important Features for Some Projects
  • Why is ‘author’ not a significant feature? One revision (one author) can contain both buggy and clean changes at the same time
  • Summary
    • Use machine learning techniques to analyze changes to a software system
    • After training, can predict whether a new change has a bug, or doesn’t have a bug
    • Accuracy is typically in the 70% range
    • Recall and precision are around 50-60%
    • Best features for classification vary by project
    • Classification accuracy/recall varies by machine learning algorithm
    • No cross-project model
  • Future Work
    • IDE Integration
      • Monitor changes and alert developers when the changed code is identified as a buggy change
    • SCM integration
      • After committing a change, send feedback to developers
    • Online learning
      • The current approach uses batch learning
      • Online learning is needed for practical use
    • Mining on important features
      • Find human-comprehensible bug patterns
  • Related Work
    • Bug Prediction
      • Identifying problematic entities
      • Predicting fault density
      • Leveraging project history
    • Source code classification, clustering, and association
      • Traceability
    • Text classification
  • Related Work – Bug Prediction
    • Identifying problematic entities
      • (Gyimothy, Ferenc et al. 2005) predict faulty classes
        • Mozilla project in each release (releases 1.0~1.6)
        • Using regression and machine learning algorithms
        • Their recall and precision are about 70%
        • Their approach is release-based and predicts at the class level
      • Top ten list (Hassan and Holt 2005)
      • Top 20% of problematic files (Ostrand, Weyuker et al. 2004)
        • Both approaches have lower accuracy
        • They use one factor at a time, while the bug cache uses various factors at the same time
  • Related Work – Bug Prediction
    • Predicting fault density
      • Predict fault density of each file or module
      • Software module fault density prediction (Graves, Karr et al. 2000)
        • using a weighted time damp model
      • Change type based prediction (Mockus and Weiss 2002)
      • Relative code change measurements (Nagappan and Ball 2005)
      • And many others….
    • Leveraging project history
      • Hipikat: building project memory (Cubranic, Murphy et al. 2005)
      • Project histories to improve existing bug finding tools (Williams and Hollingsworth 2005)
      • Mining history to discover common method usage patterns (Livshits and Zimmermann 2005)
  • Related Work – Code classification and association
    • Classifying software projects (Krovetz, Ugurel et al. 2003)
      • Into broad functional categories
      • Communications, databases, games, and math
      • We are classifying source code as buggy and clean
    • Associating source code with other artifacts (Marcus and Maletic 2003; Kuhn, Ducasse et al. 2005)
      • Source code and design documents
  • Related Work – Text classification
    • A well-studied area with a long research history (Sebastiani 2002)
    • Spam filtering (Zhang, Zhu et al. 2004)
    • Change classification borrows ideas from text classification
    • Feature engineering techniques are very different
      • plain text (or email) vs. source code with metadata
  • Threats to Validity
    • Systems examined might not be representative
      • Only analyzed 12 projects
    • Systems are all open source
      • Commercial products with stronger deadline pressure could lead to different buggy change patterns
    • Bug fix, bug-introducing data is incomplete
      • Change logs are incomplete and ambiguous
      • Too many bug ids in change logs for Bugzilla and Scarab
      • Fix, bug introducing algorithms have limitations (Śliwerski, Zimmermann et al. 2005; Kim, Zimmermann et al. 2006)
  • Conclusion
    • It is now possible:
      • To classify files as bug-prone or not-bug-prone with high accuracy
      • To classify individual file changes as bug-likely or not-bug-likely with very good accuracy
  • Contributions
    • Presented two adaptive bug prediction approaches
      • Bug cache and change classification
    • Leveraged bug-introducing changes
    • Discovered properties of bug occurrences – localities
    • Presented feature engineering techniques for software changes
    • Identified important features for change classification
    • Showed project history is a good data source for bug prediction
  • References
    • Cubranic, D. and G. C. Murphy (2003). Hipikat: Recommending pertinent software development artifacts . 25th International Conference on Software Engineering (ICSE 2003), Portland, Oregon.
    • Cubranic, D., G. C. Murphy, et al. (2005). “Hipikat: A Project Memory for Software Development.” IEEE Trans. Software Engineering 31(6): 446-465.
    • Fischer, M., M. Pinzger, et al. (2003). Populating a Release History Database from Version Control and Bug Tracking Systems . 19th International Conference on Software Maintenance (ICSM 2003), Amsterdam, The Netherlands.
    • Graves, T. L., A. F. Karr, et al. (2000). “Predicting Fault Incidence Using Software Change History.” IEEE Transactions on Software Engineering 26(7): 653-661.
    • Gyimothy, T., R. Ferenc, et al. (2005). “Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction.” IEEE Transactions on Software Engineering 31(10): 897-910.
    • Hassan, A. E. and R. C. Holt (2005). The Top Ten List: Dynamic Fault Prediction . 21st International Conference on Software Maintenance (ICSM 2005), Budapest, Hungary.
    • Kim, S., K. Pan, et al. (2006). Memories of Bug Fixes . the 2006 ACM SIGSOFT Foundations of Software Engineering (FSE 2006), Portland, Oregon.
  • References
    • Kim, S., T. Zimmermann, et al. (2006). Automatic Identification of Bug Introducing Changes . 21st IEEE/ACM International Conference on Automated Software Engineering (ASE 2006), Tokyo, Japan.
    • Krovetz, R., S. Ugurel, et al. (2003). Classification of Source Code Archives . the 26th International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada.
    • Kuhn, A., S. Ducasse, et al. (2005). Enriching Reverse Engineering with Semantic Clustering . 12th Working Conference on Reverse Engineering (WCRE 2005), Pittsburgh, Pennsylvania, USA.
    • Livshits, B. and T. Zimmermann (2005). DynaMine: Finding Common Error Patterns by Mining Software Revision Histories . the 2005 European Software Engineering Conference and 2005 Foundations of Software Engineering (ESEC/FSE 2005), Lisbon, Portugal.
    • Marcus, A. and J. I. Maletic (2003). Recovering Documentation-to-Source-Code Traceability Links using Latent Semantic Indexing . the 25th International Conference on Software Engineering, Portland, Oregon.
    • Mockus, A. and L. G. Votta (2000). Identifying Reasons for Software Changes Using Historic Databases . 16th International Conference on Software Maintenance (ICSM 2000), San Jose, California, USA.
  • References
    • Mockus, A. and D. M. Weiss (2002). “Predicting Risk of Software Changes.” Bell Labs Technical Journal 5(2): 169-180.
    • Nagappan, N. and T. Ball (2005). Use of Relative Code Churn Measures to Predict System Defect Density . 27th International Conference on Software Engineering (ICSE 2005), Saint Louis, Missouri, USA.
    • Nagappan, N., T. Ball, et al. (2006). Mining Metrics to Predict Component Failures . 28th International Conference on Software Engineering (ICSE 2006), Shanghai, China.
    • Ostrand, T. J., E. J. Weyuker, et al. (2004). Where the Bugs Are . 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, Boston, Massachusetts, USA.
    • Sebastiani, F. (2002). “Machine Learning in Automated Text Categorization.” ACM Computing Surveys 34(1): 1-47.
    • Śliwerski, J., T. Zimmermann, et al. (2005). When Do Changes Induce Fixes? Int'l Workshop on Mining Software Repositories (MSR 2005), Saint Louis, Missouri, USA.
    • Williams, C. C. and J. K. Hollingsworth (2005). “Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques.” IEEE Trans. Software Engineering 31(6): 466-480.
    • Zeller, A. (2006). Why Programs Fail , Elsevier.
    • Zhang, L., J. Zhu, et al. (2004). “An Evaluation of Statistical Spam Filtering Techniques.” ACM Transactions on Asian Language Information Processing (TALIP) 3(4): 243-269.
  • Acknowledgements
    • My wife, Yeon
    • The best advisor – Jim Whitehead
    • Committee members - David Helmbold and Cormac Flanagan
    • Lab mates – Kai Pan, Guozheng Ge, Mark Slater, Jennifer Bevan, and Elias Sinderson
    • Collaborators - Thomas Zimmermann, Andreas Zeller, and Miryung Kim.
    • Fellow graduate students
  • Adaptive Bug Prediction By Analyzing Project History Dissertation Defense, Aug 21, 2006 Sunghun Kim <hunkim@cs.ucsc.edu> University of California, Santa Cruz