Predicting More from Less:
Synergies of Learning
Ekrem Kocaguneli, ekrem@kocaguneli.com
Bojan Cukic, bojan.cukic@mail.wvu.edu,
Huihua Lu, hlu3@mix.wvu.edu
RAISE'13 
2nd International NSF sponsored Workshop
on Realizing Artificial Intelligence Synergies in Software Engineering
5/25/2013
Collecting data is important
SourceForge currently hosts
324K projects with a user
base of 3.4M [1]
GoogleCode hosts 250K open
source projects [2]
1. http://sourceforge.net/apps/trac/sourceforge/wiki/What%20is%20SourceForge.net
2. https://developers.google.com/open-source/
Also, there is an abundance
of SE repositories
ISBSG [1], PROMISE [2],
Eclipse Bug Data [3],
TukuTuku [4]
1. C. Lokan, T. Wright, P. Hill, and M. Stringer. Organizational benchmarking using the ISBSG data repository. IEEE Software, 18(5):26–32, 2001.
2. T. Menzies, B. Caglayan, E. Kocaguneli, J. Krall, F. Peters, and B. Turhan. The promise repository of empirical software engineering data, June 2012.
3. T. Zimmermann, R. Premraj, and A. Zeller. Predicting defects for eclipse. In International Workshop on Predictor Models in Software Engineering, 2007. PROMISE’07: ICSE Workshops 2007.
4. http://www.metriq.biz/tukutuku/
We have mountains of data,
but then what?
Abundance of data is promising for predictive
modeling and supervised learning
Yet, dependent variable information is
not always available!
Dependent variables (labels, effort values
etc.) may be missing, outdated or
available for a limited number of
instances
Transfer learning:
When an organization has no local
data, or the local data is outdated,
transferring data helps
Semi-supervised learning:
When only a limited amount of data is
labeled, we can use the existing labels
to label other training instances
Active learning:
When no labels exist, we can request
labels from experts, at a cost
Transfer learning:
How to transfer data between
domains and projects?
Semi-supervised learning:
How to accommodate prediction
problems for which only a limited number
of labeled instances is available?
Active learning:
How to handle prediction problems in
which no instances have labels?
What is the current
state-of-the-art?
Transfer learning is a set of learning methods that allow
the training and test sets to have different domains
and/or tasks (Ma2012 [1]).
Transfer learning - 1
[1] Y. Ma, G. Luo, X. Zeng, and A. Chen. Transfer learning for cross-company software defect
prediction. Information and Software Technology, 54(3):248–256, 2012.
SE transfer learning studies (a.k.a. cross-company
learning) have the same task yet different domains
(data coming from different organizations or different
time frames).
Transfer learning results in SE report instability and
significant variability if data is used as-is
(Kitchenham2007 [1], Zimmermann2009[2])
Transfer learning - 2
[1] B. A. Kitchenham, E. Mendes, and G. H. Travassos. Cross versus within-company cost estimation studies: A systematic review. IEEE Trans. Softw. Eng., 33(5):316–329, 2007.
[2] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. ESEC/FSE, pages 91–100, 2009.
[3] B. Turhan, T. Menzies, A. Bener, and J. Di Stefano. On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering, 14(5):540–578, 2009.
[4] E. Kocaguneli and T. Menzies. How to find relevant data for effort estimation. In ESEM’11: International Symposium on Empirical Software Engineering and Measurement, 2011.
Filtering-based approaches support prior results
(Turhan2009[3], Kocaguneli2011[4])
• Transferring all cross data yields poor performance
• Filtering cross data significantly improves estimation
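A minimal sketch of relevancy filtering, assuming a nearest-neighbour rule: a cross row is kept only if it is among the k rows closest to some within row. The name nn_filter, the value of k and the toy data are illustrative assumptions, not the exact filters of the cited studies.

```python
import math

def nn_filter(cross_rows, within_rows, k=2):
    """Keep the cross rows that are among the k nearest neighbours
    of at least one within row (illustrative relevancy filter).

    cross_rows:  list of (features, label) pairs from other organizations.
    within_rows: list of feature tuples from the local project.
    """
    selected = set()
    for target in within_rows:
        # Rank cross rows by Euclidean distance to this within row.
        ranked = sorted(range(len(cross_rows)),
                        key=lambda i: math.dist(cross_rows[i][0], target))
        selected.update(ranked[:k])
    # Return the surviving rows in their original order.
    return [cross_rows[i] for i in sorted(selected)]

# Toy data (assumed): two cross rows resemble the local project, two do not.
cross = [((1.0, 1.0), 10), ((1.2, 0.9), 12), ((9.0, 9.0), 90), ((8.5, 9.5), 95)]
within = [(1.1, 1.0)]
filtered = nn_filter(cross, within, k=2)
print(filtered)
```

The filter discards the two distant rows, which also illustrates the challenge noted later in the deck: filtering shrinks the amount of cross data that is actually transferred.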
SSL methods are a group of machine learning algorithms
that learn from a set of training instances among which
only a small subset has pre-assigned labels [1].
Semi-supervised learning (SSL) -1
[1] O. Chapelle, B. Schölkopf, and A. Zien. Semi-supervised Learning. MIT Press, Cambridge, MA, USA, 2006.
SSL relaxes the reliance of supervised
methods on dependent variable information
Hence, we can use it to supplement
supervised estimation methods.
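One common SSL scheme is self-training; the sketch below assumes it as a stand-in and is not the method of any cited paper. The unlabeled point closest to the labeled pool receives the label of its nearest labeled neighbour, joins the pool, and the loop repeats. The names self_train and max_dist and the toy data are hypothetical.

```python
import math

def nearest_label(x, labeled):
    """Return (label, distance) of the labeled point closest to x."""
    features, label = min(labeled, key=lambda pair: math.dist(pair[0], x))
    return label, math.dist(features, x)

def self_train(labeled, unlabeled, max_dist=1.5):
    """Grow the labeled pool with pseudo-labels (assumed 1-NN self-training).

    Stops when the closest remaining unlabeled point is farther than
    max_dist from every labeled point, i.e. too uncertain to pseudo-label.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Pick the unlabeled point we are most confident about.
        best = min(pool, key=lambda x: nearest_label(x, labeled)[1])
        label, dist = nearest_label(best, labeled)
        if dist > max_dist:
            break
        labeled.append((best, label))
        pool.remove(best)
    return labeled, pool

# Toy data (assumed): two labeled modules, four unlabeled ones.
seed = [((0.0,), "clean"), ((10.0,), "defective")]
unlabeled = [(1.0,), (2.0,), (9.0,), (5.0,)]
grown, leftover = self_train(seed, unlabeled)
print(grown, leftover)
```

The point at 2.0 is labeled only because 1.0 was pseudo-labeled first, while 5.0 stays unlabeled because it is too far from either group.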
Despite the promise, SSL appears to be
less than thoroughly investigated in SE
Semi-supervised learning (SSL) - 2
[1] Huihua Lu, Bojan Cukic, and Mark Culp. 2012. Software defect prediction using semi-supervised learning with dimension reduction. In
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012).
[2] M. Li, H. Zhang, R. Wu, and Z.-H. Zhou. Sample-based software defect prediction with active and semi-supervised learning. Automated
Software Engineering, 19:201–230, 2012.
Lu et al. use an SSL algorithm augmented with
multi-dimensional scaling (MDS) as pre-processor, which
outperforms corresponding supervised methods [1]
Li et al. developed a framework which
maps ensemble learning and random
forests into an SSL setting [2].
AL methods are unsupervised methods working on an
initially unlabeled data set.
Active Learning (AL) - 1
[1] M.-F. Balcan, A. Beygelzimer, and J. Langford. “Agnostic active learning”. Proceedings of the 23rd International Conference on Machine Learning
- ICML ’06, pages 65–72, 2006.
AL methods can query an oracle, which can provide
labels. Yet, each label comes with a cost. Hence, we
need as few queries as possible.
e.g. Balcan et al. show AL provides the
same performance as a supervised
learner with substantially smaller
sample sizes [1]
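A textbook illustration of why few queries can suffice, under strong assumptions: instances lie on a line and the labels switch from 0 to 1 at a single unknown threshold. The learner then chooses its queries by binary search instead of asking for every label. The names learn_threshold and oracle are hypothetical, and this is not the algorithm of the cited paper.

```python
def learn_threshold(pool, oracle):
    """Find the smallest value labeled 1 in a sorted pool.

    Assumes labels are 0 below some threshold and 1 at or above it.
    Returns (threshold_value, number_of_oracle_queries).
    """
    lo, hi = 0, len(pool)  # invariant: pool[:lo] are 0, pool[hi:] are 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1              # each label request has a cost
        if oracle(pool[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    value = pool[lo] if lo < len(pool) else None
    return value, queries

def oracle(x):
    """Stand-in for a human expert; the threshold 37 is an assumption."""
    return 1 if x >= 37 else 0

pool = list(range(100))
threshold, queries = learn_threshold(pool, oracle)
print(threshold, queries)
```

Labeling the whole pool would cost 100 queries; here the number of queries grows only with the logarithm of the pool size.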
In SE, AL methods have good
potential to reduce labeling costs
Active Learning (AL) - 2
[1] Huihua Lu and Bojan Cukic. 2012. An adaptive approach with active learning in software fault prediction. In Proceedings of the 8th
International Conference on Predictive Models in Software Engineering (PROMISE '12).
[2] Kocaguneli, E.; Menzies, T.; Keung, J.; Cok, D.; Madachy, R., "Active Learning and Effort Estimation: Finding the Essential Content of Software
Effort Estimation Data," Software Engineering, IEEE Transactions on , vol.PP, no.99, pp.1,1, 0
Lu and Cukic propose an AL-based fault prediction
method, which outperforms supervised techniques
by using 20% or less of the data [1]
Kocaguneli et al. use AL in software effort estimation (SEE).
The proposed method performs comparably to supervised
methods with 31% of the original data [2]
So what do we do?
Strengths and Weaknesses
Supervised Learning (SL)
Strengths
• Successfully used in SE for predictive
purposes.
• Provides successful estimation
performance.
Challenges
• Requires retrospective local data.
• Requires dependent variable
information.
Transfer Learning (TL)
Strengths
• Enables data to be transferred between
different organizations or time frames.
• Provides a solution to the lack of local data.
• After relevancy filtering, cross data can
perform as well as within data.
Challenges
• Using cross data as-is results in
unstable performance.
• TL filters relevant cross data, which reduces
the transferred cross data amount.
Semi-supervised Learning (SSL)
Strengths
• Enables learning from small sets of labeled
instances.
• Supplements the learning with unlabeled instances.
• Relaxes the requirement of dependent variables.
Challenges
• Still requires an initially labeled set of
training instances, albeit a small one.
• For datasets with a large number of independent
features, it requires feature subset selection.
Active Learning (AL)
Strengths
• Helps find the essential content of the data.
• Decreases the amount of dependent variable
information needed, thereby reducing the associated
data collection costs.
Challenges
• Susceptible to unbalanced class distributions
in classification problems.
Strengths and Weaknesses
Supervised Learning (SL)
• Requires retrospective local data.
Transfer Learning (TL)
• Provides a solution to the lack of local data.
• TL filters relevant cross data, which reduces
the transferred cross data amount.
Semi-supervised Learning (SSL)
• Enables learning from small sets of labeled
instances.
Active Learning (AL)
• Helps find the essential content of the data.
[Figure: connectors numbered 1, 2 and 3 link the points above]
Synergy #1
Synergy #1 is already being pursued in SE
With successful applications of
transferring data across:
• Domains
• Time frames
Filtering labeled cross data yields a very limited
amount of locally relevant data
SSL can use filtered cross data to provide pseudo-
labels for the unlabeled within data
Synergy #2
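A minimal sketch of Synergy #2, assuming a k-nearest-neighbour rule as the pseudo-labeling step: each unlabeled within row receives the mean label (e.g. effort) of its k closest filtered cross rows. The name pseudo_label, the value of k and the toy data are hypothetical; the slides do not prescribe a specific SSL algorithm here.

```python
import math

def pseudo_label(within_rows, filtered_cross, k=2):
    """Give each unlabeled within row the mean label of its k nearest
    filtered cross rows (assumed k-NN pseudo-labeling).

    filtered_cross: list of (features, numeric_label) pairs.
    within_rows:    list of feature tuples without labels.
    """
    labeled = []
    for row in within_rows:
        nearest = sorted(filtered_cross,
                         key=lambda pair: math.dist(pair[0], row))[:k]
        estimate = sum(label for _, label in nearest) / len(nearest)
        labeled.append((row, estimate))
    return labeled

# Toy data (assumed): three filtered cross projects with known effort.
filtered_cross = [((1.0, 1.0), 10.0), ((1.2, 0.9), 12.0), ((3.0, 3.0), 30.0)]
within = [(1.1, 1.0), (2.9, 3.1)]
result = pseudo_label(within, filtered_cross, k=2)
print(result)
```

With so few filtered rows, the second within row borrows from a distant neighbour, which is why the quality of the pseudo-labels depends on how much relevant cross data survives filtering.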
SE data (defect and effort) can be summarized
with its essential content
Transfer learning may benefit from using
essential content instead of all the data, which
may contain noise and outliers
Synergy #3
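One way to picture "essential content" is a popularity count: a row is popular if other rows have it as their nearest neighbour, and rows that nobody points to (outliers, redundant points) are dropped. This is only a simplified stand-in under that assumption, not the QUICK method used in the experiments; the function names and toy data are hypothetical.

```python
import math

def popularity(rows):
    """Count how often each row is the nearest neighbour of another row."""
    counts = [0] * len(rows)
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(others, key=lambda j: math.dist(rows[j], row))
        counts[nearest] += 1
    return counts

def essential_rows(rows):
    """Keep only rows that are some other row's nearest neighbour."""
    counts = popularity(rows)
    return [row for row, count in zip(rows, counts) if count > 0]

# Toy data (assumed): two tight groups plus one outlier at (20, 20).
rows = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
        (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
summary = essential_rows(rows)
print(popularity(rows), summary)
```

The outlier is never anyone's nearest neighbour, so it drops out of the summary that would then be passed on to transfer learning.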
Did you try any
of the synergies?
Experiments with
Synergy #3
[Figure: workflow diagram with steps numbered 1 to 4. Labels: Cross data; TEAK filter; Filtered cross data; Past within data (without labels); QUICK; Essential within data; SSL; Essential within data with pseudo labels; Estimation Method; Within test project(s); Estimate]
Experiments with
Synergy #3
Estimation from
pseudo-labeled
within data
Within data is
summarized to at
most 15%
Opportunity for
within data to be
locally interpreted
What have we covered?

More Related Content

What's hot

Data science lecture4_doaa_mohey
Data science lecture4_doaa_moheyData science lecture4_doaa_mohey
Data science lecture4_doaa_moheyDoaa Mohey Eldin
 
Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Doaa Mohey Eldin
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Doaa Mohey Eldin
 
J48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance DataJ48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance DataCSCJournals
 
Query aware determinization of uncertain objects
Query aware determinization of uncertain objectsQuery aware determinization of uncertain objects
Query aware determinization of uncertain objectsSoftroniics india
 
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...Joeran Beel
 
Active learning for ranking through expected loss optimization
Active learning for ranking through expected loss optimizationActive learning for ranking through expected loss optimization
Active learning for ranking through expected loss optimizationPvrtechnologies Nellore
 
Efficient Refining Of Why-Not Questions on Top-K Queries
Efficient Refining Of Why-Not Questions on Top-K QueriesEfficient Refining Of Why-Not Questions on Top-K Queries
Efficient Refining Of Why-Not Questions on Top-K Queriesiosrjce
 
Advanced Question Paper Generator using Fuzzy Logic
Advanced Question Paper Generator using Fuzzy LogicAdvanced Question Paper Generator using Fuzzy Logic
Advanced Question Paper Generator using Fuzzy LogicIRJET Journal
 
IRJET- Missing Value Evaluation in SQL Queries: A Survey
IRJET- 	  Missing Value Evaluation in SQL Queries: A SurveyIRJET- 	  Missing Value Evaluation in SQL Queries: A Survey
IRJET- Missing Value Evaluation in SQL Queries: A SurveyIRJET Journal
 
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONREVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONijaia
 
Data science lecture1_doaa_mohey
Data science lecture1_doaa_moheyData science lecture1_doaa_mohey
Data science lecture1_doaa_moheyDoaa Mohey Eldin
 
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff NotesHeemeng Foo
 
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...Joeran Beel
 
Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...
Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...
Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...Joeran Beel
 

What's hot (19)

De carlo rizk 2010 icelw
De carlo rizk 2010 icelwDe carlo rizk 2010 icelw
De carlo rizk 2010 icelw
 
Data science lecture4_doaa_mohey
Data science lecture4_doaa_moheyData science lecture4_doaa_mohey
Data science lecture4_doaa_mohey
 
Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey
 
J48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance DataJ48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance Data
 
De carlo rizk 2010 icelw
De carlo rizk 2010 icelwDe carlo rizk 2010 icelw
De carlo rizk 2010 icelw
 
Query aware determinization of uncertain objects
Query aware determinization of uncertain objectsQuery aware determinization of uncertain objects
Query aware determinization of uncertain objects
 
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
 
Active learning for ranking through expected loss optimization
Active learning for ranking through expected loss optimizationActive learning for ranking through expected loss optimization
Active learning for ranking through expected loss optimization
 
Efficient Refining Of Why-Not Questions on Top-K Queries
Efficient Refining Of Why-Not Questions on Top-K QueriesEfficient Refining Of Why-Not Questions on Top-K Queries
Efficient Refining Of Why-Not Questions on Top-K Queries
 
Advanced Question Paper Generator using Fuzzy Logic
Advanced Question Paper Generator using Fuzzy LogicAdvanced Question Paper Generator using Fuzzy Logic
Advanced Question Paper Generator using Fuzzy Logic
 
IRJET- Missing Value Evaluation in SQL Queries: A Survey
IRJET- 	  Missing Value Evaluation in SQL Queries: A SurveyIRJET- 	  Missing Value Evaluation in SQL Queries: A Survey
IRJET- Missing Value Evaluation in SQL Queries: A Survey
 
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONREVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION
 
Data science lecture1_doaa_mohey
Data science lecture1_doaa_moheyData science lecture1_doaa_mohey
Data science lecture1_doaa_mohey
 
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff Notes
 
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Pers...
 
Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...
Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...
Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse...
 
4 de47584
4 de475844 de47584
4 de47584
 

Viewers also liked

Reflection Support for Communities on the Web
Reflection Support for Communities on the WebReflection Support for Communities on the Web
Reflection Support for Communities on the WebRalf Klamma
 
Can we build software better and faster and cheaper
Can we build software better and faster and cheaperCan we build software better and faster and cheaper
Can we build software better and faster and cheaperCS, NcState
 
Learning Analytics for the Lifelong Long Tail Learner
Learning Analytics for the Lifelong Long Tail LearnerLearning Analytics for the Lifelong Long Tail Learner
Learning Analytics for the Lifelong Long Tail LearnerRalf Klamma
 
Finding local lessons in software engineering
Finding local lessons in software engineeringFinding local lessons in software engineering
Finding local lessons in software engineeringCS, NcState
 
7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστές
7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστές7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστές
7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστέςChristopher Pappas
 
Lecture 8: More DCGs
Lecture 8: More DCGsLecture 8: More DCGs
Lecture 8: More DCGsCS, NcState
 
2011 A/NZ Cloud Solutions For Smb 20 July
2011 A/NZ Cloud Solutions For Smb 20 July2011 A/NZ Cloud Solutions For Smb 20 July
2011 A/NZ Cloud Solutions For Smb 20 JulyGraeme Wood
 
What is a PhotoCamp?
What is a PhotoCamp?What is a PhotoCamp?
What is a PhotoCamp?Pete Prodoehl
 
Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdecCS, NcState
 

Viewers also liked (9)

Reflection Support for Communities on the Web
Reflection Support for Communities on the WebReflection Support for Communities on the Web
Reflection Support for Communities on the Web
 
Can we build software better and faster and cheaper
Can we build software better and faster and cheaperCan we build software better and faster and cheaper
Can we build software better and faster and cheaper
 
Learning Analytics for the Lifelong Long Tail Learner
Learning Analytics for the Lifelong Long Tail LearnerLearning Analytics for the Lifelong Long Tail Learner
Learning Analytics for the Lifelong Long Tail Learner
 
Finding local lessons in software engineering
Finding local lessons in software engineeringFinding local lessons in software engineering
Finding local lessons in software engineering
 
7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστές
7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστές7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστές
7 συμβουλές για να γίνεται επιτυχημένοι εξ αποστάσεως σπουδαστές
 
Lecture 8: More DCGs
Lecture 8: More DCGsLecture 8: More DCGs
Lecture 8: More DCGs
 
2011 A/NZ Cloud Solutions For Smb 20 July
2011 A/NZ Cloud Solutions For Smb 20 July2011 A/NZ Cloud Solutions For Smb 20 July
2011 A/NZ Cloud Solutions For Smb 20 July
 
What is a PhotoCamp?
What is a PhotoCamp?What is a PhotoCamp?
What is a PhotoCamp?
 
Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdec
 

Similar to Predicting More from Less: Synergies of Learning

AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...ijsc
 
Comparative performance analysis
Comparative performance analysisComparative performance analysis
Comparative performance analysiscsandit
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...csandit
 
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...ijcsa
 
EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...
EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...
EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...cscpconf
 
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...ijseajournal
 
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...ijseajournal
 
Research issues in object oriented software testing
Research issues in object oriented software testingResearch issues in object oriented software testing
Research issues in object oriented software testingAnshul Vinayak
 
Federated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesFederated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesAlAtfat
 
A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...IJECEIAES
 
STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...
STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...
STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...IJCSES Journal
 
Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for RequirementsClément Portet
 
Object Oriented Programming using C++.pptx
Object Oriented Programming using C++.pptxObject Oriented Programming using C++.pptx
Object Oriented Programming using C++.pptxparveen837153
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEijesajournal
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEijesajournal
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEijesajournal
 
Estimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approachEstimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approachcsandit
 
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHcscpconf
 

Similar to Predicting More from Less: Synergies of Learning (20)

AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
 
Comparative performance analysis
Comparative performance analysisComparative performance analysis
Comparative performance analysis
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...
 
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATIONONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
 
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
 
EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...
EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...
EARLY STAGE SOFTWARE DEVELOPMENT EFFORT ESTIMATIONS – MAMDANI FIS VS NEURAL N...
 
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
 
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
 
Research issues in object oriented software testing
Research issues in object oriented software testingResearch issues in object oriented software testing
Research issues in object oriented software testing
 
Federated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesFederated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devices
 
A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...
 
STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...
STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...
STATE-OF-THE-ART IN EMPIRICAL VALIDATION OF SOFTWARE METRICS FOR FAULT PRONEN...
 
Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for Requirements
 
Object Oriented Programming using C++.pptx
Object Oriented Programming using C++.pptxObject Oriented Programming using C++.pptx
Object Oriented Programming using C++.pptx
 
OOP ppt.pdf
OOP ppt.pdfOOP ppt.pdf
OOP ppt.pdf
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
 
Estimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approachEstimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approach
 
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
 

Predicting More from Less: Synergies of Learning

  • 1. Predicting More from Less: Synergies of Learning. Ekrem Kocaguneli, ekrem@kocaguneli.com; Bojan Cukic, bojan.cukic@mail.wvu.edu; Huihua Lu, hlu3@mix.wvu.edu. RAISE'13: 2nd International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, 5/25/2013.
  • 2. Collecting data is important. SourceForge currently hosts 324K projects with a user base of 3.4M [1]. GoogleCode hosts 250K open source projects [2]. [1] http://sourceforge.net/apps/trac/sourceforge/wiki/What%20is%20SourceForge.net [2] https://developers.google.com/open-source/
  • 3. Also, there is an abundance of SE repositories: ISBSG [1], PROMISE [2], Eclipse Bug Data [3], TukuTuku [4]. [1] C. Lokan, T. Wright, P. Hill, and M. Stringer. Organizational benchmarking using the ISBSG data repository. IEEE Software, 18(5):26–32, 2001. [2] T. Menzies, B. Caglayan, E. Kocaguneli, J. Krall, F. Peters, and B. Turhan. The PROMISE repository of empirical software engineering data, June 2012. [3] T. Zimmermann, R. Premraj, and A. Zeller. Predicting defects for Eclipse. In International Workshop on Predictor Models in Software Engineering, 2007. PROMISE’07: ICSE Workshops 2007. [4] http://www.metriq.biz/tukutuku/
  • 4. We have mountains of data, but then what?
  • 5. An abundance of data is promising for predictive modeling and supervised learning. Yet dependent variable information is not always available! Dependent variables (labels, effort values, etc.) may be missing, outdated, or available for only a limited number of instances.
  • 6. Transfer learning: when an organization has no local data or the local data is outdated, transferring data helps. Semi-supervised learning: when only a limited amount of data is labeled, we can use the existing labels to label other training instances. Active learning: when no labels exist, we can request labels from experts at a cost.
  • 7. Transfer learning: how to transfer data between domains and projects? Semi-supervised learning: how to accommodate prediction problems for which only a limited number of labeled instances are available? Active learning: how to handle prediction problems in which no instances have labels?
  • 8. What is the current state-of-the-art?
  • 9. Transfer learning - 1. Transfer learning is a set of learning methods that allow the training and test sets to have different domains and/or tasks (Ma2012 [1]). SE transfer learning studies (a.k.a. cross-company learning) have the same task yet different domains (data coming from different organizations or different time frames). [1] Y. Ma, G. Luo, X. Zeng, and A. Chen. Transfer learning for cross-company software defect prediction. Information and Software Technology, 54(3):248–256, 2012.
  • 10. Transfer learning - 2. Transfer learning results in SE report instability and significant variability if data is used as-is (Kitchenham2007 [1], Zimmermann2009 [2]). Filtering-based approaches support prior results (Turhan2009 [3], Kocaguneli2011 [4]): • Transferring all cross data yields poor performance • Filtering cross data significantly improves estimation. [1] B. A. Kitchenham, E. Mendes, and G. H. Travassos. Cross versus within-company cost estimation studies: A systematic review. IEEE Trans. Softw. Eng., 33(5):316–329, 2007. [2] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. ESEC/FSE, pages 91–100, 2009. [3] B. Turhan, T. Menzies, A. Bener, and J. Di Stefano. On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering, 14(5):540–578, 2009. [4] E. Kocaguneli and T. Menzies. How to find relevant data for effort estimation. In ESEM’11: International Symposium on Empirical Software Engineering and Measurement, 2011.
  • 11. Semi-supervised learning (SSL) - 1. SSL methods are a group of machine learning algorithms that learn from a set of training instances among which only a small subset has pre-assigned labels [1]. SSL helps relax the dependence of supervised methods on dependent variables. Hence, we can supplement supervised estimation methods. [1] O. Chapelle, B. Schölkopf, and A. Zien. Semi-supervised Learning. MIT Press, Cambridge, MA, USA, 2006.
  • 12. Semi-supervised learning (SSL) - 2. Despite the promise, SSL appears to be less than thoroughly investigated in SE. Lu et al. use an SSL algorithm augmented with multi-dimensional scaling (MDS) as a pre-processor, which outperforms the corresponding supervised methods [1]. Li et al. developed a framework that maps ensemble learning and random forests into an SSL setting [2]. [1] Huihua Lu, Bojan Cukic, and Mark Culp. 2012. Software defect prediction using semi-supervised learning with dimension reduction. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012). [2] M. Li, H. Zhang, R. Wu, and Z.-H. Zhou. Sample-based software defect prediction with active and semi-supervised learning. Automated Software Engineering, 19:201–230, 2012.
  • 13. Active Learning (AL) - 1. AL methods are unsupervised methods working on an initially unlabeled data set. AL methods can query an oracle, which can provide labels. Yet each label comes with a cost. Hence, we need as few queries as possible. E.g., Balcan et al. show that AL provides the same performance as a supervised learner with substantially smaller sample sizes [1]. [1] M.-F. Balcan, A. Beygelzimer, and J. Langford. “Agnostic active learning”. Proceedings of the 23rd International Conference on Machine Learning - ICML ’06, pages 65–72, 2006.
  • 14. Active Learning (AL) - 2. In SE, AL methods hold good potential to reduce labeling costs. Lu et al. propose an AL-based fault prediction method, which outperforms supervised techniques while using 20% or less of the data [1]. Kocaguneli et al. use AL in software effort estimation (SEE). The proposed method performs comparably to supervised methods with 31% of the original data [2]. [1] Huihua Lu and Bojan Cukic. 2012. An adaptive approach with active learning in software fault prediction. In Proceedings of the 8th International Conference on Predictive Models in Software Engineering (PROMISE '12). [2] Kocaguneli, E.; Menzies, T.; Keung, J.; Cok, D.; Madachy, R., "Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data," Software Engineering, IEEE Transactions on , vol.PP, no.99, pp.1,1, 0
  • 15. So what do we do?
  • 16. Strengths and Weaknesses
    Supervised Learning (SL)
      Strengths: • Successfully used in SE for predictive purposes. • Provides successful estimation performance.
      Challenges: • Requires retrospective local data. • Requires dependent variable information.
    Transfer Learning (TL)
      Strengths: • Enables data to be transferred between different organizations or time frames. • Provides a solution to the lack of local data. • After relevancy filtering, cross data can perform as well as within data.
      Challenges: • Using cross data as-is yields unstable performance results. • TL filters relevant cross data, which reduces the amount of cross data transferred.
    Semi-supervised Learning (SSL)
      Strengths: • Enables learning from small sets of labeled instances. • Supplements the learning with unlabeled instances. • Relaxes the requirement of dependent variables.
      Challenges: • Although it can be small, an initially labeled set of training instances is still required. • For datasets with a large number of independent features, it requires feature subset selection.
    Active Learning (AL)
      Strengths: • Helps find the essential content of the data. • Decreases the amount of dependent variable information needed, thereby reducing the associated data collection costs.
      Challenges: • Susceptible to unbalanced class distributions in classification problems.
  • 17. Strengths and Weaknesses (highlighted items; the diagram marks three synergies, labeled 1, 2 and 3, among them)
    Supervised Learning (SL): • Requires retrospective local data.
    Transfer Learning (TL): • Provides a solution to the lack of local data. • TL filters relevant cross data, which reduces the amount of cross data transferred.
    Semi-supervised Learning (SSL): • Enables learning from small sets of labeled instances.
    Active Learning (AL): • Helps find the essential content of the data.
  • 18. Synergy #1. Synergy #1 is already being pursued in SE, with successful applications of transferring data among: • Domain • Time frame
  • 19. Synergy #2. Filtering labeled cross data yields a very limited amount of locally relevant data. SSL can use the filtered cross data to provide pseudo-labels for the unlabeled within data.
  • 20. Synergy #3. SE data (defect and effort) can be summarized by its essential content. Transfer learning may benefit from using the essential content instead of all the data, which may contain noise and outliers.
  • 21. Did you try any of the synergies?
  • 22. Experiments with Synergy #3. [Diagram with four numbered steps (1 to 4). Elements shown: cross data; TEAK filter; filtered cross data; past within data (without labels); QUICK; essential within data; SSL; essential within data with pseudo labels; estimation method; within test project(s); estimate.]
  • 23. Experiments with Synergy #3. Estimation from pseudo-labeled within data. Within data is summarized to at most 15%. Opportunity for within data to be locally interpreted.
  • 24. What have we covered?
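
The filter, pseudo-label and estimate flow described in the synergy slides can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not the authors' implementation: a plain k-nearest-neighbour relevancy filter stands in for the TEAK filter, mean-of-neighbours labeling stands in for the SSL step, the QUICK summarization step is omitted, and all function names and data values are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_cross(cross, within_unlabeled, k=2):
    """Keep cross rows that are among the k nearest cross rows of some
    within row. Assumption: a simple kNN filter, NOT the TEAK filter."""
    keep = set()
    for w in within_unlabeled:
        ranked = sorted(range(len(cross)),
                        key=lambda i: distance(cross[i][0], w))
        keep.update(ranked[:k])
    return [cross[i] for i in sorted(keep)]

def pseudo_label(labeled, unlabeled, k=2):
    """Give each unlabeled row the mean label of its k nearest labeled rows.
    Assumption: a stand-in for the SSL step, not the slides' algorithm."""
    result = []
    for u in unlabeled:
        nearest = sorted(labeled, key=lambda row: distance(row[0], u))[:k]
        result.append((u, sum(y for _, y in nearest) / len(nearest)))
    return result

def estimate(train, query, k=1):
    """Predict the label of query as the mean label of its k nearest rows."""
    nearest = sorted(train, key=lambda row: distance(row[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical toy data: (features, effort) pairs from other organizations.
cross = [((1.0, 1.0), 10.0), ((1.2, 0.9), 12.0),
         ((9.0, 9.0), 100.0), ((8.5, 9.5), 90.0)]
# Local (within) projects with no effort labels.
within = [(1.1, 1.0), (0.9, 1.1)]

filtered = filter_cross(cross, within, k=2)   # step: filter cross data
pseudo = pseudo_label(filtered, within, k=2)  # step: pseudo-label within data
prediction = estimate(pseudo, (1.0, 1.0))     # step: estimate a new project
print(filtered, pseudo, prediction)
```

In this toy setup the two distant cross projects are dropped by the filter, so the pseudo-labels for the within data come only from the nearby cross projects. The final estimate is then made from within data alone, which is the property slide 23 highlights.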