
Developing recommendation systems to support open source software developers challenges and lessons learned

Open-source software (OSS) forges contain rich data sources that are useful for supporting development activities. Several techniques and tools have been proposed to provide open-source developers with innovative features, aiming to reduce development effort and costs and to improve developer productivity. In the context of the EU H2020 CROSSMINER project, different recommendation systems have been conceived to assist software programmers in different phases of the development process by providing them with various artifacts, such as third-party libraries, documentation about how to use the APIs being adopted, or relevant API function calls. To develop such recommendation systems, various technical choices had to be made to overcome issues related to several aspects, including the lack of baselines, limited data availability, the choice of performance measures, and evaluation approaches. This lecture provides an introduction to Recommendation Systems in Software Engineering (RSSE) and describes the challenges encountered in the context of the CROSSMINER project. Specific attention is devoted to the intricacies of the development and evaluation techniques employed to conceive and evaluate the CROSSMINER recommendation systems. The lessons learned while working on the project are also discussed. https://sites.google.com/gssi.it/csgssi/ph-d-program/se-ai-course-2021

Developing recommendation systems to support open-source software developers: challenges and lessons learned
Davide Di Ruscio
Dipartimento di Ingegneria e Scienze dell’Informazione e Matematica
Università degli Studi dell’Aquila
http://people.disim.univaq.it/diruscio/
davide.diruscio@univaq.it
@ddiruscio
2
Who am I?
http://people.disim.univaq.it/diruscio/
3
Development of complex software systems by reusing
third-party open source components
Recommendation systems in Software Engineering
4
5
https://www.slideshare.net/CrossingMinds/recommendation-system-explained?from_action=save
6
Problem domain
Recommendation systems (RS) help to match users with items
– Ease information overload
– Sales assistance (guidance, advisory, persuasion,…)
Different system designs / paradigms
– Based on availability of exploitable data
– Implicit and explicit user feedback
– Domain characteristics
RS are software agents that elicit the interests and preferences of individual consumers
[…] and make recommendations accordingly. They have the potential to support and
improve the quality of the decisions consumers make while searching for and selecting
products online.
[Xiao & Benbasat, MISQ, 2007]
http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
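Slide 6 describes recommenders as systems that elicit a user's interests and preferences and match them with items. A minimal sketch of that idea in Python, assuming a toy user model (a set of topics) and toy item descriptions (also topic sets): every item receives a relevance score, items are ranked by score, and the top-k are returned. The function names, the `overlap_relevance` scoring rule, and the `lib-*` items are hypothetical illustrations, not part of the CROSSMINER systems.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy types: a user model is a set of topics of interest,
# and the catalog maps each item name to the set of topics describing it.
UserModel = Set[str]
Catalog = Dict[str, Set[str]]

# Illustrative catalog; item names are made up.
CATALOG: Catalog = {
    "lib-a": {"json", "parsing"},
    "lib-b": {"http", "client"},
    "lib-c": {"logging"},
    "lib-d": {"json", "http"},
}


def overlap_relevance(user: UserModel, features: Set[str]) -> float:
    """Toy relevance score: fraction of the item's topics that interest the user."""
    if not features:
        return 0.0
    return len(user & features) / len(features)


def recommend(user: UserModel, catalog: Catalog,
              relevance: Callable[[UserModel, Set[str]], float],
              k: int = 2) -> List[str]:
    """Score every item, rank by descending relevance, return the top-k names."""
    scored = [(name, relevance(user, feats)) for name, feats in catalog.items()]
    # sorted() is stable even with reverse=True, so ties keep catalog order.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    # Drop items with zero relevance instead of padding the list.
    return [name for name, score in ranked[:k] if score > 0]


if __name__ == "__main__":
    print(recommend({"json", "http"}, CATALOG, overlap_relevance))  # ['lib-d', 'lib-a']
```

Passing the scoring rule as a parameter keeps ranking separate from relevance estimation, so a collaborative or content-based scorer could be swapped in without touching `recommend`.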

Recommended

On the way of listening to the crowd for supporting modeling activities, by Davide Ruscio
CROSSMINER Project at OW2con'19, by OW2
Developer-Centric Knowledge Mining from Large OSS Repositories, by CROSSMINER European Project
Developer Experience (DX) as a Fitness Function for Platform Teams, by Andy Marks
Recommender Systems @ Scale - PyData 2019, by Sonya Liberman
PEARC17: The Community Software Repository from XSEDE: A Resource for the Nat..., by John-Paul Navarro
Software Project Management, by ShauryaGupta38

More Related Content

Similar to Developing recommendation systems to support open source software developers challenges and lessons learned

Customer to Customer recommendation system, by sksaif95
CROSSMINER - Developer-Centric Knowledge Mining from Large Open-Source Softwa..., by OW2
Software management plans in research software, by Shoaib Sufi
Recommender Systems @ Scale, Big Data Europe Conference 2019, by Sonya Liberman
Maintaining and Releasing Open Source Software, by Joel Nothman
Intelligent Software Updates: Leveraging the Software Ecosystem to Support wh..., by Au Gai
NISO-STM RA21 Project Update, by TACNISO
Chapter 7 Development StrategiesInformation Technology Project Management .pptx, by AxmedMaxamuudYoonis
Conference Identity: persistent identifiers for conferences, by Aliaksandr Birukou
Infusing Social Data Analytics into Future Internet applications for Manufact..., by Michael Petychakis
SE_Module1new.ppt, by ADARSHN40
System Development Overview Assignment 3, by Ashley Fisher
GFW Partner Meeting 2017 - Parallel Discussions 2: Private Sector, by World Resources Institute (WRI)
chapter07-120827115403-phpapp01.pdf, by AxmedMaxamuud6
Recommender Systems, by vivatechijri
Improving the User Experience of UiPath Apps, by DianaGray10
A Content Boosted Hybrid Recommendation System, by Seval Çapraz
 


More from Davide Ruscio

Detecting java software similarities by using different clustering
FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns
CrossSim: exploiting mutual relationships to detect similar OSS projects
Use of MDE to Analyse Open Source Software
Consistency Recovery in Interactive Modeling
Edelta: an approach for defining and applying reusable metamodel refactorings
Semantic based model matching with emf compare
Collaborative model driven software engineering: a Systematic Mapping Study
Model repositories: will they become reality?
Mining Correlations of ATL Transformation and Metamodel Metrics
MDEForge: an extensible Web-based modeling platform
 



Developing recommendation systems to support open source software developers challenges and lessons learned

  • 7. 7 Recommender systems RS seen as a function Given: – User model (e.g. ratings, preferences, demographics, situational context) – Items (with or without description of item characteristics) Find: – Relevance score. Used for ranking. Finally: – Recommend items that are assumed to be relevant http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 8. 8 Recommender systems RS seen as a function Given: – User model (e.g. ratings, preferences, demographics, situational context) – Items (with or without description of item characteristics) Find: – Relevance score. Used for ranking. Finally: – Recommend items that are assumed to be relevant But: • Remember that relevance might be context-dependent • Characteristics of the list itself might be important (diversity) http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 9. 9 Paradigms of recommender systems Recommender systems reduce information overload by estimating relevance http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 10. 10 Paradigms of recommender systems Personalized recommendations http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 11. 11 Paradigms of recommender systems Collaborative: "Tell me what's popular among my peers" http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 12. 12 Paradigms of recommender systems Content-based: "Show me more of what I've liked" http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 13. 13 Paradigms of recommender systems Knowledge-based: "Tell me what fits based on my needs" http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 14. 14 Paradigms of recommender systems Hybrid: combinations of various inputs and/or composition of different mechanisms http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 15. 15 Collaborative-Filtering Technique ■ Providing users with suggestions that fit their taste/need
  • 16. 16 Collaborative-Filtering Technique ■ Providing users with suggestions that fit their taste/need ■ Being based on the preferences of similar users
  • 20. 20 Collaborative-Filtering Technique (Internal Meeting, 31 October 2017). User-item matrix: ratings given to pizza restaurants R1, R2, R3 by customers c1, c2, c3 (the rating of c3 for R3 is unknown):
            R1  R2  R3
        c1   5   5   2
        c2   3   3   4
        c3   5   5   ?
  • 21. 21 Recommendation Systems in Software Engineering A recommendation system in software engineering is “. . . a software application that provides information items estimated to be valuable for a software engineering task in a given context.”
  • 22. 22 Development of complex software systems by reusing third-party open source components Recommendation systems in Software Engineering
  • 24. 24 Context Related activities - Searching for candidate components - Evaluating a set of retrieved candidate components to find the most suitable one - Understanding how to use the selected components - Monitoring the selected components Development of new software systems by reusing existing open source components
  • 25. 25 Context Source code Q&A systems Bug Reports API Documentation Tutorials Configuration Management Systems
  • 26. 26 Context Related activities - Searching for candidate components - Evaluating a set of retrieved candidate components to find the most suitable one - Understanding how to use the selected components - Monitoring the selected components Development of new software systems by reusing existing open source components
  • 27. 27 Selecting and Using OSS components Challenging tasks - assessing the quality, maturity, development activity, and user support of a component is not a straightforward process Different and heterogeneous sources of information - e.g., code repositories, communication channels, bug tracking systems Source code Q&A systems Bug Reports API Documentation Tutorials Configuration Management Systems
  • 29. Intelligent IDEs: advanced IDEs incorporating various recommendation and machine learning techniques, aiming to efficiently and effectively mine existing open-source software repositories. [Figure: Mining and Data Extraction, Knowledge Base; labels: mine, feed, training, prediction, query, recommendation]
  • 30. Examples of recommendations. Use of machine learning algorithms to produce recommendations during development: – Depending on the set of selected third-party libraries, the system recommends additional libraries that should be included in the project being developed – Given a selected library, the system suggests alternative ones that share some similarities with it – Depending on the set of selected libraries, the system shows API documentation and Q&A posts that can help developers understand how to use them – During development, developers get recommendations about API function calls and usage patterns that might be used – …
  • 32. The CROSSMINER Recommendation Systems: CrossSim – Recommending similar projects; CrossRec – Recommending third-party libraries; FOCUS – Recommending API function calls and usage patterns; MNBN – Recommending GitHub topics; PostFinder – Recommending Stack Overflow posts
  • 33. CrossRec: Recommending third-party libraries. Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Massimiliano Di Penta: CrossRec: Supporting software developers by recommending third-party libraries. J. Syst. Softw. 161 (2020)
  • 35. University of L'Aquila. WCRE 2013: http://ieeexplore.ieee.org/document/6671293/ (CROSSMINER Lisbon Meeting, 27-28 February 2018)
  • 36. LibRec: Automated Library Recommendation – can be considered the most advanced technique for library recommendation – finds relevant libraries based on the set of libraries that a project already includes – is able to recommend project libraries with high recall rates
  • 37. Collaborative-Filtering Recommendation ◼ User-item matrix: ratings given to pizza restaurants by customers ◼ Unknown ratings can be deduced from the most similar customers

          R1  R2  R3
      c1   5   5   2
      c2   3   3   4
      c3   5   5   ?
  • 39. CrossRec: Projects-Libraries Representation
  • 40. CrossRec: Projects-Libraries Representation ◼ Representing the project-library relationships using a user-item ratings matrix ◼ Predicting the inclusion of additional libraries
  • 41. Predicting the inclusion of additional libraries ◼ Missing “ratings” can be predicted using collaborative-filtering techniques ◼ The row-wise and column-wise relationships are exploited to compute missing ratings
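A minimal Python sketch of this idea on a binary project-library matrix, stored here as one set of libraries per project (a 1 in the matrix means "includes"). It assumes cosine similarity between projects and sums the similarities of the `k` nearest projects to score each library the active project lacks; the similarity and rating formulas used by CrossRec itself are more elaborate, so this is a simplified stand-in. The project and library data and the helper names are hypothetical.

```python
import math

# Hypothetical project -> libraries data; each set is one binary row of the matrix.
PROJECTS = {
    "p1": {"junit", "guava", "slf4j-api"},
    "p2": {"junit", "guava", "slf4j-api", "logback-classic"},
    "p3": {"junit", "commons-io"},
    "p4": {"guava", "slf4j-api", "logback-classic", "jackson-databind"},
}


def cosine(a: set, b: set) -> float:
    """Cosine similarity of two binary rows: |a & b| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(active: set, projects: dict, k: int = 2, n: int = 3) -> list:
    """Rank libraries missing from the active project using its k most similar projects."""
    neighbours = sorted(
        ((cosine(active, libs), libs) for libs in projects.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores: dict = {}
    for sim, libs in neighbours:
        if sim == 0.0:
            continue  # unrelated projects carry no evidence
        for lib in libs - active:  # only the missing "ratings" are predicted
            scores[lib] = scores.get(lib, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [lib for lib, _ in ranked[:n]]


if __name__ == "__main__":
    print(recommend({"guava", "slf4j-api"}, PROJECTS, k=3))
    # ['junit', 'logback-classic', 'jackson-databind']
```

For a new project that declares only `guava` and `slf4j-api`, `junit` ranks first because it is shared by the two closest projects, while `jackson-databind` appears in only one neighbour.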
  • 42. Evaluation: 1,200 GitHub Java projects
  • 43. CrossRec: The evaluation process
  • 44. Ten-fold cross-validation. The dataset was divided into ten equal parts, so-called folds. The validation was conducted in ten rounds: in each round, nine folds are used as training data and the remaining fold is used as testing data.
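The fold construction can be sketched in Python as follows. Shuffling with a fixed seed and the helper name `k_fold_splits` are assumptions made for illustration; the project identifiers are placeholders standing in for the 1,200 projects of the dataset.

```python
import random


def k_fold_splits(items: list, k: int = 10, seed: int = 0):
    """Yield (training, testing) lists for the k rounds of k-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducible folds
    folds = [shuffled[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        testing = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, testing


if __name__ == "__main__":
    projects = [f"project-{i}" for i in range(1200)]  # placeholders for the real projects
    for round_no, (training, testing) in enumerate(k_fold_splits(projects), start=1):
        print(round_no, len(training), len(testing))  # 1080 and 120 in every round
```

Each project lands in exactly one testing fold, so it is used for testing once and for training nine times. In library-recommendation evaluations, part of each testing project's libraries is typically hidden and used as ground truth; that step is not shown here.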
  • 45. Running Example
  • 46. Running Example
  • 47. Evaluation Metrics ◼ Recall rate: the rate at which a recommender system returns at least one match among the top-N recommended items, computed over all projects ◼ Accuracy: precision and recall
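These metrics can be computed as in the following Python sketch, where each testing project contributes its ranked recommendations and its ground-truth libraries. The function names and the data are hypothetical; precision@N divides the number of matches by N and recall@N divides it by the size of the ground truth, which are the usual definitions.

```python
def matches(recommended: list, ground_truth: set, n: int) -> int:
    """Number of items among the top-N recommendations that are in the ground truth."""
    return len(set(recommended[:n]) & ground_truth)


def recall_rate_at_n(results: list, n: int) -> float:
    """Fraction of testing projects with at least one match among their top-N items."""
    hits = sum(1 for recommended, truth in results if matches(recommended, truth, n) > 0)
    return hits / len(results)


def precision_at_n(recommended: list, ground_truth: set, n: int) -> float:
    """Share of the top-N recommendations that are correct."""
    return matches(recommended, ground_truth, n) / n


def recall_at_n(recommended: list, ground_truth: set, n: int) -> float:
    """Share of the ground-truth libraries retrieved in the top-N recommendations."""
    return matches(recommended, ground_truth, n) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical (top-N recommendations, ground truth) pairs for two testing projects.
    results = [
        (["junit", "guava", "gson"], {"guava", "commons-io"}),
        (["slf4j-api", "junit", "mockito-core"], {"logback-classic"}),
    ]
    print(recall_rate_at_n(results, 3))  # 0.5: only the first project gets a match
    for recommended, truth in results:
        print(precision_at_n(recommended, truth, 3), recall_at_n(recommended, truth, 3))
```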
  • 48. Evaluation Metrics ◼ Sales diversity: the ability of the system to suggest as many different libraries as possible to projects ◼ Novelty: whether the system is able to expose new and useful libraries to projects
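One common way to quantify these two properties is sketched below in Python: catalog coverage and the entropy of recommendation frequencies for sales diversity, and an expected popularity complement (one minus the share of training projects that already use a library, averaged over all recommended items) for novelty. These formulas are illustrative assumptions; the CrossRec evaluation may use different or weighted variants, and all data and helper names are hypothetical.

```python
import math
from collections import Counter


def coverage(top_n_lists: list, catalog: set) -> float:
    """Share of the library catalog recommended to at least one project."""
    recommended = {lib for recs in top_n_lists for lib in recs}
    return len(recommended & catalog) / len(catalog)


def entropy(top_n_lists: list) -> float:
    """Shannon entropy (natural log) of how often each library is recommended."""
    counts = Counter(lib for recs in top_n_lists for lib in recs)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def novelty(top_n_lists: list, training_projects: list) -> float:
    """Mean of (1 - popularity) over all recommended items, where popularity is
    the share of training projects that already include the library."""
    popularity = Counter(lib for libs in training_projects for lib in libs)
    items = [lib for recs in top_n_lists for lib in recs]
    n_projects = len(training_projects)
    return sum(1 - popularity[lib] / n_projects for lib in items) / len(items)


if __name__ == "__main__":
    catalog = {"junit", "guava", "gson", "jsoup"}
    top_n_lists = [["junit", "guava"], ["junit", "gson"]]  # hypothetical top-2 lists
    training = [{"junit", "guava"}, {"junit"}, {"gson"}, {"jsoup"}]
    print(coverage(top_n_lists, catalog))  # 0.75: jsoup is never recommended
    print(entropy(top_n_lists))            # higher when recommendations are spread evenly
    print(novelty(top_n_lists, training))  # 0.625
```

A recommender that always suggests the same few popular libraries scores low on all three quantities, even if its recall rate is high.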
  • 49. Recall Rate@N
  • 50. Accuracy
  • 51. FOCUS: Recommending API function calls and usage patterns. Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas Degueule, Massimiliano Di Penta: FOCUS: a recommender system for mining API function calls and usage patterns. ICSE 2019: 1050-1060
  • 52. Problem: “Which API methods should this piece of client code invoke, considering that it has already invoked these other API methods?”
  • 53. FOCUS: Recommending APIs and code snippets
  • 54. Context-aware recommendation: predicting the inclusion of additional invocations (CROSSMINER Toulouse Meeting, 10-12 June 2018)
  • 55. Representation of projects, method declarations (MDs), and method invocations (MIs) as a 3D user-item-context ratings matrix. Mappings: – contexts ←→ projects – users ←→ declarations – items ←→ invocations
  • 56. Recommendation engine: API function calls. Generation of a ranked list of API function calls: • additional invocations for the active declaration are predicted by computing the missing ratings • the result is a ranked list of invocations with their scores in descending order
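A simplified Python sketch of how missing invocation ratings can be turned into a ranked list. It flattens the 3D matrix into a mapping from (project, declaration) to the set of invocations, compares declarations directly with cosine similarity, and skips the project-level (context) step; the similarity and rating computations of FOCUS itself are more elaborate. All project, declaration, and helper names are hypothetical.

```python
import math

# Hypothetical data flattened from the 3D matrix: (project, declaration) -> invocations.
CORPUS = {
    ("projA", "readConfig"): {"File.<init>", "FileReader.<init>",
                              "BufferedReader.readLine", "BufferedReader.close"},
    ("projA", "log"): {"Logger.info"},
    ("projB", "loadLines"): {"File.<init>", "FileReader.<init>", "BufferedReader.readLine"},
    ("projB", "parse"): {"Integer.parseInt", "String.split"},
}


def cosine(a: set, b: set) -> float:
    """Cosine similarity of two binary invocation vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_invocations(active: set, corpus: dict, k: int = 3) -> list:
    """Predict missing invocation 'ratings' for the active declaration and rank them."""
    neighbours = sorted(
        ((cosine(active, invs), invs) for invs in corpus.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores: dict = {}
    for sim, invs in neighbours:
        if sim == 0.0:
            continue  # dissimilar declarations contribute nothing
        for inv in invs - active:
            scores[inv] = scores.get(inv, 0.0) + sim
    # Ranked list with scores in descending order (ties broken alphabetically).
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    active = {"File.<init>", "FileReader.<init>"}
    for invocation, score in rank_invocations(active, CORPUS):
        print(f"{score:.3f} {invocation}")
```

For an active declaration that already calls `File.<init>` and `FileReader.<init>`, `BufferedReader.readLine` ranks first (score about 1.524) because both similar declarations call it, followed by `BufferedReader.close`.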
  • 57. Recommendation engine: API usage patterns. From the ranked list, the top-N method invocations are used as a query to search for relevant declarations. Source code snippets containing the identified relevant declarations are then retrieved from the available source code base.
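The retrieval step can be sketched in Python as a search over a hypothetical code base that maps each declaration to its invocations and a source snippet. Ranking declarations by Jaccard overlap with the query invocations is an assumption made for this example, not necessarily the matching criterion used by FOCUS; the names `jaccard`, `usage_patterns`, and `CODE_BASE` are illustrative.

```python
# Hypothetical code base: declaration -> (invocations, Java source snippet).
CODE_BASE = {
    "projA#readConfig": (
        {"File.<init>", "FileReader.<init>", "BufferedReader.<init>",
         "BufferedReader.readLine", "BufferedReader.close"},
        "BufferedReader r = new BufferedReader(new FileReader(new File(path)));"
        " String line = r.readLine(); r.close();",
    ),
    "projB#parse": (
        {"String.split", "Integer.parseInt"},
        "String[] parts = line.split(\",\"); int x = Integer.parseInt(parts[0]);",
    ),
    "projD#readFirst": (
        {"FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine"},
        "String first = new BufferedReader(new FileReader(path)).readLine();",
    ),
}


def jaccard(a: set, b: set) -> float:
    """Overlap between two invocation sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def usage_patterns(query: set, code_base: dict, top: int = 2) -> list:
    """Return (declaration, snippet) pairs ranked by overlap with the query invocations."""
    scored = [
        (jaccard(query, invs), name, snippet)
        for name, (invs, snippet) in code_base.items()
    ]
    scored.sort(key=lambda entry: (-entry[0], entry[1]))
    return [(name, snippet) for score, name, snippet in scored[:top] if score > 0]


if __name__ == "__main__":
    query = {"File.<init>", "FileReader.<init>", "BufferedReader.readLine"}  # top-N invocations
    for name, snippet in usage_patterns(query, CODE_BASE):
        print(name, "->", snippet)
```

Declarations that share no invocation with the query are never returned, even when `top` is larger than the number of matches.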
  • 59. MNBN: Recommending GitHub topics. Claudio Di Sipio, Riccardo Rubei, Davide Di Ruscio, Phuong T. Nguyen: A Multinomial Naïve Bayesian (MNB) Network to Automatically Recommend Topics for GitHub Repositories. EASE 2020: 71-80
  • 61. Proposed approach: a naïve Bayesian network is a probabilistic model based on Bayes' theorem, which expresses the probability of an event given a set of preconditions
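The core computation of a multinomial naïve Bayes classifier can be sketched in Python: each topic is scored by log P(topic) plus the sum of log P(word | topic) over the README words, with additive (Laplace) smoothing. This is a single multinomial model trained on hypothetical repositories, where each repository contributes its words to every topic it is labelled with; it is not the MNB network described in the EASE 2020 paper, and the class and data names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: README text and the topics of each repository.
TRAINING = [
    ("fast json parser for java", ["json", "java"]),
    ("java library to parse and serialize json", ["json", "java"]),
    ("neural network training in python", ["machine-learning", "python"]),
    ("python tools for deep learning models", ["machine-learning", "python"]),
]


class MultinomialNB:
    """Multinomial naive Bayes over word counts, with additive smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, documents):
        self.topic_docs = Counter()              # number of repositories per topic
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        self.vocabulary = set()
        for text, topics in documents:
            words = text.split()
            self.vocabulary.update(words)
            for topic in topics:
                self.topic_docs[topic] += 1
                self.word_counts[topic].update(words)
        self.total = sum(self.topic_docs.values())
        return self

    def log_score(self, topic: str, words: list) -> float:
        """log P(topic) + sum of log P(word | topic), up to a constant shared by all topics."""
        counts = self.word_counts[topic]
        denominator = sum(counts.values()) + self.alpha * len(self.vocabulary)
        score = math.log(self.topic_docs[topic] / self.total)
        for word in words:
            score += math.log((counts[word] + self.alpha) / denominator)
        return score

    def recommend(self, text: str, n: int = 2) -> list:
        """Return the n topics with the highest posterior for the given text."""
        words = [w for w in text.split() if w in self.vocabulary]  # ignore unseen words
        ranked = sorted(self.topic_docs, key=lambda t: self.log_score(t, words), reverse=True)
        return ranked[:n]


if __name__ == "__main__":
    model = MultinomialNB().fit(TRAINING)
    print(model.recommend("a small json library for java"))  # ['json', 'java']
    print(model.recommend("deep learning in python"))        # ['machine-learning', 'python']
```

Working in log space avoids numerical underflow when a README contains many words, and smoothing keeps a topic from being ruled out by a single word it has never seen.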
  • 62. Example of repositories, their topics, and the recommended topics
  • 64. Development of the CROSSMINER recommendation systems: main activities
  • 65. Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio, Phuong T. Nguyen, Riccardo Rubei: “Development of recommendation systems for software engineering: the CROSSMINER experience,” accepted for publication in Empirical Software Engineering (2021), preprint: https://arxiv.org/abs/2103.06987
  • 66. Requirement elicitation phase: main challenge. Clear understanding of the needed recommendation systems: • Understanding the functionalities that the final users expect from the envisioned recommendation systems • Otherwise, you risk spending time developing systems that provide recommendations that are neither relevant to nor in line with the actual user needs.
  • 67. Requirement elicitation phase: main challenge. Applied solution – We implemented demo projects that reflected real-world scenarios – Explanatory context inputs and the corresponding recommendation items that the envisioned recommendation systems should be able to produce.
  • 68. Development phase: main challenge. Clear awareness of existing recommendation techniques – Over the last decades, several recommendation systems have been developed by both academia and industry – It is crucial to have a clear understanding of the possible techniques and patterns that might be employed to develop new ones – Since the solution space is extensive, comparing and evaluating candidate approaches can be a very daunting task
  • 69. Development phase: main challenge. Applied solution – Significant effort was devoted to analyzing existing approaches that could be used as starting points. [Figure: Data Preprocessing, Capturing Context, Producing Recommendations, Presenting Recommendations]
  • 71. Evaluation phase: main challenge. There is no golden rule for evaluating all recommendation systems, given their intrinsic features and heterogeneity: – Which evaluation methodology is suitable? – Which metric(s) can be used? – Which dataset is eligible/available for evaluation? – Which baseline(s) can be compared with?
  • 72. Evaluation phase: ten-fold cross-validation. The dataset was divided into ten equal parts, so-called folds. The validation was conducted in ten rounds: in each round, nine folds are used as training data and the remaining fold is used as testing data.
  • 73. Evaluation phase: some CROSSMINER facts
  • 76. Lessons learned. User scepticism: target users might be sceptical about the relevance of the items that can be recommended. Quality of data: large, high-quality datasets are essential for training and evaluation activities – Data quality cannot be defined in general; it very much depends on the particular application of interest. Baseline availability: it is not always possible to reuse the tools and data of the identified baselines – In our case, k-fold cross-validation came to the rescue – Only for CrossSim did we reimplement the related tools
  • 77. Lessons learned. In the case of the FOCUS evaluation, one of the considered datasets initially consisted of 5,147 Java projects retrieved from the Software Heritage archive. To comply with the requirements of the baseline, we first restricted the dataset to the projects that use at least one of the considered third-party libraries. To comply with the requirements of FOCUS, we then restricted it to the projects containing at least one pom.xml file. Because of these constraints, we ended up with a dataset of 610 Java projects – that is, we had to collect a dataset more than eight times larger than the one actually used for the evaluation
  • 79. What’s next: Adversarial Machine Learning – Manipulating training data to perturb recommendations – Understanding attacks on recommender systems – Finding adequate countermeasures
  • 80. What’s next: Dealing with time-series data in Software Engineering with deep learning – Recommending third-party library updates for Android apps – Predicting code insertions for LSP-based notations, e.g., in Visual Studio Code and Theia – Predicting model fragment insertions for GLSP-based notations, e.g., EMF.cloud, and Sprotty for visual languages
  • 82. Claudio Di Sipio, Davide Di Ruscio, Phuong T. Nguyen: Democratizing the development of recommender systems by means of low-code platforms. MODELS Companion 2020: 68:1-68:9
  • 85. Some additional links - http://www.ossmeter.org - http://www.crossminer.org - http://www.eclipse.org/scava
  • 86. Main references (1/3) ● Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio, Phuong T. Nguyen, Riccardo Rubei, “Development of recommendation systems for software engineering: the CROSSMINER experience,” Empirical Software Engineering (EMSE), 2021, pre-print https://arxiv.org/abs/2103.06987 ● Phuong T. Nguyen, Juri Di Rocco, Claudio Di Sipio, Davide Di Ruscio, Massimiliano Di Penta, “Recommending API Function Calls and Code Snippets to Support Software Development,” IEEE Transactions on Software Engineering (TSE), 2021, ISSN: 1939-3520, DOI: 10.1109/TSE.2021.3059907 ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Massimiliano Di Penta, “CrossRec: Supporting Software Developers by Recommending Third-party Libraries,” Journal of Systems and Software (JSS), 2020, ISSN: 0164-1212, DOI: 10.1016/j.jss.2019.110460 ● Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio, “An Automated Approach to Assess the Similarity of GitHub Repositories,” Software Quality Journal (SQJ), 2020, ISSN: 0963-9314, DOI: 10.1007/s11219-019-09483-0
  • 87. Main references (2/3) ● Andrea Capiluppi, Davide Di Ruscio, Juri Di Rocco, Phuong T. Nguyen, Nemitari Ajienka, “Detecting Java Software Similarities by using Different Clustering Techniques,” Information and Software Technology (IST), 2020, ISSN: 0950-5849, DOI: 10.1016/j.infsof.2020.106279 ● Riccardo Rubei, Claudio Di Sipio, Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, “PostFinder: Mining Stack Overflow posts to support software developers,” Information and Software Technology (IST), 2020, ISSN: 0950-5849, DOI: 10.1016/j.infsof.2020.106367 ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas Degueule, Massimiliano Di Penta, “FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns,” In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, DOI: 10.1109/ICSE.2019.00109
  • 88. Main references (3/3) ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Alfonso Pierantonio, Ludovico Iovino, “Automated Classification of Metamodel Repositories: A Machine Learning Approach,” MODELS 2019, DOI: 10.1109/MODELS.2019.00011 ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, “Enabling heterogeneous recommendations in OSS development: what’s done and what’s next in CROSSMINER,” In Proceedings of the 23rd Int. Conf. on Evaluation and Assessment in Software Engineering, EASE 2019, DOI: 10.1145/3319008.3319353 ● Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio, “CrossSim: exploiting mutual relationships to detect similar OSS projects,” In Proceedings of the 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2018), ISBN: 978-1-5386-7383-6, DOI: 10.1109/SEAA.2018.00069
  • 89. Thanks to Juri Di Rocco, Claudio Di Sipio, Phuong T. Nguyen, Alfonso Pierantonio, and Riccardo Rubei for some of the slides used