4. 4
Recommendation systems
Information filtering systems
Deal with choice overload
Focused on user’s:
– Preferences
– Interest
– Observed Behaviour
https://www.slideshare.net/CrossingMinds/recommendation-system-explained?from_action=save
5. 5
Recommendation systems - Examples
Facebook–“People You May Know”
Netflix–“Other Movies You May Enjoy”
LinkedIn–“Jobs You May Be Interested In”
Amazon–“Customers who bought this item also bought …”
YouTube–“Recommended Videos”
Google–“Search results adjusted”
Pinterest–“Recommended Images”
…
https://www.slideshare.net/CrossingMinds/recommendation-system-explained?from_action=save
6. 6
Recommendation systems
Recommendation systems (RS) help to match users with items
– Ease information overload
Different system designs / paradigms
– Based on availability of exploitable data
– Implicit and explicit user feedback
– Domain characteristics
RS are software agents that elicit the interests and preferences of individual consumers
[…] and make recommendations accordingly. They have the potential to support and
improve the quality of the decisions consumers make while searching for and selecting
products online.
[Xiao & Benbasat, MISQ, 2007]
http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
7. 7
Recommendation systems
RS seen as a function
Given:
– User model (e.g. ratings, preferences, demographics, situational context)
– Items (with or without description of item characteristics)
Find:
– Relevance score. Used for ranking.
Finally:
– Recommend items that are assumed to be relevant
http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
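The "RS seen as a function" view above can be sketched in a few lines. This is only an illustration: the `UserModel` and `Item` classes, the category-based scoring rule, and the sample data are assumptions made for the example, not something taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    # Hypothetical user model: a preference weight per item category.
    preferences: dict = field(default_factory=dict)


@dataclass
class Item:
    name: str
    categories: list


def relevance(user, item):
    """Toy relevance score: mean preference over the item's categories."""
    if not item.categories:
        return 0.0
    total = sum(user.preferences.get(c, 0.0) for c in item.categories)
    return total / len(item.categories)


def recommend(user, items, top_n=2):
    """Given a user model and items, rank by relevance and return the top_n names."""
    ranked = sorted(items, key=lambda it: relevance(user, it), reverse=True)
    return [it.name for it in ranked[:top_n]]


user = UserModel(preferences={"comedy": 0.9, "drama": 0.2, "sci-fi": 0.6})
catalog = [
    Item("A", ["drama"]),
    Item("B", ["comedy", "sci-fi"]),
    Item("C", ["sci-fi"]),
]
print(recommend(user, catalog))
```

Any real system would replace `relevance` with a learned or data-driven score; the Given / Find / Finally structure stays the same.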
10. 10
Recommendation Systems in Software Engineering
A recommendation system in software
engineering is
“. . . a software application that provides
information items estimated to be
valuable for a software engineering task
in a given context.”
11. 11
Recommendation Systems in Software Engineering
Data Preprocessing
Capturing Context
Producing Recommendations
Presenting Recommendations
14. 14
Software Analytics
"Software analytics is analytics on software data for managers
and software engineers with the aim of empowering software
development individuals and teams to gain and share insight
from their data to make better decisions."
R. Buse, T. Zimmermann. Information Needs for Software Development Analytics. Proc. Int'l Conf. Software Engineering (ICSE), IEEE CS,
2012
15. 15
Mining Software Repositories field
The Mining Software Repositories (MSR)
field analyzes the rich data available in
software repositories to uncover
interesting and actionable information
about software systems and projects.
http://www.msrconf.org/
Q&A systems
Bug Reports
API Documentation
16. 16
Some numbers on EMSE research
Research on empirical software engineering has increasingly used data
made available in online repositories or collective efforts
[Figure panels: cumulative number of FOSS projects per year; average number of FOSS projects per year]
22. 22
Context
Source code
Q&A systems
Bug Reports
API Documentation
Tutorials
Configuration Management Systems
Development of new software systems
by reusing existing open source components
25. 25
CROSSMINER: high-level view
[Figure: architecture diagram. Phases: Data Preprocessing, Capturing Context, Producing Recommendations, Presenting Recommendations. Mining and Analysis Tools: Source Code Miner, NLP Miner, Configuration Miner, Cross project Analysis. OSS forges: Source Code, Natural language channels, Configuration Scripts. Other elements: Knowledge Base. Arrow labels: mine, lookup/store.]
26. 26
CROSSMINER: high-level view
[Figure: architecture diagram. Phases: Data Preprocessing, Capturing Context, Producing Recommendations, Presenting Recommendations. Components: Developer, IDE, Knowledge Base, Data Storage. Arrow labels: query, recommendations.]
Real-time recommendations that aim to increase productivity and quality
27. 27
Examples of recommendations
Use of machine learning algorithms to produce recommendations during
development:
– Depending on the set of selected third-party libraries, the system is able to recommend
additional libraries that should be included in the project being developed
– Given a selected library, the system is able to suggest alternative ones that share some
similarities with the selected one
– Depending on the set of selected libraries, the system shows API documentation and Q&A
posts that can help developers to understand how to use the selected libraries
– During development, developers get recommendations for API function calls and usage
patterns they might use
– …
28. 28
The CROSSMINER Recommendation Systems
CrossSim – Recommending similar projects
CrossRec – Recommending third-party libraries
FOCUS – Recommending API function calls and usage patterns
MNBN – Recommending GitHub topics
PostFinder – Recommending Stack Overflow posts
31. 31
Overview of CrossSim
Graphs for representing different kinds
of relationships in the OSS ecosystem
• e.g., developers commit to repositories,
users star repositories, projects contain
source code files, etc.
Cross Project Relationships for Computing Open Source Software Similarity
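As a rough illustration of how such a graph can support similarity computation, the sketch below stores relationships as edges and compares two repositories by the overlap of the nodes connected to them (Jaccard index). The slides do not spell out CrossSim's actual similarity algorithm, so the measure, the node names, and the edges here are all assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical edges: (source node, relationship, target repository).
edges = [
    ("dev:alice", "commits", "repo:A"),
    ("dev:alice", "commits", "repo:B"),
    ("user:bob", "stars", "repo:A"),
    ("user:bob", "stars", "repo:B"),
    ("user:carol", "stars", "repo:B"),
    ("user:carol", "stars", "repo:C"),
]

# Each repository maps to the set of nodes connected to it.
neighbours = defaultdict(set)
for source, _relation, repo in edges:
    neighbours[repo].add(source)


def graph_similarity(repo_a, repo_b):
    """Jaccard index over the neighbour sets of two repositories."""
    a = neighbours.get(repo_a, set())
    b = neighbours.get(repo_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(graph_similarity("repo:A", "repo:B"))
print(graph_similarity("repo:A", "repo:C"))
```

Here `repo:A` and `repo:B` share two of three connected nodes, while `repo:A` and `repo:C` share none.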
32. 32
CrossSim – Recommending similar projects
CrossRec – Recommending third-party libraries
FOCUS – Recommending API function calls and usage patterns
MNBN – Recommending GitHub topics
PostFinder – Recommending Stack Overflow posts
34. 34
Collaborative-Filtering Recommendation
◼ User-item matrix: Ratings given to Pizza restaurants by customers
      R1   R2   R3
C1    5    5    2
C2    3    3    4
C3    5    5    ?
◼ Unknown ratings can be deduced from the most similar customers
CROSSMINER Lisbon Meeting, 27-28 February 2018
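A minimal sketch of how the unknown rating of C3 for R3 could be deduced from the most similar customers. The distance-based similarity and the choice of `k` are assumptions made for this example; the slide does not prescribe a particular measure.

```python
import math

# Ratings from the slide; None marks the unknown rating of C3 for R3.
ratings = {
    "C1": {"R1": 5, "R2": 5, "R3": 2},
    "C2": {"R1": 3, "R2": 3, "R3": 4},
    "C3": {"R1": 5, "R2": 5, "R3": None},
}


def similarity(a, b):
    """1 / (1 + Euclidean distance) over the restaurants both customers rated."""
    shared = [r for r in a if a[r] is not None and b.get(r) is not None]
    if not shared:
        return 0.0
    dist = math.sqrt(sum((a[r] - b[r]) ** 2 for r in shared))
    return 1.0 / (1.0 + dist)


def predict(customer, restaurant, k=1):
    """Similarity-weighted mean rating of the k most similar customers."""
    target = ratings[customer]
    candidates = []
    for other, other_ratings in ratings.items():
        if other == customer or other_ratings.get(restaurant) is None:
            continue
        candidates.append((similarity(target, other_ratings), other_ratings[restaurant]))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    top = candidates[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        raise ValueError("no similar customer has rated this restaurant")
    return sum(sim * rating for sim, rating in top) / total


print(predict("C3", "R3"))
```

C1 rated R1 and R2 exactly as C3 did, so with `k=1` the prediction is C1's rating of R3. A distance-based similarity is used here because cosine similarity would not separate C1 from C2: the vectors (5, 5) and (3, 3) point in the same direction.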
35. 35
CrossRec: Projects-Libraries Representation
◼ Representing the project-library relationships using a user-item ratings matrix
◼ Predict the inclusion of additional libraries
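The project-library representation can be sketched as a binary matrix in which projects play the role of users and libraries the role of items. The example below is an assumption-laden illustration, not CrossRec's actual implementation: the project names, the libraries, and the cosine-weighted voting scheme are all chosen for the example.

```python
import math

# Hypothetical projects and the third-party libraries they include
# (each set is one row of a binary project-library matrix).
projects = {
    "p1": {"junit", "slf4j", "guava"},
    "p2": {"junit", "slf4j", "log4j"},
    "p3": {"junit", "guava", "commons-io"},
}


def cosine(a, b):
    """Cosine similarity between two binary library vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_libraries(current, top_n=2):
    """Score each missing library by the summed similarity of the projects using it."""
    scores = {}
    for libs in projects.values():
        sim = cosine(current, libs)
        for lib in libs - current:
            scores[lib] = scores.get(lib, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lib for lib, _ in ranked[:top_n]]


print(recommend_libraries({"junit", "slf4j"}))
```

A project that already uses `junit` and `slf4j` is most similar to `p1` and `p2`, so the libraries those projects add on top rank first.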
36. 36
CrossSim – Recommending similar projects
CrossRec – Recommending third-party libraries
FOCUS – Recommending API function calls and usage patterns
MNBN – Recommending GitHub topics
PostFinder – Recommending Stack Overflow posts
37. 37
Problem
“Which API methods should this piece of client code
invoke, considering that it has already invoked these
other API methods?”
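One simple way to approach this question is to look at which API methods co-occur with the already invoked ones in previously mined code. The sketch below is only an illustration under that assumption; it is not the FOCUS algorithm, and the API names and mined usages are hypothetical.

```python
from collections import Counter

# Hypothetical mined data: the set of API methods invoked by each known method body.
known_usages = [
    {"File.open", "File.read", "File.close"},
    {"File.open", "File.write", "File.close"},
    {"File.open", "File.read", "Parser.parse", "File.close"},
]


def suggest_invocations(already_invoked, top_n=2):
    """Vote for missing API methods, weighting each usage by its overlap with the client code."""
    votes = Counter()
    for usage in known_usages:
        overlap = len(usage & already_invoked)
        if overlap == 0:
            continue  # usages that share nothing with the client code do not vote
        for api in usage - already_invoked:
            votes[api] += overlap
    # Most votes first; ties broken alphabetically for a stable order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:top_n]]


print(suggest_invocations({"File.open", "File.read"}))
```

Client code that has already called `File.open` and `File.read` gets `File.close` as the top suggestion, because every overlapping usage also invokes it.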
52. 52
Requirement elicitation phase: main challenge
Clear understanding of the needed recommendation systems:
• Understanding the functionalities that the final users expect from the envisioned
recommendation systems
• Otherwise, you risk spending time developing systems whose recommendations are
neither relevant nor in line with the actual user needs.
53. 53
Requirement elicitation phase: main challenge
Solution employed in CROSSMINER
– We implemented demo projects that reflected real-world scenarios
– Each demo provided explanatory context inputs and the corresponding recommendation
items that the envisioned recommendation systems were expected to produce.
54. 54
Development phase: main challenge
Clear awareness of existing recommendation techniques
– Knowledge of techniques and patterns that might be employed
– Comparing and evaluating candidate approaches can be a very daunting task
55. 55
Development phase: main challenge
Applied solution
– Significant effort was devoted to analyzing existing approaches that might
be used as starting points.
Data Preprocessing
Capturing Context
Producing Recommendations
Presenting Recommendations
57. 57
Evaluation phase: main challenge
There is no golden rule for evaluating all possible recommendation
systems due to their intrinsic features as well as heterogeneity
– Which evaluation methodology is suitable?
– Which metric(s) can be used?
– Which dataset is eligible/available for evaluation?
– Which baseline(s) can be compared with?
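As one concrete answer to the metric question, precision and recall over the top-k recommended items are commonly used for ranked recommendations. The sketch below shows how they can be computed; the item names and the ground truth are hypothetical, and whether these metrics fit a given system still depends on the questions listed above.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one ranked list against a ground-truth set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked recommendations and the items the user actually needed.
recommended = ["guava", "log4j", "commons-io"]
relevant = {"guava", "commons-io"}

print(precision_recall_at_k(recommended, relevant, k=2))
print(precision_recall_at_k(recommended, relevant, k=3))
```

With `k=2` only one of the two relevant items is in the list; extending to `k=3` recovers both, which raises recall but also admits one irrelevant item.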
58. 58
Lessons learned
User scepticism: target users might be sceptical about the relevance of
the potential items that can be recommended
Quality of data: large, high-quality datasets are essential for
training and evaluation activities
Baseline availability: it is not always possible to reuse the tools and data of
the identified baselines
59. 59
Lessons learned
In the case of the FOCUS evaluation, one of the considered datasets
initially consisted of 5,147 Java projects retrieved from the
Software Heritage archive
To comply with the requirements of the baseline and of FOCUS, we had
to restrict the dataset
- we ended up with a dataset consisting of 610 Java projects
- we had to create a dataset ten times bigger than the one actually
used for the evaluation
61. 61
Model recommenders
A recommender system for model driven software
engineering can combine data from different sources in
order to infer a list of relevant and actionable model
changes in real time.
Stefan Kögel, Recommender system for model driven software development
ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of
Software Engineering
63. 63
Model recommenders
Mussbacher, G., Combemale, B., Kienzle, J. et al. Opportunities in
intelligent modeling assistance. Softw Syst Model 19, 1045–1053 (2020).
65. 65
Google’s AI-related software
The lines of code in Google’s AI-related software
D. Sculley et al., Hidden technical debt in machine learning systems, in Proc. 28th Int. Conf. Neural Information Processing Systems,
vol. 2. Cambridge, MA: MIT Press, pp. 2503–2511. [Online]. Available: http://dl.acm.org/citation.cfm?id=2969442.2969519
68. 68
Model recommenders
The devil is in the data, not in the details
The availability of source code forges enabled so
many research directions and possibilities in EMSE
What’s the situation concerning
repositories of modeling artifacts?
All of them seem to struggle in
attracting contributions from the
community
69. 69
CloudMDE 2015
Model-Driven Engineering on and for the Cloud
Proceedings of the
3rd International Workshop on Model-Driven Engineering on and for the Cloud
18th International Conference on Model Driven Engineering Languages and Systems
(MoDELS 2015)
Ottawa, Canada, September 29, 2015.
Edited by Richard Paige, Jordi Cabot, Marco Brambilla, James H. Hill
71. 71
My main points to conclude
The devil is in the data, not in the details
My “fear” is that:
- technologies are there
- knowledge and expertise are there
But we are missing the necessary raw material
- there are alternatives (e.g., use of synthetic data) even though they
might enable only sub-optimal solutions