In this paper, we propose a Personalized Paper Recommender System, a new user-paper approach that takes the user's academic curriculum vitae into account. To build the user profiles, we use CV-Lattes, a Brazilian academic platform. We also examine several issues related to user profiling: (i) we define and compare different strategies to build and represent the user profiles, using terms and using concepts; (ii) we verify how much of a user's past information is required to provide good recommendations; and (iii) we compare our approaches with the state of the art in paper recommendation using CV-Lattes. To validate our strategies, we conduct a user study involving 30 users from the Computer Science domain. Our results show that (i) our approaches outperform the state of the art on CV-Lattes; (ii) concepts profiles are comparable to terms profiles; (iii) analyzing the content of the past four years for terms profiles and of the past five years for concepts profiles achieved the best results; and (iv) terms profiles provide better results but are slower than concepts profiles, so if the system needs real-time recommendations, concepts profiles are the better choice.
2. Warm up!
• How does research in Brazil work?
• What is Lattes?
• Why is Lattes a big deal in Brazil?
3. Lattes = Opportunity
• The information available in Lattes creates a
great opportunity to recommend science-related
content to researchers in Brazil…
– Projects
– Contributors
– Call for Papers
– Papers
• … and to test different algorithms!
4. Our Work
• In this paper:
– (1) We present a Personalized Paper
Recommender System
• A user-paper approach that takes into consideration
the Lattes information.
– (2) We test different profiling strategies
– (3) We test how much past information is
necessary to provide good
recommendations.
– (4) We compare our strategy with the state of the art
6. Profiling Strategies
• Concepts Profile
– The vector is composed of predefined
concepts.
• Terms Profile
– The vector is composed of the set of
terms that compose the dictionary.
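To make the two representations concrete, here is a minimal sketch of how a terms profile and a concepts profile could be built and used to rank candidate papers. It assumes plain term-frequency weights, a hypothetical `TERM_TO_CONCEPT` mapping standing in for the predefined concept set, and cosine similarity for ranking; the slides specify none of these choices, so treat it as an illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical term -> concept mapping (an assumption for illustration;
# the actual predefined concept set is not described in the slides).
TERM_TO_CONCEPT = {
    "recommender": "information retrieval",
    "ranking": "information retrieval",
    "neural": "machine learning",
    "classifier": "machine learning",
    "ontology": "semantic web",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def terms_profile(texts, dictionary):
    """Terms profile: term frequencies over every term in the dictionary."""
    counts = Counter(t for text in texts for t in tokenize(text) if t in dictionary)
    return {term: counts[term] for term in dictionary}


def concepts_profile(texts, term_to_concept=TERM_TO_CONCEPT):
    """Concepts profile: frequencies over the predefined concepts only."""
    counts = Counter(
        term_to_concept[t]
        for text in texts
        for t in tokenize(text)
        if t in term_to_concept
    )
    return {concept: counts[concept] for concept in set(term_to_concept.values())}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_papers(profile, papers, build):
    """Rank (title, text) papers by similarity to the user profile, best first."""
    scored = [(cosine(profile, build([text])), title) for title, text in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


if __name__ == "__main__":
    dictionary = {"recommender", "ranking", "neural", "classifier", "ontology", "graph"}
    user_texts = ["A recommender system with neural ranking", "Ranking papers for researchers"]
    papers = [("P1", "An ontology for museums"), ("P2", "Neural ranking for recommender systems")]
    tp = terms_profile(user_texts, dictionary)
    print(rank_papers(tp, papers, lambda t: terms_profile(t, dictionary)))  # ['P2', 'P1']
```

In this toy setup the concepts vector has 3 dimensions against 6 for the terms vector, which hints at why comparing concept profiles can be cheaper than comparing term profiles.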
7. Research Questions
• 𝑄1: How many years of the user's curriculum
are needed to provide great
recommendations?
• 𝑄2: Is there any difference between the
concepts profile and terms profile?
• 𝑄3: Is Lopes’s algorithm better than our profiling strategies?
• 𝑄4: Which method should we choose?
8. Evaluation
• To answer our research questions, we
conducted a user study experiment
• We developed a system to collect users’
impressions of a set of papers
9. Evaluation
• We used the users’ impressions to rank the
papers and compared this ranking with the output of
the recommendation algorithms to answer
our research questions
10. Results
Mean NDCG@5, NDCG@10, and profile length for each profile-generation
method. We ran the Shapiro-Wilk test to check the normality of the data. The
symbol (*) indicates that the data are not normally distributed, i.e.,
p-value < 0.05.
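The results are reported as NDCG@5 and NDCG@10, but the formula is not given. Below is a minimal sketch assuming the common formulation DCG@k = Σ rel_i / log2(i + 1) over positions i = 1..k, and assuming each user's impressions have already been turned into graded relevance scores per paper (the rating scale used in the study is not stated). The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance values (positions start at 1)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(recommended, user_relevance, k):
    """NDCG@k of an algorithm's ranking against the user's own judgments.

    recommended: paper ids in the order produced by the algorithm.
    user_relevance: dict paper id -> graded relevance from the user study
                    (papers missing from the dict count as relevance 0).
    """
    gains = [user_relevance.get(pid, 0) for pid in recommended]
    # Ideal ranking: the user's judgments sorted from most to least relevant.
    ideal = sorted(user_relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(per_user_runs, k):
    """Average NDCG@k over users; each run is a (recommended, user_relevance) pair."""
    scores = [ndcg_at_k(rec, rel, k) for rec, rel in per_user_runs]
    return sum(scores) / len(scores)
```

A ranking that matches the user's own ordering scores 1.0, and any misordering of relevant papers lowers the score, with mistakes near the top costing more because of the logarithmic discount.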
11. Results
Results of the hypothesis tests performed to compare the strategies. Both tests were
performed with α = 0.05, alternative = “greater”, and paired=TRUE
(>> and > denote significance at p-value < 0.01 and p-value < 0.05,
respectively).
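The caption gives R-style arguments (alternative = “greater”, paired=TRUE) but does not name the two tests. Assuming one of them is the Wilcoxon signed-rank test, a usual choice when Shapiro-Wilk rejects normality (a paired t-test would be the natural counterpart for normal data), here is a dependency-free sketch of its exact one-sided paired version. It drops zero differences and uses average ranks for ties; it is an illustration, not the study's analysis script.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # mean of the 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def wilcoxon_greater(x, y):
    """Exact one-sided Wilcoxon signed-rank test of H1: x tends to be greater than y.

    x, y: paired scores (e.g., per-user NDCG of two strategies).
    Returns (W_plus, p_value).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    if not diffs:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    # Doubling turns every (possibly half-integer) rank into an integer for the DP.
    doubled = [int(round(2 * r)) for r in ranks]
    w_plus = sum(r for r, d in zip(doubled, diffs) if d > 0)
    # counts[s] = number of sign assignments whose positive ranks sum to s.
    counts = [0] * (sum(doubled) + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(len(counts) - 1, r - 1, -1):
            counts[s] += counts[s - r]
    p_value = sum(counts[w_plus:]) / 2 ** len(doubled)
    return w_plus / 2, p_value


def significance_marker(p_value):
    """Map a p-value to the table notation: '>>' for p < 0.01, '>' for p < 0.05."""
    if p_value < 0.01:
        return ">>"
    if p_value < 0.05:
        return ">"
    return ""
```

With five users who all prefer strategy x by distinct margins, the exact p-value is 1/32 = 0.03125, which would be marked “>” at α = 0.05.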
12. 𝑄1: How many years of the user's curriculum
are needed to provide great
recommendations?
Answer: It depends on the profiling strategy. The best
results came from four years for TP and five years for CP.
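Choosing how many years of the curriculum to use amounts to filtering the user's entries by year before building the profile, presumably what the labels TP4 and CP5 refer to. A minimal sketch, assuming each curriculum entry is a hypothetical `(year, text)` pair and that the window counts back from a reference year, inclusive (the exact cut-off rule is not described):

```python
def recent_entries(entries, reference_year, window_years):
    """Keep curriculum texts from the last `window_years` years, reference year included.

    entries: iterable of (year, text) pairs (hypothetical record format).
    """
    first_year = reference_year - window_years + 1
    return [text for year, text in entries if first_year <= year <= reference_year]


def best_window(scores_by_window):
    """Return the window size whose evaluation score (e.g., mean NDCG@10) is highest."""
    return max(scores_by_window, key=scores_by_window.get)


if __name__ == "__main__":
    entries = [(2009, "old paper"), (2012, "mid paper"), (2014, "new paper")]
    print(recent_entries(entries, 2014, 4))  # ['mid paper', 'new paper']
```

Sweeping `window_years` over a range, building a profile from each filtered set, and keeping the best-scoring window is one way to answer 𝑄1 separately for each profiling strategy.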
13. 𝑄2: Is there any difference between the
concepts profile and terms profile?
Answer: Comparing the terms profile TP4 with the
concepts profile CP5, TP4 performed better, but its
superiority was not statistically significant. Thus,
there is no significant difference.
14. 𝑄3: Is Lopes’s algorithm better than our profiling strategies?
Answer: No. Both of our approaches (TP4 and CP5)
achieved statistically significantly better
performance than Lopes’s algorithm.
15. 𝑄4: Which method should we choose?
Answer: It depends on the context, because there is a
trade-off between the techniques. If the system needs
online recommendations of reasonable quality, CP is
the best choice. On the other hand, if the system can
compute the recommendations offline and computation
time is not a problem, TP is better.
16. Conclusion and Future Work
• We presented and evaluated our approach to a
paper recommender system that considers the
user's curriculum crawled from CV-Lattes
• Our main contributions are:
– Our algorithms achieved better performance than
the state-of-the-art paper recommendation algorithm
for Lattes
– We observed no statistical difference between both
profiling strategies.
– We built a dataset that can be used for future
research in the area
17. Conclusion and Future Work
• Our plans:
– To compare our results with data from other
CV-oriented networks
– To work on an integration algorithm to combine
data from multiple sources
– To improve the recommendation model using
paper-related information