This document describes a literature recommendation system that recommends papers from a single paragraph of input. It preprocesses the data, extracts topics with LDA and NMF models, and finds similar papers using cosine similarity. Cleaning, tokenization, stop word removal, and lemmatization worked best for preprocessing. Before user feedback, LDA performed better for short inputs and NMF for longer inputs; after user feedback, NMF performed better overall. Future work includes trying additional models such as BERT and using user feedback to improve recommendations.
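As a rough illustration of the final step, finding similar papers with cosine similarity, the minimal sketch below ranks papers by the similarity of their topic vectors to the topic vector of the input paragraph. The topic vectors, the paper names, and the `recommend` helper are hypothetical; in the described system the vectors would come from the LDA or NMF model, whose configuration is not given here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query_topics, paper_topics, top_k=3):
    """Return the top_k (paper, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_topics, vec))
        for name, vec in paper_topics.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up topic distributions over three topics (assumption, not real data).
papers = {
    "paper_a": [0.8, 0.1, 0.1],
    "paper_b": [0.1, 0.8, 0.1],
    "paper_c": [0.4, 0.4, 0.2],
}
query = [0.7, 0.2, 0.1]  # topic vector of the input paragraph (assumed)

top = recommend(query, papers, top_k=2)
for name, score in top:
    print(name, round(score, 3))
```

Cosine similarity compares the direction of the vectors rather than their length, so a short paragraph and a long paper can still match if their topic mixes are alike.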
2. Agenda
➢ Introduction
➢ Related Works
➢ Methodology
○ Data preparation
○ Topic Extraction
○ Finding similar papers
➢ Results
➢ Conclusion and Future Work
➢ Questions
5. Introduction
➢ Problem Statement
○ No preliminary data
○ Paragraph input
➢ Motivation
○ First recommender system based on only a paragraph of input
○ Area-specific paper recommendation
○ Broad scope for trying different combinations of techniques
○ Make thesis writing easier
○ Saves time
○ Specific domain
7. Related Works
○ Scienstein: A Research Paper Recommender System
■ Paper recommender
■ Hybrid filtering
■ Citation, author and source analysis
■ Preliminary data (citation analysis, author analysis, source analysis)
○ Science Concierge: A Fast Content-Based Recommendation System for Scientific Publications
■ Paper recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (users’ votes)
○ ScienceDirect: Topic Modeling Driven Content-Based Jobs Recommendation Engine for Recruitment Industry
■ Job recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (job description, user details)
22. Results
➢ Validation with user feedback
○ Before user feedback
■ Accuracy with content 3
● LDA is better than NMF
■ Accuracy with content 10
● NMF is better than LDA
○ After user feedback
■ NMF is better than LDA
24. Conclusion & Future Works
➢ Conclusion
○ Found the best-performing data preprocessing pipeline
■ Cleaning + Tokenization + Stop Word Removal + Lemmatization
○ Compared two topic modelling techniques
■ LDA and NMF
○ Compared model accuracies
○ User ratings
■ Models with LDA and NMF
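The preprocessing chain named above (cleaning, tokenization, stop word removal, lemmatization) could be sketched as follows. This is a simplified illustration, not the presenters' implementation: the stop word list is a tiny hypothetical sample, and `toy_lemmatize` is a crude stand-in that only strips a few plural endings, where a real system would use a proper lemmatizer.

```python
import re

# Hypothetical, deliberately tiny stop word list (assumption for illustration).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in",
    "on", "for", "is", "are", "with", "we",
}


def clean(text):
    """Lowercase the text and replace everything except letters and whitespace with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def toy_lemmatize(token):
    """Crude stand-in for a lemmatizer: handles a few plural endings only."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(paragraph):
    """Run cleaning, tokenization, stop word removal, then lemmatization."""
    tokens = remove_stop_words(tokenize(clean(paragraph)))
    return [toy_lemmatize(t) for t in tokens]


result = preprocess("We compare Topic Models for the Recommendation of Papers!")
print(result)
```

The order matters in this sketch: stop words are removed before lemmatization so that the stop word list can be matched against the surface forms as written.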
25. Future Works
➢ Try other techniques such as BERT and check whether they achieve better user ratings
➢ Use user ratings to improve the recommendation system
➢ Add new features to the website
➢ Try different topic modelling techniques
➢ Try different similarity functions
➢ Train a model using the extracted topics
➢ Tune the hyperparameters for the new techniques