Literature Recommendation Software

Faruk Cankaya
Melike Keskin
Supervisor: Florian Schramm
Professor: Prof. Dr. Jürgen Ernstberger
April 15, 2021

Agenda
➢ Introduction
➢ Related Works
➢ Methodology
○ Data preparation
○ Topic Extraction
○ Finding similar papers
➢ Results
➢ Conclusion and Future Work
➢ Questions

Introduction
➢ Problem Statement
○ No preliminary data
○ Paragraph input

Introduction
➢ Keyword-based input (X)
➢ Reference-based recommendation (X)
➢ Most-cited papers (X)


Introduction
➢ Motivation
○ First recommender system based on only a paragraph of input
○ Paper recommendation for a specific area (domain)
○ Wide scope for trying different technique combinations
○ Makes thesis writing easier
○ Saves time

➢ Related Works
○ Scienstein: A Research Paper Recommender System
■ Paper recommender
■ Hybrid filtering
■ Citation, author and source analysis
■ Preliminary data (citation analysis, author analysis, source analysis)
○ Science Concierge: A Fast Content-Based Recommendation System for Scientific Publications
■ Paper recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (users’ votes)
○ ScienceDirect: Topic Modeling Driven Content-Based Jobs Recommendation Engine for Recruitment Industry
■ Job recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (job description, user details)

Methodology
➢ Used Method
○ Content-based
○ Data Preprocessing
■ Cleaning + Tokenization + Stop Word Removal + Lemmatization
○ Topic Modeling
■ LDA
■ NMF
○ Similarity Function
■ Cosine Similarity

Methodology
➢ Data preparation
○ Number of documents: ~12,000 papers
○ Candidate steps: tokenization, text cleaning, stop word removal, stemming, lemmatization, synonym replacement, POS tagging, etc.
○ Our model: Cleaning + Tokenization + Stop Word Removal + Lemmatization
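
The four chosen preprocessing steps could be chained roughly as in the sketch below. It is an illustration only, not the project's code: the stop word set and the lemma lookup table are tiny hypothetical stand-ins for the full resources an NLP library would provide, and all function names are made up for this example.

```python
import re

# Hypothetical, deliberately tiny resources. A real pipeline would take
# its stop word list and lemmatizer from an NLP library instead.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of", "on", "the", "to", "we"}
LEMMAS = {"papers": "paper", "models": "model", "topics": "topic"}


def clean(text):
    """Lowercase the text and replace every run of non-letters with one space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its base form when the lookup table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Cleaning + Tokenization + Stop Word Removal + Lemmatization."""
    return lemmatize(remove_stop_words(tokenize(clean(text))))


tokens = preprocess("We extract Topics from 12,000 papers!")
print(tokens)
```

The same `preprocess` function would be applied both to every paper in the dataset and to the paragraph the user enters, so that both end up in the same vocabulary.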

Methodology
➢ Vectorization
○ Preprocessed input text → Vectorization (Bag of words, TF-IDF, ...) → Vectorized data

Methodology
➢ Vectorization
○ Bag-Of-Words
○ TF-IDF
[Figure: vectorized data matrix; axis labels: "terms, features or corpus" and "items or documents"]
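
To make the two vectorization options concrete, here is a minimal pure-Python sketch that builds a bag-of-words count matrix (one row per document, one column per term) and then re-weights it with TF-IDF. The IDF formula used, ln(N / df) + 1, is one common variant and is an assumption; the slides do not say which variant or library was used.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count row per document."""
    vocab = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(matrix):
    """Re-weight counts with idf = ln(N / df) + 1 (assumed variant)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) + 1.0 for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


# Toy documents that have already been preprocessed into tokens.
docs = [
    ["topic", "model", "paper"],
    ["paper", "recommend", "paper"],
    ["topic", "similarity"],
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

Terms that occur in many documents (here "paper" and "topic") get a lower IDF than terms that occur in only one, which is the main difference from raw counts.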

Methodology
➢ Topic Extraction
○ Applied Topic Modeling Technique
■ LDA
■ NMF

Methodology
➢ Topic Extraction
○ Input: vectorized data
○ Output: terms in each topic
○ Output: topic probability of each document
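
As a rough illustration of how a topic model turns vectorized data into these two outputs, the sketch below implements NMF with the standard multiplicative update rules in NumPy. It is not the project's implementation: the function names, the toy matrix, the iteration count, and the row normalization that turns NMF weights into a per-document topic mix are all assumptions. LDA is not shown because it is normally taken from a library rather than written by hand.

```python
import numpy as np


def nmf(V, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_terms)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def topic_distribution(W, eps=1e-9):
    """Normalize each row of W to sum to 1 (assumed way to read it as a topic mix)."""
    return W / (W.sum(axis=1, keepdims=True) + eps)


def top_terms(H, vocab, k=2):
    """List the k highest-weighted terms of each topic."""
    return [[vocab[j] for j in np.argsort(row)[::-1][:k]] for row in H]


# Toy document-term matrix: 4 documents, 4 terms.
vocab = ["audit", "tax", "topic", "model"]
V = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
])
W, H = nmf(V, n_topics=2)
doc_topics = topic_distribution(W)   # topic mix of each document
terms = top_terms(H, vocab)          # terms in each topic
```

Here `H` plays the role of "terms in each topic" and the normalized `W` the role of "topic probability of each document".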

Methodology
➢ Prediction / Recommendation
○ Based on Cosine Similarity
○ Compares the topic probability matrix of the dataset with the topic probability vector of the input
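
The recommendation step can be sketched as follows: compute the cosine similarity between the input's topic vector and each row of the dataset's topic matrix, then return the highest-scoring papers. The function names, the toy numbers, and the `top_n` parameter are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(doc_topics, query_topics, top_n=3):
    """Return (document index, score) pairs, best match first."""
    scores = [
        (i, cosine_similarity(row, query_topics))
        for i, row in enumerate(doc_topics)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy topic probability matrix of the dataset (3 papers, 3 topics).
doc_topics = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.5, 0.5, 0.0],
]
# Toy topic probability vector of the user's input paragraph.
query_topics = [0.8, 0.2, 0.0]

ranked = recommend(doc_topics, query_topics, top_n=2)
```

Because cosine similarity depends only on the direction of the vectors, a short input paragraph and a long paper can still match closely if their topic mixes are similar.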

Results
➢ Effect of data preprocessing steps

Results
➢ Number of Words in User Input

Results
➢ Validation with user feedback
○ Before user feedback
■ Accuracy with content 3
● LDA is better than NMF
■ Accuracy with content 10
● NMF is better than LDA
○ After user feedback
■ NMF is better than LDA

Conclusion & Future Works
➢ Conclusion
○ Found the optimal data preprocessing model
■ Cleaning + Tokenization + Stop Word Removal + Lemmatization
○ Compared two topic modeling techniques
■ LDA and NMF
○ Compared model accuracies
○ User ratings
■ Models with LDA and NMF

➢ Future Works
○ Try other techniques such as BERT and check whether they give better results on user rating feedback
○ Use user ratings to improve the recommendation system
○ Add new features to the website
○ Try different topic modeling techniques
○ Try different similarity functions
○ Train a model using the extracted topics
○ Tune the hyperparameters for the new techniques
