The power of unstructured data: Recommendation systems
1. The Power of Unstructured Data
Olga Scrivner, PhD
Research Scientist, CNS, Indiana University
Visiting Lecturer, Data Science Program, Indiana University
Corporate Faculty, Data Analytics, Harrisburg University of Science & Technology
Recommendation Systems
2. Transforming Data into Insights
80% of data will be unstructured (IDC)
Data-Driven Decision Making (credits: PwC)
“Information is the currency of this digital age” (Carly Fiorina, former CEO of HP)
2025: 175 zettabytes of data globally (IDC), where 1 zettabyte = 10^21 bytes
85% of customer interactions will take place without a human (Gartner)
3. Use Cases
Jim Kitterman. 2018. The Why behind the What.
Banking (fraud prediction & recommendations)
Human Resources (automated HR)
Marketing (automated customer service)
Retail (product recommendations)
Two of the leading drivers for AI adoption are delivering a better customer experience and helping employees get better at their jobs (IDC, 2020)
Leading AI Use Cases: automated customer service agents, recommendation, and automation
5. Formal Language vs. Natural Language
(Chiang, 2018)
Ambiguity
- Natural language: full of ambiguity; relies on contextual clues and other information
- Formal language: nearly or completely unambiguous; any statement has exactly one meaning, regardless of context
Redundancy
- Natural language: redundant; verbose to reduce ambiguity
- Formal language: less redundant; concise
Literalness
- Natural language: more than one meaning; many idioms and metaphors
- Formal language: exactly one meaning
Example idiom: “She spilled the beans” (she revealed a secret)
http://www.idioms4you.com/complete-idioms/spill-the-beans.html
https://www.quora.com/When-was-the-first-English-idiom-used-Why-was-it-used
7. NLP Feature Extraction
Sarkar, D. 2018. Deep Learning Methods for Text Data – Word2Vec, GloVe, FastText. Towards Data Science
Bag-of-Words Models
- Count-based: TF, TF-IDF, N-grams
Prediction-Based Models
- Based on distributed representations (dense representations of words in a low-dimensional vector space): Word2Vec, FastText
- Each word is associated with a continuous vector representation
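The count-based features listed above (bag of words, N-grams, TF, TF-IDF) can be sketched in plain Python. This is a minimal illustration, not the author's code: the three example sentences are made up, tokenization is a simple lowercase regex, and IDF uses the unsmoothed variant log(N / df). Libraries such as scikit-learn apply IDF smoothing and vector normalization by default, so their weights will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # Contiguous sequences of n tokens, joined with a space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens):
    # TF: count of each term divided by the number of tokens in the document.
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def tf_idf(corpus):
    # corpus: list of token lists. IDF = log(N / df), where df is the number
    # of documents containing the term (unsmoothed variant).
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in term_frequency(doc).items()}
        for doc in corpus
    ]


docs = ["data science jobs", "data engineering jobs", "marketing manager"]  # toy corpus
corpus = [tokenize(d) for d in docs]

print(Counter(corpus[0]))    # bag of words: raw counts, word order discarded
print(ngrams(corpus[0], 2))  # ['data science', 'science jobs']
print(tf_idf(corpus)[0])     # "science" outweighs "data" and "jobs" (each in 2 of 3 docs)
```

A term that occurs in every document gets an IDF of log(1) = 0, so it contributes nothing; rarer terms such as "science" or "marketing" carry more weight.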
9. NLP Application – Recommender System
1. Improving with use: customer retention
2. Improving cart value: filter system (Amazon)
3. Improving engagement: using subscriptions (YouTube)
Corinna Underwood. 2020. Use Cases of Recommendation Systems.
10. Recommendation System Types
Collaborative Filtering (shortcoming: cold start problem)
- User-based: similarity between users (classification task)
- Item-based: similarity between items based on ratings (Pearson)
Content-Based Systems
- Similarity between features (nearest neighbor)
- User likes and feedback
Rounak Banik. 2018. Hands-On Recommendation Systems with Python.
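Item-based collaborative filtering with Pearson correlation can be sketched as follows. The ratings matrix is a hypothetical toy example, and `item_similarity` and `most_similar` are illustrative helper names, not functions from Banik's book. Only users who rated both items contribute to the correlation between them.

```python
import math

# Hypothetical user -> {item: rating} data, for illustration only.
RATINGS = {
    "ann":  {"A": 5, "B": 3, "C": 4},
    "bob":  {"A": 3, "B": 1, "C": 2},
    "cara": {"A": 4, "B": 3, "C": 4, "D": 1},
    "dan":  {"A": 3, "B": 3, "C": 1, "D": 5},
}


def pearson(x, y):
    # Pearson correlation of two equal-length rating lists (0 if undefined).
    n = len(x)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def item_similarity(item1, item2, ratings=RATINGS):
    # Correlate the two items over users who rated both (co-rated users).
    common = [u for u, r in ratings.items() if item1 in r and item2 in r]
    return pearson([ratings[u][item1] for u in common],
                   [ratings[u][item2] for u in common])


def most_similar(target, ratings=RATINGS):
    # Rank all other items by their Pearson similarity to the target item.
    items = {item for r in ratings.values() for item in r} - {target}
    scores = {item: item_similarity(target, item, ratings) for item in items}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(most_similar("A"))  # C first, then B, then D (negatively correlated)
```

An item nobody has rated yet (say a new item "E") has no co-rated users, so its similarity to every other item is 0: a small-scale view of the cold start problem. With only two co-rated users the correlation is always +1 or -1 (as for items A and D here), which is one reason practical systems often require a minimum number of co-rated users.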
12. Job Description Preprocessing
Data: Kaggle - job-recommendation-datasets
Armand Olivares. 2019. NLP Content-Based Recommendation Systems.
1. Remove stop words
2. Remove non-alphanumeric characters
3. Lemmatize the columns
4. Extract features (TF-IDF)
5. Compute cosine similarity (scores closer to 1 mean the items are more similar)
Combined columns: title, company, city, job type, description
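The five steps can be sketched end to end in plain Python. Everything here is an assumption for illustration: the four job postings are invented, the field names only mirror the combined columns listed above (the Kaggle dataset's actual column names may differ), the stop-word list is tiny, and `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer such as NLTK's `WordNetLemmatizer` or spaCy.

```python
import math
import re
from collections import Counter

# Hypothetical postings; field names mirror the combined columns.
JOBS = [
    {"title": "Data Scientist", "company": "Acme", "city": "Indianapolis",
     "job_type": "Full-time", "description": "Building machine learning models and analyzing data."},
    {"title": "Machine Learning Engineer", "company": "Globex", "city": "Chicago",
     "job_type": "Full-time", "description": "Deploying machine learning models to production."},
    {"title": "Marketing Manager", "company": "Initech", "city": "Indianapolis",
     "job_type": "Part-time", "description": "Managing campaigns and the marketing budget."},
    {"title": "Data Analyst", "company": "Acme", "city": "Bloomington",
     "job_type": "Contract", "description": "Analyzing data and building dashboards."},
]
COLUMNS = ("title", "company", "city", "job_type", "description")
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "with"}  # tiny illustrative list
# Hypothetical lemma lookup standing in for a real lemmatizer.
LEMMAS = {"building": "build", "models": "model", "analyzing": "analyze",
          "deploying": "deploy", "managing": "manage", "campaigns": "campaign",
          "dashboards": "dashboard"}


def preprocess(job):
    # Combine the text columns into one lowercase string.
    text = " ".join(job[col] for col in COLUMNS).lower()
    # Step 2: replace every run of non-alphanumeric characters with a space.
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    # Steps 1 and 3: drop stop words, then map each token to its lemma.
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]


def tfidf_vectors(docs):
    # Step 4: sparse TF-IDF vectors as {term: weight}, with IDF = log(N / df).
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: (count / len(doc)) * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    # Step 5: cosine similarity of two sparse vectors (0 if either is all zeros).
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(index, jobs=JOBS, k=2):
    # Rank every other posting by cosine similarity to jobs[index].
    vectors = tfidf_vectors([preprocess(job) for job in jobs])
    scores = [(j, cosine(vectors[index], vectors[j]))
              for j in range(len(jobs)) if j != index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


print(recommend(0))  # the two postings most similar to "Data Scientist"
```

For the Data Scientist posting, the Data Analyst posting at the same company ranks first and the Machine Learning Engineer posting second, while the Marketing Manager posting, which shares only the city and the word "time", scores close to 0.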
Figure: vector1 and vector2, with the Euclidean distance between them and their vector components
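Two vectors such as vector1 and vector2 can be compared either by the angle between them (cosine similarity) or by the straight-line distance between their endpoints (Euclidean distance). A small sketch with made-up two-dimensional term-count vectors shows why cosine similarity is often preferred for text: it ignores vector magnitude, whereas Euclidean distance is sensitive to it, for example to how long a document is when raw counts are used. The vector values are hypothetical.

```python
import math


def cosine_similarity(v1, v2):
    # (v1 . v2) / (|v1| * |v2|): depends only on the angle between the vectors.
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def euclidean_distance(v1, v2):
    # Straight-line distance computed from the differences of the components.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


short_doc = (1, 2)  # counts of two terms in a short posting
long_doc = (2, 4)   # same terms in the same proportions, twice as often
other_doc = (2, 0)  # a posting that uses only the first term

print(cosine_similarity(short_doc, long_doc), euclidean_distance(short_doc, long_doc))
print(cosine_similarity(short_doc, other_doc), euclidean_distance(short_doc, other_doc))
```

Both comparisons are √5 apart in Euclidean terms, yet cosine similarity rates long_doc as pointing in the same direction as short_doc (1.0) and other_doc as only about 0.45 similar.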
13. What is Next?
Career path recommendation
Skill recommendation
Course recommendation
e-recruiting
Graph-based approach + NLP: job recommendation (Zhu et al., 2020)