3. Data Sources and Processing Flow
[Flow diagram] 20M reviews + genres → 2,800 movies & 1,000 books (each with 20+ reviews and no missing attributes) → similarity scores → book recommendations for each movie, displayed with book cover images
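The filtering step in the flow (keep items with 20+ reviews and no missing attributes) could be sketched as below. The data and field names are hypothetical toy stand-ins, not the deck's actual schema, and the review threshold is lowered to fit the toy sample:

```python
from collections import Counter

# Hypothetical tables; field names are assumptions, not from the deck.
reviews = [("m1", "u1"), ("m1", "u2"), ("m2", "u3"),
           ("b1", "u1"), ("b1", "u2"), ("b1", "u4")]  # (item_id, user_id)
genres = {"m1": ["drama"], "m2": None, "b1": ["drama", "romance"]}

MIN_REVIEWS = 2  # the deck uses 20+; lowered for this toy sample

# Keep items with enough reviews and no missing genre attribute.
counts = Counter(item for item, _ in reviews)
kept = sorted(item for item, n in counts.items()
              if n >= MIN_REVIEWS and genres.get(item) is not None)
print(kept)  # ['b1', 'm1']
```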
4. Collaborative filtering using user rating scores?
Unfortunately, the data is too sparse: performance is poor even after SVD, and 80% of movie-book pairs have no common user.
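The sparsity claim can be checked directly by counting, for each movie-book pair, the users who rated both. A minimal sketch with hypothetical rater sets (the real sets come from the 20M reviews):

```python
# Users who rated each item (toy data, not the actual dataset).
movie_raters = {"m1": {"u1", "u2"}, "m2": {"u3"}}
book_raters = {"b1": {"u1"}, "b2": {"u4"}}

# Fraction of movie-book pairs with no rater in common.
pairs = [(m, b) for m in movie_raters for b in book_raters]
no_overlap = sum(1 for m, b in pairs
                 if not movie_raters[m] & book_raters[b])
frac_empty = no_overlap / len(pairs)
print(frac_empty)  # 0.75: 3 of 4 toy pairs share no rater
```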
5. Similarity Metrics for Movie-Book Pairs
• Review text → cosine similarity (C)
• Genres → Jaccard similarity (J)
• C and J are combined into a final similarity score
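The two metrics can be sketched as follows. The texts, genre sets, and the weight combining C and J are all assumptions for illustration (the deck does not say how the final score is formed), and the cosine here uses plain bag-of-words counts rather than whatever vectorization the project used:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    wa, wb = Counter(a.split()), Counter(b.split())
    dot = sum(wa[t] * wb[t] for t in wa)
    norm = sqrt(sum(v * v for v in wa.values())) * sqrt(sum(v * v for v in wb.values()))
    return dot / norm

def jaccard(a, b):
    """Jaccard similarity between two sets."""
    return len(a & b) / len(a | b)

# Hypothetical review snippets and genre sets.
C = cosine("a quiet drama about grief and family",
           "a family drama exploring grief")
J = jaccard({"drama"}, {"drama", "fiction"})

alpha = 0.5  # hypothetical weight; the deck does not state how C and J combine
score = alpha * C + (1 - alpha) * J
```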
6. Validation
[Venn diagram: users who liked movie A, users who liked book B, and their overlap]
• Based on rating scores from users who rated both movies and books
• For each movie, calculate the Jaccard index between the movie and:
– Jrec: its recommended books
– Jbase: all the books
• Median(Jrec/Jbase) = 26: people are 26x more likely to like a Movie2Books recommendation than the random baseline
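The validation above can be sketched for a single movie: compare the Jaccard overlap between users who liked the movie and users who liked (a) its recommended books vs. (b) all books. The user sets are hypothetical:

```python
def jaccard(a, b):
    """Jaccard index of two user sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical user sets for one movie.
liked_movie = {"u1", "u2", "u3"}
liked_recommended = {"u1", "u2", "u9"}                      # liked the recommended books
liked_any_book = {"u1", "u4", "u5", "u6", "u7", "u8", "u9"}  # liked any book (baseline)

J_rec = jaccard(liked_movie, liked_recommended)
J_base = jaccard(liked_movie, liked_any_book)
lift = J_rec / J_base  # the deck reports a median lift of ~26x over all movies
```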
9. Out of 20 million reviews from 3.7 million users, about half of the reviews were provided by 10% of the users.
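That concentration statistic can be computed by sorting users by review count and taking the share contributed by the top decile. The per-user counts here are a hypothetical toy sample, not the 3.7M-user dataset:

```python
# Reviews per user (hypothetical small sample).
reviews_per_user = [100, 40, 5, 3, 2, 2, 1, 1, 1, 1]

counts = sorted(reviews_per_user, reverse=True)
top_k = max(1, len(counts) // 10)            # top 10% of users
share = sum(counts[:top_k]) / sum(counts)    # their share of all reviews
print(f"{share:.0%}")
```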
[Charts: fraction of reviews from the top 10% of users, for books and for movies]
Some fun stuff…
10. Is this a highly rated movie at Amazon?
[Histogram of the movie's ratings, from “Don’t like it” to “Really like it”]
11. Is this a highly rated movie at Amazon?
[Panels: ratings of the movie, ratings of all movies, and the re-scaled scores]
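One natural reading of the re-scaling (an assumption; the slide only shows the movie's rating distribution next to the distribution over all movies) is a percentile rank of the movie's mean rating against all movies' mean ratings:

```python
# Hypothetical mean ratings across movies; not the actual Amazon data.
all_movie_means = [4.9, 4.8, 4.7, 4.5, 4.2, 4.0, 3.6, 3.1]
movie_mean = 4.2

# Percentile rank: fraction of movies rated no higher than this one.
rescaled = sum(m <= movie_mean for m in all_movie_means) / len(all_movie_means)
print(rescaled)  # 0.5
```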
12. Most vs Least Reviewed Items
• Both have very skewed rating distributions, with a mode of 5.
• The most reviewed items have a higher fraction of 5s: popular products are indeed better liked.
[Rating histograms for the most and least reviewed books and movies]
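The comparison behind that slide can be reproduced by splitting items into most- and least-reviewed groups and comparing their fraction of 5-star ratings. The ratings below are hypothetical:

```python
# Hypothetical ratings for a heavily and a lightly reviewed item.
ratings_by_item = {
    "popular_item": [5, 5, 5, 4, 5, 3, 5, 5],
    "obscure_item": [5, 1, 3],
}

def frac_fives(ratings):
    """Fraction of 5-star ratings."""
    return ratings.count(5) / len(ratings)

for item, r in ratings_by_item.items():
    print(item, round(frac_fives(r), 2))
```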
13. Most vs Least Active Users
• The least active users give more bad ratings (score = 1): are they more likely to write a review only when they really don't like the product?
[Rating histograms for the most and least active users, for books and movies]
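Similarly, the hypothesis about the least active users can be checked by grouping users by review count and comparing the fraction of 1-star ratings in each group. The user data and the activity cutoff are hypothetical:

```python
# Hypothetical (review count, ratings) per user; ratings truncated for the toy.
users = [
    (50, [5, 4, 5, 3, 5]),  # very active user
    (1,  [1]),              # single-review user
    (2,  [1, 5]),           # near-inactive user
]

def frac_ones(ratings):
    """Fraction of 1-star ratings."""
    return ratings.count(1) / len(ratings)

# Hypothetical cutoff of 10 reviews between "active" and "casual".
active = [r for n, rs in users if n >= 10 for r in rs]
casual = [r for n, rs in users if n < 10 for r in rs]
print(frac_ones(active), frac_ones(casual))
```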