SlideShare a Scribd company logo
Adam Warski

Recommendation
systems

5/11/2013 Warszawa-JUG

Introduction using Mahout

@adamwarski
Agenda

❖

What is a recommender?!

❖

Types of recommenders, input data!

❖

Recommendations with Mahout: single node!

❖

Recommendations with Mahout: multiple nodes

5/11/2013 Warszawa-JUG

@adamwarski
Who Am I?
❖

Day: coding @ SoftwareMill!

❖

Afternoon: playgrounds, Duplos, etc.!

❖

Evenings: blogging, open-source!
❖

Original author of Hibernate Envers!

❖

ElasticMQ, Veripacks, MacWire!

❖

http://www.warski.org

5/11/2013 Warszawa-JUG

@adamwarski
Me + recommenders
❖

Yap.TV!

❖

Recommends shows to users
basing on their favorites!
❖
❖

❖

from Yap!
from Facebook!

Some business rules

5/11/2013 Warszawa-JUG

@adamwarski
Some background

5/11/2013 Warszawa-JUG

@adamwarski
Information …

❖

Retrieval!
❖

❖

given a query, select items!

Filtering!
❖

given a user, filter out irrelevant items

5/11/2013 Warszawa-JUG

@adamwarski
What can be recommended?

❖

Items to users!

❖

Items to items!

❖

Users to users!

❖

Users to items

5/11/2013 Warszawa-JUG

@adamwarski
Input data

❖

(user, item, rating) tuples!
❖

❖

rating: e.g. 1-5 stars!

(user, item) tuples!
❖

unary

5/11/2013 Warszawa-JUG

@adamwarski
Input data
❖

Implicit!
❖
❖

reads!

❖
❖

clicks!

watching a video!

Explicits!
❖

explicit rating!

❖

favorites

5/11/2013 Warszawa-JUG

@adamwarski
Prediction vs recommendation
❖

Recommendations!
❖
❖

❖

suggestions!
top-N!

Predictions!
❖

ratings

5/11/2013 Warszawa-JUG

@adamwarski
Collaborative Filtering

❖

Something more than search keywords!
❖
❖

❖

tastes!
quality!

Collaborative: data from many users

5/11/2013 Warszawa-JUG

@adamwarski
Content-based recommenders

❖

Define features and feature values!

❖

Describe each item as a vector of features

5/11/2013 Warszawa-JUG

@adamwarski
Content-based recommenders
❖

User vector!
❖
❖

❖

e.g. counting user tags!
sum of item vectors!

Prediction: cosine between two vectors!
❖

profile!

❖

item

5/11/2013 Warszawa-JUG

@adamwarski
User-User CF

❖

Measure similarity between users!

❖

Prediction: weighted combination of existing ratings

5/11/2013 Warszawa-JUG

@adamwarski
User-User CF

❖

Domain must be scoped so that there’s agreement!

❖

Individual preferences should be stable

5/11/2013 Warszawa-JUG

@adamwarski
User similarity

❖

Many different metrics!
!

❖

Pearson correlation:

5/11/2013 Warszawa-JUG

@adamwarski
User neighbourhood
❖

Threshold on similarity!

❖

Top-N neighbours!
❖

❖

25-100: good starting point!

Variations:!
❖

trust networks, e.g. friends in a social network

5/11/2013 Warszawa-JUG

@adamwarski
Predictions in UU-CF

5/11/2013 Warszawa-JUG

@adamwarski
Item-Item CF

❖

Similar to UU-CF, only with items!

❖

Item similarity!
❖

Pearson correlation on item ratings!

❖

co-occurences

5/11/2013 Warszawa-JUG

@adamwarski
Evaluating recommenders
❖

How to tell if a recommender is good?!
❖
❖

❖

compare implementations!
are the recommendations ok?!

Business metrics!
❖

what leads to increased sales

5/11/2013 Warszawa-JUG

@adamwarski
Leave one out

❖

Remove one preference!

❖

Rebuild the model!

❖

See if it is recommended!

❖

Or, see what is the predicted rating

5/11/2013 Warszawa-JUG

@adamwarski
Some metrics

❖

Mean absolute error:!

❖

Mean squared error

5/11/2013 Warszawa-JUG

@adamwarski
Precision/recall
❖

Precision: % of recommended items that are relevant!

❖

Recall: % of relevant items that are recommended!

❖

Precision@N, recall@N
Recall

Ideal recs

Precision
Recs
5/11/2013 Warszawa-JUG

All items
@adamwarski
Problems
❖

Novelty: hard to measure!
❖

❖

leave one out: punishes recommenders which
recommend new items!

Test on humans!
❖

A/B testing

5/11/2013 Warszawa-JUG

@adamwarski
Diversity/serendipity
❖

Increase diversity:!
❖
❖

❖

top-n!
as items come in, remove the ones that are too similar
to prior items!

Increase serendipity:!
❖

downgrade popular items

5/11/2013 Warszawa-JUG

@adamwarski
Netflix challenge

❖

$1m for the best recommender!

❖

Used mean error!

❖

Winning algorithm won on predicting low ratings

5/11/2013 Warszawa-JUG

@adamwarski
Mahout single-node

5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
Mahout single-node
❖

User-User CF!

❖

Item-Item CF!

❖

Various metrics!
❖
❖

Pearson!

❖
❖

Euclidean!

Log-likelihood!

Support for evaluation

5/11/2013 Warszawa-JUG

@adamwarski
Let’s code

5/11/2013 Warszawa-JUG

@adamwarski
Mahout multi-node
❖

Based on Hadoop!

❖

Pre-computed recommendations!

❖

Item-based recommenders!
❖

❖

Co-occurrence matrix!

Matrix decomposition!
❖

Alternating Least Squares

5/11/2013 Warszawa-JUG

@adamwarski
Running Mahout w/ Hadoop
hadoop jar mahout-core-0.8-job.jar !
! org.apache.mahout.cf.taste.hadoop.item.RecommenderJob !
! --booleanData !
! --similarityClassname SIMILARITY_LOGLIKELIHOOD !
! --output output !
! --input input/data.dat

5/11/2013 Warszawa-JUG

@adamwarski
Links
❖

http://www.warski.org!

❖

http://mahout.apache.org/!

❖

http://grouplens.org/!

❖

http://graphlab.org/graphchi/!

❖

https://class.coursera.org/recsys-001/class!

❖

https://github.com/adamw/mahout-pres

5/11/2013 Warszawa-JUG

@adamwarski
Thanks!

❖

Questions?!

❖

Stickers ->!

❖

adam@warski.org

5/11/2013 Warszawa-JUG

@adamwarski

More Related Content

More from Adam Warski

What have the annotations done to us?
What have the annotations done to us?What have the annotations done to us?
What have the annotations done to us?
Adam Warski
 
Hejdyk maleńka część mazur
Hejdyk maleńka część mazurHejdyk maleńka część mazur
Hejdyk maleńka część mazur
Adam Warski
 
Slick eventsourcing
Slick eventsourcingSlick eventsourcing
Slick eventsourcing
Adam Warski
 
Evaluating persistent, replicated message queues
Evaluating persistent, replicated message queuesEvaluating persistent, replicated message queues
Evaluating persistent, replicated message queues
Adam Warski
 
The no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection FrameworkThe no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection Framework
Adam Warski
 
ElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS serverElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS server
Adam Warski
 
The ideal module system and the harsh reality
The ideal module system and the harsh realityThe ideal module system and the harsh reality
The ideal module system and the harsh reality
Adam Warski
 
Scala Macros
Scala MacrosScala Macros
Scala Macros
Adam Warski
 
CDI Portable Extensions
CDI Portable ExtensionsCDI Portable Extensions
CDI Portable Extensions
Adam Warski
 

More from Adam Warski (9)

What have the annotations done to us?
What have the annotations done to us?What have the annotations done to us?
What have the annotations done to us?
 
Hejdyk maleńka część mazur
Hejdyk maleńka część mazurHejdyk maleńka część mazur
Hejdyk maleńka część mazur
 
Slick eventsourcing
Slick eventsourcingSlick eventsourcing
Slick eventsourcing
 
Evaluating persistent, replicated message queues
Evaluating persistent, replicated message queuesEvaluating persistent, replicated message queues
Evaluating persistent, replicated message queues
 
The no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection FrameworkThe no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection Framework
 
ElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS serverElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS server
 
The ideal module system and the harsh reality
The ideal module system and the harsh realityThe ideal module system and the harsh reality
The ideal module system and the harsh reality
 
Scala Macros
Scala MacrosScala Macros
Scala Macros
 
CDI Portable Extensions
CDI Portable ExtensionsCDI Portable Extensions
CDI Portable Extensions
 

Recently uploaded

Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
Michael Price
 
Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024
siddu769252
 
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Zilliz
 
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
alexjohnson7307
 
Intel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdfIntel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdf
Tech Guru
 
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision MakingConnector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
DianaGray10
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
Razin Mustafiz
 
Improving Learning Content Efficiency with Reusable Learning Content
Improving Learning Content Efficiency with Reusable Learning ContentImproving Learning Content Efficiency with Reusable Learning Content
Improving Learning Content Efficiency with Reusable Learning Content
Enterprise Knowledge
 
Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
David Wilson
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
sunilverma7884
 
Accelerating Migrations = Recommendations
Accelerating Migrations = RecommendationsAccelerating Migrations = Recommendations
Accelerating Migrations = Recommendations
isBullShit
 
Communications Mining Series - Zero to Hero - Session 3
Communications Mining Series - Zero to Hero - Session 3Communications Mining Series - Zero to Hero - Session 3
Communications Mining Series - Zero to Hero - Session 3
DianaGray10
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
kk vathada _digital transformation frameworks_2024.pdf
kk vathada _digital transformation frameworks_2024.pdfkk vathada _digital transformation frameworks_2024.pdf
kk vathada _digital transformation frameworks_2024.pdf
KIRAN KV
 
COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...
COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...
COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...
AimanAthambawa1
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
shyamraj55
 
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
bellared2
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
Stephanie Beckett
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Nicolás Lopéz
 

Recently uploaded (20)

Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
 
Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024
 
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
 
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
 
Intel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdfIntel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdf
 
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision MakingConnector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
 
Improving Learning Content Efficiency with Reusable Learning Content
Improving Learning Content Efficiency with Reusable Learning ContentImproving Learning Content Efficiency with Reusable Learning Content
Improving Learning Content Efficiency with Reusable Learning Content
 
Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
 
Accelerating Migrations = Recommendations
Accelerating Migrations = RecommendationsAccelerating Migrations = Recommendations
Accelerating Migrations = Recommendations
 
Communications Mining Series - Zero to Hero - Session 3
Communications Mining Series - Zero to Hero - Session 3Communications Mining Series - Zero to Hero - Session 3
Communications Mining Series - Zero to Hero - Session 3
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
kk vathada _digital transformation frameworks_2024.pdf
kk vathada _digital transformation frameworks_2024.pdfkk vathada _digital transformation frameworks_2024.pdf
kk vathada _digital transformation frameworks_2024.pdf
 
COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...
COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...
COVID-19 and the Level of Cloud Computing Adoption: A Study of Sri Lankan Inf...
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
 
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
 

Recommendation systems with Mahout: introduction