MEDIA, DATA, CONTEXT...
And The Holy Grail of User Taste Prediction




             Xavier Amatriain




                MAT, UCSB
          Santa Barbara, March '11
But first...




About me and Telefonica
About me
Up until 2005
2005 - 2007
2007 - ..
Telefonica is a fast-growing Telecom


               1989                   2000                         2008
Clients        About 12 million       About 68 million             About 260 million
               subscribers            customers                    customers
Services       Basic telephone and    Wireline and mobile voice,   Integrated ICT solutions
               data services          data and Internet services   for all customers
Geographies    Spain                  Operations in 16 countries   Operations in 25 countries
Staff          About 71,000           About 149,000                About 257,000
               professionals          professionals                professionals
Finances       Rev: 4,273 M€          Rev: 28,485 M€               Rev: 57,946 M€
               EPS(1): 0.45 €         EPS(1): 0.67 €               EPS: 1.63 €

(1) EPS: Earnings per share
Currently among the largest in the world
       Telco sector worldwide ranking by market cap (US$ bn)




                                                         Source: Bloomberg, 06/12/09




       Just announced 2010 results: record net earnings, 
       first Spanish company ever to make > 10B €
Leader in South America

Data as of March ‘09



Country / region    Total accesses    Wireline market rank    Mobile market rank
Argentina           20.9 million      1                       2
Brazil              61.4 million      2                       1
Central America     6.1 million       -                       2
Colombia            12.6 million      1                       2
Chile               10.1 million      1                       1
Ecuador             3.3 million       -                       2
Mexico              15.7 million      -                       2
Peru                15.2 million      1                       1
Uruguay             1.5 million       -                       1
Venezuela           12.0 million      -                       2

Total Accesses (as of March ‘09): 159.5 million

Notes:
- Central America includes Guatemala, Panama, El Salvador and Nicaragua
- Total accesses figure includes Narrowband Internet accesses of Terra Brasil and Terra Colombia, and Broadband
Internet accesses of Terra Brasil, Telefónica de Argentina, Terra Guatemala and Terra México.
And a significant footprint in Europe

Data as of March ‘09

Country           Total accesses    Wireline market rank    Mobile market rank
Spain             47.2 million      1                       1
UK                20.8 million      -                       1
Germany           16.0 million      -                       4
Ireland           1.7 million       -                       2
Czech Republic    7.7 million       1                       2
Slovakia          0.4 million       -                       3

Total Accesses (as of March ’09): 93.8 million
Scientific Research
●   Mobile and Ubicomp
●   Multimedia Core
●   User Modelling & Data Mining
●   HCIR
●   Data Mining
●   Wireless Systems
●   Content Distribution & P2P
●   Social Networks
Enough introductions...
Information Overload
More is Less
Less Decisions
Worse Decisions
Search engines don’t always hold the answer
What about discovery?
What about curiosity?
What about information to help take decisions?
The Age of Search has come to an end

... long live the Age of Recommendation!
●   Chris Anderson in “The Long Tail”
    ●   “We are leaving the age of information and entering the age of
        recommendation”
●   CNN Money, “The race to create a 'smart' Google”:
    ●   “The Web, they say, is leaving the era of search and entering
        one of discovery. What's the difference? Search is what you do
        when you're looking for something. Discovery is when
        something wonderful that you didn't know existed, or didn't
        know how to ask for, finds you.”
But, what are Recommender Systems?



Read this!

             Attend this conference!
The value of recommendations

●   Netflix: 2/3 of the movies rented are recommended
●   Google News: recommendations generate 38% more
    clickthrough
●   Amazon: 35% sales from recommendations
●   Choicestream: 28% of the people would buy more music if
    they found what they liked.
The “Recommender problem”

●   Estimate a utility function that automatically predicts how
    much a user will like an item that is unknown to her, based on:
    ●   Past behavior
    ●   Relations to other users
    ●   Item similarity
    ●   Context
    ●   ...
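
One common way to write this problem down, following the usual formalization in the recommender-systems literature rather than anything stated on the slide, is sketched below. The symbols are assumptions introduced here: $U$ is the set of users, $I$ the set of items, $R$ an ordered set of utility values (for example ratings), $I_c$ the items user $c$ already knows, and $\hat{u}$ the estimate of the true utility $u$.

```latex
u : U \times I \rightarrow R,
\qquad
i^{*}(c) = \operatorname*{arg\,max}_{i \in I \setminus I_c} \hat{u}(c, i)
```

Read this way, past behavior, relations to other users, item similarity and context are all just different sources of evidence for estimating $\hat{u}$.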
Data mining + all those other things
●   User Interface
●   System requirements (efficiency, scalability,
    privacy....)
●   Business Logic
●   Serendipity
●   ....
The Netflix Prize

●   500K users x 17K movie
    titles = 100M ratings = $1M
    (if you “only” improve the
    existing system by 10%,
    from 0.95 to 0.85 RMSE!)
    ●   49K contestants on 40K teams from
        184 countries
    ●   41K valid submissions from 5K
        teams; 64 submissions per day
    ●   Winning approach uses hundreds of
        predictors from several teams
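
For readers unfamiliar with the metric, here is a minimal sketch of how RMSE and the "10% improvement" can be computed. The toy predictions and ratings are made up for illustration, the function names are hypothetical, and 0.95 and 0.85 are the rounded figures from the slide.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Toy data (made up for illustration): predictions vs. true star ratings.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 3]
toy_error = rmse(predicted, actual)

# Rounded figures from the slide: 0.95 -> 0.85.
gain = relative_improvement(0.95, 0.85)
print(f"toy RMSE = {toy_error:.3f}, relative improvement = {gain:.1%}")
```

Because errors are squared before averaging, a few large misses (such as predicting 5 for a 3-star rating) weigh more on the score than many small ones.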
Approaches to Recommendation
●   Collaborative Filtering
    ●   Recommend items based only on the users' past behavior
    ●   User-based
        ●   Find users similar to me and recommend what they liked
    ●   Item-based
        ●   Find items similar to those that I have previously liked
●   Content-based
    ●   Recommend based on features inherent to the items
●   Social recommendations (trust-based)
What works

●   It depends on the domain and the particular problem
    ●   As a general rule, it is usually a good idea to combine
        approaches: Hybrid Recommender Systems
●   However, in the general case it has been
    demonstrated that (currently) the best isolated
    approach is CF.
    ●   Item-based is in general more efficient and better, but
        mixing CF approaches can improve results
    ●   Other approaches can be hybridized to improve
        results in specific cases (cold-start problem...)
The CF Ingredients

●   List of m Users and a list of n Items
●   Each user has a list of items with associated opinion
    ●   Explicit opinion - a rating score (numerical scale)
    ●   Implicit feedback - purchase records or listening history
●   Active user for whom the prediction task is performed
●   A metric for measuring similarity between users
●   A method for selecting a subset of neighbors
●   A method for predicting a rating for items not rated by
    the active user.
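
The ingredients above can be put together in a minimal user-based sketch. Everything specific here is an assumption made for illustration: the toy ratings, Pearson correlation as the similarity metric, "top k positively correlated users" as the neighbor selection, and a mean-centered weighted average as the prediction rule. Other choices fit the same skeleton.

```python
import math

# Toy explicit ratings (made up): user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann":  {"A": 5, "B": 3, "C": 4},
    "bob":  {"A": 4, "B": 2, "C": 5, "D": 4},
    "cara": {"A": 1, "B": 5, "D": 2},
    "dan":  {"A": 5, "B": 3, "C": 4, "D": 5},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    ma, mb = mean(a[i] for i in common), mean(b[i] for i in common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)
                    * sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, item, k=2):
    """Predict the active user's rating for an item they have not rated."""
    # Neighbor selection: the k most similar users who rated the item,
    # keeping only positive correlations.
    candidates = [(pearson(ratings[active], r), u)
                  for u, r in ratings.items()
                  if u != active and item in r]
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbors:
        return None
    # Prediction: the active user's mean plus the similarity-weighted
    # deviation of each neighbor's rating from that neighbor's own mean.
    base = mean(ratings[active].values())
    num = sum(s * (ratings[u][item] - mean(ratings[u].values()))
              for s, u in neighbors)
    den = sum(s for s, _ in neighbors)
    return base + num / den

prediction = predict("ann", "D")
print(prediction)
```

In this toy data "cara" disagrees with "ann" on every shared item, so she is dropped at the neighbor-selection step and the prediction is driven by "dan" and "bob".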

But ...
User Feedback is Noisy




                       DID YOU HEAR WHAT 
                             I LIKE??!!




...and limits Our Prediction Accuracy
The Magic Barrier

●   Magic Barrier = Limit on prediction accuracy
    due to noise in original data
●   Natural Noise = involuntary noise introduced by
    users when giving feedback
    ●   Due to (a) mistakes, and (b) lack of resolution in
        personal rating scale (e.g. on a 1 to 5 scale a 2 may mean the
        same as a 3 for some users and some items).
●   Magic Barrier >= Natural Noise Threshold
    ●   We cannot predict with less error than the
        resolution in the original data
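
A small simulation can make the barrier concrete. This is a sketch under stated assumptions, not the experiment from the talk: preferences are hypothetical continuous values on a 1 to 5 scale, natural noise is modeled as Gaussian with standard deviation `noise_sd`, and reported ratings are neither rounded nor clipped.

```python
import math
import random

def rmse(predicted, observed):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))

rng = random.Random(0)   # fixed seed so the run is repeatable
noise_sd = 0.5           # assumed size of the natural noise
n = 20000

# Hypothetical "true" preferences on a continuous 1-5 scale.
true_pref = [rng.uniform(1, 5) for _ in range(n)]
# What users actually report: true preference plus natural noise.
observed = [t + rng.gauss(0, noise_sd) for t in true_pref]

# An oracle that knows every true preference exactly is still scored
# against the noisy ratings, so its error sits near noise_sd, not zero.
oracle_rmse = rmse(true_pref, observed)
print(f"oracle RMSE = {oracle_rmse:.3f} (noise sd = {noise_sd})")
```

No algorithm evaluated against the noisy ratings can be expected to do better than this oracle, which is the sense in which the noise level bounds achievable accuracy.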
Our related research questions
X. Amatriain, J.M. Pujol, N. Oliver (2009) "I like It... I like It Not: Measuring Users
 Ratings Noise in Recommender Systems", in UMAP 09


    ●   Q1. Are users inconsistent when providing
        explicit feedback to Recommender Systems via
        the common Rating procedure?
    ●   Q2. How large is the prediction error due to
        these inconsistencies?
    ●   Q3. What factors affect user inconsistencies?
Experimental Setup

●   100 Movies selected from Netflix dataset doing
    a stratified random sampling on popularity
●   Ratings on a 1 to 5 star scale
    ●   Special “not seen” symbol.
●   Trials 1 and 3 = random order; trial 2 = ordered
    by popularity
●   118 participants
User Feedback is Noisy

●   Users are inconsistent
●   Inconsistencies are not
    random and depend on
    many factors
    ●   More inconsistencies for mild
        opinions
    ●   More inconsistencies for
        negative opinions
    ●   How the items are presented
        affects inconsistencies
Users’ ratings are far from ground truth

In pairwise comparisons between trials, RMSE is already > 0.55 or > 0.69 in the best case
  (the Netflix Prize goal was to get below 0.85!)
Rate it Again
X. Amatriain, J.M. Pujol, N. Tintarev, N. Oliver (2009)"Rate it Again: Increasing
 Recommendation Accuracy by User re-Rating", 2009 ACM RecSys


     ●   Given that users are noisy… can we benefit from asking them to
         rate the same movie more than once?

     ●   We propose an algorithm to allow for multiple ratings of the
         same <user,item> tuple.
         ●   The algorithm is subject to two fairness conditions:
              – The algorithm should remove as few ratings as possible (i.e. only when
                there is some certainty that the rating is only adding noise)
              – The algorithm should not make up new ratings but decide which of
                the existing ones are valid (no averaging, predicting...)
Re-rating Algorithm
• One source re-rating case: [formula not preserved]
• Given the following milding function: [formula not preserved]

Examples (one source):
{3, 1}    → Ø
{4}       → 4
{3, 4}    → 3

Example (two sources):
{3, 4, 5} → 3
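
The milding function and the exact one-source rule are not reproduced in this transcript, and the sketch below does not try to reconstruct them. It only encodes a hypothetical rule, chosen because it agrees with the three one-source examples and with the two fairness conditions (it drops a rating only on strong disagreement, and it always returns one of the existing ratings). The function name, the `midpoint` parameter and the disagreement threshold of 2 are all assumptions.

```python
def resolve_one_source(ratings, midpoint=3):
    """Pick one existing rating for a <user, item> pair, or None to drop it.

    Hypothetical rule, chosen only to agree with the slide's examples;
    it is not the published algorithm or its milding function.
    """
    values = sorted(set(ratings))
    if not values:
        raise ValueError("no ratings given")
    if len(values) == 1:
        return values[0]            # single rating, or both ratings agree
    if len(values) != 2:
        raise ValueError("one-source case expects at most two ratings")
    low, high = values
    if high - low >= 2:
        return None                 # strong disagreement: treat as noise
    # Mild disagreement: keep the existing rating closest to the midpoint.
    return min(values, key=lambda r: abs(r - midpoint))

examples = [resolve_one_source(r) for r in ([3, 1], [4], [3, 4])]
print(examples)
```

The two-source example ({3, 4, 5} → 3) is deliberately left out, since the slide does not give enough information to guess how three ratings are combined.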
Results
Rate it again

●   By asking users to rate items again we can
    remove noise in the dataset
    ●   Improvements of up to 14% in accuracy!
●   Because we don't want all users to re-rate all
    items we design ways to do partial denoising
    ●   Data-dependent: only denoise extreme ratings
    ●   User-dependent: detect “noisy” users
The value of a re-rating

Adding new ratings increases performance of the CF algorithm

But you are better off doing re-rating than new ratings!!

And much better if you know which ratings to re-rate!!
Let's recap

●   Users are inconsistent
●   Inconsistencies can depend on many things
    including how the items are presented
●   Inconsistencies produce natural noise
●   Natural noise reduces our prediction accuracy
    independently of the algorithm
●   By asking users to rate items again we can
    remove noise and improve accuracy
But Crowds are not always wise




Conditions that are needed to guarantee the Wisdom in a Crowd:
●   Diversity of opinion
●   Independence
●   Decentralization
●   Aggregation
Who Can we trust?
Crowds are not always wise




                vs.




        Who won?
“It is really only experts who can reliably account for their reactions”
The Wisdom of the Few
    X. Amatriain et al. "The wisdom of the few: a collaborative filtering
     approach based on expert opinions from the web", SIGIR '09
Expert-based CF
●   expert = individual that we can trust to have produced
    thoughtful, consistent and reliable evaluations (ratings) of
    items in a given domain
●   Expert-based Collaborative Filtering
    ●   Find neighbors from a reduced set of experts instead of
        regular users.
         1. Identify domain experts with reliable ratings
         2. For each user, compute “expert neighbors”
         3. Compute recommendations similar to standard kNN CF
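
Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced a small pool of experts. The data is made up, and the specific choices are assumptions for illustration: plain cosine similarity over co-rated items, a similarity threshold `delta` for accepting an expert as a neighbor, and a minimum number of neighbors `tau` below which no prediction is made.

```python
import math

# Toy data (made up). Experts are a small pool kept apart from regular users.
expert_ratings = {
    "critic_1": {"A": 5, "B": 2, "C": 4, "D": 5},
    "critic_2": {"A": 4, "B": 1, "C": 5, "D": 4},
    "critic_3": {"A": 1, "B": 5, "C": 2, "D": 1},
}
user_ratings = {"A": 5, "B": 1, "C": 4}   # the target user has not rated D

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def cosine(a, b):
    """Cosine similarity over the items both raters have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def expert_predict(user, item, experts, delta=0.9, tau=2):
    """Predict a rating from expert neighbors, or None if too few qualify."""
    # Step 2: expert neighbors = experts similar enough to this user
    # who have also rated the target item.
    neighbors = []
    for r in experts.values():
        sim = cosine(user, r)
        if sim >= delta and item in r:
            neighbors.append((sim, r))
    if len(neighbors) < tau:
        return None
    # Step 3: kNN-style prediction, using experts' mean-centered ratings.
    base = mean(user.values())
    num = sum(sim * (r[item] - mean(r.values())) for sim, r in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return base + num / den

prediction = expert_predict(user_ratings, "D", expert_ratings)
print(prediction)
```

Note that the user's own ratings are only compared against the expert pool, never against other users, which is what keeps the dataset small and makes local computation plausible.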
User Study
●   57 participants, only 14.5 ratings/participant
●   50% of the users consider Expert-based CF to be
    good or very good
●   Expert-based CF: only algorithm with an average
    rating over 3 (on a 0-4 scale)
Advantages of the Approach

●   Noise
    ●   Experts introduce less natural noise
●   Malicious Ratings
    ●   Dataset can be monitored to avoid shilling
●   Data Sparsity
    ●   Reduced set of domain experts can be motivated to rate items
●   Cold Start problem
    ●   Experts rate items as soon as they are available
●   Scalability
    ●   Dataset is several orders of magnitude smaller
●   Privacy
    ●   Recommendations can be computed locally
Architecture of the approach
Some implementations
J. Ahn and X. Amatriain et al. "Towards Fully Distributed and Privacy-preserving Recommendations via Expert
  Collaborative Filtering and RESTful Linked Data", Web Intelligence '10


     ●    A distributed Music Recommendation engine
Expert Music
Recommendations


              Powered by...
Some implementations (II)
J. Bachs and X. Amatriain et al. "Geolocated Movie Recommendations based on
  Expert Collaborative Filtering", Recsys '10
   ●   A geo-localized Mobile Movie Recommender
       iPhone App
Geo-localized Expert Movie
   Recommendations




                    Powered by...
Context Overload
≠
Mobile phones are “personal”
Mobile users tend to seek “fresh” content
Where is the nearest florist?
Where is that really cool cocktail bar I went to last month?
Interesting things close to me?
Events near me?
Lost or in an unfamiliar place?
Context-aware Recommendations

●   A clear area of research and interest for
    companies: recommend me something that I
    like and is relevant in my current context.
    ●   Context = any variable that adds a new dimension
        to the 2D user-item problem (e.g. time, geolocation,
        weather...)
User micro-profiles
L. Baltrunas, X. Amatriain "Towards Time-Dependant Recommendation based on Implicit Feedback", in CARS
  (Context-aware Recommender Systems Workshop) Recsys '09


     ●   Our proposal is to represent a user by a
         hierarchy of micro-profiles where each
         micro-profile represents a class in the context variable
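
As a rough illustration of the idea, the sketch below splits one user's implicit feedback into one micro-profile per context class. The listening log, the three time-of-day classes and the function names are assumptions made up for this example, and the result is a single flat level rather than the full hierarchy mentioned above.

```python
from collections import defaultdict

# Toy implicit feedback (made up): (artist, hour of day the track was played).
listening_log = [
    ("Miles Davis", 8), ("Miles Davis", 9), ("Daft Punk", 23),
    ("Daft Punk", 22), ("Bach", 9), ("Daft Punk", 1),
]

def time_class(hour):
    """Map an hour (0-23) to a hypothetical context class."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 20:
        return "afternoon"
    return "night"

def build_micro_profiles(log):
    """Split one user's play counts into one profile per context class."""
    profiles = defaultdict(lambda: defaultdict(int))
    for artist, hour in log:
        profiles[time_class(hour)][artist] += 1
    return {ctx: dict(counts) for ctx, counts in profiles.items()}

micro_profiles = build_micro_profiles(listening_log)
print(micro_profiles)
```

A recommender can then pick the micro-profile matching the current context (for example "night") instead of the user's full history when looking for neighbors.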
Multiverse Recommendation
A. Karatzoglou, X. Amatriain, L. Baltrunas, N. Oliver "Multiverse Recommendation: N-dimensional Tensor
 Factorization for Context-aware Collaborative Filtering", 2010 ACM Recsys Conference


     ●   A different approach: represent the contextual
         recommendation problem by n-dimensional
         matrices (aka Tensors)
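
A sketch of what such a model can look like, assuming a single context dimension and a Tucker-style factorization. The symbols are introduced here for illustration and do not come from the slide: $\hat{Y}_{ijk}$ is the predicted rating of user $i$ for item $j$ in context $k$, $U$, $M$ and $C$ are factor matrices for users, items and context with $d_U$, $d_M$ and $d_C$ latent dimensions, and $S$ is a small core tensor that couples them.

```latex
\hat{Y}_{ijk} = \sum_{a=1}^{d_U} \sum_{b=1}^{d_M} \sum_{c=1}^{d_C}
    S_{abc} \, U_{ia} \, M_{jb} \, C_{kc}
```

With no context dimension this reduces to a bilinear form in the user and item factors, which is the familiar 2D matrix-factorization setting; each extra context variable adds one more factor matrix and one more index on $S$.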
Master Planner




Automatic and personalized tourist route recommendations,
         a new approach to discovering the world
Tourism 2.0
●   Tourism is not the same
    since the web
    appeared:
    –   People search for
        information on where to
        go online (reading blogs,
        in their social networks...)
    –   People buy tickets and
        hotel packages online
    –   People post pictures and
        discuss tips online
Tourism 3.0 – Going Mobile
        N. Tintarev, A. Flores, X. Amatriain (2010)"Off the beaten track - a mobile field study
         exploring the long tail of mobile tourist recommendations", 2010 Mobile HCI




●   The mobile web and smartphones are introducing yet another
    revolution
    ●   Tourists can now access information on the go:
        –   Looking for information on a sight
        –   Tips on where to go next
        –   Information about the weather
        –   ....
Master Planner



●   I am in SB, it's March and
    sunny, I have 6 hours to visit
    things and I am interested in
    music, art, literature, and
    sports
●   I need: An automatic tourist
    route recommender system
Master Planner
        ●   Completely automatic
            personalized/contextualized
            tourist recommender system
            ●   Generates automatic city
                models using web resources
            ●   Generates automatic user
                models from regular user
                profiles
            ●   Personalizes/contextualizes
                generic city models
            ●   Recommends optimized
                personalized routes taking
                into account constraints
                using AI techniques
Summary

➢   We need to build tools and approaches to help
    people navigate the abundance of media and
    information
➢   Recommender systems can help by
    leveraging the wisdom of the crowds
➢   But...
    ➢   User feedback is not always our ground truth
    ➢   Crowds are not always wise and we are better off
        using experts
    ➢   Context is becoming part of the content itself
Co-authors
●   Josep M. Pujol and Nuria Oliver (Telefonica)
    worked on Natural Noise and Wisdom of the
    Few projects
●   Neal Lathia (UCL, London), Haewook Ahn
    (KAIST, Korea), Jaewook Ahn (Pittsburgh
    Univ.), and Josep Bachs (UPF, Barcelona) worked on
    Wisdom of the Few
●   Linas Baltrunas (Bolzano U., Italy),
    Alexandros Karatzoglou, Paulo Villegas, Toni
    Cebrian (Telefonica) worked on contextual
●   Miquel Ramirez (UPF, Barcelona) and Nava
    Tintarev (Telefonica) worked on Tourist
    Recommendations.
Conclusions

➢   Whether you are an engineer, an artist or a
    scientist (or all of the above), it is important to
    keep the “user” in mind
    ➢   Who are my “users”? (end-user, public, other
        scientists, a grant agency...)
    ➢   How will the output of my work affect users?
    ➢   How can I obtain feedback from them?
    ➢   How can I use it?
    ➢   ...
Thanks!


        Questions?

    Xavier Amatriain
          xar@tid.es
      http://xavier.amatriain.net
http://technocalifornia.blogspot.com
            @xamat

 

Recently uploaded

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Media, data, context... and the Holy Grail of User Taste Prediction

  • 1. MEDIA, DATA, CONTEXT... And The Holy Grail of User Taste Prediction Xavier Amatriain MAT, UCSB Santa Barbara, March '11
  • 2. But first... About me and Telefonica
  • 6. Telefonica is a fast-growing Telecom ● 1989: about 12 million subscribers; basic telephone and data services; operations in Spain; about 71,000 professionals; Rev: 4,273 M€; EPS(1): 0.45 € ● 2000: about 68 million customers; wireline and mobile voice, data and Internet services; operations in 16 countries; about 149,000 professionals; Rev: 28,485 M€; EPS(1): 0.67 € ● 2008: about 260 million customers; integrated ICT solutions for all customers; operations in 25 countries; about 257,000 professionals; Rev: 57,946 M€; EPS: 1.63 € (1) EPS: Earnings per share
  • 7. Currently among the largest in the world Telco sector worldwide ranking by market cap (US$ bn) Source: Bloomberg, 06/12/09 Just announced 2010 results: record net earnings,  first Spanish company ever to make > 10B €
  • 8. Leader in South America (data as of March ‘09; the slide's map marks the wireline and mobile market rank of each country) ● Argentina: 20.9 million ● Brazil: 61.4 million ● Central America: 6.1 million ● Colombia: 12.6 million ● Chile: 10.1 million ● Ecuador: 3.3 million ● Mexico: 15.7 million ● Peru: 15.2 million ● Uruguay: 1.5 million ● Venezuela: 12.0 million ● Total Accesses (as of March ‘09): 159.5 million. Notes: - Central America includes Guatemala, Panama, El Salvador and Nicaragua - Total accesses figure includes Narrowband Internet accesses of Terra Brasil and Terra Colombia, and Broadband Internet accesses of Terra Brasil, Telefónica de Argentina, Terra Guatemala and Terra México.
  • 9. And a significant footprint in Europe (data as of March ‘09; the slide's map marks the wireline and mobile market rank of each country) ● Spain: 47.2 million ● UK: 20.8 million ● Germany: 16.0 million ● Ireland: 1.7 million ● Czech Republic: 7.7 million ● Slovakia: 0.4 million ● Total Accesses (as of March ’09): 93.8 million
  • 10. Scientific Research ● Mobile and Ubicomp ● Multimedia Core ● User Modelling & Data Mining ● HCIR ● DATA MINING ● Wireless Systems ● Content Distribution & P2P ● Social Networks
  • 13. More is Less ● Worse Decisions ● Less Decisions
  • 14. Search engines don’t always hold the answer
  • 18. What about information to help take decisions?
  • 19. The Age of Search has come to an end ... long live the Age of Recommendation! ● Chris Anderson in “The Long Tail”: “We are leaving the age of information and entering the age of recommendation” ● CNN Money, “The race to create a 'smart' Google”: “The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.”
  • 20. But, what are Recommender Systems? Read this! Attend this conference!
  • 21. The value of recommendations ● Netflix: 2/3 of the movies rented are recommended ● Google News: recommendations generate 38% more clickthrough ● Amazon: 35% sales from recommendations ● Choicestream: 28% of the people would buy more music if they found what they liked.
  • 22. The “Recommender problem” ● Estimate a utility function that is able to automatically predict how much a user will like an item that is unknown for her. Based on: ● Past behavior ● Relations to other users ● Item similarity ● Context ● ...
  • 23. Data mining + all those other things ● User Interface ● System requirements (efficiency, scalability, privacy....) ● Business Logic ● Serendipity ● ....
  • 24. The Netflix Prize ● 500K users x 17K movie titles = 100M ratings = $1M (if you “only” improve the existing system by 10%! From 0.95 to 0.85 RMSE) ● 49K contestants on 40K teams from 184 countries. ● 41K valid submissions from 5K teams; 64 submissions per day ● Winning approach uses hundreds of predictors from several teams
  • 25. Approaches to Recommendation ● Collaborative Filtering: recommend items based only on the user's past behavior – User-based: find similar users to me and recommend what they liked – Item-based: find similar items to those that I have previously liked ● Content-based: recommend based on features inherent to the items ● Social recommendations (trust-based)
  • 26. What works ● It depends on the domain and the particular problem ● As a general rule, it is usually a good idea to combine approaches: Hybrid Recommender Systems ● However, in the general case it has been demonstrated that (currently) the best isolated approach is CF. ● Item-based is in general more efficient and better, but mixing CF approaches can improve results ● Other approaches can be hybridized to improve results in specific cases (cold-start problem...)
  • 27. The CF Ingredients ● List of m Users and a list of n Items ● Each user has a list of items with associated opinion ● Explicit opinion - a rating score (numerical scale) ● Implicit feedback – purchase records or listening history ● Active user for whom the prediction task is performed ● A metric for measuring similarity between users ● A method for selecting a subset of neighbors ● A method for predicting a rating for items not rated by the active user.
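The ingredients listed on this slide can be put together in a few lines. The following is a minimal sketch of user-based CF, not the speaker's implementation: the choice of Pearson correlation as the similarity metric, top-k positive neighbors as the selection method, and a mean-centered weighted average as the predictor are all assumptions made for illustration, as are the toy ratings.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def user_mean(user_ratings):
    """Average rating a user has given."""
    return sum(user_ratings.values()) / len(user_ratings)

def predict(ratings, active, item, k=2):
    """Predict the active user's rating for an item they have not rated."""
    # Similarity metric: compare the active user with everyone who rated the item.
    candidates = [
        (pearson(ratings[active], other), name)
        for name, other in ratings.items()
        if name != active and item in other
    ]
    # Neighbor selection: keep the k most similar users with positive similarity.
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbors:
        return user_mean(ratings[active])
    # Prediction: the active user's mean plus the weighted neighbor deviations.
    num = sum(s * (ratings[n][item] - user_mean(ratings[n])) for s, n in neighbors)
    den = sum(s for s, _ in neighbors)
    return user_mean(ratings[active]) + num / den

# Toy explicit ratings on a 1 to 5 scale (made-up data).
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 3, "D": 4},
    "cat": {"A": 1, "B": 5, "C": 2, "D": 1},
}
prediction = predict(ratings, "ann", "D")
print(round(prediction, 2))
```

Here only "bob" correlates positively with "ann", so the prediction is ann's mean rating shifted by how far bob's rating of "D" sits above bob's own mean.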
  • 29. User Feedback is Noisy DID YOU HEAR WHAT  I LIKE??!! ...and limits Our Prediction Accuracy
  • 30. The Magic Barrier ● Magic Barrier = Limit on prediction accuracy due to noise in the original data ● Natural Noise = involuntary noise introduced by users when giving feedback ● Due to (a) mistakes, and (b) lack of resolution in the personal rating scale (e.g. on a 1 to 5 scale, a 2 may mean the same as a 3 for some users and some items). ● Magic Barrier >= Natural Noise Threshold ● We cannot predict with less error than the resolution in the original data
  • 31. Our related research questions X. Amatriain, J.M. Pujol, N. Oliver (2009) "I like It... I like It Not: Measuring Users Ratings Noise in Recommender Systems", in UMAP 09 ● Q1. Are users inconsistent when providing explicit feedback to Recommender Systems via the common Rating procedure? ● Q2. How large is the prediction error due to these inconsistencies? ● Q3. What factors affect user inconsistencies?
  • 32. Experimental Setup ● 100 Movies selected from Netflix dataset doing a stratified random sampling on popularity ● Ratings on a 1 to 5 star scale ● Special “not seen” symbol. ● Trial 1 and 3 = random order; trial 2 = ordered by popularity ● 118 participants
  • 33. User Feedback is Noisy ● Users are inconsistent ● Inconsistencies are not random and depend on many factors ● More inconsistencies for mild opinions ● More inconsistencies for negative opinions ● How the items are presented affects inconsistencies
  • 34. Users' ratings are far from ground truth ● In pairwise comparisons between trials, RMSE is already > 0.55 or > 0.69 in the best case (the Netflix Prize goal was to get below 0.85!)
  • 35. Rate it Again X. Amatriain, J.M. Pujol, N. Tintarev, N. Oliver (2009) "Rate it Again: Increasing Recommendation Accuracy by User re-Rating", 2009 ACM RecSys ● Given that users are noisy… can we benefit from asking them to rate the same movie more than once? ● We propose an algorithm to allow for multiple ratings of the same <user,item> tuple. ● The algorithm is subject to two fairness conditions: – Algorithm should remove as few ratings as possible (i.e. only when there is some certainty that the rating is only adding noise) – Algorithm should not make up new ratings but decide which of the existing ones are valid (no averaging, predicting...)
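To make the pairwise comparison concrete, here is a minimal sketch of how the RMSE between two rating trials of the same user could be computed. The function name, the use of `None` for the "not seen" symbol, and the sample ratings are assumptions for illustration and are not taken from the study.

```python
from math import sqrt

def rmse_between_trials(trial_a, trial_b):
    """RMSE over the items rated in both trials; None stands for 'not seen'."""
    pairs = [
        (trial_a[item], trial_b[item])
        for item in trial_a
        if item in trial_b
        and trial_a[item] is not None
        and trial_b[item] is not None
    ]
    if not pairs:
        raise ValueError("no items rated in both trials")
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))

# Made-up ratings by one user for the same five movies in two trials.
trial_1 = {"m1": 4, "m2": 3, "m3": None, "m4": 5, "m5": 2}
trial_2 = {"m1": 4, "m2": 2, "m3": 3, "m4": 5, "m5": 3}

noise = rmse_between_trials(trial_1, trial_2)
print(round(noise, 3))
```

In this toy case the user changes two of four comparable ratings by one star, which already gives an RMSE of about 0.71 between the user's own trials.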
  • 36. Re-rating Algorithm ● One-source re-rating case. Examples: {3, 1} → Ø; {4} → 4; {3, 4} → 3 (2 source); {3, 4, 5} → 3 ● Given the following milding function:
  • 38. Rate it again ● By asking users to rate items again we can remove noise in the dataset ● Improvements of up to 14% in accuracy! ● Because we don't want all users to re-rate all items we design ways to do partial denoising ● Data-dependent: only denoise extreme ratings ● User-dependent: detect “noisy” users
  • 39. The value of a re-rating ● Adding new ratings increases performance of the CF algorithm
  • 40. The value of a re-rating ● But you are better off doing re-rating than new ratings!
  • 41. The value of a re-rating ● And much better if you know which ratings to re-rate!
  • 42. Let's recap ● Users are inconsistent ● Inconsistencies can depend on many things including how the items are presented ● Inconsistencies produce natural noise ● Natural noise reduces our prediction accuracy independently of the algorithm ● By asking users to rate items again we can remove noise and improve accuracy
  • 43. But Crowds are not always wise. Conditions that are needed to guarantee the Wisdom in a Crowd: ● Diversity of opinion ● Independence ● Decentralization ● Aggregation
  • 44. Who Can we trust?
  • 45. Crowds are not always wise vs. Who  won?
  • 47. The Wisdom of the Few X. Amatriain et al. "The wisdom of the few: a collaborative filtering approach based on expert opinions from the web", SIGIR '09
  • 48. Expert-based CF ● expert = individual that we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain ● Expert-based Collaborative Filtering ● Find neighbors from a reduced set of experts instead of regular users. 1. Identify domain experts with reliable ratings 2. For each user, compute “expert neighbors” 3. Compute recommendations similar to standard kNN CF
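The three steps on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the method from the paper: the overlap-weighted cosine similarity, the similarity threshold `delta`, the minimum neighbor count `tau`, and all ratings are hypothetical choices made here.

```python
from math import sqrt

def mean(ratings):
    """Average of a user's or expert's ratings."""
    return sum(ratings.values()) / len(ratings)

def similarity(user, expert):
    """Cosine over co-rated items, scaled down when the two share few items."""
    common = [i for i in user if i in expert]
    if not common:
        return 0.0
    dot = sum(user[i] * expert[i] for i in common)
    norm = sqrt(sum(user[i] ** 2 for i in common)) * sqrt(
        sum(expert[i] ** 2 for i in common))
    overlap = 2 * len(common) / (len(user) + len(expert))
    return dot / norm * overlap

def predict(user, experts, item, delta=0.5, tau=2):
    """Predict a rating from expert neighbors; None if too few experts qualify."""
    # Step 2: expert neighbors are experts who rated the item and are similar enough.
    neighbors = [
        (similarity(user, e), e)
        for e in experts.values()
        if item in e and similarity(user, e) >= delta
    ]
    if len(neighbors) < tau:
        return None
    # Step 3: kNN-style prediction from the experts' mean-centered ratings.
    num = sum(s * (e[item] - mean(e)) for s, e in neighbors)
    den = sum(s for s, _ in neighbors)
    return mean(user) + num / den

# Step 1 is assumed done: `experts` holds ratings from trusted sources (made-up data).
experts = {
    "e1": {"A": 4, "B": 2, "C": 5},
    "e2": {"A": 5, "B": 1, "C": 4},
    "e3": {"C": 1},
}
user = {"A": 4, "B": 2}

prediction = predict(user, experts, "C")
print(round(prediction, 2))
```

Returning `None` when fewer than `tau` experts qualify trades coverage for confidence: the system declines to predict rather than rely on a single opinion.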
  • 49. User Study ● 57 participants, only 14.5 ratings/participant ● 50% of the users consider Expert-based CF to be good or very good ● Expert-based CF: only algorithm with an average rating over 3 (on a 0-4 scale)
  • 50. Advantages of the Approach ● Noise: experts introduce less natural noise ● Malicious Ratings: dataset can be monitored to avoid shilling ● Data Sparsity: reduced set of domain experts can be motivated to rate items ● Cold Start problem: experts rate items as soon as they are available ● Scalability: dataset is several orders of magnitude smaller ● Privacy: recommendations can be computed locally
  • 52. Some implementations J. Ahn and X. Amatriain et al. "Towards Fully Distributed and Privacy-preserving Recommendations via Expert Collaborative Filtering and RESTful Linked Data", Web Intelligence '10 ● A distributed Music Recommendation engine
  • 54. Some implementations (II) J. Bachs and X. Amatriain et al. "Geolocated Movie Recommendations based on Expert Collaborative Filtering", Recsys '10 ● A geo-localized Mobile Movie Recommender iPhone App
  • 55. Geo-localized Expert Movie Recommendations Powered by...
  • 58. Mobile phones are “personal”
  • 59. Mobile users tend to seek “fresh” content
  • 60. Where is the nearest florist?
  • 61. Where is that really cool cocktail bar I went to last month?
  • 64. Lost or in an unfamiliar place?
  • 65. Context-aware Recommendations ● A clear area of research and interest for companies: recommend me something that I like and is relevant in my current context. ● Context = any variable that adds a new dimension to the 2D user-item problem (e.g. time, geolocation, weather...)
  • 66. User micro-profiles L. Baltrunas, X. Amatriain "Towards Time-Dependant Recommendation based on Implicit Feedback", in CARS (Context-aware Recommender Systems Workshop) Recsys '09 ● Our proposal is to represent a user by a hierarchy of micro-profiles where each micro-profile represents a class in the context variable
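As a rough illustration of the micro-profile idea, the sketch below splits one user's implicit feedback into one profile per time-of-day class. The four classes, the `(user, item, hour)` event format, and the play counts are assumptions made for this example; the sketch shows a single-level partition, not the full hierarchy described in the paper.

```python
from collections import defaultdict

def time_class(hour):
    """Map an hour (0-23) to a coarse, hypothetical context class."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"

def build_micro_profiles(events):
    """Group (user, item, hour) events into {user: {class: {item: play count}}}."""
    profiles = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for user, item, hour in events:
        profiles[user][time_class(hour)][item] += 1
    return profiles

# Made-up listening history: implicit feedback only, no ratings.
events = [
    ("u1", "jazz", 8),
    ("u1", "jazz", 9),
    ("u1", "rock", 20),
    ("u1", "jazz", 21),
    ("u1", "rock", 22),
]
profiles = build_micro_profiles(events)
print(dict(profiles["u1"]["morning"]), dict(profiles["u1"]["evening"]))
```

Each micro-profile can then be fed to a regular CF algorithm as if it were a separate user, so that a request made in the evening is answered from the evening profile.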
  • 67. Multiverse Recommendation A. Karatzoglou, X. Amatriain, L. Baltrunas, N. Oliver "Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering", 2010 ACM Recsys Conference ● A different approach: represent the contextual recommendation problem by n-dimensional matrices (aka Tensors)
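To show what adding a context dimension means in practice, here is a minimal sketch of how a rating could be predicted from a Tucker-style tensor factorization: a small core tensor combined with one factor vector each for the user, the item, and the context. The decomposition form is an assumption for illustration, the factor values are made up, and no training step is shown.

```python
def predict(core, user_vec, item_vec, ctx_vec):
    """Sum core[a][b][c] * user[a] * item[b] * ctx[c] over all index triples."""
    return sum(
        core[a][b][c] * user_vec[a] * item_vec[b] * ctx_vec[c]
        for a in range(len(user_vec))
        for b in range(len(item_vec))
        for c in range(len(ctx_vec))
    )

# Hypothetical 2 x 2 x 2 core tensor linking user, item and context factors.
core = [
    [[1.0, 0.0], [0.0, 0.5]],
    [[0.0, 3.0], [1.0, 0.0]],
]
user_vec = [1.0, 0.5]   # latent factors of one user
item_vec = [2.0, 1.0]   # latent factors of one movie
weekday = [1.0, 0.0]    # latent factors of the "weekday" context
weekend = [0.0, 1.0]    # latent factors of the "weekend" context

weekday_score = predict(core, user_vec, item_vec, weekday)
weekend_score = predict(core, user_vec, item_vec, weekend)
print(weekday_score, weekend_score)
```

The same user and item get different predicted scores depending on the context vector, which is what a plain 2D user-item matrix factorization cannot express.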
  • 68. Master Planner Automatic and personalized tourist route recommendations, a new approach to discovering the world
  • 69. Tourism 2.0 ● Tourism is not the same since the web appeared: – People search for information on where to go online (reading blogs, in their social networks...) – People buy tickets and hotel packages online – People post pictures and discuss tips online
  • 70. Tourism 3.0 – Going Mobile N. Tintarev, A. Flores, X. Amatriain (2010)"Off the beaten track - a mobile field study exploring the long tail of mobile tourist recommendations", 2010 Mobile HCI ● The mobile web and smartphones are introducing yet another revolution ● Tourists can now access information on the go: – Looking for information on a sight – Tips on where to go next – Information about the weather – ....
  • 71. Master Planner ● I am in SB, it's March and sunny, I have 6 hours to visit things and I am interested in music, art, literature, and sports ● I need: An automatic tourist route recommender system
  • 72. Master Planner ● Completely automatic personalized/contextualized tourist recommender system ● Generates automatic city models using web resources ● Generates automatic user models from regular user profiles ● Personalizes/contextualizes generic city models ● Recommends optimized personalized routes taking into account constraints using AI techniques
  • 73. Summary ➢ We need to build tools and approaches to help people navigate the abundance of media and information ➢ Recommender systems can help by leveraging the wisdom of the crowds ➢ But... ➢ User feedback is not always our ground truth ➢ Crowds are not always wise and we are better off using experts ➢ Context is becoming part of the content itself
  • 74. Co-authors ● Josep M. Pujol and Nuria Oliver (Telefonica) worked on Natural Noise and Wisdom of the Few projects ● Neal Lathia (UCL, London), Haewook Ahn (KAIST, Korea), Jaewook Ahn (Pittsburgh Univ.), and Josep Bachs (UPF, Barcelona) on Wisdom of the Few ● Linas Baltrunas (Bolzano U., Italy), Alexandros Karatzoglou, Paulo Villegas, Toni Cebrian (Telefonica) worked on contextual recommendations ● Miquel Ramirez (UPF, Barcelona) and Nava Tintarev (Telefonica) worked on Tourist Recommendations.
  • 75. Conclusions ➢ Whether you are an engineer, an artist or a scientist (or all of the above), it is important to keep the “user” in mind ➢ Who are my “users”? (end-user, public, other scientists, a grant agency...) ➢ How will the output of my work affect users? ➢ How can I obtain feedback from them? ➢ How can I use it? ➢ ...
  • 76. Thanks! Questions? Xavier Amatriain xar@tid.es http://xavier.amatriain.net http://technocalifornia.blogspot.com @xamat