Simple Matrix Factorization for Recommendation
Sean Owen • Apache Mahout
Apache Mahout
•   Scalable machine learning
•   (Mostly) Hadoop-based
•   Clustering, classification and recommender engines
•   mahout.apache.org

•   Nearest-neighbor
     •   User-based
     •   Item-based
     •   Slope-one
     •   Clustering-based

•   Latent factor
     •   SVD-based
     •   ALS
     •   More!
Matrix = Associations
•   Things are associated
     •   Like people to colors
•   Associations have strengths
     •   Like preferences and dislikes
•   Can quantify associations
     •   Alice loves navy = +4, Carol dislikes olive = -2
•   We don’t know all associations
     •   Many implicit zeroes

         Rose   Navy   Olive
Alice     0      +4     0
Bob       0      0      +2
Carol    -1      0      -2
Dave     +3      0      0
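
The association table can be held sparsely, so that only known strengths are stored and everything else reads as an implicit zero. The following is a minimal sketch; the `associations` dictionary and the `strength` helper are hypothetical names chosen for illustration, not anything from Mahout.

```python
# Hypothetical sparse store: only known associations are kept.
# Any missing (person, color) pair is treated as an implicit zero.
associations = {
    "Alice": {"Navy": 4},
    "Bob": {"Olive": 2},
    "Carol": {"Rose": -1, "Olive": -2},
    "Dave": {"Rose": 3},
}

def strength(person, color):
    """Return the known strength, or 0 when the pair is unknown."""
    return associations.get(person, {}).get(color, 0)

# Expand to the dense matrix shown on the slide (rows = people).
colors = ["Rose", "Navy", "Olive"]
dense = [[strength(person, color) for color in colors]
         for person in associations]
```

Storing only the non-zero entries matters in practice because most user-item pairs are unknown.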
From One Matrix, Two
•   Like numbers, matrices can be factored
•   m•n matrix = m•k times k•n
•   Associations can decompose into others
•   Alice likes navy = Alice loves blues, and blues includes navy

P (m•n) = X (m•k) • Y’ (k•n)
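
As a small illustration of the shapes involved, the sketch below multiplies an m•k matrix by a k•n matrix with a single feature (k = 1) standing in for "blue-ness". The numbers are made up for the example and are not taken from the slides; `matmul` is a hypothetical helper.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix (both lists of rows)."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[f] * b[f][j] for f in range(k))
             for j in range(len(b[0]))]
            for row in a]

# Illustrative values only: one feature, "blue-ness".
X = [[2.0],              # Alice loves blues
     [0.0]]              # Bob is indifferent to blues
Yt = [[0.0, 2.0, 0.0]]   # how "blue" Rose, Navy and Olive are

# (2 x 1) times (1 x 3) gives a 2 x 3 association matrix.
P = matmul(X, Yt)
```

Alice's strength for Navy comes out as 2.0 * 2.0 = 4.0, which is "Alice loves blues, and blues includes navy" expressed as a product.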
In Terms of Few Features
•   Can explain associations by appealing to underlying intermediate features (e.g. “blue-ness”)
•   Relatively few (one “blue-ness”, but many shades)

[Diagram: (Alice) linked to (Navy) through the intermediate feature (Blue)]
Losing Information is Helpful
•   When k (= features) is small, information is lost
•   Factorization is approximate (Alice appears to like blue-ish periwinkle too)

[Diagram: (Alice) linked through (Blue) to both (Navy) and (Periwinkle)]
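
To make the periwinkle effect concrete, here is a sketch that forces a tiny matrix through a single feature (k = 1). It uses power iteration to get the best rank-1 approximation, which is a technique swapped in for illustration and not the method the deck goes on to describe. The matrix values and the second user, Erin, are assumptions invented for the example.

```python
import math

def rank1_approx(a, iters=100):
    """Best rank-1 approximation of a (list of rows) via power iteration."""
    n = len(a[0])
    v = [1.0] * n
    for _ in range(iters):
        # One step of power iteration on A'A: w = A'(A v), then normalize.
        av = [sum(row[j] * v[j] for j in range(n)) for row in a]
        w = [sum(a[i][j] * av[i] for i in range(len(a))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The approximation is (A v) v', i.e. sigma * u * v'.
    av = [sum(row[j] * v[j] for j in range(n)) for row in a]
    return [[av_i * v_j for v_j in v] for av_i in av]

# Columns: Navy, Periwinkle. Alice has no known opinion on periwinkle.
A = [[4.0, 0.0],   # Alice
     [3.0, 3.0]]   # Erin (hypothetical second user)
approx = rank1_approx(A)
```

With only one feature available, the approximation can no longer reproduce Alice's zero for periwinkle exactly, so `approx[0][1]` comes out positive. That lost detail is what produces a recommendation.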
How to Compute?
P (m•n) = X (m•k) • Y’ (k•n)
Skip the Singular Value Decomposition for now …
A (m•n) = S (m•k) • Σ • T’ (k•n)
Alternating Least Squares
•   Collaborative Filtering for Implicit Feedback Datasets
     www2.research.att.com/~yifanhu/PUB/cf.pdf
•   R = matrix of user-item interaction “strengths”
•   P = R reduced to 0 and 1
•   Factor as approximate P ≈ X•Y’
     •   Start with random Y
     •   Compute X such that X•Y’ best approximates P (Frobenius / L2 norm) (Least Squares)
     •   Repeat for Y (Alternating)
     •   Iterate, Iterate, Iterate
•   Large values in X•Y’ are good recommendations
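
The loop above can be sketched in plain Python as follows. This is an illustrative sketch, not Mahout's implementation. It assumes the confidence weighting c = 1 + α•r and the λ regularization term from the cited paper, treats blank cells of R as 0, and uses hypothetical helper names (`solve`, `update`, `als`). The defaults k = 3, λ = 2, α = 40 mirror the example slides.

```python
import random

def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def update(fixed, prefs, conf, lam):
    """Solve one weighted, regularized least-squares problem per row of
    prefs, holding the other side's factors (fixed) constant."""
    k = len(fixed[0])
    out = []
    for p_row, c_row in zip(prefs, conf):
        a = [[sum(c * f[i] * f[j] for c, f in zip(c_row, fixed))
              + (lam if i == j else 0.0) for j in range(k)]
             for i in range(k)]
        b = [sum(c * p * f[i] for c, p, f in zip(c_row, p_row, fixed))
             for i in range(k)]
        out.append(solve(a, b))
    return out

def als(r, k=3, lam=2.0, alpha=40.0, iters=10, seed=0):
    rng = random.Random(seed)
    p = [[1.0 if v > 0 else 0.0 for v in row] for row in r]  # P = R as 0/1
    c = [[1.0 + alpha * v for v in row] for row in r]        # confidence
    pt = [list(col) for col in zip(*p)]
    ct = [list(col) for col in zip(*c)]
    y = [[rng.random() for _ in range(k)] for _ in range(len(r[0]))]
    for _ in range(iters):
        x = update(y, p, c, lam)    # fix Y, solve for X
        y = update(x, pt, ct, lam)  # fix X, solve for Y
    return x, y

# The R matrix of the example, with blanks written as 0 (an assumption).
R = [[1, 4, 3, 0, 0], [0, 0, 3, 0, 0], [0, 4, 0, 3, 2],
     [5, 0, 2, 0, 3], [0, 0, 0, 5, 0], [2, 4, 0, 0, 0]]
X, Y = als(R)
```

Each row of X (and of Y) is solved independently of the others, so the inner loop in `update` is the part that could be distributed by row or column.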
Example

R (. = no interaction)        P
1   4   3   .   .             1   1   1   0   0
.   .   3   .   .             0   0   1   0   0
.   4   .   3   2             0   1   0   1   1
5   .   2   .   3             1   0   1   0   1
.   .   .   5   .             0   0   0   1   0
2   4   .   .   .             1   1   0   0   0
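
Deriving P from R is a one-line reduction. In this sketch the blank cells of R are written as 0, which is an assumption about how missing interactions are stored.

```python
# R from the example; 0 stands for "no interaction" (an assumption).
R = [
    [1, 4, 3, 0, 0],
    [0, 0, 3, 0, 0],
    [0, 4, 0, 3, 2],
    [5, 0, 2, 0, 3],
    [0, 0, 0, 5, 0],
    [2, 4, 0, 0, 0],
]

# P keeps only whether an interaction happened, not how strong it was.
P = [[1 if value > 0 else 0 for value in row] for row in R]
```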
k = 3, λ=2, α=40
1 iteration

P ≈ X • Y’

P                        X
1   1   1   0   0        2.18   -0.01    0.35
0   0   1   0   0        1.83   -0.11   -0.68
0   1   0   1   1        0.79    1.15   -1.80
1   0   1   0   1        0.97   -1.90   -2.12
0   0   0   1   0        1.01   -0.25   -1.77
1   1   0   0   0        2.33   -8.00    1.06

Y’
 0.43    0.48    0.48    0.16    0.10
-0.27    0.39   -0.13    0.03    0.05
-0.03   -0.09   -0.13   -0.47   -0.47
k = 3, λ=2, α=40
1 iteration

P                        X•Y’
1   1   1   0   0        0.94    1.00   1.00    0.18    0.07
0   0   1   0   0        0.84    0.89   0.99    0.60    0.50
0   1   0   1   1        0.07    0.99   0.46    1.01    0.98
1   0   1   0   1        1.00   -0.09   1.00    1.08    0.99
0   0   0   1   0        0.55    0.54   0.75    0.98    0.92
1   1   0   0   0        1.01    0.99   0.98   -0.13   -0.25
k = 3, λ=2, α=40
10 iterations

P                        X•Y’
1   1   1   0   0        0.96   0.99    0.99    0.38   0.93
0   0   1   0   0        0.44   0.39    0.98   -0.11   0.39
0   1   0   1   1        0.70   0.99    0.42    0.98   0.98
1   0   1   0   1        1.00   1.04    0.99    0.44   0.98
0   0   0   1   0        0.11   0.51   -0.13    1.00   0.57
1   1   0   0   0        0.97   1.00    0.68    0.47   0.91
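
Reading recommendations off X•Y’ amounts to ranking, for each user, the items with no recorded interaction by their estimated value. The sketch below uses the 10-iteration numbers from the slide; `recommend` is a hypothetical helper and items are identified by zero-based column index.

```python
P = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
]
XY = [
    [0.96, 0.99, 0.99, 0.38, 0.93],
    [0.44, 0.39, 0.98, -0.11, 0.39],
    [0.70, 0.99, 0.42, 0.98, 0.98],
    [1.00, 1.04, 0.99, 0.44, 0.98],
    [0.11, 0.51, -0.13, 1.00, 0.57],
    [0.97, 1.00, 0.68, 0.47, 0.91],
]

def recommend(user, how_many=1):
    """Rank items the user has not interacted with by estimated value."""
    unseen = [item for item, seen in enumerate(P[user]) if not seen]
    ranked = sorted(unseen, key=lambda item: XY[user][item], reverse=True)
    return ranked[:how_many]

first_user_pick = recommend(0)     # unseen items 3 and 4: 0.38 vs 0.93
fifth_user_picks = recommend(4, 2)
```

For the first user the only unseen items are columns 3 and 4, and the larger estimate (0.93) wins, so column 4 is recommended.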
Interesting Because…
•   This is all very parallelizable by row, column
BONUS: Folding in New Data
•   Model building takes time
•   Sometimes need immediate, if approximate, updates for new data
•   For new user U, need new row, XU•Y’ = QU, but have PU
•   What is XU?

•   Apply some right inverse:
     X•Y’•(Y’)⁻¹ = Q•(Y’)⁻¹, so X = Q•(Y’)⁻¹
•   OK, what is (Y’)⁻¹?
•   Of course (Y’•Y)•(Y’•Y)⁻¹ = I
•   So Y’•(Y•(Y’•Y)⁻¹) = I and right inverse is Y•(Y’•Y)⁻¹
•   XU = QU•Y•(Y’•Y)⁻¹ and so XU ≈ PU•Y•(Y’•Y)⁻¹
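
The fold-in formula can be sketched without forming the inverse explicitly: because Y’•Y is symmetric, XU = PU•Y•(Y’•Y)⁻¹ is the solution of (Y’•Y)•XU’ = (PU•Y)’. In the sketch below, Y is the transpose of the Y’ shown on the 1-iteration slide, the new user's row `p_u` is invented for illustration, and `solve` and `fold_in` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fold_in(p_u, y):
    """Approximate factor row for a new user: p_u . Y . (Y'Y)^-1.
    y is the n x k item-factor matrix, one row per item."""
    k = len(y[0])
    yty = [[sum(row[i] * row[j] for row in y) for j in range(k)]
           for i in range(k)]
    p_y = [sum(p * row[i] for p, row in zip(p_u, y)) for i in range(k)]
    return solve(yty, p_y)

# Y: transpose of the Y' from the 1-iteration example (5 items, k = 3).
Y = [[0.43, -0.27, -0.03],
     [0.48, 0.39, -0.09],
     [0.48, -0.13, -0.13],
     [0.16, 0.03, -0.47],
     [0.10, 0.05, -0.47]]

p_u = [1, 1, 0, 0, 0]   # hypothetical new user who touched items 0 and 1
x_u = fold_in(p_u, Y)
q_u = [sum(a * b for a, b in zip(x_u, item)) for item in Y]  # x_u . Y'
```

The resulting `q_u` is the least-squares estimate of the new user's row, obtained without rebuilding the model.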
In Mahout
•   org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
     •   Alternating least squares
     •   Distributed, Hadoop-based
•   org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender
     •   SVD-based
     •   Non-distributed, not Hadoop
•   MAHOUT-737
     •   Alternate implementation of alternating least squares
•   And more…
     •   DistributedLanczosSolver
     •   SequentialOutOfCoreSvd
     •   …
Myrrix
•   Complete product
     •   Real-time Serving Layer
     •   Hadoop-based Computation Layer
     •   Tuned, documented
•   Free / open: Serving Layer, for small data
•   Commercial: add Computation Layer for big data; Hosting
•   Matrix factorization-based, attractive properties
•   http://myrrix.com
Thank You
srowen at myrrix.com
mahout.apache.org

 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Simple Matrix Factorization for Recommendation in Mahout

  • 1. Simple Matrix Factorization for Recommendation. Sean Owen • Apache Mahout
  • 2. Apache Mahout (mahout.apache.org)
      • Scalable machine learning
      • (Mostly) Hadoop-based
      • Clustering, classification and recommender engines
      • Nearest-neighbor
          • User-based
          • Item-based
          • Slope-one
          • Clustering-based
      • Latent factor
          • SVD-based
          • ALS
          • More!
  • 3. Matrix = Associations
      • Things are associated, like people to colors
      • Associations have strengths, like preferences and dislikes
      • Can quantify associations: Alice loves navy = +4, Carol dislikes olive = -2
      • We don’t know all associations: many implicit zeroes

               Rose   Navy   Olive
       Alice    0      +4     0
       Bob      0      0      +2
       Carol    -1     0      -2
       Dave     +3     0      0
  • 4. From One Matrix, Two
      • Like numbers, matrices can be factored
      • m•n matrix = m•k times k•n
      • Associations can decompose into others
      • Alice likes navy = Alice loves blues, and blues includes navy
      • Diagram: P (m•n) = X (m•k) • Y’ (k•n)
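The shapes on slide 4 can be made concrete with a small sketch. It assumes a single hypothetical "blue-ness" feature (k = 1); the feature weights are invented for illustration and do not come from the slides. They are chosen so that Alice's affinity for blue times navy's blue-ness gives the +4 shown on slide 3.

```python
import numpy as np

# Hypothetical m x k matrix: how strongly each person likes "blue" (m = 2, k = 1).
X = np.array([[2.0],    # Alice loves blues
              [0.0]])   # Bob is indifferent to blues

# Hypothetical k x n matrix Y': how "blue" each color is (n = 3: Rose, Navy, Olive).
Yt = np.array([[0.0, 2.0, 0.0]])

# (m x k) times (k x n) gives an m x n matrix of associations.
P = X @ Yt
print(P.shape)
print(P[0, 1])   # Alice-to-Navy association: 2.0 * 2.0
```

With more features, X gains columns and Y’ gains rows, but the product keeps the m•n shape of the original association matrix.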
  • 5. In Terms of Few Features
      • Can explain associations by appealing to underlying intermediate features (e.g. “blue-ness”)
      • Relatively few (one “blue-ness”, but many shades)
      • Diagram nodes: (Alice), (Blue), (Navy)
  • 6. Losing Information is Helpful
      • When k (= features) is small, information is lost
      • Factorization is approximate (Alice appears to like blue-ish periwinkle too)
      • Diagram nodes: (Alice), (Blue), (Periwinkle), (Navy)
  • 7. How to Compute? P (m•n) = X (m•k) • Y’ (k•n)
  • 8. Skip the Singular Value Decomposition for now … A (m•n) = S (m•k) • Σ (k•k) • T’ (k•n)
  • 9. Alternating Least Squares
      • Collaborative Filtering for Implicit Feedback Datasets: www2.research.att.com/~yifanhu/PUB/cf.pdf
      • R = matrix of user-item interactions “strengths”
      • P = R reduced to 0 and 1
      • Factor as approximate P ≈ X•Y’
      • Start with random Y
      • Compute X such that X•Y’ best approximates P (Frobenius / L2 norm) (Least Squares)
      • Repeat for Y (Alternating)
      • Iterate, Iterate, Iterate
      • Large values in X•Y’ are good recommendations
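The loop on slide 9 can be sketched in NumPy as follows. This is an illustrative sketch, not Mahout's implementation. The confidence weighting c = 1 + α·r and the λ regularization follow the formulation of the cited implicit-feedback paper as understood here, and the function names (`solve_rows`, `als_implicit`, `weighted_loss`) are hypothetical. R is the 6×5 example from slide 10 with blank cells read as 0, and k = 3, λ = 2, α = 40 are the values shown on slide 11.

```python
import numpy as np

def solve_rows(F, C, P, lam):
    """With factors F fixed, solve one regularized weighted least-squares problem per row of P."""
    k = F.shape[1]
    out = np.empty((P.shape[0], k))
    for r in range(P.shape[0]):
        c = C[r]                                      # confidence weights for this row
        A = F.T @ (c[:, None] * F) + lam * np.eye(k)  # F' C F + lambda I
        b = F.T @ (c * P[r])                          # F' C p
        out[r] = np.linalg.solve(A, b)
    return out

def als_implicit(R, k=3, lam=2.0, alpha=40.0, iterations=10, seed=0):
    R = np.asarray(R, dtype=float)
    P = (R > 0).astype(float)          # P = R reduced to 0 and 1
    C = 1.0 + alpha * R                # assumed confidence weighting
    rng = np.random.default_rng(seed)
    X = np.zeros((R.shape[0], k))
    Y = rng.normal(scale=0.1, size=(R.shape[1], k))   # start with random Y
    for _ in range(iterations):
        X = solve_rows(Y, C, P, lam)       # fix Y, solve for X
        Y = solve_rows(X, C.T, P.T, lam)   # fix X, solve for Y (alternating)
    return X, Y

def weighted_loss(R, X, Y, lam=2.0, alpha=40.0):
    """Objective that each alternating step minimizes over one factor matrix."""
    R = np.asarray(R, dtype=float)
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))

# Example matrix from slide 10, blank cells read as 0.
R = np.array([[1, 4, 3, 0, 0],
              [0, 0, 3, 0, 0],
              [0, 4, 0, 3, 2],
              [5, 0, 2, 0, 3],
              [0, 0, 0, 5, 0],
              [2, 4, 0, 0, 0]])

X1, Y1 = als_implicit(R, iterations=1)
X10, Y10 = als_implicit(R, iterations=10)
print(np.round(X10 @ Y10.T, 2))   # large values suggest recommendations
```

Because the starting Y is random, the printed numbers will not match the slides digit for digit; only the qualitative pattern (observed cells near 1) should be comparable.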
  • 10. Example (“.” marks a blank cell in R)

            R                     P
       1  4  3  .  .        1  1  1  0  0
       .  .  3  .  .        0  0  1  0  0
       .  4  .  3  2        0  1  0  1  1
       5  .  2  .  3        1  0  1  0  1
       .  .  .  5  .        0  0  0  1  0
       2  4  .  .  .        1  1  0  0  0
  • 11. k = 3, λ=2, α=40, 1 iteration: P ≈ X • Y’

            P                      X
       1  1  1  0  0         2.18  -0.01   0.35
       0  0  1  0  0         1.83  -0.11  -0.68
       0  1  0  1  1         0.79   1.15  -1.80
       1  0  1  0  1         0.97  -1.90  -2.12
       0  0  0  1  0         1.01  -0.25  -1.77
       1  1  0  0  0         2.33  -8.00   1.06

            Y’
        0.43   0.48   0.48   0.16   0.10
       -0.27   0.39  -0.13   0.03   0.05
       -0.03  -0.09  -0.13  -0.47  -0.47
  • 12. k = 3, λ=2, α=40, 1 iteration: P ≈ X•Y’

            P                         X•Y’
       1  1  1  0  0         0.94   1.00   1.00   0.18   0.07
       0  0  1  0  0         0.84   0.89   0.99   0.60   0.50
       0  1  0  1  1         0.07   0.99   0.46   1.01   0.98
       1  0  1  0  1         1.00  -0.09   1.00   1.08   0.99
       0  0  0  1  0         0.55   0.54   0.75   0.98   0.92
       1  1  0  0  0         1.01   0.99   0.98  -0.13  -0.25
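As a quick check, the X and Y’ printed on slide 11 can be multiplied and compared with the X•Y’ printed on slide 12. One assumption is needed: the second entry in the last row of X is printed as -8.00, but the product reproduces slide 12 only if it is read as -0.08. The sketch below uses -0.08 and also shows how far off the printed value would be. All other numbers are copied from the slides, which are rounded to two decimals, so only approximate agreement is expected.

```python
import numpy as np

# X from slide 11. ASSUMPTION: last row, second entry is -0.08, not the printed -8.00.
X = np.array([[2.18, -0.01,  0.35],
              [1.83, -0.11, -0.68],
              [0.79,  1.15, -1.80],
              [0.97, -1.90, -2.12],
              [1.01, -0.25, -1.77],
              [2.33, -0.08,  1.06]])

# Y' from slide 11 (k x n).
Yt = np.array([[ 0.43,  0.48,  0.48,  0.16,  0.10],
               [-0.27,  0.39, -0.13,  0.03,  0.05],
               [-0.03, -0.09, -0.13, -0.47, -0.47]])

# X•Y' as printed on slide 12.
slide12 = np.array([[0.94,  1.00, 1.00,  0.18,  0.07],
                    [0.84,  0.89, 0.99,  0.60,  0.50],
                    [0.07,  0.99, 0.46,  1.01,  0.98],
                    [1.00, -0.09, 1.00,  1.08,  0.99],
                    [0.55,  0.54, 0.75,  0.98,  0.92],
                    [1.01,  0.99, 0.98, -0.13, -0.25]])

max_diff = np.abs(X @ Yt - slide12).max()

# Same comparison with the value exactly as printed on the slide.
X_printed = X.copy()
X_printed[5, 1] = -8.00
max_diff_printed = np.abs(X_printed @ Yt - slide12).max()

print(np.round(X @ Yt, 2))
print(max_diff, max_diff_printed)
```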
  • 13. k = 3, λ=2, α=40, 10 iterations: P ≈ X•Y’

            P                         X•Y’
       1  1  1  0  0         0.96   0.99   0.99   0.38   0.93
       0  0  1  0  0         0.44   0.39   0.98  -0.11   0.39
       0  1  0  1  1         0.70   0.99   0.42   0.98   0.98
       1  0  1  0  1         1.00   1.04   0.99   0.44   0.98
       0  0  0  1  0         0.11   0.51  -0.13   1.00   0.57
       1  1  0  0  0         0.97   1.00   0.68   0.47   0.91
  • 14. Interesting Because… This is all very parallelizable by row, column
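The independence behind slide 14 can be illustrated with a small sketch: once Y is fixed, each row of X is its own least-squares problem, so rows can be handed to separate workers. For brevity this uses plain regularized least squares without confidence weights, a random Y, and Python threads standing in for distributed workers. None of this reproduces Mahout's Hadoop job, and the helper name `solve_row` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# P from slide 10.
P = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [1, 0, 1, 0, 1],
              [0, 0, 0, 1, 0],
              [1, 1, 0, 0, 0]], dtype=float)

K, LAM = 3, 2.0
rng = np.random.default_rng(0)
Y = rng.normal(size=(P.shape[1], K))   # fixed item factors for this half-step
A = Y.T @ Y + LAM * np.eye(K)          # shared by every row in the unweighted case

def solve_row(p_row):
    """Regularized least-squares solution for one user's row; uses only Y and that row."""
    return np.linalg.solve(A, Y.T @ p_row)

# Sequential: one row after another.
X_seq = np.array([solve_row(p) for p in P])

# Parallel: the same rows, solved by a pool of workers in any order.
with ThreadPoolExecutor(max_workers=3) as pool:
    X_par = np.array(list(pool.map(solve_row, P)))

print(np.allclose(X_seq, X_par))
```

The same argument applies column by column when X is fixed and Y is being solved.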
  • 15. BONUS: Folding in New Data
      • Model building takes time
      • Sometimes need immediate, if approximate, updates for new data
      • For new user U, need new row, X_U•Y’ = Q_U, but have P_U
      • What is X_U?
      • Apply some right inverse: X•Y’•(Y’)^-1 = Q•(Y’)^-1, so X = Q•(Y’)^-1
      • OK, what is (Y’)^-1?
      • Of course (Y’•Y)•(Y’•Y)^-1 = I
      • So Y’•(Y•(Y’•Y)^-1) = I and right inverse is Y•(Y’•Y)^-1
      • X_U = Q_U•Y•(Y’•Y)^-1 and so X_U ≈ P_U•Y•(Y’•Y)^-1
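The fold-in formula on slide 15 can be tried out directly. In this sketch Y is a random stand-in for trained item factors and the new user's 0/1 row is made up, so the numbers are purely illustrative. The point is to confirm that Y•(Y’•Y)^-1 behaves as a right inverse of Y’ and that the resulting row is the ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, k = 5, 3
Y = rng.normal(size=(n_items, k))            # hypothetical item factors (n x k)

# Right inverse of Y': Y•(Y'•Y)^-1, an n x k matrix.
right_inverse = Y @ np.linalg.inv(Y.T @ Y)

# Y'•(Y•(Y'•Y)^-1) should be the k x k identity.
identity_check = Y.T @ right_inverse

# Hypothetical new user: interacted with items 0 and 2 only.
p_u = np.array([1.0, 0.0, 1.0, 0.0, 0.0])

x_u = p_u @ right_inverse                    # X_U ≈ P_U•Y•(Y'•Y)^-1
q_u = x_u @ Y.T                              # approximate scores for the new user

# The same row obtained as a least-squares solution of Y•x ≈ p_u.
x_lstsq = np.linalg.lstsq(Y, p_u, rcond=None)[0]

print(np.round(identity_check, 6))
print(np.round(q_u, 2))
```

Only a k×k inverse is needed, and it can be computed once and reused for every new user, which is what makes the update immediate.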
  • 16. In Mahout
      • org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
          • Alternating least squares
          • Distributed, Hadoop-based
      • org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender
          • SVD-based
          • Non-distributed, not Hadoop
      • MAHOUT-737
          • Alternate implementation of alternating least squares
      • And more…
          • DistributedLanczosSolver
          • SequentialOutOfCoreSvd
          • …
  • 17. Myrrix
      • Complete product
          • Real-time Serving Layer
          • Hadoop-based Computation Layer
          • Tuned, documented
      • Free / open: Serving Layer, for small data
      • Commercial: add Computation Layer for big data; Hosting
      • Matrix factorization-based, attractive properties
      • http://myrrix.com
  • 18. Thank You. srowen at myrrix.com • mahout.apache.org