SlideShare a Scribd company logo
ENHANCING MATRIX FACTORIZATION
THROUGH INITIALIZATION FOR
IMPLICIT FEEDBACK DATABASES


                  Balázs Hidasi
                 Domonkos Tikk

                    Gravity R&D Ltd.
     Budapest University of Technology and Economics
OUTLINE
 Matrix factoriztion
 Initialization concept

 Methods
     Naive
     SimFactor

 Results
 Discussion




                           2/19
MATRIX FACTORIZATION
 Collaborative Filtering
 One of the most common approaches

 Approximates the rating matrix as product of low-
  rank matrices
                Items


                                           Q
        Users




                R        ≈     P




                                                      3/19
MATRIX FACTORIZATION
 Initialize P and Q with small random numbers
 Teach P and Q
     Alternating Least Squares
     Gradient Descent
     Etc.

   Transforms the data to a feature space
     Separately for users and items
     Noise reduction
     Compression
     Generalization

                                                 4/19
IMPLICIT FEEDBACK
 No ratings
 User-item interactions (events)

 Much noisier
     Presence of an event  might not be positive feedback
     Absence of an event  does not mean negative
      feedback
     No negative feedback is available!

 More common problem
 MF for implicit feedback
     Less accurate results due to noise
     Mostly ALS is used                                      5/19
     Scalability problems (rating matrix is dense)
CONCEPT
   Good MF model
     The feature vectors of similar entities are similar
     If data is too noisy  similar entities won’t be similar by
      their features
   Start MF from a „good” point
       Feature vector similarities are OK
   Data is more than just events
       Metadata
           Info about items/users
       Contextual data
           In what context did the event occured
       Can we incorporate those to help implicit MF?               6/19
NAIVE APPROACH
   Describe items using any data we have (detailed
    later)
       Long, sparse vectors for item description
   Compress these vectors to dense feature vectors
       PCA, MLP, MF, …
       Length of desired vectors = Number of features in MF
   Use these features as starting points




                                                               7/19
NAIVE APPROACH
 Compression and also noise reduction
 Does not really care about similarities

 But often feature similarities are not that bad

 If MF is used
           Half of the results is thrown out


                   Descriptors
                                         features   Descriptor features
                                    ≈
    Items




                                           Item


             Description of items

                                                                          8/19
SIMFACTOR ALGORITHM
 Try to preserve similarities better
 Starting from an MF of item description
                  Descriptors

                                                        Descriptors features




                                             features
            Description of items
                                        ≈
    Items




                                               Item
                    (D)


    Similarities of items: DD’




                                                                      Description of items
           Some metrics require transformation on D




                                                                              (D’)
                            Item
                                            Description of items
                         similarities
                             (S)
                                        =           (D)                                      9/19
SIMFACTOR ALGORITHM




                                                         Descriptors features
                   features
   Item                           Descriptors features                            Item
               ≈     Item

                      (X)
similarities                              (Y’)                                  features
    (S)                                                                            (X’)




                                                                 (Y)
   Similarity approximation

                                       features

                      Item                                    Item
                                         Item



                                   ≈
                                          (X)

                   similarities                   Y’Y       features
                       (S)                                     (X’)


   Y’Y  KxK symmetric                                                                    10/19
       Eigendecomposition
SIMFACTOR ALGORITHM

Y’Y     =      U          λ        U’

   λ diagonal  λ = SQRT(λ) * SQRT(λ)
                   features


   Item                                                      Item
               ≈
                     Item



                                        SQRT   SQRT
                      (X)


similarities                  U          (λ)    (λ)   U’   features
    (S)                                                       (X’)

 X*U*SQRT(λ) = (SQRT(λ)*U’*X’)’=F
 F is MxK matrix

 S  F * F’  F used for initialization

   Item
similarities   ≈                  F’                                  11/19
                     F




    (S)
CREATING THE DESCRIPTION MATRIX
   „Any” data about the entity
       Vector-space reprezentation
   For Items:
     Metadata vector (title, category, description, etc)
     Event vector (who bought the item)
     Context-state vector (in which context state was it
      bought)
     Context-event (in which context state who bought it)

   For Users:
       All above except metadata
 Currently: Choose one source for D matrix
                                                             12/19
 Context used: seasonality
EXPERIMENTS: SIMILARITY PRESERVATION
   Real life dataset: online grocery shopping events
            SimFactor RMSE improvement over naive in similarity
                             approximation

     52,36%
                                                                                             48,70%




                                                          26,22%


                                          16,86%
                                                                           13,39%                              12,38%
                       10,81%




Item context state User context state   Item context-   User context-   Item event data   User event data   Item metadata
                                            event          event
                                                                                                                            13/19
   SimFactor approximates similarities better
EXPERIMENTS: INITIALIZATION
 Using different description matrices
 And both naive and SimFactor initialization

 Baseline: random init

 Evaluation metric: recall@50




                                                14/19
EXPERIMENTS: GROCERY DB
 Up to 6% improvement
 Best methods use SimFactor and user context data

                              Top5 methods on Grocery DB
         5,71%


                              4,88%

                                                   4,30%
                                                                       4,12%              4,04%




                                                                                                          15/19

    User context state   User context state   User context event   User event data   User context event
      (SimFactor)             (Naive)            (SimFactor)        (SimFactor)           (Naive)
EXPERIMENTS: „IMPLICITIZED” MOVIELENS
 Keeping 5 star ratings  implicit events
 Up to 10% improvement

 Best methods use SimFactor and item context data
                            Top5 methods on MovieLens DB

          10%




                              9,17%                9,17%                9,17%                9,17%




                                                                                                             16/19

    Item context state   User context state   Item context event   Item context event   Item context state
       (SimFactor)         (SimFactor)           (SimFactor)             (Naive)             (Naive)
DISCUSSION OF RESULTS
 SimFactor yields better results than naive
 Context information yields better results than other
  descriptions
 Context information separates well between entities
       Grocery: User context
         People’s routines
         Different types of shoppings in different times

       MovieLens: Item context
           Different types of movies watched on different hours
   Context-based similarity

                                                                   17/19
WHY CONTEXT?
   Grocery example
       Correlation between context states by users low
                   ITEM                                        USER
        Mon Tue Wed Thu Fri Sat Sun                 Mon Tue Wed Thu Fri Sat Sun
Mon      1,00 0,79 0,79 0,78 0,76 0,70 0,74   Mon    1,00 0,36 0,34 0,34 0,35 0,29 0,19
Tue      0,79 1,00 0,79 0,78 0,76 0,69 0,73   Tue    0,36 1,00 0,34 0,34 0,33 0,29 0,19
Wed      0,79 0,79 1,00 0,79 0,76 0,70 0,74   Wed    0,34 0,34 1,00 0,36 0,35 0,27 0,17
Thu      0,78 0,78 0,79 1,00 0,76 0,71 0,74   Thu    0,34 0,34 0,36 1,00 0,39 0,30 0,16
Fri      0,76 0,76 0,76 0,76 1,00 0,71 0,72   Fri    0,35 0,33 0,35 0,39 1,00 0,32 0,16
Sat      0,70 0,69 0,70 0,71 0,71 1,00 0,71   Sat    0,29 0,29 0,27 0,30 0,32 1,00 0,33
Sun      0,74 0,73 0,74 0,74 0,72 0,71 1,00   Sun    0,19 0,19 0,17 0,16 0,16 0,33 1,00
   Why can context aware algorithms be efficient?
     Different recommendations in different context states
                                                                                     18/19
     Context differentiates well between entities
            Easier subtasks
CONCLUSION & FUTURE WORK
 SimFactor  Similarity preserving compression
 Similarity based MF initialization:
     Description matrix from any data
     Apply SimFactor
     Use output as initial features for MF

 Context differentitates between entities well
 Future work:
       Mixed description matrix (multiple data sources)
       Multiple description matrix
       Using different context information
       Using different similarity metrics                 19/19
THANKS FOR YOUR ATTENTION!

Questions?

More Related Content

Recently uploaded

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 

Recently uploaded (20)

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
Marius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
Expeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
Pixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
marketingartwork
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
Skeleton Technologies
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Enhancing Matrix Factorization Through Initialization for Implicit Feedback Databases

  • 1. ENHANCING MATRIX FACTORIZATION THROUGH INITIALIZATION FOR IMPLICIT FEEDBACK DATABASES Balázs Hidasi Domonkos Tikk Gravity R&D Ltd. Budapest University of Technology and Economics
  • 2. OUTLINE  Matrix factoriztion  Initialization concept  Methods  Naive  SimFactor  Results  Discussion 2/19
  • 3. MATRIX FACTORIZATION  Collaborative Filtering  One of the most common approaches  Approximates the rating matrix as product of low- rank matrices Items Q Users R ≈ P 3/19
  • 4. MATRIX FACTORIZATION  Initialize P and Q with small random numbers  Teach P and Q  Alternating Least Squares  Gradient Descent  Etc.  Transforms the data to a feature space  Separately for users and items  Noise reduction  Compression  Generalization 4/19
  • 5. IMPLICIT FEEDBACK  No ratings  User-item interactions (events)  Much noisier  Presence of an event  might not be positive feedback  Absence of an event  does not mean negative feedback  No negative feedback is available!  More common problem  MF for implicit feedback  Less accurate results due to noise  Mostly ALS is used 5/19  Scalability problems (rating matrix is dense)
  • 6. CONCEPT  Good MF model  The feature vectors of similar entities are similar  If data is too noisy  similar entities won’t be similar by their features  Start MF from a „good” point  Feature vector similarities are OK  Data is more than just events  Metadata  Info about items/users  Contextual data  In what context did the event occured  Can we incorporate those to help implicit MF? 6/19
  • 7. NAIVE APPROACH  Describe items using any data we have (detailed later)  Long, sparse vectors for item description  Compress these vectors to dense feature vectors  PCA, MLP, MF, …  Length of desired vectors = Number of features in MF  Use these features as starting points 7/19
  • 8. NAIVE APPROACH  Compression and also noise reduction  Does not really care about similarities  But often feature similarities are not that bad  If MF is used  Half of the results is thrown out Descriptors features Descriptor features ≈ Items Item Description of items 8/19
  • 9. SIMFACTOR ALGORITHM  Try to preserve similarities better  Starting from an MF of item description Descriptors Descriptors features features Description of items ≈ Items Item (D)  Similarities of items: DD’ Description of items  Some metrics require transformation on D (D’) Item Description of items similarities (S) = (D) 9/19
  • 10. SIMFACTOR ALGORITHM Descriptors features features Item Descriptors features Item ≈ Item (X) similarities (Y’) features (S) (X’) (Y)  Similarity approximation features Item Item Item ≈ (X) similarities Y’Y features (S) (X’)  Y’Y  KxK symmetric 10/19  Eigendecomposition
  • 11. SIMFACTOR ALGORITHM Y’Y = U λ U’  λ diagonal  λ = SQRT(λ) * SQRT(λ) features Item Item ≈ Item SQRT SQRT (X) similarities U (λ) (λ) U’ features (S) (X’)  X*U*SQRT(λ) = (SQRT(λ)*U’*X’)’=F  F is MxK matrix  S  F * F’  F used for initialization Item similarities ≈ F’ 11/19 F (S)
  • 12. CREATING THE DESCRIPTION MATRIX  „Any” data about the entity  Vector-space reprezentation  For Items:  Metadata vector (title, category, description, etc)  Event vector (who bought the item)  Context-state vector (in which context state was it bought)  Context-event (in which context state who bought it)  For Users:  All above except metadata  Currently: Choose one source for D matrix 12/19  Context used: seasonality
  • 13. EXPERIMENTS: SIMILARITY PRESERVATION  Real life dataset: online grocery shopping events SimFactor RMSE improvement over naive in similarity approximation 52,36% 48,70% 26,22% 16,86% 13,39% 12,38% 10,81% Item context state User context state Item context- User context- Item event data User event data Item metadata event event 13/19  SimFactor approximates similarities better
  • 14. EXPERIMENTS: INITIALIZATION  Using different description matrices  And both naive and SimFactor initialization  Baseline: random init  Evaluation metric: recall@50 14/19
  • 15. EXPERIMENTS: GROCERY DB  Up to 6% improvement  Best methods use SimFactor and user context data Top5 methods on Grocery DB 5,71% 4,88% 4,30% 4,12% 4,04% 15/19 User context state User context state User context event User event data User context event (SimFactor) (Naive) (SimFactor) (SimFactor) (Naive)
  • 16. EXPERIMENTS: „IMPLICITIZED” MOVIELENS  Keeping 5 star ratings  implicit events  Up to 10% improvement  Best methods use SimFactor and item context data Top5 methods on MovieLens DB 10% 9,17% 9,17% 9,17% 9,17% 16/19 Item context state User context state Item context event Item context event Item context state (SimFactor) (SimFactor) (SimFactor) (Naive) (Naive)
  • 17. DISCUSSION OF RESULTS  SimFactor yields better results than naive  Context information yields better results than other descriptions  Context information separates well between entities  Grocery: User context  People’s routines  Different types of shoppings in different times  MovieLens: Item context  Different types of movies watched on different hours  Context-based similarity 17/19
  • 18. WHY CONTEXT?  Grocery example  Correlation between context states by users low ITEM USER Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon 1,00 0,79 0,79 0,78 0,76 0,70 0,74 Mon 1,00 0,36 0,34 0,34 0,35 0,29 0,19 Tue 0,79 1,00 0,79 0,78 0,76 0,69 0,73 Tue 0,36 1,00 0,34 0,34 0,33 0,29 0,19 Wed 0,79 0,79 1,00 0,79 0,76 0,70 0,74 Wed 0,34 0,34 1,00 0,36 0,35 0,27 0,17 Thu 0,78 0,78 0,79 1,00 0,76 0,71 0,74 Thu 0,34 0,34 0,36 1,00 0,39 0,30 0,16 Fri 0,76 0,76 0,76 0,76 1,00 0,71 0,72 Fri 0,35 0,33 0,35 0,39 1,00 0,32 0,16 Sat 0,70 0,69 0,70 0,71 0,71 1,00 0,71 Sat 0,29 0,29 0,27 0,30 0,32 1,00 0,33 Sun 0,74 0,73 0,74 0,74 0,72 0,71 1,00 Sun 0,19 0,19 0,17 0,16 0,16 0,33 1,00  Why can context aware algorithms be efficient?  Different recommendations in different context states 18/19  Context differentiates well between entities  Easier subtasks
  • 19. CONCLUSION & FUTURE WORK  SimFactor  Similarity preserving compression  Similarity based MF initialization:  Description matrix from any data  Apply SimFactor  Use output as initial features for MF  Context differentitates between entities well  Future work:  Mixed description matrix (multiple data sources)  Multiple description matrix  Using different context information  Using different similarity metrics 19/19
  • 20. THANKS FOR YOUR ATTENTION! Questions?