Iterative Methodology for Personalization Models Optimization

User Profile Lab
Model Learning Framework

- Listings
- Real Listings
- Clicks
- User Profile
- Ad Profile

Why User Profile Is Important
● Personalization
● Lookalike modeling
● Interest targeting
● And more…

User Profile Data Model

Offline Profile
DocId  Timestamp  Feature     Confidence
12     0          Sport Cars  1
42     -1         Sport Cars  1
43     -21        World Cup   1
55     -21        World Cup   1

User Profile Data Model

Offline Profile
DocId  Timestamp  Feature     Confidence
12     0          Sport Cars  1
42     -1         Sport Cars  1
43     -21        World Cup   1
55     -21        World Cup   1

Serving Profile
Category     Confidence
Sports Cars  2
Soccer       2

User Profile - Boost Recent X2

Offline Profile
DocId  Timestamp  Feature     Confidence
12     0          Sport Cars  1
42     -1         Sport Cars  1
43     -21        World Cup   1
55     -21        World Cup   1

Serving Profile
Category     Confidence
Sports Cars  2
Soccer       1

Motivation - User Profile Tweaks
● Is the hypothesis (recent interactions deserve a boost) true?
● What is the right decay scheme?
● Linear in time?
● Exponential in time?
● Answering this can take many trial-and-error cycles
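
The two candidate decay schemes can be made concrete as weight functions of interaction age. The sketch below is illustrative only: the function names and the `window` and `half_life` parameters are hypothetical, and it assumes timestamps are non-positive offsets from "now", as in the example tables.

```python
def linear_decay(age, window=30.0):
    """Weight falls linearly from 1 at age 0 to 0 at `window` (hypothetical parameter)."""
    return max(0.0, 1.0 - age / window)

def exponential_decay(age, half_life=7.0):
    """Weight halves every `half_life` time units (hypothetical parameter)."""
    return 0.5 ** (age / half_life)

def decayed_confidence(events, decay):
    """Sum confidence per feature, weighting each event by its age.

    `events` holds (timestamp, feature, confidence) tuples; timestamp <= 0 is
    assumed to be the offset from now, so age = -timestamp.
    """
    totals = {}
    for timestamp, feature, confidence in events:
        totals[feature] = totals.get(feature, 0.0) + confidence * decay(-timestamp)
    return totals

# Rows of the example offline profile.
events = [(0, "Sport Cars", 1), (-1, "Sport Cars", 1),
          (-21, "World Cup", 1), (-21, "World Cup", 1)]
linear = decayed_confidence(events, linear_decay)
exponential = decayed_confidence(events, exponential_decay)
```

Swapping the `decay` argument is the only change needed to compare schemes, which is the kind of quick iteration the lab is meant to support.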

Profile Lab - Basic Flow
● Static dataset of offline profiles
● Sequence of docids for each user
● Static feature mapping for all docids
● Only a lean algo block needs to be implemented
● It transforms an offline profile into an online one
● Apply the algo block to generate online profiles
● Generate KPIs
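
The flow above can be sketched as a small harness in which only the algo block changes between experiments. This is a minimal sketch under stated assumptions, not the actual lab: the names `build_online_profile`, `plain_count`, `boost_recent`, the `cutoff` and `factor` parameters, and the toy KPI are all hypothetical.

```python
from collections import defaultdict

# Static docid -> feature mapping (values copied from the example tables).
DOC_FEATURES = {12: "Sport Cars", 42: "Sport Cars", 43: "World Cup", 55: "World Cup"}

def plain_count(timestamp):
    """Baseline algo block: every interaction weighs the same."""
    return 1.0

def boost_recent(timestamp, cutoff=-7, factor=2.0):
    """Candidate algo block: interactions at or after `cutoff` weigh `factor` times more.

    Both `cutoff` and `factor` are hypothetical parameters.
    """
    return factor if timestamp >= cutoff else 1.0

def build_online_profile(offline_profile, algo):
    """Transform an offline profile (docid sequence) into per-feature weights."""
    profile = defaultdict(float)
    for doc_id, timestamp in offline_profile:
        feature = DOC_FEATURES.get(doc_id)
        if feature is not None:
            profile[feature] += algo(timestamp)
    return dict(profile)

def top_feature_share(profile):
    """Toy KPI: share of the total weight held by the strongest feature."""
    total = sum(profile.values())
    return max(profile.values()) / total if total else 0.0

offline_profile = [(12, 0), (42, -1), (43, -21), (55, -21)]
baseline = build_online_profile(offline_profile, plain_count)
boosted = build_online_profile(offline_profile, boost_recent)
```

Because the dataset and the feature mapping are static, two algo blocks can be compared on identical inputs by comparing their KPIs.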

Profile Lab - Cross User KPI
● Run the cross-user test 20K times
● Average and normalize result frequencies

Profile Lab - Advanced Flow (WIP)
uuid, click, adid, document ids of interactions
11233455, true, 99837377, [11234342, 13424234, 3254534]
56456546, false, 3434888, [11234342, 23432444, 1213333, 23432423]
34564363, true, 11113333, [35245555, 463321111, 19938222]
…

Profile Lab - Advanced Flow (WIP)
● Static supervised datasets of offline profiles
● Perso - labeled with click / no click
● LAL (lookalikes) - labeled in/out of segment
● A model is built on each dataset
● Model performance KPIs (AUC, etc.)
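
As an illustration of the AUC KPI mentioned above, the sketch below computes it directly from its pairwise definition. The labels and scores are made-up toy values, and the quadratic loop is for readability only; a library routine would normally be used on real data.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data: label is the click flag, score is a hypothetical model output.
labels = [True, False, True, False]
scores = [0.9, 0.4, 0.35, 0.1]
value = auc(labels, scores)
```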

Motivation: Named Entities & Wikitags
● Very high cardinality ~300-400K
● Precise user taste
● Big potential in perso
● Big money for user segmentation
● Hard to leverage as is

Motivation: Hard To Leverage

Wikitag                Confidence
Machine Learning       High
Applied Statistics     High
Student Loans          Medium
Teaching               Medium
Technical Tutorial     High
Recurrent Neural Nets  High
Andrew Ng              High

Embeddings: Dense Representation

Concept                    Gender  Royal  Age    Tech  Rich
Prince Harry               1       0.9    -0.05  0.3   0.9
Queen Elizabeth            -1      0.99   0.9    -0.8  0.9
Apple inc.                 0.1     -0.03  0.5    0.9   0.9
Machine Learning Students  -0.7    0.02   -0.5   0.8   -0.6
Facebook                   0       0.01   0.2    0.9   0.87
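
A dense representation makes concepts comparable. The sketch below applies cosine similarity to the illustrative vectors from the table; the `cosine` and `most_similar` helpers are hypothetical names, and the vectors are the slide's toy values, not output of a trained model.

```python
import math

# Columns: Gender, Royal, Age, Tech, Rich (toy values from the table).
EMBEDDINGS = {
    "Prince Harry": [1, 0.9, -0.05, 0.3, 0.9],
    "Queen Elizabeth": [-1, 0.99, 0.9, -0.8, 0.9],
    "Apple inc.": [0.1, -0.03, 0.5, 0.9, 0.9],
    "Machine Learning Students": [-0.7, 0.02, -0.5, 0.8, -0.6],
    "Facebook": [0, 0.01, 0.2, 0.9, 0.87],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(concept):
    """Return the other concept whose vector is closest by cosine similarity."""
    target = EMBEDDINGS[concept]
    others = (name for name in EMBEDDINGS if name != concept)
    return max(others, key=lambda name: cosine(target, EMBEDDINGS[name]))

nearest_to_apple = most_similar("Apple inc.")
```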

Embeddings: Dense Representation
● Given a high confidence concept in a doc
● Context is other concepts
● Lots of training data in our DocStore
● Many existing libraries: word2vec, GloVe, StarSpace, etc.
● Good embedding model
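
The training setup described above (a high-confidence concept as the target, the doc's other concepts as its context) can be sketched as pair generation. This is an assumption-laden illustration: the `training_pairs` name, the numeric confidences, and the `min_conf` threshold are hypothetical, and a real pipeline would feed such data to one of the libraries listed rather than build pairs by hand.

```python
from itertools import permutations

def training_pairs(doc_concepts, min_conf=0.5):
    """Return (target, context) pairs from one doc's (concept, confidence) list.

    Concepts at or below `min_conf` (hypothetical threshold) are dropped;
    every kept concept is paired with every other kept concept in the doc.
    """
    kept = [concept for concept, conf in doc_concepts if conf > min_conf]
    return list(permutations(kept, 2))

# Toy doc with made-up numeric confidences.
doc = [("Machine Learning", 0.9), ("Andrew Ng", 0.8),
       ("Recurrent Neural Nets", 0.85), ("Student Loans", 0.2)]
pairs = training_pairs(doc)
```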

Embeddings Based Models
● Major change in prod model architecture
● High dev costs
● Potential issue with Elastic
Static embedding cluster is a good fallback

Clustering Phase
● Cluster embedding vectors
● Cluster id = doc feature
● Concept vector => cluster id
● Easy integration with current architecture
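
The mapping from concept vector to cluster id can be sketched as a nearest-centroid lookup. The sketch assumes centroids already exist from an earlier clustering run; the 2-D vectors, centroid values, and function names are toy, hypothetical choices.

```python
import math

def nearest_cluster(vector, centroids):
    """Return the index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))

def doc_cluster_features(doc_concepts, embeddings, centroids):
    """Map a doc's concepts to the sorted cluster ids used as doc features.

    Concepts without an embedding are skipped.
    """
    ids = {nearest_cluster(embeddings[c], centroids)
           for c in doc_concepts if c in embeddings}
    return sorted(ids)

# Toy 2-D embeddings and two centroids (made-up values).
embeddings = {
    "Badminton": (0.5, -0.2),
    "Potato Dishes": (0.1, 0.3),
    "Spanish Football": (9.0, 11.0),
}
centroids = [(0.0, 0.0), (10.0, 10.0)]
features = doc_cluster_features(
    ["Badminton", "Spanish Football", "Unknown Concept"], embeddings, centroids
)
```

Since the output is a plain id per concept, it can be consumed like any other categorical doc feature, which is what keeps the integration cost low.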

Clustering - Many Hyperparameters
● Train the embedding model: |D| docs, |C| coordinates
● Quick sanity check of the embedding model
● Select the N most frequent concepts
● Apply scikit-learn clustering method A
● Benchmark clusters using common metrics
● Does the qualitative cluster analysis look good?
● No: try different |D|, |C|, N, A
● Yes: implement a test and run it in the lab

Clustering - Many Hyperparameters
● Embedding model - StarSpace, Word2Vec
● Embedding dim - 50, 100, 300
● Number of clusters - 1000, 2000, 5000, 10000
● Clustering algorithms - k-means, DBSCAN
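
The size of this search space is easy to enumerate. The sketch below lists the combinations with `itertools.product`; the dictionary keys and the `configurations` helper are hypothetical. It assumes the cluster count does not apply to DBSCAN, which derives the number of clusters from density rather than taking it as an input.

```python
from itertools import product

GRID = {
    "embedding": ["starspace", "word2vec"],
    "dim": [50, 100, 300],
    "n_clusters": [1000, 2000, 5000, 10000],
    "algorithm": ["k-means", "dbscan"],
}

def configurations(grid):
    """Enumerate distinct configurations, dropping n_clusters for DBSCAN."""
    seen = set()
    configs = []
    for embedding, dim, n_clusters, algorithm in product(
        grid["embedding"], grid["dim"], grid["n_clusters"], grid["algorithm"]
    ):
        if algorithm == "dbscan":
            n_clusters = None  # not an input to DBSCAN
        key = (embedding, dim, n_clusters, algorithm)
        if key not in seen:
            seen.add(key)
            configs.append({"embedding": embedding, "dim": dim,
                            "n_clusters": n_clusters, "algorithm": algorithm})
    return configs

configs = configurations(GRID)
kmeans_runs = sum(1 for c in configs if c["algorithm"] == "k-means")
```

Even this small grid yields dozens of runs, each followed by a qualitative review, which is why a fast offline loop matters.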

Clustering Phase
● Pakistani Cricketers
● Filipino Celebrities
● Norwegian Politics
● Badminton
● Potato Dishes
● Spanish Football

Results - WEC vs WikiTags
First WEC model

● Word2Vec trained on 3 days of data
● Only features with conf > 0.3
● No long-tail clustering (only freq > 100)
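
The two filters above can be sketched as a vocabulary selection step. This is an illustration under assumptions: it treats "freq" as the number of docs in which a concept appears above the confidence threshold, and the `select_concepts` name and toy data are hypothetical.

```python
from collections import Counter

def select_concepts(docs, min_conf=0.3, min_freq=100):
    """Keep concepts seen with conf > min_conf in more than min_freq docs.

    `docs` is a list of docs, each a list of (concept, confidence) pairs.
    """
    counts = Counter(
        concept
        for doc in docs
        for concept, conf in doc
        if conf > min_conf
    )
    return {concept for concept, n in counts.items() if n > min_freq}

# Toy corpus: 150 docs with a frequent concept, 50 docs with a rare one.
docs = [[("World Cup", 0.9), ("Badminton", 0.2)]] * 150 + [[("Potato Dishes", 0.8)]] * 50
vocabulary = select_concepts(docs)
```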

Results - WEC vs Categories

Training with Modeling Framework
