RecSys Challenge 2015: ensemble learning with
categorical features
Peter Romov, Evgeny Sokolov
•  Logs from an e-commerce website: a collection of sessions
•  Session
•  sequence of clicks on the item pages
•  could end with or without purchase
•  Click
•  Timestamp
•  ItemID (≈54k unique IDs)
•  CategoryID (≈350 unique IDs)
•  Purchase
•  set of bought items with price and quantity
•  Known purchases for the train-set, need to predict on the test-set
Problem statement
2
Problem statement
3
Clicks from session s: $c(s) = (c_1(s), \ldots, c_{n(s)}(s))$
Purchase (actual): $y(s)$, Purchase (predicted): $h(s) \approx y(s)$

$$y(s) = \begin{cases} \emptyset & \text{— no purchase} \\ \{i_1, \ldots, i_{m(s)}\} \text{ (bought items)} & \text{— otherwise} \end{cases}$$

Evaluation measure:

$$Q(h, S_{\mathrm{test}}) = \sum_{\substack{s \in S_{\mathrm{test}}: \\ |h(s)| > 0}} \begin{cases} \dfrac{|S^b_{\mathrm{test}}|}{|S_{\mathrm{test}}|} + J(y(s), h(s)) & \text{if } y(s) \neq \emptyset \\[4pt] -\dfrac{|S^b_{\mathrm{test}}|}{|S_{\mathrm{test}}|} & \text{otherwise} \end{cases}$$

where
$J(y(s), h(s)) = \dfrac{|y(s) \cap h(s)|}{|y(s) \cup h(s)|}$ — Jaccard similarity between two sets,
$S_{\mathrm{test}}$ — all sessions from the test-set,
$S^b_{\mathrm{test}}$ — sessions from the test-set with a purchase.
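As a sanity check, the evaluation measure above can be sketched in a few lines of Python; the function and variable names (`jaccard`, `score`, dicts mapping session id to item set) are illustrative, not from the organizers' code.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two item sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def score(y_true, y_pred):
    """y_true, y_pred: dicts mapping session id -> set of item ids
    (an empty set means no purchase observed/predicted)."""
    n_buyers = sum(1 for y in y_true.values() if y)
    reward = n_buyers / len(y_true)          # |S^b_test| / |S_test|
    q = 0.0
    for s, h in y_pred.items():
        if not h:                            # no items predicted: no contribution
            continue
        y = y_true[s]
        if y:                                # true positive: reward + Jaccard
            q += reward + jaccard(y, h)
        else:                                # false positive: penalty
            q -= reward
    return q
```

Note how an empty prediction contributes nothing, so the purchase classifier only pays the penalty when it fires on a non-buying session.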
Problem statement
4
First observations (from the task):
•  the task is uncommon (set prediction with a specific loss function)
•  the evaluation measure can be rewritten
•  the original problem can be divided into two well-known binary classification problems:
1.  predict purchase given session: estimate $P(y(s) \neq \emptyset \mid s)$ and optimize the Purchase score
2.  predict bought items given a session with a purchase: estimate $P(i \in y(s) \mid s,\, y(s) \neq \emptyset)$ and optimize the Jaccard score

$$Q(h, S_{\mathrm{test}}) = \underbrace{\frac{|S^b_{\mathrm{test}}|}{|S_{\mathrm{test}}|}\,(\mathrm{TP} - \mathrm{FP})}_{\text{purchase score}} + \underbrace{\sum_{s \in S_{\mathrm{test}}} J(y(s), h(s))}_{\text{Jaccard score}}$$
•  Two-stage prediction
•  Two binary classification models learned on the train-set
•  Both classifiers require thresholds
•  Set up thresholds to optimize Purchase score and Jaccard score
using hold-out subsample of the train-set
Solution schema
5
$S_{\mathrm{train}}$ → $S_{\mathrm{learn}}$ (90%) + $S_{\mathrm{valid}}$ (10%)
•  purchase classifier: $s \mapsto P(\text{purchase} \mid s)$
•  bought item classifier: $(s, i) \mapsto P(i \in y(s) \mid s,\, y(s) \neq \emptyset)$
•  classifier thresholds tuned on $S_{\mathrm{valid}}$
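The two-stage prediction rule can be sketched as follows; the classifier callables and threshold names are placeholders for the trained models, and only clicked items are considered as candidates (a bought item is always a clicked item).

```python
def predict_session(session, items, p_purchase, p_item, alpha_p, alpha_i):
    """Return the predicted set of bought items for one session.

    p_purchase(session) -> estimated P(purchase | s)
    p_item(session, i)  -> estimated P(i in y(s) | s, purchase)
    alpha_p, alpha_i    -> thresholds tuned on the validation hold-out
    """
    if p_purchase(session) < alpha_p:
        return set()                               # stage 1: no purchase predicted
    # stage 2: keep the clicked items the item classifier is confident about
    return {i for i in items if p_item(session, i) >= alpha_i}
```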
Some relationships from data
6
Next observations (from the data):
•  Buying rate strongly depends on time features
•  Buying rate varies strongly across the values of categorical features
Figure 1: Dynamics of the buying rate in time
Figure 2: Buying rate versus number of clicked
items (left) and ID of the item with the maximum
number of clicks in session (right)
The total number of item IDs and category IDs is 54,287 and 347, respectively. Both training and test sets belong to an interval of 6 months. The target function y(s) corresponds to the set of items that were bought in the session s. In other words, the target function gives some subset of the universal item set I for each user session s. We are given the sets of bought items y(s) for all sessions s in the training set Strain, and are required to predict these sets for the test sessions s ∈ Stest.
2.2 Evaluation Measure
Denote by h(s) a hypothesis that predicts a set of bought items for any user session s. The score of this hypothesis is measured by the following formula:

$$Q(h, S_{\mathrm{test}}) = \sum_{\substack{s \in S_{\mathrm{test}}: \\ |h(s)| > 0}} \left( (-1)^{\mathrm{isEmpty}(y(s))} \frac{|S^b_{\mathrm{test}}|}{|S_{\mathrm{test}}|} + J(y(s), h(s)) \right),$$

where $S^b_{\mathrm{test}}$ is the set of all test sessions with at least one purchase event and $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ is the Jaccard similarity measure. It is easy to rewrite this expression as

$$Q(h, S_{\mathrm{test}}) = \underbrace{\frac{|S^b_{\mathrm{test}}|}{|S_{\mathrm{test}}|}\,(\mathrm{TP} - \mathrm{FP})}_{\text{purchase score}} + \underbrace{\sum_{s \in S_{\mathrm{test}}} J(y(s), h(s))}_{\text{Jaccard score}}, \qquad (1)$$
where TP is the number of sessions with |y(s)| > 0 and |h(s)| > 0 (i.e. true positives), and FP is the number of sessions with |y(s)| = 0 and |h(s)| > 0 (i.e. false positives). Now it is easy to see that the score consists of two parts. The first one gives a reward for each correctly guessed session with buy events and a penalty for each false alarm; the absolute values of penalty and reward are both equal to $|S^b_{\mathrm{test}}| / |S_{\mathrm{test}}|$. The second part calculates the total similarity of the predicted sets of bought items to the real sets.
2.3 Purchase statistics
One observation is that a higher number of items clicked during the session leads to a higher chance of a purchase. Lots of information could be extracted from the data by considering the item identifiers and categories clicked during the session.

Buying rate — fraction of buyer sessions in some subset of sessions
Feature extraction
7
•  Purchase classifier: features from session (sequence of clicks)
•  Bought item classifier: features from pair session+itemID
•  Observation: bought item is a clicked item
•  We use two types of features
•  Numerical: real number, e.g. seconds between two clicks
•  Categorical: element of the unordered set of values (levels),
e.g. ItemID
Feature extraction: session
8
1.  Start/end of the session (month, day, hour, etc.)
[numerical + categorical with few levels]
2.  Number of clicks, unique items, categories, item-category pairs
[numerical]
3.  Top 10 items and categories by the number of clicks
[categorical with ≈50k levels]
4.  ID of the first/last item clicked at least k times
[categorical with ≈50k levels]
5.  Click counts for 100 items and 50 categories that were most
popular in the whole training set
[sparse numerical]
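A minimal sketch of a few of the session features listed above, assuming each click arrives as a time-ordered `(timestamp, item_id, category_id)` tuple; the tuple layout and feature names are our assumptions for illustration, not the authors' code.

```python
from collections import Counter

def session_features(clicks, top_k=10):
    """clicks: time-ordered list of (timestamp, item_id, category_id)."""
    items = [c[1] for c in clicks]
    cats = [c[2] for c in clicks]
    start, end = clicks[0][0], clicks[-1][0]
    feats = {
        # 1. start/end of session (numerical + low-cardinality categorical)
        "start_hour": start.hour,
        "start_weekday": start.weekday(),
        "end_hour": end.hour,
        # 2. counts (numerical)
        "n_clicks": len(clicks),
        "n_unique_items": len(set(items)),
        "n_unique_cats": len(set(cats)),
    }
    # 3. top items by click count (high-cardinality categorical)
    for rank, (item, _) in enumerate(Counter(items).most_common(top_k)):
        feats[f"top_item_{rank}"] = item
    return feats
```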
Feature extraction:
session+ItemID
9
1.  All session features
2.  ItemID
[categorical with ≈50k levels]
3.  Timestamp of the first/last click on the item (month, day, hour, etc.)
[numerical + categorical with few levels]
4.  Number of clicks in the session for the given item
[numerical]
5.  Total duration (by analogy with dwell time) of the clicks on the item
[numerical]
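The session+ItemID features can be sketched the same way, with the same assumed `(timestamp, item_id, category_id)` click layout; the dwell-like duration is approximated here as the time from each click on the item to the next click in the session, which is one plausible reading of the slide, not necessarily the authors' exact definition.

```python
def session_item_features(clicks, item_id):
    """Features for one (session, item) pair; clicks are time-ordered
    (timestamp, item_id, category_id) tuples."""
    idx = [k for k, c in enumerate(clicks) if c[1] == item_id]
    first, last = clicks[idx[0]][0], clicks[idx[-1]][0]
    # dwell-like duration: time from each click on the item to the next click
    duration = sum(
        (clicks[k + 1][0] - clicks[k][0]).total_seconds()
        for k in idx if k + 1 < len(clicks)
    )
    return {
        "item_id": item_id,              # ≈50k-level categorical
        "first_click_hour": first.hour,  # + month, day, ... in the same vein
        "last_click_hour": last.hour,
        "n_item_clicks": len(idx),       # numerical
        "item_duration": duration,       # numerical
    }
```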
•  GBM and similar ensemble learning techniques
•  useful with numerical features
•  one-hot encoding of categorical features doesn’t perform well
•  Matrix decompositions, factorization machines (FM)
•  useful with categorical features
•  hard to incorporate numerical features because of the rough (bi-linear) model
•  We used our internal learning algorithm: MatrixNet
•  GBM with oblivious decision trees
•  trees properly handle categorical features (multi-split decision trees)
•  SVD-like decompositions for new feature value combinations
Classification method
10
Classification method
11
Oblivious decision tree with categorical features
[Diagram: an oblivious tree in which every node on a level applies the same test — the first level splits on the numerical condition "duration > 20" (yes/no), and the next levels split on the categorical features user and item, with one branch per feature value.]
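A toy sketch of evaluating one oblivious tree: because every level applies the same test to all its nodes, a leaf is addressed by the bit-vector of test outcomes. Binary set-membership tests stand in here for MatrixNet's multi-split handling of categorical features; this is a simplification for illustration, not the MatrixNet implementation.

```python
def eval_oblivious_tree(x, levels, leaf_values):
    """x: dict of feature name -> value.
    levels: list of tests, each ("num", feature, threshold) or
            ("cat", feature, allowed_set).
    leaf_values: list of 2 ** len(levels) leaf predictions."""
    idx = 0
    for kind, feature, arg in levels:
        # every node on this level applies the same test
        bit = x[feature] > arg if kind == "num" else x[feature] in arg
        idx = (idx << 1) | int(bit)
    return leaf_values[idx]
```

Because the leaf index is just a bit pattern, an oblivious tree evaluates in a handful of branch-free operations, which is consistent with the prediction speeds reported on the next slide.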
•  Training classifiers
•  GB with 10k trees for each classifier
•  ≈12 hours to train both models on 150 machines
•  Making predictions
•  We made 4000 predictions per second per thread
Classification method: speed
12
Threshold optimization
13
We optimized thresholds using validation set (10% hold-out
from train-set)
1)  Maximize the Jaccard score over the bought-item threshold
2)  Maximize the Purchase + Jaccard score over the purchase threshold,
with the bought-item threshold fixed
$$Q(h, S_{\mathrm{valid}}) = \underbrace{\frac{|S^b_{\mathrm{valid}}|}{|S_{\mathrm{valid}}|}\,(\mathrm{TP} - \mathrm{FP})}_{\text{purchase score}} + \underbrace{\sum_{s \in S_{\mathrm{valid}}} J(y(s), h(s))}_{\text{Jaccard score}}$$
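Step 2 above can be sketched as a one-dimensional sweep: with the bought-item threshold fixed, try candidate purchase thresholds on the validation set and keep the best-scoring one. The `score` function, the classifier callables, and the candidate grid are placeholders, not the authors' tuning code.

```python
def tune_purchase_threshold(sessions, y_true, p_purchase, predict_items,
                            score, candidates):
    """Sweep the purchase threshold alpha_p on the validation set.

    p_purchase(s)    -> estimated P(purchase | s)
    predict_items(s) -> item set under the fixed bought-item threshold
    score(y_true, y_pred) -> validation score to maximize, e.g. Q(h, S_valid)
    """
    best = None
    for alpha_p in candidates:
        y_pred = {
            s: (predict_items(s) if p_purchase(s) >= alpha_p else set())
            for s in sessions
        }
        q = score(y_true, y_pred)
        if best is None or q > best[1]:
            best = (alpha_p, q)
    return best   # (best alpha_p, its validation score)
```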
Figure 3: Item detection threshold (above) and pur-
chase detection threshold (below) quality on the val-
idation set.
we train purchase detection and purchased item detection classifiers. The purchase detection classifier h_p(s) predicts the outcome of the function y_p(s) = isNotEmpty(y(s)) and uses the entire training set in the learning phase. The item detection classifier h_i(s, j) approximates the indicator function y_i(s, j) = I(j ∈ y(s)) and uses only sessions with bought items in the learning phase. Of course, it would be wise to use classifiers that output probabilities rather than binary predictions, because in this case we will be able to select thresholds that directly optimize evaluation metric (1) instead of the classifier's internal quality measure. So, our final expression for the hypothesis can be written as
$$h(s) = \begin{cases} \emptyset & \text{if } h_p(s) < \alpha_p, \\ \{j \in I \mid h_i(s, j) > \alpha_i\} & \text{if } h_p(s) \geq \alpha_p. \end{cases} \qquad (2)$$
3.2 Feature Extraction
We have outlined two groups of features: one describes a session and the other describes a session-item pair. The purchase detection classifier uses only session features and the item detection classifier uses both groups. The full feature listing can be found in Table 1; for further details, please refer to our code. We give some comments on our feature extraction decisions below.
One could use sophisticated aggregations to extract numerical features that describe items and categories.
•  Final leaderboard score: 63102 (1st place)
•  Purchase detection on validation (10% hold-out):
•  16% precision
•  77% recall
•  AUC 0.85
•  Purchased item detection on validation:
•  Jaccard measure 0.765
•  Features / datasets used to learn the classifiers and the evaluation process can be reproduced; see our code: https://github.com/romovpa/ydf-recsys2015-challenge
Final results
14
1.  Observations from the problem statement
›  The task is complex but decomposable into two well-known problems:
binary classification of sessions and of (session, ItemID)-pairs
2.  Observations from the data (user click sessions)
›  Features from sessions and (session, ItemID)-pairs
›  Easy to develop many meaningful categorical features
3.  The algorithm
›  Gradient boosting on trees with categorical features
›  No sophisticated mixtures of Machine Learning techniques: one
algorithm to work with many numerical and categorical features
Summary / Questions?
15
