SlideShare a Scribd company logo
1 of 45
Download to read offline
WSDM Challenge
Recommender Systems
Kenneth Emeka Odoh
25 Jan, 2018 ( Kaggle Meetup | SFU Ventures Labs )
Vancouver, BC
Talk Plan
● Bio
● Competition
● Winning Solutions
● My solution
● Conclusions
● References
2
Kenneth Emeka Odoh
Bio
Education
● Masters in Computer Science (University of Regina) 2013 - 2016
Work
● Research assistant in the field of Visual Analytics 2013 - 2016
○ ( https://scholar.google.com/citations?user=3G2ncgwAAAAJ&hl=en )
● Software engineer working in an IOT startup (Current position) 2017 -
3Kenneth Emeka Odoh
Competition
● Competition rule
○ Only 5 submissions per day
● Data Exploration
○ Data visualization
● Evaluation
● Primer on recommender systems
4
Kenneth Emeka Odoh
Nature of Data
● Data of Listening habits.
● Data are chronologically ordered.
○ Implicit time series data without
timestamps.
○ Timestamps are excluded for sake of
avoiding data leakages.
● Balanced data sets.
What to predict
● Given a data set of listening habit of
users.
● If a song has been listened in recent
past, predict if it will be listening to
again in the same month.
● If the song is listened to again by the
same user in the same month, output
1 or 0 otherwise
Competition: Data Exploration
5
Kenneth Emeka Odoh
Competition: Background
● URL ( https://www.kaggle.com/c/kkbox-music-recommendation-challenge )
● Sister competition on churn prediction by same company.
● Total price $5000
Class of Problem
● Classification
● Sequence prediction ? 6Kenneth Emeka Odoh
Competition: Data files
● train.csv
○ msno, song_id, source_system_tab, source_screen_name, source_type,
target
● songs.csv
○ song_id, song_length, genre_ids, artist_name, composer, lyricist,
language
● members.csv
○ msno, city, bd, gender, registered_via, registration_init_time,
expiration_date
● song_extra_info.csv
○ song_id, song name, isrc
7
Kenneth Emeka Odoh
Competition: Visualization of Data
8
Figure 1: Count vs Source typeKenneth Emeka Odoh
[jeru666]
Competition: Data files
9Figure 2: Count vs Source type grouped by labels
Kenneth Emeka Odoh
[jeru666]
Competition: Data files
10Figure 3: Source screen names vs Count
Kenneth Emeka Odoh
[jeru666]
Competition: Data files
11
Figure 4: Count vs Source screen names grouped by labelKenneth Emeka Odoh
[jeru666]
Competition: Data files
12
Figure 5: Count vs Source system tabKenneth Emeka Odoh
[jeru666]
Competition: Data files
13
Figure 6: Proportion by SexKenneth Emeka Odoh
[jeru666]
Competition: Data files
14
Figure 7: Proportion by Sex grouped by labelKenneth Emeka Odoh
[jeru666]
Competition: Data files
15
Figure 8: Proportion by LanguageKenneth Emeka Odoh
[jeru666]
Competition: Data files
16
Figure 9: City vs CountKenneth Emeka Odoh
[jeru666]
Competition: Evaluation (AUC)
Image credit: Walber & http://gim.unmc.edu/dxtests/roc2.htm
17
Optimizing PR space -> optimizing ROC space [Davis .et.al]
Kenneth Emeka Odoh
Evaluation Metric
● AUC per user vs AUC per group ?
● What about GAUC?
○ calculate the average of the AUC scores of each user.
■ Cold start problem
● AUC vs F-score vs RMSE ?
18Kenneth Emeka Odoh
Competition: Recommender Systems
19
Kenneth Emeka Odoh
U: set of users
I : set of items
R: set of rating of users on items.
Iu
: set of items rated by a user, u.
f : U x I→ R
This is usually a sparse matrix and as such so
form of data imputation is needed
The most common form of recommender systems
● Best-item recommendation
● Top-N recommendation
For a user, ua
, and i* has estimated the highest
value
i* = argmax f(ua
, j )
j ∊ IIu
20
Kenneth Emeka Odoh
WINNING SOLUTIONS
21
Kenneth Emeka Odoh
1st Place Solution
● Similar to Click Value Rate (CVR)
○ Click-through rate (CTR) is the ratio of users who click on a specific link to the number of total users who
accessed the resource.
● Relistening is like purchasing, and first listening is like clicking.
● Not suitable for latent-factor based or neighborhood-based collaborative filtering is not the best
way for this problem.
● Missing not at random.
● Data is sparse as a user-song pair is very informative.
● Data is time sensitive.
Kenneth Emeka Odoh
22
Ensembling
0.6 * LightGBMs and 0.4 * NNs
● Single LightGBMs is 0.74440 on public LB with a learning rate of 0.1.
● Single LightGBMs is 0.74460 on public LB with a learning rate of 0.05.
● Single LightGBMs with bagging is 0.74584 on public LB with a learning rate of 0.05.
For NNs, my best 30-ensemble on the same feature set can get 0.74215, and bagging ensemble of NNs on
different feature set can get 0.74333.
The predictions of LightGBMs and NNs are quite uncorrelated (correlation coefficient is about 0.92). The
ensemble of single LightGBM and single 30-ensemble of NN can get 0.7489+. The ensemble of ensembles
can reach 0.7498+.
Data preprocessing
https://kaggle2.blob.core.windows.net/forum-message-attachments/259372/8131/model.png
Kenneth Emeka Odoh
23
Analyzing Missing Data
Missing-data mechanisms
● Missing completely at random: This occurs when the probability of missing is same for all cases.
● Missing at random: The probability of missing depends only on available information.
● Missingness that depends on unobserved predictors
● Missingness that depends on the missing value itself.
Handling missing data
● Exclude missing data
● Mean / median imputation
● Last value carried forward
● Using information from related observation
24
Kenneth Emeka Odoh
[http://www.stat.columbia.edu/~gelman/arm/missing.pdf]
3rd Place Solution
Identified how to generate features from the future. We can listen or not listen to the same artist by
the same user in the future.
Add matrix factorization to the feature to use xgboost and catboost.
last 35% for future feature
earlier 65% for history
The size of X_train, X_val was about 20% of data.
Kenneth Emeka Odoh
25
The history can be generated using target. Other features uses all data. The generated features in the
context of categorical features.
1) mean of target from history by categorical features (pairs and triples)
2) count – the number of observations in all data by categorical features (pairs and triples)
3) regression – linear regression target by categorical features (msno, msno + genre_ids, msno+composer
and so on)
4) time_from_prev_heard – time from last heard by categorical features (msno, msno + genre_ids,
msno+composer and so on)
5) time_to_next_heard – time to next heard by categorical features (pairs and triples which include msno)
6) last_time_diff – time to the last heard by categorical features
7) part_of_unique_song – the share of unique song of artist that the user heard on all unique song of artist
26Kenneth Emeka Odoh
8) matrix_factorization – LightFM. I use as a user msno in different context (source_type) and the msno
was the feature for user.
I use last 35% for fit. Also I drop last 5% and again use 35%. Such a procedure could be repeated many
times and this greatly improved the score (I did 6 times on validation and this gave an increase of 0.005, but
in the end I did not have time to do everything and did only 2 times on test).
As a result, I fit xgboost and catboost on two parts of the data using and without the matrix_factorization
feature. And finally I blend all 8 predictions.
The code is available here:
https://github.com/VasiliyRubtsov/wsdm_music_recommendations/blob/master/pipeline.ipynb
27Kenneth Emeka Odoh
Factorization Machine
● By factoring into a number of latent vectors.
● lends itself to kernels.
Let us dive into factorization in a new thread.
https://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf
Observed Characteristics of factorization machine
● Ideal for handling missing data
28
Kenneth Emeka Odoh
6th Place Solution
I use an ensemble (average) of lightgbm. The best single model public LB 0.726
Feature Engineering
1) raw features for both song, user and context
2) SVD matrix factorization latent space of user-song interaction matrix
3) SVD matrix factorization score of user-song interaction matrix
4) collaborative filtering (CF) score (top 100 neighborhoods ) of user-song interaction
5) similarity of artist, lyricist, composer, genre,language, source_system_tab, source_type,
soure_screen_name,within each user.
6) source_system_tab, source_type, soure_screen_name categorical variable rate of each user.
7) word embedding of artist, lyricist, composer, genre.
I use 10 cross validation, where train / valid set are splitted randomly.
Kenneth Emeka Odoh
29
SVD
30
[https://intoli.com/blog/pca-and-svd/]
Kenneth Emeka Odoh
Word Embeddings
31[https://deeplearning4j.org/word2vec.html]Kenneth Emeka Odoh
My Solution
● Primer on LSTM architectures.
○ Bidirectional lstm.
○ Batch normalization
● Intuition / guide for building reasonable neural architecture.
● Tricks to avoid overfitting.
● Optimizing loss function using AUC.
● Cross validation of time series data.
● Discuss failures in my traditional ML models.
○ Try same model with different custom loss functions
32
Kenneth Emeka Odoh
LSTM
● LSTM is RNN trained using BPTT ( solves
vanishing gradient problem ).
● Capture long term temporal dependency.
● A memory blocks used in place of neuron.
There are three types of gates in a memory block:
○ Forget Gate: decides information to be
removed from the unit.
○ Input Gate: decides information to update
memory state.
○ Output Gate: decides information to output
based on input and the memory state.
Goodfellow, et.al. Deep Learning, 2016
33
Kenneth Emeka Odoh
34[Fei-Fei et. al., 2017]Kenneth Emeka Odoh
35[Fei-Fei et. al., 2017]Kenneth Emeka Odoh
LSTM Architecture
iooi
[Google images]
See tutorial on subject [https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/]
Many-to-many model (boundary detection problem)
36
Kenneth Emeka Odoh
[Fei-Fei et. al., 2017]
● Bidirectional LSTMs can be trained
using past and future data of a time
frame.
● Bidirectional LSTMs can improve
performance on a number of sequence
classification.
● Bidirectional LSTMs is two lstms
where the 1st lstm accept input as-is,
and 2nd accept the input in reverse
order.
Bidirectional LSTM
37
[Yu, Zhou et al.]
Kenneth Emeka Odoh
Batch Normalization
● Normalization in neural architecture.
● Reduce covariance shift thereby improving
training speed.
● Control gradient flow in the network
thereby minimizing saturation in non-relu
based activation.
● Fix distribution between training and
testing set thereby enhancing
generalization.
● Implicit regularization.
[Ioffe et. al 2015]
38
Kenneth Emeka Odoh
Overfitting
● Implicit Regularization
○ Batch normalization
○ Early stopping [https://en.wikipedia.org/wiki/Early_stopping]
● Dropout
39
Kenneth Emeka Odoh
[Srivastava et. al., 2014]
Cross-Validation of Time-Series
● Don’t trust the
leaderboard.
● Trust local CV for model
comparison.
40
Kenneth Emeka Odoh
My Solution: Code
def model_relu6():
model = Sequential()
model.add( Embedding(inputDim*inputDim*inputDim, inputDim, dropout=0.2) )
#input vector dimension
model.add(Convolution1D(nb_filter= nb_filter, filter_length= filter_length,
border_mode='valid', activation='relu',subsample_length=1))
model.add(MaxPooling1D(pool_length= pool_length))
model.add( Bidirectional( LSTM(1024, return_sequences=True) ))
model.add(LeakyReLU())
model.add(BatchNormalization( ))
model.add(Dropout(0.5))
model.add( Bidirectional( LSTM(2048, return_sequences=True)) )
model.add(Activation('relu'))
model.add(BatchNormalization( ))
model.add(Dropout(0.5))
model.add( Bidirectional( LSTM(512, return_sequences=True)) )
model.add(Activation('relu'))
model.add(BatchNormalization( ))
model.add(Dropout(0.5))
model.add( Bidirectional( LSTM(256)) )
model.add(Activation('relu'))
model.add(BatchNormalization( ))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',metrics=[jacek_auc,discussion41015_auc])
return model
41Code: https://github.com/kenluck2001/KaggleKenneth/tree/master/wsdm_Kaggle
#full code here ( https://www.kaggle.com/rspadim/gini-keras-callback-earlystopping-validation )
# FROM https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/41108
def jacek_auc(y_true, y_pred):
score, up_opt = tf.metrics.auc(y_true, y_pred)
#score, up_opt = tf.contrib.metrics.streaming_auc(y_pred, y_true)
K.get_session().run(tf.local_variables_initializer())
with tf.control_dependencies([up_opt]):
score = tf.identity(score)
return score
# FROM https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/41015
# AUC for a binary classifier
def discussion41015_auc(y_true, y_pred):
ptas = tf.stack([binary_PTA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0)
pfas = tf.stack([binary_PFA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0)
pfas = tf.concat([tf.ones((1,)) ,pfas],axis=0)
binSizes = -(pfas[1:]-pfas[:-1])
s = ptas*binSizes
return K.sum(s, axis=0)
# PFA, prob false alert for binary classifier
def binary_PFA(y_true, y_pred, threshold=K.variable(value=0.5)):
y_pred = K.cast(y_pred >= threshold, 'float32')
# N = total number of negative labels
N = K.sum(1 - y_true)
# FP = total number of false alerts, alerts from the negative class labels
FP = K.sum(y_pred - y_pred * y_true)
return FP/N
# P_TA prob true alerts for binary classifier
def binary_PTA(y_true, y_pred, threshold=K.variable(value=0.5)):
y_pred = K.cast(y_pred >= threshold, 'float32')
# P = total number of positive labels
P = K.sum(y_true)
# TP = total number of correct alerts, alerts from the positive class labels
TP = K.sum(y_pred * y_true)
return TP/P
Kenneth Emeka Odoh
Private board: 0.59
Alternatives
● Trees (catboost, lightboost)
Other that could work with careful effort
● HMM
● MARS
● ANFIS
● Logistic regression
● Naive Bayes
42Kenneth Emeka Odoh
Conclusions
● Building models versus building infrastructure trade off.
● Debugging a failing model versus trying out a new model.
● Feature engineering versus better algorithm.
● Neural depth versus training size.
● “Ensembling work when the conditions are right”.
○ Provide empirical justification.
● The place of magic numbers.
● “Leakages may not actually be a bad thing”.
43
Kenneth Emeka Odoh
Thanks for listening
@kenluck2001
44
https://www.linkedin.com/in/kenluck2001/
kenneth.odoh@gmail.com
Kenneth Emeka Odoh
References
Yu, Zhou et al. “Using bidirectional lstm recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of
non-native spontaneous speech.” ASRU (2015).
Ioffe, S. and Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd
International Conference on Machine Learning, in PMLR 37:448-456 (2015).
Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. 2010. Recommender Systems Handbook (1st ed.). Springer-Verlag New York, Inc.,
New York, NY, USA.
Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning. MIT Press.
Jeru666, Visualization script, https://www.kaggle.com/jeru666/data-exploration-and-analysis .
Davis, J. & Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. ICML '06: Proceedings of the 23rd international
conference on Machine learning 233--240, New York, NY, USA
Fei-Fei Li & Justin Johnson & Serena Yeung, (2017), http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from
overfitting.. Journal of Machine Learning Research, 15, 1929-1958.
45
Kenneth Emeka Odoh

More Related Content

What's hot

Incremental and Multi-feature Tensor Subspace Learning applied for Background...
Incremental and Multi-feature Tensor Subspace Learning applied for Background...Incremental and Multi-feature Tensor Subspace Learning applied for Background...
Incremental and Multi-feature Tensor Subspace Learning applied for Background...ActiveEon
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningRyo Iwaki
 
Optimization problems and algorithms
Optimization problems and  algorithmsOptimization problems and  algorithms
Optimization problems and algorithmsAboul Ella Hassanien
 
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...ijcsit
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity ResolutionBenjamin Bengfort
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNINGMLReview
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learningmooopan
 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)HJ van Veen
 
Transformer based approaches for visual representation learning
Transformer based approaches for visual representation learningTransformer based approaches for visual representation learning
Transformer based approaches for visual representation learningRyohei Suzuki
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative ModelsKenta Oono
 
Comparative Performance Analysis & Complexity of Different Sorting Algorithm
Comparative Performance Analysis & Complexity of Different Sorting AlgorithmComparative Performance Analysis & Complexity of Different Sorting Algorithm
Comparative Performance Analysis & Complexity of Different Sorting Algorithmpaperpublications3
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basicsananth
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchJisang Yoon
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionKnoldus Inc.
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programmingSoumya Mukherjee
 
自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用Ryo Iwaki
 
Data structure and algorithm notes
Data structure and algorithm notesData structure and algorithm notes
Data structure and algorithm notessuman khadka
 

What's hot (20)

Incremental and Multi-feature Tensor Subspace Learning applied for Background...
Incremental and Multi-feature Tensor Subspace Learning applied for Background...Incremental and Multi-feature Tensor Subspace Learning applied for Background...
Incremental and Multi-feature Tensor Subspace Learning applied for Background...
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
Optimization problems and algorithms
Optimization problems and  algorithmsOptimization problems and  algorithms
Optimization problems and algorithms
 
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
 
Unit1 pg math model
Unit1 pg math modelUnit1 pg math model
Unit1 pg math model
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)
 
505 260-266
505 260-266505 260-266
505 260-266
 
Transformer based approaches for visual representation learning
Transformer based approaches for visual representation learningTransformer based approaches for visual representation learning
Transformer based approaches for visual representation learning
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
Comparative Performance Analysis & Complexity of Different Sorting Algorithm
Comparative Performance Analysis & Complexity of Different Sorting AlgorithmComparative Performance Analysis & Complexity of Different Sorting Algorithm
Comparative Performance Analysis & Complexity of Different Sorting Algorithm
 
ddpg seminar
ddpg seminarddpg seminar
ddpg seminar
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用
 
Data structure and algorithm notes
Data structure and algorithm notesData structure and algorithm notes
Data structure and algorithm notes
 

Similar to Kaggle kenneth

IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...IRJET Journal
 
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Lionel Briand
 
IRJET- Musical Instrument Recognition using CNN and SVM
IRJET-  	  Musical Instrument Recognition using CNN and SVMIRJET-  	  Musical Instrument Recognition using CNN and SVM
IRJET- Musical Instrument Recognition using CNN and SVMIRJET Journal
 
Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...
Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...
Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...Weiyang Tong
 
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...multimediaeval
 
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...Abdel Salam Sayyad
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...Duke Network Analysis Center
 
deep_Visualization in Data mining.ppt
deep_Visualization in Data mining.pptdeep_Visualization in Data mining.ppt
deep_Visualization in Data mining.pptPerumalPitchandi
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopExtremeEarth
 
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Lippo Group Digital
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryTim Menzies
 
A hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratingsA hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratingsAravindharamanan S
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Polytechnic University of Bari
 
Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...
Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...
Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...Varun Ojha
 
Music Genre Classification using Machine Learning
Music Genre Classification using Machine LearningMusic Genre Classification using Machine Learning
Music Genre Classification using Machine LearningIRJET Journal
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2Shrayes Ramesh
 

Similar to Kaggle kenneth (20)

IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
 
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
 
IRJET- Musical Instrument Recognition using CNN and SVM
IRJET-  	  Musical Instrument Recognition using CNN and SVMIRJET-  	  Musical Instrument Recognition using CNN and SVM
IRJET- Musical Instrument Recognition using CNN and SVM
 
Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...
Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...
Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab...
 
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
 
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...
 
deep_Visualization in Data mining.ppt
deep_Visualization in Data mining.pptdeep_Visualization in Data mining.ppt
deep_Visualization in Data mining.ppt
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
 
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern Discovery
 
A hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratingsA hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratings
 
[IJET V2I3P11] Authors: Payal More, Rohini Pandit, Supriya Makude, Harsh Nirb...
[IJET V2I3P11] Authors: Payal More, Rohini Pandit, Supriya Makude, Harsh Nirb...[IJET V2I3P11] Authors: Payal More, Rohini Pandit, Supriya Makude, Harsh Nirb...
[IJET V2I3P11] Authors: Payal More, Rohini Pandit, Supriya Makude, Harsh Nirb...
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
 
Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...
Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...
Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feat...
 
Manos
ManosManos
Manos
 
Music Genre Classification using Machine Learning
Music Genre Classification using Machine LearningMusic Genre Classification using Machine Learning
Music Genre Classification using Machine Learning
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
 

Recently uploaded

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 

Kaggle kenneth

  • 1. WSDM Challenge Recommender Systems Kenneth Emeka Odoh 25 Jan, 2018 ( Kaggle Meetup | SFU Ventures Labs ) Vancouver, BC
  • 2. Talk Plan ● Bio ● Competition ● Winning Solutions ● My solution ● Conclusions ● References 2 Kenneth Emeka Odoh
  • 3. Bio Education ● Masters in Computer Science (University of Regina) 2013 - 2016 Work ● Research assistant in the field of Visual Analytics 2013 - 2016 ○ ( https://scholar.google.com/citations?user=3G2ncgwAAAAJ&hl=en ) ● Software engineer working in an IOT startup (Current position) 2017 - 3Kenneth Emeka Odoh
  • 4. Competition ● Competition rule ○ Only 5 submissions per day ● Data Exploration ○ Data visualization ● Evaluation ● Primer on recommender systems 4 Kenneth Emeka Odoh
  • 5. Nature of Data ● Data of Listening habits. ● Data are chronologically ordered. ○ Implicit time series data without timestamps. ○ Timestamps are excluded for sake of avoiding data leakages. ● Balanced data sets. What to predict ● Given a data set of listening habit of users. ● If a song has been listened in recent past, predict if it will be listening to again in the same month. ● If the song is listened to again by the same user in the same month, output 1 or 0 otherwise Competition: Data Exploration 5 Kenneth Emeka Odoh
  • 6. Competition: Background ● URL ( https://www.kaggle.com/c/kkbox-music-recommendation-challenge ) ● Sister competition on churn prediction by same company. ● Total price $5000 Class of Problem ● Classification ● Sequence prediction ? 6Kenneth Emeka Odoh
  • 7. Competition: Data files ● train.csv ○ msno, song_id, source_system_tab, source_screen_name, source_type, target ● songs.csv ○ song_id, song_length, genre_ids, artist_name, composer, lyricist, language ● members.csv ○ msno, city, bd, gender, registered_via, registration_init_time, expiration_date ● song_extra_info.csv ○ song_id, song name, isrc 7 Kenneth Emeka Odoh
  • 8. Competition: Visualization of Data 8 Figure 1: Count vs Source typeKenneth Emeka Odoh [jeru666]
  • 9. Competition: Data files 9Figure 2: Count vs Source type grouped by labels Kenneth Emeka Odoh [jeru666]
  • 10. Competition: Data files 10Figure 3: Source screen names vs Count Kenneth Emeka Odoh [jeru666]
  • 11. Competition: Data files 11 Figure 4: Count vs Source screen names grouped by labelKenneth Emeka Odoh [jeru666]
  • 12. Competition: Data files 12 Figure 5: Count vs Source system tabKenneth Emeka Odoh [jeru666]
  • 13. Competition: Data files 13 Figure 6: Proportion by SexKenneth Emeka Odoh [jeru666]
  • 14. Competition: Data files 14 Figure 7: Proportion by Sex grouped by labelKenneth Emeka Odoh [jeru666]
  • 15. Competition: Data files 15 Figure 8: Proportion by LanguageKenneth Emeka Odoh [jeru666]
  • 16. Competition: Data files 16 Figure 9: City vs CountKenneth Emeka Odoh [jeru666]
  • 17. Competition: Evaluation (AUC) Image credit: Walber & http://gim.unmc.edu/dxtests/roc2.htm 17 Optimizing PR space -> optimizing ROC space [Davis .et.al] Kenneth Emeka Odoh
  • 18. Evaluation Metric ● AUC per user vs AUC per group ? ● What about GAUC? ○ calculate the average of the AUC scores of each user. ■ Cold start problem ● AUC vs F-score vs RMSE ? 18Kenneth Emeka Odoh
  • 20. U: set of users I : set of items R: set of rating of users on items. Iu : set of items rated by a user, u. f : U x I→ R This is usually a sparse matrix and as such so form of data imputation is needed The most common form of recommender systems ● Best-item recommendation ● Top-N recommendation For a user, ua , and i* has estimated the highest value i* = argmax f(ua , j ) j ∊ IIu 20 Kenneth Emeka Odoh
  • 22. 1st Place Solution ● Similar to Click Value Rate (CVR) ○ Click-through rate (CTR) is the ratio of users who click on a specific link to the number of total users who accessed the resource. ● Relistening is like purchasing, and first listening is like clicking. ● Not suitable for latent-factor based or neighborhood-based collaborative filtering is not the best way for this problem. ● Missing not at random. ● Data is sparse as a user-song pair is very informative. ● Data is time sensitive. Kenneth Emeka Odoh 22
  • 23. Ensembling 0.6 * LightGBMs and 0.4 * NNs ● Single LightGBMs is 0.74440 on public LB with a learning rate of 0.1. ● Single LightGBMs is 0.74460 on public LB with a learning rate of 0.05. ● Single LightGBMs with bagging is 0.74584 on public LB with a learning rate of 0.05. For NNs, my best 30-ensemble on the same feature set can get 0.74215, and bagging ensemble of NNs on different feature set can get 0.74333. The predictions of LightGBMs and NNs are quite uncorrelated (correlation coefficient is about 0.92). The ensemble of single LightGBM and single 30-ensemble of NN can get 0.7489+. The ensemble of ensembles can reach 0.7498+. Data preprocessing https://kaggle2.blob.core.windows.net/forum-message-attachments/259372/8131/model.png Kenneth Emeka Odoh 23
  • 24. Analyzing Missing Data Missing-data mechanisms ● Missing completely at random: This occurs when the probability of missing is same for all cases. ● Missing at random: The probability of missing depends only on available information. ● Missingness that depends on unobserved predictors ● Missingness that depends on the missing value itself. Handling missing data ● Exclude missing data ● Mean / median imputation ● Last value carried forward ● Using information from related observation 24 Kenneth Emeka Odoh [http://www.stat.columbia.edu/~gelman/arm/missing.pdf]
  • 25. 3rd Place Solution Identified how to generate features from the future. We can listen or not listen to the same artist by the same user in the future. Add matrix factorization to the feature to use xgboost and catboost. last 35% for future feature earlier 65% for history The size of X_train, X_val was about 20% of data. Kenneth Emeka Odoh 25
  • 26. The history can be generated using target. Other features uses all data. The generated features in the context of categorical features. 1) mean of target from history by categorical features (pairs and triples) 2) count – the number of observations in all data by categorical features (pairs and triples) 3) regression – linear regression target by categorical features (msno, msno + genre_ids, msno+composer and so on) 4) time_from_prev_heard – time from last heard by categorical features (msno, msno + genre_ids, msno+composer and so on) 5) time_to_next_heard – time to next heard by categorical features (pairs and triples which include msno) 6) last_time_diff – time to the last heard by categorical features 7) part_of_unique_song – the share of unique song of artist that the user heard on all unique song of artist 26Kenneth Emeka Odoh
  • 27. 8) matrix_factorization – LightFM. I use as a user msno in different context (source_type) and the msno was the feature for user. I use last 35% for fit. Also I drop last 5% and again use 35%. Such a procedure could be repeated many times and this greatly improved the score (I did 6 times on validation and this gave an increase of 0.005, but in the end I did not have time to do everything and did only 2 times on test). As a result, I fit xgboost and catboost on two parts of the data using and without the matrix_factorization feature. And finally I blend all 8 predictions. The code is available here: https://github.com/VasiliyRubtsov/wsdm_music_recommendations/blob/master/pipeline.ipynb 27Kenneth Emeka Odoh
  • 28. Factorization Machine ● By factoring into a number of latent vectors. ● lends itself to kernels. Let us dive into factorization in a new thread. https://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf Observed Characteristics of factorization machine ● Ideal for handling missing data 28 Kenneth Emeka Odoh
  • 29. 6th Place Solution I use an ensemble (average) of lightgbm. The best single model public LB 0.726 Feature Engineering 1) raw features for both song, user and context 2) SVD matrix factorization latent space of user-song interaction matrix 3) SVD matrix factorization score of user-song interaction matrix 4) collaborative filtering (CF) score (top 100 neighborhoods ) of user-song interaction 5) similarity of artist, lyricist, composer, genre,language, source_system_tab, source_type, soure_screen_name,within each user. 6) source_system_tab, source_type, soure_screen_name categorical variable rate of each user. 7) word embedding of artist, lyricist, composer, genre. I use 10 cross validation, where train / valid set are splitted randomly. Kenneth Emeka Odoh 29
  • 32. My Solution ● Primer on LSTM architectures. ○ Bidirectional lstm. ○ Batch normalization ● Intuition / guide for building reasonable neural architecture. ● Tricks to avoid overfitting. ● Optimizing loss function using AUC. ● Cross validation of time series data. ● Discuss failures in my traditional ML models. ○ Try same model with different custom loss functions 32 Kenneth Emeka Odoh
  • 33. LSTM ● LSTM is RNN trained using BPTT ( solves vanishing gradient problem ). ● Capture long term temporal dependency. ● A memory blocks used in place of neuron. There are three types of gates in a memory block: ○ Forget Gate: decides information to be removed from the unit. ○ Input Gate: decides information to update memory state. ○ Output Gate: decides information to output based on input and the memory state. Goodfellow, et.al. Deep Learning, 2016 33 Kenneth Emeka Odoh
  • 34. 34[Fei-Fei et. al., 2017]Kenneth Emeka Odoh
  • 35. 35[Fei-Fei et. al., 2017]Kenneth Emeka Odoh
  • 36. LSTM Architecture iooi [Google images] See tutorial on subject [https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/] Many-to-many model (boundary detection problem) 36 Kenneth Emeka Odoh [Fei-Fei et. al., 2017]
  • 37. ● Bidirectional LSTMs can be trained using past and future data of a time frame. ● Bidirectional LSTMs can improve performance on a number of sequence classification. ● Bidirectional LSTMs is two lstms where the 1st lstm accept input as-is, and 2nd accept the input in reverse order. Bidirectional LSTM 37 [Yu, Zhou et al.] Kenneth Emeka Odoh
  • 38. Batch Normalization ● Normalization in neural architecture. ● Reduce covariance shift thereby improving training speed. ● Control gradient flow in the network thereby minimizing saturation in non-relu based activation. ● Fix distribution between training and testing set thereby enhancing generalization. ● Implicit regularization. [Ioffe et. al 2015] 38 Kenneth Emeka Odoh
  • 39. Overfitting ● Implicit Regularization ○ Batch normalization ○ Early stopping [https://en.wikipedia.org/wiki/Early_stopping] ● Dropout 39 Kenneth Emeka Odoh [Srivastava et. al., 2014]
  • 40. Cross-Validation of Time-Series ● Don’t trust the leaderboard. ● Trust local CV for model comparison. 40 Kenneth Emeka Odoh
  • 41. My Solution: Code def model_relu6(): model = Sequential() model.add( Embedding(inputDim*inputDim*inputDim, inputDim, dropout=0.2) ) #input vector dimension model.add(Convolution1D(nb_filter= nb_filter, filter_length= filter_length, border_mode='valid', activation='relu',subsample_length=1)) model.add(MaxPooling1D(pool_length= pool_length)) model.add( Bidirectional( LSTM(1024, return_sequences=True) )) model.add(LeakyReLU()) model.add(BatchNormalization( )) model.add(Dropout(0.5)) model.add( Bidirectional( LSTM(2048, return_sequences=True)) ) model.add(Activation('relu')) model.add(BatchNormalization( )) model.add(Dropout(0.5)) model.add( Bidirectional( LSTM(512, return_sequences=True)) ) model.add(Activation('relu')) model.add(BatchNormalization( )) model.add(Dropout(0.5)) model.add( Bidirectional( LSTM(256)) ) model.add(Activation('relu')) model.add(BatchNormalization( )) model.add(Dropout(0.5)) model.add(Dense(1, activation='sigmoid')) model.compile(loss='binary_crossentropy', optimizer='adam',metrics=[jacek_auc,discussion41015_auc]) return model 41Code: https://github.com/kenluck2001/KaggleKenneth/tree/master/wsdm_Kaggle #full code here ( https://www.kaggle.com/rspadim/gini-keras-callback-earlystopping-validation ) # FROM https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/41108 def jacek_auc(y_true, y_pred): score, up_opt = tf.metrics.auc(y_true, y_pred) #score, up_opt = tf.contrib.metrics.streaming_auc(y_pred, y_true) K.get_session().run(tf.local_variables_initializer()) with tf.control_dependencies([up_opt]): score = tf.identity(score) return score # FROM https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/41015 # AUC for a binary classifier def discussion41015_auc(y_true, y_pred): ptas = tf.stack([binary_PTA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0) pfas = tf.stack([binary_PFA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0) pfas = tf.concat([tf.ones((1,)) ,pfas],axis=0) binSizes = -(pfas[1:]-pfas[:-1]) s = ptas*binSizes return K.sum(s, axis=0) # PFA, prob false alert for binary classifier def binary_PFA(y_true, y_pred, threshold=K.variable(value=0.5)): y_pred = K.cast(y_pred >= threshold, 'float32') # N = total number of negative labels N = K.sum(1 - y_true) # FP = total number of false alerts, alerts from the negative class labels FP = K.sum(y_pred - y_pred * y_true) return FP/N # P_TA prob true alerts for binary classifier def binary_PTA(y_true, y_pred, threshold=K.variable(value=0.5)): y_pred = K.cast(y_pred >= threshold, 'float32') # P = total number of positive labels P = K.sum(y_true) # TP = total number of correct alerts, alerts from the positive class labels TP = K.sum(y_pred * y_true) return TP/P Kenneth Emeka Odoh Private board: 0.59
  • 42. Alternatives ● Trees (catboost, lightboost) Other that could work with careful effort ● HMM ● MARS ● ANFIS ● Logistic regression ● Naive Bayes 42Kenneth Emeka Odoh
  • 43. Conclusions ● Building models versus building infrastructure trade off. ● Debugging a failing model versus trying out a new model. ● Feature engineering versus better algorithm. ● Neural depth versus training size. ● “Ensembling work when the conditions are right”. ○ Provide empirical justification. ● The place of magic numbers. ● “Leakages may not actually be a bad thing”. 43 Kenneth Emeka Odoh
  • 45. References Yu, Zhou et al. “Using bidirectional lstm recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech.” ASRU (2015). Ioffe, S. and Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd International Conference on Machine Learning, in PMLR 37:448-456 (2015). Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. 2010. Recommender Systems Handbook (1st ed.). Springer-Verlag New York, Inc., New York, NY, USA. Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning. MIT Press. Jeru666, Visualization script, https://www.kaggle.com/jeru666/data-exploration-and-analysis . Davis, J. & Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. ICML '06: Proceedings of the 23rd international conference on Machine learning 233--240, New York, NY, USA Fei-Fei Li & Justin Johnson & Serena Yeung, (2017), http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting.. Journal of Machine Learning Research, 15, 1929-1958. 45 Kenneth Emeka Odoh