Entity embeddings for categorical data
Paul Skeie
Outline
• Background
• Data representations
• Gradient boosted trees
• Deep Learning
• Entity embeddings
Exceeded 1 million users in 2017
Collaborative and competitive data science
Gradient boosted trees win most contests with tabular/structured data
Deep Learning wins when data is unstructured: images/text/sound
Standard modeling activities
Statistics or machine learning, most activities are common
Modelling activities: Select model → Select inputs → Train model → Test model on unseen data → Evaluate performance → Success?
• SELECT GROUND TRUTH TO TARGET THE TRAINING AGAINST: Supervised learning needs labeled data
• FEATURE ENGINEERING – FIND RELEVANT INPUTS: Requires experts with deep understanding in the field
• COMMON PITFALL – OVERFITTING: The risk of overfitting is high when the model has many parameters
• DEEP LEARNING AVOIDS OVERFITTING USING DROPOUT: Nodes are randomly dropped so that the rest must readjust
• DATA DISCOVERY
Artificial neural networks - Some highlights from timeline
Snipped from https://www.scaruffi.com/mind/ai.pdf
Artificial neural networks – Why now?
• Internet produces massive datasets
• Powerful GPUs developed primarily for gaming
• Improved algorithms
• Better ways of mitigating overfitting
Deep learning in the industry
Jeff Dean
Google Brain Team
AI Frontiers
Trends and Developments in Deep Learning Research
Gradient boosted trees
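Data representations, decomposing a vector
[Figure: vector V in the x-y plane, with component u along the x axis and component v along the y axis]
$V = \begin{bmatrix} u \\ v \end{bmatrix}$
We can decompose the vector V into a vector of length u directed along the x axis, and a vector of length v directed along the y axis.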
Data representations, vector length and direction
[Figure: the vector V in the x-y plane, shown once with components u and v and once with length |V| and angle α]
Cartesian form: $V = (u, v)$. Polar form: $V = (|V|, \alpha)$.
Both of these data representations define the same vector. How you feed this information to the learning algorithm depends on what you are aiming to predict.
If this vector represents wind in the horizontal plane, and we want to predict the power output of a wind turbine, which we happen to know is a function of the wind speed, then feeding $|V|$ to the learning algorithm makes a lot of sense.
$|V| = \sqrt{u^2 + v^2}$
This way the learning algorithm doesn't need to figure out Pythagoras on its own. However, with enough training data, a neural network could figure this out.
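As a rough illustration of switching between the two representations, the sketch below converts wind components into speed and direction. The function name `to_polar` and the sample values are assumptions made for this example, not something taken from the slides.

```python
import math

def to_polar(u, v):
    """Convert components (u along x, v along y) into length and angle."""
    speed = math.hypot(u, v)                         # |V| = sqrt(u^2 + v^2)
    angle = math.degrees(math.atan2(v, u)) % 360.0   # alpha in [0, 360)
    return speed, angle

# Hypothetical wind vector with u = 3 and v = 4
speed, angle = to_polar(3.0, 4.0)
print(speed)  # 5.0
```

Feeding `speed` directly as a feature spares the model from having to learn the square root of a sum of squares.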
Data representations, cyclic variables
[Figure: the vector V in the x-y plane, shown with components u and v and with length |V| and angle α]
Cyclic variables need special consideration. For the angle α, the difference between 0° and 359° is only 1°, which is not obvious to a learning algorithm.
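One common way to handle this (an assumption here, since the slides do not name a specific technique) is to map the angle onto the unit circle with cosine and sine, so that 0° and 359° end up close together. The helper names below are hypothetical.

```python
import math

def encode_angle(alpha_deg):
    """Map an angle in degrees to a point (cos, sin) on the unit circle."""
    rad = math.radians(alpha_deg)
    return math.cos(rad), math.sin(rad)

def circular_distance(a_deg, b_deg):
    """Euclidean distance between two angles after the cos/sin encoding."""
    (ca, sa), (cb, sb) = encode_angle(a_deg), encode_angle(b_deg)
    return math.hypot(ca - cb, sa - sb)

raw_gap = abs(359 - 0)                   # the raw numbers look far apart
encoded_gap = circular_distance(0, 359)  # the encoded points are neighbours
print(raw_gap, round(encoded_gap, 4))
```

The same idea is often applied to hour of day, day of week, or month, at the cost of replacing one input column with two.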
Neural networks can learn new data representations
Compute graph: $P(x) = \text{Linear} \to \text{ReLU} \to \text{Linear} \to \text{ReLU} \to \text{Linear} \to \text{Sigmoid}$
Neural network architecture subject to change: width, depth
Logistic regression: $\sigma(x) = \frac{1}{1 + e^{-x}}$
An artificial neural network is just a series of matrix operations
𝑧 = 𝑊𝑥 + 𝑏
𝑎 = 𝜎(𝑧)
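A simple neural network
Linear transformation:
$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$
Apply non-linearity:
$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} \mathrm{ReLU}(z_1) \\ \mathrm{ReLU}(z_2) \\ \mathrm{ReLU}(z_3) \end{bmatrix}$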
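Instead of feature inputs, use activations from previous layer as input
$\begin{bmatrix} z_1^{[n+1]} \\ z_2^{[n+1]} \\ z_3^{[n+1]} \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} \cdot \begin{bmatrix} a_1^{[n]} \\ a_2^{[n]} \\ a_3^{[n]} \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$
$\begin{bmatrix} a_1^{[n+1]} \\ a_2^{[n+1]} \\ a_3^{[n+1]} \end{bmatrix} = \begin{bmatrix} \mathrm{ReLU}(z_1^{[n+1]}) \\ \mathrm{ReLU}(z_2^{[n+1]}) \\ \mathrm{ReLU}(z_3^{[n+1]}) \end{bmatrix}$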
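The layer equations (a linear transformation followed by ReLU, with each layer consuming the previous layer's activations) can be sketched in a few lines of plain Python. All weights, biases, and inputs below are made-up values chosen only so the arithmetic is easy to follow.

```python
def relu(z):
    """Element-wise ReLU: negative entries become 0."""
    return [max(0.0, v) for v in z]

def linear(W, x, b):
    """Compute z = W x + b, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(layers, x):
    """Feed x through a list of (W, b) layers, applying ReLU after each."""
    a = x
    for W, b in layers:
        a = relu(linear(W, a, b))   # activations become the next layer's input
    return a

# Hypothetical 3x3 weights and biases for two layers
W1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [-1.0, 2.0, 0.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, 1.0, 1.0], [0.0, -1.0, 1.0], [2.0, 0.0, 0.0]]
b2 = [0.0, 0.0, -1.0]
x = [1.0, 2.0, 3.0]

print(forward([(W1, b1)], x))            # [0.0, 2.0, 3.5]
print(forward([(W1, b1), (W2, b2)], x))  # [5.5, 1.5, 0.0]
```

Real frameworks do the same thing with batched matrix multiplications on a GPU; the loop above only mirrors the structure.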
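Logistic regression
$z = \begin{bmatrix} w_{11} & w_{12} & w_{13} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + b$
$\hat{y} = \sigma(z)$
$\sigma(x) = \frac{1}{1 + e^{-x}}$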
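Seen this way, logistic regression is a single linear unit followed by a sigmoid. A minimal sketch, with hypothetical weights and inputs:

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); not hardened against overflow."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x, b):
    """y_hat = sigma(w . x + b) for one example."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical weights: the two active inputs cancel out, so z = 0
print(predict_proba([1.0, -1.0, 0.5], [2.0, 2.0, 0.0], 0.0))  # 0.5
```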
Decision trees and gradient boosting
http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html
Trivial splitting: X1 < 0.5 => y = 0.3
xgboost
• Supervised learning
• Mapping some inputs to some outputs
• $x_1, x_2, x_3, \dots, x_n \to y_1, y_2, y_3, \dots, y_m$
• Using some parameters $\theta_1, \theta_2, \theta_3, \dots, \theta_p$
• That you determine by minimizing some loss
• Objective function in xgboost is training loss + regularization term
• $obj(\theta) = L(\theta) + \Omega(\theta)$, with $L(\theta) = \sum_{i=1}^{k} (\hat{y}_i - y_i)^2$
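To make the shape of the objective concrete, here is a small sketch that adds a squared-error training loss to a regularization term. The slides leave Ω unspecified; the L2 penalty used below is an assumption for illustration only, and the function names are hypothetical rather than part of the xgboost API.

```python
def squared_error_loss(y_pred, y_true):
    """L(theta) = sum_i (y_hat_i - y_i)^2."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true))

def l2_penalty(params, lam=1.0):
    """Assumed regularizer: Omega(theta) = 0.5 * lambda * sum_j theta_j^2."""
    return 0.5 * lam * sum(w * w for w in params)

def objective(y_pred, y_true, params, lam=1.0):
    """obj(theta) = L(theta) + Omega(theta)."""
    return squared_error_loss(y_pred, y_true) + l2_penalty(params, lam)

# Hypothetical predictions, targets, and parameters
print(objective([1.0, 2.0], [0.0, 4.0], [1.0, 3.0], lam=2.0))  # 15.0
```

A larger `lam` pushes the optimizer toward smaller parameters, trading some training loss for a simpler model.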
CART – Classification and Regression Trees
Does a person like computer games?
The score adds expressiveness to the leaf
Form an ensemble of weak learners
Add the scores of multiple trees together
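Adding tree scores can be sketched with two hand-written trees for the "does a person like computer games?" question. The split rules, leaf scores, and feature names below are hypothetical and only illustrate how an ensemble prediction is the sum of each tree's leaf score.

```python
def tree_age_gender(person):
    """First weak learner: splits on age, then on gender."""
    if person["age"] < 15:
        return 2.0 if person["is_male"] else 0.1
    return -1.0

def tree_computer_use(person):
    """Second weak learner: splits on daily computer use."""
    return 0.9 if person["uses_computer_daily"] else -0.9

def ensemble_score(person, trees):
    """Ensemble prediction: sum of the leaf scores from every tree."""
    return sum(tree(person) for tree in trees)

trees = [tree_age_gender, tree_computer_use]
boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
grandpa = {"age": 70, "is_male": True, "uses_computer_daily": False}
print(ensemble_score(boy, trees), ensemble_score(grandpa, trees))
```

In gradient boosting the trees are not written by hand; each new tree is fitted to reduce the error left by the trees before it.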
Neural networks
Training cycle
Preprocessing of inputs to neural networks
• Normalize data
• If input is categorical, represent it as one-hot encodings
• Red, blue, green -> red=[1,0,0], blue=[0,1,0], green=[0,0,1]
• If input is text, represent words as word embeddings
• If embedding length was 4, we could have «bank» = [0.23, 1.2, 0.34, 0.78]
• The embeddings can be learned as part of the learning task, or:
• Embeddings can be taken from a language model trained from a larger text corpus
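The preprocessing steps listed for this slide can be sketched as follows. Standardizing to zero mean and unit variance is just one possible reading of "normalize data", and the helper names are assumptions; the colour order and the «bank» vector come from the bullets.

```python
def standardize(values):
    """Rescale numbers to zero mean and unit variance (one common choice)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0          # avoid dividing by zero for constant input
    return [(v - mean) / std for v in values]

def one_hot(value, categories):
    """Return a vector with a single 1 at the position of `value`."""
    return [1 if c == value else 0 for c in categories]

colors = ["red", "blue", "green"]
encoded = [one_hot(c, colors) for c in colors]
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A word embedding is a lookup table from a word to a dense vector
embedding = {"bank": [0.23, 1.2, 0.34, 0.78]}
print(len(embedding["bank"]))  # 4
```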
Some weaknesses of one-hot for categorical data
• A large number of categories leads to long one-hot vectors
• Different values of categorical variables are treated as completely independent of each other.
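Both weaknesses show up in a short experiment: the vector length equals the number of categories, and every pair of distinct categories is exactly the same distance apart, so no notion of similarity is encoded. The category count of 1000 is a hypothetical figure.

```python
import math

def one_hot(index, n):
    """One-hot vector of length n with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(n)]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

n_categories = 1000  # hypothetical, e.g. the number of distinct store IDs
print(len(one_hot(0, n_categories)))  # 1000

# Any two distinct categories are equally far apart: sqrt(2)
d_neighbours = euclidean(one_hot(0, n_categories), one_hot(1, n_categories))
d_far_apart = euclidean(one_hot(0, n_categories), one_hot(999, n_categories))
print(d_neighbours == d_far_apart)  # True
```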
Paris – France + Italy ~ Rome
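The analogy can be reproduced with toy vectors. The two-dimensional "embeddings" below are invented for illustration (axis 0 loosely means "is a capital", axis 1 identifies the country); real word embeddings are learned and have many more dimensions.

```python
import math

# Hypothetical 2-D embeddings, not learned from data
vectors = {
    "France": [0.0, 1.0], "Paris": [1.0, 1.0],
    "Italy": [0.0, 2.0], "Rome": [1.0, 2.0],
    "Germany": [0.0, 3.0], "Berlin": [1.0, 3.0],
}

def cosine(p, q):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def analogy(a, b, c, vectors):
    """Word closest to vec(a) - vec(b) + vec(c), excluding the query words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("Paris", "France", "Italy", vectors))  # Rome
```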
Forbedringsforslag (Norwegian for "improvement suggestions")
• >20 000 improvement suggestions since 2010
• Each suggestion has one text with a maximum of 98 words
• Each suggestion is classified into a product category by a person.
• Can we take those data and teach a learning algorithm to predict product category?
Forbedringsforslag – Neural network architecture
Example input text (translated from Norwegian): «Hi, I find it very confusing that I see the booked balance. I only need to see the available balance. I would like to see only the available balance, or to be able to choose which balance is visible.»
Conclusion Forbedringsforslag
• We finally arrive at an accuracy of 75% for both the validation set and the test set
• Without regularization we start overfitting after 10 to 15 epochs
• By applying a dropout fraction of 0.2 on both the input-to-state and state-to-state connections in the LSTM, we avoid overfitting
• A thin graphical user interface can present the products sorted by descending predicted probability
• The labelling job can then be quicker, but it can't be done entirely by machine learning
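The ranking step behind such a user interface might look like the sketch below: turn the classifier's raw scores into probabilities with a softmax and show the most likely categories first. The category names and scores are hypothetical, and the classifier itself is not modelled here.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_categories(categories, scores, top_k=3):
    """Return the top_k (category, probability) pairs, most likely first."""
    probs = softmax(scores)
    ranked = sorted(zip(categories, probs), key=lambda cp: cp[1], reverse=True)
    return ranked[:top_k]

# Hypothetical product categories and model scores for one suggestion
categories = ["Online bank", "Cards", "Loans", "Insurance"]
scores = [2.0, 0.5, -1.0, 0.1]
for name, prob in rank_categories(categories, scores):
    print(f"{name}: {prob:.2f}")
```

The person doing the labelling then only has to confirm or correct the top suggestion instead of searching the full list.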
Sales prediction Kaggle contest 2015
• 3000 drug stores
• 7 countries
• Predict daily sales
• Depends on:
• Promotions
• Competition
• School
• State holiday
• Seasonality
• Locality
• Etc
• In principle, a neural network can approximate any continuous function and any piecewise continuous function
• A neural network is not suitable for approximating arbitrary non-continuous functions, as it assumes a certain level of continuity
• Decision trees do not assume any continuity of feature variables and can divide the states of a variable as finely as necessary
• «The rise of neural networks in natural language
processing is based on the word embeddings which puts
words with similar meaning closer to each other in a
word space thus increasing the continuity of the words
compared to using one-hot encoding of words»
Keras implementation of entity embeddings by Guo
https://github.com/entron/entity-embedding-rossmann/
• Store
• Day of week
• Promo
• Year
• Month
• Day of month
• State
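Conceptually, an entity embedding for one of these features is a small trainable table with one row per category. Looking up a row gives the same result as multiplying the one-hot vector by that table, which is why the embedding can be learned like any other layer. The sketch below uses a hypothetical 2-dimensional embedding for day of week with made-up values; it does not reproduce the code in the linked repository.

```python
# Hypothetical embedding table: 7 days of the week, 2 dimensions each
embedding_matrix = [
    [0.1, 0.9], [0.2, 0.8], [0.2, 0.7], [0.3, 0.7],
    [0.5, 0.4], [0.9, -0.3], [0.8, -0.5],
]

def one_hot(index, n):
    return [1.0 if i == index else 0.0 for i in range(n)]

def one_hot_times_matrix(vec, matrix):
    """Multiply a one-hot row vector (1 x n) by the table (n x d)."""
    d = len(matrix[0])
    return [sum(h * row[j] for h, row in zip(vec, matrix)) for j in range(d)]

def embed(index, matrix):
    """Embedding lookup: simply pick the row for this category."""
    return matrix[index]

day = 5  # sixth category, using zero-based indexing
print(embed(day, embedding_matrix))                             # [0.9, -0.3]
print(one_hot_times_matrix(one_hot(day, 7), embedding_matrix))  # [0.9, -0.3]
```

The lookup avoids building the long one-hot vector at all, which is where the memory and speed savings over one-hot input come from.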
Neural network architecture Guo
The embeddings have learned some German geography
Conclusions
• Entity embeddings reduce memory usage and speed up neural networks compared to one-hot encoding.
• Intrinsic properties of the categorical features can be revealed by mapping similar values close to each other in embedding space.
• The learned embeddings boost the performance of other machine learning methods when used as input features instead.
• Guo and Berkhahn came third in the Rossmann Store Sales prediction contest
• The students at MILA, Montreal who won the Taxi Destination prediction on Kaggle also used entity embeddings
http://blog.kaggle.com/2015/07/27/taxi-trajectory-winners-interview-1st-place-team-%F0%9F%9A%95/
