Entity embeddings for
categorical data
Paul Skeie
2
Outline
• Background
• Data representations
• Gradient boosted trees
• Deep Learning
• Entity embeddings
3
Exceeded 1 million users in 2017
Collaborative and competitive data science
Gradient boosted trees win most contests with tabular/structured data
Deep Learning wins when data is unstructured: images/text/sound
Standard modeling activities
Statistics or machine learning, most activities are common
Modelling activities: Select model → Select inputs → Train model → Test model on unseen data → Evaluate performance → Success?
• Select ground truth to target the training against: supervised learning needs labeled data
• Feature engineering, find relevant inputs: requires experts with deep understanding in the field
• Common pitfall, overfitting: the risk of overfitting is high when the model has many parameters
• Deep learning avoids overfitting using dropout: nodes are randomly dropped so that the rest must readjust
• Data discovery
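The dropout idea on this slide (nodes are randomly dropped so that the rest must readjust) can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the function name `dropout`, the example activations, and the rescaling of surviving values ("inverted dropout") are assumptions added here.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob during training.

    Survivors are divided by the keep probability (an assumed convention,
    often called inverted dropout) so the expected total stays unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)                      # seeded for reproducibility
activations = [0.5, 1.2, 0.0, 2.0, 0.7]     # hypothetical layer outputs
dropped = dropout(activations, drop_prob=0.2, rng=rng)
print(dropped)
```

Dropout is only applied while training; at prediction time all nodes are kept.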
5
Artificial neural networks - Some highlights from timeline
Snipped from https://www.scaruffi.com/mind/ai.pdf
6
• Internet produces massive datasets
• Powerful GPUs developed primarily for gaming
• Improved algorithms
• Better ways of mitigating overfitting
Artificial Neural networks – Why now?
7
Deep learning in the industry
Jeff Dean
Google Brain Team
AI Frontiers
Trends and Developments in Deep Learning Research
8
Gradient boosted trees
9
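Data representations, decomposing a vector
[Figure: vector V in the x-y plane, with component u along the x axis and component v along the y axis]
$V = \begin{pmatrix} u \\ v \end{pmatrix}$
We can decompose the vector V into a vector of length u directed along the x axis, and a vector of length v directed along the y axis.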
10
Data representations, vector length and direction
[Figure: vector V in the x-y plane, with components u and v, length |V| and angle α]
Component form: $V = (u, v)$. Length and direction form: $V = (|V|, \alpha)$.
Both these data representations define the same vector. How you want to feed this information to the learning algorithm depends on what you're aiming to predict.
If this vector represents wind in the horizontal plane, and we want to predict the power output from a wind turbine, which we happen to know is a function of the wind speed, feeding in $|V| = \sqrt{u^2 + v^2}$ to the learning algorithm makes a lot of sense. This way the learning algorithm doesn't need to figure out Pythagoras on its own. However, with enough training data, a neural network could figure this out.
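As a small sketch of this feature engineering step, the wind components can be converted to speed and angle before they are fed to a model. The function names and sample values are illustrative assumptions; the angle here is the mathematical angle α measured from the x axis, not the meteorological wind direction convention.

```python
import math

def wind_speed(u, v):
    """Length of the vector (u, v), i.e. sqrt(u^2 + v^2)."""
    return math.hypot(u, v)

def wind_angle_degrees(u, v):
    """Angle of (u, v) counter-clockwise from the x axis, in [0, 360)."""
    return math.degrees(math.atan2(v, u)) % 360.0

speed = wind_speed(3.0, 4.0)
angle = wind_angle_degrees(3.0, 4.0)
print(speed, angle)
```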
11
Data representations, cyclic variables
[Figure: vector V in the x-y plane, with components u and v, length |V| and angle α]
Cyclic variables need special consideration. For the angle α, the difference between 0° and 359° is only 1°, which is not obvious to a learning algorithm.
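One common way to handle this, which the slide does not spell out and is therefore an assumption here, is to replace the angle with its sine and cosine. In that representation 0° and 359° end up close together, as the sketch below checks.

```python
import math

def encode_angle(degrees):
    """Map an angle to a point on the unit circle: (sin, cos)."""
    radians = math.radians(degrees)
    return (math.sin(radians), math.cos(radians))

def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

near = distance(encode_angle(0), encode_angle(359))   # angles 1 degree apart
far = distance(encode_angle(0), encode_angle(180))    # opposite directions
print(near, far)
```

The same trick applies to other cyclic inputs such as hour of day or month, with the period changed accordingly.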
12
Neural networks can learn new data representations
Compute Graph
$P(x) = \text{Linear} \to \text{ReLU} \to \text{Linear} \to \text{ReLU} \to \text{Linear} \to \text{Sigmoid}$
Neural network architecture subject to change
Width - Depth Logistic Regression
$\sigma(x) = \frac{1}{1 + e^{-x}}$
13
An artificial neural network is just a series of matrix operations
𝑧 = 𝑊𝑥 + 𝑏
𝑎 = 𝜎(𝑧)
14
A simple neural network
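Linear transformation:
$$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$
Apply non-linearity:
$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} \text{ReLU}(z_1) \\ \text{ReLU}(z_2) \\ \text{ReLU}(z_3) \end{pmatrix}$$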
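The two steps on this slide, a linear transformation followed by ReLU, can be written out for a 3-by-3 layer as below. The weights, inputs and biases are made-up numbers chosen so the arithmetic is easy to follow by hand.

```python
def relu(value):
    """ReLU non-linearity: negative values become 0."""
    return max(0.0, value)

def dense_layer(weights, inputs, biases):
    """Compute z = W x + b, then a = ReLU(z), one row of W per output."""
    z = [sum(w * x for w, x in zip(row, inputs)) + b
         for row, b in zip(weights, biases)]
    a = [relu(value) for value in z]
    return z, a

W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5],
     [-1.0, 2.0, 0.0]]      # hypothetical weights
x = [1.0, 2.0, 3.0]         # hypothetical feature inputs
b = [0.0, -1.0, 0.5]        # hypothetical biases

z, a = dense_layer(W, x, b)
print(z)  # [-2.0, 2.0, 3.5]
print(a)  # [0.0, 2.0, 3.5]
```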
15
Instead of feature inputs, use activations from previous
layer as input
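$$\begin{pmatrix} z_1^{[n+1]} \\ z_2^{[n+1]} \\ z_3^{[n+1]} \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{pmatrix} \cdot \begin{pmatrix} a_1^{[n]} \\ a_2^{[n]} \\ a_3^{[n]} \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$
$$\begin{pmatrix} a_1^{[n+1]} \\ a_2^{[n+1]} \\ a_3^{[n+1]} \end{pmatrix} = \begin{pmatrix} \text{ReLU}(z_1^{[n+1]}) \\ \text{ReLU}(z_2^{[n+1]}) \\ \text{ReLU}(z_3^{[n+1]}) \end{pmatrix}$$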
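Feeding each layer's activations into the next layer gives the full forward pass. The sketch below chains Linear → ReLU twice and ends with Linear → Sigmoid, mirroring the compute graph from the earlier slide; layer sizes and weights are hypothetical, and no training is shown.

```python
import math

def linear(weights, inputs, biases):
    """z = W a + b for one layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def forward(layers, features):
    """Hidden layers use ReLU; the last layer has one unit and a sigmoid."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(linear(weights, activations, biases))
    weights, biases = layers[-1]
    (z,) = linear(weights, activations, biases)
    return sigmoid(z)

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # hidden layer 1
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]),   # hidden layer 2
    ([[1.0, -1.0]], [0.0]),                    # output layer, one unit
]
probability = forward(layers, [2.0, 1.0])
print(probability)
```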
16
Logistic regression
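$$z = \begin{pmatrix} w_{11} & w_{12} & w_{13} \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + b$$
$$\hat{y} = \sigma(z)$$
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$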
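Seen this way, logistic regression is a network with a single output unit and no hidden layer. A minimal sketch, with made-up weights and features chosen so that z is exactly 0:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_probability(weights, features, bias):
    """y_hat = sigmoid(w . x + b) for one weight row."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

weights = [0.5, -0.25, 1.0]   # hypothetical w11, w12, w13
features = [2.0, 4.0, 0.0]    # hypothetical x1, x2, x3
y_hat = predict_probability(weights, features, bias=0.0)
print(y_hat)  # 0.5, because z = 0
```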
17
Decision trees and gradient boosting
http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html
Trivial splitting: X1 < 0.5 => y = 0.3
18
• Supervised learning
• Mapping some inputs to some outputs
• $x_1, x_2, x_3, \ldots, x_n \to y_1, y_2, y_3, \ldots, y_m$
• Using some parameters $\theta_1, \theta_2, \theta_3, \ldots, \theta_p$
• That you determine by minimizing some loss
• Objective function in xgboost is training loss + regularization term
• $obj(\theta) = L(\theta) + \Omega(\theta)$, with $L(\theta) = \sum_{i=1}^{k} (\hat{y}_i - y_i)^2$
xgboost
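To make the objective concrete, the sketch below evaluates training loss plus regularization for one tree. The slide leaves Ω unspecified; the form used here, γ times the number of leaves plus half of λ times the sum of squared leaf weights, follows the XGBoost documentation and is an assumption as far as this deck is concerned. All numbers are made up, and predictions are given directly instead of being derived from the leaves.

```python
def squared_error_loss(predictions, targets):
    """L(theta): sum of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))

def tree_regularization(leaf_weights, gamma, lam):
    """Assumed Omega: gamma * number_of_leaves + 0.5 * lam * sum(w^2)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(predictions, targets, leaf_weights, gamma=1.0, lam=1.0):
    return (squared_error_loss(predictions, targets)
            + tree_regularization(leaf_weights, gamma, lam))

predictions = [0.5, 1.5, 2.0]   # hypothetical model outputs
targets = [1.0, 1.0, 2.0]       # hypothetical labels
leaf_weights = [0.5, 2.0]       # hypothetical leaf scores of one tree

value = objective(predictions, targets, leaf_weights)
print(value)  # 0.5 loss + 4.125 regularization = 4.625
```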
19
CART – Classification and Regression Trees
Does a person like computer games?
The score adds expressiveness to the leaf
20
Form an ensemble of weak learners
Add the scores of multiple trees together
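A tree ensemble prediction is simply the sum of the leaf scores each tree assigns to the same example. The two hand-written trees below stand in for learned trees; their split rules and scores are hypothetical and only illustrate the "does a person like computer games?" setting.

```python
def tree_age_gender(person):
    """First tree: split on age, then on gender (hypothetical scores)."""
    if person["age"] < 15:
        return 2.0 if person["is_male"] else 0.1
    return -1.0

def tree_computer_use(person):
    """Second tree: split on daily computer use (hypothetical scores)."""
    return 0.9 if person["uses_computer_daily"] else -0.9

def ensemble_score(person, trees):
    """Add the score of every tree for this person."""
    return sum(tree(person) for tree in trees)

trees = [tree_age_gender, tree_computer_use]
boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
grandfather = {"age": 70, "is_male": True, "uses_computer_daily": False}

print(ensemble_score(boy, trees))          # 2.0 + 0.9
print(ensemble_score(grandfather, trees))  # -1.0 + -0.9
```

In gradient boosting the trees are not written by hand: each new tree is fitted to reduce the loss left by the trees before it.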
21
Neural networks
Training cycle
22
• Normalize data
• If input is categorical, represent it as one-hot encodings
• Red, blue, green -> red=[1,0,0], blue=[0,1,0], green=[0,0,1]
• If input is text, represent words as word embeddings
• If the embedding length was 4, we could have «bank» = [0.23, 1.2, 0.34, 0.78]
• The embeddings can be learned as part of the learning task, or:
• Embeddings can be taken from a language model trained from a larger text corpus
Preprocessing of inputs to neural networks
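Both encodings from this slide fit in a few lines. The colour example is taken from the slide, as is the vector for «bank»; the second word and its vector are hypothetical and only there to show that an embedding table is a lookup from word to vector.

```python
def one_hot(value, categories):
    """Return a vector with a 1 at the position of value and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]

colors = ["red", "blue", "green"]
encoded = {color: one_hot(color, colors) for color in colors}
print(encoded)

# Embedding table with vectors of length 4.
embedding_table = {
    "bank": [0.23, 1.2, 0.34, 0.78],   # values from the slide
    "loan": [0.25, 1.1, 0.30, 0.80],   # hypothetical
}

def embed(words, table):
    """Replace each word by its embedding vector."""
    return [table[word] for word in words]

print(embed(["bank", "loan"], embedding_table))
```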
23
• A large number of categories leads to long one-hot vectors
• Different values of a categorical variable are treated as
completely independent of each other.
Some weaknesses of one-hot for categorical data
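The independence point can be checked numerically: with one-hot encoding every pair of distinct categories is exactly the same distance apart, so Monday is no closer to Tuesday than to Saturday. The weekday example is an illustration chosen here, not taken from the slide.

```python
import math
from itertools import combinations

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
vectors = {day: one_hot(i, len(days)) for i, day in enumerate(days)}

# Euclidean distance between every pair of distinct days.
distances = {(a, b): math.dist(vectors[a], vectors[b])
             for a, b in combinations(days, 2)}

print(distances[("Mon", "Tue")], distances[("Mon", "Sat")])
```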
24
25
Paris – France + Italy ~ Rome
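The analogy can be reproduced with toy vectors. The 2-D embeddings below are invented for illustration (first coordinate roughly "is a capital", second coordinate a country direction); real word embeddings are learned and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, not learned from data.
embeddings = {
    "Paris": [1.0, 1.0], "France": [0.0, 1.0],
    "Rome": [1.0, -1.0], "Italy": [0.0, -1.0],
    "Madrid": [1.0, 0.2], "Spain": [0.0, 0.2],
}

# Paris - France + Italy
query = [p - f + i for p, f, i in
         zip(embeddings["Paris"], embeddings["France"], embeddings["Italy"])]

# Nearest word by cosine similarity, excluding the words used in the query.
candidates = [w for w in embeddings if w not in ("Paris", "France", "Italy")]
best = max(candidates, key=lambda w: cosine_similarity(query, embeddings[w]))
print(best)  # Rome
```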
26
• More than 20,000 forbedringsforslag (Norwegian for "improvement suggestions") since 2010
• Each forbedringsforslag has one text with a maximum of 98
words
• Each forbedringsforslag is classified into a product
category by a person.
• Can we take those data and teach a learning algorithm to
predict the product category?
Forbedringsforslag
27
Forbedringsforslag – Neural network architecture
«Hi, I find it very confusing that I see the booked balance. I only need to see the
available balance. I would like to see only the available balance, or to choose which balance is visible.» (example text, translated from Norwegian)
28
Conclusion Forbedringsforslag
• We finally arrive at an accuracy of 75% for both the validation set and the test set
• Without regularization we start overfitting after 10 to 15 epochs
• By applying a dropout fraction of 0.2 on both the input-to-state and state-to-state connections in the LSTM, we avoid overfitting
• A thin graphical user interface can present the products sorted by descending predicted probability
• The labelling job can then be quicker, but it can't be done entirely by machine learning
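The ranking step behind such a user interface could look like the sketch below: turn the classifier's raw scores into probabilities with a softmax and sort the product categories by descending probability. The category names and scores are hypothetical; the deck does not list the real product categories.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    highest = max(scores)                     # subtract max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_categories(categories, scores):
    """Return (category, probability) pairs, most likely first."""
    probabilities = softmax(scores)
    return sorted(zip(categories, probabilities),
                  key=lambda pair: pair[1], reverse=True)

categories = ["mobile bank", "cards", "loans", "savings"]  # hypothetical
scores = [2.0, 0.5, -1.0, 1.0]                             # hypothetical outputs

ranking = rank_categories(categories, scores)
for name, probability in ranking:
    print(f"{name}: {probability:.2f}")
```

A person labelling the text then only has to confirm or correct the top suggestion.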
29
Sales prediction Kaggle contest 2015
• 3000 drug stores
• 7 countries
• Predict daily sales
• Depends on:
• Promotions
• Competition
• School
• State holiday
• Seasonality
• Locality
• Etc
30
• In principle a neural network can approximate any
continuous function and piecewise continuous function
• A neural network is not suitable for approximating arbitrary
non-continuous functions, as it assumes a certain level of
continuity
• Decision trees do not assume any continuity of the feature
variables and can divide the states of a variable as finely as
necessary
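The point about trees can be illustrated with the simplest possible tree, a single split. On a step function with a jump, a one-split tree reproduces the data exactly because it places a threshold at the discontinuity. The data and the function name `fit_stump` are illustrative assumptions, and the input is assumed to contain at least two distinct x values.

```python
def fit_stump(xs, ys):
    """Find the single split that minimises the sum of squared errors.

    Returns (threshold, left_value, right_value, sse).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[3]:
            best = (threshold, left_mean, right_mean, sse)
    return best

xs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # step function with a jump near 0.5

threshold, left_value, right_value, sse = fit_stump(xs, ys)
print(threshold, left_value, right_value, sse)
```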
31
• «The rise of neural networks in natural language
processing is based on the word embeddings which puts
words with similar meaning closer to each other in a
word space thus increasing the continuity of the words
compared to using one-hot encoding of words»
32
Keras implementation of entity embeddings by Guo
https://github.com/entron/entity-embedding-rossmann/
• Store
• Day of week
• Promo
• Year
• Month
• Day of month
• State
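The core of the entity embedding idea is that every categorical feature gets its own small embedding table, and the looked-up vectors are concatenated before entering the dense layers. The sketch below shows only that lookup and concatenation in plain Python, not Guo's Keras code. The store count comes from the earlier slide; the embedding sizes and the example row are hypothetical, and the tables are randomly initialised where a real model would learn them.

```python
import random

def make_embedding_table(num_categories, dim, rng):
    """One row of length dim per category, randomly initialised."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(num_categories)]

rng = random.Random(42)

# feature name -> (number of categories, embedding size); sizes are assumptions
feature_specs = {
    "store": (3000, 10),
    "day_of_week": (7, 4),
    "promo": (2, 1),
    "month": (12, 4),
}
tables = {name: make_embedding_table(count, dim, rng)
          for name, (count, dim) in feature_specs.items()}

def embed_row(row, tables):
    """Look up each feature's vector and concatenate them."""
    vector = []
    for name in tables:
        vector.extend(tables[name][row[name]])
    return vector

row = {"store": 41, "day_of_week": 4, "promo": 1, "month": 6}  # category indices
features = embed_row(row, tables)

one_hot_width = sum(count for count, _ in feature_specs.values())
embedded_width = sum(dim for _, dim in feature_specs.values())
print(one_hot_width, embedded_width)  # 3021 versus 19 inputs
```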
33
Neural network architecture Guo
34
The embeddings have learned some German geography
35
• Entity embeddings reduce memory usage and speed up neural
networks compared to one-hot encoding.
• Intrinsic properties of the categorical features can be revealed by
mapping similar values close to each other in embedding space.
• The learned embeddings boost the performance of other machine
learning methods when used as input features instead.
• Guo and Berkhahn came third in the Rossmann Store Sales prediction contest
• The students at MILA, Montreal who won the Taxi Destination
prediction on Kaggle also used entity embeddings
http://blog.kaggle.com/2015/07/27/taxi-trajectory-winners-interview-1st-place-team-%F0%9F%9A%95/
Conclusions
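The memory and speed claim rests on a simple identity: multiplying a one-hot vector by a weight matrix just selects one row of that matrix, so an embedding layer can do a row lookup instead of storing and multiplying long one-hot vectors. A small check with a made-up 4-by-2 matrix:

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def vector_matrix_product(vector, matrix):
    """Row vector times matrix, computed column by column."""
    columns = len(matrix[0])
    return [sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(columns)]

# Hypothetical embedding matrix: 4 categories, embedding size 2.
embedding_matrix = [
    [0.1, -0.3],
    [0.7, 0.2],
    [-0.5, 0.9],
    [0.0, 0.4],
]

category = 2
via_one_hot = vector_matrix_product(one_hot(category, 4), embedding_matrix)
via_lookup = embedding_matrix[category]
print(via_one_hot, via_lookup)
```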


Entity embeddings for categorical data

  • 2. 2 Outline • Background • Datarepresentations • Gradient boosted trees • Deep Learning • Entity embeddings
  • 3. 3 Exceeded 1 million users in 2017 Collaborativeandcompetitive datascience Gradient boosted trees win most contests with tabular/structureddata Deep Learning wins when datais unstructured images/text/sound
  • 4. Standard modeling activitiesStatistics or machine learning, most activities are common Select model Select inputs Train model Test model on unseen data Evaluate performance Success? Modelling activities Supervised learning needs labeled data SELECT GROUND TRUTH TO TARGET THE TRAINING AGAINST Requires experts with deep understanding in the field FEATURE ENGINEERING – FIND RELEVANT INPUTS The risk of overfitting is high when the model has many parameters COMMON PITFALL - OVERFITTING Nodes are randomly dropped so that the rest must readjust DEEP LEARNING AVOIDS OVERFITTING USING DROPOUT DATA DISCOVERY
  • 5. 5 Artificial neural networks - Some highlights from timeline Snipped from https://www.scaruffi.com/mind/ai.pdf
  • 6. 6 • Internet produces massivedatasets • Powerful GPUs developed primarily for gaming • Improved algorithms • Better ways of mitigating overfitting Artificial Neural networks – Why now?
  • 7. 7 Deep learning in the industry Jeff Dean Google Brain Team AI Frontiers Trends and Developments in Deep Learning Research
  • 9. 9 Data representations, decomposing a vector x y v u u v V = We can decompose the vector V into a vector of length u directed along the x axis, and a vector of length v directed along the y axis. V
  • 10. Data representations, vector length and direction. $V = (u, v)$ in Cartesian form, or $V = (|V|, \alpha)$ as length and direction. Both these data representations define the same vector. How you want to feed this information to the learning algorithm depends on what you’re aiming to predict. If this vector represents wind in the horizontal plane, and we want to predict the power output from a wind turbine, which we happen to know is a function of the wind speed, feeding $|V| = \sqrt{u^2 + v^2}$ in to the learning algorithm makes a lot of sense. This way the learning algorithm doesn’t need to figure out Pythagoras on its own. However, with enough training data, a neural network could figure this out.
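The change of representation described on this slide can be sketched in a few lines. This is a minimal illustration, not part of the original deck: the function name `to_polar` is hypothetical, and the direction is the mathematical angle measured from the x axis, not a meteorological wind bearing.

```python
import math

def to_polar(u, v):
    """Convert components (u, v) to (length, direction in degrees from the x axis)."""
    speed = math.hypot(u, v)                            # |V| = sqrt(u^2 + v^2)
    direction = math.degrees(math.atan2(v, u)) % 360.0  # angle folded into [0, 360)
    return speed, direction

# Example: a wind vector with u = 3 and v = 4 (arbitrary units)
speed, direction = to_polar(3.0, 4.0)
print(speed, direction)
```

Feeding `speed` to the model directly spares it from having to learn the square root of a sum of squares from data.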
  • 11. Data representations, cyclic variables. $V = (u, v)$ or $V = (|V|, \alpha)$. Cyclic variables need special consideration. For the angle α, the difference between 0° and 359° is only 1°; this is not obvious to a learning algorithm.
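One common way to make the closeness of 0° and 359° visible to a learning algorithm is to encode the angle as a (sin, cos) pair. The slide does not name a specific encoding, so treat this as an assumed illustration; the function names are hypothetical.

```python
import math

def encode_angle(deg):
    """Map an angle in degrees to a point on the unit circle."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)

def encoded_distance(a_deg, b_deg):
    """Euclidean distance between two angles after (sin, cos) encoding."""
    return math.dist(encode_angle(a_deg), encode_angle(b_deg))

raw_gap = abs(359 - 0)               # 359: the raw numbers look far apart
near_gap = encoded_distance(0, 359)  # small: the encoded points are neighbours
far_gap = encoded_distance(0, 180)   # 2.0: opposite sides of the circle
print(raw_gap, near_gap, far_gap)
```

The same trick applies to other cyclic inputs such as hour of day or month of year.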
  • 12. Neural networks can learn new data representations. Compute graph: $P(x) = \text{Linear} \to \text{ReLU} \to \text{Linear} \to \text{ReLU} \to \text{Linear} \to \text{Sigmoid}$. Neural network architecture subject to change: width, depth. Logistic regression: $\sigma(x) = \frac{1}{1 + e^{-x}}$
  • 13. An artificial neural network is just a series of matrix operations: $z = Wx + b$, $a = \sigma(z)$
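  • 14. A simple neural network. Linear transformation: $\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$. Apply non-linearity: $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} \mathrm{ReLU}(z_1) \\ \mathrm{ReLU}(z_2) \\ \mathrm{ReLU}(z_3) \end{bmatrix}$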
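  • 15. Instead of feature inputs, use activations from previous layer as input: $\begin{bmatrix} z_1^{[n+1]} \\ z_2^{[n+1]} \\ z_3^{[n+1]} \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} \cdot \begin{bmatrix} a_1^{[n]} \\ a_2^{[n]} \\ a_3^{[n]} \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$, $\begin{bmatrix} a_1^{[n+1]} \\ a_2^{[n+1]} \\ a_3^{[n+1]} \end{bmatrix} = \begin{bmatrix} \mathrm{ReLU}(z_1^{[n+1]}) \\ \mathrm{ReLU}(z_2^{[n+1]}) \\ \mathrm{ReLU}(z_3^{[n+1]}) \end{bmatrix}$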
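The layer-by-layer computation in these slides (a linear transformation followed by a non-linearity, with each layer consuming the previous layer's activations) can be sketched in plain Python. The layer sizes, the random weights, and the helper names are assumptions chosen for illustration; a real network would learn the weights by training.

```python
import math
import random

def linear(W, x, b):
    """z = W x + b for a weight matrix W (list of rows), input x and bias b."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def relu(z):
    return [max(0.0, z_i) for z_i in z]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-z_i)) for z_i in z]

def init_layer(n_out, n_in, rng):
    """Random weights and zero biases (untrained, for illustration only)."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(0)
# Linear -> ReLU -> Linear -> ReLU -> Linear -> Sigmoid
layers = [init_layer(3, 3, rng), init_layer(3, 3, rng), init_layer(1, 3, rng)]

def forward(x):
    a = x
    for W, b in layers[:-1]:
        a = relu(linear(W, a, b))      # activations feed the next layer
    W, b = layers[-1]
    return sigmoid(linear(W, a, b))[0]  # final layer is a logistic regression

p = forward([0.5, -1.2, 2.0])
print(p)
```

Changing the number of entries in `layers` alters the depth, and changing `n_out` alters the width.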
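  • 16. Logistic regression: $z = \begin{bmatrix} w_{11} & w_{12} & w_{13} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + b$, $\hat{y} = \sigma(z)$, $\sigma(x) = \frac{1}{1 + e^{-x}}$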
  • 17. Decision trees and gradient boosting. http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html Trivial splitting: X1 < 0.5 => y = 0.3
  • 18. xgboost • Supervised learning • Mapping some inputs to some outputs: $x_1, x_2, x_3, \ldots, x_n \to y_1, y_2, y_3, \ldots, y_m$ • Using some parameters $\theta_1, \theta_2, \theta_3, \ldots, \theta_p$ • That you determine by minimizing some loss • Objective function in xgboost is training loss + regularization term: $obj(\theta) = L(\theta) + \Omega(\theta)$, $L(\theta) = \sum_{i=1}^{k} (\hat{y}_i - y_i)^2$
  • 19. CART – Classification and Regression Trees. Does a person like computer games? The score adds expressiveness to the leaf
  • 20. Form an ensemble of weak learners. Add the scores of multiple trees together
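The idea of adding the scores of many weak trees can be sketched with one-split trees (stumps) fitted to the residuals of a squared loss. This is a simplified illustration and not the xgboost algorithm: it omits the regularization term Ω(θ), and all names, the toy data, and the learning rate are assumptions.

```python
def fit_stump(xs, residuals):
    """Pick the single threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_score = sum(left) / len(left)      # leaf score = mean residual
        right_score = sum(right) / len(right)
        error = (sum((r - left_score) ** 2 for r in left)
                 + sum((r - right_score) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_score, right_score)
    return best[1:]

def stump_score(stump, x):
    threshold, left_score, right_score = stump
    return left_score if x < threshold else right_score

def predict(base, trees, learning_rate, x):
    """Ensemble prediction: base value plus the scaled sum of all tree scores."""
    return base + learning_rate * sum(stump_score(t, x) for t in trees)

def fit_boosted(xs, ys, n_trees=20, learning_rate=0.5):
    base = sum(ys) / len(ys)
    trees = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the squared loss (up to a constant)
        residuals = [y - predict(base, trees, learning_rate, x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return base, trees

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

xs = [i / 10 for i in range(1, 11)]
ys = [x * x for x in xs]                 # toy target, chosen for illustration
base, trees = fit_boosted(xs, ys)
mse_base = mse([base] * len(xs), ys)
mse_boosted = mse([predict(base, trees, 0.5, x) for x in xs], ys)
print(mse_base, mse_boosted)
```

Each stump on its own is a poor predictor, but the sum of their scores tracks the target much more closely than the constant baseline.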
  • 22. Preprocessing of inputs to neural networks • Normalize data • If input is categorical, represent it as one-hot encodings • Red, blue, green -> red=[1,0,0], blue=[0,1,0], green=[0,0,1] • If input is text, represent words as word embeddings • If embedding length was 4, we could have «bank» = [0.23, 1.2, 0.34, 0.78] • The embeddings can be learned as part of the learning task, or: • Embeddings can be taken from a language model trained from a larger text corpus
  • 23. Some weaknesses of one-hot for categorical data • A large number of categories leads to long one-hot vectors • Different values of categorical variables are treated as completely independent of each other.
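The relation between one-hot vectors and embeddings can be made concrete: multiplying a one-hot vector by an embedding matrix selects one row of that matrix, so the long vector never needs to be built. The sketch below uses the red/blue/green example from the slides; the embedding values in `E` are made-up numbers, not learned ones.

```python
categories = ["red", "blue", "green"]
index = {name: i for i, name in enumerate(categories)}

def one_hot(category):
    """Vector with a single 1 at the category's position."""
    vec = [0] * len(categories)
    vec[index[category]] = 1
    return vec

# Hypothetical embedding matrix: one row of length 2 per category
E = [[0.9, 0.1],
     [0.2, 0.8],
     [0.3, 0.7]]

def embed_via_matmul(category):
    """one_hot(category) multiplied by E, written out explicitly."""
    oh = one_hot(category)
    return [sum(oh[i] * E[i][j] for i in range(len(E))) for j in range(len(E[0]))]

def embed_via_lookup(category):
    """The same result as a plain row lookup."""
    return E[index[category]]

print(one_hot("blue"), embed_via_matmul("blue"), embed_via_lookup("blue"))
```

With thousands of categories the one-hot vector grows with the number of categories, while the embedding keeps a fixed, short length.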
  • 25. Paris – France + Italy ~ Rome
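The analogy arithmetic can be illustrated with a toy example. The three-dimensional vectors below are hand-made so that the analogy works exactly; they are hypothetical and stand in for real word embeddings, which are learned from text and only approximately satisfy such relations.

```python
import math

# Hypothetical vectors: dimension 0 ~ "is a capital", dimensions 1-2 ~ country identity
vectors = {
    "Paris":   [1.0, 1.0, 0.0],
    "France":  [0.0, 1.0, 0.0],
    "Rome":    [1.0, 0.0, 1.0],
    "Italy":   [0.0, 0.0, 1.0],
    "Berlin":  [1.0, -1.0, -1.0],
    "Germany": [0.0, -1.0, -1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

print(analogy("Paris", "France", "Italy"))
```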
  • 26. Forbedringsforslag (improvement suggestions) • >20 000 forbedringsforslag since 2010 • Each forbedringsforslag has one text with a maximum of 98 words • Each forbedringsforslag is classified into a product category by a person. • Can we take those data and teach a learning algorithm to predict product category?
  • 27. Forbedringsforslag – Neural network architecture. «Hi, I find it very confusing that I see the booked balance. I only need to see the available balance. I would like to see only the available balance, or to choose it as the balance that is shown.»
  • 28. Conclusion Forbedringsforslag • We finally arrive at an accuracy of 75% for both the validation set and the test set • Without regularization we start overfitting after 10 to 15 epochs • By applying a dropout fraction of 0.2 on both input-to-state and state-to-state in the LSTM, we avoid overfitting • A thin graphical user interface can present the products sorted by descending predicted probability • The labelling job can then be quicker, but it can’t be done entirely by machine learning
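To show what a dropout fraction of 0.2 does, here is a minimal sketch of generic (inverted) dropout applied to a list of activations. It is not the LSTM-specific input-to-state and state-to-state variant mentioned on the slide, and the function name and the test values are assumptions.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the rest by 1/(1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)       # dropout is switched off at prediction time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
activations = [1.0] * 10000
dropped = dropout(activations, 0.2, rng)

frac_zero = dropped.count(0.0) / len(dropped)   # close to 0.2
mean_after = sum(dropped) / len(dropped)        # close to 1.0 thanks to rescaling
print(frac_zero, mean_after)
```

Because a different random subset of nodes is dropped on every training step, the remaining nodes cannot rely on any single neighbour, which is what counters overfitting.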
  • 29. Sales prediction, Kaggle contest 2015 • 3000 drug stores • 7 countries • Predict daily sales • Depends on: • Promotions • Competition • School • State holiday • Seasonality • Locality • Etc
  • 30. • In principle a neural network can approximate any continuous function and piecewise continuous function • A neural network is not suitable to approximate arbitrary non-continuous functions as it assumes a certain level of continuity • Decision trees do not assume any continuity of feature variables and can divide the states of a variable as finely as necessary
  • 31. «The rise of neural networks in natural language processing is based on the word embeddings which puts words with similar meaning closer to each other in a word space thus increasing the continuity of the words compared to using one-hot encoding of words»
  • 32. Keras implementation of entity embeddings by Guo: https://github.com/entron/entity-embedding-rossmann/ • Store • Day of week • Promo • Year • Month • Day of month • State
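The core of an entity-embedding input layer is one lookup table per categorical feature, with the looked-up vectors concatenated into a single dense input. The sketch below shows that step in plain Python for the features listed on this slide. It is not the code from the linked repository: the category counts, embedding sizes, and random initial values are illustrative assumptions, and in a real model the tables are trained together with the rest of the network.

```python
import random

rng = random.Random(0)

# feature name -> (number of categories, embedding size); all values are assumptions
feature_specs = {
    "store":        (50, 10),
    "day_of_week":  (7, 6),
    "promo":        (2, 1),
    "year":         (3, 2),
    "month":        (12, 6),
    "day_of_month": (31, 10),
    "state":        (12, 6),
}

def make_table(n_categories, dim):
    """Randomly initialised embedding table with one row per category."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(n_categories)]

tables = {name: make_table(n, dim) for name, (n, dim) in feature_specs.items()}

def embed_row(row):
    """Concatenate the embedding vectors of all categorical features of one record."""
    vec = []
    for name in feature_specs:
        vec.extend(tables[name][row[name]])   # row[name] is a zero-based category index
    return vec

row = {"store": 17, "day_of_week": 4, "promo": 1, "year": 2,
       "month": 6, "day_of_month": 30, "state": 3}
x = embed_row(row)
one_hot_len = sum(n for n, _ in feature_specs.values())
print(len(x), one_hot_len)
```

With these assumed sizes the dense input has 41 values, where concatenated one-hot vectors would need 117, and the gap widens as the number of stores grows.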
  • 34. The embeddings have learned some German geography
  • 35. Conclusions • Entity embeddings reduce memory usage and speed up neural networks compared to one-hot encoding. • Intrinsic properties of the categorical features can be revealed by mapping similar values close to each other in embedding space. • The learned embeddings boost the performance of other machine learning methods when used as input features instead. • Guo and Berkhahn came out third in the Rossmann Store Sales prediction • The students at MILA, Montreal who won the Taxi Destination prediction on Kaggle also used entity embeddings http://blog.kaggle.com/2015/07/27/taxi-trajectory-winners-interview-1st-place-team-%F0%9F%9A%95/