TextCNN on
Acme UGC moderation
Marsan Ma
2018.7.18
1
● Things you might be interested in
○ TextCNN
○ Why you should try deep learning on text data
● On real product
○ Acme review / Q&A moderation
○ Network Spec
Outline
2
TextCNN
3
TextCNN: architecture
1. Architecture
a. Embedding
i. Trainable, or not (if using pretrained)
b. 1d convolution filters
i. Each filter “searches for this pattern”
ii. The pattern to be searched for is trainable
iii. Hundreds to thousands of filters per size
c. Maxpool results
i. Means “max density of a certain pattern”
2. Keras code: just 10 lines
4
Let’s take an intuitive analogy!
(Keyword: Convolution, MaxPool)
5
6
I’m Llama (sample L) I’m Alpaca (sample A)
1. You want to classify whether the animal on the right is a Llama.
7
2. So you find traits of Llama as your filters.
Traits (Filters)
8
3. Then you do a convolution to find the best match of each trait.
Llama Traits (Filters)
0%
70%
10%
Filters finding traits
(by Convolution)
70%
Best match of 1st trait
is 70% (by MaxPool)
10%
5%
9
Llama Traits (Filters)
0%
10%
80%
Filters finding traits
(by Convolution)
80%
Best match of 2nd trait
is 80% (by MaxPool)
60%
15%
3. Then you do a convolution to find the best match of each trait.
10
Llama Traits (Filters)
0%
10%
10%
Filters finding traits
(by Convolution)
60%
Best match of 3rd trait
is 60% (by MaxPool)
40%
60%
3. Then you do a convolution to find the best match of each trait.
11
4. Finally, you have similarities for each trait (features!)
Llama Traits (Filters)
70%
80%
60%
12
5. Make the final decision (model) out of your features.
(In a neural network, a multi-layer perceptron is the simplest.)
Llama Traits (Filters)
70%
80%
60%
M
Any classifier you love!
Conv & MaxPool
Now that you've got the idea,
let's dive into a bit more detail.
13
TextCNN: what is a 1d convolution?
14
Let’s talk convolutions!
10*3 + 20*0 + 30*1 + 40*2 = 140
Text Data Filter
TextCNN: what is a 1d convolution?
15
Let’s talk convolutions!
10*3 + 20*0 + 30*1 + 40*2 = 140
10*0 + 20*0 + 30*0 + 40*0 = 0
Text Data Filter
TextCNN: what is a 1d convolution?
16
Let’s talk convolutions!
10*3 + 20*0 + 30*1 + 40*2 = 140
10*0 + 20*0 + 30*0 + 40*0 = 0
10*0 + 20*2 + 30*0 + 40*0 = 40
Text Data Filter
TextCNN: what is a 1d convolution?
17
Let’s talk convolutions!
10*3 + 20*0 + 30*1 + 40*2 = 140
10*0 + 20*0 + 30*0 + 40*0 = 0
10*0 + 20*2 + 30*0 + 40*0 = 40
10*0 + 20*0 + 30*0 + 40*0 = 0
Note: you can specify an activation function in Conv1D to shape the output;
ReLU seems to be the common choice in TextCNN (see the sketch below).
Text Data Filter
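To make the sliding-window arithmetic above concrete, here is a minimal NumPy sketch. Only the first window reproduces the slide's numbers; the rest of the toy sequence is made up for illustration.

```python
import numpy as np

# Toy 1-D "text data" and one filter. The first window reproduces the slide's
# arithmetic (10*3 + 20*0 + 30*1 + 40*2 = 140); the tail values are made up.
data = np.array([10., 20., 30., 40., 0., 20., 0.])
filt = np.array([3., 0., 1., 2.])

# Slide the filter over the data: one dot product per window.
conv = np.array([data[i:i + len(filt)] @ filt
                 for i in range(len(data) - len(filt) + 1)])
print(conv)                      # [140. 100. 130. 140.]

# ReLU, then max-pool over the whole sequence: the single "best match" score
# this filter contributes as a feature.
feature = np.maximum(conv, 0.).max()
print(feature)                   # 140.0
```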
1. The idea is similar to n-gram BoW/TF-IDF, but:
a. No exact match is needed; words with similar
meanings also contribute.
b. Weighted across tokens/dimensions.
c. Max-pooling means only the best-matched pattern
has the final effect.
2. It folds
a. training the embedding
b. training the feature finder
c. into your supervised learning process,
dedicated to your data.
3. Whereas most traditional feature finding & extraction
is an unsupervised learning process.
TextCNN: what is a 1d convolution?
18
Text Data Filter
Now you know the details.
Let’s return to the full picture!
19
TextCNN: architecture
1. Architecture
a. Embedding
i. Trainable, or not (if using pretrained)
b. 1d convolution filters
i. Each filter “searches for this pattern”
ii. The pattern to be searched for is trainable
iii. Hundreds to thousands of filters per size
c. Maxpool results
i. Means “max density of a certain pattern”
2. Keras code: just 10 lines
20
1. Input: 140 is the max word count per document.
2. 60k is the English vocabulary size.
3. 300 is a conventional embedding size.
4. Region sizes = [2, 3, 4], as in the example figure.
5. Filters = 2 per size, as in the example figure.
6. Dropout = 0.5 because Hinton said so.
(A minimal Keras sketch of this setup follows below.)
TextCNN: code detail
21
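The slide shows the Keras code as an image; below is a minimal sketch assembled from the hyperparameters above. Layer choices, names, and the sigmoid output are assumptions for illustration, not the exact production code.

```python
from tensorflow.keras import layers, models

max_len, vocab_size, embed_dim = 140, 60000, 300     # values from the bullets above
region_sizes, n_filters, dropout_rate = [2, 3, 4], 2, 0.5

inp = layers.Input(shape=(max_len,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim)(inp)   # optionally seeded with pretrained vectors
pooled = [layers.GlobalMaxPooling1D()(               # "best match" per filter
              layers.Conv1D(n_filters, k, activation="relu")(emb))
          for k in region_sizes]
x = layers.Dropout(dropout_rate)(layers.Concatenate()(pooled))
out = layers.Dense(1, activation="sigmoid")(x)       # binary accept/reject (assumed)
model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```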
Text_1
(ex: resume)
Advanced trick: concatenate channels from multiple features
22
MLP
...
?
[Output stage]
sigmoid, softmax,
linear… according to
your response type.
textcnn_vec1
textcnn_vec2
Text_2
(ex: jobTitle)
Your Fancy Feature Engineering
traditional features
Advanced trick: then concatenate the regular features you love (see the sketch below)
23
MLP
...
?
[Output stage]
sigmoid, softmax,
linear… according to
your response type.
textcnn_vec1
textcnn_vec2
+
Text_1
(ex: resume)
Text_2
(ex: jobTitle)
Others
(ex: sex/lang)
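A sketch of how the two advanced tricks above might look with the Keras functional API. The helper, input lengths, and feature sizes are assumptions for illustration.

```python
from tensorflow.keras import layers, models

def textcnn_branch(max_len, vocab_size=60000, embed_dim=300,
                   region_sizes=(2, 3, 4), n_filters=2):
    """Embedding -> Conv1D + max-pool per region size -> one text vector."""
    inp = layers.Input(shape=(max_len,), dtype="int32")
    emb = layers.Embedding(vocab_size, embed_dim)(inp)
    pooled = [layers.GlobalMaxPooling1D()(
                  layers.Conv1D(n_filters, k, activation="relu")(emb))
              for k in region_sizes]
    return inp, layers.Concatenate()(pooled)

text1_in, textcnn_vec1 = textcnn_branch(max_len=140)  # Text_1, e.g. resume
text2_in, textcnn_vec2 = textcnn_branch(max_len=20)   # Text_2, e.g. jobTitle (length assumed)
other_in = layers.Input(shape=(5,))                   # traditional features, e.g. sex/lang (size assumed)

x = layers.Concatenate()([textcnn_vec1, textcnn_vec2, other_in])
x = layers.Dense(64, activation="relu")(x)            # the MLP stage
out = layers.Dense(1, activation="sigmoid")(x)        # sigmoid / softmax / linear per response type
model = models.Model([text1_in, text2_in, other_in], out)
```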
Why should you try
Deep Learning?
24
● Does much better than n-grams
○ Embeddings => finer resolution than the word level
○ Words with similar meanings also contribute
○ Feature extraction is included in the supervised learning on your data.
● No manual feature extraction
○ No need to reproduce feature extraction in the deployment (Java) domain.
○ Feature extraction from text data can be computationally expensive:
■ Dictionary-based features are slow for large sample sizes
■ Model-based features like NMF or LDA are super slow.
● Pretrained embeddings give you a boost (word2vec, GloVe, FastText):
○ The idea is like transfer learning.
○ Your embedding can then be fine-tuned further to fit your dataset (see the sketch below).
Better features = Better performance
25
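A minimal sketch of loading pretrained vectors into a trainable Embedding layer. The GloVe file, corpus, and sizes are assumptions.

```python
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer

texts = ["replace with your review / Q&A corpus"]        # placeholder corpus
tokenizer = Tokenizer(num_words=60000)
tokenizer.fit_on_texts(texts)

vocab_size, embed_dim = 60000, 300
embedding_matrix = np.zeros((vocab_size, embed_dim))
with open("glove.6B.300d.txt", encoding="utf8") as f:    # file path is an assumption
    for line in f:
        word, *vec = line.rstrip().split(" ")
        idx = tokenizer.word_index.get(word)
        if idx is not None and idx < vocab_size:
            embedding_matrix[idx] = np.asarray(vec, dtype="float32")

# trainable=True lets the supervised task keep fine-tuning the pretrained vectors;
# set trainable=False to freeze them instead.
emb_layer = layers.Embedding(vocab_size, embed_dim,
                             weights=[embedding_matrix], trainable=True)
```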
● Customize your model according to your data/purpose.
○ Switch the output layer / activation function for different response types / ranges.
○ Customize the architecture according to your data's characteristics.
○ Mess around with hidden layers / dropout / different activation functions.
● Merits as an online model (SGD over BGD)
○ Sustainable model: old model + new samples = a new model that keeps the old experience! (see the sketch below)
○ Memory friendly: no need to load all samples in memory; choose a mini-batch size that fits
your usage.
● GPUs speed you up!
Customizable / Reusability!
26
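The "old model + new samples" point is plain incremental fitting; a sketch, assuming a saved Keras model and freshly labeled mini-batches (paths and shapes are placeholders).

```python
import numpy as np
from tensorflow.keras import models

# X_new: padded token ids of the newly moderated samples; y_new: their labels.
X_new = np.zeros((256, 140), dtype="int32")          # placeholder shapes only
y_new = np.zeros((256,), dtype="float32")

model = models.load_model("textcnn_reviews.h5")      # path is an assumption
model.fit(X_new, y_new, batch_size=128, epochs=1)    # mini-batches keep memory bounded
model.save("textcnn_reviews.h5")                     # the new model keeps the old experience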
Deployment: TensorFlow Java API (doc)
27
1. Load protobuf model
2. Input tokenized text
3. Get prediction results
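Before the Java API can load anything, the trained Keras model has to be exported as a protobuf on the Python side. A sketch using TF 1.x-era tooling; tensor names and the export path are assumptions. The Java side then loads it with SavedModelBundle.load(exportDir, "serve"), feeds the tokenized text ids, and fetches the prediction tensor.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

# `model` is the trained Keras TextCNN from the earlier sketch; tensor names and
# the export path are assumptions.
export_dir = "export/textcnn/1"
tf.saved_model.simple_save(
    K.get_session(), export_dir,
    inputs={"tokens": model.input},
    outputs={"score": model.output},
)
```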
● Big network => expensive computing power
○ The embedding layer is expensive, since it's a huge fully-connected layer.
○ Reference: prediction throughput is ~750/sec for our deployed model (without GPU).
● Larger model
○ Both models we're deploying are ~100 MB with 13M total/trainable parameters, while a simple
tree-based model could be < 10 MB.
○ In our case, it takes 4.5 hours to train 1 epoch on 1.2M reviews, with 8 CPUs / 32 GB RAM.
● Solutions for throughput:
○ GPU acceleration
○ Predict in parallel in production
○ Use a coarse model to narrow down the search space; only use the fine-grained model to sort
promising candidates (see the sketch below).
No free lunch, it will cost you ...
28
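One way the coarse-to-fine idea could look in serving code; the model objects and thresholds are illustrative assumptions, not our production logic.

```python
def moderate(batch, cheap_model, textcnn, accept_at=0.9, skip_below=0.2):
    """Coarse-to-fine serving: the cheap model screens everything; the expensive
    TextCNN only re-scores the uncertain middle band. Thresholds are illustrative."""
    coarse = cheap_model.predict(batch)
    decisions = []
    for item, p in zip(batch, coarse):
        if p >= accept_at:
            decisions.append("auto_accept")
        elif p <= skip_below:
            decisions.append("needs_human")          # we do not auto-reject (see slide 34)
        else:
            fine = textcnn.predict([item])[0]        # expensive model, run on fewer items
            decisions.append("auto_accept" if fine >= accept_at else "needs_human")
    return decisions
```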
Acme UGC moderation
29
Preliminary
● User-generated content (UGC) is a valuable asset at Acme.
● Bad UGC will ruin the user experience, or get us sued.
● Today we're talking about the models moderating Reviews and Q&A.
30
Acme reviews classifier: ROC curve
31
Old model: ROC = 0.696
New model: ROC = 0.843
Somehow the old model tends to be overconfident, pushing samples toward 1/0.
Acme reviews classifier: class distribution
32
Since the NN's prediction distribution is smooth, the precision-recall curve thresholds are smoothly
scattered across the whole range, too.
Acme reviews classifier
33
● Target: auto-accept 80% of user content, since we don't want humans moderating any more than that.
● Currently NOT auto-rejecting anything, since stakeholders asked for that.
New model: auto-accepting 80%, 82% class-1 precision
Old model: auto-accepting 80%, 74% class-1 precision
Acme reviews classifier
34
[Class distribution: Bad 33%, Good 66%]
(0.82 − 0.74) / (1 − 0.74) ≈ 30% less
bad content being approved
Acme Q&A answers classifier
35
Old model: ROC = 0.668
New model: ROC = 0.844
The old model's predictions seem truncated for some reason.
Acme Q&A answers classifier
36
Acme Q&A answers classifier
37
Since the NN's prediction distribution is smooth, the precision-recall curve thresholds are smoothly
scattered across the whole range, too.
● Target: auto-accept 80% of user content, since we don't want humans moderating any more than that.
● Currently NOT auto-rejecting anything, since stakeholders asked for that.
Auto-moderation volume: 377% of the old model's (68% vs 18%)
Precision improved by 7 points
New model: auto-accepting 68%, 90% class-1 precision
Old model: auto-accepting 18%, 83% class-1 precision
Acme Q&A answers classifier
38
[Class distribution charts: Bad 30%, Good 66%, Good 70%]
● LGB performance plateaued at 0.77 with ~350k training samples
● TextCNN performance plateaued at 0.83 with 1.2M training samples
● LSTM seems like it might still be growing past 0.83, but we don't have more samples.
Learning Curve (QnA Invalidator)
39
Thanks for your time!
40
