TextCNN on
Acme UGC moderation
Marsan Ma
2018.7.18
Outline
● Things you may be interested in
○ TextCNN
○ Why you should try deep learning on text data
● On a real product
○ Acme review / Q&A moderation
○ Network spec
TextCNN
TextCNN: architecture
1. Architecture
a. Embedding
i. Trainable, or not (if using pretrained)
b. 1d convolution filters
i. Each conv “searches for this pattern”
ii. The pattern to be searched for is trainable
iii. Hundreds/thousands of filters per size
c. Maxpool results
i. Means “max density of a certain pattern”
2. Keras code: just 10 lines
Let’s take an intuitive analogy!
(Keywords: Convolution, MaxPool)
I’m Llama (sample L) I’m Alpaca (sample A)
1. You want to classify whether the right animal is a llama.
2. So you pick traits of a llama as your filters.
Traits (Filters)
3. Then you do a convolution to find the max match of each trait.
Llama Traits (Filters): filters find traits by Convolution, then MaxPool keeps the best match.
● 1st trait: matches of 0%, 70%, 10%, 10%, 5%; best match is 70% (by MaxPool)
● 2nd trait: matches of 0%, 10%, 80%, 60%, 15%; best match is 80% (by MaxPool)
● 3rd trait: matches of 0%, 10%, 10%, 40%, 60%; best match is 60% (by MaxPool)
4. Finally, you have similarities for each trait (features!)
Llama Traits (Filters): 70%, 80%, 60%
5. Make the final decision (model) out of your features.
(In a neural network, a multi-layer perceptron is the simplest choice.)
[Figure: Conv & MaxPool turn the Llama Traits (Filters) into features of 70%, 80%, 60%, which feed any classifier you love!]
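The analogy can be sketched in a few lines of Python. The match scores are the percentages from the llama slides; the order of positions within each trait is an assumption, and MaxPool does not depend on it.

```python
# Match scores of each llama trait at five positions, taken from the slides.
# The position order is assumed; max pooling ignores it anyway.
match_scores = {
    "trait_1": [0.00, 0.70, 0.10, 0.10, 0.05],
    "trait_2": [0.00, 0.10, 0.80, 0.60, 0.15],
    "trait_3": [0.00, 0.10, 0.10, 0.40, 0.60],
}


def max_pool(scores):
    """Keep only the best match of one trait, wherever it occurred."""
    return max(scores)


# One number per trait: this is the feature vector handed to the classifier.
features = [max_pool(scores) for scores in match_scores.values()]
print(features)  # [0.7, 0.8, 0.6]
```

Where a trait matched is thrown away; only how well it matched survives.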
Now you’ve got the idea,
let’s dive into a bit more detail.
TextCNN: what is a 1d convolution?
Let’s talk in convolutions!
[Figure: the Filter slides over the Text Data; each position gives one sum of products]
10*3 + 20*0 + 30*1 + 40*2 = 140
10*0 + 20*0 + 30*0 + 40*0 = 0
10*0 + 20*2 + 30*0 + 40*0 = 40
10*0 + 20*0 + 30*0 + 40*0 = 0
Note: you can specify an activation function in Conv1D to adjust the output; people seem to use relu in TextCNN.
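The same arithmetic in plain Python, as a minimal sketch. The filter weights and the four windows are read off the sums on the slide; how the windows are cut from the text (stride, layout) is not recoverable here, so they are listed directly as an assumption.

```python
# Filter weights and text windows, read off the slide's arithmetic.
# How the windows are cut from the text is an assumption, so they are given as-is.
filter_weights = [10, 20, 30, 40]
windows = [
    [3, 0, 1, 2],
    [0, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
]


def dot(weights, window):
    """Sum of element-wise products: one convolution output."""
    return sum(w * x for w, x in zip(weights, window))


def relu(value):
    """The activation commonly used in TextCNN."""
    return max(0, value)


conv_out = [relu(dot(filter_weights, window)) for window in windows]
pooled = max(conv_out)  # MaxPool over all positions

print(conv_out)  # [140, 0, 40, 0]
print(pooled)    # 140
```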
1. The idea is similar to n-gram BoW/TF-IDF, but:
a. No exact match is needed; words with similar meanings also contribute.
b. Matches are weighted across tokens/dimensions.
c. Maxpool lets only the best-matched pattern affect the final result.
2. It includes
a. training the embedding and
b. training the feature finder
c. in your supervised learning process, dedicated to your data.
3. Most traditional feature finding & extraction, by contrast, is actually an unsupervised learning process.
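A toy comparison of exact matching and convolution over embeddings. The 2-d embeddings and the filter below are made up for illustration; in a real model both would be learned.

```python
# Hypothetical 2-d embeddings, made up for illustration only.
embedding = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.9, 0.1],
    "movie": [0.0, 1.0],
}

# A region-size-1 filter assumed to have learned the direction of "good".
pattern = [1.0, 0.0]


def exact_match(tokens, keyword):
    """Bag-of-words style feature: 1 only if the exact keyword appears."""
    return 1 if keyword in tokens else 0


def conv_maxpool(tokens, weights):
    """Dot each token embedding with the filter, then keep the best match."""
    scores = [sum(w * x for w, x in zip(weights, embedding[t])) for t in tokens]
    return max(scores)


doc = ["great", "movie"]
print(exact_match(doc, "good"))    # 0
print(conv_maxpool(doc, pattern))  # 0.8
```

The keyword feature misses "great" entirely, while the filter still fires because the two embeddings point in a similar direction.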
Now you know the details.
Let’s return to the full picture!
TextCNN: architecture
1. Architecture
a. Embedding
i. Trainable, or not (if using pretrained)
b. 1d convolution filters
i. Each conv “searches for this pattern”
ii. The pattern to be searched for is trainable
iii. Hundreds/thousands of filters per size
c. Maxpool results
i. Means “max density of a certain pattern”
2. Keras code: just 10 lines
TextCNN: code detail
1. Input: 140 as the max word count of a doc.
2. 60k as the English dictionary size.
3. 300 is the conventional embedding size.
4. region size=[2,3,4], as in the example figure.
5. filters=2 per region size, as in the example figure.
6. dropout=0.5, because Hinton said so.
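A minimal Keras sketch using the settings listed above. It is an illustration under those settings, not the exact code from the talk; the sigmoid output, the adam optimizer and the binary cross-entropy loss are assumptions. TensorFlow is imported only inside `build_textcnn`, so the hand-computed parameter count runs without it.

```python
MAX_LEN = 140             # max word count of a doc
VOCAB_SIZE = 60_000       # English dictionary size
EMBED_DIM = 300           # embedding size
REGION_SIZES = [2, 3, 4]  # convolution widths, in tokens
NUM_FILTERS = 2           # filters per region size
DROPOUT = 0.5


def build_textcnn():
    """Build the model. TensorFlow is imported here so the rest runs without it."""
    from tensorflow.keras import Model, layers

    tokens = layers.Input(shape=(MAX_LEN,), dtype="int32")
    embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
    pooled = []
    for size in REGION_SIZES:
        conv = layers.Conv1D(NUM_FILTERS, size, activation="relu")(embedded)
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    features = layers.Dropout(DROPOUT)(layers.Concatenate()(pooled))
    # Assumed binary response; swap the output layer for other response types.
    output = layers.Dense(1, activation="sigmoid")(features)
    model = Model(tokens, output)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


def count_params():
    """Count trainable weights by hand, layer by layer."""
    embedding = VOCAB_SIZE * EMBED_DIM
    convs = sum(size * EMBED_DIM * NUM_FILTERS + NUM_FILTERS for size in REGION_SIZES)
    dense = len(REGION_SIZES) * NUM_FILTERS + 1
    return embedding + convs + dense


print(count_params())  # 18005413
```

Almost all of the parameters sit in the embedding table; the convolution filters and the output layer are tiny by comparison.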
Advanced trick: concatenate channels from multiple features
[Figure: Text_1 (ex: resume) → textcnn_vec1 and Text_2 (ex: jobTitle) → textcnn_vec2 are concatenated and fed to an MLP, then the output stage]
Advanced trick: then concatenate regular features you love
[Figure: textcnn_vec1 + textcnn_vec2 + traditional features (Others, ex: sex/lang, through Your Fancy Feature Engineering) are concatenated and fed to an MLP, then the output stage]
[Output stage] sigmoid, softmax, linear… according to your response type.
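What the MLP sees after concatenation, as a small sketch. The vector values and the `one_hot` helper are hypothetical; in Keras the same step would typically be a `Concatenate` layer over the pooled outputs and the extra inputs.

```python
def one_hot(value, choices):
    """Encode a categorical feature as a 0/1 vector."""
    return [1.0 if value == choice else 0.0 for choice in choices]


# Hypothetical TextCNN outputs for two text fields (values made up).
textcnn_vec1 = [0.7, 0.8, 0.6]  # from Text_1, e.g. resume
textcnn_vec2 = [0.2, 0.9]       # from Text_2, e.g. jobTitle

# A traditional feature, e.g. language, encoded by hand.
lang_vec = one_hot("en", ["en", "fr", "de"])

# The MLP simply receives one long vector.
mlp_input = textcnn_vec1 + textcnn_vec2 + lang_vec
print(len(mlp_input))  # 8
```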
Why you should try
Deep Learning?
Better features = Better performance
● Does much better than n-grams
○ Embedding => finer resolution than word level
○ Words with similar meanings also contribute
○ Feature extraction becomes part of supervised learning on your data.
● No manual feature extraction
○ No need to reproduce feature extraction in the deployment (Java) domain.
○ Feature extraction from text data can be computationally expensive:
■ Dictionary-based features are slow for large sample sizes
■ Model-based features like NMF and LDA are super slow.
● Pretrained embeddings give you a boost (word2vec, GloVe, FastText):
○ The idea is like transfer learning.
○ Your embedding layer can then be fine-tuned further to fit your dataset.
Customizable / Reusability!
● Customize your model according to your data/purpose.
○ Switch the output layer / activation function for different response types / ranges.
○ Customize the architecture according to your data characteristics.
○ Mess around with hidden layers / dropout / different activation functions.
● Merit as an online model (SGD over BGD)
○ Sustainable model: old model + new samples = new model having old experience!
○ Memory friendly: no need to load all samples into memory; choose a mini-batch size fitting your usage.
● GPUs speed you up!
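"Old model + new samples = new model" can be sketched with a tiny logistic regression trained by mini-batch gradient steps. The toy data, the learning rate and the helper names are assumptions for illustration; the point is only that training continues from the old weights.

```python
import math


def predict(weights, x):
    """Logistic regression prediction for one sample."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, samples):
    """Average binary cross-entropy over the samples."""
    eps = 1e-12
    total = 0.0
    for x, y in samples:
        p = predict(weights, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)


def sgd_update(weights, batch, lr=0.5):
    """One mini-batch gradient step; returns the new weights."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        err = predict(weights, x) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(batch)
    return [w - lr * g for w, g in zip(weights, grad)]


# Toy data, made up: ([bias term, feature], label).
old_samples = [([1.0, 2.0], 1), ([1.0, -2.0], 0), ([1.0, 1.5], 1), ([1.0, -1.0], 0)]
new_samples = [([1.0, 3.0], 1), ([1.0, -3.0], 0)]

weights = [0.0, 0.0]
for _ in range(20):
    weights = sgd_update(weights, old_samples)  # the "old model"
loss_before = log_loss(weights, new_samples)

for _ in range(20):
    weights = sgd_update(weights, new_samples)  # continue from the old weights
loss_after = log_loss(weights, new_samples)
```

Only one mini-batch is needed in memory at a time, and the second loop starts from what the first one learned rather than from scratch.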
Deployment: TensorFlow Java API (doc)
1. Load the protobuf model
2. Input tokenized text
3. Get prediction results
No free lunch, it will cost you ...
● Big network => expensive computing power
○ The embedding layer is expensive, since it acts like a huge fully-connected layer over the vocabulary.
○ Reference: prediction throughput is ~750/sec for our deployed model (w/o GPU).
● Larger model
○ Both models we’re deploying are 100 MB with 13M total/trainable parameters, while a simple tree-based model could be < 10 MB.
○ In our case, it takes 4.5 hours to train 1 epoch on 1.2M reviews, with 8 CPUs / 32 GB RAM.
● Solutions for throughput:
○ GPU acceleration
○ Run predictions in parallel in production
○ Use a coarse model to narrow down the search space, and use the fine-grained model only to sort promising candidates.
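A sketch of the coarse-then-fine idea. Both scoring functions are hypothetical stand-ins (they just read precomputed numbers from made-up documents); the counter shows how rarely the expensive model is called.

```python
calls = {"fine": 0}


def coarse_score(doc):
    """Stand-in for a cheap model, e.g. a small tree-based one."""
    return doc["cheap_signal"]


def fine_score(doc):
    """Stand-in for the expensive model; counts how often it is called."""
    calls["fine"] += 1
    return doc["true_quality"]


def rank(docs, keep=2):
    """The coarse model narrows the candidates; the fine model sorts the survivors."""
    shortlist = sorted(docs, key=coarse_score, reverse=True)[:keep]
    return sorted(shortlist, key=fine_score, reverse=True)


# Made-up documents with precomputed scores.
docs = [
    {"id": "a", "cheap_signal": 0.9, "true_quality": 0.6},
    {"id": "b", "cheap_signal": 0.8, "true_quality": 0.9},
    {"id": "c", "cheap_signal": 0.2, "true_quality": 0.1},
    {"id": "d", "cheap_signal": 0.1, "true_quality": 0.3},
]
ranked = rank(docs)
print([doc["id"] for doc in ranked])  # ['b', 'a']
print(calls["fine"])                  # 2
```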
Acme UGC moderation
Preliminary
● User generated content (UGC) is a valuable asset at Acme.
● Bad UGC will ruin the user experience, or get us sued.
● Today we’re talking about models that moderate Reviews and Q&A.
Acme reviews classifier: ROC curve
Old model
ROC AUC=0.696
New model
ROC AUC=0.843
Somehow the old model tends to be overconfident, pushing samples toward 1 or 0.
Acme reviews classifier: class distribution
Since the NN’s prediction distribution is smooth, the thresholds on the precision-recall curve are spread smoothly over the whole range, too.
Acme reviews classifier
● Target: auto-accept 80% of user content, since we don’t want more human moderation.
● Currently NOT auto-rejecting anything, since stakeholders asked so.
New model
Auto-accepting 80%
82% class-1 precision
Old model
Auto-accepting 80%
74% class-1 precision
Acme reviews classifier
Bad: 33%, Good: 66%
(0.82 - 0.74) / (1 - 0.74) ≈ 0.31, so about 30% less bad content being approved
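The two numbers on these slides can be computed as follows. The scores and labels are made up; `precision_at_accept_rate` and `bad_content_reduction` are hypothetical helper names, not part of the deployed system.

```python
def precision_at_accept_rate(scores, labels, accept_rate):
    """Auto-accept the top `accept_rate` share by score; return class-1 precision."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_accept = int(len(ranked) * accept_rate)
    accepted = ranked[:n_accept]
    return sum(label for _, label in accepted) / n_accept


def bad_content_reduction(old_precision, new_precision):
    """Relative drop in bad (class-0) content among auto-accepted items."""
    return (new_precision - old_precision) / (1 - old_precision)


# Made-up scores and labels (1 = good content) for ten items.
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]
labels = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

print(precision_at_accept_rate(scores, labels, 0.8))  # 0.75
print(round(bad_content_reduction(0.74, 0.82), 3))    # 0.308
```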
Acme Q&A answers classifier
Old model
ROC AUC=0.668
New model
ROC AUC=0.844
The old model’s predictions seem truncated for some reason.
Acme Q&A answers classifier
Since the NN’s prediction distribution is smooth, the thresholds on the precision-recall curve are spread smoothly over the whole range, too.
● Target: auto-accept 80% of user content, since we don’t want more human moderation.
● Currently NOT auto-rejecting anything, since stakeholders asked so.
Auto-moderated share: 18% → 68% (about 3.8x)
Precision improved by 7 points (83% → 90%)
New model
Auto-accepting 68%
90% class-1 precision
Old model
Auto-accepting 18%
83% class-1 precision
Acme QnA answers classifier
Bad
30%
Good
66%
Good
70%
Learning Curve (QnA Invalidator)
● LGB performance got stuck at 0.77 at ~350k training samples
● TextCNN performance got stuck at 0.83 at 1.2M training samples
● LSTM might still be improving beyond 0.83, but we don’t have more samples.
Thanks for your time!


  • 1. TextCNN on Acme UGC moderation. Marsan Ma, 2018.7.18
  • 2. Outline ● Stuff you are interested in ○ TextCNN ○ Why you should try deep learning on text data ● On a real product ○ Acme review / Q&A moderation ○ Network spec
  • 4. TextCNN: architecture 1. Architecture a. Embedding i. Trainable, or not (if using pretrained) b. 1d convolution filters i. Each conv “searches for this pattern” ii. Pattern to be searched is trainable iii. Hundreds/thousands of filters per size c. Maxpool results i. Means “max density of a certain pattern” 2. Keras code: just 10 lines
  • 5. Let’s take an intuitive analogy! (Keywords: Convolution, MaxPool)
  • 6. 1. You want to classify whether the right animal is a llama. “I’m Llama” (sample L), “I’m Alpaca” (sample A)
  • 7. 2. So you find traits of the llama to use as your filters. Traits (Filters)
  • 8. 3. Then you do a convolution to find the max match of each trait. Llama traits (filters), filters finding traits (by convolution): 0%, 70%, 10%, 10%, 5%. Best match of the 1st trait is 70% (by MaxPool).
  • 9. 3. Then you do a convolution to find the max match of each trait. Llama traits (filters), filters finding traits (by convolution): 0%, 10%, 80%, 60%, 15%. Best match of the 2nd trait is 80% (by MaxPool).
  • 10. 3. Then you do a convolution to find the max match of each trait. Llama traits (filters), filters finding traits (by convolution): 0%, 10%, 10%, 40%, 60%. Best match of the 3rd trait is 60% (by MaxPool).
  • 11. 4. Finally, you have a similarity for each trait (features!). Llama traits (filters): 70%, 80%, 60%
  • 12. 5. Make the final decision (model) out of your features. (In a neural network, a multi-layer perceptron is the simplest.) Llama traits (filters): 70%, 80%, 60% → M: any classifier you love!
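As a rough sketch of the analogy in slides 8 to 11 (not code from the talk), the snippet below max-pools the per-trait match scores into one feature vector. Grouping the percentages into five match positions per trait is read off the slide text and is an assumption; `trait_matches` and `max_pool` are hypothetical names.

```python
# Match scores (in %) of each llama trait at five positions of the picture.
# Assumption: the five-position grouping is inferred from slides 8-10.
trait_matches = {
    "trait_1": [0, 70, 10, 10, 5],
    "trait_2": [0, 10, 80, 60, 15],
    "trait_3": [0, 10, 10, 40, 60],
}


def max_pool(scores):
    """Keep only the best match of one trait (global max pooling)."""
    return max(scores)


# One number per trait: this is the feature vector handed to the classifier.
features = [max_pool(scores) for scores in trait_matches.values()]
print(features)  # [70, 80, 60]
```

The resulting vector is what step 5 feeds into whichever classifier you prefer.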
  • 13. Conv & MaxPool: now that you’ve got the idea, let’s dive into a bit more detail.
  • 14. TextCNN: what is a 1d convolution? Let’s talk convolutions! (Text data, filter) 10*3 + 20*0 + 30*1 + 40*2 = 140
  • 15. TextCNN: what is a 1d convolution? Let’s talk convolutions! (Text data, filter) 10*3 + 20*0 + 30*1 + 40*2 = 140; 10*0 + 20*0 + 30*0 + 40*0 = 0
  • 16. TextCNN: what is a 1d convolution? Let’s talk convolutions! (Text data, filter) 10*3 + 20*0 + 30*1 + 40*2 = 140; 10*0 + 20*0 + 30*0 + 40*0 = 0; 10*0 + 20*2 + 30*0 + 40*0 = 40
  • 17. TextCNN: what is a 1d convolution? Let’s talk convolutions! (Text data, filter) 10*3 + 20*0 + 30*1 + 40*2 = 140; 10*0 + 20*0 + 30*0 + 40*0 = 0; 10*0 + 20*2 + 30*0 + 40*0 = 40; 10*0 + 20*0 + 30*0 + 40*0 = 0. Note: you can specify an activation function in Conv1D to adjust the output; ReLU seems to be the common choice in TextCNN.
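The four sums on slide 17 can be reproduced with a few lines of plain Python. This is only a sketch: it treats the constant operand (10, 20, 30, 40) as one side of the dot product and the changing operand as the other, and which of them is the "text data" and which the "filter" in the original figure is an assumption. The helper names `dot` and `relu` are hypothetical.

```python
def dot(a, b):
    """Element-wise multiply and sum: one output value of the convolution."""
    return sum(x * y for x, y in zip(a, b))


def relu(value):
    """ReLU activation: negative responses are clipped to zero."""
    return max(0, value)


constant_operand = [10, 20, 30, 40]
changing_operands = [
    [3, 0, 1, 2],
    [0, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
]

conv_out = [dot(constant_operand, row) for row in changing_operands]
activated = [relu(v) for v in conv_out]
pooled = max(activated)  # max pooling keeps only the strongest response

print(conv_out)  # [140, 0, 40, 0]
print(pooled)    # 140
```

With all-non-negative responses ReLU changes nothing here; it only matters once a filter produces negative matches.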
  • 18. TextCNN: what is a 1d convolution? 1. The idea is similar to n-gram BoW/TF-IDF, but: a. It doesn’t need an exact match; words with similar meanings also contribute. b. Matches are weighted across tokens/dimensions. c. Maxpool lets only the best-matching pattern affect the final result. 2. It includes a. training the embedding and b. training the feature finder c. in your supervised learning process, dedicated to your data. 3. Most traditional feature finding and extraction, by contrast, is actually an unsupervised learning process.
  • 19. Now you know the details. Let’s return to the full picture!
  • 20. TextCNN: architecture 1. Architecture a. Embedding i. Trainable, or not (if using pretrained) b. 1d convolution filters i. Each “searches for this pattern” ii. Pattern to be searched is trainable iii. Hundreds/thousands of filters per size c. Maxpool results i. Means “max density of a certain pattern” 2. Keras code: just 10 lines
  • 21. TextCNN: code detail 1. Input: 140 is the max word count of a doc. 2. 60k is the English dictionary size. 3. 300 is the conventional embedding size. 4. Region sizes = [2,3,4], as in the example figure. 5. Filters = 2, as in the example figure. 6. Dropout = 0.5, because Hinton said so.
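To make the numbers on slide 21 concrete, here is a small shape and parameter walk-through in plain Python. It is a sketch under several assumptions that the slide does not state: "valid" (no padding) convolutions, 2 filters per region size, a bias term per filter, and a single-unit output layer. The function name and dictionary keys are hypothetical, and this is not the Keras code from the talk.

```python
MAX_WORDS = 140        # max word count of a document
VOCAB_SIZE = 60_000    # English dictionary size
EMBED_DIM = 300        # embedding size
REGION_SIZES = [2, 3, 4]
FILTERS_PER_SIZE = 2   # assumption: 2 filters for each region size


def textcnn_summary(max_words, vocab_size, embed_dim, region_sizes,
                    filters_per_size, n_outputs=1):
    """Return output lengths and parameter counts for a basic TextCNN."""
    embedding_params = vocab_size * embed_dim
    # A filter of size k slides over max_words tokens: max_words - k + 1 steps.
    conv_lengths = {k: max_words - k + 1 for k in region_sizes}
    # Each filter has k * embed_dim weights plus one bias.
    conv_params = {k: (k * embed_dim + 1) * filters_per_size
                   for k in region_sizes}
    # Max pooling keeps one value per filter; all of them are concatenated.
    feature_len = filters_per_size * len(region_sizes)
    dense_params = (feature_len + 1) * n_outputs
    total = embedding_params + sum(conv_params.values()) + dense_params
    return {
        "embedding_params": embedding_params,
        "conv_lengths": conv_lengths,
        "conv_params": conv_params,
        "feature_len": feature_len,
        "dense_params": dense_params,
        "total_params": total,
    }


summary = textcnn_summary(MAX_WORDS, VOCAB_SIZE, EMBED_DIM,
                          REGION_SIZES, FILTERS_PER_SIZE)
print(summary["embedding_params"])  # 18000000
print(summary["feature_len"])       # 6
print(summary["total_params"])      # 18005413
```

Under these assumptions almost all parameters sit in the embedding table, which is why the choice between a trainable and a pretrained embedding dominates model size.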
  • 22. Advanced trick: concatenate channels from multiple features. Text_1 (ex: resume) → textcnn_vec1; Text_2 (ex: jobTitle) → textcnn_vec2; both → MLP ... → [Output stage] sigmoid, softmax, linear… according to your response type.
  • 23. Advanced trick: then concatenate the regular features you love. Text_1 (ex: resume) → textcnn_vec1; Text_2 (ex: jobTitle) → textcnn_vec2; Others (ex: sex/lang) → your fancy feature engineering (traditional features); textcnn_vec1 + textcnn_vec2 + traditional features → MLP ... → [Output stage] sigmoid, softmax, linear… according to your response type.
  • 24. Why should you try deep learning?
  • 25. Better features = better performance ● Does much better than n-grams ○ Embedding => better resolution than word level ○ Words with similar meanings also match ○ Feature extraction becomes part of supervised learning on your data. ● No manual feature extraction ○ No need to reproduce feature extraction in the deployment (Java) domain. ○ Feature extraction from text data can be computationally expensive: ■ Dictionary-based features are slow for large sample sizes ■ Model-based features like NMF and LDA are super slow. ● Pretrained embeddings give you a boost (word2vec, GloVe, FastText): ○ The idea is like transfer learning. ○ You can then fine-tune the embedding further to fit your dataset.
  • 26. Customizable / reusable! ● Customize your model according to your data/purpose. ○ Switch the output layer / activation function for different response types / ranges. ○ Customize the architecture according to your data characteristics. ○ Mess around with hidden layers / dropout / different activation functions. ● Merit as an online model (SGD over BGD) ○ Sustainable model: old model + new samples = new model with the old experience! ○ Memory friendly: no need to load all samples in memory; choose a mini-batch size that fits your usage. ● GPUs speed you up!
  • 27. Deployment: TensorFlow Java API (doc) 1. Load the protobuf model 2. Input tokenized text 3. Get prediction results
  • 28. No free lunch, it will cost you ... ● Big network => expensive computing power ○ The embedding layer is expensive, since it’s a huge fully-connected layer. ○ Reference: prediction throughput is ~750/sec for our deployed model (w/o GPU). ● Larger model ○ Both models we’re deploying are 100 MB with 13M total/trainable parameters, while a simple tree-based model could be < 10 MB. ○ In our case, it takes 4.5 hours to train 1 epoch on 1.2M reviews, with 8 CPUs / 32 GB RAM. ● Solutions for throughput: ○ GPU acceleration ○ Run predictions in parallel in production ○ Use a coarse model to narrow down the search space, and only use the fine-grained model to sort promising candidates.
  • 30. Preliminary ● User-generated content (UGC) is a valuable asset at Acme. ● Bad UGC will ruin the user experience, or get us sued. ● Today we’re talking about the models moderating reviews and Q&A.
  • 31. Acme reviews classifier: ROC curve. Old model ROC=0.696; new model ROC=0.843
  • 32. Acme reviews classifier: class distribution. Somehow the old model tends to be overconfident, pushing samples toward 1/0.
  • 33. Acme reviews classifier. Since the NN’s prediction distribution is smooth, the precision-recall curve’s thresholds are spread smoothly over the whole range, too.
  • 34. Acme reviews classifier ● Target: auto-accept 80% of user content, since we don’t want more human moderation. ● Currently NOT auto-rejecting anything, since the stakeholders asked so. Bad 33%, Good 66%. Old model: auto-accepting 80%, 74% class-1 precision. New model: auto-accepting 80%, 82% class-1 precision. (0.82 − 0.74) / (1 − 0.74) ≈ 30% less bad content being approved.
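The "30% less bad content" figure on slide 34 can be checked with a short calculation. The sketch assumes that class 1 means good content, so that 1 minus class-1 precision is the share of auto-accepted content that is actually bad; the variable names are hypothetical.

```python
old_precision = 0.74  # class-1 precision of the old model at 80% auto-accept
new_precision = 0.82  # class-1 precision of the new model at 80% auto-accept

# Share of auto-accepted content that is bad (assuming class 1 = good).
old_bad_share = 1 - old_precision
new_bad_share = 1 - new_precision

# Relative reduction in bad content that slips through.
relative_reduction = (old_bad_share - new_bad_share) / old_bad_share
print(f"{relative_reduction:.1%}")  # 30.8%
```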
  • 35. Acme Q&A answers classifier. Old model ROC=0.668; new model ROC=0.844
  • 36. Acme Q&A answers classifier. The old model’s predictions seem to be truncated for some reason.
  • 37. Acme Q&A answers classifier. Since the NN’s prediction distribution is smooth, the precision-recall curve’s thresholds are spread smoothly over the whole range, too.
  • 38. Acme Q&A answers classifier ● Target: auto-accept 80% of user content, since we don’t want more human moderation. ● Currently NOT auto-rejecting anything, since the stakeholders asked so. Old model: auto-accepting 18%, 83% class-1 precision. New model: auto-accepting 68%, 90% class-1 precision. Auto-moderating 377%; precision improved 7%. Bad 30% Good 66% Good 70%
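The headline numbers on slide 38 appear to follow from the two operating points. Reading "377%" as the ratio of the new to the old auto-accept rate, and "7%" as a difference in percentage points, is an interpretation rather than something the slide states; the variable names are hypothetical.

```python
old_accept, new_accept = 0.18, 0.68        # share of content auto-accepted
old_precision, new_precision = 0.83, 0.90  # class-1 precision

# How many times more content the new model auto-accepts.
accept_ratio = new_accept / old_accept
# Precision gain, expressed in percentage points.
precision_gain_points = (new_precision - old_precision) * 100

print(f"{accept_ratio:.2f}x, +{precision_gain_points:.0f} points")
# 3.78x, +7 points
```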
  • 39. Learning curve (QnA invalidator) ● LGB performance got stuck at 0.77 @ ~350k training samples ● TextCNN performance got stuck at 0.83 @ 1.2M training samples ● LSTM might still be improving beyond 0.83, but we don’t have more samples.
  • 40. Thanks for your time!