Applied Text Classification – Is Deep Learning Business Ready?
Examples from the HR Industry
/ Agenda
• Introduction to Experteer
• Limitations of job search
• Improving job search with ML – our current tech
• Using Deep Learning for text classification and benchmarking
• Summary and next steps
/ Introduction
A little bit about me
• Moved to Munich in 2006 from Sofia, Bulgaria;
• Studied Finance/Statistics/Econometrics at LMU Munich;
• Initial setup of the data science department at Experteer;
• Led a large-scale ML initiative to automate core processes;
• Heading Data Services at Experteer.
/ Experteer Data Services
• Provides ML Services and Data Science for Experteer;
• Offers a range of ML APIs for external HR Tech clients;
• Consulting services for applied machine learning across all industries;
• AI workshops for value-chain optimization across all industries.
/ Introduction to Experteer
Europe’s executive career service
www.experteer.com
/ Traditional Job Search is Bad!
Full-text search is just not enough...
Searching for a “CEO” position on a job board…
…returns “HR Manager” as the first result.
/ Taxonomy fixes full-text search limitations
Experteer delivers better search results with taxonomy filters.
Career Level Filter
/ Our Problem
A very complex, manual data-processing process with exponentially growing costs.
State 2014
• Highly customized, manual processing of job data;
• Team of 80-100 people;
• Hand-picking and classifying jobs (90% left out);
• Extensive job classification taxonomy:
 19 Functions
 631 Industries on 4 levels
 8 Career Levels
 Location, education, company and subsidiary, salary, travel requirements
• 7 languages; 12 countries;
• Major asset: extremely good quality of the positions – 2 million+ hand-classified jobs in multiple languages.
JOB COST: 3€/JOB
/ Job Classification Example
Manager, App Store Program Management
Job Summary
Apple is seeking a Manager for the App Store Program Management Team. This role will lead a team of engineering program managers responsible for end-to-end delivery of App Store features across iOS, macOS, and tvOS platforms.
Key Qualifications
8+ years of professional experience in software program/project/product management
Proven experience in building and managing high performing teams and individuals
Proven track record in managing and deploying large, complex programs
Strong relationship management and facilitation skills both within diverse engineering teams and cross functional organizations
Proven self-starter, who is pro-active and demonstrates creative and critical thinking abilities
Great attention to detail and organized
Excellent written and verbal communication skills
Overall, a highly driven, results-oriented problem solver who will drive programs to deliver value quickly to our customers
Required Experience
- Understanding of mobile software development
- Understanding of server-based software development
- Knowledge of iTunes Connect, App Store, iOS technologies
Description
We are looking for a seasoned manager with a proven track record in program management. This is not just a people manager role, you will need to have hands on experience managing software releases. You will develop tools and processes to gain efficiencies in the build, development, testing, and deployment lifecycle. The role requires a combination of program and release management, strong engineering background, and ability to build collaborative relationships across various teams in Apple. We are looking for someone who loves digging into details, building teams, and driving operational efficiencies under demanding timeframes. You take responsibility; you feel a personal stake in the product you ship; you communicate responsibilities and scope clearly; you value integrity; you manage risk; you need to know how things work; you work for the success of the entire PM team; you thrive in uncertainty and strive to bring order to it; you have deep wisdom and judgement; you keep your eye on the ball; you build strong relationships; you are aware of politics but do not get mired in them.
Education
BS/MS in Computer Science, Engineering or similar technical field
• “Building teams” – indicator for a people manager;
• Education is an indicator for the function;
• Good indicator for the function selection;
• “Building and managing high performing teams” – a strong indication of a people manager career level;
• Indicates the industry – software companies;
• Indicator for at least a people manager career level;
• Indicator for the function;
• “Manager” is a soft indication and has to be looked at in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
Classification targets: Career Level, Function, Industry
/ Our Solution with Machine Learning
Step 1: Break down the whole value chain into small steps
• Break the whole process down into small steps and start with the low-hanging fruit;
• Linguistic Rules – manage/create rules for extraction/classification;
• QC UI – a user interface for quality-control checks.
/ Our Classification Stack
Step 2: Build a classification pipeline
• Job DB – source of the raw job postings;
• Data cleaning;
• Feature Engineering;
• Base Model – ensemble of linear classifiers (SVM, NB) with (c)BOW and n-grams (sketched below);
• Linguistic Rules – supplemental business-logic rules to improve the model score;
• Confidence Evaluation;
• Random QC Check – re-train models with the hand-checked data and create business-logic rules (if necessary).
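To make the base-model stage concrete, here is a minimal sketch of an ensemble of linear classifiers over bag-of-words/n-gram features. It assumes scikit-learn and a hypothetical job loader; it illustrates the idea, not our production code.

    # Sketch of the base-model stage: linear classifiers (SVM, Naive Bayes) over
    # TF-IDF word n-gram features, combined by soft voting. Illustrative only;
    # load_jobs() is a hypothetical helper returning (texts, career_level_labels).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.ensemble import VotingClassifier
    from sklearn.pipeline import make_pipeline

    # LinearSVC has no predict_proba, so we calibrate it; the probabilities also
    # feed the confidence-evaluation step further down the pipeline.
    svm = CalibratedClassifierCV(LinearSVC())
    nb = MultinomialNB()

    base_model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),               # BoW + bigrams
        VotingClassifier([("svm", svm), ("nb", nb)], voting="soft"),
    )

    # texts, labels = load_jobs()        # hypothetical: job texts + career levels
    # base_model.fit(texts, labels)
    # confidence = base_model.predict_proba(["Team Lead Software Engineering ..."]).max()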
/ Goal: Half Cost/Double Jobs
We managed to decrease our cost by more than 50% and increase the output by 3x.
[Chart: monthly live jobs (0–400,000) against cost change in % (0–160%), 2010-01 through 2017-01, annotated “SUCCESS!” and with the unit cost per job.]
/ What is Next? DEEP NEURAL NETWORKS!
Benchmark a bunch of DNNs on our data!
• Traditional models with BoW and n-grams capture the complexity of career levels only partially;
• Deep Neural Networks have recently become very popular for text processing and NLP tasks;
• CNNs have been successfully adapted to computer vision because of the compositional structure of an image;
• Texts have similar properties: characters combine to form words, n-grams, stems, phrases, sentences, etc.;
• CNN-based models achieve very good performance in laboratory settings – what about real-life business problems?
• To our knowledge, no one has used deep neural networks for job classification.
ALL LOOKS GOOD! LET'S TRY IT!
/ Datasets
Overview of the datasets

DATASET 1
622K jobs (title, description).
Average length: 2,203 characters.
Collected over 8 years.
Created by more than 300 people.
Includes datapoints from junior colleagues.

Career Level                   Number of Jobs
Specialist                     189,637
Senior Specialist              283,125
Manager                        109,758
Senior Manager                 26,815
Business Unit Leader           6,748
Managing Director SME          5,132
Managing Director Large Comp   386

DATASET 2
243K jobs (title, description).
Average length: 2,197 characters.
Collected over 6 years.
Created by 80 people.
Excludes junior colleagues.
Only includes jobs reviewed by the QC team.

Career Level                   Number of Jobs
Specialist                     57,844
Senior Specialist              142,771
Manager                        39,870
Senior Manager                 2,897
Business Unit Leader           375
Managing Director SME          152
Managing Director Large Comp   3
/ Tech Setup & Stack
Machines used for training:
Machine 1: Nvidia Titan X, 12 GB
Machine 2: NVIDIA Quadro P6000, 24 GB
256 GB RAM, 24 × 3.0 GHz CPU
Stack: CUDA + PyTorch
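A quick way to verify such a CUDA + PyTorch setup before training (a generic snippet, not specific to our stack):

    # Check that PyTorch sees the GPU and report its memory (generic sanity check).
    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(props.name, round(props.total_memory / 1024**3, 1), "GB")
    else:
        print("No CUDA device visible - training would fall back to CPU")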
/ The Test: Career Level Classification
Example of how a human would read and classify a job for career level
Manager, App Store Program Management
Job Summary
Apple is seeking a Manager for the App Store Program Management Team. This role will lead a team of engineering program managers responsible for end-to-end delivery of App Store features across iOS, macOS, and tvOS platforms.
Key Qualifications
8+ years of professional experience in software program/project/product management
Proven experience in building and managing high performing teams and individuals
Proven track record in managing and deploying large, complex programs
Strong relationship management and facilitation skills both within diverse engineering teams and cross functional organizations
Proven self-starter, who is pro-active and demonstrates creative and critical thinking abilities
Great attention to detail and organized
Excellent written and verbal communication skills
Overall, a highly driven, results-oriented problem solver who will drive programs to deliver value quickly to our customers
Required Experience
- Understanding of mobile software development
- Understanding of server-based software development
- Knowledge of iTunes Connect, App Store, iOS technologies
Description
We are looking for a seasoned manager with a proven track record in program management. This is not just a people manager role, you will need to have hands on experience managing software releases. You will develop tools and processes to gain efficiencies in the build, development, testing, and deployment lifecycle. The role requires a combination of program and release management, strong engineering background, and ability to build collaborative relationships across various teams in Apple. We are looking for someone who loves digging into details, building teams, and driving operational efficiencies under demanding timeframes. You take responsibility; you feel a personal stake in the product you ship; you communicate responsibilities and scope clearly; you value integrity; you manage risk; you need to know how things work; you work for the success of the entire PM team; you thrive in uncertainty and strive to bring order to it; you have deep wisdom and judgement; you keep your eye on the ball; you build strong relationships; you are aware of politics but do not get mired in them.
Education
BS/MS in Computer Science, Engineering or similar technical field
• “Leading a team” is a soft indicator for a manager;
• “Building and managing high performing teams” – a strong indication of a people manager career level;
• Indicator for at least a people manager career level;
• “Building teams” – indicator for a manager;
• “Manager” is a soft indication and has to be looked at in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
/ Starting with VDCNN
Very Deep Convolutional Neural Networks (Conneau et al. 2017)
MOTIVATION:
• NLP tasks are commonly approached with RNNs (particularly LSTMs) and CNNs;
• However, these architectures are rather shallow;
• State-of-the-art computer vision has pioneered DEEP CNNs and greatly profited from such models;
• Builds on “Character-level CNN for Text Classification” by Zhang et al. (2016), which outperforms traditional methods on similar datasets;
• VDCNN operates on the character level – no data preprocessing or augmentation (input encoding sketched below).
Conneau et al. 2017 – https://arxiv.org/pdf/1606.01781.pdf
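For intuition, here is a rough sketch of the character-level input such a model consumes: each character is mapped to an index over a fixed alphabet, the text is padded/truncated to a fixed length, and the result passes through an embedding and stacked temporal convolutions. The alphabet, input length, and layer sizes below loosely follow the paper and are illustrative, not our exact configuration.

    # Character-level encoding and the first convolutional block of a VDCNN-style
    # model (PyTorch). Alphabet, MAX_LEN and layer sizes are illustrative.
    import torch
    import torch.nn as nn

    ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}"
    CHAR2IDX = {c: i + 1 for i, c in enumerate(ALPHABET)}   # index 0 = padding / unknown
    MAX_LEN = 1024                                          # fixed input length

    def encode(text: str) -> torch.Tensor:
        """Map a job text to a fixed-length tensor of character indices."""
        idx = [CHAR2IDX.get(c, 0) for c in text.lower()[:MAX_LEN]]
        idx += [0] * (MAX_LEN - len(idx))                   # right-pad to MAX_LEN
        return torch.tensor(idx)

    embed = nn.Embedding(len(ALPHABET) + 1, 16, padding_idx=0)   # 16-dim char embeddings
    block = nn.Sequential(                                       # one temporal conv block
        nn.Conv1d(16, 64, kernel_size=3, padding=1),
        nn.BatchNorm1d(64),
        nn.ReLU(),
    )

    x = encode("Manager, App Store Program Management ...").unsqueeze(0)  # shape (1, 1024)
    h = block(embed(x).transpose(1, 2))                                   # shape (1, 64, 1024)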
/ VDCNN Test 1: Career Level on the 622K Dataset
Initial test to get a feeling for how well the model abstracts.
• No oversampling;
• All 7 classes are tested;
• 90/10 split.

VDCNN
Dataset Size: 622K
Training Time: 25 hours
GPU: Titan X 12 GB
Layers: 29
Epochs: 9
Accuracy: 73.5%

Confusion Matrix
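The accuracy figures and confusion matrices in these experiments can be reproduced with a standard evaluation on the held-out 10% – a generic sketch, assuming scikit-learn, with toy labels for illustration:

    # Generic evaluation sketch: split, then accuracy and confusion matrix on the
    # held-out 10%. y_true/y_pred below are toy values for illustration.
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, confusion_matrix

    # texts, labels = ...                  # the 622K jobs and their career levels
    # X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.10)

    y_true = ["specialist", "manager", "specialist", "senior specialist"]
    y_pred = ["specialist", "specialist", "specialist", "senior specialist"]
    print(accuracy_score(y_true, y_pred))       # 0.75
    print(confusion_matrix(y_true, y_pred))     # rows = true class, columns = predicted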
/ VDCNN Test 2: Smaller Dataset
We measure performance on our smaller dataset. Classes 5, 6, 7, and 8 are combined.

VDCNN (P6000)
Dataset Size: 243K
Training Time: 28 hours, 39 mins
GPU: P6000 24 GB
Layers: 55
Epochs: 30
Accuracy: 86.68%

VDCNN (Titan X)
Dataset Size: 243K
Training Time: 16 hours, 54 mins
GPU: Titan X 12 GB
Layers: 33
Epochs: 30
Accuracy: 87.99%

Career Level                   Number of Jobs   Class
Specialist                     57,844           Class 1
Senior Specialist              142,771          Class 2
Manager                        39,870           Class 3
Senior Manager                 2,897            Class 4
Business Unit Leader           375              Class 4
Managing Director SME          152              Class 4
Managing Director Large Comp   3                Class 4

• A cleaner (but smaller) dataset.
• Grouping of classes 5-8 as a new class 4.
• Huge jump in performance.
• Still, very long training times.
/ Deeper look into the confusion matrices
Compare the confusion matrices of both models
Confusion Matrix: 622K – 7 Class
Confusion Matrix: 243K – 4 Class
/ Add a Benchmark: FastText
Facebook's open-source library for building scalable solutions for text representation and classification
https://arxiv.org/abs/1607.01759
Let's include some benchmarking!
• Developed by Facebook AI Research;
• Released in 2016 to critical acclaim;
• Scalable across 100Ks of classes thanks to its hierarchical structure;
• Represents sentences as bag of words (BoW) and bag of n-grams;
• Sharing information across classes – what one class learns about a word is shared with all;
• Written in C++, so EXTREMELY FAST (a minimal usage sketch follows).
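For reference, a supervised FastText classifier for this task can be trained in a few lines with the official Python bindings; the file name and hyperparameters here are illustrative, not our exact run.

    # Minimal supervised FastText sketch. train.txt holds one job per line:
    # "__label__manager  <job title and description>"
    import fasttext

    model = fasttext.train_supervised(
        input="train.txt",
        epoch=10,          # 10 epochs, as in our benchmark runs
        wordNgrams=2,      # bag of word bigrams on top of bag of words
        loss="hs",         # hierarchical softmax for fast training
    )
    labels, probs = model.predict("Manager, App Store Program Management ...")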
/ VDCNN vs FastText
Classifying for 7 Classes on the 622K Dataset

FastText
Dataset Size: 622K (7 classes)
Training Time: 5 minutes!
GPU: N.A.
Layers: N.A.
Epochs: 10
Accuracy: 74.1%

VDCNN
Dataset Size: 622K (7 classes)
Training Time: 25 hours
GPU: Titan X 12 GB
Layers: 29
Epochs: 9
Accuracy: 73.5%

FastText outperforms VDCNN slightly at only a fraction of the training time.
/ VDCNN vs FastText
Classifying for 4 Classes on the 243K Dataset

FastText
Dataset Size: 243K (4 classes)
Training Time: 2.5 minutes!
GPU: N.A.
Layers: N.A.
Epochs: 10
Accuracy: 88.1%

VDCNN
Dataset Size: 243K (4 classes)
Training Time: 16 hours, 54 mins
GPU: Titan X 12 GB
Layers: 33
Epochs: 30
Accuracy: 87.99%

FastText again outperforms VDCNN slightly at only a fraction of the training time.
/ HDLTex
Hierarchical Deep Learning for Text Classification
Why are we trying this?
• Specifically developed for datasets with a large corpus (similar to ours);
• Hierarchical classification can also be applied to career levels;
• Outperforms baseline classifiers.
Original paper: https://arxiv.org/abs/1709.08267
Repository: https://github.com/kk7nc/HDLTex
/ HDLTex
Setup of our experiment
• We train German word vectors on our 622K dataset using https://github.com/stanfordnlp/GloVe;
• We train HDLTex on our 243K dataset;
• As we have very few observations in career level 8, we treat levels 7 and 8 as one class;
• Split 90/10;
• We combine our training data as follows (a generic sketch of the hierarchical setup follows the table):

Class     Career Levels                               Class Feature                              Observations
Class 1   Specialist + Senior Specialist              Career levels with no people management    200,615
Class 2   Manager + Senior Manager                    Career levels with people management       42,767
Class 3   Business Unit Leader + Managing Director    Career levels with P&L responsibility      530
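For illustration only (this is not HDLTex's own code), the hierarchical idea can be pictured as a level-1 model that picks one of the three coarse classes above, followed by a level-2 model per branch that resolves the exact career level. The classifiers below are arbitrary stand-ins.

    # Generic two-level hierarchical classification sketch (not the HDLTex implementation).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def make_clf():
        return make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                             LogisticRegression(max_iter=1000))

    level1 = make_clf()                           # predicts Class 1 / 2 / 3 from the table above
    level2 = {"class_1": make_clf(),              # specialist vs. senior specialist
              "class_2": make_clf(),              # manager vs. senior manager
              "class_3": make_clf()}              # business unit leader vs. managing director

    # Fit level1 on coarse labels, then fit each level2 model on its branch's jobs.
    def predict_career_level(text: str) -> str:
        coarse = level1.predict([text])[0]        # route to the right branch...
        return level2[coarse].predict([text])[0]  # ...then resolve the fine-grained level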
/ HDLTex (Layer 1 RNN, Layer 2 CNN)
Results from our experiment

HDLTex
Dataset Size: 243K
Training Time: 10 hours
GPU: P6000 24 GB
Accuracy: 86.8%

FastText
Dataset Size: 243K
Training Time: 2.5 minutes!
GPU: N.A.
Accuracy: 88.0%

HDLTex Confusion Matrix

Training a career level classifier with HDLTex is still not better than FastText and takes longer!
/ Summary
Let's review what we have learned today
• Deep Learning is a major step forward in the classification of documents – both VDCNN and HDLTex outperform our best-practice linear-classifier models;
• Plenty of academic literature and open-source implementations allow data scientists to start testing within a couple of hours;
• However, both deep neural network architectures require long training times, even on powerful GPUs, which makes experimentation hard;
• FastText outperforms all models and can be trained in minutes on a desktop CPU, which allows for easy MVPs and testing;
• Business owners interested in rapid prototyping should definitely explore FastText for text classification before jumping on DNNs.
/ Next Steps
Where we will invest time and effort in the next 2 months
• Further tests with HDLTex, especially for the classification of industries (2-level hierarchy);
• Benchmark FastText in every process where we use linear classifiers and deploy it to production;
• Benchmark Deep Pyramid Convolutional Neural Networks (Johnson and Zhang, 2017) against FastText/VDCNN;
• Analyze predictions from FastText, HDLTex, and VDCNN and explore opportunities for model stacking.
/ Thank you for your attention
Special thanks to our Data Scientist Viet Nguyen, who made all of this possible.
Alexander Chukovski
alexander.chukovski@experteer.com
/ Questions from the Audience
Question 1: Why is FastText so fast compared to Deep Learning?
It is written in C++, and compiled executables are faster than scripting languages.
A hierarchical softmax keeps the output computation cheap even with many classes.
Fast training also comes from simple, proven NLP representations – bag of words and bag of n-grams.
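For intuition (our addition, not part of the original answer): with k output classes and hidden dimension h, the per-example cost of the output layer compares roughly as

    flat softmax:          O(k · h)
    hierarchical softmax:  O(h · log2 k)

which is a large saving when k is big.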
Question 2: How did you get up to speed with machine learning?
In the beginning we had a lot of help from an external firm, “Glanos” in Munich, which helped us build our first ML models and create a production-ready solution.
Most of the research in Deep Learning is freely available as academic papers, and the community is very quick to publish GitHub repositories with the models.
Question 3: The machines that you used are very powerful. Is this a cloud cluster?
No, we had test access to these machines for a short period of time.
Question 4: Can a normal person or a small company actually run Deep Learning? This configuration seems expensive.
You can buy an Nvidia GTX Titan GPU for about €900 on eBay.
The CPU and RAM configuration is not that relevant for Deep Learning, although your RAM should match your GPU RAM.
A normal desktop with a 12 GB GPU should be more than enough to replicate these experiments.
VDCNN only required 4 GB of GPU memory, so we did not fully utilize the GPU RAM.
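To check how much GPU memory a run actually needs, PyTorch can report the peak allocation (generic snippet, assumes a CUDA GPU is present):

    # Measure peak GPU memory after a few training steps (PyTorch, CUDA required).
    import torch

    torch.cuda.reset_peak_memory_stats()
    # ... run a few forward/backward passes of the model here ...
    print(torch.cuda.max_memory_allocated() / 1024**3, "GB peak allocation")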
