Sherlock
NLP Transfer Learning Platform
https://github.com/suryavanshi/Sherlock
Manu Suryavansh
AI Fellow
https://www.linkedin.com/in/manusuryavansh/
Insight Open Source Project
1
Potential classification solutions
● FastText
● LSTM + word embeddings
● Language model
(Compared on ease of use, accuracy with small datasets, and transfer learning)
2
Steps to make a Text Classifier product
● Collect a large number of labelled samples!
● Decide which method to use?
● Where to train - do I need a GPU?
● How to scale and deploy your classifier?
3
Sherlock + small dataset: from weeks of work -> 1 day
Sherlock Transfer Learning
● Semi-supervised: language model (BERT), objective: language modeling, pre-trained on a large text corpus -> pre-trained model
● Supervised (Sherlock): classifier fine-tuned on a small labelled dataset, e.g. "The movie was good" -> 1, "The movie was horrible" -> 0
4
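The two-stage recipe above can be made concrete with a short sketch of the supervised stage: start from a pre-trained BERT language model and fine-tune a small classifier head on a handful of labelled sentences. This is an illustration using the Hugging Face transformers library rather than the TF 1.x BERT code Sherlock builds on; the checkpoint name and the tiny two-example dataset are placeholders.

```python
# Sketch of supervised fine-tuning on top of a pre-trained BERT language model.
# Uses Hugging Face transformers (not the TF 1.x BERT code Sherlock wraps);
# names and data are illustrative only.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

texts = ["The movie was good", "The movie was horrible"]   # tiny labelled dataset
labels = torch.tensor([1, 0])

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                       # a few fine-tuning steps
    out = model(**batch, labels=labels)  # cross-entropy loss on the classifier head
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```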
Sherlock: API
● TRAIN - API to train new models by fine-tuning BERT pre-trained model weights.
○ Makes it easy to iterate and conduct experiments.
● LABEL - API to batch label unlabelled text data using the above model.
○ Makes it easy to test and use the above models.
5
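A hedged sketch of how a client could drive these two endpoints. The routes, payload fields, and port below are assumptions for illustration only; see the repository for the actual API.

```python
# Hypothetical client for the TRAIN and LABEL endpoints; the routes, payload
# fields, and port are assumptions, not Sherlock's documented API.
import requests

BASE = "http://localhost:5000"

# 1. TRAIN: submit a small labelled dataset to fine-tune BERT.
train_resp = requests.post(f"{BASE}/train", json={
    "dataset": [
        {"text": "The movie was good", "label": 1},
        {"text": "The movie was horrible", "label": 0},
    ],
    "model_name": "imdb-sentiment",
})
print(train_resp.json())

# 2. LABEL: batch-label unlabelled text with the fine-tuned model.
label_resp = requests.post(f"{BASE}/label", json={
    "model_name": "imdb-sentiment",
    "texts": ["Great acting and a tight plot", "I walked out halfway"],
})
print(label_resp.json())
```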
Scalable Architecture
REST API -> Redis queue -> tasks -> NLP inference server (GPU)
6
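One way to read the diagram: the REST API pushes work onto a Redis queue and a GPU-backed inference worker consumes it. The sketch below uses plain redis-py with a list as the queue; the queue name and message format are assumptions, not Sherlock's actual wiring.

```python
# Sketch of the queueing pattern implied by the architecture slide: the REST
# API pushes tasks onto a Redis list and the NLP inference worker (on the GPU
# box) blocks on it. Queue name and message shape are assumptions.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def enqueue_label_task(model_name, texts):
    """Called by the REST API process."""
    r.rpush("sherlock:tasks", json.dumps({"model": model_name, "texts": texts}))

def inference_worker():
    """Runs on the GPU inference server; pops tasks and runs the model."""
    while True:
        _, payload = r.blpop("sherlock:tasks")   # blocks until a task arrives
        task = json.loads(payload)
        # predictions = run_bert_classifier(task["model"], task["texts"])
        # r.rpush("sherlock:results", json.dumps(predictions))
```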
Train - Create a custom classifier in 2 steps!
7
500 Samples is all you need (IMDB Reviews Dataset)
Baseline: Random
8
Lessons Learned
● Versions: TF 1.11 + CUDA 9 + Docker Compose 2.3 !!
● GPU memory: TF doesn’t release GPU memory!!
9
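The GPU-memory lesson in concrete terms: TF 1.x allocates most of the GPU memory up front and only frees it when the process exits. A common mitigation is to configure the session, as in this TF 1.x sketch.

```python
# TF 1.x grabs nearly all GPU memory by default and does not hand it back
# until the process dies. Two common mitigations: grow allocation on demand,
# or cap the fraction a session may use.
import tensorflow as tf  # TF 1.x API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                     # allocate lazily
config.gpu_options.per_process_gpu_memory_fraction = 0.5   # or hard-cap at 50%

with tf.Session(config=config) as sess:
    pass  # build and run the BERT graph here
```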
BERT - Bi-directional Encoder Representations from Transformers
● BERT is a deep bi-directional unsupervised language representation, pre-trained on a large text corpus (like Wikipedia).
● Ref: https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
● https://arxiv.org/abs/1810.04805
(Figure: BERT’s deep bi-directional context vs. uni-directional and shallow bi-directional models)
10
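To make the bi-directional point concrete, here is a small masked-language-model probe: BERT fills in a masked token using context from both sides. It uses the Hugging Face transformers library and the public bert-base-uncased checkpoint for brevity, not anything Sherlock-specific.

```python
# Small probe of BERT's masked-language-model pre-training objective: the
# model predicts [MASK] using context from both directions.
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The movie was [MASK] and I loved it.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and print the top predicted fill-ins.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```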
Summary
● Added a scalable backend for NLP classification
● Provided APIs to:
○ Train a new model using the BERT pre-trained language model, with GPU or CPU (for very small datasets)
○ Label a dataset using the new model.
● Showed that models using BERT perform well with small datasets.
● Solved issues in using Docker, TF, and GPU together!
11

Editor's Notes

  • #2 https://youtu.be/psZ8Dm-yfgM
  • #4 From weeks of work to 1 day
  • #6 Make this text, not image
  • #9 API makes it easy to train and iterate on datasets of different sizes.