© 2019 Chegg, Inc. / All Rights Reserved
Natural Language Processing and Machine Learning for Non-Experts
May 7
Sanghamitra Deb
@sangha_deb,
sdeb@chegg.com
Collaborators: Zoha Zargham, Sakshi Bhargava, Priya Venkat
Natural Language Processing
• Giving structure to unstructured data.
• Learning properties of the data that make decision making simple.
• Providing concise information to drive the intelligence of different systems.
Natural Language Processing
All industries generate content.
Healthcare
Finance
Retail
Legal
Education
Marketing
Real Estate
Social Media
Academics
Natural Language Processing: Why?
• Unstructured data cannot be consumed directly.
• Automate simple and complex functionalities.
• Users can query text data and generate BU reports.
• Understand customers better and take the necessary actions to improve their experience.
Natural Language Processing: How?
Build a machine learning pipeline to infer the properties of the text that will solve a particular problem.
What is Chegg?
The Chegg logo is a registered trademark of Chegg, Inc. All other trademarks are owned by their respective owners.
• Chegg is a student-first learning platform.
• Multiple services: question answering, online tutoring, flashcards, writing, math solver, internships, etc.
• Content drives product.
Chegg Study
NLP Goal: Create a Knowledge Base
What is a knowledge base?
All Content → Algebra, Physics, Statistics, Mechanical Eng, Accounting, … (several tens of subjects)
[Figure: concept tree for Statistics. Nodes shown include Probability, Testing and Regression; Discrete PDs and Continuous PDs (probability distributions), Sampling, Estimation and Hypothesis Testing; Binomial and Normal.]
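A knowledge base like the Statistics tree above is naturally stored as a tree of concepts. The sketch below is a minimal illustration, assuming a nested-dictionary representation; the exact parent/child placement (for example, Binomial under Discrete PDs) is an assumption for illustration, not the deck's actual taxonomy.

```python
from typing import Dict, List, Optional

# Hypothetical slice of a concept tree: each key is a concept, each value holds its children.
# The placement of nodes is an illustrative assumption.
KNOWLEDGE_BASE: Dict[str, dict] = {
    "Statistics": {
        "Probability": {
            "Discrete PDs": {"Binomial": {}},
            "Continuous PDs": {"Normal": {}},
        },
        "Testing": {
            "Sampling": {},
            "Estimation": {},
            "Hypothesis Testing": {},
        },
        "Regression": {},
    }
}


def find_path(tree: Dict[str, dict], concept: str) -> Optional[List[str]]:
    """Depth-first search: return the path from the root to `concept`, or None if absent."""
    for name, children in tree.items():
        if name == concept:
            return [name]
        sub_path = find_path(children, concept)
        if sub_path is not None:
            return [name] + sub_path
    return None


print(find_path(KNOWLEDGE_BASE, "Binomial"))
# ['Statistics', 'Probability', 'Discrete PDs', 'Binomial']
```

Storing the full path rather than only the leaf means a document tagged Binomial can also be rolled up to Probability or Statistics.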
Building a Machine Learning Pipeline for NLP
• Classification using TF-IDF
• Weak Supervision
• Transfer learning techniques
• Thresholding
• Active Learning
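Classification using TF-IDF, the first item above, can be sketched without any library. This is an illustrative sketch, not the deck's implementation: it weights terms by TF-IDF (using a smoothed IDF) and assigns a document to the class whose average TF-IDF vector (centroid) is most cosine-similar. The toy documents and labels are made up; in practice a library vectorizer and a linear classifier would be the usual choice.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def fit_idf(docs: List[str]) -> Vector:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text: str, idf: Vector) -> Vector:
    """Raw term counts weighted by IDF; terms never seen during fitting are dropped."""
    counts = Counter(tokenize(text))
    return {term: c * idf[term] for term, c in counts.items() if term in idf}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(term, 0.0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(docs: List[str], labels: List[str], idf: Vector) -> Dict[str, Vector]:
    """Average TF-IDF vector for each class."""
    sums: Dict[str, Counter] = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        sums[label].update(tfidf(doc, idf))
    sizes = Counter(labels)
    return {label: {t: v / sizes[label] for t, v in vec.items()} for label, vec in sums.items()}


def predict(text: str, centroids: Dict[str, Vector], idf: Vector) -> str:
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Made-up toy training data.
docs = [
    "prove that the triangle is equilateral",
    "find the area of the triangle",
    "compute the probability of two heads",
    "find the probability that a die shows six",
]
labels = ["geometry", "geometry", "probability", "probability"]

idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)
print(predict("find the probability of rain", centroids, idf))
print(predict("prove the triangle is isosceles", centroids, idf))
```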
Machine Learning needs a huge number of examples
This is expensive:
1. Collecting data
2. Gathering labelled data
3. Feature engineering
4. Fitting a model
Deep learning replaces feature engineering!
However, DL requires huge amounts of data.
Weak Supervision
What is Weak Supervision?
Using noisy sources of truth to generate training data.
• Rules/heuristics
• Constraints
• Invariances
• Existing knowledge
• Cheaper sources of labels
• Pre-trained model
• Get labels from the customer
Snorkel – packaging weak supervision tools
A System for Fast Training Data Creation
https://hazyresearch.github.io/snorkel/
• Weak supervision augments the manual generation of labelled data.
• Fast adoption and a user-friendly interface.
Weak Supervision Pipeline using Snorkel
Labeling Functions:
o 𝜆1: Does the word 'wife' occur between names?
o 𝜆2: Is this a name in the dictionary of couples?
o 𝜆3: Does the word 'spouse' occur in the sentence?
• Write multiple functions that can label data.
• Each function encodes one of the weak supervision techniques:
External Knowledge Sources
Patterns and Dictionaries
Domain heuristics
SMEs (Subject Matter Experts) providing valuable inputs
Weak Supervision Pipeline using Snorkel
Input Data:
1. David and his wife Mel boarded the flight to Adelaide.
2. Former US President Barack Obama and the former first lady Michelle Obama waved to the crowds in Ohio.
Document Parsing → Sentence → Phrases/n-grams
The same labeling functions (𝜆1, 𝜆2, 𝜆3) and knowledge sources as above are applied to the parsed input.
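The three labeling functions can be written as plain Python functions that vote on a candidate pair of names in a sentence. This is a minimal sketch, independent of Snorkel's actual API: the `Candidate` type, the hypothetical `COUPLES` dictionary and the label constants are assumptions for illustration. It follows the deck's 1 / 0 convention, with 0 meaning the function abstains.

```python
from typing import Callable, List, NamedTuple

SPOUSE, ABSTAIN = 1, 0  # 1 = "are spouses", 0 = no opinion (abstain)


class Candidate(NamedTuple):
    sentence: str
    person1: str
    person2: str


# Hypothetical dictionary of known couples (an external knowledge source).
COUPLES = {("Barack Obama", "Michelle Obama")}


def lf_wife_between_names(c: Candidate) -> int:
    """λ1: does the word 'wife' occur between the two names?"""
    i, j = c.sentence.find(c.person1), c.sentence.find(c.person2)
    if i < 0 or j < 0:
        return ABSTAIN
    spans = sorted([(i, i + len(c.person1)), (j, j + len(c.person2))])
    between = c.sentence[spans[0][1]:spans[1][0]]
    return SPOUSE if "wife" in between.lower().split() else ABSTAIN


def lf_known_couple(c: Candidate) -> int:
    """λ2: is this pair of names in the dictionary of couples?"""
    pair = (c.person1, c.person2)
    return SPOUSE if pair in COUPLES or pair[::-1] in COUPLES else ABSTAIN


def lf_spouse_in_sentence(c: Candidate) -> int:
    """λ3: does the word 'spouse' occur in the sentence?"""
    return SPOUSE if "spouse" in c.sentence.lower() else ABSTAIN


LFS: List[Callable[[Candidate], int]] = [lf_wife_between_names, lf_known_couple, lf_spouse_in_sentence]


def apply_lfs(candidates: List[Candidate], lfs: List[Callable[[Candidate], int]]) -> List[List[int]]:
    """Label matrix: one row per candidate, one column per labeling function."""
    return [[lf(c) for lf in lfs] for c in candidates]


candidates = [
    Candidate("David and his wife Mel boarded the flight to Adelaide.", "David", "Mel"),
    Candidate(
        "Former US President Barack Obama and the former first lady Michelle Obama "
        "waved to the crowds in Ohio.",
        "Barack Obama",
        "Michelle Obama",
    ),
]
label_matrix = apply_lfs(candidates, LFS)
print(label_matrix)  # [[1, 0, 0], [0, 1, 0]]
```

Each sentence is labeled by a different function, which is the point of writing several cheap, partial rules instead of one complete one.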
Weak Supervision Pipeline using Snorkel
• Labeling Functions have different latent accuracies
• We want to learn these accuracies without using
labeled data.
• Essentially, compare agreements and disagreements.
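Snorkel fits a generative label model for this step. As a much simpler stand-in, and explicitly not Snorkel's algorithm, the sketch below estimates each labeling function's accuracy from how often it agrees with the majority vote of the other functions, then combines the votes with log-odds weights into probabilistic labels. It assumes conditionally independent functions, a uniform class prior and votes of 1 / -1 with 0 for abstain; the label matrix is made up.

```python
import math
from typing import List

LabelMatrix = List[List[int]]  # rows = data points, columns = labeling functions


def estimate_accuracies(L: LabelMatrix, smoothing: float = 1.0) -> List[float]:
    """Accuracy of each LF = agreement rate with the majority vote of the other LFs,
    counted only where the LF votes and the others do not tie or all abstain."""
    accuracies = []
    for j in range(len(L[0])):
        agree = disagree = 0
        for row in L:
            if row[j] == 0:
                continue
            others = sum(v for k, v in enumerate(row) if k != j)
            if others == 0:
                continue
            if (others > 0) == (row[j] > 0):
                agree += 1
            else:
                disagree += 1
        # Laplace smoothing keeps every estimate strictly between 0 and 1.
        accuracies.append((agree + smoothing) / (agree + disagree + 2 * smoothing))
    return accuracies


def probabilistic_labels(L: LabelMatrix, accuracies: List[float]) -> List[float]:
    """P(y = 1) per row: sigmoid of the vote sum weighted by log(a / (1 - a))."""
    weights = [math.log(a / (1 - a)) for a in accuracies]
    probs = []
    for row in L:
        score = sum(w * v for w, v in zip(weights, row))
        probs.append(1 / (1 + math.exp(-score)))
    return probs


# Made-up label matrix from three labeling functions.
L = [
    [1, 1, 0],
    [1, 1, -1],
    [-1, -1, 0],
    [1, 0, 1],
    [-1, -1, 1],
    [0, 1, 1],
]
acc = estimate_accuracies(L)
probs = probabilistic_labels(L, acc)
print([round(a, 2) for a in acc])  # [0.8, 0.8, 0.5]
print([round(p, 2) for p in probs])
```

The third function disagrees with the others as often as it agrees, so its estimated accuracy is 0.5 and its log-odds weight is zero: no labeled data was needed to discount it.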
Weak Supervision Pipeline using Snorkel
• Build supervised models using probabilistic training labels
• Increase coverage
• Use more features than the labeling functions themselves
Ratner et al., 2016
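Training on probabilistic labels mainly means replacing hard 0/1 targets with the label model's probabilities in the loss. A minimal sketch, assuming a logistic-regression end model trained by plain batch gradient descent on the soft-target cross-entropy; the feature vectors and probabilities are made up.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_soft_logreg(X: List[List[float]], p: List[float],
                      lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Logistic regression with targets p_i = P(y_i = 1) instead of hard labels.
    For the loss -[p log q + (1 - p) log(1 - q)], the gradient w.r.t. the logit is (q - p)."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, p):
            q = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            for k, xk in enumerate(x):
                grad[k] += (q - target) * xk
            grad[-1] += q - target
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict_proba(w: List[float], x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


# Made-up features and probabilistic labels (e.g. output of a label model).
X = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
p = [0.94, 0.8, 0.2, 0.06]
w = train_soft_logreg(X, p)
print(round(predict_proba(w, [1.0, 0.0]), 2), round(predict_proba(w, [0.0, 1.0]), 2))
```

Because the targets are probabilities, uncertain examples pull on the model less than confident ones, and the end model can use features that no labeling function looks at, which is where the extra coverage comes from.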
Weak Supervision Steps
Example: "Prove that a triangle is equilateral"
• Question word preceding a keyword
• Keyword is contained in the known Algebra: triangles database
Concept: Algebra: triangles ✓
Broad Stroke Filtering Rules
Language Pattern Recognition: using part-of-speech tags to extract keywords.
Keywords: 'symmetric matrices', 'real number'
Pattern: 'JJ NNS'
JJ = Adjective
NN = Noun, singular
NNS = Noun, plural
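Extracting keywords with a part-of-speech pattern can be sketched over text that has already been tagged. A minimal sketch, assuming (word, Penn Treebank tag) pairs from some tagger; the tagging step is not shown and the tags below are written by hand. The pattern accepts an adjective (JJ) followed by a singular or plural noun (NN or NNS), which covers both 'real number' (JJ NN) and 'symmetric matrices' (JJ NNS).

```python
from typing import List, Sequence, Set, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag)


def match_pattern(tagged: Sequence[TaggedToken], pattern: Sequence[Set[str]]) -> List[str]:
    """Return every contiguous word sequence whose tags fit `pattern`,
    where each pattern position is the set of allowed tags."""
    n = len(pattern)
    phrases = []
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        if all(tag in allowed for (_, tag), allowed in zip(window, pattern)):
            phrases.append(" ".join(word for word, _ in window))
    return phrases


ADJ_NOUN = [{"JJ"}, {"NN", "NNS"}]

# Hand-tagged example sentences (a real pipeline would run a POS tagger here).
sentences = [
    [("Diagonalize", "VB"), ("the", "DT"), ("symmetric", "JJ"), ("matrices", "NNS")],
    [("Let", "VB"), ("x", "NN"), ("be", "VB"), ("a", "DT"), ("real", "JJ"), ("number", "NN")],
]
keywords = [phrase for s in sentences for phrase in match_pattern(s, ADJ_NOUN)]
print(keywords)  # ['symmetric matrices', 'real number']
```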
How do the rules work?
[Figure: training set, number of documents (axis 0 to 200) for each label: 1, 0, -1]
Rule 1: All documents with words/phrases contained in the Algebra: triangles database, preceded by question words or phrases (e.g., prove, calculate), fall under the category of Algebra: triangles.
Rule 2: If words or phrases have appeared only once in your corpus, the document is not Algebra: triangles.
Rule 3: All documents with no words/phrases in the Algebra: triangles database do not fall under the category of Algebra: triangles.
Rule 4: All documents containing more than 4 keywords/phrases from the Algebra: triangles database belong to Algebra: triangles.
1: Class 1, document belongs to Algebra: triangles
0: unlabelled data
-1: Class 2, document does not belong to Algebra: triangles.
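Rules 1, 3 and 4 translate directly into labeling functions that return 1, 0 or -1. A minimal sketch under stated assumptions: the keyword database and question-word list are hypothetical stand-ins, matching is naive word/regex matching, and Rule 2 is left out because it depends on corpus-wide counts. The votes are combined with a simple majority here, which is an assumption; the weak supervision pipeline instead learns how much to trust each rule.

```python
import re
from typing import Callable, List

# Hypothetical stand-ins for the "Algebra: triangles" keyword database and question words.
TRIANGLE_KEYWORDS = {"triangle", "equilateral", "isosceles", "hypotenuse", "right angle", "congruent"}
QUESTION_WORDS = {"prove", "calculate", "find", "show"}


def keywords_in(text: str) -> List[str]:
    """Database keywords present in `text` (a trailing plural 's' is allowed)."""
    lowered = text.lower()
    return [kw for kw in TRIANGLE_KEYWORDS
            if re.search(r"\b" + re.escape(kw) + r"s?\b", lowered)]


def rule1_question_then_keyword(doc: str) -> int:
    """Rule 1: a question word followed later by a database keyword -> 1, else abstain (0)."""
    words = re.findall(r"[a-z]+", doc.lower())
    for i, word in enumerate(words):
        if word in QUESTION_WORDS and keywords_in(" ".join(words[i + 1:])):
            return 1
    return 0


def rule3_no_keywords(doc: str) -> int:
    """Rule 3: no database keyword at all -> -1, else abstain (0)."""
    return -1 if not keywords_in(doc) else 0


def rule4_many_keywords(doc: str) -> int:
    """Rule 4: more than 4 distinct database keywords -> 1, else abstain (0)."""
    return 1 if len(keywords_in(doc)) > 4 else 0


RULES: List[Callable[[str], int]] = [rule1_question_then_keyword, rule3_no_keywords, rule4_many_keywords]


def majority_label(doc: str) -> int:
    """1, -1, or 0 when every rule abstains or the votes tie."""
    total = sum(rule(doc) for rule in RULES)
    return (total > 0) - (total < 0)


docs = [
    "Prove that a triangle is equilateral",
    "Calculate the mean of the sample",
    "The hypotenuse of a right angle triangle",
    "Find the hypotenuse of a right angle triangle that is isosceles and congruent",
]
print([majority_label(d) for d in docs])  # [1, -1, 0, 1]
```

The third document stays unlabelled (0): it mentions triangle keywords but no rule is confident enough to vote, which is exactly the kind of point a later model or an SME has to resolve.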
Transfer Learning
Some Deep Learning Terms
• Embeddings/vectors: dense numeric representations of words, sentences or documents.
• LSTM: Long Short-Term Memory, a recurrent network well suited to processing sequences of data such as text.
• CNN: Convolutional Neural Networks are regularized versions of multilayer perceptrons.
• Transformer: an attention-based neural network architecture, originally introduced for sequence-to-sequence (Seq2Seq) tasks, which transform a given sequence of elements, such as the words in a sentence, into another sequence.
• Softmax: often used in neural networks to map the non-normalized output of a network to a probability distribution over predicted output classes.
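The softmax definition is compact enough to show directly: for scores z_1, ..., z_K, softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A small sketch in plain Python, with no deep learning framework assumed:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Map unnormalized scores to a probability distribution.
    Subtracting the maximum first avoids overflow in exp() and leaves the result unchanged."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```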
What is transfer learning?
Picture from O'Reilly
Transfer learning is the ability to learn a new task from fewer examples using knowledge from a related task that has already been learned.
Humans are very good at transfer learning.
• Know how to code in C++. Learn to code in Python.
• Get a PhD in physics. Learn to do machine learning.
• Know how to play classical piano. Learn to play jazz piano.
Language itself has patterns and coherence. Language models (LMs) learn from them to create embeddings/vectors.
Word2vec
The cat sat on the mat
Proposed in 2013 as an
approximation to language
modeling
vec(king) - vec(man) + vec(woman) ≈ vec(queen)
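The analogy works by vector arithmetic followed by a nearest-neighbour search: king - man + woman should land closest to queen. A toy sketch: the three-dimensional vectors below are hand-made for illustration, not trained word2vec embeddings, and excluding the query words from the search follows common practice in analogy evaluations.

```python
import math
from typing import Dict, List

# Hand-made toy vectors (NOT trained embeddings):
# dimension 0 ~ "royalty", dimension 1 ~ "male vs. female", dimension 2 ~ "food".
VECTORS: Dict[str, List[float]] = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "man": [0.1, 0.8, 0.0],
    "woman": [0.1, -0.8, 0.0],
    "apple": [0.0, 0.0, 1.0],
}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a: str, b: str, c: str) -> str:
    """Solve 'a is to b as c is to ?' with vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("man", "king", "woman"))  # queen
```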
Sentence Embeddings
Language Model: given a sentence, predict the next word.
Embed → LSTM → Softmax → Predict
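The language-modeling objective itself, predicting the next word from what came before, can be illustrated without a neural network. This sketch deliberately swaps the Embed → LSTM → Softmax stack for a bigram count model that conditions only on the previous word; it shows the prediction target and the probability output, not how an LSTM computes them. The tiny corpus is made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram_lm(sentences: List[str]) -> Dict[str, Counter]:
    """Count how often each word follows each previous word ('<s>' marks the start)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts: Dict[str, Counter], prev: str) -> Dict[str, float]:
    """P(next | prev) by relative frequency (the role softmax plays in the neural model)."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    return {word: c / total for word, c in following.items()} if total else {}


corpus = ["the cat sat on the mat", "the dog sat on the rug", "the cat sat down"]
lm = train_bigram_lm(corpus)
dist = next_word_distribution(lm, "sat")
print(dist)  # {'on': 0.666..., 'down': 0.333...}
```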
Character Embeddings:
Given a word, predict the next word.
Example input: "The Broadway play premiered yesterday"
Concatenation of character embeddings → Convolution layer with multiple filters → Max-over-time pooling layer → Softmax
Loss: cross entropy between the next word and the prediction
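The character-level encoder can be sketched in plain Python: look up an embedding per character, slide several convolution filters over the sequence, apply tanh, and keep each filter's maximum response (max-over-time pooling), so every word becomes a fixed-length vector regardless of its length. This is an illustrative sketch with random, untrained parameters; the embedding size, filter widths and seed are arbitrary assumptions, biases are omitted, and the layers that follow in Kim et al. (2015) are not shown.

```python
import math
import random
from typing import Dict, List

random.seed(0)
EMB_DIM = 4

# Random, untrained character embeddings (lowercase letters only, for brevity).
CHAR_EMB: Dict[str, List[float]] = {
    ch: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for ch in "abcdefghijklmnopqrstuvwxyz"
}


def make_filter(width: int) -> List[List[float]]:
    """A convolution filter spanning `width` consecutive characters."""
    return [[random.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(width)]


# Multiple filters of different widths (character 2-, 3- and 4-grams).
FILTERS = [make_filter(w) for w in (2, 3, 3, 4)]


def encode_word(word: str) -> List[float]:
    """One feature per filter: max over positions of tanh(filter · character window).
    Words shorter than a filter get 0.0 for that filter."""
    chars = [CHAR_EMB[c] for c in word.lower() if c in CHAR_EMB]
    features = []
    for filt in FILTERS:
        width = len(filt)
        responses = []
        for t in range(len(chars) - width + 1):
            window = chars[t:t + width]
            activation = sum(f * x for f_row, x_row in zip(filt, window)
                             for f, x in zip(f_row, x_row))
            responses.append(math.tanh(activation))
        features.append(max(responses) if responses else 0.0)
    return features


vectors = {w: encode_word(w) for w in "The broadway play premiered yesterday".lower().split()}
for word, vec in vectors.items():
    print(word, [round(v, 2) for v in vec])
```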
Domain Specific Optimizations
• Task 1: Given a question, predict the answer.
• Task 2: Given the front of a flashcard, predict the back.
• Task 3: Predict the concept/topic associated with the content.
• Task 4: Predict the course of a piece of content.
• Task 5: Predict the subject of a piece of content.
Open Source: Puppets on Sesame Street
Contextualized word embeddings. Example: "The tree was burned to a crisp." vs. "The morning air is crisp."
• ELMo: Embeddings from Language Models: bidirectional LSTM. Code publicly available (AllenNLP): PyTorch, TensorFlow, Keras. Best paper at NAACL in June 2018.
• BERT: Bidirectional Transformer. Code publicly available (Google): TensorFlow, PyTorch, Keras. Released towards the end of 2018; several releases, the latest around March.
• GPT-2: left-to-right Transformer models. Trained model publicly available; the released code helps you predict the next word given a sentence.
Others: ULMFiT, fastText, etc.
Open Source vs In-house Embeddings
Open Source:
• Low barrier to starting a project.
• Does not require deep learning knowledge.
• Works well for generic tasks such as sentiment analysis, news category detection, etc.
Workflow: Download → Concatenate with existing features → Classify
In-house:
• Barrier to starting a project is higher.
• Requires deep learning expertise to build the embeddings.
• The embeddings would be optimized for domains such as education, healthcare, etc.
Workflow: Build embeddings (deep learning language model, domain-specific optimization) → Concatenate with existing features → Classify
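The "concatenate with existing features" step is literally joining vectors, whichever way the embeddings were obtained. A minimal sketch: the embedding table and the hand-crafted features are hypothetical placeholders, and averaging word vectors into one document vector is one common choice assumed here. The resulting vector would then go to any classifier.

```python
from typing import Dict, List

# Hypothetical word vectors (downloaded or built in house); 3 dimensions for brevity.
EMBEDDINGS: Dict[str, List[float]] = {
    "prove": [0.1, 0.3, -0.2],
    "triangle": [0.7, -0.1, 0.4],
    "equilateral": [0.6, 0.0, 0.5],
}
DIM = 3


def document_embedding(text: str) -> List[float]:
    """Average the vectors of known words; a zero vector if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def handcrafted_features(text: str) -> List[float]:
    """Stand-ins for existing features: word count and a leading question-word flag."""
    words = text.lower().split()
    return [float(len(words)), 1.0 if words and words[0] in {"prove", "find", "calculate"} else 0.0]


def feature_vector(text: str) -> List[float]:
    """Concatenate the embedding with the existing features."""
    return document_embedding(text) + handcrafted_features(text)


print(feature_vector("Prove that a triangle is equilateral"))
```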
Thresholding
[Figure: probability scale from 0 to 1 with marks at 0.3, 0.5 and 0.7]
• You want your model to be correct.
• It is possible to improve the percentage of correct results at the cost of coverage.
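The trade-off can be measured on a labelled validation set: keep only predictions whose probability clears the threshold, then compare the fraction kept (coverage) with the fraction of kept predictions that are correct (precision). A sketch with made-up validation results; it assumes the probability is the model's confidence in its predicted class.

```python
from typing import List, Tuple


def precision_and_coverage(preds: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """preds: (confidence in the predicted class, whether the prediction was correct)."""
    kept = [correct for prob, correct in preds if prob >= threshold]
    coverage = len(kept) / len(preds)
    precision = sum(kept) / len(kept) if kept else float("nan")
    return precision, coverage


# Made-up validation results.
preds = [(0.95, True), (0.9, True), (0.85, True), (0.8, False), (0.75, True),
         (0.65, False), (0.6, True), (0.55, False), (0.52, False), (0.5, True)]

for t in (0.5, 0.7, 0.9):
    p, c = precision_and_coverage(preds, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  coverage={c:.2f}")
# threshold=0.5  precision=0.60  coverage=1.00
# threshold=0.7  precision=0.80  coverage=0.50
# threshold=0.9  precision=1.00  coverage=0.20
```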
Active Learning
1. Train a classifier and predict on unseen data.
2. Evaluate points close to the decision boundary.
3. Collect SME annotations on these points and add them to the training set.
Advantages:
• Requires less training data.
• Strategizing can lead to higher coverage of the model space.
Disadvantages:
• Uncertainty sampling is noise-seeking: this can lead to fitting the noise in the data.
• Outliers: getting labels on outliers does not improve the model (there are techniques to avoid this issue).
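Step 2, picking points close to the decision boundary, is usually called uncertainty sampling. A minimal sketch for a binary classifier, assuming its P(class 1) is already available for each unlabeled item; the item IDs and probabilities are made up.

```python
from typing import Dict, List


def select_for_annotation(probs: Dict[str, float], budget: int) -> List[str]:
    """Return the `budget` items whose P(class 1) is closest to 0.5,
    i.e. closest to the decision boundary, to send for SME annotation."""
    return sorted(probs, key=lambda item: abs(probs[item] - 0.5))[:budget]


unlabeled = {"doc1": 0.97, "doc2": 0.52, "doc3": 0.10, "doc4": 0.45, "doc5": 0.70}
print(select_for_annotation(unlabeled, budget=2))  # ['doc2', 'doc4']
```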
Machine Learning Pipeline
Goal: Concept classification
1. Produce training data with Weak Supervision.
2. Feature generation with Transfer Learning.
3. Supervised Learning.
4. Prob > threshold?
Yes: prediction accepted; populate the database with inferred tags from the model.
No: Active Learning: gather more training data.
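The accept-or-review branch at the end of the pipeline can be sketched as a routing function. Everything here is a hypothetical stand-in: the threshold value, the in-memory tag "database", the review queue and the concept names. Items below the threshold are the ones that would feed active learning.

```python
from typing import Dict, List, Tuple


def route_predictions(
    predictions: List[Tuple[str, str, float]], threshold: float = 0.7
) -> Tuple[Dict[str, str], List[str]]:
    """Split (item_id, predicted_concept, probability) triples into accepted tags
    and items sent back for SME review / active learning."""
    tag_db: Dict[str, str] = {}
    review_queue: List[str] = []
    for item_id, concept, prob in predictions:
        if prob > threshold:
            tag_db[item_id] = concept  # prediction accepted: store as inferred tag
        else:
            review_queue.append(item_id)  # below threshold: gather more training data
    return tag_db, review_queue


preds = [
    ("q1", "Algebra: triangles", 0.92),
    ("q2", "Statistics: regression", 0.55),
    ("q3", "Algebra: triangles", 0.71),
]
tags, queue = route_predictions(preds)
print(tags)   # {'q1': 'Algebra: triangles', 'q3': 'Algebra: triangles'}
print(queue)  # ['q2']
```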
Collaboration
Product/Business
Content team
Iterate:
• Define filters
• Define rules
• Explain model performance
What is the use case?
Product Integration
Applications
• Routing questions on a topic to the relevant experts for answering
• Recommending appropriate topics for a student to practice before an exam
• Determining the topics most in demand by students and improving or creating content for them
• Connecting different products based on topic similarity
In conclusion …
There is an infinite amount of content.
Language itself has logic, coherence and meaning.
Human-curated examples for training models are expensive.
Be smart about collecting examples and about working with small numbers of examples.
Tackling new use cases becomes less expensive.
Facilitate automation.
Questions
Sanghamitra Deb
@sangha_deb,
sdeb@chegg.com
References
Bayesian Nonparametric Crowdsourcing, Moreno et al., 2014
Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences, Fries et al., 2018
Weak Supervision: The New Programming Paradigm for Machine Learning, Ratner et al., 2017
Character-Aware Neural Language Models, Kim et al., 2015
Deep contextualized word representations, Peters et al., 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., 2018
Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., 2013

Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 

NLP and ML for Non-Experts: An Introduction

  • 1. © 2019 Chegg, Inc. / All Rights Reserved Natural Language Processing and Machine Learning for non-experts. May 7. Sanghamitra Deb, @sangha_deb, sdeb@chegg.com
  • 2. Collaborators: Zoha Zargham, Sakshi Bhargava, Priya Venkat
  • 3. Natural Language Processing • Giving structure to unstructured data. • Learning properties of the data that make decision-making simple. • Providing concise information to drive the intelligence of different systems.
  • 4. Natural Language Processing: All industries generate content: Healthcare, Finance, Retail, Legal, Education, Marketing, Real Estate, Social Media, Academics.
  • 5. Natural Language Processing: Why? • Unstructured data cannot be consumed directly. • Automate simple and complex functionality. • Users can query text data and generate BU reports. • Understand customers better and take the actions needed to give them a better experience.
  • 6. Natural Language Processing: How? Build a machine learning pipeline to infer the properties of the text that will solve a particular problem.
  • 7. What is Chegg? • Chegg is a student-first learning platform. • Multiple services: question answering, online tutoring, flashcards, writing, math solver, internships, etc. • Content drives product. The Chegg logo is a registered trademark of Chegg, Inc. All other trademarks are owned by their respective owners.
  • 8. Chegg Study
  • 9. Chegg Study
  • 10. NLP Goal: Create a Knowledge Base. What is a knowledge base? All content, spread over several tens of subjects: Algebra, Physics, Statistics, Mechanical Eng, Accounting, …
  • 11. NLP Goal: Create a Knowledge Base. What is a knowledge base? Example hierarchy for Statistics: Probability, Testing, Regression; Discrete PDs, Continuous PDs (probability distributions), Sampling, Estimation, Hypothesis Testing, Regression; Binomial, Normal, Probability.
  • 12. Building a Machine Learning Pipeline for NLP • Classification using TF-IDF • Weak supervision • Transfer learning techniques • Thresholding • Active learning
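Slide 12 opens the pipeline with classification using TF-IDF. The sketch below shows one way to do this with only the Python standard library. It illustrates the idea under stated assumptions and is not the talk's implementation. The four training questions and their labels are made up. The smoothed IDF formula is one common variant. A nearest-centroid classifier with cosine similarity stands in for whatever classifier was actually used.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (a dict); terms not seen during fitting are dropped."""
    tf = Counter(tokenize(doc))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a, b):
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of the documents in each class."""
    sums = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        sums[label].update(tfidf(doc, idf))
    sizes = Counter(labels)
    return {label: {t: v / sizes[label] for t, v in vec.items()} for label, vec in sums.items()}


def predict(doc, centroids, idf):
    """Return the class whose centroid is most similar to the document."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Made-up training data for illustration only.
train_docs = [
    "Prove that the triangle is equilateral",
    "Find the hypotenuse of a right triangle",
    "Calculate the mean and variance of the sample",
    "Test the hypothesis that the sample mean is zero",
]
train_labels = ["geometry", "geometry", "statistics", "statistics"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)
print(predict("What is the area of this triangle?", centroids, idf))
```

In practice a library vectorizer and a linear classifier would usually replace these hand-written pieces. The TF-IDF weighting step is the same idea.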
  • 13. Machine learning needs a huge number of examples, and this is expensive: 1. Collecting data 2. Gathering labelled data 3. Feature engineering 4. Fitting a model. Deep learning replaces feature engineering! However, DL requires huge amounts of data.
  • 15. Weak Supervision
  • 16. What is Weak Supervision? Using noisy sources of truth to generate training data: • Rules/heuristics • Constraints • Invariances • Existing knowledge • Cheaper sources of labels • Pre-trained models • Labels from the customer
  • 17. Snorkel: packaging weak supervision tools (https://hazyresearch.github.io/snorkel/). A system for fast training data creation. • Weak supervision augments the manual generation of labelled data. • Fast adoption and a user-friendly interface.
  • 18. Weak Supervision Pipeline using Snorkel • Write multiple functions that can label data. • Each function encodes one of the weak supervision techniques. Labeling functions: • 𝜆1: Does the word 'wife' occur between the names? • 𝜆2: Is this pair of names in the dictionary of couples? • 𝜆3: Does the word 'spouse' occur in the sentence? Sources: external knowledge sources, patterns and dictionaries, domain heuristics, and SMEs (subject matter experts) providing valuable inputs.
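The three labeling functions on slide 18 can be written as plain Python functions over a sentence and a candidate pair of names. This is a minimal sketch and does not use Snorkel's API. It assumes the 1 / 0 / -1 label convention used later in the deck, where 0 means the function abstains. The `KNOWN_COUPLES` dictionary is a hypothetical stand-in for an external knowledge source. The two example sentences are the ones on slide 19.

```python
import re

SPOUSE, ABSTAIN = 1, 0

# Hypothetical external knowledge source: a tiny dictionary of known couples.
KNOWN_COUPLES = {frozenset({"Barack Obama", "Michelle Obama"})}


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def lf_wife_between(sentence, person1, person2):
    """λ1: does the word 'wife' occur between the two names?"""
    i, j = sentence.find(person1), sentence.find(person2)
    if i < 0 or j < 0:
        return ABSTAIN
    if i > j:  # make person1 the name that appears first
        i, j = j, i
        person1, person2 = person2, person1
    between = sentence[i + len(person1):j]
    return SPOUSE if "wife" in words(between) else ABSTAIN


def lf_known_couple(sentence, person1, person2):
    """λ2: is this pair of names in the dictionary of couples?"""
    return SPOUSE if frozenset({person1, person2}) in KNOWN_COUPLES else ABSTAIN


def lf_spouse_mentioned(sentence, person1, person2):
    """λ3: does the word 'spouse' occur in the sentence?"""
    return SPOUSE if "spouse" in words(sentence) else ABSTAIN


LABELING_FUNCTIONS = [lf_wife_between, lf_known_couple, lf_spouse_mentioned]

candidates = [
    ("David and his wife Mel boarded the flight to Adelaide.", "David", "Mel"),
    ("Former US President Barack Obama and the former first lady Michelle Obama "
     "waved to the crowds in Ohio.", "Barack Obama", "Michelle Obama"),
]

# One row per candidate, one column per labeling function.
label_matrix = [[lf(*cand) for lf in LABELING_FUNCTIONS] for cand in candidates]
print(label_matrix)  # [[1, 0, 0], [0, 1, 0]]
```

Each function covers only part of the data and abstains elsewhere. The next step is to combine these noisy votes into training labels.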
  • 19. Weak Supervision Pipeline using Snorkel. Input data: 1. David and his wife Mel boarded the flight to Adelaide. 2. Former US President Barack Obama and the former first lady Michelle Obama waved to the crowds in Ohio. Document parsing: sentences, phrases/n-grams. Labeling functions: • 𝜆1: Does the word 'wife' occur between names? • 𝜆2: Is this a name in the dictionary of couples? • 𝜆3: Does the word 'spouse' occur in the sentence? Sources: external knowledge sources, patterns and dictionaries, domain heuristics, and SMEs (subject matter experts) providing valuable inputs.
  • 20. Weak Supervision Pipeline using Snorkel • Labeling functions have different latent accuracies. • We want to learn these accuracies without using labeled data. • Essentially, this means comparing where the labeling functions agree and where they disagree.
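Snorkel learns labeling-function accuracies with a generative label model. The sketch below is a deliberately simpler stand-in that only conveys the intuition of comparing agreements and disagreements. It scores each labeling function by how often its vote agrees with the majority vote of the other functions. Only points where both are non-abstaining count toward that score. It then produces training labels by majority vote. The label matrix is made up. Following the convention on slides 23 and 24, 1 and -1 are the two classes and 0 means abstain.

```python
from collections import Counter

ABSTAIN = 0


def majority_vote(votes):
    """Most common non-abstaining vote; ABSTAIN if there are no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return ABSTAIN
    return counts[0][0]


def estimate_accuracies(label_matrix):
    """Agreement rate of each labeling function with the majority of the others."""
    n_lfs = len(label_matrix[0])
    accuracies = []
    for j in range(n_lfs):
        agree = total = 0
        for row in label_matrix:
            if row[j] == ABSTAIN:
                continue
            others = majority_vote(row[:j] + row[j + 1:])
            if others == ABSTAIN:
                continue
            total += 1
            agree += row[j] == others
        accuracies.append(agree / total if total else None)
    return accuracies


# Made-up votes: rows are data points, columns are three labeling functions.
L = [
    [1, 1, 1],
    [1, 1, -1],
    [-1, -1, -1],
    [1, -1, -1],
    [0, 1, 1],
]
print(estimate_accuracies(L))             # [0.666..., 1.0, 0.75]
print([majority_vote(row) for row in L])  # [1, 1, -1, -1, 1]
```

Snorkel's label model goes further by weighting each function by its learned accuracy and producing probabilistic labels instead of hard majority votes.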
  • 21. Weak Supervision Pipeline using Snorkel • Build supervised models using probabilistic training labels. • Increase coverage. • Use a larger number of features. (Ratner et al., 2016)
  • 22. Weak Supervision Steps. Example: "Prove that a triangle is equilateral". Broad-stroke filtering rules: a question word precedes a keyword, and the keyword is contained in the known Algebra: triangles database, so the concept is Algebra: triangles ✓. Language pattern recognition: using part-of-speech tags to extract keywords such as 'symmetric matrices' and 'real number' with tag patterns like 'JJ NNS' (JJ = adjective, NN = noun, singular, NNS = noun, plural).
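The part-of-speech step on slide 22 can be sketched as a scan for an adjective followed by a noun. To keep the example self-contained, the Penn Treebank tags are supplied by hand. In practice they would come from a tagger such as NLTK's `pos_tag`. Accepting both `NN` and `NNS` after `JJ` is an assumption. It lets 'real number' (JJ NN) be caught alongside 'symmetric matrices' (JJ NNS).

```python
ADJECTIVE_TAGS = {"JJ"}
NOUN_TAGS = {"NN", "NNS"}


def adjective_noun_phrases(tagged_tokens):
    """Return 'adjective noun' bigrams from a list of (word, tag) pairs."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 in ADJECTIVE_TAGS and t2 in NOUN_TAGS:
            phrases.append(f"{w1} {w2}")
    return phrases


# Hand-tagged example sentence (tags follow the Penn Treebank tag set).
tagged = [
    ("Prove", "VB"), ("that", "IN"), ("symmetric", "JJ"), ("matrices", "NNS"),
    ("have", "VBP"), ("real", "JJ"), ("eigenvalues", "NNS"), (".", "."),
]
print(adjective_noun_phrases(tagged))  # ['symmetric matrices', 'real eigenvalues']
```

The extracted phrases can then be looked up in a keyword database, as the filtering rules on the same slide do.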
  • 23. How do the rules work? [Figure: training set by label (1, 0, -1), axis from 0 to 200] Labels: 1 = Class 1, the document belongs to Algebra: triangles; 0 = unlabelled data; -1 = Class 2, the document does not belong to Algebra: triangles. Rule 1: All documents with words/phrases contained in the Algebra: triangles database preceded by question words or phrases (e.g. prove, calculate) fall under the category Algebra: triangles. Rule 2: If words or phrases have appeared only once in your corpus, it is not Algebra: triangles.
  • 24. How do the rules work? [Figure: training set by label (1, 0, -1), axis from 0 to 200] Rule 1: All documents with words/phrases contained in the Algebra: triangles database preceded by question words or phrases (e.g. prove, calculate) fall under the category Algebra: triangles. Rule 2: If words or phrases have appeared only once in your corpus, it is not Algebra: triangles. Rule 3: All documents with words/phrases not in the Algebra: database do not fall under the category Algebra: triangles. Rule 4: All documents containing more than 4 keywords/phrases from the Algebra: triangles database belong to Algebra: triangles. Labels: 1 = Class 1, the document belongs to Algebra: triangles; 0 = unlabelled data; -1 = Class 2, the document does not belong to Algebra: triangles.
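Rules 1, 3 and 4 from slide 24 translate directly into labeling functions that return 1, 0 or -1. This is a sketch under several assumptions:

- `TRIANGLE_KEYWORDS` and `QUESTION_WORDS` are small hypothetical stand-ins for the Algebra: triangles database and the question-word list.
- The same keyword set approximates the database in rule 3.
- "Preceded by" is read as "a question word appears anywhere earlier in the document".
- Rule 4 counts repeated keyword occurrences.

Rule 2 needs corpus-wide frequency counts and is left out to keep the example short.

```python
import re

POSITIVE, UNLABELLED, NEGATIVE = 1, 0, -1

# Hypothetical stand-ins for the keyword database and question-word list.
TRIANGLE_KEYWORDS = {"triangle", "equilateral", "isosceles", "scalene",
                     "hypotenuse", "altitude", "vertex"}
QUESTION_WORDS = {"prove", "calculate", "find", "show"}


def tokens(doc):
    return re.findall(r"[a-z]+", doc.lower())


def rule1_question_then_keyword(doc):
    """Rule 1: a keyword preceded (anywhere earlier) by a question word -> positive."""
    seen_question_word = False
    for tok in tokens(doc):
        if tok in QUESTION_WORDS:
            seen_question_word = True
        elif seen_question_word and tok in TRIANGLE_KEYWORDS:
            return POSITIVE
    return UNLABELLED


def rule3_no_keywords(doc):
    """Rule 3: no keyword from the database -> negative."""
    return NEGATIVE if TRIANGLE_KEYWORDS.isdisjoint(tokens(doc)) else UNLABELLED


def rule4_many_keywords(doc):
    """Rule 4: more than 4 keyword occurrences -> positive."""
    hits = sum(tok in TRIANGLE_KEYWORDS for tok in tokens(doc))
    return POSITIVE if hits > 4 else UNLABELLED


RULES = [rule1_question_then_keyword, rule3_no_keywords, rule4_many_keywords]

docs = [
    "Prove that a triangle is equilateral",
    "Calculate the mean of the sample",
    "An isosceles triangle and a scalene triangle share a vertex and an altitude",
]
for doc in docs:
    print([rule(doc) for rule in RULES], doc)
```

Each row of votes would then feed the same agreement-based combination step as the spouse example.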
  • 25. Transfer Learning
  • 26. Some Deep Learning Terms • Embeddings/vectors • LSTM: Long Short-Term Memory, well suited to processing sequences of data such as text. • CNN: Convolutional Neural Networks are regularized versions of multilayer perceptrons. • Transformer: an attention-based sequence-to-sequence (Seq2Seq) architecture; a Seq2Seq model is a neural net that transforms a given sequence of elements, such as the sequence of words in a sentence, into another sequence. • Softmax: often used in neural networks to map the non-normalized output of a network to a probability distribution over the predicted output classes.
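The softmax definition on slide 26 is short enough to write out. Below is a minimal sketch in plain Python. Subtracting the largest score before exponentiating is a standard numerical-stability trick and leaves the result unchanged. The example scores are arbitrary.

```python
import math


def softmax(logits):
    """Map unnormalised scores to a probability distribution."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```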
  • 27. What is transfer learning? (Picture from O'Reilly.) Transfer learning is the ability to learn a new task from fewer examples by using knowledge from a related task that has already been learned. Humans are very good at transfer learning: • Know how to code in C++; learn to code in Python. • Get a PhD in physics; learn to do machine learning. • Know how to play classical piano; learn to play jazz piano. Language itself has patterns and coherence. Language models (LMs) learn from them to create embeddings/vectors.
  • 28. Word2vec: "The cat sat on the mat". Proposed in 2013 as an approximation to language modeling. vec(king) - vec(man) + vec(woman) ≈ vec(queen)
  • 29. Word2vec: "The cat sat on the mat". Proposed in 2013 as an approximation to language modeling.
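The word-analogy arithmetic associated with word2vec can be sketched with toy vectors. The code computes vec(king) - vec(man) + vec(woman) and returns the nearest remaining word by cosine similarity. The 3-dimensional vectors are made up for illustration. Real word2vec vectors are learned from a large corpus and typically have hundreds of dimensions.

```python
import math

# Made-up 3-d vectors; real word2vec embeddings are learned, not hand-written.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman", VECTORS))  # queen
```

Excluding the three input words follows common practice. Without the exclusion, the nearest vector is often one of the inputs.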
  • 30. Sentence Embeddings. Language model: given a sentence, predict the next word. Embed → LSTM → Softmax → Predict.
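Training the LSTM language model on slide 30 is beyond a short example. The sketch below therefore swaps in a much simpler technique, a count-based bigram model, to show the same task: given a sentence, estimate a probability distribution over the next word. The three-sentence corpus is made up. This model conditions only on the last word, whereas an LSTM can use the whole sentence.

```python
from collections import Counter, defaultdict


def train_bigram_lm(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for s in sentences:
        words = s.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, sentence):
    """P(next word | last word of the sentence), from bigram counts."""
    last = sentence.lower().split()[-1]
    following = counts.get(last)
    if not following:
        return {}  # unseen context: no estimate
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}


corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a chair"]
lm = train_bigram_lm(corpus)

print(next_word_distribution(lm, "the dog sat"))  # {'on': 1.0}
dist = next_word_distribution(lm, "a cat sat on")
print(max(dist, key=dist.get))  # the
```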
  • 31. Character Embeddings: given a word, predict the next word. Example: "The broadway play premiered yesterday". Concatenation of character embeddings → convolution layer with multiple filters → max-over-time pooling layer → softmax; the loss is the cross entropy between the next word and the prediction.
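The character-level model on slide 31 builds a word representation in two steps. First, it runs convolution filters of several widths over the word's character embeddings. Second, it keeps each filter's maximum response (max-over-time pooling). The sketch below shows only that forward pass, with randomly initialised, untrained weights. The embedding size, filter widths, tanh activation and the `^`/`$` word-boundary markers are illustrative assumptions. The layers that turn these features into a next-word prediction are omitted.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the random weights are reproducible

EMB_DIM = 4  # character embedding size (arbitrary for this sketch)
ALPHABET = "abcdefghijklmnopqrstuvwxyz^$"  # ^ and $ mark word start and end

# Untrained, randomly initialised character embeddings.
CHAR_EMB = {ch: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for ch in ALPHABET}


def make_filter(width):
    """A convolution filter: one weight vector per character position."""
    return [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(width)]


# Multiple filters with different widths.
FILTERS = [make_filter(w) for w in (2, 2, 3, 3, 4)]


def word_features(word):
    """Character embeddings -> 1-d convolution -> max-over-time pooling."""
    chars = "^" + word.lower() + "$"
    seq = [CHAR_EMB[ch] for ch in chars if ch in CHAR_EMB]  # skip unknown characters
    features = []
    for filt in FILTERS:
        width = len(filt)
        responses = [
            math.tanh(sum(w * x
                          for w_row, x_row in zip(filt, seq[start:start + width])
                          for w, x in zip(w_row, x_row)))
            for start in range(len(seq) - width + 1)
        ]
        # Max over time: keep the strongest response anywhere in the word.
        features.append(max(responses) if responses else 0.0)
    return features


for word in "The broadway play premiered yesterday".split():
    print(word, [round(f, 2) for f in word_features(word)])
```

Max-over-time pooling makes the output length equal to the number of filters, whatever the word length. That fixed size is what lets words of different lengths share one representation.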
  • 32. Domain-Specific Optimizations • Task 1: Given a question, predict the answer. • Task 2: Given the front of a flash card, predict the back of the flash card. • Task 3: Predict the concept/topic associated with the content. • Task 4: Predict the course of a piece of content. • Task 5: Predict the subject of a piece of content.
  • 33. Open Source: Puppets on Sesame Street. Contextualized word embeddings: the same word gets different vectors in different contexts, e.g. 'crisp' in "The tree was burned to a crisp." and "The morning air is crisp." • ELMo (Embeddings from Language Models): bi-directional LSTM. Code publicly available in AllenNLP (PyTorch, TensorFlow, Keras). Best paper at NAACL in June 2018. • BERT: bidirectional Transformer. Code publicly available from Google (TensorFlow, PyTorch, Keras). Released towards the end of 2018; several releases, the latest around March. • GPT-2: left-to-right Transformer models. Trained model publicly available; the released code helps you predict the next word given a sentence. • Others: ULMFiT, fastText, etc.
  • 34. Open Source vs In-house Embeddings. Open source: Download → Concatenate with existing features → Classify. • Low barrier to starting a project. • Does not require deep learning knowledge. • Works well for generic tasks such as sentiment analysis, news category detection, etc. In house: Build embeddings (deep learning language model, domain-specific optimization) → Concatenate with existing features → Classify. • The barrier to starting a project is higher. • Requires deep learning expertise to build the embeddings. • The embeddings are optimized for domains such as education, healthcare, etc.
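Both columns on slide 34 end with the same step: concatenate the embeddings with existing features, then classify. Below is a minimal sketch of the concatenation, resting on three assumptions:

- The document embedding is the average of its word vectors. This is one common choice, not necessarily the one used here.
- A tiny made-up embedding table stands in for downloaded or in-house embeddings.
- There are two hypothetical hand-crafted features: token count and whether a question word is present.

Any classifier can then consume the combined vector.

```python
QUESTION_WORDS = {"prove", "calculate", "find"}  # hypothetical


def doc_embedding(tokens, table, dim):
    """Average the vectors of known tokens; zeros if none are known."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def existing_features(tokens):
    """Hypothetical hand-crafted features already used by a classifier."""
    return [float(len(tokens)), float(any(t in QUESTION_WORDS for t in tokens))]


def build_feature_vector(text, table, dim):
    tokens = text.lower().split()
    # List concatenation: embedding dimensions first, then existing features.
    return doc_embedding(tokens, table, dim) + existing_features(tokens)


# Made-up 2-d embedding table standing in for real word vectors.
EMBEDDINGS = {"prove": [0.2, 0.4], "triangle": [0.6, 0.0], "equilateral": [0.4, 0.2]}

x = build_feature_vector("Prove triangle is equilateral", EMBEDDINGS, dim=2)
print(x)  # approximately [0.4, 0.2, 4.0, 1.0]
```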
  • 35. Thresholding [Figure: probability axis from 0 to 1, marked at 0.3, 0.5 and 0.7] • You want your model to be correct. • It is possible to improve the percentage of correct results at the cost of coverage.
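The accuracy-versus-coverage trade-off on slide 35 can be made concrete. Keep only the predictions whose probability exceeds a threshold. Then measure accuracy on what was kept and the fraction of items kept (coverage). The probabilities, predicted classes and true classes below are made up. The thresholds 0.3, 0.5 and 0.7 match the marks on the slide's axis.

```python
def accuracy_and_coverage(probs, preds, labels, threshold):
    """Accuracy on predictions with probability > threshold, and the share of items kept."""
    kept = [(p, y) for prob, p, y in zip(probs, preds, labels) if prob > threshold]
    coverage = len(kept) / len(probs)
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else None
    return accuracy, coverage


# Made-up multi-class output: top-class probability, predicted class, true class.
probs  = [0.95, 0.90, 0.80, 0.65, 0.60, 0.55, 0.40, 0.35]
preds  = [1, 1, 0, 2, 0, 2, 0, 1]
labels = [1, 1, 0, 0, 0, 1, 2, 1]

for threshold in (0.3, 0.5, 0.7):
    acc, cov = accuracy_and_coverage(probs, preds, labels, threshold)
    print(f"threshold={threshold}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 lifts accuracy from 62.5% to 100%. Over the same range, coverage falls from all items to three in eight.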
  • 36. Active Learning 1. Train a classifier and predict on unseen data. 2. Evaluate points close to the decision boundary. 3. Collect SME annotations on these points and add them to the training set. Advantages: • Requires less training data. • Strategizing can lead to higher coverage of the model space. Disadvantages: • Uncertainty sampling is noise-seeking, which can lead to fitting the noise in the data. • Outliers: getting labels on outliers does not improve the model (there are techniques to avoid this issue).
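Step 2 on slide 36, picking the points closest to the decision boundary, is commonly implemented as uncertainty sampling. Below is a minimal sketch for a binary classifier. It assumes the model outputs P(class 1) for each unlabelled point; the probabilities here are made up. Points are ranked by how close that probability is to 0.5, and the top k go to SMEs for annotation. The slide's caveats about noise and outliers apply to this selection rule.

```python
def select_for_annotation(probs, k):
    """Uncertainty sampling: indices of the k points whose P(class 1) is closest to 0.5."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


# Made-up model outputs on unlabelled data.
unlabelled_probs = [0.97, 0.52, 0.10, 0.45, 0.70, 0.03]
to_annotate = select_for_annotation(unlabelled_probs, k=2)
print(to_annotate)  # [1, 3]
```

After the SMEs label the selected points, they are added to the training set and the classifier is retrained. This loop repeats until performance is acceptable.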
  • 37. Machine Learning Pipeline. Goal: concept classification. Produce training data with weak supervision → feature generation with transfer learning → supervised learning → is Prob > threshold? Yes: the prediction is accepted and the database is populated with inferred tags from the model. No: active learning gathers more training data.
  • 38. Collaboration: Product/Business and the content team. What is the use case? Iterate: • define filters • define rules • explain model performance. Product integration.
  • 39. Applications • Routing appropriate topics to relevant experts for answering questions • Recommending appropriate topics to a student for practice before an exam • Determining the topics in highest demand among students and improving or creating content for these topics • Connecting different products based on topic similarity
  • 40. In conclusion … • There is an infinite amount of content. • Language itself has logic, coherence and meaning. • Human-curated examples for training models are expensive. • Be smart about collecting examples and working with small numbers of examples. • This makes tackling new use cases less expensive. • It facilitates automation.
  • 41. Questions. Sanghamitra Deb, @sangha_deb, sdeb@chegg.com
  • 42. References • Bayesian Nonparametric Crowdsourcing, Moreno et al., 2014 • Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences, Fries et al., 2018 • Weak Supervision: The New Programming Paradigm for Machine Learning, Ratner et al., 2017 • Character-Aware Neural Language Models, Kim et al., 2015 • Deep contextualized word representations, Peters et al., 2018 • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., 2018 • Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., 2013