Confidential Material – Chegg Inc. © 2018. All Rights Reserved.
Democratizing NLP content modeling with
transfer learning using GPUs
Sanghamitra Deb
Staff Data Scientist
Email: sdeb@chegg.com
Twitter: @sangha_deb
What is Chegg?
Chegg
Chegg Tutors
Flashtools
Metaphase: The chromosomes line up at the equator; centriole fibers attach to the centromeres (where the chromatids are joined to each other).
Front	
Back
Flashcards
Chegg Study
• Democratizing NLP: what does it mean?
• Transfer Learning
• Word2vec
• Sentence and character embeddings
• Word embeddings in context
• Applications
• Knowledgebase creation
• Unique concepts …
Overview
Democratizing NLP with transfer learning
• Giving structure to unstructured data.
• Data analysts should be able to query language data and get insights.
• Machine learning practitioners with no knowledge of language should be able to use features from the NLP data.

Why transfer learning?
• It converts text into vectors, thereby giving structure.
• Transfer learning can be used to solve problems such as text summarization, entity recognition, tagging, keyword extraction, and other downstream classification tasks. This structured data can be queried for further insights.
• The result of transfer learning is typically a featurization of text data at the document, word, or sentence level.
Traditional NLP Pipeline
1. Collecting data → 2. Gathering labelled data → 3. Feature engineering → 4. Fit a model
Traditional Machine Learning Pipeline
1. Collecting data → 2. Gathering labelled data → 3. Feature engineering → 4. Fit a model

Deep learning replaces feature engineering! However, DL requires huge amounts of data.
What is transfer learning?
Traditional machine learning:
Task 1, domain 1 → Model 1 → Results
Task 2, domain 2 → Model 2 → Results
All models are task/domain specific.
What is transfer learning?
Source task/domain → Model 1 → Knowledge
Task 2, domain 2 → Model 2 (reuses the knowledge)
The transferred knowledge: sentence vectors, word vectors.
Classic Example: Computer Vision
Pretrained ImageNet models have been used to achieve state-of-the-art results in tasks such as object detection, semantic segmentation, human pose estimation, and video recognition. At the same time, they have enabled the application of CV to domains where the number of training examples is small and annotation is expensive.
Transfer Learning in NLP: Word2vec
Proposed in 2013 by Mikolov et al. as an approximation to language modeling.

The cat sat on the mat

Related embeddings: GloVe, FastText.
Transfer Learning in NLP: Word2vec

The cat sat on the mat

Take a large corpus of text with, say, a 10,000-word vocabulary:
• Each word is a 10,000-dim one-hot vector.
• Interface it with a 300-node hidden layer (the weights connecting this layer are the word vectors).
• Activations: linear summations of weighted inputs.
• The hidden nodes are fed into a softmax.
• During training the weights are changed such that words surrounding "cat" have a higher probability.

Download Google's pre-trained vectors. Or …
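The training loop the bullets describe can be sketched in a few lines of NumPy. This is a toy stand-in, not the real word2vec implementation: the corpus is the example sentence, the embedding dimension is 8 instead of 300, and there is no negative sampling; the one-hot input, linear hidden layer (whose rows are the word vectors), and softmax output match the slide's description.

```python
import numpy as np

rng = np.random.default_rng(0)

corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
widx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                 # vocab size; the slide uses 300 hidden nodes

W_in = rng.normal(0, 0.1, (V, D))    # hidden-layer weights = the word vectors
W_out = rng.normal(0, 0.1, (D, V))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# skip-gram training: for each (center, context) pair within a window of 1,
# raise the probability of the context word under the softmax output
lr = 0.1
for _ in range(300):
    for i, w in enumerate(corpus):
        for j in (i - 1, i + 1):
            if 0 <= j < len(corpus):
                h = W_in[widx[w]]               # linear hidden activation
                p = softmax(h @ W_out)          # distribution over vocab
                grad = p.copy()
                grad[widx[corpus[j]]] -= 1.0    # cross-entropy gradient
                dW_out = np.outer(h, grad)
                dh = W_out @ grad
                W_out -= lr * dW_out
                W_in[widx[w]] -= lr * dh

# words that appear next to "cat" now get higher probability than words that don't
p_cat = softmax(W_in[widx["cat"]] @ W_out)
print(p_cat[widx["sat"]] > p_cat[widx["mat"]])
```

In practice you would not train this yourself at toy scale; you would load the pre-trained vectors the slide mentions.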
Character Embedding
The broadway play premiered yesterday
https://arxiv.org/abs/1508.06615

Architecture (bottom to top):
• Concatenation of character embeddings
• Convolution layer with multiple filters
• Max-over-time pooling layer
• Softmax
Loss: cross-entropy between the next word and the prediction
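The convolution-plus-max-over-time-pooling step can be sketched as follows. This is a minimal NumPy forward pass only, with assumed illustrative sizes (4-dim char embeddings, filter widths 2 and 3, three filters each) and randomly initialized, unlearned filters; in the real model from the paper the embeddings and filters are trained end-to-end.

```python
import numpy as np

rng = np.random.default_rng(1)

text = "the broadway play premiered yesterday"
chars = sorted(set(text))
c2i = {c: i for i, c in enumerate(chars)}
d = 4                                    # char-embedding dimension (illustrative)
E = rng.normal(0, 0.1, (len(chars), d))  # character embedding table

widths, n_filters = (2, 3), 3            # filter widths and count (illustrative)
filters = {w: rng.normal(0, 0.1, (n_filters, w * d)) for w in widths}

def word_vector(word):
    """Concatenate char embeddings, convolve with multiple filter widths,
    then take the max over time for each filter."""
    X = E[[c2i[c] for c in word]]                     # (len(word), d)
    feats = []
    for w, F in filters.items():
        windows = np.stack([X[i:i + w].ravel()        # sliding windows of width w
                            for i in range(len(word) - w + 1)])
        feats.append((windows @ F.T).max(axis=0))     # max-over-time pooling
    return np.concatenate(feats)                      # fixed-size word vector

v = word_vector("broadway")
print(v.shape)
```

Note that every word, whatever its length or spelling, maps to the same fixed-size vector, which is what makes the representation robust to typos.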
Why are character embeddings important?
• The input layer is simplified: instead of a 10,000-sized one-hot vector, we have an input of size < 100.
• At Chegg we have a lot of free-form raw text input by students:
  • Spelling mistakes, student language, etc.
  • The vocabulary can have a lot of variation.
  • Student linguistics evolve fast over time.
  • We have a wide range of subjects with different symbols (for example, Greek letters are common in physics, math, and other STEM subjects).
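The "input of size < 100" point is easy to check: however noisy the text, the character inventory stays small and bounded, while a word vocabulary keeps growing with every new typo. A tiny demonstration on a made-up student-style string (the string itself is invented for illustration):

```python
# Noisy student text: typos, abbreviations, and Greek symbols all appear,
# yet the set of distinct characters stays far below 100.
text = ("pls help!! solve for θ: sin θ = 0.5 thx "
        "Δx = v0*t + 0.5*a*t^2 wat is teh answr")
word_vocab = set(text.split())
char_vocab = set(text)
print(len(word_vocab), len(char_vocab))
```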
Example Slide
Chegg Inc. © 2018. All Rights Reserved.22
Sentence Embeddings
• Autoencoders
• Language models
• Skip-thought vectors
Sentence Embeddings: AutoEncoders/Language Models
An encoder reads the sentence word by word and a softmax over the vocabulary predicts the next word. After learning, the final hidden state (128-dimensional here) represents the sentence vector. Auto-encoders reconstruct the input sentence; language models predict the next word.
Word Embeddings in Context
https://www.gocomics.com/frazz/

In a context-free embedding, "crisp" in the sentences "The morning air is getting crisp" and "getting burned to a crisp" would have the same vector: f(crisp).
In a context-aware model the embedding would be augmented by the context in which the word appears: f(crisp, context).
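The f(crisp) vs f(crisp, context) distinction can be made concrete with a toy stand-in: context-free lookup always returns the same vector, while even a crude context-aware function (here, the word vector shifted by the mean of its neighbors; real models use biLMs) returns different vectors in the two sentences. The random 4-dim vectors are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = {w: rng.normal(size=4) for w in
         "the morning air is getting crisp burned to a".split()}

def context_free(word):
    """f(word): one vector per word, regardless of the sentence."""
    return vocab[word]

def context_aware(word, sent):
    """f(word, context): word vector shifted by the mean of its context."""
    ctx = [w for w in sent if w != word]
    return vocab[word] + np.mean([vocab[w] for w in ctx], axis=0)

s1 = "the morning air is getting crisp".split()
s2 = "getting burned to a crisp".split()

same = np.allclose(context_free("crisp"), context_free("crisp"))
diff = not np.allclose(context_aware("crisp", s1), context_aware("crisp", s2))
print(same, diff)
```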
Word Embeddings in Context: ELMo

Best paper at NAACL 2018. Code publicly available:
• AllenNLP (PyTorch)
• TensorFlow
• Keras
• Chainer
https://allennlp.org/elmo
How does it work?

The character-embedding language model described earlier (https://arxiv.org/abs/1508.06615), with character embeddings, a convolution layer with multiple filters, max-over-time pooling, and a softmax, is the input to ELMo.
ELMo: How does it work?

A bi-directional language model (biLM). ELMo is a task-specific combination of the intermediate layer representations in the biLM.
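The "task-specific combination" is a softmax-weighted sum of the biLM layers, scaled by a scalar: ELMo = γ · Σⱼ sⱼ hⱼ with s = softmax(w). A minimal NumPy sketch, where the layer count (3), token count (5), and 1024-dim size are illustrative and the weights w would be learned for the downstream task:

```python
import numpy as np

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

def elmo_combine(layers, w, gamma=1.0):
    """Task-specific combination of biLM layer representations:
    ELMo = gamma * sum_j s_j * h_j, with s = softmax(w)."""
    s = softmax(np.asarray(w, dtype=float))
    return gamma * np.tensordot(s, np.asarray(layers), axes=1)

rng = np.random.default_rng(3)
layers = rng.normal(size=(3, 5, 1024))   # 3 biLM layers, 5 tokens, 1024 dims
out = elmo_combine(layers, w=[0.1, 0.5, 0.2])
print(out.shape)
```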
What is the output?
Transfer Learning Pipeline: Then
Transfer Learning Pipeline: Now
Finetuning: use a neural architecture to fine-tune the vectors from each layer; this is common practice in computer vision.
Aggregating: use any technique to aggregate all the layers.
Transfer Learning at a glance
Training
• Classifier: the embeddings contain direct signals from labels. The labels should be related to the task at hand for maximum data efficiency.
• Language model: the embeddings are learned from any text, but with no signals from labels.
• Machine translation: the embeddings are learned from surprisingly copious translation data. The idea is that if the embeddings can translate to a foreign language, they can translate to a classifier task as well.

Neural Architectures
• RNN: models infinite context, but the computation can't be parallelized.
• CNN: models local context and is highly parallelizable; the context can be increased by stacking layers deeply and adding dilated or separable convolutions.
• Transformer: uses self-attention and positional encoding. Good for small documents.

Aggregation
• Mean/Max pool: a simple averaging/maxing of all the context vectors will give you reasonable results.
• Last vector: if the model aggregates information into the last vector, you can simply pop off the last vector as the document embedding.
• Attention: dynamically allocates the importance of the context embeddings before averaging into a document embedding.
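The aggregation strategies are one-liners over a matrix of context vectors. A NumPy sketch, with a random 7×16 matrix standing in for per-token embeddings and a random query vector standing in for the learned attention parameters:

```python
import numpy as np

def mean_pool(H):
    return H.mean(axis=0)            # average of all context vectors

def max_pool(H):
    return H.max(axis=0)             # per-dimension max over context vectors

def last_vector(H):
    return H[-1]                     # pop off the last vector

def attention_pool(H, q):
    scores = H @ q                   # importance score per context vector
    a = np.exp(scores - scores.max())
    a /= a.sum()                     # softmax over positions
    return a @ H                     # weighted average

rng = np.random.default_rng(4)
H = rng.normal(size=(7, 16))         # 7 context vectors, 16 dims (illustrative)
q = rng.normal(size=16)              # attention query (learned in a real model)

docs = [mean_pool(H), max_pool(H), last_vector(H), attention_pool(H, q)]
print([d.shape for d in docs])
```

Each strategy collapses the variable-length sequence into one fixed-size document vector.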
Performance
https://indico.io/blog/more-effective-transfer-learning-for-nlp/
Democratization: Phase 1
• Use the advantages of deep learning even in regimes of small data.
• Feature engineering is automated, increasing the productivity of data scientists.
• These features can be used by a team to productionize an ML model.
Applications
• Creating a knowledgebase
• Finding unique concepts
• Word Sense Disambiguation
• Equation extraction
• Text summarization
• Creating flash cards
• Providing condensed information to tutors/experts to efficiently answer student needs
Creating a knowledgebase
All content → split into several tens of subjects: Algebra, Physics, Statistics, Mechanical Eng, Accounting, ….
What is a knowledgebase?
Statistics
  Probability: Bayes' Theorem, Discrete PDs (e.g. Binomial), Continuous PDs (e.g. Normal), Sampling
  Testing: Estimation, Hypothesis Testing
  Regression
Classification Task
Model building
• Get sentence embeddings for each of the sentences.
• Concatenate ELMo embeddings to the sentence embeddings.
• Do a TF-IDF weighting for the sentences.
• Use a simple classifier such as logistic regression or SVM (to ensure scalability in the production pipeline) for the classification.

Fine-tuning results
• Look at precision, recall, and F1-score.
• For specific product needs, use high-precision results to avoid false positives.

Example (raw student input):
completely factor the following expressions. 1. t^2 + 4tv + 4v^2 2. z^2 + 15z - 54 3. 4x^2 - 8x - 12 + 6x 4. 144 - 9p^2 5. 5c^2 - 24cd - 5d^2 6. w^2 - 17w + 42 7. 256z^2 - 4 - 192z^2 + 3 8. 2a^2c^3 - 14bc^3 + 32c^3d^2 9. 35g^2 + 6g - 9 10. 3j^3 - 51j^2 + 210j use factoring and the zero-product property to solve the following problems. 1. z(z - 1)(z + 3) = 0 2. x^2 - x - 10 = 2 3. 4a^2 - 11a + 6 = 0 4. 9r^2 - 30r + 21 = -4.
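The TF-IDF weighting and simple-classifier steps above can be sketched end to end. This is a toy stand-in: the four documents and their labels are invented, random vectors replace the real sentence/ELMo embeddings, and a hand-rolled gradient-descent logistic regression replaces a production classifier; the pipeline shape (TF-IDF-weighted averaging into a sentence vector, then a linear classifier) follows the slide.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)

docs = [
    "factor the quadratic expression completely",
    "solve the equation using the zero product property",
    "compute the sample mean and standard deviation",
    "test the hypothesis at the five percent level",
]
labels = np.array([0, 0, 1, 1])          # 0 = algebra node, 1 = statistics node

vocab = sorted({w for d in docs for w in d.split()})
widx = {w: i for i, w in enumerate(vocab)}
W = rng.normal(size=(len(vocab), 8))     # stand-in word embeddings

df = Counter(w for d in docs for w in set(d.split()))
idf = {w: np.log(len(docs) / df[w]) for w in vocab}

def sent_vec(doc):
    """TF-IDF weighted average of word vectors."""
    tf = Counter(doc.split())
    wts = np.array([tf[w] * idf[w] for w in doc.split()])
    vecs = W[[widx[w] for w in doc.split()]]
    return (wts[:, None] * vecs).sum(0) / (wts.sum() + 1e-9)

X = np.stack([sent_vec(d) for d in docs])

# simple logistic-regression classifier, trained by gradient descent
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - labels
    w -= 0.5 * X.T @ g / len(docs)
    b -= 0.5 * g.mean()

acc = ((p > 0.5) == labels).mean()       # training accuracy on the toy set
print(acc)
```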
Machine Learning methods used
• Classification using TF-IDF
• Transfer learning techniques
• Weak supervision
• Thresholding
• Active learning

On average we achieved > 80% accuracy across all nodes in the knowledge tree.
FlashCards
Identifying unique concepts
• Clustering feature vectors (context words, sentences) to identify unique topics. Caveat: it is typically only possible to extract a subset of the unique concepts using this method.
• Finding similarities between flash cards using the feature vectors.

Example: "cardiac muscle", for which ~1000 flash cards exist. Dominating topics:
1) Striated muscle of the heart
2) Propels blood
3) …
4) …
5) Mixed bag of topics

Word Sense Disambiguation
• Pleated sheet: cloth, curtain
• Pleated sheet: regular secondary structure in proteins (a topic in biochemistry)
• Circular: mathematical concept
• Circular: a store advertisement
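Finding similarities between flash cards from their feature vectors usually comes down to cosine similarity. A minimal sketch with random stand-ins for the card vectors: a near-duplicate card (the anchor plus small noise) scores higher against the anchor than an unrelated card does.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(6)
anchor = rng.normal(size=32)                 # e.g. a "cardiac muscle" card vector
near = anchor + 0.1 * rng.normal(size=32)    # paraphrase of the same concept
far = rng.normal(size=32)                    # unrelated card

print(cosine(anchor, near) > cosine(anchor, far))
```

Clustering these similarities (e.g. thresholding, or any off-the-shelf clustering on the vectors) is what surfaces the dominating topics.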
Example Slide
Chegg Inc. © 2018. All Rights Reserved.44
Equation Extraction using character embeddings
Math Data: Equation Extraction
…given by P(x) = -x^2 + 150x + 50, where x is…
…given by F(x) = 0.02x^2 + 1.56x + 9.8 where x…
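The slide's approach learns character embeddings for this task; as a crude baseline for comparison (not the method on the slide), a regular expression over the character stream can already pick out simple equation spans like the two above. The pattern below is a hand-written heuristic and would miss many real-world equation formats.

```python
import re

# heuristic: a function name like P(x), an equals sign, then a run of
# digits, letters, operators, carets, dots, and spaces
EQ = re.compile(r"[A-Za-z]\(x\)\s*=\s*[-0-9A-Za-z^+.\s*]+")

text = ("the profit is given by P(x) = -x^2 + 150x + 50, where x is the "
        "number of units, and the cost by F(x) = 0.02x^2 + 1.56x + 9.8")

matches = EQ.findall(text)
print(matches)
```

A character-embedding classifier generalizes past such hand-written patterns, which is the point of the slide.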
• Tagging content with nodes of the knowledge tree provides queryable structure to unstructured text.
• It is possible to perform analytics on volume, student engagement, and the dominance of different concepts at different times of the year from this data.
• Finally, the tagged content forms the basis of recommendation systems in different parts of Chegg.
Democratizing NLP: Phase 2
Conclusion
In real life, small-data problems are more common than big-data problems.

With transfer learning we are able to use the advantages of deep learning, i.e. replace feature engineering, in the small-data regime.

Transfer learning produces structured, dense features from unstructured text; these features can be combined with structured features (e.g. views, clicks, conversions) to produce more robust models.
Questions
Email: sdeb@chegg.com
Twitter: @sangha_deb
Appendix
References
Character embeddings:
https://machinelearningmastery.com/develop-character-based-neural-language-model-keras/
https://medium.com/@surmenok/character-level-convolutional-networks-for-text-classification-d582c0c36ace
https://medium.com/@zhuixiyou/character-level-cnn-with-keras-50391c3adf33

Contextual word embeddings:
https://towardsdatascience.com/elmo-helps-to-further-improve-your-word-embeddings-c6ed2c9df95f

Attention model:
http://jalammar.github.io/illustrated-transformer/

Sentence embedding:
https://blog.myyellowroad.com/unsupervised-sentence-representation-with-deep-learning-104b90079a93

ULMFiT:
http://nlp.fast.ai/category/classification.html
Sentence Embeddings: Skip-grams
Given Sentence_Current (20 dim), predict Sentence_Before (20 dim) and Sentence_After (20 dim).
Sentence Embeddings: Skip-grams
Sentence_prev, Sentence_Current, and Sentence_next (20 dim each) are mapped through word-embedding matrices (150 dim) for the previous, current, and next sentences, producing Emb_prev, Emb_curr, and Emb_next.
Sentence Embeddings: Skip-grams, Architecture
Sentence_prev, Sentence_Current, and Sentence_next are each 20 dim. Emb_curr feeds an Encoder producing a 128-dimensional state; after learning, this 128-dimensional state represents the sentence vector. The state drives Decoder_prev and Decoder_next (128 × 20), each followed by a softmax, yielding Decoder_Final_Output of size 20 × vocab_size.
Sentence Embeddings: Skip-grams, Learning
Sentence_prev and Sentence_next (20 dim each) are compared with Decoder_Final_Output_prev and Decoder_Final_Output_next (each 20 × vocab_size, i.e. 20 × 2000 here). Loss: sparse categorical cross-entropy on each side.
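The sparse categorical cross-entropy used on both decoder outputs is just the mean negative log-probability of the true word IDs. A NumPy sketch with the slide's shapes (20 timesteps, a 2000-word vocabulary) and random stand-in decoder outputs:

```python
import numpy as np

def sparse_categorical_crossentropy(probs, targets):
    """Mean negative log-probability of the true word IDs.
    probs: (timesteps, vocab_size) softmax outputs from a decoder;
    targets: (timesteps,) integer word IDs of the true sentence."""
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

T, V = 20, 2000                              # 20 words, vocab of 2000
rng = np.random.default_rng(7)
logits = rng.normal(size=(T, V))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
targets = rng.integers(0, V, size=T)         # dummy true word IDs

loss = sparse_categorical_crossentropy(probs, targets)
print(loss)
```

"Sparse" refers to the targets being integer IDs rather than one-hot vectors, which avoids materializing a 20 × 2000 label matrix.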
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 

Recently uploaded (20)

ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

Democratizing NLP content modeling with transfer learning using GPUs

  • 1. Chegg Inc. © 2018. All Rights Reserved. 1 Democratizing NLP content modeling with transfer learning using GPUs Sanghamitra Deb, Staff Data Scientist. Email: sdeb@chegg.com Twitter: @sangha_deb
  • 3. Example Slide Chegg Inc. © 2018. All Rights Reserved.3 Chegg
  • 4. Example Slide Chegg Inc. © 2018. All Rights Reserved.4 Chegg Tutors?
  • 5. Example Slide Chegg Inc. © 2018. All Rights Reserved.5 Flashtools Metaphase: The chromosomes line up at the equator; centriole fibers attach to centromeres (where the chromatids are joined to each other) Front Back Flashcards
  • 6. Example Slide Chegg Inc. © 2018. All Rights Reserved.6 Chegg Study
  • 7. Example Slide Chegg Inc. © 2018. All Rights Reserved.7 Chegg Study
  • 8. Example Slide Chegg Inc. © 2018. All Rights Reserved.8 Chegg Study
  • 9. Example Slide Chegg Inc. © 2018. All Rights Reserved.9 Chegg Study
  • 10. Example Slide Chegg Inc. © 2018. All Rights Reserved. • Democratizing NLP : what does it mean? • Transfer Learning • Word2vec • Sentence and character embeddings • Word embeddings in context • Applications • Knowledgebase creation • Unique concepts … 10 Overview
  • 11. Example Slide Chegg Inc. © 2018. All Rights Reserved.11 Democratizing NLP with transfer learning • Giving structure to unstructured data • Data analysts should be able to query language data and get insights. • Machine Learning practitioners with no knowledge of language should be able to use features from the NLP data. Why Transfer Learning? • Converts text into vectors, thereby giving structure. • Transfer learning can be used to solve problems such as text summarization, entity recognition, tagging, keyword extraction, and other downstream classification tasks. This structured data can be queried for further insights. • The result of transfer learning is typically featurization of text data at the document, word, or sentence level.
  • 12. Example Slide Chegg Inc. © 2018. All Rights Reserved.12 Traditional NLP Pipeline 1. Collecting Data 2. Gathering labelled data 3. Feature Engineering 4. Fit a model
  • 13. Example Slide Chegg Inc. © 2018. All Rights Reserved.13 Traditional Machine Learning Pipeline 1. Collecting Data 2. Gathering labelled data 3. Feature Engineering 4. Fit a model Deep learning replaces feature engineering !! However, DL requires huge amounts of data.
  • 14. Example Slide Chegg Inc. © 2018. All Rights Reserved.14 What is transfer learning? Traditional Machine Learning Task 1, domain 1 Model 1 Results Task 2, domain 2 Model 2 Results All models are task/domain specific
  • 15. Example Slide Chegg Inc. © 2018. All Rights Reserved.15 What is transfer learning? Traditional Machine Learning Source Task/domain Model 1 Knowledge Task 2, domain 2 Model 2 All models are task/domain specific Sentence vectors, word vectors
  • 16. Example Slide Chegg Inc. © 2018. All Rights Reserved.16 Classic Example: Computer Vision Pretrained ImageNet models have been used to achieve state-of-the-art results in tasks such as object detection, semantic segmentation, human pose estimation, and video recognition. At the same time, they have enabled the application of CV to domains where the number of training examples is small and annotation is expensive.
  • 17. Example Slide Chegg Inc. © 2018. All Rights Reserved.17 Transfer Learning in NLP: Word2vec Proposed in 2013 by Mikolov et al. as an approximation to language modeling The cat sat on the mat GloVe, FastText
  • 18. Example Slide Chegg Inc. © 2018. All Rights Reserved.18 Transfer Learning in NLP: Word2vec Proposed in 2013 as an approximation to language modeling The cat sat on the mat Large corpus of text ~ say 10,000 words • 10,000-dim one-hot vector • Interface it with a 300-node hidden layer (the weights connecting this layer are the word vectors) • Activations: linear summations of weighted inputs • Nodes are fed into a softmax • During training the weights are changed such that words surrounding "cat" have a higher probability. GloVe, FastText. Or download Google's pre-trained vectors.
  • 19. Example Slide Chegg Inc. © 2018. All Rights Reserved.19 Transfer Learning in NLP: Word2vec Proposed in 2013 as an approximation to language modeling The cat sat on the mat GloVe, FastText
  • 20. Example Slide Chegg Inc. © 2018. All Rights Reserved.20 Character Embedding The broadway play premiered yesterday https://arxiv.org/abs/1508.06615 The broadway play premiered yesterday Softmax Concatenation of character Embeddings ] Convolution Layer with Multiple Filters ] Max over time pooling layer Cross Entropy between next word and prediction
  • 21. Example Slide Chegg Inc. © 2018. All Rights Reserved.21 Why are character embeddings important? • The input layer is simplified: instead of a 10,000-sized one-hot vector, we have an input of size < 100. • At Chegg we have a lot of free-form raw text input by students. • Spelling mistakes, student language, etc. • The vocabulary can have a lot of variation • Student linguistics evolve fast with time • We have a wide range of subjects with different symbols (for example, Greek letters are common in physics, math, and other STEM subjects)
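The input-layer simplification above can be sketched in a few lines: each character maps to an index in a small alphabet rather than each word mapping to a slot in a 10,000-word one-hot vocabulary. The alphabet and padding length below are illustrative assumptions, not Chegg's production setup.

```python
import string

# A small character alphabet (< 100 symbols) replaces a large word vocabulary.
ALPHABET = string.ascii_lowercase + string.digits + " .,?!'-"
char2id = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 reserved for padding/unknown

def encode(text, max_len=40):
    """Map text to a fixed-length character-id sequence; unknown symbols
    (typos, Greek letters, etc.) fall back to id 0 instead of growing the vocab."""
    ids = [char2id.get(c, 0) for c in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

x = encode("What is mitosis?")
```

Because unseen characters map to a reserved id rather than failing a vocabulary lookup, spelling mistakes and unusual symbols degrade gracefully, which is the property the slide highlights.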
  • 22. Example Slide Chegg Inc. © 2018. All Rights Reserved.22 Sentence Embeddings Autoencoders Language Models Skip Thought Vectors
  • 23. Example Slide Chegg Inc. © 2018. All Rights Reserved.23 Sentence Embeddings: AutoEncoders/Language Models …............. 150 20 20 Softmax …............. Next word After learning, this 128-dimensional state represents the sentence vector Auto-encoders Language-Models
  • 24. Example Slide Chegg Inc. © 2018. All Rights Reserved.24 Word Embeddings in Context https://www.gocomics.com/frazz/ In a context-free embedding, "crisp" in the sentences "The morning air is getting crisp" and "getting burned to a crisp" would have the same vector: f(crisp) In a context-aware model the embedding would be augmented by the context in which it appears: f(crisp, context)
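The f(crisp) vs. f(crisp, context) distinction can be made concrete with a toy sketch: augment a static word vector with a summary of its sentence, so the same word gets different representations in different sentences. The random vectors and the mean-of-context summary below are stand-ins for illustration only; this is not how ELMo actually computes its representations.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy static (context-free) word vectors -- random placeholders, not real embeddings.
vocab = ["the", "morning", "air", "is", "getting", "crisp", "burned", "to", "a"]
static = {w: rng.normal(size=4) for w in vocab}

def contextual(word, sentence):
    """f(word, context): a crude stand-in for a contextual model --
    concatenate the static vector with the mean of the sentence's vectors."""
    ctx = np.mean([static[w] for w in sentence], axis=0)
    return np.concatenate([static[word], ctx])

s1 = "the morning air is getting crisp".split()
s2 = "getting burned to a crisp".split()

v1 = contextual("crisp", s1)  # "crisp" as in crisp morning air
v2 = contextual("crisp", s2)  # "crisp" as in burned to a crisp
```

The static halves of v1 and v2 are identical (that is f(crisp)), while the context halves differ, which is the behavior a context-aware embedding provides.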
  • 25. Example Slide Chegg Inc. © 2018. All Rights Reserved.25 Word Embeddings in Context : Elmo Best paper at NAACL 2018 Code publicly available: • Allennlp – pytorch • Tensorflow • Keras • Chainer https://allennlp.org/elmo
  • 26. Example Slide Chegg Inc. © 2018. All Rights Reserved.26 How does it work? The broadway play premiered yesterday https://arxiv.org/abs/1508.06615 The broadway play premiered yesterday Softmax Concatenation of character Embeddings ] Convolution Layer with Multiple Filters ] Max over time pooling layer Cross Entropy between next word and prediction Input to Elmo
  • 27. Example Slide Chegg Inc. © 2018. All Rights Reserved.27 Elmo: How does it work? Bi-directional Language Model ELMo is a task-specific combination of the intermediate layer representations in the biLM.
  • 28. Example Slide Chegg Inc. © 2018. All Rights Reserved.28 What is the output?
  • 29. Example Slide Chegg Inc. © 2018. All Rights Reserved.29 What is the output?
  • 30. Example Slide Chegg Inc. © 2018. All Rights Reserved.30 Transfer Learning Pipeline: Then
  • 31. Example Slide Chegg Inc. © 2018. All Rights Reserved.31 Transfer Learning Pipeline: Now Finetuning Aggregating Use a neural architecture to fine-tune the vectors from each layer; this is common practice in computer vision Use any technique to aggregate all the layers
  • 32. Example Slide Chegg Inc. © 2018. All Rights Reserved.32 Transfer Learning at a glance Training Classifier: The embeddings contain direct signals from labels. The labels should be related to the task at hand for maximum data efficiency. Language model: The embeddings are learned from any text, but with no signals from labels. Machine Translation: The embeddings are learned from surprisingly copious translation data. The idea is that if the embeddings can translate to a foreign language, they can transfer to a classifier task as well. Neural Architectures RNN: Model with infinite context, but the computation can't be parallelized. CNN: Model with local context that is highly parallelizable; the context can be increased by stacking layers deeply and adding either dilated or separable convolutions. Transformer: uses self-attention and positional encoding. Good for small documents. Mean/Max Pool: A simple averaging/maxing of all the context vectors will give you reasonable results. Last Vector: If the model aggregates information to the last vector, you can simply pop off the last vector as the document embedding. Attention: Attention dynamically allocates the importance of the context embeddings before averaging to a document embedding. Aggregation
  • 33. Example Slide Chegg Inc. © 2018. All Rights Reserved.33 Transfer Learning at a glance Training Classifier: The embeddings contain direct signals from labels. The labels should be related to the task at hand for maximum data efficiency. Language model: The embeddings are learned from any text, but with no signals from labels. Machine Translation: The embeddings are learned from surprisingly copious translation data. The idea is that if the embeddings can translate to a foreign language, they can transfer to a classifier task as well. Mean/Max Pool: A simple averaging/maxing of all the context vectors will give you reasonable results. Last Vector: If the model aggregates information to the last vector, you can simply pop off the last vector as the document embedding. Attention: Attention dynamically allocates the importance of the context embeddings before averaging to a document embedding. Aggregation
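The three aggregation strategies on this slide can be sketched in NumPy over a matrix of per-token context vectors. The matrix and the attention query vector below are random placeholders (in a real model the query would be a learned parameter); the sketch only shows the mechanics of each aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Context embeddings for a 5-token document, 16 dims each (toy values).
H = rng.normal(size=(5, 16))

# Mean / max pooling: simple averaging or maxing of all context vectors.
doc_mean = H.mean(axis=0)
doc_max = H.max(axis=0)

# Last vector: pop off the final state as the document embedding.
doc_last = H[-1]

# Attention: dynamically weight the context vectors before averaging.
# `query` stands in for a learned parameter vector.
query = rng.normal(size=16)
scores = H @ query
weights = np.exp(scores - scores.max())
weights /= weights.sum()          # softmax over tokens
doc_attn = weights @ H            # weighted average of context vectors
```

All four choices produce a single fixed-size document embedding from a variable-length sequence, which is what a downstream classifier needs.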
  • 34. Example Slide Chegg Inc. © 2018. All Rights Reserved.34 Performance https://indico.io/blog/more-effective-transfer-learning-for-nlp/
  • 35. Example Slide Chegg Inc. © 2018. All Rights Reserved.35 Democratization: Phase 1 • Use the advantages of deep learning even in regimes of small data • Feature engineering is automated, increasing the productivity of data scientists • These features can be used by a team to productionize an ML model
  • 36. WIP: DRAFT SLIDES Confidential Material – Chegg Inc. © 2005 – 2016. All Rights Reserved. Applications 36c
  • 37. Example Slide Chegg Inc. © 2018. All Rights Reserved. • Creating a knowledgebase • Finding unique concepts. • Word Sense Disambiguation • Equation extraction • Text summarization • Creating flash cards • Providing condensed information to tutors/experts to efficiently answer student needs. 37 Applications
  • 38. Example Slide Chegg Inc. © 2018. All Rights Reserved.38 Creating a knowledgebase All Content Algebra Physics Statistics Mechanical Eng Accounting …. Several Tens of Subjects
  • 39. Example Slide Chegg Inc. © 2018. All Rights Reserved.39 Creating a knowledgebase All Content Algebra Physics Statistics Mechanical Eng Accounting …. Several Tens of Subjects
  • 40. Example Slide Chegg Inc. © 2018. All Rights Reserved.40 What is a knowledgebase? Statistics Probability Testing Regression Probability Discrete PDs Continuous PD’s Sampling Estimation Hypothesis Testing Regression Binomial NormalBayes Theorem
  • 41. Example Slide Chegg Inc. © 2018. All Rights Reserved.41 Classification Task • Get sentence embeddings for each of the sentences. • Concatenate ELMo embeddings to the sentence embeddings. • Apply a tf-idf weighting to the sentences. • Use a simple classifier such as logistic regression or SVM (to ensure scalability in the production pipeline) for the classification Model Building Fine Tuning results • Look at precision, recall and f1-score • For specific product needs use high-precision results to avoid false positives Example: completely factor the following expressions. 1. t^2 + 4tv + 4v^2 2. z^2 + 15z - 54 3. 4x^2 - 8x - 12 + 6x 4. 144 - 9p^2 5. 5c^2 - 24cd - 5d^2 6. w^2 - 17w + 42 7. 256z^2 - 4 - 192z^2 + 3 8. 2a^2c^3 - 14bc^3 + 32c^3d^2 9. 35g^2 + 6g - 9 10. 3j^3 - 51j^2 + 210j Use factoring and the zero-product property to solve the following problems. 1. z(z - 1)(z + 3) = 0 2. x^2 - x - 10 = 2 3. 4a^2 - 11a + 6 = 0 4. 9r^2 - 30r + 21 = -4.
  • 42. Example Slide Chegg Inc. © 2018. All Rights Reserved. • Classification using tf-idf • Transfer learning techniques • Weak Supervision • Thresholding • Active Learning 42 Machine Learning methods used On average we achieved > 80% accuracy for all nodes in the knowledge tree
  • 43. Example Slide Chegg Inc. © 2018. All Rights Reserved.43 FlashCards Identifying unique concepts Word Sense Disambiguation • Clustering feature vectors (context words, sentences) to identify unique topics. Caveat: it is typically only possible to extract a subset of unique concepts using this method. • Finding similarities between flash cards using the feature vectors Example: Cardiac muscle ~1000 flash cards exist Dominating topics: 1) Striated muscle of the heart 2) Propels blood 3) … 4) … 5) Mixed bag of topics Pleated sheet: cloth, curtain Pleated sheet: regular secondary structure in proteins --- a topic in biochemistry Circular: mathematical concept Circular: a store advertisement
  • 44. Example Slide Chegg Inc. © 2018. All Rights Reserved.44 Equation Extraction using character embeddings Math Data: Equation Extraction …given by P(x) = -x^2 + 150x + 50, where x is… …given by F(x) = 0.02x^2 + 1.56x + 9.8, where x…
  • 45. Example Slide Chegg Inc. © 2018. All Rights Reserved. • Tagging content with nodes of the knowledge tree provides queryable structure to unstructured text • It is possible to perform analytics on this data: volume, student engagement, and the dominance of different concepts at different times of the year • Finally, the tagged content forms the basis of recommendation systems in different parts of Chegg 45 Democratizing NLP: Phase 2
  • 46. Example Slide Chegg Inc. © 2018. All Rights Reserved.46 Conclusion In real life, small-data problems are more common than big-data problems. With transfer learning we are able to use the advantages of deep learning, i.e., replace feature engineering, in the small-data regime Transfer learning produces structured dense features from unstructured text; these features can be combined with structured features (e.g., views, clicks, conversion) to produce more robust models
  • 47. WIP: DRAFT SLIDES Confidential Material – Chegg Inc. © 2005 – 2016. All Rights Reserved. Questions 47c Email: sdeb@chegg.com Twitter: @sangha_deb
  • 48. WIP: DRAFT SLIDES Confidential Material – Chegg Inc. © 2005 – 2016. All Rights Reserved. Appendix 48c
  • 49. Example Slide Chegg Inc. © 2018. All Rights Reserved.49 References Character Embeddings: https://machinelearningmastery.com/develop-character-based-neural-language-model-keras/ https://medium.com/@surmenok/character-level-convolutional-networks-for-text-classification-d582c0c36ace https://medium.com/@zhuixiyou/character-level-cnn-with-keras-50391c3adf33 Contextual Word Embeddings: https://towardsdatascience.com/elmo-helps-to-further-improve-your-word-embeddings-c6ed2c9df95f Attention model http://jalammar.github.io/illustrated-transformer/ Sentence Embedding https://blog.myyellowroad.com/unsupervised-sentence-representation-with-deep-learning-104b90079a93 Ulmfit http://nlp.fast.ai/category/classification.html
  • 50. Example Slide Chegg Inc. © 2018. All Rights Reserved.50 Sentence Embeddings: Skip-grams Sentence_Before …............. 20 dim Sentence_Current …............. 20 dim Sentence_After …............. 20 dim Given: Predict Predict
  • 51. Example Slide Chegg Inc. © 2018. All Rights Reserved.51 Sentence Embeddings: Skip-grams Sentence_prev Sentence_Current Sentence_next …............. …............. …............. 20 dim 20 dim 20 dim Word Embedding matrix for previous sentence. 150 150 Word Embedding matrix for current sentence. Emb_prev Emb_curr Emb_next Word Embedding matrix for next sentence.
  • 52. Example Slide Chegg Inc. © 2018. All Rights Reserved.52 Sentence Embeddings: Skip-grams, Architecture Sentence_prev Sentence_Current : 20 dim Sentence_next …............. …............. …............. 20 dim 20 dim Emb_curr Encoder Emb_prev 128 Emb_next Softmax Softmax Decoder_Final_Output: 20 * vocab_size Decoder_prev After learning, this 128-dimensional state represents the sentence vector Decoder_next 128*20 128*20
  • 53. Example Slide Chegg Inc. © 2018. All Rights Reserved. Sentence Embeddings: Skip-grams, Learning Sentence_prev …............. 20 dim Sentence_next …............. 20 dim 20*2000 Decoder_Final_Output_prev: 20 * vocab_size Decoder_Final_Output_next: 20 * vocab_size 20*2000 Loss: sparse categorical crossentropy Loss: sparse categorical crossentropy