AI MADE EASY
An Introduction to Machine Learning in Natural Language Processing
The use of different techniques from computer science to understand and manipulate human language and speech.
Isar Nejadgholi
isar@imrsv.ai
What do we want machines to learn?
• Prediction or labeling
• Reasoning
• Understanding
• Language generation
LANGUAGE IS HARD
• Irony and sarcasm: "Tell me something I don’t know."
• Reference resolution: "Elizabeth told Amanda that she had a problem." (Who had the problem?)
• Lexical ambiguity: "The tank was full of water." vs. "I saw the military tank."
• Syntactic (structural) ambiguity: "Visiting relatives can cause problems." (Relatives who visit, or the act of visiting them?)
• Subjectivity of annotations: "I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The battery life was not long, though, and my mother thought the phone was too expensive."
APPLICATIONS
[Figure: applications of Natural Language Processing]
CLASSICAL NATURAL LANGUAGE PROCESSING
• Computational linguistics
• Statistical and probabilistic ML
TEXT REPRESENTATION IN CLASSICAL NLP
Term frequency vector: one count per vocabulary word, so the vector is long and mostly zeros (sparse).
How to get a dense and informative representation?
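A term frequency vector has one count per vocabulary word. The minimal Python sketch below builds such vectors for two of the example sentences; the tokenizer regex and the function names are illustrative choices, not taken from the talk. Each vector is as long as the whole vocabulary, which is why these representations become sparse on real corpora.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes as tokens
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(docs):
    # Sorted vocabulary over all documents, so every vector uses the same layout
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def term_frequency_vector(doc, vocab):
    # One count per vocabulary term; terms absent from the document get 0
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

docs = ["The tank was full of water", "I saw the military tank"]
vocab = build_vocabulary(docs)
vectors = [term_frequency_vector(d, vocab) for d in docs]

print(vocab)       # ['full', 'i', 'military', 'of', 'saw', 'tank', 'the', 'was', 'water']
print(vectors[0])  # [1, 0, 0, 1, 0, 1, 1, 1, 1]
```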
CLASSICAL NLP PIPELINE
• Frequency-based representation
• Converting sparse representations to dense vectors using SVD-based methods
• Statistical inference, probabilistic models, similarity metrics
• Knowledge base of lexical rules and word relations
• Character pattern matching
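The sparse-to-dense step can be illustrated with latent semantic analysis (LSA), one common SVD-based method. As a dependency-free sketch under toy assumptions, the code below finds the top-k singular vectors by power iteration (a stand-in for a proper SVD routine) and then compares documents with cosine similarity, one of the similarity metrics listed above. The documents and the function name `lsa_doc_vectors` are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(u, v):
    # Cosine similarity; assumes neither vector is all zeros
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def lsa_doc_vectors(a, k, iters=500, seed=0):
    """Map each row of `a` (documents x term counts) to k dense coordinates:
    its projections onto the top-k right singular vectors of A (i.e. U_k * Sigma_k).
    Singular vectors come from power iteration on A^T A with deflation, which is
    fine for a toy matrix; use a library SVD on real data."""
    cols = list(zip(*a))                                  # one tuple per term
    ata = [[dot(ci, cj) for cj in cols] for ci in cols]   # A^T A, terms x terms
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in cols])
        for _ in range(iters):
            v = [dot(row, v) for row in ata]              # v <- A^T A v
            for b in basis:                               # deflation: stay orthogonal
                proj = dot(v, b)                          # to directions already found
                v = [x - proj * y for x, y in zip(v, b)]
            v = normalize(v)
        basis.append(v)
    return [[dot(row, b) for b in basis] for row in a]

terms = ["automobile", "car", "engine", "flower", "garden", "petal", "wheel"]
docs = ["car engine", "automobile engine", "car wheel", "flower petal", "flower garden"]
a = [[doc.split().count(t) for t in terms] for doc in docs]
dense = lsa_doc_vectors(a, k=2)

print(cosine(a[1], a[2]))                     # 0.0: the raw vectors share no terms
print(round(cosine(dense[1], dense[2]), 3))   # close to 1 in the dense space
```

"automobile engine" and "car wheel" share no words, so their raw cosine is 0, yet they end up close together after the projection because co-occurrence links car, automobile, engine and wheel. On real corpora, a library SVD over a sparse matrix is the practical choice.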
TOOLS
NLP Meets Deep Learning
Recent advances in NLP
DEEP LEARNING NLP MODEL
Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models (Matthew Honnibal, November 10, 2016)
EMBED: WORD REPRESENTATION WITH WORD EMBEDDINGS
[Figure: skip-gram architecture. INPUT w(t) -> PROJECTION -> OUTPUT w(t-2), w(t-1), w(t+1), w(t+2)]
Pretrained embeddings:
• Word2vec
• GloVe
• fastText
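In the skip-gram setup the model sees w(t) and is trained to predict the surrounding words w(t-2) to w(t+2). The sketch below, assuming a window of 2 and whitespace tokenization, only generates the (center, context) training pairs; training the projection weights is omitted, and in practice one usually starts from pretrained embeddings such as those listed above.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: each word w(t) is paired with every word
    from w(t - window) to w(t + window), skipping w(t) itself."""
    pairs = []
    for t, center in enumerate(tokens):
        for offset in range(-window, window + 1):
            c = t + offset
            if offset != 0 and 0 <= c < len(tokens):   # stay inside the sentence
                pairs.append((center, tokens[c]))
    return pairs

tokens = "visiting relatives can cause problems".split()
pairs = skipgram_pairs(tokens)
print([p for p in pairs if p[0] == "can"])
# [('can', 'visiting'), ('can', 'relatives'), ('can', 'cause'), ('can', 'problems')]
```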
REASONING WITH WORD VECTORS
Analogies in Legal Embedding:
Pair 1 | Pair 2
China - Chinese | Sri Lanka - Sri Lankan
Colombian - FARC | Somalian - Alshabab
Roma - Hungarian | Bahai - Iranian
Palestinian - Hamas | Lebanon - Hezbollah
PRRA - Preremoval | RPD - post hearing

Man is to woman as king is to ______?
meaning(king) - meaning(man) + meaning(woman) = ?
(Mikolov et al. 2013)
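The king/queen analogy is usually solved by vector arithmetic followed by a nearest-neighbour search with cosine similarity that excludes the query words. The 3-dimensional vectors below are invented for illustration (real embeddings are learned and have hundreds of dimensions), so this sketches the technique and is not a result from any trained model.

```python
import math

# Hypothetical toy vectors; dimensions loosely read as "royalty", "maleness", "person"
vectors = {
    "king":  [0.9, 0.8, 0.7],
    "queen": [0.9, 0.1, 0.7],
    "man":   [0.1, 0.8, 0.9],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.4, 0.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' by returning the word whose vector is most
    cosine-similar to vec(c) - vec(a) + vec(b), excluding a, b and c themselves."""
    target = [vc - va + vb for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "woman", "king", vectors))  # queen
```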
IMRSV WORD SUMMARIZER
Probabilistic PCA
ENCODE: SENTENCE REPRESENTATION WITH RNNs
• Converts a concatenation of word vectors (or a bag of words) into a more meaningful sentence matrix
• Captures the meaning of each word in its context using memory and sequential modeling
• Can be bidirectional
• Popular models: GRU, LSTM, BiLSTM
• Can be used as a decoder to generate text
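A minimal sketch of a bidirectional RNN encoder that turns word vectors into a sentence matrix (one row per word). For brevity it uses a plain Elman cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b), instead of the GRU or LSTM cells named above; the weights and word vectors are random placeholders, not trained values, and the helper names are hypothetical.

```python
import math
import random

def rnn_states(inputs, w_x, w_h, b):
    """Vanilla (Elman) RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.
    Returns one hidden state per input word."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = [math.tanh(sum(w * xi for w, xi in zip(w_x[i], x))
                       + sum(w * hj for w, hj in zip(w_h[i], h))
                       + b[i])
             for i in range(len(b))]
        states.append(h)
    return states

def bidirectional_states(inputs, fwd_params, bwd_params):
    # Forward pass over the words, backward pass over the reversed words,
    # re-aligned to word order and concatenated row by row
    forward = rnn_states(inputs, *fwd_params)
    backward = rnn_states(inputs[::-1], *bwd_params)[::-1]
    return [f + bk for f, bk in zip(forward, backward)]

def random_params(input_dim, hidden_dim, rng):
    # Untrained random weights; a real encoder learns these
    w_x = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    return w_x, w_h, [0.0] * hidden_dim

rng = random.Random(0)
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 5 words, 4-dim
fwd, bwd = random_params(4, 3, rng), random_params(4, 3, rng)
sentence_matrix = bidirectional_states(embeddings, fwd, bwd)
print(len(sentence_matrix), len(sentence_matrix[0]))  # 5 6
```

Each row mixes information from the words before it (forward half) and after it (backward half), which is what "meaning of a word in context" refers to.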
ATTEND: TEXT REPRESENTATION WITH ATTENTION
• Reduces the sentence matrix to a sentence vector
• Without attention, this reduction is done by average or max pooling
• Attention takes a weighted average of the sentence matrix, with weights computed by a nonlinear function of the sentence matrix and a context, so the model learns what to keep
[Figure labels: context vector or matrix; predict]
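The sketch below contrasts mean and max pooling with attention pooling. The slide does not specify the scoring function, so this assumes one common additive-style form, score_t = c · tanh(W h_t), followed by a softmax; the projection W and context vector c would normally be learned, and here they are hand-set hypothetical values.

```python
import math

def mean_pool(matrix):
    return [sum(col) / len(matrix) for col in zip(*matrix)]

def max_pool(matrix):
    return [max(col) for col in zip(*matrix)]

def softmax(scores):
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(matrix, context, w):
    """score_t = context . tanh(W h_t); weights = softmax(scores);
    output = sum_t weight_t * h_t (a weighted average of the rows)."""
    scores = []
    for h in matrix:
        hidden = [math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in w]
        scores.append(sum(c * x for c, x in zip(context, hidden)))
    weights = softmax(scores)
    pooled = [sum(a * h[j] for a, h in zip(weights, matrix)) for j in range(len(matrix[0]))]
    return pooled, weights

sentence_matrix = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]  # 3 words, 2 dimensions
w = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical learned projection
context = [5.0, 0.0]            # hypothetical learned context vector
pooled, weights = attention_pool(sentence_matrix, context, w)
print(mean_pool(sentence_matrix))       # every word counts equally
print(max_pool(sentence_matrix))        # per-dimension maximum
print([round(a, 3) for a in weights])   # most weight goes to the first word
```

With the context pointing along the first dimension, attention concentrates on the first word, whereas mean pooling lets the first two words cancel out.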
UNBALANCED AND MULTI-LABEL CLASSIFICATION PROBLEM
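In a multi-label problem an example can carry several labels at once, so each label gets its own sigmoid output and binary cross-entropy term rather than a shared softmax. One common way to handle imbalance, assumed here and not stated on the slide, is to up-weight each label's positive examples by its negative-to-positive ratio. The label names and data are hypothetical.

```python
import math

LABELS = ["toxic", "obscene", "threat"]   # hypothetical label set

def positive_weights(label_rows):
    """label_rows: one 0/1 list per example (several 1s allowed).
    Returns #negatives / #positives per label, so rare labels weigh more."""
    n = len(label_rows)
    weights = []
    for j in range(len(label_rows[0])):
        pos = sum(row[j] for row in label_rows)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights

def weighted_bce(probs, targets, pos_weights):
    """Mean binary cross-entropy over independent per-label sigmoid outputs,
    with each label's positive term scaled by its weight."""
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for p, y, w in zip(probs, targets, pos_weights):
        loss -= w * y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return loss / len(probs)

train_labels = [  # columns follow LABELS
    [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0],
    [1, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
pos_w = positive_weights(train_labels)
print(pos_w)  # toxic about 1.67, obscene and threat 7.0
```

Missing a rare positive (for example "obscene") now costs about seven times as much as in the unweighted loss, which discourages a model from never predicting rare labels.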
DATA PREPARATION
Cleaning
• Removing punctuation
• Autocorrecting a set of critical words:
  a$$ clowns -> ass clowns
  sh!t -> shit
  5hit -> shit
  b l o o d y -> bloody
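A sketch of the cleaning step, assuming a small hypothetical list of critical words and a hypothetical table of disguising character substitutions. A token is rewritten only when undoing the substitutions yields a critical word, and autocorrection runs before punctuation removal, since removing punctuation first would reduce "a$$" to "a".

```python
import re

# Hypothetical critical-word list built from the slide's examples
CRITICAL_WORDS = {"ass", "shit", "bloody"}

# Assumed character substitutions used to disguise words (not exhaustive)
LEET = str.maketrans({"$": "s", "5": "s", "!": "i", "1": "i", "0": "o", "@": "a", "3": "e"})

def collapse_spaced_letters(text):
    # "b l o o d y" -> "bloody": join runs of 3+ single characters separated by spaces
    return re.sub(r"(?:\b\w ){2,}\w\b", lambda m: m.group(0).replace(" ", ""), text)

def autocorrect_token(token):
    # Rewrite only if undoing the substitutions gives a critical word,
    # so ordinary tokens such as "100" are left alone
    candidate = token.lower().rstrip(".,;:?").translate(LEET)
    return candidate if candidate in CRITICAL_WORDS else token

def clean(text):
    text = collapse_spaced_letters(text)
    text = " ".join(autocorrect_token(tok) for tok in text.split())
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation
    return re.sub(r"\s+", " ", text).strip()

for raw in ["a$$ clowns", "sh!t", "5hit", "b l o o d y"]:
    print(raw, "->", clean(raw))
```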
Augmentation with back-translation
• en -> fr -> en
• en -> de -> en
• en -> es -> en
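Back-translation needs a machine translation system, and the talk does not say which one was used, so the sketch below takes any translation function as a parameter. The pivots are French, German and Spanish (ISO 639-1 codes fr, de, es); the lookup-table translator is a toy stand-in for testing only. Each paraphrase would inherit the labels of the original example.

```python
from typing import Callable, List, Sequence

# Any function (text, source_lang, target_lang) -> translated text.
# No particular MT system is assumed; plug in whichever you use.
Translator = Callable[[str, str, str], str]

def back_translate(text: str, translate: Translator,
                   pivots: Sequence[str] = ("fr", "de", "es"),
                   source: str = "en") -> List[str]:
    """Return paraphrases of `text` via source -> pivot -> source round trips,
    skipping results identical to the input or to an earlier paraphrase."""
    paraphrases: List[str] = []
    for pivot in pivots:
        round_trip = translate(translate(text, source, pivot), pivot, source)
        if round_trip != text and round_trip not in paraphrases:
            paraphrases.append(round_trip)
    return paraphrases

# Toy lookup-table "translator" for demonstration only; not a real MT system
_TABLE = {
    ("Tell me something I don't know.", "en", "fr"): "Dis-moi quelque chose que je ne sais pas.",
    ("Dis-moi quelque chose que je ne sais pas.", "fr", "en"): "Tell me something I do not know.",
}

def fake_translate(text: str, src: str, tgt: str) -> str:
    return _TABLE.get((text, src, tgt), text)  # unknown input is returned unchanged

print(back_translate("Tell me something I don't know.", fake_translate))
# ['Tell me something I do not know.']
```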
STRUCTURE OF TOXICITY CLASSIFIER
VIOLENCE IN ISIS FANBOYS’ TWEETS
VIOLENT TWEETS OVER TIME
[Figure: number of violent tweets over time, with annotated events "Explosion in Pakistan", "ISIS commander killed" and "Explosion in Turkey"]
WHERE DO I GET THE RIGHT DATA?
• Text is everywhere, but it is messy
• Volume, variety and velocity
• Annotation is subjective and expensive
Get messy!
Learn to clean!
Learn to augment!
Try to develop an intuition about your data!
Questions?
