Unit-3
Session-3
Concepts of Natural Language Processing
Concept of NLP:-
● Computers cannot yet truly understand English the way humans do, but thanks to AI and NLP they are learning fast: they try to work out the meaning of a sentence and respond accordingly.
● AI technologies have one thing in common:
○ They break the problem up into very small pieces to simplify it
○ Reduce the complexity by removing extra information
○ Use AI to solve each smaller piece separately
○ Tie together the processed results
○ Finally, convert the processed result to numbers so that the computer can work with it
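The last step, converting processed text to numbers, can be sketched as follows. This is a minimal illustration, assuming a simple scheme in which each distinct token gets the next free integer id; the function name `build_vocab` and the sample tokens are hypothetical and do not come from the slides.

```python
def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

# Hypothetical, already-processed tokens
tokens = ["you", "want", "see", "dreams", "dreams"]
vocab = build_vocab(tokens)

# Replace every token with its id so the text becomes a list of numbers
ids = [vocab[t] for t in tokens]
print(ids)
```

Repeated words map to the same number, which is what lets the computer treat them as the same term.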
Corpus:-
● A corpus is a large and structured set of machine-readable
texts that have been produced in a natural communicative
setting.
OR
● A corpus can be defined as a collection of text documents. It
can be thought of as just a bunch of text files in a directory,
often alongside many other directories of text files
1. Text Normalization:-
It comes under data processing.
It is the process of reducing the variation in a text's word forms to a common form when the variations mean the same thing.
Text normalization divides the text into smaller components called tokens (usually the words in the text) and groups related tokens together.
2. Sentence Segmentation:-
Dividing the whole text (corpus) into individual sentences.
Before Sentence Segmentation:
“You want to see the dreams with close eyes and achieve them? They’ll remain dreams, look for AIMs and your eyes have to stay open for a change to be seen.”
After Sentence Segmentation:
1. You want to see the dreams with close eyes and achieve them?
2. They’ll remain dreams, look for AIMs and your eyes have to stay open for a change to be seen.
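Sentence segmentation can be sketched with a regular expression. This is a minimal sketch, assuming a sentence ends at '.', '?' or '!' followed by whitespace; that rule is a simplification chosen here (it would wrongly split after abbreviations such as "Dr."), and `split_sentences` is a hypothetical helper name.

```python
import re

def split_sentences(text):
    """Split after '.', '?' or '!' when the mark is followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

text = ("You want to see the dreams with close eyes and achieve them? "
        "They'll remain dreams, look for AIMs and your eyes have to stay "
        "open for a change to be seen.")

sentences = split_sentences(text)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```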
3. Tokenization:-
It is the process of splitting each individual sentence into smaller units called tokens (a word, a phrase, a number, or a symbol).
TOKEN:- A token is a well-defined semantic unit inside a sentence that contributes to the overall meaning of the sentence. A token may represent a word, a phrase, a number or a symbol.
Before tokenization:
Zain walked down four blocks to pick up ice cream.
After tokenization (each token with its part of speech):
Token    Part of speech
Zain     Proper noun
walked   Verb
down     Adv
four     Num
blocks   Noun
to       Part
pick     Verb
up       Adp
ice      Noun
cream    Noun
.        Punctuation
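The split shown above can be reproduced with a short sketch. It assumes a token is either a run of letters and digits or a single punctuation mark; this is a simplification (a contraction such as "They'll" would be split into three tokens), and `tokenize` is a hypothetical helper name. Part-of-speech tagging is not attempted here.

```python
import re

def tokenize(sentence):
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("Zain walked down four blocks to pick up ice cream.")
print(tokens)
```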
4. Removal of Stop Words, Special Characters and Numbers:-
In this step, the tokens that are not necessary are removed from the token list. This makes it easier for the computer to focus on the meaningful terms.
Stopwords: Words in a language that do not add much meaning to a sentence. They can usually be ignored without sacrificing the meaning of the sentence.
Examples: a, an, and, are, as, for, it, is, into, in, if, on, or, such, the, there, to.
Example sentence: You want to see the dreams with close eyes and achieve them?
● The removed tokens would be: to, the, and, ?
● The outcome would be: You want see dreams with close eyes achieve them
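The removal step can be sketched as a filter over the token list. This is a minimal sketch that uses only the example stop words listed above; real stop word lists are longer, and the rule used here for special characters and numbers (drop any token that contains no letter) is an assumption made for illustration.

```python
# Stop words taken from the example list above
STOPWORDS = {"a", "an", "and", "are", "as", "for", "it", "is", "into",
             "in", "if", "on", "or", "such", "the", "there", "to"}

def remove_stopwords(tokens):
    """Drop stop words and tokens with no letters (punctuation, numbers)."""
    return [t for t in tokens
            if t.lower() not in STOPWORDS and any(ch.isalpha() for ch in t)]

tokens = ["You", "want", "to", "see", "the", "dreams", "with", "close",
          "eyes", "and", "achieve", "them", "?"]
filtered = remove_stopwords(tokens)
print(" ".join(filtered))
```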
Converting text to a common case:-
We convert the whole text into a single case, preferably lower case. This ensures that the machine, which is case-sensitive, does not treat the same word as different words just because of different cases.
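A small sketch of why this matters: before lowercasing, the three spellings below count as three different words; afterwards they collapse into one. The sample tokens are hypothetical.

```python
tokens = ["Dreams", "dreams", "DREAMS"]

# A case-sensitive comparison sees three distinct strings
distinct_before = len(set(tokens))

# After converting to lower case only one distinct word remains
lowered = [t.lower() for t in tokens]
distinct_after = len(set(lowered))

print(distinct_before, distinct_after)
```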
Stemming:-
● The process of extracting the root form of a word by removing affixes is known as stemming.
● The words extracted through stemming are called stems.
Word      Affix   Stem
healing   ing     heal
dreams    s       dream
studies   es      studi
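The table above can be reproduced with a deliberately naive suffix stripper. This is only a sketch of the idea, not an established stemming algorithm such as Porter's; the suffix list, the minimum-length check and the name `naive_stem` are assumptions made for illustration.

```python
# Suffixes are tried in this order, so "es" is checked before "s"
SUFFIXES = ("ing", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, leaving at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

for word in ["healing", "dreams", "studies"]:
    print(word, "->", naive_stem(word))
```

Because the affix is cut off blindly, the result ("studi") need not be a real word.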
Lemmatization:-
Definition: In lemmatization, the word we get after affix removal (known as the lemma) is a meaningful one. Lemmatization takes longer to execute than stemming.
Lemma:- the base or root form of a word.
Word      Affix   Lemma
healing   ing     heal
dreams    s       dream
studies   es      study
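Lemmatization needs knowledge of the vocabulary, which is one reason it is slower than stemming. The sketch below stands in for that knowledge with a tiny hand-written lookup table; the table `LEMMAS` and the function `lemmatize` are hypothetical, and a real lemmatizer would rely on a full dictionary and the word's part of speech.

```python
# Tiny hand-written lookup table standing in for a real dictionary
LEMMAS = {
    "healing": "heal",
    "dreams": "dream",
    "studies": "study",
    "caring": "care",
}

def lemmatize(word):
    """Return the dictionary form if known, otherwise the lowercased word."""
    key = word.lower()
    return LEMMAS.get(key, key)

for word in ["healing", "dreams", "studies"]:
    print(word, "->", lemmatize(word))
```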
Difference between stemming and lemmatization
Stemming                                       Lemmatization
1. The stemmed word might not be meaningful.   1. The lemma is a meaningful word.
   Caring ➔ Car                                   Caring ➔ Care
Thank you
Questions:-
