By:
Kavivarma M P
21MDT1024
M.Sc. Data science
• https://www.mygreatlearning.com/entry-level-professionals?&utm_source=google&utm_medium=search_brand&utm_campaign=GL_brand_below_24&adgroup_id=143559080351&campaign_id=18212251889&keyword=great%20learning&placement=&gclid=CjwKCAiAmuKbBhA2EiwAxQnt77AYwFXvofkibM3fuao90Ug0VScdcZM58g1AZQIUpQ_U8lSVwhZrQxoC_jAQAvD_BwE
• NLP, or natural language processing, is a technique for making
human language understandable to a machine. In essence, it is the
automatic manipulation of natural language, such as speech
and text, by software, so that it can be analyzed further and the
necessary information extracted from it.
• Computational linguistics, or the rule-based modelling of
human language, is combined with statistical, machine
learning, and deep learning models to form NLP. These make it
possible for computers to process spoken or written language.
• Predictive Text
• POS tagging
• E-mail filters
• Smart Assistants
• Sentiment analysis
• Language Translation
• Data Analysis
• Tokenization is a method of dividing a word, a sentence, a paragraph, or an entire written
document into manageable pieces.
• These smaller, individual units are called tokens. Splitting the text this way yields its
specific keywords or words, and analyzing the words used in the text helps in interpreting
its meaning.
• The word count of the text can then be determined. For instance, "The ball is very big"
becomes ["the", "ball", "is", "very", "big"], which is five tokens.
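
The tokenization step above can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer: the function name `tokenize` is made up for this example, and it assumes that lowercasing the text and dropping punctuation is acceptable.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    # \w+ matches runs of letters, digits, and underscores,
    # so punctuation and whitespace are dropped.
    return re.findall(r"\w+", text.lower())

tokens = tokenize("The ball is very big")
print(tokens)       # ['the', 'ball', 'is', 'very', 'big']
print(len(tokens))  # 5
```

Libraries such as NLTK or TextBlob ship more careful tokenizers that handle contractions, abbreviations, and sentence boundaries.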
• Stemming is the process of
reducing a word to its stem, the
base form to which suffixes and
prefixes attach. It works by
removing the beginning or end of
the word, guided by a list of
frequently occurring prefixes and
suffixes that can be found in an
inflected word.
• Why stemming:
  • Machine learning techniques
    benefit from a lower-dimensional
    input.
  • It makes the training data denser.
  • Shrinking the dictionary's size helps to
    make the document's wording
    more uniform.
Form     Suffix   Stem
Books    -s       Book
coins    -s       coin
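
The suffix-stripping idea above can be sketched as follows. This is a toy stemmer, not the Porter algorithm or any library's implementation: the suffix list `SUFFIXES`, the function name `simple_stem`, and the minimum stem length are all assumptions chosen for illustration.

```python
# Hypothetical, deliberately short suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "s")

def simple_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word is left to be a plausible stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for form in ["Books", "coins", "bowling", "is"]:
    print(form, "->", simple_stem(form))
```

The length check keeps short words such as "is" intact. A stemmer this crude will still produce non-words for many inputs, which is the usual trade-off of stemming.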
• Lemmatization relies on the
morphological analysis of words.
• It requires detailed dictionaries
that the algorithm can consult to
link an inflected form back
to its lemma.
Form      Morphological information               Lemma
Sleeps    Third person singular, present tense    Sleep
Bowling   -ing form of the verb                   Bowl
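
A dictionary-based lookup like the one described above might look like the sketch below. The mini-dictionary `LEMMA_DICT` and the function `lemmatize` are hypothetical; real lemmatizers rely on large lexical resources and a part-of-speech tagger rather than a hand-written table.

```python
# Hypothetical mini-dictionary mapping (form, part of speech) to a lemma.
LEMMA_DICT = {
    ("sleeps", "verb"): "sleep",
    ("slept", "verb"): "sleep",
    ("bowling", "verb"): "bowl",
    ("mice", "noun"): "mouse",
}

def lemmatize(word, pos):
    """Look up the lemma; fall back to the lowercased word if it is unknown."""
    key = (word.lower(), pos)
    return LEMMA_DICT.get(key, word.lower())

print(lemmatize("Sleeps", "verb"))   # sleep
print(lemmatize("slept", "verb"))    # sleep
print(lemmatize("Bowling", "verb"))  # bowl
```

Unlike suffix stripping, the lookup handles irregular forms such as "slept", but only for words the dictionary covers.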
Topic            Stemming                              Lemmatization
Goal             Reduce inflectional forms.            Reduce inflectional forms.
                 Stemming chops off the ends of        Lemmatization does things properly,
                 words in the hope of achieving        with the help of a vocabulary and
                 this goal correctly most of the       morphological analysis of words.
                 time.
Implementation   Stemmers are easier to implement      Lemmatization is somewhat harder
                 and run faster than lemmatizers.      to implement.
• Stop words are very frequent words, such as "the", "is", and "and", that appear in
almost every sentence but carry little meaning of their own.
• Stop words serve as connecting elements and keep the sentence grammatical.
• A stop word is, in essence, a word that is filtered out before processing
natural language data.
• This pre-processing technique is widely used.
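
Stop-word filtering can be sketched as a simple set lookup. The stop list below is a small hypothetical sample made up for this example; libraries such as NLTK provide much longer, language-specific lists.

```python
# Hypothetical sample stop list; real lists contain a hundred or more entries.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "very"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop list."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]

tokens = ["the", "ball", "is", "very", "big"]
print(remove_stop_words(tokens))  # ['ball', 'big']
```

Using a set keeps each membership check constant-time, which matters when filtering large corpora.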
• The Bag of Words model is used in the preprocessing of text or documents.
• It turns the documents into a collection of words and keeps
track of how many times the most common words appear overall.
• The bag-of-words technique is one of the most popular ways to turn
tokens into a set of features.
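
A bag-of-words representation can be sketched with the standard library alone. The function name `bag_of_words` and the whitespace tokenization are assumptions for this example. Word order is discarded and only counts are kept.

```python
from collections import Counter

def bag_of_words(documents):
    """Return a sorted vocabulary and one count vector per document."""
    # Assumption: simple lowercase whitespace tokenization is good enough here.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

docs = ["the ball is big", "the ball is red and the bat is big"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['and', 'ball', 'bat', 'big', 'is', 'red', 'the']
print(vectors)  # [[0, 1, 0, 1, 1, 0, 1], [1, 1, 1, 1, 2, 1, 2]]
```

Each vector has one position per vocabulary word, so the two documents become directly comparable as feature vectors.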
• TF-IDF stands for Term Frequency-Inverse Document Frequency.
• It helps compute the scores used in information retrieval (IR)
or summarization.
• TF-IDF can also be used to determine how relevant a term is to a
particular document.
• TF-IDF is obtained by combining two measures:
• how frequently a word appears in a document (term frequency), and its inverse
document frequency across a collection of documents.
• TF-IDF can be used to determine a word's significance within the context of the
document corpus.
• When calculating TF-IDF, the number of times a word appears in a document is
taken into account, offset by the number of documents in the corpus that contain the word.
• TF is calculated by dividing the number of times the term appears in the document by
the total number of terms in that document.
• IDF is calculated by taking the logarithm of the number of documents divided by
the number of documents containing the term.
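
The two measures above can be written as tf(t, d) = (count of t in d) / (number of terms in d) and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The sketch below implements that plain variant. The function names and the toy corpus are made up for this example, and libraries such as scikit-learn use smoothed variants that give slightly different numbers.

```python
import math

def tf(term, doc_tokens):
    """Term frequency: occurrences of the term divided by the document length."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Inverse document frequency: log(N / number of documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: treat unseen terms as carrying no weight
    return math.log(len(corpus) / containing)

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# Toy corpus of three already-tokenized documents.
corpus = [
    ["the", "ball", "is", "big"],
    ["the", "bat", "is", "small"],
    ["the", "ball", "and", "the", "bat"],
]

for term in ["the", "ball", "big"]:
    print(term, round(tf_idf(term, corpus[0], corpus), 4))
```

In this corpus "the" appears in every document, so its IDF is log(3/3) = 0 and its score is 0, while "big" appears in only one document and receives the highest score of the three terms.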
• Word embeddings are one of the most common
ways to encode words as vectors of numbers. These
vectors can be fed into machine learning
models for inference, and they also make it possible to
measure the distance between two tokens.
Types:
• Word2vec
• GloVe
• fastText
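
To illustrate measuring the distance between two tokens, the sketch below computes cosine similarity between word vectors. The three-dimensional vectors are invented for this example; real Word2vec, GloVe, or fastText embeddings are learned from data and typically have tens to hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, not taken from any trained model.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "ball": [0.1, 0.2, 0.9],
}

sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_ball = cosine_similarity(embeddings["king"], embeddings["ball"])
print(round(sim_king_queen, 3), round(sim_king_ball, 3))
```

A cosine distance can be taken as 1 minus the similarity, so tokens with similar vectors end up close together.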
The open-source Python module TextBlob carries out NLP tasks such as lemmatization,
stemming, tokenization, noun phrase extraction, POS tagging, n-grams, and sentiment analysis.
Although it is quicker than NLTK, it does not include functions such as dependency
parsing or vectorization.
TextBlob can be used for text classification and sentiment analysis.

Natural Language Processing in Artificial intelligence
