Natural Language Processing
Madan Kartheesan
Technical Leader
Object Automation Software Solutions
NLP
• NLP is a part of computer science and
artificial intelligence which deals with
human languages.
Data Science
• 1) Programming
• 2) Maths and Statistics
• 3) Communication
NLP
Libraries for NLP in
Python—pandas, sklearn, re,
nltk, gensim, TextBlob
EDA—Corpus,
document-term matrix, word
counts
Use cases—sentiment
analysis, topic modeling, text
generation
• Natural Language (English Language, Tamil
Language)
• Processing (How a computer carries out
instructions)
• How to deal with text data?
What is natural language processing?
•Natural language processing strives to build
machines that understand and respond to
text or voice data—and respond with text or
speech of their own—in much the same way
humans do.
NLP combines computational linguistics—rule-based modeling of
human language—with statistical, machine learning, and deep
learning models.
Together, these technologies enable computers to process human
language in the form of text or voice data and to ‘understand’ its full
meaning, complete with the speaker or writer’s intent and sentiment.
NLP tasks
•Speech recognition, also called speech-to-text, is the
task of reliably converting voice data into text data.
Speech recognition is required for any application that
follows voice commands or answers spoken questions.
What makes speech recognition especially challenging
is the way people talk—quickly, slurring words
together, with varying emphasis and intonation, in
different accents, and often using incorrect grammar.
Part-of-speech tagging
•Part-of-speech tagging, also
called grammatical tagging, is the
process of determining the part
of speech of a particular word or
piece of text based on its use and
context. Part-of-speech tagging identifies
‘make’ as a verb in ‘I can make a
paper plane,’ and as a noun in
‘What make of car do you own?’
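The 'make' example can be sketched with a single context rule. This is a toy illustration, not a real tagger: the function name tag_make and the two cue-word sets are assumptions made for this sketch, and a practical tagger (for example the one in nltk) learns such patterns from annotated text instead.

```python
# Toy context rule for one ambiguous word; not a general-purpose tagger.
# The cue-word sets below are hand-picked assumptions for this example.
NOUN_CUES = {"a", "an", "the", "what", "which", "this", "that"}
VERB_CUES = {"can", "will", "could", "would", "should", "to", "i", "you", "we", "they"}


def tag_make(sentence: str) -> str:
    """Guess the part of speech of 'make' from the word just before it."""
    words = [w.strip(".,?!'\"").lower() for w in sentence.split()]
    idx = words.index("make")
    previous = words[idx - 1] if idx > 0 else ""
    if previous in NOUN_CUES:
        return "NOUN"
    if previous in VERB_CUES:
        return "VERB"
    return "UNKNOWN"


print(tag_make("I can make a paper plane."))     # VERB
print(tag_make("What make of car do you own?"))  # NOUN
```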
Word sense disambiguation
Word sense disambiguation is the selection of the meaning of a word with multiple meanings through a process
of semantic analysis that determines the sense that fits best in the given context. For example, word
sense disambiguation helps distinguish the meaning of the verb 'make' in ‘make the grade’ (achieve) vs. ‘make a
bet’ (place).
I saw a bear.
I cannot bear the pain.
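One classic way to pick a sense is to count the words shared between the sentence and a short description of each sense (a simplified Lesk-style overlap). The sketch below applies that idea to the two 'bear' sentences. The sense descriptions in SENSES are hand-written for this example and are an assumption, not taken from any dictionary.

```python
STOPWORDS = {"i", "a", "an", "the", "in", "or", "to", "we", "of"}

# Hand-written sense signatures (a gloss plus one example); purely illustrative.
SENSES = {
    "animal": "large heavy mammal with thick fur; we saw a bear in the forest",
    "endure": "tolerate or put up with; she cannot bear the noise or the pain",
}


def content_words(text: str, target: str) -> set:
    """Lowercase the words, strip punctuation, drop stopwords and the target."""
    words = {w.strip(".,;?!").lower() for w in text.split()}
    return words - STOPWORDS - {target, ""}


def disambiguate(sentence: str, target: str = "bear") -> str:
    """Return the sense whose signature shares the most words with the sentence."""
    context = content_words(sentence, target)
    overlap = {
        sense: len(context & content_words(signature, target))
        for sense, signature in SENSES.items()
    }
    return max(overlap, key=overlap.get)


print(disambiguate("I saw a bear"))             # animal
print(disambiguate("I cannot bear the pain."))  # endure
```

With such short sentences the overlap is tiny, so the result depends heavily on how the signatures are written; real systems use much richer context.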
Named entity recognition
Named entity
recognition, or NER,
identifies words or
phrases as useful
entities. NER
identifies ‘Kentucky’
as a location or ‘Fred’
as a man's name.
Madan lives in
Chennai.
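A very small way to picture entity recognition is a lookup table of known names (a gazetteer). The sketch below is only that: the GAZETTEER contents and the find_entities helper are assumptions for illustration, and real NER models generalise to names they have never seen.

```python
# Hypothetical lookup table of known entities; real systems learn these from data.
GAZETTEER = {
    "Kentucky": "LOCATION",
    "Chennai": "LOCATION",
    "Fred": "PERSON",
    "Madan": "PERSON",
}


def find_entities(sentence: str) -> list:
    """Return (word, label) pairs for every word found in the gazetteer."""
    entities = []
    for token in sentence.split():
        word = token.strip(".,?!")
        if word in GAZETTEER:
            entities.append((word, GAZETTEER[word]))
    return entities


print(find_entities("Madan lives in Chennai."))
# [('Madan', 'PERSON'), ('Chennai', 'LOCATION')]
```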
Sentiment analysis
• Sentiment analysis attempts to
extract subjective
qualities—attitudes, emotions,
sarcasm, confusion,
suspicion—from text.
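The simplest form of sentiment analysis counts positive and negative words. The sketch below assumes two tiny hand-made word lists (POSITIVE and NEGATIVE are not from any published lexicon), and it cannot detect sarcasm or the subtler qualities listed above.

```python
# Tiny hand-made word lists; assumptions for illustration only.
POSITIVE = {"love", "good", "great", "happy"}
NEGATIVE = {"hate", "bad", "terrible", "sad"}


def sentiment(text: str) -> str:
    """Label text by comparing counts of positive and negative words."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love to eat"))  # positive
```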
Natural language generation
• Natural language generation is
sometimes described as the
opposite of speech recognition or
speech-to-text; it's the task of
putting structured information
into human language.
What is natural language understanding?
• Natural language understanding is a subset of natural
language processing, which uses syntactic and semantic
analysis of text and speech to determine the meaning of
a sentence. Syntax refers to the grammatical structure of
a sentence, while semantics refers to its intended
meaning. NLU also establishes a relevant ontology: a
data structure which specifies the relationships between
words and phrases. While humans naturally do this in
conversation, the combination of these analyses is
required for a machine to understand the intended
meaning of different texts.
• Our ability to distinguish between homonyms and
homophones illustrates the nuances of language well.
For example, let’s take the following two sentences:
Alice is swimming against the current.
The current version of the report is in the folder.
In the first sentence, the word current is a noun. The verb that precedes it, swimming, provides additional
context to the reader, allowing us to conclude that we are referring to the flow of water in the ocean. The second
sentence uses the word current, but as an adjective. The noun it describes, version, denotes multiple iterations of
a report, enabling us to determine that we are referring to the most up-to-date status of a file.
What is natural language generation?
•Natural language generation is another
subset of natural language processing. While
natural language understanding focuses on
computer reading comprehension, natural
language generation enables computers to
write. NLG is the process of producing a
human language text response based on
some data input. This text can also be
converted into a speech format through
text-to-speech services.
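Turning data input into a sentence can be sketched with a fixed template. This is only the most basic form of generation: the describe_weather function, the field names, and the sample values are made up for illustration.

```python
def describe_weather(record: dict) -> str:
    """Fill a fixed sentence template from a structured record."""
    return (
        f"In {record['city']}, it is {record['condition']} "
        f"with a temperature of {record['temperature_c']} degrees Celsius."
    )


# Made-up sample record, not real data.
reading = {"city": "Chennai", "condition": "sunny", "temperature_c": 32}
print(describe_weather(reading))
# In Chennai, it is sunny with a temperature of 32 degrees Celsius.
```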
What is natural language processing?
• Natural language processing, which evolved from
computational linguistics, uses methods from various
disciplines, such as computer science, artificial intelligence,
linguistics, and data science, to enable computers to
understand human language in both written and verbal forms.
While computational linguistics has more of a focus on aspects
of language, natural language processing emphasizes its use of
machine learning and deep learning techniques to complete
tasks, like language translation or question answering. Natural
language processing works by taking unstructured data and
converting it into a structured data format. It does this through
the identification of named entities (a process called named
entity recognition) and identification of word patterns, using
methods like tokenization, stemming, and lemmatization,
which examine the root forms of words. For example, the
suffix -ed on a word, like called, indicates past tense, but the
word has the same base form (call) as the present participle
calling.
Basic steps in NLP
1) TOKENISATION
2) STEMMING
3) LEMMATIZATION
4) STOPWORDS
Data cleaning
Sentence: He is eating an apple.
Tokenisation: {"He", "is", "eating", "an", "apple"}
Stemming and lemmatization: "eating" to "eat"
Stopwords: removes "an" from the sentence
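The cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the stopword list holds only articles, the stem function strips a few common suffixes rather than applying a real stemming algorithm, and lemmatization (which needs a dictionary) is left out. Libraries such as nltk provide full versions of each step.

```python
import re

# Assumed minimal stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the"}
# Crude suffix list standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text: str) -> list:
    """Split text into alphabetic tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)


def stem(word: str) -> str:
    """Strip one known suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text: str) -> list:
    """Tokenise, stem, then remove stopwords."""
    stems = [stem(token) for token in tokenize(text)]
    return [s for s in stems if s.lower() not in STOPWORDS]


print(tokenize("He is eating an apple."))  # ['He', 'is', 'eating', 'an', 'apple']
print(clean("He is eating an apple."))     # ['He', 'is', 'eat', 'apple']
```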
Document-Term Matrix
corpus = ['This is the first document.',
          'This document is the second document.',
          'And this is the third one.',
          'Is this the first document?']
Vocabulary (one column per term):
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
Document-term matrix (one row per document):
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
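The matrix above can be reproduced with a few lines of plain Python: each row is a document, each column a vocabulary term, and each cell counts how often the term appears in the document. This is a sketch of the idea only; the tokenize helper is an assumption, and in practice a library class such as sklearn's CountVectorizer does this work.

```python
import re

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]


def tokenize(doc: str) -> list:
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", doc.lower())


# Sorted list of every distinct term in the corpus (the matrix columns).
vocabulary = sorted({term for doc in corpus for term in tokenize(doc)})

# One row per document, one count per vocabulary term.
matrix = [[tokenize(doc).count(term) for term in vocabulary] for doc in corpus]

print(vocabulary)
for row in matrix:
    print(row)
```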
Vector to sequence models
Sequence to vector models
"I love to eat" Positive sentiment
[0.10] bad [0.90] good
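Output pairs like [0.10] bad and [0.90] good are typically obtained by passing a model's raw scores through a softmax function so that they sum to 1. The sketch below shows only that final step. The two raw scores are hypothetical values chosen to reproduce the numbers above; a real sequence-to-vector model would compute them from the input sentence.

```python
import math


def softmax(scores: list) -> list:
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["bad", "good"]
# Hypothetical raw scores for "I love to eat"; not produced by a real model.
raw_scores = [0.0, math.log(9)]

probs = softmax(raw_scores)
print([round(p, 2) for p in probs])       # [0.1, 0.9]
print(labels[probs.index(max(probs))])    # good
```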
Sequence to sequence models
•Language translation
