Natural Language Processing
Day 1
 A natural language is just like a human language
which we use for communication like English, Hindi,
Gujarati, Bengali, Arabic ....etc.
 The one of the main aim of AI is to build a machine
that can understand commands written or spoken
in natural Language.
 It is not an easy task to teach a person or a
computer, a natural language. The problem is
Syntax ( rules governing the way in which the words
are arranged) & understanding context.
What is Natural Language ?
Difference between Natural language
and Computer Language
Natural Language Computer Language
Natural language has a
very large vocabulary.
Computer language has
a very limited
vocabulary.
Natural language is
easily understood by
humans.
Computer language is
easily understood by the
machines.
Natural language is
ambiguous in nature.
Computer language is
unambiguous.
 NLP stands for Natural Language Processing, which
is a part of Computer Science, Human
language, and Artificial Intelligence.
 It is the technology that is used by machines to
understand, analyze, manipulate, and interpret
human's languages.
 It helps developers to organize knowledge for
performing tasks such as translation, automatic
summarization, Named Entity Recognition (NER),
speech recognition, relationship extraction, and topic
segmentation
What is Natural Language
Processing ?
(1940-1960) - Focused on Machine Translation (MT)
The Natural Languages Processing started in the year
1940s.
1948 - In the Year 1948, the first recognisable NLP
application was introduced in Birkbeck College, London.
1950s - In the Year 1950s, there was a conflicting view
between linguistics and computer science. Now,
Chomsky developed his first book syntactic structures
and claimed that language is generative in nature.
In 1957, Chomsky also introduced the idea of
Generative Grammar, which is rule based descriptions
of syntactic structures.
History of NLP
(1960-1980) - Flavored with Artificial Intelligence (AI)
In the year 1960 to 1980, the key developments were:
 Augmented Transition Networks (ATN)
Augmented Transition Networks is a finite state machine
that is capable of recognizing regular languages.
 Case Grammar
Case Grammar was developed by Linguist Charles J.
Fillmore in the year 1968. Case Grammar uses
languages such as English to express the relationship
between nouns and verbs by using the preposition.
History of NLP
In Case Grammar, case roles can be defined to link
certain kinds of verbs and objects.
For example:
"Neha broke the mirror with the hammer". In this
example case grammar identify Neha as an agent,
mirror as a theme, and hammer as an instrument.
In the year 1960 to 1980, key systems were:
 SHRDLU
 LUNAR
History of NLP
SHRDLU
 SHRDLU is a program written by Terry Winograd in
1968-70.
 It helps users to communicate with the computer and
moving objects.
 It can handle instructions such as "pick up the green
boll" and also answer the questions like "What is inside
the black box.“
 The main importance of SHRDLU is that it shows those
syntax, semantics, and reasoning about the world that
can be combined to produce a system that
understands a natural language.
History of NLP
LUNAR
 LUNAR is the classic example of a Natural Language
database interface system that is used ATNs and
Woods' Procedural Semantics.
 It was capable of translating elaborate natural
language expressions into database queries and
handle 78% of requests without errors.
History of NLP
1980 - Current
 Till the year 1980, natural language processing
systems were based on complex sets of hand-written
rules. After 1980, NLP introduced machine learning
algorithms for language processing.
 In the beginning of the year 1990s, NLP started
growing faster and achieved good process accuracy,
especially in English Grammar
 In 1990 also, an electronic text introduced, which
provided a good resource for training and examining
natural language programs.
History of NLP
 Other factors may include the availability of computers
with fast CPUs and more memory. The major factor
behind the advancement of natural language
processing was the Internet.
 Now, modern NLP consists of various applications,
like speech recognition, machine
translation, and machine text reading. When we
combine all these applications then it allows the
artificial intelligence to gain knowledge of the world.
 Let's consider the example of AMAZON ALEXA, using
this robot you can ask the question to Alexa, and it will
reply to you.
History of NLP
Advantages of NLP
 Users can ask questions about any subject and
get a direct response within seconds.
 NLP system provides answers to the questions in
natural language
 NLP system offers exact answers to the questions,
no unnecessary or unwanted information
 The accuracy of the answers increases with the
amount of relevant information provided in the
question.
Advantages of NLP
 NLP process helps computers communicate with
humans in their language and scales other language-
related tasks.
 Allows you to perform more language-based data
compares to a human being without fatigue and in an
unbiased and consistent way.
 Structuring a highly unstructured data source
Disadvantages of NLP
A list of disadvantages of NLP is given below:
NLP may not show context.
NLP is unpredictable
NLP may require more keystrokes.
NLP is unable to adapt to the new domain, and it
has a limited function that's why NLP is built for a
single and specific task only.
Components of NLP
There are two components of NLP as given −
Natural Language Understanding (NLU)
Natural Language Generation (NLG)
Natural Language Understanding (NLU)
Understanding involves the following tasks −
Mapping the given input in natural language
into useful representations.
Analyzing different aspects of the language.
Components of NLP
Natural Language Generation (NLG)
It is the process of producing meaningful phrases and
sentences in the form of natural language from some
internal representation.
It involves −
Text planning − It includes retrieving the relevant
content from knowledge base.
Sentence planning − It includes choosing required
words, forming meaningful phrases, setting tone of
the sentence.
Text Realization − It is mapping sentence plan into
sentence structure.
Applications of NLP
There are the following applications of NLP -
 Question Answering
Question Answering focuses on building systems that
automatically answer the questions asked by humans in
a natural language.
Applications of NLP
 Spam Detection
Spam detection is used to detect unwanted e-mails
getting to a user's inbox.
Applications of NLP
 Sentiment Analysis
Sentiment Analysis is also known as opinion mining. It is
used on the web to analyze the attitude, behavior, and
emotional state of the sender.
 This application is implemented through a combination of
NLP (Natural Language Processing) and statistics by
assigning the values to the text (positive, negative, or
natural), identify the mood of the context (happy, sad,
angry, etc.)
Applications of NLP
 Machine Translation
Machine translation is used to translate text or speech
from one natural language to another natural language.
Applications of NLP
 Spelling correction
Microsoft Corporation provides word processor software
like MS-word, PowerPoint for the spelling correction.
Applications of NLP
 Speech Recognition
Speech recognition is used for converting spoken
words into text. It is used in applications, such as
mobile, home automation, video recovery, dictating
to Microsoft Word, voice biometrics, voice user
interface, and so on.
 Chatbot
Implementing the Chatbot is one of the important
applications of NLP. It is used by many companies
to provide the customer's chat services.
Applications of NLP
 Automatic summarization: Natural language
processing can be used to produce a readable
summary from a large chunk of text. For example,
one might us automatic summarization to produce a
short summary of a dense academic article.
Phases of NLP
There are the following five phases of NLP:
Phases of NLP
 Lexical Analysis and Morphological
The first phase of NLP is the Lexical Analysis. This
phase scans the source code as a stream of
characters and converts it into meaningful lexemes. It
divides the whole text into paragraphs, sentences,
and words.
 Syntactic Analysis (Parsing)
Syntactic Analysis is used to check grammar, word
arrangements, and shows the relationship among the
words.
 Example: Agra goes to the Poonam
In the real world, Agra goes to the Poonam, does not
make any sense, so this sentence is rejected by the
Syntactic analyzer.
Phases of NLP
 Semantic Analysis
Semantic analysis is concerned with the meaning
representation. It mainly focuses on the literal meaning
of words, phrases, and sentences.
 Discourse Integration
Discourse Integration depends upon the sentences that
proceeds it and also invokes the meaning of the
sentences that follow it.
Phases of NLP
 Pragmatic Analysis
Pragmatic is the fifth and last phase of NLP. It helps
you to discover the intended effect by applying a set
of rules that characterize cooperative dialogues.
Example:
"Open the door" is interpreted as a request instead of
an order.
Why NLP is difficult?
NLP is difficult because Ambiguity and Uncertainty
exist in the language.
Ambiguity
There are the following three ambiguity -
Lexical Ambiguity
Lexical Ambiguity exists in the presence of two or
more possible meanings of the sentence within a
single word.
Example:
Manya is looking for a match.
In the above example, the word match refers to that
either Manya is looking for a partner or Manya is
looking for a match. (Cricket or other match)
Why NLP is difficult?
 Syntactic Ambiguity
Syntactic Ambiguity exists in the presence of two or
more possible meanings within the sentence.
Example:
I saw the girl with the binocular.
In the above example, did I have the binoculars? Or
did the girl have the binoculars?
Why NLP is difficult?
Referential Ambiguity
Referential Ambiguity exists when you are referring to
something using the pronoun.
Example:
Kiran went to Sunita. She said, "I am hungry."
In the above sentence, you do not know that who is
hungry, either Kiran or Sunita.
Future of NLP
 Human readable natural language processing is the
biggest Al- problem. It is all most same as solving the
central artificial intelligence problem and making
computers as intelligent as people.
 Future computers or machines with the help of NLP will
able to learn from the information online and apply that
in the real world, however, lots of work need to on this
regard.
 Naturla language toolkit or nltk become more effective
 Combined with natural language generation, computers
will become more capable of receiving and giving useful
and resourceful information or data
Why use Python for NLP
There are many things about Python that make it a
really good programming language choice for an NLP
project.
 The simple syntax and transparent semantics of
this language make it an excellent choice for
projects that include Natural Language Processing
tasks.
 Moreover, developers can enjoy excellent support
for integration with other languages and tools that
come in handy for techniques like machine
learning.
Why use Python for NLP
 It provides developers with an extensive collection of
NLP tools and libraries that enable developers to
handle a great number of NLP-related tasks such as
document classification, topic modeling, part-of-
speech (POS) tagging, word vectors, and sentiment
analysis.
NLP Libraries
 Scikit-learn: It provides a wide range of algorithms for
building machine learning models in Python.
 Natural language Toolkit (NLTK): NLTK is a complete
toolkit for all NLP techniques.
 Pattern: It is a web mining module for NLP and machine
learning.
 TextBlob: It provides an easy interface to learn basic
NLP tasks like sentiment analysis, noun phrase
extraction, or pos-tagging.
NLP Libraries
 Quepy:
Quepy is used to transform natural language
questions into queries in a database query
language.
 SpaCy:
SpaCy is an open-source NLP library which is used
for Data Extraction, Data Analysis, Sentiment
Analysis, and Text Summarization.
 Gensim:
Gensim works with large datasets and processes data
streams.
How to Download all packages of
NLTK
 NLTK can be easily installed using a “pip” installer.
 To install NLTK, open command prompt and type below
command.
 Make sure the installation is successful.
 After successful installation now its time to use the NLTK
for Natural Language Processing.
How to Download all packages of
NLTK
 Open Python Shell and type below command.
If it is imported without any error that means
NLTK is installed properly.
How to Download all packages of NLTK
In the Anaconda prompt,
Enter command
conda install -c anaconda nltk
NLTK Dataset
 NLTK has many datasets available for Natural
language processing, for example, WordNet,
WikiCorpus, Gutenberg, Opinion Lexicon,
Tweebank, etc.
 These datasets are called corpora. Basically, the
NLTK dataset contains a set of files or documents.
 Every file/ document contains a collection of words,
letters or text in a single language. Thus, a corpus
is mainly libraries for understanding/learning a
language.
 It has rules of grammar and structure of a
language.
NLTK Dataset
 After successfully installing NLTK, you can import it and
also download its corpora with the following command.
NLTK Dataset
 NLTK downloader opens a window to download the
datasets. The size of the dataset is big hence it will take
time. To test if datasets are installed properly, try
importing the dataset and use it.

NLP Introduction for engineering stuedents.pptx

  • 1.
  • 2.
     A naturallanguage is just like a human language which we use for communication like English, Hindi, Gujarati, Bengali, Arabic ....etc.  The one of the main aim of AI is to build a machine that can understand commands written or spoken in natural Language.  It is not an easy task to teach a person or a computer, a natural language. The problem is Syntax ( rules governing the way in which the words are arranged) & understanding context. What is Natural Language ?
  • 3.
    Difference between Naturallanguage and Computer Language Natural Language Computer Language Natural language has a very large vocabulary. Computer language has a very limited vocabulary. Natural language is easily understood by humans. Computer language is easily understood by the machines. Natural language is ambiguous in nature. Computer language is unambiguous.
  • 4.
     NLP standsfor Natural Language Processing, which is a part of Computer Science, Human language, and Artificial Intelligence.  It is the technology that is used by machines to understand, analyze, manipulate, and interpret human's languages.  It helps developers to organize knowledge for performing tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship extraction, and topic segmentation What is Natural Language Processing ?
  • 5.
    (1940-1960) - Focusedon Machine Translation (MT) The Natural Languages Processing started in the year 1940s. 1948 - In the Year 1948, the first recognisable NLP application was introduced in Birkbeck College, London. 1950s - In the Year 1950s, there was a conflicting view between linguistics and computer science. Now, Chomsky developed his first book syntactic structures and claimed that language is generative in nature. In 1957, Chomsky also introduced the idea of Generative Grammar, which is rule based descriptions of syntactic structures. History of NLP
  • 6.
    (1960-1980) - Flavoredwith Artificial Intelligence (AI) In the year 1960 to 1980, the key developments were:  Augmented Transition Networks (ATN) Augmented Transition Networks is a finite state machine that is capable of recognizing regular languages.  Case Grammar Case Grammar was developed by Linguist Charles J. Fillmore in the year 1968. Case Grammar uses languages such as English to express the relationship between nouns and verbs by using the preposition. History of NLP
  • 7.
    In Case Grammar,case roles can be defined to link certain kinds of verbs and objects. For example: "Neha broke the mirror with the hammer". In this example case grammar identify Neha as an agent, mirror as a theme, and hammer as an instrument. In the year 1960 to 1980, key systems were:  SHRDLU  LUNAR History of NLP
  • 8.
    SHRDLU  SHRDLU isa program written by Terry Winograd in 1968-70.  It helps users to communicate with the computer and moving objects.  It can handle instructions such as "pick up the green boll" and also answer the questions like "What is inside the black box.“  The main importance of SHRDLU is that it shows those syntax, semantics, and reasoning about the world that can be combined to produce a system that understands a natural language. History of NLP
  • 9.
    LUNAR  LUNAR isthe classic example of a Natural Language database interface system that is used ATNs and Woods' Procedural Semantics.  It was capable of translating elaborate natural language expressions into database queries and handle 78% of requests without errors. History of NLP
  • 10.
    1980 - Current Till the year 1980, natural language processing systems were based on complex sets of hand-written rules. After 1980, NLP introduced machine learning algorithms for language processing.  In the beginning of the year 1990s, NLP started growing faster and achieved good process accuracy, especially in English Grammar  In 1990 also, an electronic text introduced, which provided a good resource for training and examining natural language programs. History of NLP
  • 11.
     Other factorsmay include the availability of computers with fast CPUs and more memory. The major factor behind the advancement of natural language processing was the Internet.  Now, modern NLP consists of various applications, like speech recognition, machine translation, and machine text reading. When we combine all these applications then it allows the artificial intelligence to gain knowledge of the world.  Let's consider the example of AMAZON ALEXA, using this robot you can ask the question to Alexa, and it will reply to you. History of NLP
  • 12.
    Advantages of NLP Users can ask questions about any subject and get a direct response within seconds.  NLP system provides answers to the questions in natural language  NLP system offers exact answers to the questions, no unnecessary or unwanted information  The accuracy of the answers increases with the amount of relevant information provided in the question.
  • 13.
    Advantages of NLP NLP process helps computers communicate with humans in their language and scales other language- related tasks.  Allows you to perform more language-based data compares to a human being without fatigue and in an unbiased and consistent way.  Structuring a highly unstructured data source
  • 14.
    Disadvantages of NLP Alist of disadvantages of NLP is given below: NLP may not show context. NLP is unpredictable NLP may require more keystrokes. NLP is unable to adapt to the new domain, and it has a limited function that's why NLP is built for a single and specific task only.
  • 15.
    Components of NLP Thereare two components of NLP as given − Natural Language Understanding (NLU) Natural Language Generation (NLG) Natural Language Understanding (NLU) Understanding involves the following tasks − Mapping the given input in natural language into useful representations. Analyzing different aspects of the language.
  • 16.
    Components of NLP NaturalLanguage Generation (NLG) It is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation. It involves − Text planning − It includes retrieving the relevant content from knowledge base. Sentence planning − It includes choosing required words, forming meaningful phrases, setting tone of the sentence. Text Realization − It is mapping sentence plan into sentence structure.
  • 17.
    Applications of NLP Thereare the following applications of NLP -  Question Answering Question Answering focuses on building systems that automatically answer the questions asked by humans in a natural language.
  • 18.
    Applications of NLP Spam Detection Spam detection is used to detect unwanted e-mails getting to a user's inbox.
  • 19.
    Applications of NLP Sentiment Analysis Sentiment Analysis is also known as opinion mining. It is used on the web to analyze the attitude, behavior, and emotional state of the sender.  This application is implemented through a combination of NLP (Natural Language Processing) and statistics by assigning the values to the text (positive, negative, or natural), identify the mood of the context (happy, sad, angry, etc.)
  • 20.
    Applications of NLP Machine Translation Machine translation is used to translate text or speech from one natural language to another natural language.
  • 21.
    Applications of NLP Spelling correction Microsoft Corporation provides word processor software like MS-word, PowerPoint for the spelling correction.
  • 22.
    Applications of NLP Speech Recognition Speech recognition is used for converting spoken words into text. It is used in applications, such as mobile, home automation, video recovery, dictating to Microsoft Word, voice biometrics, voice user interface, and so on.  Chatbot Implementing the Chatbot is one of the important applications of NLP. It is used by many companies to provide the customer's chat services.
  • 23.
    Applications of NLP Automatic summarization: Natural language processing can be used to produce a readable summary from a large chunk of text. For example, one might us automatic summarization to produce a short summary of a dense academic article.
  • 24.
    Phases of NLP Thereare the following five phases of NLP:
  • 25.
    Phases of NLP Lexical Analysis and Morphological The first phase of NLP is the Lexical Analysis. This phase scans the source code as a stream of characters and converts it into meaningful lexemes. It divides the whole text into paragraphs, sentences, and words.  Syntactic Analysis (Parsing) Syntactic Analysis is used to check grammar, word arrangements, and shows the relationship among the words.  Example: Agra goes to the Poonam In the real world, Agra goes to the Poonam, does not make any sense, so this sentence is rejected by the Syntactic analyzer.
  • 26.
    Phases of NLP Semantic Analysis Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal meaning of words, phrases, and sentences.  Discourse Integration Discourse Integration depends upon the sentences that proceeds it and also invokes the meaning of the sentences that follow it.
  • 27.
    Phases of NLP Pragmatic Analysis Pragmatic is the fifth and last phase of NLP. It helps you to discover the intended effect by applying a set of rules that characterize cooperative dialogues. Example: "Open the door" is interpreted as a request instead of an order.
  • 28.
    Why NLP isdifficult? NLP is difficult because Ambiguity and Uncertainty exist in the language. Ambiguity There are the following three ambiguity - Lexical Ambiguity Lexical Ambiguity exists in the presence of two or more possible meanings of the sentence within a single word. Example: Manya is looking for a match. In the above example, the word match refers to that either Manya is looking for a partner or Manya is looking for a match. (Cricket or other match)
  • 29.
    Why NLP isdifficult?  Syntactic Ambiguity Syntactic Ambiguity exists in the presence of two or more possible meanings within the sentence. Example: I saw the girl with the binocular. In the above example, did I have the binoculars? Or did the girl have the binoculars?
  • 30.
    Why NLP isdifficult? Referential Ambiguity Referential Ambiguity exists when you are referring to something using the pronoun. Example: Kiran went to Sunita. She said, "I am hungry." In the above sentence, you do not know that who is hungry, either Kiran or Sunita.
  • 31.
    Future of NLP Human readable natural language processing is the biggest Al- problem. It is all most same as solving the central artificial intelligence problem and making computers as intelligent as people.  Future computers or machines with the help of NLP will able to learn from the information online and apply that in the real world, however, lots of work need to on this regard.  Naturla language toolkit or nltk become more effective  Combined with natural language generation, computers will become more capable of receiving and giving useful and resourceful information or data
  • 32.
    Why use Pythonfor NLP There are many things about Python that make it a really good programming language choice for an NLP project.  The simple syntax and transparent semantics of this language make it an excellent choice for projects that include Natural Language Processing tasks.  Moreover, developers can enjoy excellent support for integration with other languages and tools that come in handy for techniques like machine learning.
  • 33.
    Why use Pythonfor NLP  It provides developers with an extensive collection of NLP tools and libraries that enable developers to handle a great number of NLP-related tasks such as document classification, topic modeling, part-of- speech (POS) tagging, word vectors, and sentiment analysis.
  • 34.
    NLP Libraries  Scikit-learn:It provides a wide range of algorithms for building machine learning models in Python.  Natural language Toolkit (NLTK): NLTK is a complete toolkit for all NLP techniques.  Pattern: It is a web mining module for NLP and machine learning.  TextBlob: It provides an easy interface to learn basic NLP tasks like sentiment analysis, noun phrase extraction, or pos-tagging.
  • 35.
    NLP Libraries  Quepy: Quepyis used to transform natural language questions into queries in a database query language.  SpaCy: SpaCy is an open-source NLP library which is used for Data Extraction, Data Analysis, Sentiment Analysis, and Text Summarization.  Gensim: Gensim works with large datasets and processes data streams.
  • 36.
    How to Downloadall packages of NLTK  NLTK can be easily installed using a “pip” installer.  To install NLTK, open command prompt and type below command.  Make sure the installation is successful.  After successful installation now its time to use the NLTK for Natural Language Processing.
  • 37.
    How to Downloadall packages of NLTK  Open Python Shell and type below command. If it is imported without any error that means NLTK is installed properly.
  • 38.
    How to Downloadall packages of NLTK In the Anaconda prompt, Enter command conda install -c anaconda nltk
  • 39.
    NLTK Dataset  NLTKhas many datasets available for Natural language processing, for example, WordNet, WikiCorpus, Gutenberg, Opinion Lexicon, Tweebank, etc.  These datasets are called corpora. Basically, the NLTK dataset contains a set of files or documents.  Every file/ document contains a collection of words, letters or text in a single language. Thus, a corpus is mainly libraries for understanding/learning a language.  It has rules of grammar and structure of a language.
  • 40.
    NLTK Dataset  Aftersuccessfully installing NLTK, you can import it and also download its corpora with the following command.
  • 41.
    NLTK Dataset  NLTKdownloader opens a window to download the datasets. The size of the dataset is big hence it will take time. To test if datasets are installed properly, try importing the dataset and use it.