Natural Language Processing
• Language is a means of communication for humans. By studying language,
we come to understand more about the world.
• If we can succeed at building a computational model of language, we will have
a powerful tool for communicating about the world.
• We look at how we can exploit knowledge about the world, in combination
with linguistic facts, to build computational natural language systems.
• Natural Language Processing (NLP) is the process of computer analysis of
input provided in a human language (natural language), and conversion of
this input into a useful form of representation.
• NLP is a field of AI that processes or analyzes written or spoken
language.
• NLP involves the processing of speech, grammar and meaning.
• NLP is composed of two parts: NLU (Natural Language Understanding) and
NLG (Natural Language Generation).
• Understanding any language requires detailed knowledge of that language.
Components of NLP
• Natural Language Understanding (ambiguity is the major problem)
– Lexical ambiguity (word level):
Deciding which sense of a word is intended.
– Syntactic ambiguity (sentence level/parsing)
Example: “Call me a cab.” The sentence has more than one parse, and hence
more than one meaning, for the machine.
– Referential ambiguity
Example: “Meena went to Geeta. She says that she is hungry.” Here, “she”
can refer to either Meena or Geeta.
• Natural Language Generation
– Text planning
A knowledge base is used to choose appropriate words for NLG.
– Sentence planning
This includes choosing the required words, forming meaningful phrases, and
setting the tone of the sentence. It arranges words in a proper, meaningful
order.
– Text realization
This maps the sentence plan into a sentence structure, which is displayed
as output.
NLP Steps/Processes
Input/Source
• The input of a NLP system can be written text
or speech.
• The quality of the input determines the possible errors in
language processing; that is, high-quality input
leads to correct language understanding.
Segmentation/Lexical analysis
• The text input is divided into segments
(chunks) and each segment is analyzed.
• Each such chunk is called a frame.
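A minimal sketch of the segmentation step, assuming simple punctuation-based sentence splitting and word tokenization (the function name and regexes below are illustrative, not from the slides):

```python
import re

def segment(text):
    """Split raw text into sentence chunks, then into word tokens."""
    # Split on sentence-ending punctuation (a simplification).
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Tokenize each sentence into lowercase word tokens.
    return [re.findall(r"[a-zA-Z']+", s.lower()) for s in sentences]

# Each chunk is one sentence's list of tokens, ready for analysis.
chunks = segment("Meena went to Geeta. She says that she is hungry.")
```

Real lexical analyzers also handle abbreviations, numbers and multiword expressions, which this whitespace-and-punctuation sketch ignores.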
Syntactic Analysis
• Syntactic analysis takes an input sentence and produces
a representation of its grammatical structure.
• A grammar describes the valid parts of speech of a
language and how to combine them into phrases.
• The grammar of English is nearly context free.
• Grammar: A computer grammar specifies which
sentences are in a language and their parse trees. A
parse tree is a hierarchical structure that shows how the
grammar applies to the input. Each level of the tree
corresponds to the application of one grammar rule.
Semantic Analysis
• Semantic analysis is the process of converting the syntactic representation into a
meaning representation.
• This involves the following tasks:
– Word sense determination
– Sentence level analysis
• Word sense: Words have different meanings in different contexts.
– Example: Mary had a bat in her office.
– bat = ‘a baseball bat’
– bat = ‘a flying mammal’
• Sentence Level Analysis:
– Once the words are understood, the sentence must be assigned some meaning
– Examples:
– She saw her duck.
– I saw a man with a telescope.
• Non-example: “Colorless green ideas sleep furiously” -> this would be rejected
semantically, since “colorless green” makes no sense.
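Word sense determination can be sketched with a simple context-overlap heuristic (in the style of the Lesk algorithm): pick the sense whose cue words overlap most with the surrounding words. The sense labels and cue sets below are illustrative, not a real lexicon:

```python
# Hypothetical sense inventory: each sense of "bat" has a set of cue words.
SENSES = {
    "bat": {
        "a baseball bat": {"baseball", "game", "hit", "player"},
        "a flying mammal": {"animal", "cave", "fly", "wings"},
    }
}

def choose_sense(word, context_words):
    """Pick the sense whose cue words overlap most with the context."""
    scores = {
        sense: len(cues & set(context_words))
        for sense, cues in SENSES[word].items()
    }
    return max(scores, key=scores.get)
```

For example, `choose_sense("bat", ["mary", "plays", "baseball"])` selects the baseball sense because of the overlapping context word.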
Pragmatic Analysis
• Pragmatics comprises aspects of meaning that
depend upon the context or upon facts about the real
world.
• It deals with using and understanding sentences in
different situations and how the interpretation of the
sentence is affected.
• These aspects include:
– Pronouns and referring expressions
– Logical inferences that can be drawn from the meanings
of a set of propositions.
– Discourse structure: the meaning of a collection of
sentences taken together.
Natural Language Analysis
Techniques
i. Keyword analysis or Pattern matching
• Accept the given sentence as input.
• Segment the sentence.
• Identify keywords in each segment.
• If a keyword is present:
– If only one keyword is present, give a suitable reply for
that keyword.
– If more than one keyword is present, prioritize them
and reply for the highest-priority keyword.
• If no keyword is present in the segment, give a
random reply.
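The steps above can be sketched as a tiny ELIZA-style responder. The keyword table, priorities and replies are invented for illustration:

```python
# Hypothetical keyword table: keyword -> (priority, reply).
# When several keywords appear in one segment, the highest priority wins.
KEYWORDS = {
    "hello": (1, "Hello! How can I help you?"),
    "weather": (2, "I cannot check the weather, sorry."),
    "bye": (3, "Goodbye!"),
}
DEFAULT_REPLY = "Please tell me more."

def reply(sentence):
    """Segment, find keywords, prioritize, and reply."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    found = [(prio, resp) for w, (prio, resp) in KEYWORDS.items() if w in words]
    if not found:
        return DEFAULT_REPLY          # no keyword: default/random reply
    return max(found)[1]              # highest-priority keyword wins
```

In a fuller system the default reply would be drawn at random from a pool, and keywords would be matched against patterns rather than exact words.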
ii. Syntactic driven parsing technique
• Words fit together into higher-level units such
as phrases, clauses and sentences.
• Interpretations of larger groups of words are
built up out of the interpretations of their
syntactic constituents (words or phrases).
• The input is interpreted as a whole.
• The interpretation is obtained by applying a grammar
that determines which sentences are legal in the
language being parsed.
NLP Problems
1. The same expression means different things in different
context.
– Where’s the water? (Chemistry lab? Must be pure.)
– Where’s the water? (Thirsty? Must be drinking water.)
– Where’s the water? (Leaky roof? It can be dirty.)
2. No natural language program can be complete, because
new words, expressions, and meanings can be generated quite
freely.
– I’ll fax it to you
3. There are lots of ways to say the same thing.
– Ram was born on October 11.
– Ram’s birthday is October 11.
4. Sentences and phrases might have hidden meanings
– “Out of sight, out of mind”-> “ invisible idiot”
– “The spirit was willing but the flesh was weak” - > “ the vodka was
good, but the meat was bad”
5. Problems due to syntax and semantics.
6. Problems due to extensive use of pronouns
(semantic issue).
– E.g. Ravi went to the supermarket. He found his
favorite brand of coffee in the rack. He paid for it and left.
– What does “it” denote?
7. Use of grammatically incorrect sentences.
– He rice eats. (syntax issue)
8. Use of conjunctions to avoid repetition of
phrases causes problems in NLP.
– E.g. Ram and Hari went to a restaurant. While Ram had
a cup of coffee, Hari had tea.
– (Intended: Hari had a cup of tea.)
Formal Grammar
The most famous classification of grammars and
languages, introduced by Noam Chomsky, is
divided into four classes:
• Recursively enumerable grammars -
recognizable by a Turing machine.
• Context-sensitive grammars - recognizable by
the linear bounded automaton.
• Context-free grammars - recognizable by the
pushdown automaton.
• Regular grammars - recognizable by the finite
state automaton.
Type – 0 grammar
• Type-0 grammars (unrestricted grammars) include
all formal grammars.
• They generate exactly all languages that can be
recognized by a Turing machine.
• These languages are also known as the recursively
enumerable languages.
• Note that this is different from the recursive
languages which can be decided by an always-
halting Turing machine.
• Class 0 grammars are too general to describe the
syntax of programming languages and natural
languages.
Type – 1 Grammar
• Type-1 grammars generate the context-sensitive languages.
• These grammars have rules of the form αAβ→αγβ with A a
non-terminal and α, β and γ strings of terminals and non-
terminals.
• The strings α and β may be empty, but γ must be nonempty.
• The languages described by these grammars are exactly all
languages that can be recognized by a linear bounded
automaton.
• Example:
– AB → CDB
– AB → CdEB
– ABcd → abCDBcd
– B → b
Type – 2 Grammar
• Type-2 grammars generate the context-free
languages.
• These are defined by rules of the form A → γ with A a
non-terminal and γ a string of terminals and non-
terminals.
• These languages are exactly all languages that can be
recognized by a non-deterministic pushdown
automaton.
• This class of languages is used in AI for natural language
processing, e.g. to model sentence structure.
• Context-free languages are the theoretical basis for the
syntax of most programming languages.
• Example:
– A → aBc
Type – 3 Grammar
• Type-3 grammars generate the regular languages.
• Such a grammar restricts its rules to a single non-terminal on the left-hand
side and a right-hand side consisting of a single terminal, possibly
followed (or preceded, but not both in the same grammar) by a single non-
terminal.
• The rule S → ε is also allowed here if S does not appear on the right side
of any rule. These languages are exactly all languages that can be decided
by a finite state automaton.
• Additionally, this family of formal languages can be obtained by regular
expressions.
• Regular languages are commonly used to define search patterns and the
lexical structure of programming languages.
• Example:
• A → ε
• A → a
• A → abc
• A → B
• A → abcB
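As an illustration of the regular-expression equivalence, the right-linear rules S → aB, B → bS, S → ε (a hypothetical Type-3 grammar, not one of the examples above) generate the regular language (ab)*. Both the equivalent regular expression and the corresponding two-state finite automaton can be sketched as:

```python
import re

# Regular expression for the language generated by S -> aB, B -> bS, S -> ε.
pattern = re.compile(r"^(ab)*$")

def fsa_accepts(s):
    """Finite state automaton for the same language; the non-terminals
    of the right-linear grammar double as the automaton's states."""
    state = "S"                       # start symbol = start state
    for ch in s:
        if state == "S" and ch == "a":
            state = "B"
        elif state == "B" and ch == "b":
            state = "S"
        else:
            return False              # no transition: reject
    return state == "S"               # accepting state, from S -> ε
```

Both recognizers agree on every input, which is the point of the Type-3/finite-automaton/regular-expression equivalence.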
Parsing
• The term parsing comes from the Latin word “pars”,
meaning “part”.
• Parsing or syntactic analysis is the process of
analyzing a string of symbols, either in natural
language or in computer languages according to
the rules of a formal grammar.
• So parsing means determining the syntactic
structure of an expression.
Example of parsing
Types of Parser
• The task of the parser is essentially to
determine if and how the input can be derived
from the start symbol of the grammar.
• This can be done in essentially two ways:
1. Top-down parser
2. Bottom-up parser
Top-down parsers
• Top-down parsing expands a parse tree from
the start symbol to the leaves.
• Always expand the leftmost non-terminal.
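A minimal sketch of a top-down (recursive-descent) parser with backtracking, over a hypothetical toy grammar; as stated above, it always expands the leftmost non-terminal first:

```python
# Hypothetical toy grammar:
#   S -> NP VP    NP -> Det N    VP -> V NP | V
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["saw"], ["sleeps"]],
}

def expand(symbol, tokens, pos):
    """Expand the leftmost symbol at position pos; return the new
    position on success, or None on failure (triggering backtracking)."""
    if symbol not in GRAMMAR:                  # terminal: must match input
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None
    for rule in GRAMMAR[symbol]:               # try each production in turn
        p = pos
        for sym in rule:                       # leftmost-first expansion
            p = expand(sym, tokens, p)
            if p is None:
                break
        else:
            return p
    return None

def parses(sentence):
    tokens = sentence.split()
    return expand("S", tokens, 0) == len(tokens)
```

For example, `parses("the cat saw a dog")` succeeds by expanding S down to the leaves, while a scrambled sentence fails every production and is rejected.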
Bottom up parsing
• Start at the leaves and grow toward the root
– Just as efficient as top-down parsing
– Builds on ideas from top-down parsing
– The preferred method in practice
• Also called LR parsing
– L means that tokens are read left to right
– R means that it constructs a rightmost derivation (in reverse)