Natural Language Processing
DR.VMS
Sentiment analysis
• A technique used to interpret and classify emotions in subjective data. Sentiment
analysis is often performed on text to detect sentiment in emails, survey
responses, social media posts, and similar sources.
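One simple way to make this concrete is a lexicon-based scorer. The sketch below is illustrative only: the word list and its scores are hypothetical, and practical systems usually rely on trained models rather than a hand-written lexicon.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration): word -> polarity score.
LEXICON = {
    "good": 1, "great": 2, "love": 2, "happy": 1,
    "bad": -1, "terrible": -2, "hate": -2, "slow": -1,
}

def sentiment(text):
    """Label text as positive, negative or neutral from the summed word scores."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is great"))
print(sentiment("Terrible support and slow delivery"))
```

Words missing from the lexicon contribute nothing, so a text with no known words comes out as neutral.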
Text classification
• Text classification is the process of categorizing text into organized groups.
Using Natural Language Processing (NLP), text classifiers can automatically
analyze text and assign it one or more pre-defined tags or categories based on its
content.
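As a rough illustration of assigning pre-defined categories, here is a minimal Naive Bayes classifier written with the standard library only. Naive Bayes is one common choice and is not named in the slides. The training sentences and the labels "billing" and "technical" are made up for the example.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count words per label and documents per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, invented for this sketch.
TRAINING = [
    ("refund my payment please", "billing"),
    ("invoice shows wrong payment amount", "billing"),
    ("app crashes on login", "technical"),
    ("cannot login after update", "technical"),
]

model = train(TRAINING)
print(classify("payment refund missing", *model))
print(classify("login crashes", *model))
```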
NLP
• Identify, analyze, understand and generate human languages
• Apply computational techniques to natural language
• Explain computational linguistic theories
• Apply artificial intelligence to possible contexts and makes
• Apply statistical and mathematical models to human language use and usage
NLP
• NLP is used to teach a machine how to read and understand human languages.
Trained machines can extract the relationships between words, identify the
entities in a sentence (i.e., entity recognition), and so on.
Tokenizing
Breaking up a stream of characters into words, punctuation marks, numbers and
other discrete items.
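A minimal sketch of such a tokenizer, using a single regular expression. The pattern is one possible choice and is not taken from the slides. It treats decimal numbers and words with an internal apostrophe as single tokens and splits off every other punctuation mark.

```python
import re

# Order matters: try numbers first, then words (optionally with an apostrophe
# part such as "didn't"), then any single non-space, non-word character.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split a string into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("John paid 3.50 for the book, didn't he?"))
```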
Parts of speech
• Noun - fish, book, house, pen, procrastination, language
• Proper noun - John, France, Barack, Goldsmiths, Python
• Verb - loves, hates, studies, sleeps, thinks, is, has
• Adjective - grumpy, sleepy, happy, bashful
• Adverb - slowly, quickly, now, here, there
• Pronoun - I, you, he, she, we, us, it, they
• Preposition - in, on, at, by, around, with, without
• Conjunction - and, but, or, unless
• Determiner - the, a, an, some, many, few, 100
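The word lists above can be turned into a very small lookup tagger. This is only a sketch: it knows nothing beyond the listed words and ignores context, whereas real taggers must handle words that belong to several classes (for example "fish" as a noun or a verb).

```python
# Word lists copied from the slide above; the tag names follow its headings.
POS_LEXICON = {
    "Noun": ["fish", "book", "house", "pen", "procrastination", "language"],
    "Proper noun": ["John", "France", "Barack", "Goldsmiths", "Python"],
    "Verb": ["loves", "hates", "studies", "sleeps", "thinks", "is", "has"],
    "Adjective": ["grumpy", "sleepy", "happy", "bashful"],
    "Adverb": ["slowly", "quickly", "now", "here", "there"],
    "Pronoun": ["I", "you", "he", "she", "we", "us", "it", "they"],
    "Preposition": ["in", "on", "at", "by", "around", "with", "without"],
    "Conjunction": ["and", "but", "or", "unless"],
    "Determiner": ["the", "a", "an", "some", "many", "few", "100"],
}

# Invert the lexicon into a case-insensitive word -> tag table.
WORD_TO_TAG = {
    word.lower(): tag for tag, words in POS_LEXICON.items() for word in words
}

def tag(tokens):
    """Pair each token with its part of speech, or 'Unknown' if unlisted."""
    return [(token, WORD_TO_TAG.get(token.lower(), "Unknown")) for token in tokens]

print(tag("John loves the grumpy fish".split()))
```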
Constituent structure
• (((the | a)(cat | dog)) | (John | Jack | Susan))(barked | slept)
• Sentence → Noun Phrase, Verb Phrase
• Noun Phrase → Determiner, Noun (Example: the dog)
• Noun Phrase → Proper Noun (Example: Jack)
• Noun Phrase → Noun Phrase, Conjunction, Noun Phrase (Examples: Jack and Jill,
the owl and the pussycat)
• Verb Phrase → Verb, Noun Phrase (Example: saw the rabbit)
• Verb Phrase → Verb, Preposition, Noun Phrase (Examples: went up the hill, sat
on the mat)
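The rewrite rules above can be checked mechanically. The sketch below is a small recognizer that tests whether a sentence can be derived from them. The abbreviated symbol names (S, NP, VP, Det, ...) and the tiny word list are assumptions added for the example. It only answers yes or no and does not build a tree.

```python
from functools import lru_cache

# The rules from the slide, with abbreviated symbol names.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("PropN",), ("NP", "Conj", "NP")],
    "VP": [("V", "NP"), ("V", "Prep", "NP")],
}

# Hypothetical mini-lexicon covering the slide's example phrases.
LEXICON = {
    "the": "Det", "a": "Det", "and": "Conj",
    "dog": "N", "rabbit": "N", "hill": "N", "owl": "N", "pussycat": "N", "mat": "N",
    "jack": "PropN", "jill": "PropN",
    "saw": "V", "went": "V", "sat": "V",
    "up": "Prep", "on": "Prep",
}

def recognizes(sentence, start="S"):
    """Return True if the sentence can be derived from `start`."""
    words = sentence.lower().split()
    tags = tuple(LEXICON.get(word) for word in words)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does `symbol` cover exactly the words at positions i..j-1?
        if symbol not in GRAMMAR:  # word class: must match a single word
            return j == i + 1 and tags[i] == symbol
        return any(matches(rhs, i, j) for rhs in GRAMMAR[symbol])

    def matches(rhs, i, j):
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Try every split point; each remaining symbol needs at least one word.
        return any(
            derives(rhs[0], i, k) and matches(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return bool(words) and derives(start, 0, len(words))

print(recognizes("Jack and Jill went up the hill"))
print(recognizes("saw the rabbit"))
```

Because every symbol on a rule's right-hand side must cover at least one word, the left-recursive rule for conjoined noun phrases always recurses on a shorter span, so the search terminates.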
Corpus
• A corpus is a collection of data selected for a descriptive or applied purpose.
• A corpus must possess a common set of fundamental properties, including
representativeness, finite size and availability in electronic format.
The Linguistic Data Consortium
• Founded in 1992 and based at the University of Pennsylvania in the United
States, this research and development center is financed primarily by the National
Science Foundation (NSF). Its main activities are collecting, distributing and
annotating linguistic resources that meet the needs of research centers
and American companies working in the field of language technology. The
Linguistic Data Consortium (LDC) owns an extensive catalog of written and spoken
corpora covering a fairly large number of languages.
LFG-GPSG
• In LFG, sentences are parsed and functional structures are built up. In GPSG
(Generalized Phrase Structure Grammar), sentences are parsed and translated into
formulas of intensional logic. Hardly anyone knows how to generate from
f-structures or from logical formulas.
LFG - Lexical Functional Grammar
• Two levels of structure
• C-structure (tree)
• F-structure (representation of grammatical functions)
• Mappings between C-structure and F-structure
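To make the two levels concrete, the following heavily simplified sketch represents a c-structure as nested tuples and derives an f-structure from it as a nested dictionary. The mapping rule used here (the noun phrase under S becomes SUBJ, the noun phrase under VP becomes OBJ) and the attribute names PRED and DEF are assumptions for illustration. Full LFG specifies the mapping through functional annotations on grammar rules and lexical entries.

```python
# c-structure: (category, child, child, ...); a leaf is (category, word).
c_structure = (
    "S",
    ("NP", ("PropN", "Jack")),
    ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "rabbit"))),
)

def np_to_f(np):
    """Build the f-structure of a noun phrase from its leaves."""
    f = {}
    for category, word in np[1:]:
        if category in ("N", "PropN"):
            f["PRED"] = word
        elif category == "Det":
            f["DEF"] = word == "the"  # definite only for "the"
    return f

def to_f_structure(sentence):
    """Map an S node to a dictionary of grammatical functions."""
    _, subject_np, vp = sentence
    f = {"SUBJ": np_to_f(subject_np)}
    for child in vp[1:]:
        if child[0] == "V":
            f["PRED"] = child[1]
        elif child[0] == "NP":
            f["OBJ"] = np_to_f(child)
    return f

print(to_f_structure(c_structure))
```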
Pronunciation
• Phonology and phonetics are concerned with pronunciation.
• Pronunciation of characters in isolation and in combination
• Regular and irregular pronunciations both need consideration
• Some words have the same pronunciation but different meanings, such as "weak"
and "week". From pronunciation alone, a computer cannot tell the two words apart.
Morphology
• The structure of words in their written (graphemic) and spoken (phonemic) forms.
Morphology covers two processes, inflection and derivation.
• Inflection:
• It relates to the grammatical function of words of the same part of speech;
• e.g. the paradigm of the verb "play":
• play, plays, played, playing
• Derivation:
• It relates to the production of new words of different parts of speech;
• e.g. nation (a noun)
• national (an adjective)
• nationalize (a verb)
Morphological Analyzer
• A morphological analyzer can extract the base forms of words from documents
fed into a computer.
• The applications achieved in this respect are:
• a: hyphenation (segmenting words into their morphs),
• b: spelling correction,
• c: stemming, which reduces related words to a common stem as far as possible.
The difficulty with such programs is that they must cope with very broad input.
Other applications are parsing and generating natural language utterances in
written or spoken form, and machine translation. (Trost, 2006)
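Stemming can be sketched as naive suffix stripping. The suffix list and the minimum stem length below are assumptions chosen so that the "play" paradigm collapses to one stem. Established algorithms such as the Porter stemmer use far more rules.

```python
# Inflectional suffixes to strip, longest first (an illustrative choice).
SUFFIXES = ("ing", "ed", "s")

def stem(word, min_stem=3):
    """Strip one known suffix if at least `min_stem` letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["play", "plays", "played", "playing"]])
```

The minimum length keeps short words such as "is" and "has" intact, but the approach still produces odd stems for irregular spellings ("studies" becomes "studie"), which is one reason broad input is hard to handle.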
Syntax
• Concerned with the structure of sentences
• Syntactic analysis checks whether the text is well formed according to the rules
of a formal grammar.
• Sometimes the word order of a structure is ambiguous and can mislead:
• E.g. "I saw her with a telescope." Either the speaker used a telescope, or she
was carrying one.
Semantics
• Deals with the meanings of words, phrases and sentences.
• A single word may have several meanings
• E.g. chip, well, covers
• A phrase such as “hot ice-cream” would be rejected by a semantic analyzer as
improbable
Pragmatics
• Deals with the meaning of an utterance depending on its context.
• Interpretation plays a crucial role in understanding the meaning
• E.g. "I am waiting"
• Can be interpreted as:
• a. an ordinary statement of fact,
• b. a promise, or
• c. a threat.
