TEXT REPRESENTATION & APPLICATIONS OF NLP
Dr. M. Praneesh M.Sc., M.Phil., PGDNLP., Ph.D
Assistant Professor
Department of Computer Science with Data Analytics
Sri Ramakrishna College of Arts and Science
Label and Features
• Instance / Sample: A single data point or observation.
• Label: The target or output variable you want to predict.
• Features: The input variables or attributes that describe each instance and are used by the model to make predictions.
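The three terms can be illustrated with a minimal sketch. The two-message dataset, the `extract_features` helper, and the feature names below are all hypothetical and chosen only for illustration; they are not from the slides.

```python
# Toy labelled dataset: each instance pairs an input text with a target label.
dataset = [
    ("Win a free prize now", "spam"),
    ("Meeting moved to 3 pm", "ham"),
]

def extract_features(text):
    """Turn one instance's raw text into a small dictionary of features."""
    words = text.lower().split()
    return {
        "num_words": len(words),
        "contains_free": "free" in words,
        "contains_digit": any(ch.isdigit() for ch in text),
    }

# Each instance is described by its features; the label is what a model would predict.
for text, label in dataset:
    print(extract_features(text), "->", label)
```

Here each tuple is an instance, the dictionary returned by `extract_features` holds its features, and the string `"spam"` or `"ham"` is its label.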
Text Representation
Properties of Text Representation
Text Representation
• Vector representation: Numerically represents unstructured text documents so that they become mathematically computable.
• Types: Frequency representation and semantic representation
Frequency Representation
• Treats words or phrases as symbols and records how often each occurs in a document as a vector
• Term-document matrix
• Unigram, bigram, trigram
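A small sketch of these ideas, assuming whitespace tokenization and two made-up example sentences. The helper name `ngrams` and the variable names are illustrative, not part of the slides.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
tokenized = [d.split() for d in docs]

# Unigram counts per document.
counts = [Counter(ngrams(toks, 1)) for toks in tokenized]

# Shared vocabulary, sorted so that the row order is reproducible.
vocab = sorted(set().union(*counts))

# Term-document matrix: one row per term, one column per document.
term_doc = [[c[term] for c in counts] for term in vocab]

for term, row in zip(vocab, term_doc):
    print(f"{term:>4} {row}")

# Bigrams of the first document; use n=3 for trigrams.
print(ngrams(tokenized[0], 2))
```

Passing `n=2` or `n=3` to `ngrams` when building `counts` would give a bigram or trigram matrix instead, at the cost of a larger and sparser vocabulary.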
Inverse Document Frequency
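Assuming this slide refers to inverse document frequency (IDF) weighting, the sketch below uses the common unsmoothed form idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The three example documents and the function names are made up for illustration; libraries often use smoothed variants of this formula.

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

def idf(term, documents):
    """Unsmoothed IDF: log(N / df). Assumes the term occurs in at least one document."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df)

def tf_idf(term, doc, documents):
    """Raw term count in one document, weighted by the term's IDF."""
    return doc.count(term) * idf(term, documents)

for term in ("the", "cat", "mat"):
    print(term, round(idf(term, docs), 3), round(tf_idf(term, docs[0], docs), 3))
```

A word such as "the" that appears in every document gets an IDF of zero, so it contributes nothing to the weighted vector, while rarer words are weighted more heavily.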
Feature Matrix (Unigram)
Semantic Representation
• Represents text documents by their explicit or implicit semantics, context, or meaning rather than by word occurrence in the document
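One simple way to capture implicit meaning is to describe each word by the words it appears alongside, then compare words by cosine similarity. The sketch below is a toy illustration of that idea under stated assumptions: the three sentences are made up, and sentence-level co-occurrence counts stand in for the learned embeddings that practical systems typically use.

```python
import math
from collections import Counter, defaultdict

sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]

# For each word, count the other words that appear in the same sentence.
context = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for i, word in enumerate(words):
        for j, other in enumerate(words):
            if i != j:
                context[word][other] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

print(cosine(context["cat"], context["dog"]))  # share "the" and "drinks"
print(cosine(context["cat"], context["car"]))  # share only "the"
```

In this toy corpus "cat" and "dog" never occur in the same sentence, yet their vectors are closer to each other than to "car" because they occur in similar contexts.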
Knowledge Representation
• Part-of-Speech (POS) Tagging - Assigning a part of speech to each word in a sentence
• Named Entity Recognition (NER) - A process that assigns named entities mentioned in unstructured text to predefined categories
• Parsing - Breaking a text down into its component parts, with an explanation of the form, function, and syntactic relationship of each part
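As a rough illustration of assigning entities to predefined categories, the sketch below uses a dictionary lookup (a gazetteer). This is a deliberately simple stand-in: the `GAZETTEER` entries and the `tag_entities` helper are hypothetical, and practical NER systems usually rely on trained statistical or neural models rather than a fixed word list.

```python
# Hypothetical gazetteer: surface strings mapped to entity categories.
GAZETTEER = {
    "Google": "ORGANIZATION",
    "Microsoft": "ORGANIZATION",
    "India": "LOCATION",
}

def tag_entities(text):
    """Return (entity, category) pairs for tokens found in the gazetteer."""
    entities = []
    for token in text.split():
        word = token.strip(".,;:!?")  # drop punctuation attached to the token
        if word in GAZETTEER:
            entities.append((word, GAZETTEER[word]))
    return entities

print(tag_entities("Microsoft opened a new office in India."))
```

A lookup like this cannot handle unseen names or ambiguous words, which is one reason model-based taggers are preferred in practice.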
Applications of NLP
Question answering focuses on building systems that automatically
answer questions asked by humans in natural language.
Applications of NLP
Spam detection identifies unwanted e-mails before they reach a
user's inbox.
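Spam detection is commonly framed as text classification over word counts. The sketch below is a minimal multinomial Naive Bayes classifier with add-one smoothing, trained on four made-up messages. The training data and function names are assumptions for illustration, and Naive Bayes is only one of several techniques that could be used here.

```python
import math
from collections import Counter

train = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
]

# Count words per class and documents per class.
word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in train:
    word_counts[label].update(text.split())
    doc_counts[label] += 1

vocab = set(word_counts["spam"]) | set(word_counts["ham"])

def log_score(text, label):
    """Log prior plus summed log likelihoods with add-one (Laplace) smoothing."""
    total = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / len(train))
    for word in text.split():
        if word in vocab:  # words never seen in training are ignored
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
    return score

def classify(text):
    """Pick the label with the higher log score."""
    return max(("spam", "ham"), key=lambda label: log_score(text, label))

print(classify("free prize now"))
print(classify("team meeting on monday"))
```

The word counts act as the features and "spam" or "ham" is the label, which ties this application back to the label-and-features slide.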
Applications of NLP
Sentiment Analysis is also known as opinion mining. It is
used on the web to analyse the attitude, behaviour, and
emotional state of the sender.
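A very simple form of sentiment analysis sums word scores from a sentiment lexicon. The sketch below assumes a tiny hypothetical lexicon and ignores negation, sarcasm, and context, all of which real systems need to handle.

```python
# Hypothetical mini-lexicon; real lexicons contain thousands of scored words.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def sentiment(text):
    """Sum the lexicon scores of the words and map the total to a label."""
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone!"))
print(sentiment("Terrible service, I hate it."))
print(sentiment("The parcel arrived today."))
```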
Applications of NLP
Machine translation is used to translate text or
speech from one natural language to another.
Example: Google Translate
Applications of NLP
Spelling correction: Office software from Microsoft Corporation,
such as MS Word and PowerPoint, uses NLP to detect and correct
spelling errors.
Applications of NLP
• Speech recognition converts spoken words into text.
• It is used in applications such as mobile devices, home automation, video recovery, dictation to Microsoft Word, voice biometrics, and voice user interfaces.
Applications of NLP
Chatbots are one of the important applications of NLP.
Many companies use them to provide customer chat services.
Applications of NLP
• Information extraction is one of the most important applications of NLP.
• It is used for extracting structured information from unstructured or semi-structured machine-readable documents.
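A minimal sketch of turning free text into a structured record, assuming that e-mail addresses and ISO-style dates are the fields of interest. The sample text, the addresses, and the deliberately simplified patterns are made up for illustration; production extractors usually combine patterns like these with trained models.

```python
import re

text = (
    "Contact Priya at priya@example.com before 2024-03-15. "
    "Invoices go to accounts@example.org by 2024-04-01."
)

# Simplified patterns: good enough for this sample, not for every valid address or date.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Structured output: named fields filled from the unstructured text.
record = {
    "emails": EMAIL.findall(text),
    "dates": DATE.findall(text),
}
print(record)
```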
