Natural language processing (NLP) analyzes and represents natural language text or speech at multiple linguistic levels to achieve human-like language processing in applications. The field was influenced by Turing's 1950 paper on machine intelligence and produced early systems such as SHRDLU in the late 1960s. NLP understands, generates, and integrates natural language through techniques such as morphological, syntactic, semantic, and discourse analysis, benefiting domains such as search, translation, sentiment analysis, and social media.
Natural language processing also provides a way for humans to interact with computers and machines by voice.
Google Voice Search is a well-known example that makes use of natural language processing.
Natural Language Processing (NLP) is a subfield of artificial intelligence that aims to help computers understand human language. NLP involves analyzing text at different levels, including morphology, syntax, semantics, discourse, and pragmatics. The goal is to map language to meaning by breaking down sentences into syntactic structures and assigning semantic representations based on context. Key steps include part-of-speech tagging, parsing sentences into trees, resolving references between sentences, and determining intended meaning and appropriate actions. Together, these allow computers to interpret and respond to natural human language.
This document presents an overview of text mining. It discusses how text mining differs from data mining in that it involves natural language processing of unstructured or semi-structured text data rather than structured numeric data. The key steps of text mining include pre-processing text, applying techniques like summarization, classification, clustering and information extraction, and analyzing the results. Some common applications of text mining are market trend analysis and filtering of spam emails. While text mining allows extraction of information from diverse sources, it requires initial learning systems and suitable programs for knowledge discovery.
NLP stands for Natural Language Processing which is a field of artificial intelligence that helps machines understand, interpret and manipulate human language. The key developments in NLP include machine translation in the 1940s-1960s, the introduction of artificial intelligence concepts in 1960-1980s and the use of machine learning algorithms after 1980. Modern NLP involves applications like speech recognition, machine translation and text summarization. It consists of natural language understanding to analyze language and natural language generation to produce language. While NLP has advantages like providing fast answers, it also has challenges like ambiguity and limited ability to understand context.
Build an LLM-powered application using LangChain
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
An ongoing project on natural language processing (using Python and the NLTK toolkit) that focuses on extracting sentiment from a question and its title on www.stackoverflow.com and determining its polarity. Based on these findings, it is verified whether the rules and guidelines imposed by the SO community on its users are strictly followed.
The document provides an overview of large language models and their applications in healthcare. It discusses the evolution of LLMs from DNNs to transformers, surveys current prominent models like GPT-4, and examines ways of extending LLMs through frameworks, tools and agents. The document also explores potential medical research applications of LLMs, such as assisting with medical education, patient communication and dialog. It analyzes LLM performance on medical question answering benchmarks and notes the need for human supervision when applying LLMs in healthcare. Finally, the document briefly mentions the rise of MedTech startups leveraging LLMs.
This document discusses text summarization using machine learning. It begins by defining text summarization as reducing a text to create a summary that retains the most important points. There are two main types: single-document and multi-document summarization. Extractive summarization creates summaries by extracting phrases or sentences from the source text, while abstractive summarization expresses the same ideas using different words. Supervised machine learning approaches use labeled training data to train classifiers to select content, while unsupervised approaches select content based on metrics like term frequency-inverse document frequency. ROUGE is commonly used to automatically evaluate summaries by comparing them to human references. Query-focused multi-document summarization aims to answer a user's information need by summarizing relevant documents.
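The extractive, frequency-based approach described above can be sketched in plain Python: each sentence is scored by the summed TF-IDF weight of its words (treating each sentence as a "document"), and the top-scoring sentences are kept in their original order. The function name and scoring details are illustrative, not taken from the document.

```python
import math
import re
from collections import Counter

def tfidf_summarize(text, n_sentences=2):
    """Extractive summary: keep the n sentences with the highest
    summed TF-IDF weight, re-emitted in original order."""
    # Split into sentences on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n_docs = len(tokenized)
    # Document frequency: in how many sentences each word appears.
    df = Counter(word for words in tokenized for word in set(words))
    scores = []
    for words in tokenized:
        tf = Counter(words)
        score = sum(
            (tf[w] / len(words)) * math.log(n_docs / df[w]) for w in tf
        ) if words else 0.0
        scores.append(score)
    top = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Words that occur in every sentence get an IDF of zero, so the score naturally favors sentences containing distinctive vocabulary.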
This document provides an overview of machine learning and artificial intelligence concepts. It discusses what machine learning is, including how machines can learn from examples to optimize performance without being explicitly programmed. Various machine learning algorithms and applications are covered, such as supervised learning techniques like classification and regression, as well as unsupervised learning and reinforcement learning. The goal of machine learning is to develop models that can make accurate predictions on new data based on patterns discovered from training data.
The Text Classification slides contains the research results about the possible natural language processing algorithms. Specifically, it contains the brief overview of the natural language processing steps, the common algorithms used to transform words into meaningful vectors/data, and the algorithms used to learn and classify the data.
To learn more about RAX Automation Suite, visit: www.raxsuite.com
Word embedding, vector space model, language modelling, neural language model, Word2Vec, GloVe, fastText, ELMo, BERT, DistilBERT, RoBERTa, SBERT, Transformer, attention
The document provides an introduction to natural language processing (NLP), discussing key related areas and various NLP tasks involving syntactic, semantic, and pragmatic analysis of language. It notes that NLP systems aim to allow computers to communicate with humans using everyday language and that ambiguity is ubiquitous in natural language, requiring disambiguation. Both manual and automatic learning approaches to developing NLP systems are examined.
It gives an overview of Sentiment Analysis, Natural Language Processing, Phases of Sentiment Analysis using NLP, brief idea of Machine Learning, Textblob API and related topics.
This document provides an outline on natural language processing and machine vision. It begins with an introduction to different levels of natural language analysis, including phonetic, syntactic, semantic, and pragmatic analysis. Phonetic analysis constructs words from phonemes using frequency spectrograms. Syntactic analysis builds a structural description of sentences through parsing. Semantic analysis generates a partial meaning representation from syntax, while pragmatic analysis uses context. The document also introduces machine vision as a technology using optical sensors and cameras for industrial quality control through detection of faults. It operates through sensing images, processing/analyzing images, and various applications.
NLP is used successfully today in speech pattern recognition, weather forecasting, healthcare applications, and classifying handwritten documents. There are in fact so many NLP applications in business we ourselves use daily that we don’t even realise how ubiquitous the technology really is.
A comprehensive guide to prompt engineering
Prompt engineering is the practice of designing and refining specific text prompts to guide transformer-based language models, such as Large Language Models (LLMs), in generating desired outputs. It involves crafting clear and specific instructions and allowing the model sufficient time to process information.
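The practice is easier to see with a concrete sketch. The helper below assembles the ingredients mentioned above: a clear instruction, supporting context, an explicit output format, and a nudge for the model to take time to reason. The function and field names are my own, not from the guide.

```python
def build_prompt(role, task, context, output_format, examples=None):
    """Assemble a structured prompt from the usual prompt-engineering
    ingredients: role, task, context, optional few-shot examples,
    an explicit output format, and a reasoning nudge."""
    parts = [
        f"You are {role}.",
        f"Task: {task}",
        f"Context:\n{context}",
    ]
    if examples:
        parts.append("Examples:\n" + "\n".join(examples))
    parts.append(f"Respond in this format: {output_format}")
    parts.append("Think step by step before giving your final answer.")
    return "\n\n".join(parts)
```

The point is not the exact wording but the structure: separating role, task, context, and format makes prompts easier to test and refine one piece at a time.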
How AI is going to change the world, by M. Mujeeb Riaz
How is AI going to change the world?
"AI: The Future of Our World"
"AI and its Transformative Impact on the World: Understanding the Potential of Chatbots and Conversational AI"
What is artificial intelligence and how does it work?
What are chatbots?
What is ChatGPT?
What is the difference between ChatGPT 3 and ChatGPT 4?
Is Jasper artificial intelligence?
What is Character AI and how does it work?
How is ChatGPT going to change the world?
Why are we calling ChatGPT the future?
Machine Learning in 10 Minutes | What is Machine Learning? | Edureka
YouTube Link: https://youtu.be/qWHi09C3Dq0
Machine Learning Training with Python: https://www.edureka.co/machine-learning-certification-training
This Edureka video on 'Machine Learning in 10 Minutes' will help you understand what exactly Machine Learning is and what the different types of Machine Learning are, along with some career opportunities that Machine Learning can open up.
Example
What is AI?
What is Machine Learning?
Steps for Machine Learning
Types of Machine Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Applications of Machine Learning
What can you be with Machine Learning?
OpenAI is an AI research company dedicated to developing safe and beneficial artificial intelligence. Its mission is to ensure AI benefits humanity. OpenAI conducts research across various AI domains and develops technologies like ChatGPT, a large language model capable of answering questions and generating human-like responses. The company also offers developers access to its models and tools through an API.
Web scraping involves extracting data from human-readable web pages and converting it into structured data. There are several types of scraping including screen scraping, report mining, and web scraping. The process of web scraping typically involves using techniques like text pattern matching, HTML parsing, and DOM parsing to extract the desired data from web pages in an automated way. Common tools used for web scraping include Selenium, Import.io, Phantom.js, and Scrapy.
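As a minimal illustration of the HTML/DOM-parsing approach (using only Python's standard library rather than the tools listed above), the sketch below walks a page's tag stream and turns its hyperlinks into structured data:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every hyperlink target (href) from an HTML document
    by handling start tags as the parser streams through the page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Real scrapers layer fetching, rate limiting, and error handling on top of this, but the core extraction step is the same: parse the markup, then pull out the fields you need.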
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
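The prompt-expansion idea can be sketched end to end in a few lines. The retrieval step below uses naive word overlap as a stand-in for the embedding-based vector search a real retrieval-augmented generation (RAG) system would use; the function names are illustrative.

```python
def retrieve(query, documents, k=2):
    """Naive retrieval: score each document by word overlap with the
    query and keep the top-k (a stand-in for vector similarity search)."""
    q = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def expand_prompt(query, documents, k=2):
    """Retrieval-augmented generation: prepend retrieved context so the
    model can answer from private data it was never trained on."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Because only the top-k passages are inserted, the expanded prompt stays within the model's context limit while still carrying the custom knowledge.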
Natural Language Processing (NLP) began in the 1950s and uses machine learning algorithms to analyze and understand human language. NLP can be used to automatically summarize text, translate languages, identify entities and sentiment, and perform other tasks. Popular open source NLP libraries like NLTK, Stanford NLP, and OpenNLP provide algorithms for part-of-speech tagging, named entity recognition, dependency parsing, and more. Common machine learning methods in NLP include techniques for parts-of-speech, named entities, lemmatization, and sentiment analysis.
A spell checker is an application program for processing natural languages effectively in machine-readable form. Spelling checking and correction is a basic necessity, and a tedious task, in any language, so spell-checker software is required to do it. A spell checker is a set of programs that analyzes a wrongly used word and replaces it with the most likely correct word. The challenging part here is doing this work for the Kannada language: in software systems, many Kannada words are typed in several formats, since Kannada has many fonts for writing the script properly. In this paper, we describe some techniques used by a spell checker for the Kannada language. We use NLP, a field of computer science concerned with the relationship between computers and human (natural) languages. Modern NLP algorithms based on machine learning are used to carry out the work.
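The core mechanism behind "find the most likely correct word" is usually edit distance over a dictionary. The sketch below uses Latin script for simplicity; a Kannada checker would apply the same idea over Unicode Kannada codepoints and a Kannada word list, and is not the paper's actual method.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                  # deletion
                cur[j - 1] + 1,               # insertion
                prev[j - 1] + (ca != cb),     # substitution (free on match)
            ))
        prev = cur
    return prev[-1]

def correct(word, dictionary):
    """Suggest the dictionary word closest to the misspelled input."""
    return min(dictionary, key=lambda w: edit_distance(word, w))
```

Production spell checkers add candidate pruning, word-frequency priors, and language-specific rules on top of this distance measure.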
Introduction to Natural Language Processing
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. In this blog, we'll explore the basics of NLP and its techniques, from text classification to sentiment analysis. We'll explain how NLP works and why it's become such an important tool for businesses and organizations in recent years. We'll also delve into some of the most popular NLP tools and libraries, such as NLTK and spaCy, and provide examples of how they can be used to analyze and process text data. Whether you're a seasoned data scientist or just starting out in the world of NLP, this blog has something for everyone. So come along and discover the power of natural language processing!
Shallow parser for Hindi language with an input from a transliterator
This document summarizes a student project to develop a shallow parser for Hindi language with input from a transliterator. The plan is to create a transliterator to convert Roman script to Devanagari, generate a lexicon from corpus analysis, develop a morphological analyzer using finite state transducers, and implement a shallow parser using context free grammar. The system architecture and flow chart are presented. In conclusion, the document notes that shallow parsing is needed to build full parsers for Hindi and transliteration is important for translating names and terms across languages with different alphabets.
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
The document discusses Rudolf Eremyan's work as a machine learning software engineer, including several natural language processing (NLP) projects. It provides details on a chatbot Eremyan created for the TBC Bank in Georgia that had over 35,000 likes and facilitated over 100,000 conversations. It also mentions sentiment analysis on Facebook comments and introduces NLP, discussing its history and applications such as text classification, machine translation, and question answering. The document outlines Eremyan's theoretical NLP project involving creating a machine learning pipeline for text classification using a labeled dataset.
Natural language processing with Python and Amharic syntax parse tree, by Daniel Adenew
Natural Language Processing is an interrelated discipline that adds the capability of communicating as human beings do to the computer world. The Amharic language has seen much improvement over time, thanks to researchers at the PhD and MSc level at AAU. Here, I have tried to study and come up with a limited-scope solution that does syntax parsing for the Amharic language and draws syntax parse trees using Python.
Natural Language Processing: A comprehensive overview
Natural language processing enhances human-computer interaction by bridging the language gap. Uncover its applications and techniques in this comprehensive overview. Dive in now!
This document discusses natural language processing (NLP) and feature extraction. It explains that NLP can be used for applications like search, translation, and question answering. The document then discusses extracting features from text like paragraphs, sentences, words, parts of speech, entities, sentiment, topics, and assertions. Specific features discussed in more detail include frequency, relationships between words, language features, supervised machine learning, classifiers, encoding words, word vectors, and parse trees. Tools mentioned for NLP include Google Cloud NLP, Spacy, OpenNLP, and Stanford Core NLP.
Natural language processing (NLP) refers to giving computers the ability to understand human language like text and speech. NLP allows computers to perform tasks like reading text, hearing speech, interpreting it, and determining important parts. It works by separating language into fragments and analyzing grammatical structure and word meanings in context. Examples of NLP include smart assistants, search results, and predictive text. The five main steps of NLP are morphological/lexical analysis, syntactic analysis, semantic analysis, discourse integration, and pragmatic analysis. In 2021, NLP continues to be one of the fastest growing areas of artificial intelligence and machine learning.
The document provides information about natural language processing (NLP) including:
1. NLP stands for natural language processing and involves using machines to understand, analyze, and interpret human language.
2. The history of NLP began in the 1940s and modern NLP consists of applications like speech recognition and machine translation.
3. The two main components of NLP are natural language understanding, which helps machines understand language, and natural language generation, which converts computer data into natural language.
Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with interactions between computers and human language. NLP aims to analyze written and spoken language to understand its meaning. It has applications in areas like text generation, machine translation, sentiment analysis, and speech recognition. NLP works by preprocessing text through steps like tokenization and feature extraction, then applying machine learning models like neural networks to analyze language patterns and relationships between linguistic elements. While NLP has advanced through statistical and deep learning techniques, challenges remain around ambiguity, contextual understanding, and modeling rare languages.
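The preprocessing steps mentioned above (tokenization and feature extraction) can be sketched with the standard library alone. The stopword list here is a tiny illustrative sample, not a real one:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use much larger ones.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "in"}

def preprocess(text):
    """Typical NLP preprocessing: lowercase, tokenize on word
    characters, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(text):
    """Feature extraction: token counts that a downstream model
    (e.g., a naive Bayes classifier) can consume."""
    return Counter(preprocess(text))
```

Neural approaches replace the count vector with learned embeddings, but the tokenize-then-featurize shape of the pipeline is the same.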
Natural language processing for requirements engineering: ICSE 2021 Technical Briefing
These are the slides for the technical briefing at ICSE 2021, given by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
The document discusses natural language processing (NLP) for Tamil to Hindi conversion. It introduces the Universal Networking Language (UNL) as an intermediate representation to express information across languages. UNL allows text to be converted to different languages like converting a webpage to various natural languages. The document then discusses the advantages of developing machine translation between Tamil and other languages, particularly English and Hindi. It outlines the components needed for a Tamil-Hindi machine translation system, including morphological analyzers for Tamil and Hindi, a word mapping unit, and generators.
This document provides an introduction to natural language processing (NLP) and discusses several key concepts:
- NLP aims to allow computers to understand human language in a way similar to humans. Examples of NLP applications discussed include spam filters, sentiment analysis tools, digital assistants, and language translators.
- The document outlines some of the core components of NLP systems, including natural language understanding to interpret text/speech meaning, and natural language generation to produce output text/speech.
- It introduces the NLTK (Natural Language Toolkit) as a popular Python package used for various NLP tasks like tokenization, tagging, parsing, and more. Basic NLTK structure and example modules are covered at a high level.
Natural language processing (NLP) is a way for computers to analyze, understand, and derive meaning from human language. NLP utilizes machine learning to automatically learn rules by analyzing large datasets rather than requiring hand-coding of rules. Common NLP tasks include summarization, translation, named entity recognition, sentiment analysis, and speech recognition. NLP works by applying algorithms to identify and extract natural language rules to convert unstructured language into a form computers can understand. Main techniques used in NLP are syntactic analysis to assess language alignment with grammar rules and semantic analysis to understand meaning and interpretation of words.
Benchmarking NLP Toolkits for Enterprise Application (Conference Papers)
The document summarizes a study that evaluated five popular natural language processing (NLP) toolkits (CoreNLP, NLTK, OpenNLP, SparkNLP, and spaCy) on their ability to perform common NLP tasks like sentence segmentation, tokenization, lemmatization, part-of-speech (POS) tagging, and named entity recognition (NER) on news articles in Malay. The study found that CoreNLP achieved the highest accuracy on four tasks, while spaCy was slightly better than CoreNLP for POS tagging. When retraining the NER models on Malaysian entities, CoreNLP and spaCy achieved the highest F-score of 0.78, beating OpenNLP.
Natural Language Processing (NLP) is a subfield of Artificial Intelligence that deals with the interaction between humans and computers using natural language. It involves the development of algorithms and models that can analyze, understand, and generate human language.
NLP is a multidisciplinary field that draws on linguistics, computer science, and statistics to build systems that can understand and generate human language. It has a wide range of applications, from chatbots to automated translation systems to sentiment analysis.
Some of the core components of NLP include text preprocessing, feature extraction, language modeling, and machine learning algorithms.
The document provides an overview of natural language processing (NLP) including definitions, applications, modeling techniques, and tools used. It defines NLP as making computers understand human language and discusses applications like email filters, assistants, translation, and data analysis. Techniques covered include data preprocessing, tokenization, stop words removal, stemming, lemmatization, bag of words, TF-IDF, word embeddings, and sentiment analysis. Python is highlighted as a commonly used programming language and libraries like NLTK are mentioned. Demos are provided of tokenization, stemming, lemmatization, and sentiment analysis.
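The preprocessing steps this overview names (tokenization, stop-word removal, bag of words) can be sketched in a few lines. The following is a minimal, library-free Python illustration; the regular expression and the tiny stop-word list are assumptions for the example, not what any particular toolkit actually uses.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list; real pipelines
# use much larger lists (e.g. the ones shipped with NLTK).
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def preprocess(text):
    """Tokenize, then drop stop words."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]

def bag_of_words(text):
    """Count the occurrences of each remaining token."""
    return Counter(preprocess(text))

bow = bag_of_words("The cat sat on the mat, and the cat slept.")
print(bow)  # Counter({'cat': 2, 'sat': 1, 'on': 1, 'mat': 1, 'slept': 1})
```

A real pipeline would add stemming or lemmatization between tokenization and counting, as the overview describes.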
1. Agenda
Introduction of Natural Language Processing in R Programming Using Text Mining
By:
Name: Zeeshan Rafi, Student No: 10512770
Name: Sushanti Acharya, Student No: 10514613
Name: Debi Das, Student No: 10515388
Name: Amit Sharma, Student No: 10510235
Agenda:
- Need of Text Mining
- What is Text Mining?
- Terminologies in NLP
- Hands-on Experience with R
- NLP and its Applications
5. Need of Text Mining
In the past few years, an unprecedented amount of information has been created. According to IDC (International Data Corporation), the digital universe will reach over 40 ZB (1 ZB = 1,000⁷ bytes) by 2020.
Many organizations are managing massive amounts of information in their big data systems, but handling that data and making sense of it is a massive challenge.
6. Need of Text Mining
Approximately 90% of the world's data is held in unstructured format. Information-intensive business processes demand that we move beyond simple document retrieval to "knowledge" discovery.
9. Relation of Text Mining and NLP
Text mining is the process of deriving high-quality information from text. The overall goal is to turn text into data for analytics via the application of natural language processing.
The goal of text mining is to discover relevant information in text by transforming the text into data that can be used for further analysis. Text mining accomplishes this using a variety of analysis methodologies; natural language processing (NLP) is one of them.
The role of NLP in text mining is to supply the information extraction phase with its input.
11. Terminologies in NLP
1. Tokenization
  1. Break complex sentences into words.
  2. Understand the importance of each word with respect to the sentence.
  3. Produce a structural description of an input sentence.
Scoring Words:
- Counts: count the number of times each word appears in a document.
- Frequencies: calculate the frequency with which each word appears in a document out of all the words in the document.
A problem with scoring by word frequency is that highly frequent words may not contain much "informational content". One approach is to rescale the frequency of words by how often they appear in all documents.
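The rescaling just described, raw term frequency down-weighted by how often a term appears across all documents, is the intuition behind TF-IDF. Below is a minimal Python sketch under that reading; the function name and the toy documents are illustrative assumptions, and the slides themselves would do this in R.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores for a list of tokenized documents.

    TF is a term's count in a document divided by the document length;
    IDF down-weights terms that appear in many documents.
    """
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "mat"]]
scores = tf_idf(docs)
# "the" appears in every document, so its IDF (and TF-IDF) is 0:
print(scores[0]["the"])                     # 0.0
# "cat" appears in only one document, so it outscores "sat":
print(scores[0]["cat"] > scores[0]["sat"])  # True
```

This is exactly the fix for the frequency problem above: ubiquitous words like "the" are scored down to zero, while distinctive words rise.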
16. Terminologies in NLP
5. Word Cloud
A word cloud is a visual representation of text data, typically used to visualize free-form text. Tags are usually single words, and the importance of each tag is shown with font size or color. This format is useful for quickly perceiving the most prominent terms: the larger a word appears in the visual, the more common it was in the document(s).
This type of visualization can assist evaluators with exploratory textual analysis by identifying words that frequently appear in a set of documents or other text. It can also be used to communicate the most salient points or themes in the reporting stage.
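The rule that larger words are more common words requires a mapping from counts to font sizes. A minimal linear-scaling sketch in Python follows; the point-size range and the function name are assumptions chosen for illustration, not how any particular word-cloud library works.

```python
from collections import Counter

def word_cloud_sizes(tokens, min_pt=10, max_pt=48):
    """Map each word's frequency to a font size, scaled linearly
    between min_pt and max_pt (a common word-cloud heuristic)."""
    counts = Counter(tokens)
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_pt + (count - lo) * (max_pt - min_pt) / span
        for word, count in counts.items()
    }

tokens = ["nlp", "nlp", "nlp", "text", "text", "mining"]
print(word_cloud_sizes(tokens))
# The most frequent word ("nlp") gets max_pt; the least ("mining") gets min_pt.
```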
17. Hands-on Experience with R
Programming language used for NLP: R
Advantages of R:
1. R is open-source software, which means using it is completely free.
2. Its source code is open to public inspection, modification, and improvement.
3. R has built-in libraries for text mining.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team (of which Chambers is a member). R is named partly after the first names of the first two R authors and partly as a play on the name of S.
18. Hands-on Experience with R
1. Sentiment Analysis using Natural Language Processing for predicting the polarity of user reviews through the Random Forest algorithm.
2. Spam Filtering using NLP in R.
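The slides implement spam filtering in R; as a language-neutral illustration of how NLP-based spam filtering typically works, here is a minimal multinomial naive Bayes classifier with Laplace smoothing in Python. The class name and the toy training messages are hypothetical, and a real filter would train on thousands of labeled emails.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """A minimal multinomial naive Bayes spam filter with Laplace smoothing."""

    def fit(self, messages, labels):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(text.lower().split())
        # Vocabulary across both classes, used for smoothing.
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def predict(self, message):
        scores = {}
        total = sum(self.class_counts.values())
        for label in ("spam", "ham"):
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in message.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Hypothetical toy training data, purely for illustration.
msgs = ["win a free prize now", "free money win",
        "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
clf = NaiveBayesSpamFilter().fit(msgs, labels)
print(clf.predict("free prize"))    # spam
print(clf.predict("team meeting"))  # ham
```

The same bag-of-words scoring ideas from the Terminologies slides feed directly into this classifier: each message is reduced to word counts before being scored.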