Text Analytics using NLTK

By,
Vaishnavi A
III CSE B

What is Natural Language Processing?
NLP is a field of computer science and
artificial intelligence that deals with
human languages.

Think about how much text you see each day:
• Email
• SMS
• Web Pages
• Newspaper
• and so much more…
The list is endless.

What is Text Mining?
Text Mining / Text Analytics is the process of deriving
meaningful information from natural language text.

TOKENIZATION
Tokenization is the first step in NLP: it breaks text into
smaller units (tokens), such as words or sentences.
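
A minimal sketch of word-level tokenization with NLTK, assuming the nltk package is installed. The sample sentence is an assumption borrowed from the stopwords example in this deck. wordpunct_tokenize is used here because it is a plain regular-expression tokenizer that needs no extra data; nltk.word_tokenize is a common alternative, but it requires downloading the Punkt tokenizer models first.

```python
# Word-level tokenization with NLTK's regex-based tokenizer.
# wordpunct_tokenize splits text into runs of word characters and
# runs of punctuation, and needs no additional data download.
from nltk.tokenize import wordpunct_tokenize

text = "There is a book on the table."  # sample sentence (assumption)
tokens = wordpunct_tokenize(text)
print(tokens)
```

Each word and each punctuation mark becomes its own token, which is the form the later steps (stopword removal, stemming, lemmatization) operate on.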

Removal of Stopwords
•Stopwords are common words that add little to the meaning of
the statement
•Perform tokenization before removing stopwords.
E.g.: “There is a book on the table”
The words “is”, “a”, “on” and “the” → Stopwords
Words like “there”, “book” and “table” → Keywords
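
The example above can be sketched in a few lines of Python. The stopword set below is a tiny hand-picked list taken from this slide, not NLTK's own list, and str.split stands in for a real tokenizer; both are assumptions made to keep the sketch self-contained. In practice, NLTK ships a much larger English list via nltk.corpus.stopwords (it requires downloading the "stopwords" corpus), and that list also treats "there" as a stopword, so its result would differ slightly from the slide.

```python
# Hand-picked stopword set from the slide's example (an assumption;
# real projects usually rely on a much larger list).
STOPWORDS = {"is", "a", "on", "the"}

sentence = "There is a book on the table"
tokens = sentence.split()  # stand-in for a real tokenizer

# Keep only the tokens that are not stopwords (case-insensitive check).
keywords = [t for t in tokens if t.lower() not in STOPWORDS]
print(keywords)  # ['There', 'book', 'table']
```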

STEMMING
Normalizes words into their base or root form
Affects, Affections, Affected, Affection, Affecting → Affect
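
A short sketch of the "affect" example using NLTK's PorterStemmer, assuming the nltk package is installed. The Porter algorithm is one of several stemmers NLTK provides; the slide does not say which one it has in mind, so this choice is an assumption. The stemmer is rule-based and needs no data download.

```python
# Stem the slide's example words with the Porter algorithm.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["affects", "affections", "affected", "affection", "affecting"]

# Map each word to its stem.
stems = {word: stemmer.stem(word) for word in words}
print(stems)
```

All five words should reduce to the same stem, "affect". Note that a stemmer only strips suffixes by rule, so for other inputs the result is not always a dictionary word.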

LEMMATIZATION
•Groups together the different inflected forms of a word
under a common base form, called the lemma
•Somewhat similar to stemming, as it maps several words
to one common root
•The output of lemmatization is a proper word
•For example, a lemmatizer should map gone, going and
went to go
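
To make the gone / going / went example concrete, here is a toy lookup-table lemmatizer. It is purely illustrative: the LEMMAS table and the lemmatize helper are hypothetical names, not part of NLTK. For real use, NLTK offers WordNetLemmatizer in nltk.stem, which relies on the WordNet corpus (a separate download) and accepts a part-of-speech hint, for example pos="v" for verbs.

```python
# Toy dictionary-based lemmatizer (hypothetical, for illustration only).
# A real lemmatizer consults a full vocabulary plus morphological rules.
LEMMAS = {"gone": "go", "going": "go", "went": "go"}


def lemmatize(word: str) -> str:
    """Return the lemma for a known word, else the lowercased word."""
    key = word.lower()
    return LEMMAS.get(key, key)


for w in ["gone", "going", "went"]:
    print(w, "->", lemmatize(w))
```

The irregular form "went" shows why a lookup is needed: no suffix-stripping rule can turn it into "go", which is the main practical difference from stemming.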
