Text Mining - Text data Visualization
1. #AI2018
Manjusha Joshi
Sr. Data Scientist
Maxima IT Consulting
June 23, 2018
manjusha.joshi@gmail.com
2. #AI2018 Overview
Data
Sources of text data
Information Retrieval
Text data: Techniques
Text Data Mining
Challenges in Text Data Mining
Text Data: AI
Text data visualization
29. #AI2018 Text data: Techniques
NLP
Tokenization, n-gram tokenization: given a character
sequence and a defined document unit, tokenization is the
task of chopping it up into pieces, called tokens.
Bag of words
word2vec
Named Entity Extraction: classify named entities in text into
pre-defined categories such as the names of persons,
organizations, locations, etc.
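The first three techniques on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not a production tokenizer: the function names (`tokenize`, `ngrams`, `bag_of_words`), the regular expression, and the sample sentence are all assumptions made for the example. Real projects usually rely on an NLP library for tokenization.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and chop it into word tokens (assumed rule: runs of letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)

# Hypothetical sample document
doc = "Text mining turns raw text into tokens."
tokens = tokenize(doc)
bigrams = ngrams(tokens, 2)
bow = bag_of_words(tokens)

print(tokens)
print(bigrams[:2])
print(bow.most_common(1))
```

The bag of words discards order entirely, which is why n-grams are often added alongside it: a bigram such as ("raw", "text") keeps a little local context that single-token counts lose.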
48. #AI2018 Challenges in Text Data Mining
Missing data: "s me" -> "some" / "same"?
Data with garbage: scanned data with non-English
characters, @, etc.
Incorrect data: typos
Words with the same meaning (synonyms). Ex. happy, pleased,
delighted, glad
Words with multiple meanings (homonyms). Ex. right: "You were
right." "Make a right turn at the tree."
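Two of these challenges, missing characters and typos, can be illustrated with the Python standard library. This is a rough sketch under stated assumptions: the vocabulary `VOCAB` is a made-up toy list, the underscore is an assumed marker for a missing character, and the helper names `fill_missing` and `correct_typo` are hypothetical. It only proposes candidates; choosing between "some" and "same" still needs context.

```python
import re
from difflib import get_close_matches

# Toy vocabulary, assumed for illustration only
VOCAB = ["some", "same", "sale", "happy", "right", "tree", "turn"]

def fill_missing(pattern, vocab, gap="_"):
    """Return vocabulary words that fit a pattern where `gap` marks one missing character."""
    regex = "".join("." if ch == gap else re.escape(ch) for ch in pattern)
    return [word for word in vocab if re.fullmatch(regex, word)]

def correct_typo(word, vocab, cutoff=0.8):
    """Return the closest vocabulary word by similarity ratio, or None if nothing is close enough."""
    matches = get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fill_missing("s_me", VOCAB))   # candidates for the slide's "s me" example
print(correct_typo("hapy", VOCAB))   # a one-letter typo
print(correct_typo("xyzzy", VOCAB))  # nothing similar in the vocabulary
```

Neither helper addresses synonyms or homonyms. Those depend on meaning rather than spelling, so they need a thesaurus or context-aware model rather than string matching.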
58. #AI2018 Text Data: AI
Predict next word/phrase
Bot (robot): an Internet bot, also known as a web robot,
WWW robot or simply bot, is a software application that
runs automated tasks (scripts) over the Internet.
Topic modelling
Product recommendation
Smart document summarization
Smart categorisation of documents. Ex. Gmail spam mail
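The first item, predicting the next word, can be sketched with simple bigram counts. This is a deliberately small illustration and not how modern predictive keyboards or language models work: the function names `train_bigram_model` and `predict_next` and the three-sentence corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """For each word, count which words follow it in the training sentences."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen with a follower."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training corpus
corpus = [
    "make a right turn",
    "make a right turn at the tree",
    "you were right",
]
model = train_bigram_model(corpus)

print(predict_next(model, "right"))
print(predict_next(model, "a"))
print(predict_next(model, "banana"))
```

A model like this only looks one word back, so it cannot tell the two senses of "right" apart. Longer n-grams or learned embeddings are the usual next step.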