Analysing Demonetisation through Text Mining using Live Twitter Data!
Introduction to Text Mining and Analytics
Copyright © Ivy Professional School - 2009-10 (All Rights Reserved)
Wikipedia Definition
Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text.
Source of Text Data
Organizations today encounter textual data while running their day-to-day business. Sources include electronic text, call-center logs, social media, corporate documents, research papers, application forms, service notes, emails, etc.
Unstructured Data
• “80% of business-relevant information originates in unstructured form, primarily text.” (quoted in 2008)
• “Based on the industry’s current estimations, unstructured data will occupy 90% of the data by volume in the entire digital space over the next decade.” (quoted in 2010)
Text Mining and Analytics
• Text analytics uses algorithms to turn free-form text (unstructured data) into data that can be analyzed (structured data) by applying statistical and machine-learning methods, as well as Natural Language Processing (NLP) techniques.
• Once structured data has been obtained, the usual data-mining and analytic techniques can be applied.
• The most significant part of text mining/analytics is therefore converting text into structured data.
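As a minimal sketch of this conversion (plain Python, no NLP library assumed), each document can be turned into a row of term counts, giving a document-term matrix to which ordinary tabular analysis applies. The helper names `tokenize` and `document_term_matrix` are hypothetical, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, keep runs of letters/digits, drop punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


docs = ["The lead paint is unsafe.", "Unsafe paint, unsafe homes!"]
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)  # ['homes', 'is', 'lead', 'paint', 'the', 'unsafe']
print(matrix)      # [[0, 1, 1, 1, 1, 1], [1, 0, 0, 1, 0, 2]]
```

Each column is now a numeric feature, so clustering or classification code written for structured data can consume `matrix` directly.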
Text Mining Paradigm
Text Mining Process Pipeline
• The process is essentially a linear pipeline.
• Feedback from the results of text mining may, however, prompt changes to earlier stages (parsing, or even data collection).
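The linear-pipeline-with-feedback idea can be sketched as a list of stage functions applied in order; a feedback step is then simply revising that list (here, adding an earlier cleaning stage) and rerunning. The stage functions are hypothetical stand-ins for real preprocessing steps.

```python
import re
from typing import Callable

Stage = Callable[[list[str]], list[str]]


def run_pipeline(docs: list[str], stages: list[Stage]) -> list[str]:
    # Strictly linear: each stage consumes the previous stage's output.
    for stage in stages:
        docs = stage(docs)
    return docs


def remove_tags(docs: list[str]) -> list[str]:
    return [re.sub(r"<[^>]+>", "", d) for d in docs]


def strip_whitespace(docs: list[str]) -> list[str]:
    return [d.strip() for d in docs]


def lowercase(docs: list[str]) -> list[str]:
    return [d.lower() for d in docs]


def drop_empty(docs: list[str]) -> list[str]:
    return [d for d in docs if d]


raw = ["  <b>Text Mining</b> ", "", "ANALYTICS"]

# First pass: the output still contains mark-up.
first = run_pipeline(raw, [strip_whitespace, lowercase, drop_empty])
print(first)   # ['<b>text mining</b>', 'analytics']

# Feedback: add a cleaning stage near the front and rerun the whole pipeline.
second = run_pipeline(raw, [remove_tags, strip_whitespace, lowercase, drop_empty])
print(second)  # ['text mining', 'analytics']
```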
Converting Text into Structured Data
• Converting text requires substantial preprocessing:
– Cleaning up ‘dirty’ text
• Remove mark-up tags from web documents, encoded symbols such as emoticons/emoji, and extraneous strings such as “AHHHHHHHHHHHHHHHHHHHHH”
• Correct misspelled words
– Tokenization
• Split text into tokens; remove punctuation, normalize upper/lower case, etc.
– Sentence splitting
– Identifying multi-word expressions (e.g. “as well as”, “radio wave”) and Named Entities
(e.g. “Allied Waste”, “Super Mario Bros.”)
• Adding other linguistic information
– Parts-of-speech (e.g. noun, verb, adjective, adverb, preposition)
• Filtering non-significant/irrelevant words – to reduce dimensionality
– Filtering non-content words using a stop-list (e.g. “the”, “a”, “an”, “and”)
– Combining tokens by stemming/lemmatizing or using synonyms
• Other NLP features/techniques, e.g. n-grams, syntax trees
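Several of these steps can be sketched with the standard library alone. This is an illustration under simplifying assumptions: the stop-list is a tiny hypothetical one, the sentence splitter is a naive punctuation rule (it would wrongly split after abbreviations such as “U.N.”), and `crude_stem` is a toy suffix stripper, not the Porter stemmer or a lemmatizer.

```python
import re

# Tiny hypothetical stop-list; real lists contain hundreds of words.
STOP_WORDS = {"the", "a", "an", "and", "are", "at", "is", "of", "to"}


def clean(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)               # drop mark-up tags
    text = re.sub(r"\b\w*(\w)\1{3,}\w*\b", " ", text)   # drop words like "AHHHHH"
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace


def split_sentences(text: str) -> list[str]:
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence: str) -> list[str]:
    # Lowercase and keep letters/digits, which also removes punctuation.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def crude_stem(token: str) -> str:
    # Toy suffix stripping (NOT the Porter algorithm).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[list[str]]:
    # clean -> split sentences -> tokenize -> drop stop words -> stem
    return [
        [crude_stem(tok) for tok in tokenize(sentence) if tok not in STOP_WORDS]
        for sentence in split_sentences(clean(text))
    ]


raw = "<p>The banks are opening late AHHHHHHHH!</p> Queues formed at the ATMs."
print(preprocess(raw))  # [['bank', 'open', 'late'], ['queue', 'form', 'atm']]
```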
Text Mining Applications
• Text clustering
• Trend analysis
[Figure: Trend for the term “text mining” from Google Trends]
• Spam filtering
Text Mining – Sentiment Analysis
• Sentiment Analysis
The field of sentiment analysis deals with the categorization (or classification) of opinions expressed in textual documents.
Sample tweets:
• “14 days after #DeMonetisation, PM seeks opinion instead of addressing the pain & anguish. This is called-Arrogate, subjugate & dictate!”
• “Two months after RBI Governor changes, #DeMonetisation happens. Can you imagine what will happen after CJI Thakur retires on 4 January 2017?”
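To make the classification idea concrete, here is a minimal lexicon-based scorer applied to the two sample tweets. The word list and its polarities are hypothetical and chosen only for illustration; practical systems use curated sentiment lexicons or trained classifiers. The second tweet scores as neutral even though its tone may read as critical to a human, a known weakness of simple word-counting approaches.

```python
import re

# Hypothetical polarity lexicon: +1 positive, -1 negative.
LEXICON = {
    "pain": -1, "anguish": -1, "arrogate": -1, "subjugate": -1, "dictate": -1,
    "relief": 1, "support": 1, "good": 1, "great": 1,
}


def sentiment_score(text: str) -> int:
    # Sum the polarity of every lexicon word found in the text.
    return sum(LEXICON.get(tok, 0) for tok in re.findall(r"[a-z]+", text.lower()))


def classify(text: str) -> str:
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "14 days after #DeMonetisation, PM seeks opinion instead of addressing "
    "the pain & anguish. This is called-Arrogate, subjugate & dictate!",
    "Two months after RBI Governor changes, #DeMonetisation happens. Can you "
    "imagine what will happen after CJI Thakur retires on 4 January 2017?",
]
for tweet in tweets:
    print(classify(tweet), sentiment_score(tweet))
# negative -5
# neutral 0
```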
Typical Text Pre-processing Methods
• Given a raw text (in a corpus), we typically pre-process it by applying one or more of the following methods:
1. Part-Of-Speech (POS) tagging – assign a POS to every word in each sentence of the text
2. Named Entity Recognition (NER) – identify named entities (proper nouns and some common nouns that are relevant in the domain of the text)
3. Information Extraction (IE) – identify relations between phrases, and extract the relevant/significant “information” described in the text
1. Part-Of-Speech (POS) Tagging
• POS tagging is the process of assigning a POS or lexical class marker to each word in a sentence (and to all sentences in a corpus).
Input: the lead paint is unsafe
Output: the/Det lead/N paint/N is/V unsafe/Adj
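The input/output pair above can be reproduced by the simplest possible tagger: a dictionary lookup with a default tag for unknown words. The lexicon is hypothetical and uses the slide's tag names. A lookup tagger cannot resolve ambiguity (“lead” can also be a verb), which is why practical taggers use context, e.g. HMM or neural sequence models.

```python
# Hypothetical word -> tag lexicon, using the tag names from the example above.
POS_LEXICON = {"the": "Det", "lead": "N", "paint": "N", "is": "V", "unsafe": "Adj"}


def lookup_tag(sentence: str, default: str = "N") -> str:
    # Unknown words fall back to `default`; nouns are a common baseline guess.
    return " ".join(
        f"{word}/{POS_LEXICON.get(word.lower(), default)}" for word in sentence.split()
    )


print(lookup_tag("the lead paint is unsafe"))  # the/Det lead/N paint/N is/V unsafe/Adj
print(lookup_tag("the paint is toxic"))        # the/Det paint/N is/V toxic/N
```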
2. Named Entity Recognition (NER)
• NER processes a text and identifies the named entities in each sentence, e.g. in “U.N. official Ekeus heads for Baghdad.” the entities are “U.N.” (an organization), “Ekeus” (a person) and “Baghdad” (a location).
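One simple (and limited) way to find the entities in this example is gazetteer lookup: matching the text against a list of known names. The gazetteer and its ORG/PER/LOC labels are hypothetical; statistical or neural NER models are needed to recognize names that are not on the list.

```python
import re

# Hypothetical gazetteer: known name -> entity type.
GAZETTEER = {"U.N.": "ORG", "Ekeus": "PER", "Baghdad": "LOC"}


def gazetteer_ner(text: str) -> list[tuple[str, str]]:
    hits = []
    for name, label in GAZETTEER.items():
        # Match the exact name, not glued to other letters or digits.
        for m in re.finditer(rf"(?<!\w){re.escape(name)}(?!\w)", text):
            hits.append((m.start(), name, label))
    # Report entities in the order they appear in the text.
    return [(name, label) for _, name, label in sorted(hits)]


print(gazetteer_ner("U.N. official Ekeus heads for Baghdad."))
# [('U.N.', 'ORG'), ('Ekeus', 'PER'), ('Baghdad', 'LOC')]
```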
3. Information Extraction (IE)
• Identify specific pieces of information (data) in an
unstructured or semi-structured text
• Transform unstructured information in a corpus of texts or
web pages into a structured database (or templates)
• Applied to various types of text, e.g.
– Newspaper articles
– Scientific articles
– Web pages
– etc.
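A minimal sketch of template filling, assuming hand-written regular-expression patterns: each match fills the named slots of a record, turning a sentence into a row of structured data. Both templates (`travel`, `retirement`) and their slot names are hypothetical and tailored to example sentences from this deck; real IE systems usually combine NER output with learned extractors or much richer pattern sets.

```python
import re

# Hypothetical templates: (event type, pattern with named slots).
TEMPLATES = [
    ("travel", re.compile(
        r"(?P<org>[A-Z][\w.]*) official (?P<person>[A-Z]\w+) heads for (?P<destination>[A-Z]\w+)")),
    ("retirement", re.compile(
        r"(?P<person>[A-Z]\w*(?: [A-Z]\w*)*) retires on (?P<date>\d{1,2} [A-Z][a-z]+ \d{4})")),
]


def extract(text: str) -> list[dict[str, str]]:
    # Every template match becomes one structured record.
    records = []
    for event, pattern in TEMPLATES:
        for match in pattern.finditer(text):
            records.append({"event": event, **match.groupdict()})
    return records


print(extract("U.N. official Ekeus heads for Baghdad."))
# [{'event': 'travel', 'org': 'U.N.', 'person': 'Ekeus', 'destination': 'Baghdad'}]
print(extract("Can you imagine what will happen after CJI Thakur retires on 4 January 2017?"))
# [{'event': 'retirement', 'person': 'CJI Thakur', 'date': '4 January 2017'}]
```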
Overview
• Tokenization
• Bag of words
• N-Grams
• TF*IDF
• Topic modeling with LDA (Latent Dirichlet Allocation)
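Two of the listed techniques, n-grams and TF*IDF, can be sketched briefly in plain Python. N-grams are runs of n consecutive tokens. The TF*IDF variant shown (term frequency normalized by document length, times an unsmoothed idf(t) = log(N / df(t))) is one common textbook form, assumed here for illustration; libraries often use smoothed variants. LDA topic modeling is beyond a short sketch and is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[str]:
    # All runs of n consecutive tokens, joined with spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """tf(t, d) = count(t, d) / len(d); idf(t) = log(N / df(t))."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the lead paint is unsafe", "the paint is fresh", "the lead story"]
print(ngrams(tokenize(docs[0]), 2))  # ['the lead', 'lead paint', 'paint is', 'is unsafe']
weights = tf_idf(docs)
print(round(weights[0]["the"], 3), round(weights[0]["lead"], 3), round(weights[0]["unsafe"], 3))
# 0.0 0.081 0.22
```

“the” occurs in every document, so its weight is zero, while the rarer “unsafe” outweighs “lead”, which appears in two of the three documents.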
