SlideShare a Scribd company logo
1 of 13
Text Mining
Independent Course Study
Under the guidance of
Prof. Ranganathan Chandraskaren
1Aadish Chopra
Text Mining
Problem Statement :Classify tweets into categories such as Health ,
General Information
Data Set: Collected through web-scraping
• Twitter API is used widely but limitation of Twitter API is that you can
only collect upto a few thousand tweets.
• Another limitation of twitter API is that you can’t extract all the
information that is on the website
Tweets were only collected for Health institutions. There are 790 health
institutions or handles.
2Aadish Chopra
How the twitter website looks like ?
3Aadish Chopra
Twitter
Name
Twitter
Handle
Tweet
Web scraper
Web scraper was built using R. Package used is Rselenium*
https://drive.google.com/drive/folders/0Bzq4aAyBiuFQdHd0Y2JobmhjY
1k
Rselenium is based on Selenium server and automates the web browser
and takes control over it .You can then specify how you want to interact
with the web browser by specifying the action e.g. scrolling, pressing
the end key or pressing some specific button on the website
4Aadish Chopra
*I did web scraping using R. Earlier teams did it using Java and Python
How the collected data looks like ?
5Aadish Chopra
https://drive.google.com/drive/folders/0B-x4pbFFhIICWmdvWnBwX1VwZzA
Text Cleaning
Example Tweet :
Don't forget to register for the American Heart Association’s Heartsaver® First Aid
Training offered by the...http://fb.me/6PjnRWaMAÂ
Steps taken to clean tweet
• Tweets can be cleaned using the “tm” package in R . The functions are as follows
1. removeNumbers
2. removePunctuation
3. Stopwords
4. removeWords
5. stripWhitespace
6. stemDocument
6Aadish Chopra
Text Cleaning
However if one wants more control over the cleaning of the data. For this purpose regex
expressions can be used
1. Punctuation replacement: LaSalletwe<-gsub(pattern = "W",replacement = "
",LaSalletwe)
2. Digits replacement: LaSalletwe<-gsub(pattern="d",replacement = "",LaSalletwe)
3. To remove a single letter: LaSalletwe<-gsub(pattern="b[A-
z]b{1}",replace="",LaSalletwe)
4. The list of stopwords* can be modified in the English.dat file which can be opened. The
stopwords can be created for any language in R. It can be helpful for instance in cities
like Chicago where people speak languages like Spanish
*Changed the English.dat file in retrospective manner.
7Aadish Chopra
Text Cleaning
8Aadish Chopra
English.dat file can
be changed for
stopwords
Topic Modelling
Topic Modelling:
• Classify documents using topic models.
• Get themes from documents
• Get topic distributions from corpus
• Get word distributions within topics
A technique which is popular is known as Latent Dirichlet allocation(LDA) was used to find
the topic distributions within documents.
Various versions of LDA like sLDA(supervised), Gibbs sampling are also present and may be
used interchangeably depending on the goals.
9Aadish Chopra
Topic Modelling
10Aadish Chopra
α is the parameter of the Dirichlet prior on the per-document topic distributions,
β is the parameter of the Dirichlet prior on the per-topic word distribution,
Ɵ is the topic distribution for document m,
ψ is the word distribution for topic k,
z is the topic for the n-th word in document m, and
w is the specific word.
N is the number of
words in a
document
M is the number of
documents
Topic Modelling
LDA gives the themes of the document or the corpus. LDA can be used
in various ways.
1. On a single tweet
2. On a single handle
3. On the entire corpus
11Aadish Chopra
Word Frequency -On a single handle
We are illustrating by taking example of LaSalleCoHealth
Total no of Observations: 691
Total no of words after data cleaning: 1126 observations
Top 16 words which occurs most in their tweet are
12Aadish Chopra
Word Frequency-On the corpus
Total no of Observations: 16449
Total no of words after data cleaning: 13665
observations
13Aadish Chopra

More Related Content

What's hot

Semantic Web Austin Yahoo
Semantic Web Austin YahooSemantic Web Austin Yahoo
Semantic Web Austin YahooPeter Mika
 
Otsuka Talk in Dec 2017
Otsuka Talk in Dec 2017Otsuka Talk in Dec 2017
Otsuka Talk in Dec 2017Fariz Darari
 
Knowledge Technologies: Opportunities and Challenges
Knowledge Technologies: Opportunities and ChallengesKnowledge Technologies: Opportunities and Challenges
Knowledge Technologies: Opportunities and ChallengesFariz Darari
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataAn introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataFabien Gandon
 
2 Hka Researching
2 Hka Researching2 Hka Researching
2 Hka Researchingaptwano
 
Terrorism 15 May 2009
Terrorism 15 May 2009Terrorism 15 May 2009
Terrorism 15 May 2009karsalan
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
2014 CrossRef Workshops: Support Update and Multiple Resolution Overview
2014 CrossRef Workshops: Support Update and Multiple Resolution Overview2014 CrossRef Workshops: Support Update and Multiple Resolution Overview
2014 CrossRef Workshops: Support Update and Multiple Resolution OverviewCrossref
 
Search strategies – subject searching
Search strategies – subject searchingSearch strategies – subject searching
Search strategies – subject searchingdoverlibrary
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic  Web and Linked DataAn introduction to Semantic  Web and Linked Data
An introduction to Semantic Web and Linked DataGabriela Agustini
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic WebPeter Mika
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyPeter Mika
 
Productive Searching
Productive SearchingProductive Searching
Productive Searchingjskotnicki
 
What google scholar can do for you
What google scholar can do for youWhat google scholar can do for you
What google scholar can do for youpvhead123
 

What's hot (18)

Semantic Web Austin Yahoo
Semantic Web Austin YahooSemantic Web Austin Yahoo
Semantic Web Austin Yahoo
 
Otsuka Talk in Dec 2017
Otsuka Talk in Dec 2017Otsuka Talk in Dec 2017
Otsuka Talk in Dec 2017
 
Knowledge Technologies: Opportunities and Challenges
Knowledge Technologies: Opportunities and ChallengesKnowledge Technologies: Opportunities and Challenges
Knowledge Technologies: Opportunities and Challenges
 
PIR advanced information skills 2018
PIR advanced information skills 2018PIR advanced information skills 2018
PIR advanced information skills 2018
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataAn introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked Data
 
Hello data
Hello dataHello data
Hello data
 
2 Hka Researching
2 Hka Researching2 Hka Researching
2 Hka Researching
 
Terrorism 15 May 2009
Terrorism 15 May 2009Terrorism 15 May 2009
Terrorism 15 May 2009
 
Search strategy
Search strategySearch strategy
Search strategy
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
2014 CrossRef Workshops: Support Update and Multiple Resolution Overview
2014 CrossRef Workshops: Support Update and Multiple Resolution Overview2014 CrossRef Workshops: Support Update and Multiple Resolution Overview
2014 CrossRef Workshops: Support Update and Multiple Resolution Overview
 
Search strategies – subject searching
Search strategies – subject searchingSearch strategies – subject searching
Search strategies – subject searching
 
E-LEARN: Brainstorming
E-LEARN: BrainstormingE-LEARN: Brainstorming
E-LEARN: Brainstorming
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic  Web and Linked DataAn introduction to Semantic  Web and Linked Data
An introduction to Semantic Web and Linked Data
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic Web
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkey
 
Productive Searching
Productive SearchingProductive Searching
Productive Searching
 
What google scholar can do for you
What google scholar can do for youWhat google scholar can do for you
What google scholar can do for you
 

Similar to Text minings

Utilizing the natural langauage toolkit for keyword research
Utilizing the natural langauage toolkit for keyword researchUtilizing the natural langauage toolkit for keyword research
Utilizing the natural langauage toolkit for keyword researchErudite
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Groupbotsplash.com
 
How to get_community_support
How to get_community_supportHow to get_community_support
How to get_community_supportipsitamishra
 
Microformats I: What & Why
Microformats I: What & WhyMicroformats I: What & Why
Microformats I: What & WhyRachael L Moore
 
TLA Webinar: Introduction to Drupal -- part 3 of 3
TLA Webinar: Introduction to Drupal -- part 3 of 3TLA Webinar: Introduction to Drupal -- part 3 of 3
TLA Webinar: Introduction to Drupal -- part 3 of 3cherryhillco
 
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...Ted Drake
 
Copyright © 2017, 2018 Sinclair Community College. All Right.docx
Copyright © 2017, 2018 Sinclair Community College. All Right.docxCopyright © 2017, 2018 Sinclair Community College. All Right.docx
Copyright © 2017, 2018 Sinclair Community College. All Right.docxbobbywlane695641
 
Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPChristian Morbidoni
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentationPriyaj Kumar
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Harnessing Web Page Directories for Large-Scale Classification of Tweets
Harnessing Web Page Directories for Large-Scale Classification of TweetsHarnessing Web Page Directories for Large-Scale Classification of Tweets
Harnessing Web Page Directories for Large-Scale Classification of TweetsGabriela Agustini
 
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from TextUse Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from TextAmazon Web Services
 
Resource Mining for Effective Research
Resource Mining for Effective ResearchResource Mining for Effective Research
Resource Mining for Effective ResearchAndrew Sorensen
 
8 programming concepts_you_should_know_in_2017
8 programming concepts_you_should_know_in_20178 programming concepts_you_should_know_in_2017
8 programming concepts_you_should_know_in_2017Rajesh Shirsagar
 
ppt 2.pptxandxikcicncmk0wufjepfc09eufcdc
ppt 2.pptxandxikcicncmk0wufjepfc09eufcdcppt 2.pptxandxikcicncmk0wufjepfc09eufcdc
ppt 2.pptxandxikcicncmk0wufjepfc09eufcdcashimavashisth001
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify RaisAjay Ohri
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisCrowdFlower
 

Similar to Text minings (20)

Utilizing the natural langauage toolkit for keyword research
Utilizing the natural langauage toolkit for keyword researchUtilizing the natural langauage toolkit for keyword research
Utilizing the natural langauage toolkit for keyword research
 
Aws r
Aws rAws r
Aws r
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
 
How to get_community_support
How to get_community_supportHow to get_community_support
How to get_community_support
 
Microformats I: What & Why
Microformats I: What & WhyMicroformats I: What & Why
Microformats I: What & Why
 
TLA Webinar: Introduction to Drupal -- part 3 of 3
TLA Webinar: Introduction to Drupal -- part 3 of 3TLA Webinar: Introduction to Drupal -- part 3 of 3
TLA Webinar: Introduction to Drupal -- part 3 of 3
 
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
 
Copyright © 2017, 2018 Sinclair Community College. All Right.docx
Copyright © 2017, 2018 Sinclair Community College. All Right.docxCopyright © 2017, 2018 Sinclair Community College. All Right.docx
Copyright © 2017, 2018 Sinclair Community College. All Right.docx
 
Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLP
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Harnessing Web Page Directories for Large-Scale Classification of Tweets
Harnessing Web Page Directories for Large-Scale Classification of TweetsHarnessing Web Page Directories for Large-Scale Classification of Tweets
Harnessing Web Page Directories for Large-Scale Classification of Tweets
 
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from TextUse Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
 
Resource Mining for Effective Research
Resource Mining for Effective ResearchResource Mining for Effective Research
Resource Mining for Effective Research
 
8 programming concepts_you_should_know_in_2017
8 programming concepts_you_should_know_in_20178 programming concepts_you_should_know_in_2017
8 programming concepts_you_should_know_in_2017
 
ppt 2.pptxandxikcicncmk0wufjepfc09eufcdc
ppt 2.pptxandxikcicncmk0wufjepfc09eufcdcppt 2.pptxandxikcicncmk0wufjepfc09eufcdc
ppt 2.pptxandxikcicncmk0wufjepfc09eufcdc
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify Rais
 
Twitter System Design
Twitter System DesignTwitter System Design
Twitter System Design
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Social Networks at Scale
Social Networks at ScaleSocial Networks at Scale
Social Networks at Scale
 

More from University of Illinois,Chicago (15)

Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
Focus vision
Focus visionFocus vision
Focus vision
 
Health informationexchangeacrossus healthinstitution (1)
Health informationexchangeacrossus healthinstitution (1)Health informationexchangeacrossus healthinstitution (1)
Health informationexchangeacrossus healthinstitution (1)
 
Bank of America
Bank of AmericaBank of America
Bank of America
 
Healthymagination
HealthymaginationHealthymagination
Healthymagination
 
Sephora
SephoraSephora
Sephora
 
Uber
UberUber
Uber
 
PBM 2
PBM 2PBM 2
PBM 2
 
IDS 570 project presentation
IDS 570 project presentationIDS 570 project presentation
IDS 570 project presentation
 
Final Presentation
Final PresentationFinal Presentation
Final Presentation
 
OLED
OLEDOLED
OLED
 
Microsoft power point Face recognition
Microsoft power point   Face recognitionMicrosoft power point   Face recognition
Microsoft power point Face recognition
 
(485226650) OLED 3
(485226650) OLED 3(485226650) OLED 3
(485226650) OLED 3
 
REIL Presentation
REIL PresentationREIL Presentation
REIL Presentation
 

Recently uploaded

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 

Text minings

  • 1. Text Mining Independent Course Study Under the guidance of Prof. Ranganathan Chandraskaren 1Aadish Chopra
  • 2. Text Mining Problem Statement :Classify tweets into categories such as Health , General Information Data Set: Collected through web-scraping • Twitter API is used widely but limitation of Twitter API is that you can only collect upto a few thousand tweets. • Another limitation of twitter API is that you can’t extract all the information that is on the website Tweets were only collected for Health institutions. There are 790 health institutions or handles. 2Aadish Chopra
  • 3. How the twitter website looks like ? 3Aadish Chopra Twitter Name Twitter Handle Tweet
  • 4. Web scraper Web scraper was built using R. Package used is Rselenium* https://drive.google.com/drive/folders/0Bzq4aAyBiuFQdHd0Y2JobmhjY 1k Rselenium is based on Selenium server and automates the web browser and takes control over it .You can then specify how you want to interact with the web browser by specifying the action e.g. scrolling, pressing the end key or pressing some specific button on the website 4Aadish Chopra *I did web scraping using R. Earlier teams did it using Java and Python
  • 5. How the collected data looks like ? 5Aadish Chopra https://drive.google.com/drive/folders/0B-x4pbFFhIICWmdvWnBwX1VwZzA
  • 6. Text Cleaning Example Tweet : Don't forget to register for the American Heart Association’s Heartsaver® First Aid Training offered by the...http://fb.me/6PjnRWaMA Steps taken to clean tweet • Tweets can be cleaned using the “tm” package in R . The functions are as follows 1. removeNumbers 2. removePunctuation 3. Stopwords 4. removeWords 5. stripWhitespace 6. stemDocument 6Aadish Chopra
  • 7. Text Cleaning However if one wants more control over the cleaning of the data. For this purpose regex expressions can be used 1. Punctuation replacement: LaSalletwe<-gsub(pattern = "W",replacement = " ",LaSalletwe) 2. Digits replacement: LaSalletwe<-gsub(pattern="d",replacement = "",LaSalletwe) 3. To remove a single letter: LaSalletwe<-gsub(pattern="b[A- z]b{1}",replace="",LaSalletwe) 4. The list of stopwords* can be modified in the English.dat file which can be opened. The stopwords can be created for any language in R. It can be helpful for instance in cities like Chicago where people speak languages like Spanish *Changed the English.dat file in retrospective manner. 7Aadish Chopra
  • 8. Text Cleaning 8Aadish Chopra English.dat file can be changed for stopwords
  • 9. Topic Modelling Topic Modelling: • Classify documents using topic models. • Get themes from documents • Get topic distributions from corpus • Get word distributions within topics A technique which is popular is known as Latent Dirichlet allocation(LDA) was used to find the topic distributions within documents. Various versions of LDA like sLDA(supervised), Gibbs sampling are also present and may be used interchangeably depending on the goals. 9Aadish Chopra
  • 10. Topic Modelling 10Aadish Chopra α is the parameter of the Dirichlet prior on the per-document topic distributions, β is the parameter of the Dirichlet prior on the per-topic word distribution, Ɵ is the topic distribution for document m, ψ is the word distribution for topic k, z is the topic for the n-th word in document m, and w is the specific word. N is the number of words in a document M is the number of documents
  • 11. Topic Modelling LDA gives the themes of the document or the corpus. LDA can be used in various ways. 1. On a single tweet 2. On a single handle 3. On the entire corpus 11Aadish Chopra
  • 12. Word Frequency -On a single handle We are illustrating by taking example of LaSalleCoHealth Total no of Observations: 691 Total no of words after data cleaning: 1126 observations Top 16 words which occurs most in their tweet are 12Aadish Chopra
  • 13. Word Frequency-On the corpus Total no of Observations: 16449 Total no of words after data cleaning: 13665 observations 13Aadish Chopra