SlideShare a Scribd company logo
1 of 29
Download to read offline
Tweet Mining: Is It Useful and
Should We Bother?
Nils C. Newman
Alan L. Porter & Jon Garner
Science and Social Media – The New Frontier
Background
Treat Twitter as a new data source for S&T analysis
• Think of Twitter in terms of any traditional data
source – Patents, Scientific Publications, etc..
• Use our standard analysis techniques
(VantagePoint) to look at search results on
Graphene and Nano Enhanced Drug Delivery
The only difference is…
• Every abstract is only 140 characters long
The Premise of our Pilot Project:
There is a bit more than 140 characters
of content to work with
But….
Anatomy of Tweet
Tweet Sender
Directed Tweet
Hashtags
Twitter
Shorthand
Links
Retweet data
And more!
Given the combinations of names, links, re-
tweet information, and other Twitter data, in
theory we could:
• Find key influence leaders
• Discover emerging terminology
• Track geographic spread
• Track time trends
• Etc…
Things we could do….
Now for the messy bits
However….
With the Twitter API you can search all of Twitter
but the API only provides access to the last 8
days of data.
If you want more data, you can
• Build your own twitter database going forward
• Purchase access to the Twitter “Firehose” to go
back in time
Twitter Data:
Now you see it, now you don’t
Who actually has access
to the Twitter Firehose?
• Yahoo, Google, MS
• In 2010, seven
companies where given
access to the Firehose
• In 2013, of those seven
companies, none are
still around
The Quest for the Twitter Firehose
After a bit of digging, we finally
found current firehose
providers who were still in
business
• One wouldn’t respond to
inquiries
• One has embedded it into
their own analysis products
• But finally, one did respond
Our Firehose Odyssey
Graphene Pilot
With Topsy, we were able to
• Get a key to access their Otter API
• Use their search interface to search for
“Graphene”
• Successfully download 34,586 Tweets with
coverage back to 2006
• Import the Tweets into VantagePoint for
analysis
Graphene: Progress with Topsy
We were happy!
Then we looked at the data and found
more messy bits…
The first issue we ran into was translating Topsy
Twitterese into something we could understand
• Twitter specific jargon
– Hashtag, directed tweet, RT, etc…
• Date codes
– In Unix Timestamp format
• Topsy specific jargon and vaguely defined indicators
– Hits
– Score
– Trackback totals
You call this documentation?
But eventually we sorted most of it out
And we were able to do actual analysis
And create analytical output
And produce reportable output
• The data have a lot of noise
– Order your Graphene t-shirt today!
– Maria Sharapova wins with Graphene Instinct racket
• There is a lot of gray data that are challenging to
interpret
– Graphene jobs available
• There is also a reasonable amount of interesting
stuff
– Research funding announcements
– Business information
– Technical content
But is it meaningful?
NEDD Pilot
NEDD presented more of a
issue.
• The Topsy search interface is
a bit limited
• Our Nano Enhanced Drug
Delivery search strategy
required complex Boolean,
wildcards and nesting
strategies
• Topsy only allows simple
boolean
NEDD: Progress with Topsy
Was more than messy…
Our attempt
The NEDD experiment produced a number of major
issues
• Search terms such as RNAi are words in that have
other meanings in other languages so you have to
control for language (which doesn’t always seem to
work)
• No wildcards or truncation - presented problems
• Limitations on the Boolean was an issue
NEDD “Results”
The NEDD search was basically unusable without
significant additional effort
The Result
The results of the pilot were a little more than mixed
• The Graphene pilot was a positive experience
• The NEDD pilot was pretty negative
• We can see the potential but it is going to take a bit
more work
• However, the difficulty in accessing the data, the
unknown cost, and the weakness in the search
interface are major issues
Conclusions
There is potential.
So, Is Twitter mining useful?
Not yet.
So, Is it worth it?
Thank you!
Questions?

More Related Content

What's hot

Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
enterprisesearchmeetup
 

What's hot (6)

Natural Language Processing with Graphs
Natural Language Processing with GraphsNatural Language Processing with Graphs
Natural Language Processing with Graphs
 
Natural Language Processing using Text Mining
Natural Language Processing using Text MiningNatural Language Processing using Text Mining
Natural Language Processing using Text Mining
 
Natural language search using Neo4j
Natural language search using Neo4jNatural language search using Neo4j
Natural language search using Neo4j
 
Natural Language Processing with Neo4j
Natural Language Processing with Neo4jNatural Language Processing with Neo4j
Natural Language Processing with Neo4j
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
 

Similar to II-SDV 2013 Tweet Mining: Is it Useful and Should we Bother?

In search for a good practice of finding information
In search for a good practice of finding informationIn search for a good practice of finding information
In search for a good practice of finding information
Kristian Norling
 

Similar to II-SDV 2013 Tweet Mining: Is it Useful and Should we Bother? (20)

How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Make Enterprise Search Less Broken
Make Enterprise Search Less BrokenMake Enterprise Search Less Broken
Make Enterprise Search Less Broken
 
Make Enterprise Search Less Broken
Make Enterprise Search Less BrokenMake Enterprise Search Less Broken
Make Enterprise Search Less Broken
 
Word Cloud Plus with Will and Ray Poynter
Word Cloud Plus with Will and Ray PoynterWord Cloud Plus with Will and Ray Poynter
Word Cloud Plus with Will and Ray Poynter
 
O'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustO'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data Exhaust
 
Make Enterprise Search Less Broken
Make Enterprise Search Less BrokenMake Enterprise Search Less Broken
Make Enterprise Search Less Broken
 
Data_Science_Generating_Value_From_Data_Course_Slides_red.pdf
Data_Science_Generating_Value_From_Data_Course_Slides_red.pdfData_Science_Generating_Value_From_Data_Course_Slides_red.pdf
Data_Science_Generating_Value_From_Data_Course_Slides_red.pdf
 
In search for a good practice of finding information
In search for a good practice of finding informationIn search for a good practice of finding information
In search for a good practice of finding information
 
Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...
Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...
Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...
 
[AIIM17] It’s Harvest Time in the Information Garden - Dan Antion
[AIIM17] It’s Harvest Time in the Information Garden - Dan Antion[AIIM17] It’s Harvest Time in the Information Garden - Dan Antion
[AIIM17] It’s Harvest Time in the Information Garden - Dan Antion
 
BrightonSEO 2019 - Mining the SERP for SEO, Content & Customer Insights
BrightonSEO 2019 - Mining the SERP for SEO, Content & Customer InsightsBrightonSEO 2019 - Mining the SERP for SEO, Content & Customer Insights
BrightonSEO 2019 - Mining the SERP for SEO, Content & Customer Insights
 
Data Mining & Engineering
Data Mining & EngineeringData Mining & Engineering
Data Mining & Engineering
 
Going To A Data Hack - Govhack 2015
Going To A Data Hack - Govhack 2015 Going To A Data Hack - Govhack 2015
Going To A Data Hack - Govhack 2015
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
Growing Your SharePoint Toddler to a Mature Solution
Growing Your SharePoint Toddler to a Mature SolutionGrowing Your SharePoint Toddler to a Mature Solution
Growing Your SharePoint Toddler to a Mature Solution
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media Analysis
 
Data Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoData Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah Guido
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 

More from Dr. Haxel Consult

AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
Dr. Haxel Consult
 

More from Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Recently uploaded

75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
Asmae Rabhi
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
ayvbos
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
ydyuyu
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Monica Sydney
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
ydyuyu
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Monica Sydney
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
JOHNBEBONYAP1
 

Recently uploaded (20)

2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 

II-SDV 2013 Tweet Mining: Is it Useful and Should we Bother?

  • 1. Tweet Mining: Is It Useful and Should We Bother? Nils C. Newman Alan L. Porter & Jon Garner
  • 2. Science and Social Media – The New Frontier Background
  • 3. Treat Twitter as a new data source for S&T analysis • Think of Twitter in terms of any traditional data source – Patents, Scientific Publications, etc.. • Use our standard analysis techniques (VantagePoint) to look at search results on Graphene and Nano Enhanced Drug Delivery The only difference is… • Every abstract is only 140 characters long The Premise of our Pilot Project:
  • 4. There is a bit more than 140 characters of content to work with But….
  • 5. Anatomy of Tweet Tweet Sender Directed Tweet Hashtags Twitter Shorthand Links Retweet data And more!
  • 6. Given the combinations of names, links, re- tweet information, and other Twitter data, in theory we could: • Find key influence leaders • Discover emerging terminology • Track geographic spread • Track time trends • Etc… Things we could do….
  • 7. Now for the messy bits However….
  • 8. With the Twitter API you can search all of Twitter but the API only provides access to the last 8 days of data. If you want more data, you can • Build your own twitter database going forward • Purchase access to the Twitter “Firehose” to go back in time Twitter Data: Now you see it, now you don’t
  • 9. Who actually has access to the Twitter Firehose? • Yahoo, Google, MS • In 2010, seven companies where given access to the Firehose • In 2013, of those seven companies, none are still around The Quest for the Twitter Firehose
  • 10. After a bit of digging, we finally found current firehose providers who were still in business • One wouldn’t respond to inquiries • One has embedded it into their own analysis products • But finally, one did respond Our Firehose Odyssey
  • 12. With Topsy, we were able to • Get a key to access their Otter API • Use their search interface to search for “Graphene” • Successfully download 34,586 Tweets with coverage back to 2006 • Import the Tweets into VantagePoint for analysis Graphene: Progress with Topsy
  • 14. Then we looked at the data and found more messy bits…
  • 15. The first issue we ran into was translating Topsy Twitterese into something we could understand • Twitter specific jargon – Hashtag, directed tweet, RT, etc… • Date codes – In Unix Timestamp format • Topsy specific jargon and vaguely defined indicators – Hits – Score – Trackback totals You call this documentation?
  • 16. But eventually we sorted most of it out
  • 17. And we were able to do actual analysis
  • 20. • The data have a lot of noise – Order your Graphene t-shirt today! – Maria Sharapova wins with Graphene Instinct racket • There is a lot of gray data that are challenging to interpret – Graphene jobs available • There is also a reasonable amount of interesting stuff – Research funding announcements – Business information – Technical content But is it meaningful?
  • 22. NEDD presented more of a issue. • The Topsy search interface is a bit limited • Our Nano Enhanced Drug Delivery search strategy required complex Boolean, wildcards and nesting strategies • Topsy only allows simple boolean NEDD: Progress with Topsy
  • 23. Was more than messy… Our attempt
  • 24. The NEDD experiment produced a number of major issues • Search terms such as RNAi are words in that have other meanings in other languages so you have to control for language (which doesn’t always seem to work) • No wildcards or truncation - presented problems • Limitations on the Boolean was an issue NEDD “Results”
  • 25. The NEDD search was basically unusable without significant additional effort The Result
  • 26. The results of the pilot were a little more than mixed • The Graphene pilot was a positive experience • The NEDD pilot was pretty negative • We can see the potential but it is going to take a bit more work • However, the difficulty in accessing the data, the unknown cost, and the weakness in the search interface are major issues Conclusions
  • 27. There is potential. So, Is Twitter mining useful?
  • 28. Not yet. So, Is it worth it?