We are losing our tweets!
An analysis, a prototype, lessons learned, and a proposed third-party solution to the problem
John O’Brien III
@jobrieniii
http://www.linkedin.com/in/jobrieniii
Twitter “Primer”
  • Social network / microblogging site
  • Send / read 140-character messages
  • You can follow anyone, and they can follow you
  • Sent messages are delivered to all your followers
  • Sent messages are also publicly indexed and searchable
  • Permissions can be set to restrict delivery, but this is not the norm
Problem
As the usage of Twitter has exploded, Twitter’s ability to provide long-term access to tweets that mention key events (typically tagged with a #hashtag) has eroded.
First, who cares?
  • Individuals
  • Bloggers
  • Conference Attendees / Leaders
  • Academia / “Web” Ecologists
  • Media Outlets
  • Companies
  • Government
So let’s dive into the problem...
  • Followers
  • Search
Search UI / API Constraints
  • Limited to keywords, #hashtags, or @mentions within the 140-character body of a tweet
  • 100 tweets x 15 pages = 1500 per search term
  • A given keyword exists in search for “around 1.5 weeks but is dynamic and subject to shrink as the number of tweets per day continues to grow.” – Twitter website
Hmmmm…
  • No other “in the cloud” sites were found back in June, only client-side applications and “hacked” custom scripts
  • RSS feeds were considered but initially dismissed because they typically require an end-user client
  • Decision was to “build our own” and see if we could solve the problem
A little bit about my thoughts on the SDLC process…
**FOCUS** ON LEARNING
“Minimally Viable” PROTOTYPE
“Minimally Viable” Micro App
What if we could get ahead of the problem and store the data before Twitter “loses” it?
Functional Requirements
  • Ability for the user to define #hashtags of importance
  • A background script that uses the Twitter /search REST API to keep an eye on each hashtag and store the data in a local database
  • **Sweep, grab, and record…**
  • Must be running at all times and publicly available
Technical Specs
  • Build on the LAMP stack, put into the cloud, running 24/7/365
“Minimally Viable” Micro App
[Diagram labels: internet; PHP script to query each #hashtag; Twitter /search API; our database]
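The “sweep, grab, and record” loop can be sketched roughly as follows. This is an illustration only, not the original PHP script: it uses Python and SQLite, and because the 2009-era /search endpoint is not reproduced here, the page fetcher is passed in as a function. The names `init_db`, `sweep`, and `fetch_page`, the table layout, and the tweet fields are all assumptions made for this sketch.

```python
import sqlite3

PAGE_SIZE = 100  # tweets per page, per the constraints slide (informational)
MAX_PAGES = 15   # pages per search term, per the constraints slide


def init_db(conn):
    """Create the (hypothetical) archive table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER, hashtag TEXT, user TEXT, text TEXT, "
        "PRIMARY KEY (id, hashtag))"
    )


def sweep(conn, hashtags, fetch_page):
    """Sweep every hashtag once and record unseen tweets.

    fetch_page is an assumed callable (hashtag, page) -> list of dicts with
    "id", "user" and "text" keys; it stands in for the /search API call.
    Returns the number of newly stored rows.
    """
    before = conn.total_changes
    for tag in hashtags:
        for page in range(1, MAX_PAGES + 1):
            results = fetch_page(tag, page)
            if not results:
                break  # no more pages for this hashtag
            for tweet in results:
                # The primary key makes a repeated sweep skip stored tweets.
                conn.execute(
                    "INSERT OR IGNORE INTO tweets (id, hashtag, user, text) "
                    "VALUES (?, ?, ?, ?)",
                    (tweet["id"], tag, tweet["user"], tweet["text"]),
                )
    conn.commit()
    return conn.total_changes - before
```

Running `sweep` from a scheduler (for example cron) would approximate the “running at all times” requirement; the scheduling itself is left out of the sketch.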
TwapperKeeper.com “BETA” was born on a Saturday and released to the public on Sunday…
And we started to grow and get customer feedback…
And we lived through a key world event… http://mashable.com/2009/09/16/white-house-records/
So what did we learn?
  • We need to be whitelisted
  • People often don’t start archiving until after they have started using a #hashtag
  • Thus, a point-forward solution is not enough; we need to reach back as well
  • While hashtags are the norm, some people would just like to track keywords
  • Velocity of tweets can be a major issue
      • What if a hashtag’s results exceed 1500 tweets per minute?
      • Hashtags of archive interest typically spike in velocity and then die off in traffic. However, some archives get VERY, VERY big!
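The velocity concern follows directly from the 100 x 15 = 1500 cap on search results. A back-of-the-envelope sketch is shown below; the function names are made up for this example, and it assumes each sweep can see only the most recent 1500 tweets for a term.

```python
SEARCH_CAP = 100 * 15  # 1500 results per search term, from the constraints slide


def max_poll_interval_seconds(tweets_per_minute, cap=SEARCH_CAP):
    """Longest gap between sweeps before tweets fall outside the search cap."""
    if tweets_per_minute <= 0:
        return float("inf")  # nothing arrives, so any interval works
    return cap / tweets_per_minute * 60


def tweets_missed_per_poll(tweets_per_minute, poll_interval_seconds, cap=SEARCH_CAP):
    """Tweets that arrive between two sweeps but no longer fit under the cap."""
    arrived = tweets_per_minute * poll_interval_seconds / 60
    return max(0, int(arrived - cap))
```

Under these assumptions, a hashtag running at 1500 tweets per minute has to be swept at least once a minute, and anything faster loses data no matter how quickly a once-a-minute sweeper runs, which is the case the streaming APIs are meant to cover.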
And more learning…
  • URL shortening services are a long-term concern to users and the archiving community
  • The Twitter /search REST API is periodically unresponsive
  • The Twitter /search REST API sometimes glitches and returns duplicate data
  • People want not only HTML output, but also raw exports for publication, analysis, and real-time consumption (txt, csv, xml, json, etc.)
  • Twitter engineers contacted us and recommended also incorporating the newly released real-time streams: /track, /sample, /firehose
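Raw exports of an archive are straightforward to produce once tweets are stored as records. The sketch below covers only two of the formats listed above (csv and json) using the Python standard library; the field names in `FIELDS` are an assumption for illustration, not the actual TwapperKeeper schema.

```python
import csv
import io
import json

FIELDS = ["id", "user", "text"]  # hypothetical archive columns


def to_csv(tweets):
    """Render tweets (dicts) as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow({key: tweet[key] for key in FIELDS})
    return buf.getvalue()


def to_json(tweets):
    """Render tweets (dicts) as a JSON array, keeping non-ASCII text readable."""
    rows = [{key: tweet[key] for key in FIELDS} for tweet in tweets]
    return json.dumps(rows, ensure_ascii=False)
```

Writing through `csv.DictWriter` rather than joining strings by hand matters here because tweet text routinely contains commas and quotes.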
Recommended “out-of-beta” V2.0
  • Anticipate #hashtags to archive based upon Twitter trending stats and auto-create archives
  • Hybrid approach using both the /search and /track (real-time stream) APIs to handle velocity issues
  • Check for duplicates “before” inserts
  • Implement monitoring and “self-healing” services
  • Shortened URLs should be resolved into fully qualified URLs and stored separately for reference (at time of capture)
  • Create a TwapperKeeper API by modularizing the archiving engine into a service-oriented architecture (/create, /info, /get) for internal and external consumption
  • Provide additional output formats for download
  • “Extracts” of large archives should be generated automatically on a daily basis and made available for download
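Combining /search and /track results means the same tweet can arrive twice, so the duplicate check has to sit in front of the insert. One possible shape for that check is sketched below; `merge_unique` is a hypothetical helper, and it assumes every tweet carries a unique "id" field.

```python
def merge_unique(*sources, seen=None):
    """Yield tweets from several sources, skipping ids already seen.

    Each source is any iterable of dicts with an "id" key, for example one
    batch from /search and one from /track. Passing the same `seen` set on
    every call carries the duplicate check across sweeps.
    """
    if seen is None:
        seen = set()
    for source in sources:
        for tweet in source:
            if tweet["id"] in seen:
                continue  # duplicate from another source or an API glitch
            seen.add(tweet["id"])
            yield tweet
```

An in-memory set is only workable for small archives; for the “VERY, VERY big” ones, a unique index in the database would be the more likely choice.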
Recommended “out-of-beta” V2.0
[Diagram labels: Twitter /track API; Twitter /search API; Twitter /trends API; hybrid PHP / curl script to archive per #hashtag; auto-create trends; monitor health and self-heal; short URL lookup; our database; file extractor; API (/create, /info, /get); external sites]
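The three proposed endpoints could be backed by a small service object along these lines. The slides give only the endpoint names (/create, /info, /get); the behaviour, return shapes, and the extra `record` method in this in-memory sketch are assumptions made for illustration.

```python
class ArchiveService:
    """In-memory stand-in for the proposed /create, /info and /get operations."""

    def __init__(self):
        self._archives = {}  # hashtag -> list of tweet dicts

    def create(self, hashtag):
        """/create: start an archive; creating it twice is harmless."""
        created = hashtag not in self._archives
        self._archives.setdefault(hashtag, [])
        return {"hashtag": hashtag, "created": created}

    def record(self, hashtag, tweet):
        """Internal hook the archiving script would call (hypothetical)."""
        self._archives[hashtag].append(tweet)

    def info(self, hashtag):
        """/info: summary of an archive; raises KeyError if it does not exist."""
        return {"hashtag": hashtag, "count": len(self._archives[hashtag])}

    def get(self, hashtag, limit=100):
        """/get: up to `limit` archived tweets, oldest first."""
        return list(self._archives[hashtag][:limit])
```

Keeping the archiving engine behind one object like this is what would let the site itself and external sites consume the same interface.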
Questions?

Editor's Notes

  1. Love the circle!