More Than Websites: PHP And The Firehose @DataSift (2013)

PHP is the world's #1 programming language for creating websites. But it's capable of so much more. How about real-time processing the social firehose? :)

Published in: Technology

Transcript

  • 1. More Than Websites: PHP And The Firehose @DataSift (Saturday, 23 March 2013)
  • 2. Introduce Yourselves
  • 3. @stuherbert
  • 4. What is DataSift?
  • 5. Sift through social data: Twitter firehose, Facebook, bitly clicks, news, videos, comments and more
  • 6. Gain insights using augmentations: language, gender, trends, links, sentiment, salience & entity analysis and more
  • 7. Realtime: Get matching data within seconds of it being posted
  • 8. Historics: Search our social data archive going back to January 2010
  • 9. Pull the data from our servers via HTTP/1.1 streaming or websockets
  • 10. Let us push data to you: Have the data delivered directly to your servers or into your databases
  • 11. DataSift in numbers
  • 12. 30: Sources of social data and data augmentations
  • 13. Up to 20,000: Number of new pieces of data ingested into DataSift every second
  • 14. 3 Terabytes: Amount of new data added to the Historics archive every week
  • 15. 12: Different ways we can deliver data to you
  • 16. 1: Average number of seconds to pass the data through DataSift
  • 17. 12: Number of services data passes through inside DataSift
  • 18. 25: Number of engineers who write code for the DataSift platform
  • 19. 5 primary programming languages: C++, Node, PHP, Python, Scala
  • 20. 154 private GitHub repos
  • 21. Our GitHub Repositories [bar chart of repository counts by language, axis 0 to 60: PHP, Java & Scala, C & C++, JS & Node, Unclassified, Python, Shell Script, Ruby, C#, VimL]
  • 22. Architecture
  • 23. Three major data pipelines + supporting services
  • 24. Data Archiving: Adds new data to the Historics Archive
  • 25. Filtering Pipeline: Filtering and delivery of data in realtime
  • 26. Playback Pipeline: Filtering and delivery of data from the Historics Archive
  • 27. DataSift Technical Architecture [diagram: DataSift Architecture 2.2, by @lorenzoalberton; shows data ingestion and augmentation, the HBase/HDFS Historics archive, Kafka, the filtering engine, real-time streams (HTTP streaming, WebSockets) and push delivery]
  • 28. Filtering Pipeline [same architecture diagram]
  • 29. Data Archiving Pipeline [same architecture diagram]
  • 30. Playback Pipeline [same architecture diagram]
  • 31. Written In PHP [same architecture diagram]
  • 32. 100%: Every piece of data is handled by our PHP code in realtime
  • 33. What we do in PHP
  • 34. Marketing website: Runs on Drupal
  • 35. Our main webapp: Customer signup, stream creation, account management
  • 36. Our external API: Our main interface with customers
  • 37. Boring! That’s all very standard stuff, well understood. The interesting uses are behind the scenes.
  • 38. Behind the scenes? Are you mad?!? Everyone knows that PHP is only for building websites!
  • 39. Internal services: APIs that support our data pipelines. User management, billing, data security.
  • 40. Data assembly: Convert incoming data into common ‘interaction’ structure
  • 41. 100%: Every piece of data is handled by our PHP code in realtime
  • 42. Push delivery: Outbound delivery of data to customers’ servers and into their databases
  • 43. 1 MP3/sec: How much data we can deliver to a single EC2 micro-instance
  • 44. 500: Number of simultaneous deliveries to customers every second
  • 45. Hornet: Our EvilTestTool(tm), designed to melt the data centre
  • 46. Storyteller: Our functional test tool. Brings user stories to life. Fires up VMs, deploys code, tests services. Reproducibly.
  • 47. Why PHP
  • 48. Our History: DataSift grew out of TweetMeme
  • 49. Our Product: PHP is superb at handling unstructured data
  • 50. Our Customers: PHP can talk to any server, database / datastore that we want to deliver data to
  • 51. Our People: Several ‘names’ from PHP community. PHP is a language most engineers know.
  • 52. Our Time: PHP is a great language to build high-quality code very very quickly
  • 53. Our Performance: PHP is fast enough for data assembly work and is getting faster with every major release
  • 54. Our Sanity: Our PHP applications require less Ops time than any of the others
  • 55. frameworks
  • 56. Rolled our own: Frink & Stone
  • 57. Right choice for us: We’re not part of the target demographic for the major PHP frameworks (nor the minor ones, tbh)
  • 58. Frink: TweetMeme’s framework, built to handle millions of tweeted links a day
  • 59. Built for speed: Stripped down to the bare essentials; a reaction to experience with early Zend Framework
  • 60. Jobqueues: Long-running daemon processes. Worker processes handle data queues. Manager process monitors workers.
  • 61. Stone: Foundation of our in-house test tools Hornet and Storyteller
  • 62. Built for speed: Powers our fake Twitter firehose used for testing
  • 63. Built for inspection: Allows us to measure activity normally hidden by libraries and PHP extensions
  • 64. tools & utilities
  • 65. PHP 5.3.latest: Compiled in-house. Extensions statically-linked for performance.
  • 66. ZeroMQ extension: Transport layer for our pipelines
  • 67. APC extension: Shared memory for app metrics. PHP is too slow without an opcache. Lack of APC has prevented us moving to PHP 5.4.
  • 68. XHProf extension: For profiling code. Skews the results less than Xdebug.
  • 69. Redis extension: Buffering and queueing (being phased out)
  • 70. Xdebug: For code coverage metrics (and readable var_dump()s!)
  • 71. PHPUnit: For all our unit tests
  • 72. phpdoc2: For code documentation (although nobody reads it - code is king)
  • 73. Maven: For building all release RPM packages
  • 74. Jenkins: Continuous integration
  • 75. RPM: Packages for deployment into dev, test, staging, and production
  • 76. Thank you. PS: We’re hiring :-)
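
The "data assembly" step on slide 40 (converting incoming data into a common ‘interaction’ structure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name `assembleInteraction`, the field names, and the sample payload shapes are hypothetical and are not DataSift's actual schema or code. It also uses modern PHP (7.0+) syntax rather than the PHP 5.3 mentioned in the slides.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: map a decoded source payload onto one common
 * 'interaction' shape, keeping the original payload alongside it.
 * Field names and payload layouts are illustrative assumptions only.
 */
function assembleInteraction(string $source, array $payload): array
{
    switch ($source) {
        case 'twitter':
            // Assumed tweet layout: {"user": {"screen_name": ...}, "text": ...}
            $common = [
                'author'  => $payload['user']['screen_name'] ?? null,
                'content' => $payload['text'] ?? '',
                'link'    => null,
            ];
            break;

        case 'bitly':
            // Assumed click layout: {"url": ...}
            $common = [
                'author'  => null,
                'content' => '',
                'link'    => $payload['url'] ?? null,
            ];
            break;

        default:
            throw new InvalidArgumentException("Unknown source: {$source}");
    }

    return [
        // Fields every downstream consumer can rely on, whatever the source.
        'interaction' => ['type' => $source] + $common,
        // Source-specific data is preserved under its own key.
        $source       => $payload,
    ];
}

// Usage: decode a raw JSON message and normalise it.
$raw = '{"user":{"screen_name":"example_user"},"text":"Hello, firehose"}';
$assembled = assembleInteraction('twitter', json_decode($raw, true));
echo json_encode($assembled['interaction']), PHP_EOL;
```

The point of the sketch is that PHP's associative arrays and `json_decode()` make it cheap to reshape loosely structured input, which is the property slide 49 credits PHP with.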
