Metadata
  More specifically, reference information or reference data
  API payload scalability
Social media data
  Realtime
  Many publishers
  Many data points to look at, and many variables: size of payload, number of activities, etc.
  Always building with scalability in mind
Twitter firehose
  5000 – 7000 activities/second
  Regularly see spikes of 20k+ activities/second when Twitter backfills after an issue
  Large social media events like the Super Bowl and MJ’s death
Historical access
  Entire Twitter archive
  Store 1 PETABYTE of historical data in S3
  Ingesting a new 2 TB/day
CLOUD and DEDICATED hardware
Serve over 95% of the Fortune 500 with social data: RELIABILITY
Over 120 billion activities every month
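To put the numbers above side by side, here is a rough back-of-the-envelope sketch. It assumes a 30-day month, 1 PB = 1000 TB, and a constant ingest rate; none of those assumptions come from the slides, and the function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Assumptions (not from the slides): 30-day month, 1 PB = 1000 TB,
# and a constant ingest rate.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def activities_per_month(rate_per_second, days=30):
    """Monthly activity count for a steady per-second rate."""
    return rate_per_second * SECONDS_PER_DAY * days


# Steady-state firehose range of 5000 to 7000 activities/second.
firehose_low = activities_per_month(5000)   # 12,960,000,000
firehose_high = activities_per_month(7000)  # 18,144,000,000

# Average per-second rate implied by 120 billion activities/month.
implied_rate = 120_000_000_000 / (SECONDS_PER_DAY * 30)  # about 46,296/second

# Days needed to accumulate another petabyte at 2 TB/day.
days_to_one_pb = 1000 / 2  # 500 days

print(f"Firehose: {firehose_low:,} to {firehose_high:,} activities/month")
print(f"Implied average rate for 120B/month: {implied_rate:,.0f}/second")
print(f"Days to ingest 1 PB at 2 TB/day: {days_to_one_pb:.0f}")
```

Under these assumptions the steady-state firehose rate accounts for roughly 13 to 18 billion activities a month, so the 120 billion monthly total would have to include spikes, backfill, and the many other publishers.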
REST and Streaming APIs
  Mostly streaming is consumed today; might be migrating towards more use of our REST APIs
Jack Dorsey’s first tweet
  Gnip’s historical version vs Twitter 1.1 API version: ~24% payload increase
  Jack Dorsey’s first tweet vs my tweet today: ~50% payload increase
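A percentage like the ones above can be measured by serializing two versions of an activity and comparing byte counts. The sketch below is one possible way to do it, not the method used for the slide figures; the two payloads are hypothetical placeholders (not real Gnip or Twitter activities), and `payload_bytes` and `percent_increase` are illustrative helper names.

```python
import json


def payload_bytes(activity):
    """Size in bytes of an activity serialized as compact UTF-8 JSON."""
    text = json.dumps(activity, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def percent_increase(old_size, new_size):
    """Relative growth from old_size to new_size, as a percentage."""
    return (new_size - old_size) / old_size * 100.0


# Hypothetical payloads for illustration only; the field names are
# placeholders and do not reflect any real activity format.
older_activity = {"id": "1", "body": "example text"}
newer_activity = {**older_activity, "extra_metadata": {"tags": [], "links": []}}

old_size = payload_bytes(older_activity)
new_size = payload_bytes(newer_activity)
print(f"{old_size} -> {new_size} bytes "
      f"({percent_increase(old_size, new_size):.1f}% increase)")
```

Compact separators are used so that whitespace does not inflate the comparison; payloads measured as they arrive on the wire may differ.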