Every minute, 8-10 months ago:
• 48 hours of video were uploaded to YouTube
• 320 new accounts and 98,000 tweets appeared on Twitter
• 168 million emails were sent
• 20,000 new posts appeared on Tumblr
• 6,600 photos appeared on Flickr
• Over 20% of all websites ran on a CMS (WordPress, etc.)
Every minute today:
• 60 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear on Twitter
• 204 million emails are sent
• 28,000 new posts appear on Tumblr
• 1,600 photos appear on Flickr!!! No shit!
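To put the two snapshots side by side, here is a rough sketch of the percentage change for each figure that appears on both slides. It assumes the email counts are meant as 168 million and 204 million; the names `per_minute` and `percent_change` are made up for this illustration.

```python
# Per-minute figures copied from the two slides above: (8-10 months ago, today).
# Assumption: the email counts are read as 168 million and 204 million.
per_minute = {
    "YouTube video hours": (48, 60),
    "Twitter tweets": (98_000, 236_000),
    "Emails sent": (168_000_000, 204_000_000),
    "Tumblr posts": (20_000, 28_000),
    "Flickr photos": (6_600, 1_600),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Round to one decimal place for display.
changes = {
    name: round(percent_change(before, after), 1)
    for name, (before, after) in per_minute.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

On these numbers Flickr is the only figure that shrinks, which is what the exclamation marks on the slide are pointing at.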
But…
• Facebook has lost 1.5 million users in Canada and 6 million in the United States
• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
@cgtheoret
In a lot of ways “Big Data” is like oil…
• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest in its unrefined form
• More expensive at every step of refinement
• Produces a plethora of derived products
• and it’s actually quite “dirty”!
Social Data Analytics = Oil Refineries
Social data is one of the reasons why IBM added a 4th V to the Big Data definition: VERACITY
6 factors affect data veracity:
1. Accuracy: Is it true?
2. Precision: If true, what is the error margin?
3. Reliability: Is it there all the time?
4. Provenance: Can you trace the source?
5. Fidelity: Did it change from the source?
6. Permission: Can you use it for the context?
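One way to make the six questions concrete is a simple checklist attached to each data source. The sketch below is only an illustration: the `VeracityCheck` class, its yes/no answers, and the equal weighting in `score` are assumptions, not something the slides or IBM's definition prescribe.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """Hypothetical yes/no checklist for the six veracity factors."""
    accuracy: bool      # Is it true?
    precision: bool     # Is the error margin known and acceptable?
    reliability: bool   # Is it there all the time?
    provenance: bool    # Can the source be traced?
    fidelity: bool      # Is it unchanged from the source?
    permission: bool    # May it be used in this context?

    def failed(self):
        """Names of the factors answered 'no', in checklist order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def score(self):
        """Share of factors answered 'yes' (equal weights are an assumption)."""
        total = len(fields(self))
        return (total - len(self.failed())) / total


# Example: a made-up social feed that is sometimes unavailable and that
# gets edited after it is published.
sample = VeracityCheck(
    accuracy=True,
    precision=True,
    reliability=False,
    provenance=True,
    fidelity=False,
    permission=True,
)
print(sample.failed())
print(round(sample.score(), 2))
```

In practice some factors, permission in particular, are likely to be hard blockers rather than one vote in six, so a weighted or rule-based check may fit better than a flat score.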