École d'été (Summer School): Web Science and the Mind, UQAM

Presentation to the Web Science summer school at UQAM, on the rise of the data scientist in the new economy

  1. The opportunity for Social Data Scientists
  2. @cgtheoret Part 1: The Explosion
  5. Every minute, 8-10 months ago:
     • 48 hours of video are uploaded to YouTube
     • 320 new accounts and 98,000 tweets appear on Twitter
     • 168,000,000 emails are sent
     • 20,000 new posts appear on Tumblr
     • 6,600 photos appear on Flickr
     • Over 20% of all websites run on a CMS (WordPress, etc.)
  6. Every minute today:
     • 100 hours of video are uploaded to YouTube
     • ??? new accounts and 236,000 tweets appear on Twitter
     • 204,000,000 emails are sent
     • 28,000 new posts appear on Tumblr
     • 1,600 photos appear on Flickr!!! No shit!
  12. But…
      • Facebook has lost 1.5 million users in Canada and 6 million in the United States
      • Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
  14. Gartner is predicting an explosion in Social Media Analytics IT spending
  17. In a lot of ways, Social “Big Data” is like Oil…
      • Difficult and expensive to extract
  18. Difficult and expensive to extract
  19. Difficult and expensive to store and distribute
  20. Cheapest (and least useful) when it’s unrefined
  23. In a lot of ways, “Big Data” is like Oil…
      • Can’t be used by consumers unless refined
      • More expensive at every step of refinement
  24. The market is producing a plethora of derived, higher-value data products
  25. In a lot of ways, “Big Data” is like Oil…
      • Difficult and expensive to extract
      • Difficult and expensive to store and distribute
      • Cheapest in its unrefined form
      • More expensive at every step of refinement
      • Produces a plethora of derived products
      • And it’s actually quite “dirty”!!!!
  26. Part 2
  27. Social data is one of the reasons why IBM added a 4th V (after Volume, Velocity and Variety) to the Big Data definition: VERACITY
  28. Social Data Analytics = Oil Refineries
  29. 6 factors affect Data Veracity…
      1. Accuracy: Is it true?
      2. Precision: If true, what is the error margin?
      3. Reliability: Is it there all the time?
      4. Provenance: Can you trace the source?
      5. Fidelity: Did it change from the source?
      6. Permission: Can you use it in this context?
  30. Black Hat SEO: Blogs
  31. Twitter: 46% of brand followers are bots
  32. Black Hat Social Marketing: Twitter
  33. Or, in some cases, over 90%…
  34. Disappearing Romney: Facebook as well…
  35. And it is getting worse…
  36. Trying to solve the Veracity problem…
  37. Trying to solve the Veracity problem…
  38. The Big Guys are now doing Veracity… (Murali Krishnam <murali.krishnam@saama.com>)
  39. Part 3: The Opportunity for Social Data Scientists
  41. “McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the U.S. that require quantitative and analytical skills. However, there will be a potential shortfall of 1.5 million data-savvy managers and analysts to fill these positions”
  42. Zeitgeist @cgtheoret @fffady
  56. @cgtheoret cg.theoret@nexalogy.com Merci! (Thank you!)
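
The two "every minute" slides (5 and 6) invite a quick side-by-side comparison. The Python sketch below divides each "today" figure by its earlier counterpart, using the numbers on the slides and reading the email counts as 168 million and 204 million per minute. The `before`/`after` dictionaries, their labels and the `growth_ratios` helper are illustrative names, not something from the talk; new Twitter accounts are left out because the later figure is given only as "???".

```python
# Per-minute figures quoted on slide 5 ("8-10 months ago") and slide 6 ("today").
# New Twitter accounts are omitted: the "today" value is given only as "???".
before = {
    "YouTube video hours": 48,
    "tweets": 98_000,
    "emails sent": 168_000_000,
    "Tumblr posts": 20_000,
    "Flickr photos": 6_600,
}
after = {
    "YouTube video hours": 100,
    "tweets": 236_000,
    "emails sent": 204_000_000,
    "Tumblr posts": 28_000,
    "Flickr photos": 1_600,
}


def growth_ratios(old: dict[str, int], new: dict[str, int]) -> dict[str, float]:
    """Return new/old for every metric present in both snapshots."""
    return {k: new[k] / old[k] for k in old if k in new}


if __name__ == "__main__":
    for metric, ratio in growth_ratios(before, after).items():
        print(f"{metric:20s} x{ratio:.2f}")
```

Tweets come out at roughly 2.4x and YouTube hours at about 2.1x, while Flickr photos come out below 1 (about 0.24x), which is the drop slide 6 exclaims over.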

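The six factors on slide 29 can also be read as a per-source checklist. The following is a minimal Python sketch under that assumption: the `VeracityCheck` class and its field names are hypothetical, and reducing each factor (Precision in particular) to a yes/no flag is a simplification made for illustration, not a method proposed in the talk.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """Hypothetical per-source checklist: one yes/no flag per factor on slide 29."""
    accurate: bool   # Accuracy: is it true?
    precise: bool    # Precision: is the error margin acceptable? (simplified to yes/no)
    reliable: bool   # Reliability: is it there all the time?
    traceable: bool  # Provenance: can you trace the source?
    faithful: bool   # Fidelity: is it unchanged from the source?
    permitted: bool  # Permission: can you use it in this context?

    def failed_factors(self) -> list[str]:
        """Return the names of the flags that are False, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example: a data set whose origin cannot be traced and whose terms of use are unclear.
check = VeracityCheck(accurate=True, precise=True, reliable=True,
                      traceable=False, faithful=True, permitted=False)
print(check.failed_factors())  # ['traceable', 'permitted']
```

A source passes only when `failed_factors()` returns an empty list; keeping the factors as separate fields, rather than folding them into one score, shows which of the six checks a source fails.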