The opportunity for Social Data Scientists
@cgtheoret
Part 1
The Explosion
Every minute 8-10 months ago:
• 48 hours of video are uploaded to YouTube
• 320 new accounts and 98,000 tweets appear
on...
Every minute today:
• 100 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear
on Twitter...
But…
• Facebook has lost 1.5 million users in Canada
and 6 million in the United States
• Yahoo study: 50% of the content ...
Gartner is predicting an explosion in Social
Media Analytics IT spending
In a lot of ways Social “Big Data” is like Oil…
• Difficult and expensive to extract
Difficult and expensive to extract
Difficult and expensive to store and distribute
Cheapest (and least useful) when it's unrefined
In a lot of ways “Big Data” is like Oil…
• Can’t be used by consumers unless refined
• More expensive at every step of ref...
The market is producing a plethora of derived,
higher-value data products
In a lot of ways “Big Data” is like Oil…
• Difficult and expensive to extract
• Difficult and expensive to store and distr...
Part 2
Social Data is one of the reasons why IBM added
a 4th V to the Big Data Definition
VERACITY
Social Data Analytics = Oil Refineries
6 factors affect Data Veracity …
1. Accuracy: Is it true?
2. Precision: If true, error margin?
3. Reliability: Is it there...
Black Hat SEO: Blogs
Twitter: 46% of brand followers are bots
Black Hat Social Marketing: Twitter
Or in some cases over 90%…
Disappearing Romney: FB as well…
And it is getting worse …
Trying to solve the Veracity problem …
Trying to solve the Veracity problem …
The Big Guys are now doing Veracity …
Murali Krishnam <murali.krishnam@saama.com>
Part 3
The Opportunity for Social Data Scientists
“McKinsey Global Institute
estimated that by 2018
there will be 4 million big
data related positions in
the U.S...
Zeitgeist
@cgtheoret @fffady
cg.theoret@nexalogy.com
Thank you!
École d'été: Web Science and the Mind :UQAM
Presentation to the Web Science summer school at UQAM, on the rise of the data scientist in the new economy

Published in: Internet
  • The Social Data Revolution
  • If we look at the relationships between what people are saying, we can start to spot memes: proto-ideas that are taking shape in society.
  • With enough of these we can clearly see the connections and patterns in the chaos; we can actually start to measure the “zeitgeist”.
  • So everyone with a social sciences degree who has these tools will be able to see new connections and play a vital role in exploring the largest human behavior dataset we have ever created.
  • Our current understanding of human behavior is based on surveys and polls that try to use people’s race, gender, religion, age and income to classify their behavior.
  • But with the social graph we can address people by their passions and interests; we just have to be able to exploit and understand the interest graph.
  • Initially, the people who can do this are being called “social data scientists”. This is a brand new field and there isn’t really a settled name yet. I prefer to call them Anthropomant people, because the understanding of human behavior that we have from fields such as sociology, political science, semiotics, history and anthropology is essential to interpreting this mass of social data.
  • The good news is that all of the hard stuff is now being coded by teams of engineers….

    1. The opportunity for Social Data Scientists
    2. @cgtheoret Part 1 The Explosion
    3. @cgtheoret
    4. @cgtheoret
    5. Every minute 8-10 months ago: • 48 hours of video are uploaded to YouTube • 320 new accounts and 98,000 tweets appear on Twitter • 168,000,000 emails are sent • 20,000 new posts on Tumblr • 6,600 photos appear on Flickr • Over 20% of all websites run on a CMS (WordPress, etc.)
    6. Every minute today: • 100 hours of video are uploaded to YouTube • ??? new accounts and 236,000 tweets appear on Twitter • 204,000,000 emails are sent • 28,000 new posts on Tumblr • 1,600 photos appear on Flickr!!! No shit!
    7. @cgtheoret
    8. @cgtheoret
    9. @cgtheoret
    10. @cgtheoret
    11. @cgtheoret
    12. But… • Facebook has lost 1.5 million users in Canada and 6 million in the United States • Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%) @cgtheoret
    13. @cgtheoret
    14. @cgtheoret Gartner is predicting an explosion in Social Media Analytics IT spending
    15. @cgtheoret
    16. @cgtheoret
    17. In a lot of ways Social “Big Data” is like Oil… • Difficult and expensive to extract @cgtheoret
    18. Difficult and expensive to extract @cgtheoret
    19. Difficult and expensive to store and distribute @cgtheoret
    20. Cheapest (and least useful) when it's unrefined @cgtheoret
    21. @cgtheoret
    22. @cgtheoret
    23. In a lot of ways “Big Data” is like Oil… • Can’t be used by consumers unless refined • More expensive at every step of refinement @cgtheoret
    24. The market is producing a plethora of derived, higher-value data products @cgtheoret
    25. In a lot of ways “Big Data” is like Oil… • Difficult and expensive to extract • Difficult and expensive to store and distribute • Cheapest in its unrefined form • More expensive at every step of refinement • Produces a plethora of derived products • and it’s actually quite “dirty”!!!! @cgtheoret
    26. @cgtheoret Part 2
    27. Social Data is one of the reasons why IBM added a 4th V to the Big Data Definition: VERACITY @cgtheoret
    28. Social Data Analytics = Oil Refineries @cgtheoret
    29. 6 factors affect Data Veracity … 1. Accuracy: Is it true? 2. Precision: If true, error margin? 3. Reliability: Is it there all the time? 4. Provenance: Can you trace the source? 5. Fidelity: Did it change from the source? 6. Permission: Can you use it for the context? @cgtheoret
    30. Black Hat SEO: Blogs
    31. Twitter: 46% of brand followers are bots
    32. Black Hat Social Marketing: Twitter
    33. Or in some cases over 90%…
    34. Disappearing Romney: FB as well…
    35. And it is getting worse …
    36. Trying to solve the Veracity problem …
    37. Trying to solve the Veracity problem …
    38. The Big Guys are now doing Veracity … Murali Krishnam <murali.krishnam@saama.com>
    39. @cgtheoret Part 3 The Opportunity for Social Data Scientists
    40. @cgtheoret
    41. @cgtheoret “McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the U.S. that require quantitative and analytical skills. However, there will be a potential shortfall of 1.5 million data-savvy managers and analysts to fill these positions”
    42. Zeitgeist @cgtheoret @fffady
    43. @cgtheoret @fffady
    44. @cgtheoret @fffady
    45. @cgtheoret @fffady
    46. @cgtheoret @fffady
    47. @cgtheoret @fffady
    48. @cgtheoret @fffady
    49. @cgtheoret @fffady
    50. @cgtheoret @fffady
    51. @cgtheoret @fffady
    52. @cgtheoret @fffady
    53. @cgtheoret @fffady
    54. @cgtheoret @fffady
    55. @cgtheoret @fffady
    56. @cgtheoret cg.theoret@nexalogy.com Thank you!