5. Every minute 8-10 months ago:
• 48 hours of video are uploaded to YouTube
• 320 new accounts and 98,000 tweets appear on Twitter
• 168 million emails are sent
• 20,000 new posts on Tumblr
• 6,600 photos appear on Flickr
• Over 20% of all websites run on a CMS (WordPress, etc.)
6. Every minute today:
• 100 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear on Twitter
• 204 million emails are sent
• 28,000 new posts on Tumblr
• 1,600 photos appear on Flickr !!! No shit!
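To make the comparison between the two slides concrete, here is a minimal sketch that computes the percentage change for each figure. It assumes the email figures are meant as 168 million and 204 million, and it skips Twitter's new accounts because the later slide leaves that number as "???". The dictionary keys and the `percent_change` helper are illustrative names, not part of the original deck.

```python
# Per-minute figures copied from the two slides above (earlier vs. today).
# Twitter's new-account count is omitted because the later slide gives "???".
earlier = {
    "YouTube hours of video": 48,
    "Tweets": 98_000,
    "Emails": 168_000_000,   # assumption: the slide means 168 million
    "Tumblr posts": 20_000,
    "Flickr photos": 6_600,
}
today = {
    "YouTube hours of video": 100,
    "Tweets": 236_000,
    "Emails": 204_000_000,   # assumption: the slide means 204 million
    "Tumblr posts": 28_000,
    "Flickr photos": 1_600,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


changes = {name: percent_change(earlier[name], today[name]) for name in earlier}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

On these numbers, Flickr is the only service whose per-minute figure goes down.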
12. But…
• Facebook has lost 1.5 million users in Canada and 6 million in the United States
• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
@cgtheoret
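As a quick sanity check on the Yahoo figure, the sketch below works out the total number of accounts implied if 20,000 accounts really are 0.05% of the whole. The slide does not say what population the percentage refers to, so treating it as a share of all accounts in the study is an assumption.

```python
from fractions import Fraction

accounts = 20_000                  # accounts cited on the slide
share = Fraction("0.05") / 100     # 0.05% as an exact fraction (1/2000)

# If 20,000 accounts are 0.05% of the total, the total is accounts / share.
implied_total = accounts / share

print(f"Implied total accounts: {int(implied_total):,}")
```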
24. In a lot of ways “Big Data” is like Oil…
• Can’t be used by consumers unless refined
• More expensive at every step of refinement
25. The market is producing a plethora of derived, higher-value data products
26. In a lot of ways “Big Data” is like Oil…
• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest in its unrefined form
• More expensive at every step of refinement
• Produces a plethora of derived products
• and it’s actually quite “dirty”!!!!
30. 6 factors affect Data Veracity …
1. Accuracy: Is it true?
2. Precision: If true, error margin?
3. Reliability: Is it there all the time?
4. Provenance: Can you trace the source?
5. Fidelity: Did it change from the source?
6. Permission: Can you use it for the context?
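The six factors can be treated as a simple checklist to run against any dataset. The sketch below is one hypothetical way to record the answers; the `VeracityCheck` class, its field names, and the example dataset are illustrative assumptions, not something defined in the talk.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """One yes/no answer per factor on the slide; names are illustrative."""
    accuracy: bool      # Is it true?
    precision: bool     # Is the error margin known and acceptable?
    reliability: bool   # Is it there all the time?
    provenance: bool    # Can you trace the source?
    fidelity: bool      # Is it unchanged from the source?
    permission: bool    # Can you use it for this context?

    def failed(self):
        """Return the names of the factors that did not pass."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical dataset: accurate and traceable, but with gaps in
# availability and no clearance for the intended use.
check = VeracityCheck(
    accuracy=True, precision=True, reliability=False,
    provenance=True, fidelity=True, permission=False,
)
print(check.failed())  # ['reliability', 'permission']
```

Keeping each factor as its own field makes it obvious which question failed, rather than collapsing everything into a single quality score.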
42. “McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the U.S. that require quantitative and analytical skills. However, there will be a potential shortfall of 1.5 million data-savvy managers and analysts to fill these positions.”
If we look at the relationships in what people are saying, we can start to spot memes. Memes are proto-ideas that are taking shape in society.
With enough of these, we can clearly see the connections and patterns in the chaos; we can actually start to measure the “zeitgeist.”
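One simple way to look at “relationships in what people are saying” is to count which topics show up together in the same post. The sketch below does this on a handful of made-up posts; the data, the topic names, and the choice of pairwise co-occurrence counting are all illustrative assumptions, not the method used in the talk.

```python
from collections import Counter
from itertools import combinations

# Made-up posts, each reduced to the set of topics it mentions.
posts = [
    {"bigdata", "privacy"},
    {"bigdata", "privacy", "jobs"},
    {"bigdata", "jobs"},
    {"privacy", "bigdata"},
]

# Count how often each pair of topics appears in the same post.
pair_counts = Counter()
for topics in posts:
    for pair in combinations(sorted(topics), 2):
        pair_counts[pair] += 1

# Pairs that co-occur most often are candidate connections worth a closer look.
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Sorting each topic set before pairing means a pair is always counted under the same key, whatever order the topics appeared in.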
With these tools, everyone with a social sciences degree will be able to see new connections and play a vital role in exploring the largest human behavior dataset we have ever created.
Our current understanding of human behavior is based on surveys and polls that try to use people’s race, gender, religion, age, and income to classify their behavior.
But with the social graph we can address people by their passions and their interests; we just have to be able to exploit and understand the interest graph.
Initially, the people who can do this are being called “social data scientists.” This is a brand-new field and there isn’t really a name for it yet. I prefer to call them Anthropomant people, because the understanding of human behavior that we have from fields such as sociology, political science, semiotics, history, and anthropology is essential to interpreting this mass of social data.
The good news is that all of the hard stuff is now being coded by teams of engineers.