Joost Ouwerkerk

  1. Big Data: big problems.
  2. What is Big Data?
     • Volume
     • Velocity
     • Variety
  3. Volume. Billions of Things:
     • Posts, Tweets and Likes
     • Web Transactions
     • Sensor Readings
  4. Velocity. Streaming Data:
     • Twitter: 500,000,000 TPD
     • Walmart: 20,000,000 TPD
     • Hopper: 750,000,000 TPD
  5. Variety. Integrating Many Sources of Data:
     • Unstructured Web Content
     • Semi-structured Logs
     • Relational Databases
     • Images, Video, Audio
  6. So What’s Changed?
     • Mobile devices
     • Social Web
     • Sensors, Metrics
     • Digitization of everything
  7. Open Source Tools
     • Hadoop: distributed processing
     • R: predictive analytics for big data
     • Hive, Pig: ad-hoc analytics for Hadoop
     • Mahout: machine learning for Hadoop
     • HBase, Cassandra: distributed databases
     • ElasticSearch: distributed search engine
     • Storm: distributed processing for data streams
  8. "The best minds of my generation are thinking about how to make people click ads" - Jeff Hammerbacher (Facebook, Accel, Cloudera)
  9. Big Minds + Big Data
     • Aggregate, Summarize
     • Detect Patterns
     • Model, Simulate
     • Forecast, Predict
  10. Open Data
     • Reports
     • Request/Response APIs
     • Small Data
  11. Text Text
  12. Hack/reduce
     • Open Hackspace in Boston
     • Home for pre-seed projects, community events
     • Not-for-profit sponsored by local industry and government
  13. Hack/reduce Cluster: 240-core cluster sponsored by GoGrid, a cloud computing company. Available for use at today’s Open Data Day.
  14. What do you do with a 240-core cluster? Use the power of many machines to analyze Big Data sets.
  15. How do you get computers to work together like that? That’s what Hadoop is for.
  16. An Example
     • Daily Hansard: transcript of Canadian parliament since 1994
     • Swearwords.txt (http://www.bannedwordlist.com)
     • Who are the most foul-mouthed Federal MPs?
  17. Results
     • 20 years of House of Commons statements
     • 511,341 Statements analyzed
     • 121,985,310 Words spoken
     • 3,839 Swearwords spoken
     • 1 in 133 statements has a swearword
  18. Top 5 Swearers (absolute)
     • Pat Martin (NDP): 98
     • Randy White (Conservative): 88
     • Alexa McDonough (NDP): 52
     • Jim Silye (Conservative): 50
     • Yvan Loubier (Bloc Quebecois): 49
  19. Top 5 Swearers (relative)
     • Randy White (Conservative): 0.037% (88 swearwords in 299,114 words)
     • Dennis Mills (Liberal): 0.023% (14 swearwords in 62,221 words)
     • Gerry Ritz (Conservative): 0.022% (22 swearwords in 99,037 words)
     • John McCallum (Conservative): 0.017% (38 swearwords in 226,155 words)
     • John McKay (Liberal): 0.016% (44 swearwords in 268,188 words)
  20. Top 5 Words Spoken
     • Paul Szabo: 1,482,106
     • Pat Martin: 1,053,365
     • Don Boudria: 867,204
     • Yvan Loubier: 861,888
     • Peter MacKay: 844,130
  21. Prime Ministers
     • Jean Chrétien: 11 swearwords in 604,431 words
     • Paul Martin: 6 swearwords in 485,990 words
     • Stephen Harper: 22 swearwords in 620,999 words
  22. "The best minds of my generation are thinking about how to make people click ads" - Jeff Hammerbacher (Facebook, Accel, Cloudera)
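
The Hansard example in the deck (count swearwords per MP, then rank by absolute and relative totals) follows the map/reduce pattern that Hadoop distributes across a cluster. The slides do not show the actual job, so what follows is only a minimal single-machine sketch in Python of how such a count might be structured. The function names, the two placeholder words standing in for Swearwords.txt, and the sample statements are all hypothetical and not taken from the original analysis.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for Swearwords.txt; the real list is not reproduced here.
SWEARWORDS = {"darn", "heck"}


def map_statement(speaker, text):
    """Map step: emit (speaker, (swear_count, word_count)) for one statement."""
    words = re.findall(r"[a-z']+", text.lower())
    swears = sum(1 for w in words if w in SWEARWORDS)
    return speaker, (swears, len(words))


def reduce_counts(pairs):
    """Reduce step: sum the per-statement counts for each speaker."""
    totals = defaultdict(lambda: [0, 0])
    for speaker, (swears, words) in pairs:
        totals[speaker][0] += swears
        totals[speaker][1] += words
    return {speaker: tuple(t) for speaker, t in totals.items()}


def relative_rate(swears, words):
    """Swearwords as a percentage of all words spoken."""
    return 100.0 * swears / words if words else 0.0


# Made-up statements, for illustration only.
statements = [
    ("Speaker A", "Well, heck, that is a darn shame."),
    ("Speaker B", "I thank the honourable member for the question."),
    ("Speaker A", "The budget is balanced."),
]

totals = reduce_counts(map_statement(s, t) for s, t in statements)

# Rank speakers by relative rate, highest first.
for speaker, (swears, words) in sorted(
    totals.items(), key=lambda kv: relative_rate(*kv[1]), reverse=True
):
    print(f"{speaker}: {swears} swearwords in {words} words "
          f"({relative_rate(swears, words):.3f}%)")
```

On a real cluster the map step would run in parallel over chunks of the transcript and the framework would group the emitted pairs by speaker before the reduce step. Here a plain generator and a dictionary play those roles. The same `relative_rate` arithmetic applied to a row from the relative ranking, for example 22 swearwords in 99,037 words, gives roughly 0.022%.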
