
Hack reduce introduction



  1. What is hack/reduce?
     • A Home for the Big Data Community
     • 24/7 Access to Cluster Compute Power
     • Regular Hackathons
  2. hack/reduce
     • 2011: Montreal, Toronto, Boston, Ottawa
     • 2012: hack/reduce, Boston’s Big Data Hackspace
  3. Why should you care?
     • Work with Millions and Billions of records
     • Find patterns in Big Data sets
     • Use data to detect, predict, forecast
     • Extract new information from raw data
  4. APIs Suck
     In Big Data there are:
     • no requests,
     • no predefined parameters,
     • no structured responses.
     You are free to intersect anything with anything.
     You can analyse, mutate, group, split, and reorder in any way you can imagine.
  5. What you can do today
     • Access the hack/reduce GoGrid Cluster:
       • 240 Cores
       • 240GB of RAM
       • 10TB of Disk
  6. What you can do today
     Use Hadoop to explore big Open Data sets, like:
     • 20 Years of the Federal Parliament Hansard
     • Hourly Canadian Weather, 1953 to 2001
     • The 1881 Census: details about 4.3M people
     • One Summer of Bixi Station Status Updates
  7. What is Map/Reduce?
     • Framework for distributed computing on large data sets on clusters of computers
     • MapReduce patented by Google
     • Hadoop implementation is Googlesque
     • Michael Stonebraker hates it
  8. What is Map/Reduce?
     • Map = function applied in parallel to every item in the dataset
     • Reduce = function applied in parallel to groups of values emitted by the Map function
  9. What is Map/Reduce?

         map(String docId, String document):
           for each word w in document:
             emit(w, 1);

         reduce(String word, Iterator counts):
           int sum = 0;
           for each count in counts:
             sum += count;
           emit(word, sum);
  10. private key (“hackreduce”): -i hackreduce hackreduce@cluster-MapReduce:
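
The word-count pseudocode on slide 9 can be tried without a cluster. Below is a minimal single-process sketch in Python, assuming plain whitespace tokenization and an in-memory dictionary standing in for the shuffle step that groups values by key. The function names (map_words, reduce_counts, run_word_count) are illustrative only; they are not part of Hadoop or of the hack/reduce cluster setup.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(doc_id: str, document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one document."""
    # Assumption: words are separated by whitespace; no case folding or punctuation handling.
    for word in document.split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum every count that was emitted for one word."""
    return word, sum(counts)


def run_word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce sequentially in one process."""
    # Shuffle: collect all values emitted by the map step under their key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc_id, document in documents.items():
        for word, count in map_words(doc_id, document):
            grouped[word].append(count)
    # Reduce: one call per distinct key.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"doc1": "big data big cluster", "doc2": "big data"}
    print(run_word_count(docs))
```

On a real cluster the map calls run in parallel across the input, and the framework performs the grouping between the two steps; this sketch only mirrors the data flow.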