Hack reduce introduction

  • We are Hopper. Hopper is using Big Data to solve travel planning.
  • Hopper's Montreal office was home to the inaugural Hack/Reduce event two years ago.
  • Hack/Reduce is a community. We held 4 events, in Montreal, Toronto, Boston and Ottawa. More than 300 hackers participated. Now we're building a permanent Hack/Reduce community hackspace in Boston.
  • GoGrid is sponsoring the cluster
  • If you're interested in learning something different, come talk to us.

    1. What is hack/reduce?
       • A Home for the Big Data Community
       • 24/7 Access to Cluster Compute Power
       • Regular Hackathons
    2. hack/reduce
       • 2011: Montreal, Toronto, Boston, Ottawa
       • 2012: hack/reduce, Boston's Big Data Hackspace
    3. Why should you care?
       • Work with Millions and Billions of records
       • Find patterns in Big Data sets
       • Use data to detect, predict, forecast
       • Extract new information from raw data
    4. APIs Suck
       In Big Data there are:
       • no requests,
       • no predefined parameters,
       • no structured responses.
       You are free to intersect anything with anything.
       You can analyse, mutate, group, split, reorder in any way you can imagine.
    5. What you can do today
       • Access the hack/reduce GoGrid Cluster:
         • 240 Cores
         • 240GB of RAM
         • 10TB of Disk
    6. What you can do today
       Use Hadoop to explore big Open Data sets, like:
       • 20 Years of the Federal Parliament Hansard
       • Hourly Canadian Weather, 1953 to 2001
       • The 1881 Census: details about 4.3M people
       • One Summer of Bixi Station Status Updates
    7. What is Map/Reduce?
       • Framework for distributed computing on large data sets on clusters of computers
       • MapReduce patented by Google
       • Hadoop implementation is Googlesque
       • Michael Stonebraker hates it
    8. What is Map/Reduce?
       • Map = function applied in parallel to every item in the dataset
       • Reduce = function applied in parallel to groups of values emitted by the Map function
    9. What is Map/Reduce?

       map(String docId, String document):
         for each word w in document:
           emit(w, 1);

       reduce(String word, Iterator counts):
         int sum = 0;
         for each count in counts:
           sum += count;
         emit(word, sum);
    10. Cluster access
       • private key (“hackreduce”): http://bit.ly/X13pNh
       • wiki: http://github.com/hackreduce/Hackathon
       • SSH: ssh -i hackreduce hackreduce@cluster-
       • MapReduce: http://cluster-1-master.gg.hackreduce
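
The word-count pseudocode on slide 9 can be tried out without a cluster. The following is a minimal single-process sketch in Python, not the Hadoop API: the names `map_fn`, `reduce_fn` and `run_job` are hypothetical, and the grouping step that a real framework performs across machines is imitated here with an in-memory dictionary. It also lowercases words, which the slide's pseudocode does not do.

```python
from collections import defaultdict


def map_fn(doc_id, document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.split():
        yield word.lower(), 1  # lowercasing is an assumption, not on the slide


def reduce_fn(word, counts):
    """Reduce: sum all the counts that were emitted for one word."""
    return word, sum(counts)


def run_job(documents):
    """Stand-in for the framework: map every document, group by key, reduce each group."""
    groups = defaultdict(list)
    for doc_id, document in documents.items():
        for key, value in map_fn(doc_id, document):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = {"d1": "big data big cluster", "d2": "Big hackathon"}
    print(run_job(docs))
```

On a real cluster the map calls run in parallel over pieces of the dataset and each reduce call receives the values for one key, so only the grouping step has to move data between machines.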
