Hadoop and MapReduce

A short overview of Hadoop and MapReduce use cases.

  1. WHAT STARTS HERE CHANGES THE WORLD
     Hadoop and MapReduce
     Hemanth Kumar Mantri, Graduate Student, UT-Austin
     November 9th, 2011
  2. Agenda
     • What is Hadoop?
     • Where is MapReduce used?
     • HDFS and MapReduce
     • Amazon Web Services
     • MapReduce demo on Hadoop
  3. What is Hadoop?
     • Inspired by the Google File System (GFS) and MapReduce.
     • Supports data-intensive distributed applications.
     • Scales to thousands of nodes and petabytes of data.
     • Apache project, open source.
     • Implemented in Java.
     • Yahoo! is the largest contributor.
  4. Typical Hadoop Cluster!
  5. Who Uses Hadoop?
  6. Who Uses Hadoop?
     • At Google (MapReduce, via Google's own implementation):
       – Index construction for Google Search
       – Popular Passages in Google Books
       – Article clustering for Google News
     • At Yahoo!:
       – “Web map” powering Yahoo! Search
       – Spam detection for Yahoo! Mail
       – More than 100,000 CPUs in >36,000 computers
     • At Facebook:
       – Used in reporting/analytics and machine learning
         • Data mining, spam detection
       – Used as a storage engine for logs
       – 1100-machine cluster with 8800 cores and about 12 PB raw storage
  7. Facebook Lexicon
  8. Yelp!
     • Uses Amazon S3 to store daily logs and photos,
       – generating around 100 GB of logs per day.
     • Uses Amazon Elastic MapReduce for:
       – People Who Viewed this Also Viewed
       – Review highlights
       – Auto complete as you type on search
       – Search spelling suggestions
       – Top searches
       – Ads
     • Yelp runs approximately 200 Elastic MapReduce jobs, processing 3 TB of data per day.
  9. Hadoop Components
     • Distributed file system (HDFS)
       – Single namespace for the entire cluster
       – Almost the same as GFS
       – Replicates data 3x for fault tolerance
     • MapReduce framework
       – Executes user jobs specified as “map” and “reduce” functions
       – Manages work distribution and fault tolerance
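To make the “map” and “reduce” functions concrete, here is a minimal single-process sketch of a word count job. It is not Hadoop code: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names chosen for illustration, and the toy driver only imitates the grouping step that the real framework performs across machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Toy driver (an assumption, stands in for the framework)."""
    groups = defaultdict(list)
    for line in lines:                      # map phase
        for key, value in map_fn(line):
            groups[key].append(value)       # shuffle: group values by key
    # reduce phase: one call per distinct key, keys visited in sorted order
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["hadoop stores data", "hadoop processes data"]))
```

The user only writes the two small functions; splitting the input, grouping by key, and retrying failed tasks are what the framework bullet above refers to as work distribution and fault tolerance.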
  10. Hadoop Architecture
  11. The Big Picture
  12. Using the HDFS
      • hadoop dfs
        – [-ls <path>]
        – [-du <path>]
        – [-cp <src> <dst>]
        – [-rm <path>]
        – [-put <localsrc> <dst>]
        – [-copyFromLocal <localsrc> <dst>]
        – [-moveFromLocal <localsrc> <dst>]
        – [-get [-crc] <src> <localdst>]
        – [-cat <src>]
        – [-copyToLocal [-crc] <src> <localdst>]
        – [-moveToLocal [-crc] <src> <localdst>]
        – [-mkdir <path>]
        – [-touchz <path>]
        – [-test -[ezd] <path>]
        – [-stat [format] <path>]
        – [-help [cmd]]
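A typical session chains a few of these options: create a directory, upload a file, list it, read it, and copy it back. The sketch below only builds the command lines and prints them; it runs nothing against a cluster. The helper names `dfs` and `upload_and_check` and the example paths are hypothetical, and only options from the list above are used.

```python
import shlex


def dfs(*args):
    """Build the argument list for one `hadoop dfs` call (nothing is executed)."""
    return ["hadoop", "dfs", *args]


def upload_and_check(local_file, hdfs_dir):
    """Return the commands for a small upload, inspect, download round trip."""
    name = local_file.rsplit("/", 1)[-1]     # file name without local directories
    target = f"{hdfs_dir}/{name}"            # where the file ends up in HDFS
    return [
        dfs("-mkdir", hdfs_dir),
        dfs("-put", local_file, hdfs_dir),
        dfs("-ls", hdfs_dir),
        dfs("-cat", target),
        dfs("-get", target, "copy-of-" + name),
    ]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of each command.
    for argv in upload_and_check("logs/day1.txt", "/user/demo/input"):
        print(shlex.join(argv))
```

On a machine with Hadoop installed, each list could be handed to `subprocess.run(argv, check=True)` instead of being printed.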
  13. AWS and Cloud
  14. Amazon Web Services
      • Collection of services, pay as you use!
        – S3 (Simple Storage Service): storage in the cloud ($0.140/GB/month); a key-value store (a big HashMap!)
        – EC2 (Elastic Compute Cloud): compute in the cloud ($0.085 to $2.60 per computing hour)
        – Elastic MapReduce: run Hadoop jobs on EC2 using data stored in S3
        – Email Service
        – … many more
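The pay-as-you-use prices on this slide can be turned into a rough estimate. The sketch below uses only the two rates quoted above; the workload (100 GB stored for a month, 10 nodes for 2 hours) is a made-up example, and the function names are hypothetical. It ignores any other charges, such as data transfer or rounding to whole billing hours.

```python
# Rates as quoted on the slide (2011); treat them as illustrative inputs.
S3_USD_PER_GB_MONTH = 0.140
EC2_USD_PER_HOUR_MIN = 0.085
EC2_USD_PER_HOUR_MAX = 2.6


def s3_monthly_cost(gigabytes):
    """Storage cost in USD for keeping `gigabytes` in S3 for one month."""
    return gigabytes * S3_USD_PER_GB_MONTH


def ec2_cost_range(nodes, hours):
    """(cheapest, priciest) USD cost of running `nodes` instances for `hours`."""
    instance_hours = nodes * hours
    return (instance_hours * EC2_USD_PER_HOUR_MIN,
            instance_hours * EC2_USD_PER_HOUR_MAX)


if __name__ == "__main__":
    low, high = ec2_cost_range(nodes=10, hours=2)
    print(f"S3, 100 GB for a month: ${s3_monthly_cost(100):.2f}")
    print(f"EC2, 10 nodes for 2 hours: ${low:.2f} to ${high:.2f}")
```

At these rates the hypothetical workload comes to about $14 per month for storage, and between about $1.70 and $52 for the 20 instance-hours, depending on the instance type.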
  15. MapReduce on an EC2 Cluster
      • Create an AWS account and get the keys for authentication
      • Go to src/contrib/ec2 in the Hadoop directory
      • Launch a cluster on EC2
        – % bin/hadoop-ec2 launch-cluster <cluster-name> <#nodes>
      • Log in to the cluster (named test-cluster here)
        – % bin/hadoop-ec2 login test-cluster
      • Start the computation
        – # cd /usr/local/hadoop-*
        – # bin/hadoop jar hadoop-*-examples.jar pi 10 10000000
      • Terminate the cluster after use!
        – % bin/hadoop-ec2 terminate-cluster test-cluster
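The `pi 10 10000000` job estimates π with 10 map tasks of 10,000,000 samples each. The sketch below shows the same split into map and reduce steps in one process. It is an illustration under assumptions, not the Hadoop example's code: it uses plain pseudo-random sampling with Python's `random` module rather than whatever sampling scheme the bundled job uses, and `map_task`, `reduce_task`, and `estimate_pi` are hypothetical names.

```python
import random


def map_task(task_id, samples):
    """One map task: count random points that fall inside the quarter circle."""
    rng = random.Random(task_id)   # per-task seed so repeated runs agree
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(partials):
    """Reduce: combine the per-task counts into one estimate of pi."""
    inside = sum(i for i, _ in partials)
    total = sum(n for _, n in partials)
    return 4.0 * inside / total    # area ratio of quarter circle to unit square


def estimate_pi(num_maps, samples_per_map):
    """Run every map task in turn, then reduce (a cluster would run maps in parallel)."""
    return reduce_task([map_task(t, samples_per_map) for t in range(num_maps)])


if __name__ == "__main__":
    # Smaller sample count than the slide's command so it finishes quickly.
    print(estimate_pi(num_maps=10, samples_per_map=100_000))
```

Because each map task is independent, adding nodes lets more tasks run at once, which is why the sample count can be raised without changing the code.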
  16. References
      • Hadoop Project Page:
        – http://hadoop.apache.org/
      • Amazon Web Services:
        – http://aws.amazon.com/
  17. Thank You!
