Hadoop and MapReduce

Short overview of Hadoop and MapReduce use cases.

Usage Rights

CC Attribution-NonCommercial License

    Hadoop and MapReduce: Presentation Transcript

    • WHAT STARTS HERE CHANGES THE WORLD
        Hadoop and MapReduce
        Hemanth Kumar Mantri, Graduate Student, UT-Austin
        November 9th, 2011
    • Agenda
        • What is Hadoop?
        • Where is MapReduce used?
        • HDFS and MapReduce
        • Amazon Web Services
        • MapReduce demo on Hadoop
    • What is Hadoop?
        • Inspired by Google File System (GFS) and MapReduce.
        • Supports data-intensive distributed applications.
        • Scales to thousands of nodes and petabytes of data.
        • Open-source Apache project
        • Implemented in Java
        • Yahoo! is the largest contributor
    • Typical Hadoop Cluster!
    • Who Uses Hadoop?
    • Who Uses Hadoop?
        • At Google (MapReduce, its in-house implementation):
            – Index construction for Google Search
            – Popular Passages in Google Books
            – Article clustering for Google News
        • At Yahoo!:
            – “Web map” powering Yahoo! Search
            – Spam detection for Yahoo! Mail
            – More than 100,000 CPUs in >36,000 computers
        • At Facebook:
            – Used in reporting/analytics and machine learning
                • Data mining, spam detection
            – Used as a storage engine for logs
            – 1100-machine cluster with 8800 cores and about 12 PB raw storage
    • Facebook Lexicon
    • Yelp!
        • Uses Amazon S3 to store daily logs and photos, generating around 100 GB of logs per day.
        • Uses Amazon Elastic MapReduce for:
            – People Who Viewed this Also Viewed
            – Review highlights
            – Autocomplete as you type in search
            – Search spelling suggestions
            – Top searches
            – Ads
        • Yelp runs approximately 200 Elastic MapReduce jobs processing 3 TB of data per day.
    • Hadoop Components
        • Distributed file system (HDFS)
            – Single namespace for the entire cluster
            – Almost the same as GFS
            – Replicates data 3x for fault tolerance
        • MapReduce framework
            – Executes user jobs specified as “map” and “reduce” functions
            – Manages work distribution and fault tolerance
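To make the “map” and “reduce” functions above concrete, here is a minimal single-process sketch of a word-count job. It is not Hadoop code: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and `run_job` only imitates the grouping (shuffle) step that the real framework performs across many machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all the counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Toy driver standing in for the framework (assumption: one process)."""
    # Shuffle step: group the intermediate values by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Reduce step: one call per distinct key.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["hadoop runs mapreduce", "mapreduce runs on hadoop"]))
```

In a real cluster the user supplies only the two functions; splitting the input, moving intermediate pairs between nodes, and retrying failed tasks are the framework's job.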
    • Hadoop Architecture
    • The Big Picture
    • Using the HDFS
        • hadoop dfs
            – [-ls <path>]
            – [-du <path>]
            – [-cp <src> <dst>]
            – [-rm <path>]
            – [-put <localsrc> <dst>]
            – [-copyFromLocal <localsrc> <dst>]
            – [-moveFromLocal <localsrc> <dst>]
            – [-get [-crc] <src> <localdst>]
            – [-cat <src>]
            – [-copyToLocal [-crc] <src> <localdst>]
            – [-moveToLocal [-crc] <src> <localdst>]
            – [-mkdir <path>]
            – [-touchz <path>]
            – [-test -[ezd] <path>]
            – [-stat [format] <path>]
            – [-help [cmd]]
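As a rough illustration of how a few of the subcommands above fit together, the sketch below assembles the command lines for a typical upload-and-inspect sequence. It only builds argument lists and executes nothing; the helper names and the example paths (`/tmp/logs.txt`, `/user/demo`) are hypothetical.

```python
def dfs(*args):
    """Build the argv list for one `hadoop dfs` call (nothing is executed)."""
    return ["hadoop", "dfs", *args]


def upload_and_inspect(local_file, hdfs_dir):
    """Return the commands for: make a directory, upload, list, read, fetch back."""
    name = local_file.rsplit("/", 1)[-1]
    target = f"{hdfs_dir}/{name}"
    return [
        dfs("-mkdir", hdfs_dir),                     # create the target directory
        dfs("-put", local_file, hdfs_dir),           # copy the local file into HDFS
        dfs("-ls", hdfs_dir),                        # confirm that it arrived
        dfs("-cat", target),                         # print its contents
        dfs("-get", target, local_file + ".copy"),   # copy it back to local disk
    ]


if __name__ == "__main__":
    # Hypothetical paths, chosen only for illustration.
    for command in upload_and_inspect("/tmp/logs.txt", "/user/demo"):
        print(" ".join(command))
```

On a machine with Hadoop installed, each list could be handed to `subprocess.run(command, check=True)` to actually run it.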
    • AWS and Cloud
    • Amazon Web Services
        • Collection of services
            – Pay as you use!
            – S3 (Simple Storage Service): storage in the cloud ($0.140/GB/month); a key-value store (big HashMap!)
            – EC2 (Elastic Compute Cloud): compute in the cloud ($0.085 to $2.6 per computing hour)
            – Elastic MapReduce: run Hadoop jobs on EC2 using data stored in S3
            – Email Service
            – … many more
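The pay-as-you-use prices above lend themselves to a quick back-of-the-envelope estimate. The sketch below uses only the 2011 prices quoted on the slide; the workload (100 GB stored, 10 nodes for 2 hours) is a hypothetical example, and real bills also include items such as data transfer and request charges that are ignored here.

```python
# Prices as quoted on the slide (2011); current AWS pricing differs.
S3_PRICE_PER_GB_MONTH = 0.140
EC2_PRICE_PER_HOUR_LOW = 0.085
EC2_PRICE_PER_HOUR_HIGH = 2.6


def s3_monthly_cost(gigabytes):
    """Storage cost in dollars for keeping `gigabytes` in S3 for one month."""
    return gigabytes * S3_PRICE_PER_GB_MONTH


def ec2_cost(nodes, hours, price_per_hour):
    """Compute cost in dollars for `nodes` instances running `hours` each."""
    return nodes * hours * price_per_hour


if __name__ == "__main__":
    # Hypothetical workload, for illustration only.
    print(f"S3, 100 GB for a month: ${s3_monthly_cost(100):.2f}")
    print(f"EC2, 10 cheapest nodes x 2 h: ${ec2_cost(10, 2, EC2_PRICE_PER_HOUR_LOW):.2f}")
    print(f"EC2, 10 priciest nodes x 2 h: ${ec2_cost(10, 2, EC2_PRICE_PER_HOUR_HIGH):.2f}")
```

The spread between the cheapest and the most expensive instance type is roughly 30x, which is one reason to terminate a cluster as soon as a job finishes.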
    • MapReduce on an EC2 Cluster
        • Create an AWS account and get the keys for authentication
        • Go to src/contrib/ec2 in the Hadoop directory
        • Launch a cluster on EC2
            – % bin/hadoop-ec2 launch-cluster <cluster-name> <#nodes>
        • Log in to the cluster (named test-cluster in this example)
            – % bin/hadoop-ec2 login test-cluster
        • Start the computation
            – # cd /usr/local/hadoop-*
            – # bin/hadoop jar hadoop-*-examples.jar pi 10 10000000
        • Terminate the cluster after use!
            – % bin/hadoop-ec2 terminate-cluster test-cluster
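The `pi 10 10000000` job above estimates π with, as far as the arguments suggest, 10 map tasks and 10,000,000 samples per task. The sketch below shows the general idea as a map step and a reduce step in plain Python. It is an assumption-laden stand-in: it uses ordinary pseudo-random sampling, whereas the bundled Hadoop example may use a different sampling scheme, and the function names are hypothetical.

```python
import random


def map_task(task_id, samples):
    """Map: sample points in the unit square, count those inside the quarter circle."""
    rng = random.Random(task_id)  # per-task seed keeps the sketch reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(partials):
    """Reduce: combine the per-task counts into one estimate of pi."""
    inside = sum(hits for hits, _ in partials)
    total = sum(count for _, count in partials)
    return 4.0 * inside / total


def estimate_pi(num_maps, samples_per_map):
    """Toy driver: run every map task in this process, then reduce once."""
    return reduce_task([map_task(t, samples_per_map) for t in range(num_maps)])


if __name__ == "__main__":
    # Far fewer samples than the cluster job, so it finishes quickly on a laptop.
    print(estimate_pi(10, 10_000))
```

Each map task is independent of the others, which is what lets the framework spread them across the nodes of the cluster.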
    • References
        • Hadoop Project Page:
            – http://hadoop.apache.org/
        • Amazon Web Services:
            – http://aws.amazon.com/
    • Thank You!