Introduction to Hadoop
 

This is an introduction to Hadoop I gave at the company I work at.
I give a general introduction to Hadoop core: HDFS & MapReduce.

Usage Rights

CC Attribution License


    Introduction to Hadoop: Presentation Transcript

    • Introduction to Hadoop Ron Sher
    • Agenda
      • Big data - big issues
      • Hadoop to the rescue
      • Storage - HDFS
      • Processing - MapReduce
      • Hadoop ecosystem
    • Big Data - Big Issues
      • Volume, Velocity, Variability
      • Lots of data - logs, sensors, social, pictures, video, etc.
      • May not fit a single machine
      • Access to data is slow
      • Hardware may fail
      • Network errors happen
    • Hadoop to the rescue
      • Distributed “operating system”
      • Scalable - many servers of commodity hardware with lots of cores and disks
      • Reliable - detect failures, redundant storage
      • Fault-tolerant - auto-retry, self-healing
      • Simple - use many servers as one really big computer
      • Suitable for batch processing (throughput over
    • Storage - HDFS
      • Hadoop Distributed File System
      • Replicated (3 default) fixed size blocks (64MB default)
      • runs on large clusters of commodity machines
      • Optimized for write once - read many throughput of large files
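
To make the block and replication defaults on this slide concrete, here is a small arithmetic sketch in plain Java. It does not call any Hadoop API; the class name, the method names and the 1 GB file size are assumptions chosen for illustration, while the 64MB block size and replication factor of 3 are the defaults quoted on the slide.

```java
// Illustrative arithmetic only; nothing here talks to a real HDFS cluster.
class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    /** Number of blocks a file is split into: ceiling of fileBytes / blockBytes. */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Raw bytes stored across the cluster: every byte is kept `replication` times. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long fileBytes = 1024 * MB; // hypothetical 1 GB file
        long blockBytes = 64 * MB;  // default block size quoted on the slide
        int replication = 3;        // default replication quoted on the slide

        long blocks = blockCount(fileBytes, blockBytes);
        System.out.println("blocks = " + blocks);                                // 16
        System.out.println("block replicas = " + blocks * replication);          // 48
        System.out.println("raw MB = " + rawBytes(fileBytes, replication) / MB); // 3072
    }
}
```

Under these assumptions a 1 GB file becomes 16 blocks, and the cluster holds 48 block replicas (3 GB of raw storage) for it.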
    • HDFS Architecture http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/images/hdfsarchitecture.png
    • Useful HDFS commands
      • hdfs dfs -get <file name> - copy a file from hdfs to local
      • hdfs dfs -put <file name> [destination] - copy a file from local to hdfs in the specified destination
      • hdfs dfs -cat <file name> - prints a file to stdout
      • hdfs dfs -ls <dir name> - show all files under the specified directory
      • hdfs dfs -mv <file name> <changed name> - rename a file
      • hdfs dfs -rm <file name> - remove a file
      • hdfs dfs -rmr <directory name> - remove a directory
      • hdfs dfs -mkdir <dir name> - creates a directory
    • Processing - MapReduce
      • A distributed data processing model and execution environment that runs on large clusters of commodity machines
      • Responsible for running a job in parallel on many servers
      • Handles re-trying a task that fails, validating complete results
      • Computation moved to the data
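
The "re-trying a task that fails" point can be illustrated with a toy retry loop in plain Java. This is only a sketch of the general idea, not Hadoop's scheduler: the class name, the `runWithRetries` helper and the attempt limit of 4 used in `main` are all assumptions made for the example, and a real cluster may also rerun a failed task on a different node.

```java
import java.util.function.Supplier;

// Toy model of "retry a failed task"; not Hadoop code.
class TaskRetrySketch {
    /** Runs the task until it succeeds or maxAttempts attempts have failed. */
    static <T> T runWithRetries(Supplier<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();   // success: stop retrying
            } catch (RuntimeException e) {
                last = e;            // remember the failure and try again
            }
        }
        throw last;                  // every attempt failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical flaky task: fails on the first two attempts, then succeeds.
        String result = runWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated task failure");
            }
            return "done";
        }, 4);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```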
    • MapReduce Sample - Word Count
      • input: Ini Mini Miny Mo Mo Miny Ini Mo Mini
      • splitting: [Ini Mini Miny] [Mo Mo Miny] [Ini Mo Mini]
      • mapping: [(Ini, 1) (Mini, 1) (Miny, 1)] [(Mo, 1) (Mo, 1) (Miny, 1)] [(Ini, 1) (Mo, 1) (Mini, 1)]
      • shuffling: [(Ini, 1) (Ini, 1)] [(Mini, 1) (Mini, 1)] [(Miny, 1) (Miny, 1)] [(Mo, 1) (Mo, 1) (Mo, 1)]
      • reducing: (Ini, [1,1]) (Mini, [1,1]) (Miny, [1,1]) (Mo, [1,1,1])
      • final result: (Ini, 2) (Mini, 2) (Miny, 2) (Mo, 3)
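
The word count stages in this sample can be walked through in memory with plain Java collections. The sketch below is a single-process simulation for illustration only, with no Hadoop classes involved; the class and method names are assumptions, and the three input splits are the ones from the slide.

```java
import java.util.*;

// In-memory simulation of the map, shuffle and reduce stages; not Hadoop code.
class WordCountSketch {
    /** Map stage: emit a (word, 1) pair for every token in one input split. */
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle stage: group the emitted values by key, ordered by key. */
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce stage: sum the grouped values of a single key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs all three stages over the given splits and returns word -> count. */
    static SortedMap<String, Integer> wordCount(List<String> splits) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String split : splits) {
            mapped.addAll(map(split));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        shuffle(mapped).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("Ini Mini Miny", "Mo Mo Miny", "Ini Mo Mini");
        System.out.println(wordCount(splits)); // {Ini=2, Mini=2, Miny=2, Mo=3}
    }
}
```

In a real job each split would be mapped on a different machine and the shuffle would move data over the network; here everything runs sequentially in one JVM.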
    • How a MapReduce Job Runs in Hadoop http://answers.oreilly.com/uploads/monthly_10_2009/post-118-125676084924_thumb.png
    • Monitoring MR jobs (machine:50030)
    • Useful Commands
      • mapred job -kill <job id> - kill a running job
      • mapred job -status <job id> - show status of a job
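    • Word Count Mapper
      import java.io.IOException;
      import java.util.StringTokenizer;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.MapReduceBase;
      import org.apache.hadoop.mapred.Mapper;
      import org.apache.hadoop.mapred.OutputCollector;
      import org.apache.hadoop.mapred.Reporter;

      // Nested inside the job's outer class; uses the classic org.apache.hadoop.mapred API.
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();  // reused for every token to avoid allocations

        // Called once per input line: key is the byte offset, value is the line text.
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
          String line = value.toString();
          StringTokenizer tokenizer = new StringTokenizer(line);
          while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);  // emit (word, 1)
          }
        }
      }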
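    • Word Count Reducer
      import java.io.IOException;
      import java.util.Iterator;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.MapReduceBase;
      import org.apache.hadoop.mapred.OutputCollector;
      import org.apache.hadoop.mapred.Reducer;
      import org.apache.hadoop.mapred.Reporter;

      // Nested inside the job's outer class; uses the classic org.apache.hadoop.mapred API.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        // Called once per word with all of the counts emitted for it.
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));  // emit (word, total)
        }
      }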
    • Hadoop Ecosystem
      • Hive - SQL like language over big data using MR
      • HBase - distributed, column-oriented database
      • ZooKeeper - coordination service
      • Avro - cross language serialization
      • Pig - language for exploring big data
      • Impala - SQL like directly over HDFS
      • Sqoop - tool for moving data from DBs to HDFS
      • Mahout - machine learning and data mining library
    • Some resources
      • Motivation about hadoop and where it’s going video and whitepaper
      • HDFS Architecture Guide
      • How MapReduce Works With Hadoop
      • HDFS shell commands
      • VM
      • MapReduce tutorial