An Introduction to Hadoop

Hello

Goals

Data Everywhere
“Every two days now we create as much information as we did from the dawn of civilization up until 2003”

The Hadoop Project

Hadoop Components
Storage (HDFS): Self-healing high-bandwidth clustered storage
Processing (MapReduce): Fault-tolerant distributed processing

Typical Cluster

2 Kinds of Nodes
Master Nodes
Slave Nodes

Master Nodes

Slave Nodes

HDFS Basics

HDFS Data

NameNode

SecondaryNameNode

DataNode

Self-healing

HDFS Data Storage
NameNode
  foo.txt: blk_1, blk_2, blk_3
  bar.txt: blk_4, blk_5
DataNodes (each block appears three times across the DataNodes)
  blk_1 blk_2 blk_3 blk_5 blk_1 blk_3 blk_4 blk_1 blk_4 blk_5 blk_2 blk_4 blk_2 blk_3 blk_5
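
The file-to-block map and the replicated blocks above can be modeled in a few lines of Python. This is a toy sketch, not HDFS code: the `place_blocks` helper, the DataNode names `dn1` to `dn5`, and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware). The replication factor of 3 matches the listing, where every block appears three times.

```python
from itertools import cycle

# File-to-block mapping held by the NameNode (taken from the slide).
namenode = {
    "foo.txt": ["blk_1", "blk_2", "blk_3"],
    "bar.txt": ["blk_4", "blk_5"],
}


def place_blocks(files, datanodes, replication=3):
    """Assign every block to `replication` distinct DataNodes, round-robin.

    Hypothetical placement policy, used only to illustrate replication.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {node: [] for node in datanodes}
    ring = cycle(datanodes)
    for blocks in files.values():
        for block in blocks:
            # Consecutive picks from the ring are distinct nodes because
            # replication <= len(datanodes).
            for _ in range(replication):
                placement[next(ring)].append(block)
    return placement


# Hypothetical DataNode names.
placement = place_blocks(namenode, ["dn1", "dn2", "dn3", "dn4", "dn5"])
for node, blocks in placement.items():
    print(node, blocks)
```

Because each block lives on three different nodes, losing any single DataNode in this model still leaves two copies of every block it held.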

What is MapReduce?

Features of MapReduce

JobTracker

Two Parts

map()
map(key_in, value_in) -> (key_out, value_out)

reduce()

map() Word Count
map(String input_key, String input_value)
  foreach word w in input_value
    emit(w, 1)
Input:
  (1234, “to be or not to be”)
  (5678, “to see or not to see”)
Output:
  (“to”,1), (“be”,1), (“or”,1), (“not”,1), (“to”,1), (“be”,1)
  (“to”,1), (“see”,1), (“or”,1), (“not”,1), (“to”,1), (“see”,1)
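
The map step above can be tried out locally in plain Python. This is a minimal sketch rather than Hadoop API code: the function name `map_word_count` and the use of a generator in place of `emit` are assumptions for illustration.

```python
def map_word_count(input_key, input_value):
    """Emit (word, 1) for every word in the line; the input key is unused."""
    for word in input_value.split():
        yield (word, 1)


# The two input records from the slide.
records = [
    (1234, "to be or not to be"),
    (5678, "to see or not to see"),
]

# Run the mapper over every record and collect the emitted pairs.
pairs = [kv for key, value in records for kv in map_word_count(key, value)]
print(pairs)
```

Each record is processed independently of the others, which is what lets a framework run many mappers in parallel on different parts of the input.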

reduce() Word Count
reduce(String output_key, List intermediate_vals)
  set count = 0
  foreach v in intermediate_vals:
    count += v
  emit(output_key, count)
Input:
  (“to”,[1,1,1,1]) (“be”,[1,1]) (“or”,[1,1]) (“not”,[1,1]) (“see”,[1,1])
Output:
  (“to”,4) (“be”,2) (“or”,2) (“not”,2) (“see”,2)
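
The reduce step can be sketched the same way in plain Python. Between map and reduce, the framework groups all values that share a key; the `group_by_key` helper below is a hypothetical stand-in for that grouping, and `reduce_word_count` is an illustrative name, not part of any Hadoop API. The intermediate pairs are the ones the word-count mapper emits for the two sample lines.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values emitted for the same key (stand-in for the shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_word_count(output_key, intermediate_vals):
    """Sum the counts collected for one word."""
    count = 0
    for v in intermediate_vals:
        count += v
    return (output_key, count)


# Intermediate (word, 1) pairs for "to be or not to be" and "to see or not to see".
intermediate = [
    ("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1),
    ("to", 1), ("see", 1), ("or", 1), ("not", 1), ("to", 1), ("see", 1),
]

grouped = group_by_key(intermediate)
results = [reduce_word_count(word, vals) for word, vals in grouped.items()]
print(results)
# [('to', 4), ('be', 2), ('or', 2), ('not', 2), ('see', 2)]
```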

Resources
http://hadoop.apache.org/
http://developer.yahoo.com/hadoop/
http://www.cloudera.com/resources/?media=Video
