Introduction to Hadoop

What is MapReduce?
• Originated at Google (OSDI '04: Operating Systems Design and Implementation)
• A simple programming model for data processing
• Designed for processing large datasets

MapReduce features
• Parallel execution
• Runs on commodity hardware
• Fault tolerance

The three phases of MapReduce
• Map
• Shuffle
• Reduce
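The three phases can be modeled in a few lines of single-process Python. This is only a toy sketch to show how data flows between the phases; `run_mapreduce`, `word_count_map`, and `sum_reduce` are hypothetical names for illustration, not part of Hadoop, and a real cluster distributes each phase across many machines.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy, in-memory model of the map, shuffle, and reduce phases."""
    # Map: every input (key, value) pair yields zero or more intermediate pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Shuffle: group all intermediate values that share a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: collapse each key's list of values into output pairs.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


def word_count_map(_key, line):
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield (word, 1)


def sum_reduce(key, values):
    # Emit the total count for one word.
    yield (key, sum(values))


result = run_mapreduce([(0, "a b a"), (1, "b c")], word_count_map, sum_reduce)
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```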

Example for map
• Let map(k, v) =
    foreach char c in v:
        emit(k, c)
• (“A”, “cats”) -> (“A”, “c”), (“A”, “a”), (“A”, “t”), (“A”, “s”)
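The same map function written as a Python generator, as a rough illustration. Using `yield` in place of `emit` is an assumption of this sketch, and `map_chars` is a hypothetical name.

```python
def map_chars(k, v):
    # Emit one (key, character) pair per character of the value.
    for c in v:
        yield (k, c)


pairs = list(map_chars("A", "cats"))
print(pairs)  # [('A', 'c'), ('A', 'a'), ('A', 't'), ('A', 's')]
```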

Second map example
• Let map(k, v) =
    emit(k.toUpper(), v.toUpper())
• (“foo”, “bar”) -> (“FOO”, “BAR”)
• (“Foo”, “other”) -> (“FOO”, “OTHER”)
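A possible Python version of this map function, with `yield` standing in for `emit` and `map_upper` as a hypothetical name. Note that a map function may change the key: both inputs end up under the key "FOO", so they would land in the same group during the shuffle.

```python
def map_upper(k, v):
    # Emit exactly one pair, with both key and value upper-cased.
    yield (k.upper(), v.upper())


out = list(map_upper("foo", "bar")) + list(map_upper("Foo", "other"))
print(out)  # [('FOO', 'BAR'), ('FOO', 'OTHER')]
```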

Third map example
• Let map(k, v) =
    if (isPrime(v)) then emit(k, v)
• (“foo”, 7) -> (“foo”, 7)
• (“test”, 10) -> (nothing)
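This example shows that a map function can act as a filter and emit nothing at all. A small Python sketch follows; the trial-division `is_prime` helper is an assumption of this sketch, since the slide does not say how `isPrime` is implemented.

```python
def is_prime(n):
    # Trial division; adequate for a small illustration.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def map_primes(k, v):
    # Emit the pair only when the value is prime; otherwise emit nothing.
    if is_prime(v):
        yield (k, v)


print(list(map_primes("foo", 7)))    # [('foo', 7)]
print(list(map_primes("test", 10)))  # []
```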

Reduce example
• Let reduce(k, vals) =
    sum = 0
    foreach int v in vals:
        sum += v
    emit(k, sum)
• (“A”, [42, 100, 312]) -> (“A”, 454)
• (“B”, [12, 6, -2]) -> (“B”, 16)
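The summing reducer can be sketched in Python as follows. It receives one key together with all values the shuffle grouped under it; `reduce_sum` is a hypothetical name and `yield` stands in for `emit`.

```python
def reduce_sum(k, vals):
    # Add up every value grouped under this key and emit a single pair.
    total = 0
    for v in vals:
        total += v
    yield (k, total)


print(list(reduce_sum("A", [42, 100, 312])))  # [('A', 454)]
print(list(reduce_sum("B", [12, 6, -2])))     # [('B', 16)]
```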

Interface InputFormat
• Two methods
• getSplits: how to split the input data
• getRecordReader: how to read the input data
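To make the division of labor between the two methods concrete, here is a toy Python analogue for line-oriented input. It is not the Hadoop API: `get_splits` and `get_record_reader` are hypothetical functions, and the rule that a line belongs to the split in which it starts is an assumption of this sketch.

```python
def get_splits(data: bytes, num_splits: int):
    """Cut the input into roughly equal (start, length) byte ranges."""
    size = max(1, -(-len(data) // num_splits))  # ceiling division
    return [(start, min(size, len(data) - start))
            for start in range(0, len(data), size)]


def get_record_reader(data: bytes, split):
    """Yield (byte_offset, line) for every line that starts inside the split."""
    start, length = split
    end = start + length
    pos = start
    if start != 0:
        # A line that began before this split belongs to the previous split,
        # so move forward to the first line that starts at or after `start`.
        newline = data.find(b"\n", start - 1)
        pos = len(data) if newline == -1 else newline + 1
    while pos < end:
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        # The last line of a split may extend past the split boundary.
        yield (pos, data[pos:line_end].decode())
        pos = line_end + 1


data = b"alpha\nbeta\ngamma\ndelta\n"
splits = get_splits(data, 3)
records = [rec for s in splits for rec in get_record_reader(data, s)]
print(splits)
print(records)
```

Split boundaries fall in the middle of lines here, yet each line is read exactly once, which is the property a record reader has to guarantee.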

Calculating the number of map tasks
• goalSize = totalSize / mapred.map.tasks
• mapred.map.tasks is set in the job configuration and is only a hint
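A small worked sketch of why the configured value is only a hint. It assumes the split size rule of the classic file-based input format, splitSize = max(minSize, min(goalSize, blockSize)), and treats the input as one file; the function names and the 64 MB block size are assumptions for illustration, and the real implementation works per file with some extra slack.

```python
MB = 1024 * 1024


def compute_split_size(total_size, num_map_hint, block_size, min_size=1):
    # goalSize = totalSize / mapred.map.tasks
    goal_size = total_size // max(num_map_hint, 1)
    # A split never exceeds one block and never drops below the minimum size.
    return max(min_size, min(goal_size, block_size))


def estimate_map_tasks(total_size, num_map_hint, block_size, min_size=1):
    split_size = compute_split_size(total_size, num_map_hint, block_size, min_size)
    return -(-total_size // split_size)  # ceiling division: one map per split


# Hint of 2 maps for 1 GB of input: the block size caps the split at 64 MB.
print(estimate_map_tasks(1024 * MB, 2, 64 * MB))   # 16
# Hint of 32 maps: the goal size (32 MB) is below the block size, so it wins.
print(estimate_map_tasks(1024 * MB, 32, 64 * MB))  # 32
```

Under these assumptions, a hint smaller than the number of blocks is overridden, while a larger hint produces smaller splits and more map tasks.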

Number of reduces
• 0.95 or 1.75, multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum)
• At 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish.
• At 1.75, the faster nodes will finish their first round of reduces and launch a second round of reduces, doing a much better job of load balancing.
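The two factors are applied to the total number of reduce slots in the cluster, that is, the number of nodes times the per-node reduce slot limit. A minimal sketch of the arithmetic follows; the function name, the example cluster of 10 nodes with 2 slots each, and rounding to the nearest integer are assumptions for illustration.

```python
def suggested_reduces(nodes, reduce_slots_per_node, factor):
    # Total reduce slots available across the cluster.
    total_slots = nodes * reduce_slots_per_node
    # Scale by the chosen factor; rounding is a choice made for this sketch.
    return round(factor * total_slots)


# Slightly fewer reduces than slots: everything runs in a single wave.
print(suggested_reduces(10, 2, 0.95))  # 19
# More reduces than slots: fast nodes pick up a second wave.
print(suggested_reduces(10, 2, 1.75))  # 35
```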

What is HDFS?
• Originated at Google again (SOSP '03: Symposium on Operating Systems Principles)
• Redundant storage of massive amounts of data on cheap and unreliable computers

HDFS features
• Files stored as blocks
• Reliability through replication
• Single master (NN, the NameNode) coordinates access and metadata
• No data caching
• Familiar interface
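The first two features, block storage and replication, can be illustrated with a toy model. This sketch assumes a replication factor of 3 and a simple round-robin placement over hypothetical DataNode names; real HDFS placement is rack-aware and more involved, and the sizes used here are arbitrary units.

```python
def split_into_blocks(file_size, block_size):
    """Return the size of each block; only the last block may be shorter."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Requires replication <= len(datanodes) so that replicas are distinct.
    """
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200, 64)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(blocks)        # [64, 64, 64, 8]
print(placement[0])  # ['dn1', 'dn2', 'dn3']

# If one node fails, every block still has replicas on the surviving nodes.
survivors = {b: [n for n in nodes if n != "dn1"] for b, nodes in placement.items()}
print(min(len(nodes) for nodes in survivors.values()))  # 2
```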

NN SPOF and failure resistance
• Store metadata in different places (local disk / shared storage)
• Secondary NN: merges the edit log with the fsimage, which reduces recovery time
• NN HA
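Why merging the edit log into the fsimage shortens recovery can be seen in a toy model: on restart the NameNode loads the last image and replays every logged edit, so a shorter log means less work. The sketch below is only an illustration; the dictionary-based namespace, the operation names, and the `checkpoint` function are assumptions of this sketch, not HDFS's on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> metadata)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"replication": args[0]}
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a new image and return it with an empty log."""
    merged = dict(fsimage)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


fsimage = {"/a": {"replication": 3}}
edit_log = [
    ("create", "/b", 3),
    ("rename", "/a", "/c"),
    ("delete", "/b"),
]

new_image, new_log = checkpoint(fsimage, edit_log)
print(new_image)  # {'/c': {'replication': 3}}
print(new_log)    # []
```

After the checkpoint, a restart only has to load `new_image`; there are no edits left to replay.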

Resources & Events
• http://class10e.com/Cloudera/
• http://blog.cloudera.com/blog/
• Hadoop Summit: http://hadoopsummit.org/
• Hadoop World: http://www.hadoopworld.com/
