Big Data & Hadoop

The future of the information economy

by Thanakrit Lersmethasakul
lersmethasakul@live.com

Big Data Life-cycle Management

Hadoop Architecture

Hadoop Client
Contacts the Name Node for data, or the Job Tracker to submit jobs.

Name Node
Maintains the mapping of file blocks to Data Node slaves.

Job Tracker
Schedules jobs across Task Tracker slaves.

Data Node
Stores and serves blocks of data.

Task Tracker
Runs tasks (work units) within a job.

A Data Node and a Task Tracker share a physical node.

Hadoop Process
MapReduce Example for Word Count

cat *.txt | mapper.pl | sort | reducer.pl > out.txt

Map(in_key, in_value) => list of (out_key, intermediate_value)
Reduce(out_key, list of intermediate_values) => out_value(s)

[Figure: word-count data flow]
Input text: “To Be Or Not To Be?”
Split 1 ... Split i ... Split N: (docid, text)
Map 1 ... Map i ... Map M: (words, counts), e.g. Be, 5 / Be, 12 / Be, 7 / Be, 6
Shuffle: (sorted words, counts)
Reduce 1 ... Reduce i ... Reduce R: (sorted words, sum of counts), e.g. Be, 30
Output File 1 ... Output File i ... Output File R
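
The word-count flow can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not Hadoop code and not the `mapper.pl` / `reducer.pl` scripts from the slide: the function names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are hypothetical, and the tokenizer (lowercase, letters and apostrophes only) is a simplification. It mirrors the Map and Reduce signatures and the sort step of the shell pipeline.

```python
import re
from collections import defaultdict


def map_fn(doc_id, text):
    """Map(in_key, in_value) => list of (out_key, intermediate_value)."""
    # Emit (word, 1) for every word in the split.
    return [(word, 1) for word in re.findall(r"[a-z']+", text.lower())]


def shuffle(pairs):
    """Group intermediate values by key, sorted by key (the `sort` step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(word, counts):
    """Reduce(out_key, list of intermediate_values) => out_value."""
    return word, sum(counts)


def word_count(splits):
    """Run map over every split, shuffle, then reduce each key."""
    intermediate = []
    for doc_id, text in splits.items():
        intermediate.extend(map_fn(doc_id, text))
    return dict(reduce_fn(word, counts) for word, counts in shuffle(intermediate))


if __name__ == "__main__":
    splits = {"doc1": "To Be Or Not To Be?"}
    print(word_count(splits))  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

In a real cluster the map calls run in parallel on different splits and the shuffle moves data between machines; here everything runs in one process so the three stages are easy to follow.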
