BIG DATA & HADOOP
The future of the information economy

by Thanakrit Lersmethasakul
lersmethasakul@live.com
A Technology Blueprint
Big Data Storymap
Big Data Concept
Big Data Architecture
Big Data Ecosystem
Big Data Landscape
Big Data Life-cycle Management
Hadoop Concept
Hadoop Architecture
Hadoop Client: contacts the Name Node for data, or the Job Tracker to submit jobs.

Name Node: maintains the mapping of file blocks to Data Node slaves.

Job Tracker: schedules jobs across Task Tracker slaves.

Data Node: stores and serves blocks of data.

Task Tracker: runs tasks (work units) within a job.

A Data Node and a Task Tracker share a physical node.
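The storage side of this architecture can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hadoop's actual API: the class names, `write_file`, `read_file`, the round-robin block placement, and the block ID format are all hypothetical, and replication and the Job Tracker side are left out.

```python
from collections import defaultdict


class NameNode:
    """Toy stand-in: holds metadata only, mapping each file's blocks to data nodes."""

    def __init__(self):
        # filename -> ordered list of (block_id, data_node_name)
        self.block_map = defaultdict(list)

    def add_block(self, filename, block_id, data_node_name):
        self.block_map[filename].append((block_id, data_node_name))

    def locate(self, filename):
        return list(self.block_map[filename])


class DataNode:
    """Toy stand-in: stores and serves blocks of data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def serve(self, block_id):
        return self.blocks[block_id]


def write_file(name_node, data_nodes, filename, data, block_size):
    """Split data into blocks and place them round-robin (an assumed policy)."""
    nodes = list(data_nodes.values())
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{filename}#{index}"  # hypothetical block ID format
        node = nodes[index % len(nodes)]
        node.store(block_id, data[offset:offset + block_size])
        name_node.add_block(filename, block_id, node.name)


def read_file(name_node, data_nodes, filename):
    """Client path: ask the name node where blocks live, then fetch from data nodes."""
    return "".join(
        data_nodes[node_name].serve(block_id)
        for block_id, node_name in name_node.locate(filename)
    )
```

In this sketch the client never reads file contents from the name node. It only asks where the blocks are and then fetches them from the data nodes, which mirrors the division of roles listed above.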
Hadoop Process
MapReduce Example for Word Count

cat *.txt | mapper.pl | sort | reducer.pl > out.txt
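The pipeline above can be imitated in a single process. The sketch below is an assumption about what `mapper.pl` and `reducer.pl` might do, since their contents are not shown: the mapper emits one tab-separated `word<TAB>1` line per word, the lines are sorted, and the reducer sums runs of identical words. The function names and the word-splitting rule are hypothetical.

```python
import re


def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts over consecutive lines that share the same word.

    Relies on the input being sorted so that equal words are adjacent.
    """
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"


def word_count(lines):
    """Equivalent of: cat input | mapper | sort | reducer."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for row in word_count(["To Be", "Or Not", "To Be?"]):
        print(row)
```

The `sort` step is what lets the reducer work in a single pass with constant memory: it only needs to remember the current word and its running total.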

Example input: “To Be Or Not To Be?”

Data flow (figure):
Split 1 to Split N: (docid, text)
Map 1 to Map M: (docid, text) => (words, counts)
Shuffle: (words, counts) => (sorted words, counts)
Reduce 1 to Reduce R: (sorted words, counts) => (sorted words, sum of counts)
Output File 1 to Output File R

Example: the partial counts (Be, 5), (Be, 12), (Be, 7) and (Be, 6) are reduced to (Be, 30).

Map(in_key, in_value) => list of (out_key, intermediate_value)
Reduce(out_key, list of intermediate_values) => out_value(s)
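The Map and Reduce signatures, together with the shuffle into R reducers, can be sketched as follows. This is a single-process illustration under stated assumptions, not Hadoop code: `map_fn`, `reduce_fn`, `shuffle`, `map_reduce`, and the byte-sum partitioner are hypothetical names and choices made only for this example.

```python
import re
from collections import defaultdict


def map_fn(doc_id, text):
    """Map(in_key, in_value) => list of (out_key, intermediate_value)."""
    return [(word, 1) for word in re.findall(r"[a-z']+", text.lower())]


def reduce_fn(word, counts):
    """Reduce(out_key, list of intermediate_values) => out_value."""
    return sum(counts)


def shuffle(mapped_pairs, num_reducers):
    """Group values by key and route each key to exactly one reducer.

    The byte-sum partitioner is a deterministic toy choice (an assumption).
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        index = sum(key.encode("utf-8")) % num_reducers
        partitions[index][key].append(value)
    return partitions


def map_reduce(splits, num_reducers=2):
    """Run map over every split, shuffle, then reduce each partition.

    Returns one dict per reducer, standing in for Output File 1 to R.
    """
    mapped = [pair for doc_id, text in splits for pair in map_fn(doc_id, text)]
    outputs = []
    for partition in shuffle(mapped, num_reducers):
        # Keys are visited in sorted order, as in the (sorted words, counts) stage.
        outputs.append({key: reduce_fn(key, partition[key]) for key in sorted(partition)})
    return outputs


if __name__ == "__main__":
    splits = [(1, "To be or not to be"), (2, "to be")]
    for i, output in enumerate(map_reduce(splits), start=1):
        print(f"Output File {i}: {output}")
```

Because every occurrence of a word is routed to the same reducer, each reducer can compute a final total on its own, and the output files never need to be merged to get a correct count per word.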
Hadoop Ecosystem
Thank You
