Hadoop MapReduce

SEMINAR ON
Android App Development
Trained by:
Hewlett-Packard Education Services,
Mumbai
Presented to:
Mr. R.K. Banyal
Mr. Hukum Chand Saini
By:
Urvashi Kataria

About HPES:
• American global IT company headquartered in Palo Alto, California, US.
• Provider of products, software, technologies, solutions and services to individuals as well as small and medium-sized businesses.
• Major operations include HP Software, HP Financial Services and Corporate Investments.
• Provides practical training in fields like Big Data, Android App Dev, Embedded Systems, etc.

About Birthday Bash:
An Android application that lets you enjoy your own birthday as well as those of your dear ones.
Save the dates, get reminded of them, capture moments on the day itself, get greeted by the app, and celebrate!

The home screen:

Calculating age and further:

Saving name for specified date:

Happy Birthday!

Presentation on:
Hadoop MapReduce
(Map + Reduce)

Why MapReduce?
• Large-scale data processing was difficult!
o Managing hundreds or thousands of processors
o Managing parallelization and distribution
o Reliable execution with easy data access
MapReduce provides all of these, easily!

What is Hadoop MapReduce?

Hadoop Cluster
HDFS (Physical) Storage

MapReduce Objects

How Map and Reduce Work Together

Hadoop MapReduce: A Closer Look
[Figure: MapReduce data flow, shown side by side for Node 1 and Node 2. On each node: files loaded from local HDFS store -> InputFormat -> Splits -> RecordReaders (RR) -> input (K, V) pairs -> Map -> intermediate (K, V) pairs -> Partitioner -> Sort -> Reduce -> final (K, V) pairs -> OutputFormat -> writeback to local HDFS store. Shuffling process between the nodes: intermediate (K, V) pairs are exchanged by all nodes.]
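The partition, shuffle and sort steps in the data flow above can be sketched in a few lines of Python. This is a single-process illustration, not Hadoop code: the function names `partition` and `shuffle` and the two sample nodes are hypothetical, and CRC32 stands in for the hash function (Hadoop's default partitioner is also hash-based, but uses the key's Java hash code).

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    # Deterministic hash so a given key always goes to the same reducer.
    # Assumption: CRC32 is used here only for illustration.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(map_outputs, num_reducers):
    """map_outputs: one list of intermediate (key, value) pairs per node."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for node_pairs in map_outputs:
        for key, value in node_pairs:
            # Every node sends each pair to the reducer chosen by the partitioner.
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order, with all values grouped per key.
    return [sorted(bucket.items()) for bucket in buckets]

# Hypothetical intermediate output from the map tasks on two nodes.
node1 = [("a", 1), ("b", 1)]
node2 = [("a", 1), ("c", 1)]
reducer_inputs = shuffle([node1, node2], num_reducers=2)
```

Whatever the hash values turn out to be, both `("a", 1)` pairs land in the same reducer's input, which is the property the reduce step relies on.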

Algorithm
map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)
reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)

map(key=url, val=contents):
    for each word w in contents:
        emit(w, "1")
reduce(key=word, values=uniq_counts):
    // Sum all "1"s in values list
    emit result "(word, sum)"
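As a runnable counterpart to the pseudocode above, here is a minimal word-count sketch in plain Python. It runs in a single process, with no Hadoop involved. `map_fn` and `reduce_fn` mirror the pseudocode, while `run_job` and the sample documents are hypothetical stand-ins for the framework's grouping step and for the input data.

```python
from collections import defaultdict

def map_fn(key, value):
    """key: document name; value: text of the document."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """key: a word; values: an iterable of counts."""
    yield key, sum(values)

def run_job(documents):
    """Hypothetical driver: map every document, group by key, then reduce."""
    groups = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)
    result = {}
    for word in sorted(groups):  # reducers receive keys in sorted order
        for key, total in reduce_fn(word, groups[word]):
            result[key] = total
    return result

counts = run_job({"doc1": "the quick fox", "doc2": "the lazy dog the end"})
print(counts)
```

In a real cluster the grouping done inside `run_job` is performed by the framework across nodes, so the programmer supplies only the two functions.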

The very famous:
Word Count Example

Ways to MapReduce
Libraries Languages
Note: Java is most common, but other languages can be used
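One way other languages are used is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below imitates that contract in Python. It is written as functions over line iterators so it can run without a cluster, and the sample input and the in-memory `sorted` call (standing in for the framework's sort between the two phases) are assumptions made for illustration.

```python
import io

def mapper(lines):
    # Emit one "word<TAB>1" line per word, as a streaming mapper would print.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Input arrives sorted by key, so equal words are adjacent.
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

# Stand-in for stdin and for the sort the framework performs between phases.
mapped = sorted(mapper(io.StringIO("b a\na c\n")))
reduced = list(reducer(mapped))
print(reduced)
```

In an actual streaming job the two functions would live in separate scripts that loop over `sys.stdin` and `print` their output.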

Common Data Sources for MapReduce Jobs

Service Providers
• Open Source
o Apache
• Commercial
o Cloudera
o Hortonworks
o MapR
o Amazon Elastic MapReduce (Amazon EMR)
o Microsoft HDInsight (Beta)

Advancements:
MRv1 & MRv2
MRv2 (MapReduce version 2)
• Splits the existing JobTracker’s roles into:
o Resource management
o Job lifecycle management
• MapReduce 2.0 provides many benefits over the existing MapReduce framework:
o Better scalability through distributed job lifecycle management
o Support for multiple Hadoop MapReduce API versions in a single cluster

Better MapReduce - Optimizations

Advantages of MapReduce
• Distributed data and computation.
• Tasks are independent, so entire nodes can fail and their tasks can be restarted.
• Linear scaling in the ideal case. It is designed to run on cheap commodity hardware.
• Simple programming model. The end-user programmer only writes the map and reduce tasks.

Disadvantages / Cases where MR isn’t a suitable choice:
• Real-time processing.
• Not every problem is easy to express as a MapReduce program.
• When your intermediate processes need to talk to each other.
• When your processing requires a lot of data to be shuffled over the network.
• When you need to handle streaming data. MR is best suited to batch processing of huge amounts of data that you already have.

Limitations of MapReduce

RDBMS vs. Hadoop
                      Traditional RDBMS          Hadoop / MapReduce
Data Size             Gigabytes (Terabytes)      Petabytes (Exabytes)
Access                Interactive and Batch      Batch – NOT Interactive
Updates               Read / Write many times    Write once, Read many times
Structure             Static Schema              Dynamic Schema
Integrity             High (ACID)                Low
Scaling               Nonlinear                  Linear
Query Response Time   Can be near immediate      Has latency (due to batch processing)

