Hadoop
MapReduce
HELLO!
I am Yogender Singh
I work with Entrench Electronic and penthao
as a big data and Spark developer.
MapReduce
Introduction
Agenda For Today’s Session
◉ What is Hadoop MapReduce?
◉ MapReduce in a Nutshell
◉ Two Advantages of MapReduce
◉ Hadoop MapReduce Approach with an Example
What is MapReduce?
Components of Hadoop
2 main Hadoop components:
◉ Storage
◉ Processing (MapReduce)
MapReduce: Data Processing and Programming
◉ MapReduce is the processing component of Apache Hadoop.
◉ It processes data in parallel in a distributed environment.
MapReduce in a Nutshell
2 Biggest Advantages
of MapReduce
Advantage 1: Parallel Processing
◉ Data is processed in parallel.
◉ Processing becomes faster.
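As a rough illustration of parallel processing, the sketch below hands each chunk of data to a separate worker and combines the partial results at the end. The chunks, the `count_words` helper and the worker count are assumptions made for this example. It uses Python threads only to show the structure; a real Hadoop job spreads the work across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Process one chunk independently of all the others."""
    return len(chunk.split())


def parallel_word_total(chunks, workers=3):
    """Hand each chunk to a worker, then combine the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)


# Hypothetical data, already divided into three chunks.
chunks = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
total = parallel_word_total(chunks)
print(total)
```

Because no chunk depends on another, adding workers (or machines) lets more chunks be processed at the same time.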
Advantage 2: Data Locality – Processing to Storage
◉ Moving data to the processing is very costly.
◉ In MapReduce, we move the processing to the data.
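The cost difference can be sketched with a toy model that counts how many values cross the network in each approach. The node names, the records and the choice of a simple sum as the computation are all hypothetical; this is an illustration of the idea, not a model of real Hadoop I/O.

```python
# Hypothetical cluster: node name -> records stored on that node.
partitions = {
    "node_1": [4, 8, 15],
    "node_2": [16, 23],
    "node_3": [42, 7, 1, 9],
}


def move_data_to_processing(partitions):
    """Ship every record to one central machine, then compute there."""
    central_copy = [r for records in partitions.values() for r in records]
    values_moved = len(central_copy)  # every record crosses the network
    return sum(central_copy), values_moved


def move_processing_to_data(partitions):
    """Compute on each node where the data lives, ship only the partial results."""
    partial_sums = [sum(records) for records in partitions.values()]
    values_moved = len(partial_sums)  # one small result per node
    return sum(partial_sums), values_moved


central_total, central_moved = move_data_to_processing(partitions)
local_total, local_moved = move_processing_to_data(partitions)
print(central_total, central_moved)
print(local_total, local_moved)
```

Both approaches give the same total, but the second moves one value per node instead of every record.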
Traditional vs MapReduce
Election votes counting: Traditional way
Election Vote Casting
◉ Votes are stored at different booths.
◉ The result centre has the details of all the booths.
Election votes counting: MapReduce way
Counting - MapReduce Approach
◉ Votes are counted at the individual booths.
◉ Booth-wise results are sent back to the result centre.
◉ The final result is declared easily and quickly this way.
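The booth-wise counting can be sketched in a few lines. The booth names, the candidates "A", "B" and "C", and the ballots are made up for this example; `count_at_booth` and `declare_result` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical ballots cast at three booths.
booths = {
    "booth_1": ["A", "B", "A"],
    "booth_2": ["B", "B", "C"],
    "booth_3": ["A", "C", "A"],
}


def count_at_booth(ballots):
    """Count the votes locally, at the booth where they are stored."""
    return Counter(ballots)


def declare_result(booth_results):
    """At the result centre, add up the booth-wise tallies."""
    final = Counter()
    for tally in booth_results:
        final.update(tally)
    return final


booth_results = [count_at_booth(ballots) for ballots in booths.values()]
final_result = declare_result(booth_results)
print(final_result.most_common())
```

Only the small per-booth tallies travel to the result centre; the ballots themselves never leave the booths.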
MapReduce way:
INPUT -> Map(), Map(), Map() -> Reduce(), Reduce() -> OUTPUT
Anatomy of a MapReduce Program
MapReduce (Key, Value)
Map: (k1, v1) -> list(k2, v2)
Reduce: (k2, list(v2)) -> list(k3, v3)
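The two signatures can be made concrete with a tiny in-memory driver. This is a minimal sketch, not Hadoop's API: `run_map_reduce`, `vote_mapper` and `vote_reducer` are hypothetical names, and the grouping step stands in for what the framework does between the map and reduce phases.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Apply mapper to each (k1, v1), group by k2, then apply reducer."""
    # Map: (k1, v1) -> list(k2, v2)
    intermediate = []
    for k1, v1 in records:
        intermediate.extend(mapper(k1, v1))

    # Group all values that share the same k2: (k2, list(v2))
    grouped = defaultdict(list)
    for k2, v2 in intermediate:
        grouped[k2].append(v2)

    # Reduce: (k2, list(v2)) -> list(k3, v3)
    output = []
    for k2 in sorted(grouped):
        output.extend(reducer(k2, grouped[k2]))
    return output


def vote_mapper(booth, candidate):
    """k1 = booth, v1 = candidate; emit (candidate, 1)."""
    return [(candidate, 1)]


def vote_reducer(candidate, ones):
    """k2 = candidate, list(v2) = ones; emit (candidate, total)."""
    return [(candidate, sum(ones))]


# Hypothetical input records: (booth, candidate voted for).
records = [("booth_1", "A"), ("booth_1", "B"), ("booth_2", "A")]
result = run_map_reduce(records, vote_mapper, vote_reducer)
print(result)
```

The programmer supplies only the mapper and the reducer; everything inside `run_map_reduce` is the part the framework handles.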
Example (MapReduce Problem)
MapReduce way – Word Count Process
The overall MapReduce Word Count Process
Input:
Deer Bear River
Car Car River
Deer Car Bear
Splitting:
Split 1: Deer Bear River
Split 2: Car Car River
Split 3: Deer Car Bear
Mapping:
Split 1: Deer,1  Bear,1  River,1
Split 2: Car,1  Car,1  River,1
Split 3: Deer,1  Car,1  Bear,1
Shuffling:
Bear,(1,1)
Car,(1,1,1)
Deer,(1,1)
River,(1,1)
Reducing:
Bear,2
Car,3
Deer,2
River,2
Final Result:
Bear,2
Car,3
Deer,2
River,2
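The word count stages can be traced with a short in-memory sketch. It uses the same three input lines as the slides; the function names are hypothetical, and everything runs in one process, whereas Hadoop would run each stage across the cluster.

```python
from collections import defaultdict

text = "Deer Bear River\nCar Car River\nDeer Car Bear"


def split_input(text):
    """Splitting: one split per input line."""
    return text.splitlines()


def map_split(split):
    """Mapping: emit (word, 1) for every word in the split."""
    return [(word, 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Shuffling: group the 1s by word, with words in sorted order."""
    groups = defaultdict(list)
    for word, one in mapped_pairs:
        groups[word].append(one)
    return dict(sorted(groups.items()))


def reduce_group(word, ones):
    """Reducing: sum the 1s collected for one word."""
    return (word, sum(ones))


splits = split_input(text)
mapped = [pair for split in splits for pair in map_split(split)]
grouped = shuffle(mapped)
final_result = [reduce_group(word, ones) for word, ones in grouped.items()]
print(final_result)
# [('Bear', 2), ('Car', 3), ('Deer', 2), ('River', 2)]
```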
THANKS!
Any questions?
You can reach me at
yash991314@gmail.com / yogi@entrench.org
