Module 04 - MapReduce Framework
NPN Training
Training is the essence of success & we are committed to it
www.npntraining.com
Course Topics
Module - I: Understanding Big Data
Module - II: Hadoop 1.x & 2.x Architecture
Module - III: Hadoop Setup and Configuration
Module - IV: MapReduce Framework – I
Module - V: Hive and Hive Query Language
Module - VI: Advance Hive using Java
Module - VII: MapReduce Framework – II
Module - VIII: No SQL & HBase
Module - IX: Advance HBase using Java
Module - X: MapReduce Framework – III
Module - XI: MR Unit
Module - XII: Pig and Pig Latin
Module - XIII: Advance Pig using Java
Module - XIV: Sqoop
Module - XV: Hue
Module - XVI: Project Discussion & Use case
www.npntraining.com/courses/big-data-and-hadoop.php
Topics for the Module
What is MapReduce Framework
Map, Reduce Paradigm
Overview of Record Reader & Input Splits
Executing Map Reduce programs
Data Flow in MapReduce
Word Count Implementation
Relation between InputSplits & HDFS Block
Role of Key and Value Pairs
Exploring different command line options in executing MapReduce programs
Hadoop Datatypes
Hadoop MapReduce Framework
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
In the MapReduce programming model, work is divided into two phases:
a) Map Phase
b) Reduce Phase
The Map phase takes a piece of input and performs some operation on it (e.g. extracting a field).
The Reduce phase aggregates similar pieces of information produced by the Map phase (e.g. averaging fields with the same name).
These pieces of information are represented as key-value pairs.
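The two phases can be sketched in a few lines of plain Python. This is only an illustration of the model, not Hadoop code: the record layout (`name,department,salary`), the sample rows, and the function names `map_phase` and `reduce_phase` are all assumptions made up for the example. The map step extracts a field and emits a key-value pair; the reduce step averages the values that share a key.

```python
from collections import defaultdict

# Hypothetical input records in the form "name,department,salary".
records = [
    "e1,sales,3000",
    "e2,sales,2000",
    "e3,hr,9000",
    "e4,hr,6000",
]

def map_phase(record):
    """Extract the fields of interest and emit one (key, value) pair."""
    _name, department, salary = record.split(",")
    return department, int(salary)

def reduce_phase(key, values):
    """Aggregate all values that share a key; here, their average."""
    return key, sum(values) / len(values)

# Group the map output by key. In Hadoop the framework does this
# between the two phases; here a dictionary stands in for it.
grouped = defaultdict(list)
for record in records:
    key, value = map_phase(record)
    grouped[key].append(value)

averages = dict(reduce_phase(key, values) for key, values in grouped.items())
print(averages)
```

The developer writes only the two functions; the grouping in the middle is the part a framework would take over.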
What the developer writes: Map and Reduce
Employee.dat (100 MB) is stored as two blocks: 64 MB and 36 MB. Each block is read through an InputSplit by its own Map Task.
Map Class
1. Read the data from the input file
2. Write business logic for processing the data
3. Send the result (output). This intermediate data is temporary data stored on the local FS.
Hadoop Framework does sort and shuffling of Map data to Reducer
Reduce Class
1. Read all the output of the maps
2. Aggregation or consolidation logic
3. Final output to HDFS; blocks and replication will be there
Values shown in the diagram: 3000, 2000, 9000, 6000
Sample.dat (100 MB), stored as two blocks: 64 MB and 36 MB
Well come to NPN Training
we promise we teach you the best
We teach various course like Java,J2EE,Selenium,Hadoop
Each time, a Map Task reads one individual key-value pair:
Key --> Byte offset of the line
Value --> Entire line
Block 1
0, Well come to NPN Training
26, we promise we teach you the best
Block 2
0, We teach various course like Java,J2EE,Selenium,Hadoop
(The key 26 counts the one-byte newline that ends the first line. The key 0 for Block 2 is relative to the block for illustration only; TextInputFormat measures offsets from the start of the file.)
By default Hadoop uses the TextInputFormat class, which is responsible for creating InputSplits and dividing them into records.
The TextInputFormat class creates the key-value pairs.
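The way a text file becomes (byte offset, line) records can be imitated in plain Python. The sketch below is not Hadoop's implementation; `text_records` is a hypothetical helper, and it assumes each line ends with a single `\n` byte. Under that assumption the newline counts toward the offset, so the second line of the sample starts at byte 26.

```python
def text_records(data: bytes):
    """Yield (byte_offset, line) pairs, one per newline-terminated line."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # The value is the line without its terminator.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        # The terminator still occupies bytes, so it moves the next offset.
        offset += len(raw_line)

sample = (
    b"Well come to NPN Training\n"
    b"we promise we teach you the best\n"
    b"We teach various course like Java,J2EE,Selenium,Hadoop\n"
)

records = list(text_records(sample))
for key, value in records:
    print(key, value)
```

The map function usually ignores the key and works only on the line, but the offset is what lets the framework tell records apart within one file.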
Word Count Use Case
Let's assume we have a large collection of text documents in a folder (say 1000 documents, each with an average of 1 million words).
We have to count how many times each word is repeated in the documents.
How would you solve this using a simple Java program?
How many lines of code will you write?
How long will the program take to execute?
MapReduce Paradigm
Input
Split 1:
C++ J2EE Python LISP
Python Java Python
Split 2:
JSP Python LISP
Python Servlet JSP
Input Split to (K1, V1), done by the framework (offsets count the one-byte newline ending each line)
Split 1: <0, C++ J2EE Python LISP> <21, Python Java Python>
Split 2: <0, JSP Python LISP> <16, Python Servlet JSP>
Mapper output, List (K2, V2)
Mapper 1: <C++, 1> <J2EE, 1> <Python, 1> <LISP, 1> <Python, 1> <Java, 1> <Python, 1>
Mapper 2: <JSP, 1> <Python, 1> <LISP, 1> <Python, 1> <Servlet, 1> <JSP, 1>
Shuffle & Sort, done by the framework: reducer input, (K2, list(V2))
<C++, [1]>
<Java, [1]>
<J2EE, [1]>
<LISP, [1, 1]>
<Python, [1, 1, 1, 1, 1]>
<JSP, [1, 1]>
<Servlet, [1]>
Reducer output, List (K3, V3)
(C++, 1)
(Java, 1)
(J2EE, 1)
(LISP, 2)
(Python, 5)
(JSP, 2)
(Servlet, 1)
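The word count flow on this slide can be traced with a small plain-Python simulation. It is a sketch of the paradigm only, not the Hadoop API: the function names `mapper`, `shuffle_and_sort`, and `reducer` are chosen for the example, and a single process stands in for the cluster. Run on the slide's four input lines, it gives JSP a count of 2, because the second split contains JSP twice, and it includes Servlet with a count of 1.

```python
from collections import defaultdict

splits = [
    ["C++ J2EE Python LISP", "Python Java Python"],  # input of mapper 1
    ["JSP Python LISP", "Python Servlet JSP"],       # input of mapper 2
]

def mapper(line):
    """Emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]

def shuffle_and_sort(pairs):
    """Group values by key and order the keys, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reducer(word, counts):
    """Sum the list of ones collected for a word."""
    return word, sum(counts)

# Map: every line of every split goes through the mapper.
intermediate = [pair for split in splits for line in split for pair in mapper(line)]

# Shuffle & sort: (K2, V2) pairs become (K2, list(V2)).
reducer_input = shuffle_and_sort(intermediate)

# Reduce: one (K3, V3) pair per distinct word.
word_counts = dict(reducer(word, counts) for word, counts in reducer_input)
print(word_counts)
```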
Word Count Implementation
Map Class
Reduce Class
Driver Class
Why MapReduce
Processing data in parallel
Taking processing to the data
Diagram labels: Map Task, HDFS Block, Node, Rack, Data Center
Input Splits
InputSplits: Logical Division
HDFS Blocks: Physical Division
Relation Between Input Splits and HDFS Blocks
Diagram: records 1 to 11 laid out across three 64 MB blocks, with two splits marked
Blocks are cut in between the records.
The last record of a block may cross the boundary of the block.
Splits are aware of the record positions.
InputFormat is responsible for creating InputSplits and dividing them into records.
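One way to see how a record that crosses a boundary is still read exactly once is the following plain-Python sketch. It follows the general approach of line-oriented record readers, simplified and under stated assumptions: `read_split` is a hypothetical helper, lines end in a single `\n`, and a tiny 16-byte split size replaces 64 MB so the effect is visible with 11 short records. A reader that does not start at byte 0 skips the line already in progress, and every reader finishes the last line it started even if that line runs past the end of its split.

```python
def read_split(data: bytes, start: int, length: int):
    """Return the (offset, line) records that belong to one split."""
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: the line at `start` is read by the previous
        # split's reader, so skip to just past the next newline.
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    lines = []
    # Read every line that starts at or before `end`; the last one may
    # extend beyond the split boundary.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        lines.append((pos, data[pos:stop].decode("utf-8")))
        pos = stop + 1
    return lines

# Eleven 6-byte records, mirroring records 1 to 11 on the slide.
data = b"".join(b"rec%02d\n" % i for i in range(1, 12))

split_size = 16  # assumed tiny size so that records straddle boundaries
splits = [(s, min(split_size, len(data) - s)) for s in range(0, len(data), split_size)]
per_split = [read_split(data, s, n) for s, n in splits]

for (s, n), records in zip(splits, per_split):
    print((s, n), [line for _, line in records])
```

Here the record starting at byte 12 crosses the first boundary at byte 16, yet it appears only in the first split's output.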
Relation Between Input Splits and HDFS Blocks
Diagram: Block, InputSplits, Map Task
A Block is the physical representation of data. A Split is the logical representation of the data present in Blocks.
Block and split sizes can be changed through configuration properties.
A Map task reads data from Blocks through splits, i.e. the split acts as a broker between Block and Mapper.
Suppose a map task reads block 1 from aa to JJ and the last record continues into block 2. The map task does not know how to read block 2, because a block holds no information about the contents of other blocks. This is where the Split comes in: it forms a logical grouping of Block 1 and Block 2 as a single unit. The InputFormat and RecordReader then form offset (key) and line (value) pairs from it and send them to the map for further processing.
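How block size and split-size settings interact can be sketched as below. The formula `max(min_size, min(max_size, block_size))` matches, to the best of my knowledge, what Hadoop's `FileInputFormat` uses to pick a split size; everything else is a simplification, and `plan_splits` is a hypothetical helper that ignores Hadoop's special handling of a small trailing remainder. In Hadoop 2 the limits are usually set through `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize`, and the block size through `dfs.blocksize`.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size is the block size, clamped between the min and max settings."""
    return max(min_size, min(max_size, block_size))

def plan_splits(file_size, split_size):
    """Return (offset, length) pairs that cover the whole file."""
    return [
        (offset, min(split_size, file_size - offset))
        for offset in range(0, file_size, split_size)
    ]

# Defaults: a 100 MB file with 64 MB blocks, one split per block.
default_plan = plan_splits(100 * MB, compute_split_size(64 * MB))

# Lowering the maximum split size produces more, smaller splits
# (and therefore more map tasks) without touching the blocks.
smaller_plan = plan_splits(100 * MB, compute_split_size(64 * MB, max_size=32 * MB))

print([length // MB for _, length in default_plan])
print([length // MB for _, length in smaller_plan])
```

Because splits are logical, changing their size only changes how the file is handed to map tasks; the physical blocks in HDFS stay as they are.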
Hadoop MapReduce: A Closer Look
Each node (Node 1, Node 2) runs the same pipeline:
1. Files are loaded from the local HDFS store
2. InputFormat divides the files into Splits
3. RecordReaders (RR) turn each Split into input (K, V) pairs
4. Map tasks produce intermediate (K, V) pairs
5. Partitioner
6. Shuffling process: intermediate (K, V) pairs are exchanged by all nodes
7. Sort
8. Reduce produces final (K, V) pairs
9. OutputFormat: writeback to the local HDFS store
Data Flow in MapReduce
Input data is distributed to nodes
Each map task works on a “split” of data
Mapper outputs intermediate data
Data exchange between nodes in a “shuffle” process
Intermediate data of the same key goes to the same reducer
Reducer output is stored
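The rule that intermediate data with the same key goes to the same reducer can be illustrated with a hash-based partitioner. The sketch below is plain Python under stated assumptions: two nodes, two reducers, made-up map output, and `zlib.crc32` as a stand-in hash. Hadoop's default `HashPartitioner` applies the same idea to the key's Java hash code, taken modulo the number of reduce tasks.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # assumed for illustration

def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key; crc32 stands in for the key's hash code."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Hypothetical intermediate pairs emitted by map tasks on two nodes.
map_outputs = {
    "node1": [("Python", 1), ("LISP", 1), ("Python", 1)],
    "node2": [("Python", 1), ("JSP", 1), ("LISP", 1)],
}

# Shuffle: each node sends every pair to the reducer chosen by the partitioner,
# so all values of one key meet at a single reducer.
reducer_inputs = defaultdict(lambda: defaultdict(list))
for node, pairs in map_outputs.items():
    for key, value in pairs:
        reducer_inputs[partition(key, NUM_REDUCERS)][key].append(value)

# Reduce: each reducer processes its keys in sorted order and stores the result.
output = {
    reducer: {key: sum(values) for key, values in sorted(keys.items())}
    for reducer, keys in reducer_inputs.items()
}
print(output)
```

Since the partition depends only on the key, no word can end up split between two reducers, whichever node produced it.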
Agenda for Next Class
Overview of Hive and its Architecture
Understanding Hive Metastore
Schema on Read and Schema on Write
Hive Data Model and Complex Data types
Internal vs External Tables
Exporting Data from Hive
www.npntraining.com +91 9535584691