
HADOOP MAPREDUCE
Darwade Sandip
MNIT Jaipur
December 25, 2013
Outline
- What is Hadoop
- What is MapReduce
- Components of Hadoop
- Architecture
- Implementation
- Bibliography
What is Hadoop?
- The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
- Hadoop is best known for MapReduce, its distributed filesystem (HDFS), and large-scale data processing.
MapReduce
- A programming model for data processing
- Hadoop can run MapReduce programs written in various languages, e.g. Java and Python
- Parallel processing makes MapReduce well suited to very large-scale data analysis
- The Mapper produces intermediate results
- The Reducer aggregates the results
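The model described above, mappers emitting intermediate key/value pairs that reducers then aggregate, can be sketched in a few lines of Python. This is an illustrative in-memory toy, not Hadoop's actual API:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy, in-memory sketch of the MapReduce model: the mapper emits
    intermediate key/value pairs, which are grouped by key and handed
    to the reducer for aggregation."""
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)
    return {key: reducer(key, values) for key, values in intermediate.items()}

# Word count expressed in this model.
def word_mapper(line):
    for word in line.split():
        yield word, 1

def sum_reducer(word, counts):
    return sum(counts)

result = map_reduce(["a b a", "b c"], word_mapper, sum_reducer)
# result == {"a": 2, "b": 2, "c": 1}
```

In real Hadoop the intermediate pairs are partitioned across many reduce tasks on different machines; the grouping dictionary here stands in for that shuffle.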
Components of Hadoop
Two main components of Hadoop:
- HDFS
- MapReduce

HDFS
- Files stored in HDFS are divided into blocks, which are then copied to multiple DataNodes
- A Hadoop cluster contains only one NameNode and many DataNodes
- Data blocks are replicated for high availability and fast access
HDFS
NameNode
- Runs on a separate machine
- Manages the file system namespace and controls access by external clients
- Stores file system metadata in memory: file information, block information for each file, and the DataNode location of every block

DataNode
- Runs on a separate machine; the basic unit of file storage
- Periodically reports its existing blocks to the NameNode
- Responds to read and write requests, and to create, delete, and copy block commands from the NameNode
MapReduce
- Files are split into fixed-size blocks (64 MB by default) and stored on DataNodes
- Programs written against this model can run on distributed clusters in parallel
- Input data is a set of key/value pairs; the output is also key/value pairs
- Two main phases: Map and Reduce
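The fixed-size splitting mentioned above is simple to picture: a byte stream is cut into equal-sized blocks, with only the last block allowed to be shorter. A minimal sketch (the function name is illustrative, not an HDFS API):

```python
def split_into_blocks(data, block_size):
    # HDFS divides a file into fixed-size blocks (64 MB by default in
    # classic Hadoop); the last block may be shorter than block_size.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# A 150-byte "file" with a 64-byte block size yields blocks of 64, 64, 22 bytes.
blocks = split_into_blocks(b"x" * 150, 64)
```

Each such block can then be processed by a separate map task, which is what makes the parallelism in the next slides possible.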
MapReduce (continued)
Figure: MapReduce Process Architecture
MapReduce (continued)
Map
- Processes each block separately, in parallel
- Generates a set of intermediate key/value pairs
- Results of these logical blocks are reassembled

Reduce
- Accepts an intermediate key and its related values
- Processes the intermediate key and values
- Forms a relatively small set of values
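The "reassembly" step between the two phases, usually called the shuffle, sorts the intermediate pairs by key and gathers each key's values so that a single reduce call sees all of them. A minimal sketch of that grouping:

```python
from itertools import groupby
from operator import itemgetter

def shuffle(intermediate_pairs):
    """Sketch of the shuffle between map and reduce: intermediate
    key/value pairs are sorted by key, then each key's values are
    gathered into one list for a single reduce invocation."""
    pairs = sorted(intermediate_pairs, key=itemgetter(0))
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, [value for _, value in group]

pairs = [("b", 1), ("a", 1), ("b", 1)]
# list(shuffle(pairs)) == [("a", [1]), ("b", [1, 1])]
```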
How Hadoop runs a MapReduce job
- The client, which submits the MapReduce job.
- The jobtracker, which coordinates the job.
- The tasktrackers, which run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.
- The distributed filesystem, which is used for sharing job files between the other entities.
How Hadoop runs a MapReduce job
- Job Submission
- Job Initialization
- Task Assignment
- Task Execution
- Job Completion
How Hadoop runs a MapReduce job
Figure: How Hadoop runs a MapReduce job using the classic framework
How Hadoop runs a MapReduce job
Job Submission
- The submit() method creates an internal JobSubmitter and calls submitJobInternal()
- waitForCompletion() polls the job's progress once per second
- JobSubmitter then:
  - Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker)
  - Checks the output specification of the job
  - Computes the input splits for the job
  - Copies the resources
  - Tells the jobtracker that the job is ready for execution by calling submitJob()
How Hadoop runs a MapReduce job
Job Initialization
- When the JobTracker receives a call to submitJob(), it puts the job into an internal queue
- It retrieves the input splits computed by the client from the shared filesystem

Task Assignment
- Tasktrackers periodically send heartbeats to the jobtracker
- The jobtracker assigns tasks to tasktrackers
How Hadoop runs a MapReduce job
Task Execution
- The next step for the TaskTracker is to run the task
- It localizes the job JAR by copying it from the shared filesystem to the local disk
- It creates an instance of TaskRunner to run the task

Job Completion
- When the jobtracker receives notification that the last task for a job is complete, it changes the job's status to "successful"
- The user learns of this when waitForCompletion() returns
- The jobtracker then cleans up its working state
Implementation
Figure: Minimum Temperature
Implementation
Figure: Maximum Temperature
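The minimum- and maximum-temperature figures did not survive extraction. A hedged Python sketch of what such a job computes is given below; it assumes each input line holds "year<TAB>temperature", which is an illustrative record format, not the one from the original slides. The minimum-temperature variant is identical with min in place of max:

```python
from collections import defaultdict

def max_temperature(lines):
    # Map phase: parse each record into a (year, temperature) pair.
    # (The "year\ttemperature" line format is an assumption for this sketch.)
    pairs = []
    for line in lines:
        year, temp = line.strip().split("\t")
        pairs.append((year, int(temp)))
    # Reduce phase: keep the maximum temperature seen for each year.
    best = defaultdict(lambda: float("-inf"))
    for year, temp in pairs:
        best[year] = max(best[year], temp)
    return dict(best)

records = ["1950\t0", "1950\t22", "1949\t111"]
# max_temperature(records) == {"1950": 22, "1949": 111}
```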
Implementation (continued)
Figure: Word Count
Implementation (continued)
Figure: Word Count
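The word-count figures were likewise lost. Since an earlier slide notes that Hadoop runs MapReduce programs written in Python as well as Java, here is a Hadoop Streaming-style sketch: the mapper emits one "word<TAB>1" record per word, and the reducer, whose input arrives sorted by key, sums contiguous counts. (In real Streaming these functions would read stdin and write stdout; taking iterables keeps the sketch self-contained.)

```python
def streaming_mapper(lines):
    """Map step of word count in Hadoop Streaming style:
    emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def streaming_reducer(sorted_lines):
    """Reduce step: input is sorted by key, so counts for one word
    are contiguous and can be summed with a running total."""
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

# sorted() stands in for Hadoop's shuffle between the two steps.
mapped = sorted(streaming_mapper(["the cat", "the dog"]))
# list(streaming_reducer(mapped)) == ["cat\t1", "dog\t1", "the\t2"]
```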
Bibliography I
- G. Yang, "The application of mapreduce in the cloud computing," Intelligence Information Processing and Trusted Computing (IPTC) 2011, vol. 9, pp. 154–156, Oct 2011.
- X. Zhang, G. Wang, Z. Yang, and Y. Ding, "A two-phase execution engine of reduce tasks in hadoop mapreduce," 2012 International Conference on Systems and Informatics (ICSAI 2012), pp. 858–864, May 2012.
- T. White, Hadoop: The Definitive Guide, Third Edition. Sebastopol, CA: O'Reilly Media, Inc., 2012.
- J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Operating System Design and Implementation (OSDI 2004), vol. 6, pp. 137–150, 2004.
- X. Lin, Z. Meng, C. Xu, and M. Wang, "A practical performance model for hadoop mapreduce," 2012 IEEE International Conference on Cluster Computing Workshops, pp. 231–239, Sept 2012.
Bibliography II
- Z. Gua, M. Pierce, G. Fox, and M. Zhou, "Automatic task re-organization in mapreduce," 2011 IEEE International Conference on Cluster Computing, pp. 335–343, May 2011.
- K. Wang, X. Lin, and W. Tang, "An experience guided configuration optimizer for hadoop mapreduce," Cloud Computing Technology and Science (CloudCom), pp. 419–426, Dec 2012.