Application of MapReduce in Cloud Computing

  1. [Introduction | Motivation | Description of First Paper | Description of Second Paper | Comparison | Conclusion | References | End] MapReduce in Cloud Computing. Mohammad Mustaqeem, M.Tech 2nd Year, Reg No: 2011CS17, Computer Science and Engineering Department, Motilal Nehru National Institute of Technology, Allahabad. November 8, 2012.
  2. Outline: 1 Introduction; 2 Motivation; 3 Description of First Paper (Issues; Approach Used: HDFS, MapReduce Programming Model; Example: Word Count); 4 Description of Second Paper (Issues; Approach Used: Architecture, System Mechanism; Example); 5 Comparison; 6 Conclusion.
  3. Introduction: MapReduce is a general-purpose programming model for data-intensive computing. It was introduced by Google in 2004 to construct its web index. It is also used at Yahoo, Facebook, and elsewhere. It uses a parallel computing model that distributes computational tasks across a large number of nodes (approximately 1,000 to 10,000). It is fault-tolerant: it can keep working even when 1,600 of 1,800 nodes fail.
  4. Introduction (cont.): In the MapReduce model, the user has to write only two functions: map and reduce. A few examples that are easily expressed as MapReduce computations: distributed grep, count of URL access frequency, inverted index, mining.
  5. Motivation: Cloud computing refers to services offered by clusters of 1,000 to 10,000 machines [6], e.g. the services offered by Yahoo, Google, etc. Cloud computing delivers computing resources as a service. It may be: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), Storage as a Service (STaaS), etc.
  6. Motivation (cont.): A cloud service differs from a traditional hosting service in the following ways [6]: it is sold on demand, typically by the minute or the hour; it is elastic, so a user can have as much or as little of a service as they want at any given time; it is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access). Amazon Web Services is the largest public cloud provider.
  7. Motivation (cont.): MapReduce is a programming model for large-scale computing [3]. It uses the distributed environment of the cloud to process large amounts of data in a reasonable amount of time. It was inspired by the map and reduce functions of functional programming languages (such as Lisp, Scheme, and Racket) [3]. Map and reduce in Racket [4]: Map: (map f list1) → list2, e.g. (map square '(1 2 3 4 5)) → '(1 4 9 16 25). Reduce: (foldl f init list1) → any, e.g. (foldl + 0 '(1 2 3 4 5)) → 15.
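For readers more familiar with Python, the two Racket calls above have close counterparts in the standard library. This is only an illustration of the functional-programming idea, not part of the slides; `square` is a helper defined here. Note that `functools.reduce` passes the accumulator as the first argument while Racket's `foldl` passes it last, which makes no difference for addition.

```python
from functools import reduce
from operator import add


def square(x):
    """Helper defined for this illustration."""
    return x * x


# map applies a function to every element of a list.
squares = list(map(square, [1, 2, 3, 4, 5]))

# reduce folds a list into a single value, starting from an initial value.
total = reduce(add, [1, 2, 3, 4, 5], 0)

print(squares)  # [1, 4, 9, 16, 25]
print(total)    # 15
```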
  8. Motivation (cont.): However, the map and reduce functions of the MapReduce model are not exactly the same as in functional programming. Map and reduce functions in the MapReduce model: Map: it processes a (key, value) pair and returns a list of (intermediate key, value) pairs: map(k1, v1) → list(k2, v2). Reduce: it merges all intermediate values that share the same intermediate key: reduce(k2, list(v2)) → list(v3).
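The two signatures can be written down as Python type aliases, together with a tiny single-process driver that groups intermediate values by key. This is a sketch under stated assumptions: `map_reduce`, `map_url`, and `reduce_url` are hypothetical names, the log format (URL as the first field of each line) is invented for the example, and nothing here is distributed. The job shown is the "count of URL access frequency" example mentioned in the introduction.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
V3 = TypeVar("V3")

# map(k1, v1) -> list(k2, v2)
MapFn = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce(k2, list(v2)) -> list(v3)
ReduceFn = Callable[[K2, List[V2]], List[V3]]


def map_reduce(records, map_fn: MapFn, reduce_fn: ReduceFn) -> Dict:
    """Hypothetical in-memory driver: map, group by intermediate key, reduce."""
    grouped = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            grouped[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in grouped.items()}


def map_url(log_name, log_lines):
    """Emit (url, 1) per request; assumes the URL is the first field."""
    for line in log_lines:
        yield line.split()[0], 1


def reduce_url(url, counts):
    """Sum the counts for one URL; returns a one-element list(v3)."""
    return [sum(counts)]


logs = [("log-a", ["/home 200", "/about 200"]), ("log-b", ["/home 404"])]
url_counts = map_reduce(logs, map_url, reduce_url)
print(url_counts)  # {'/home': [2], '/about': [1]}
```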
  9. Issues: Gaizhen Yang, "The Application of MapReduce in the Cloud Computing". The paper analyzes Hadoop, an implementation of the MapReduce model. Hadoop processes data in parallel in a distributed manner: it divides the data into logical blocks, processes those blocks in parallel on different machines, and finally combines the results to produce the final result [1]. It is fault-tolerant. One attractive feature of Hadoop is that the user can write the map and reduce functions in any programming language (Hadoop Streaming accepts any executable as a mapper or reducer).
  10. Approach Used: Hadoop is an open-source Java framework for processing large amounts of data on clusters of machines [1]. It implements Google's MapReduce programming model. Yahoo is the biggest contributor to Hadoop [5]. Hadoop has two main components: the Hadoop Distributed File System (HDFS) and MapReduce.
  11. HDFS: HDFS provides distributed storage [1]. As in a traditional file system, files can be deleted, renamed, and so on. HDFS has two types of nodes: Name Node and Data Node. Figure: HDFS Architecture.
  12. HDFS (cont.): Name Node: the Name Node provides the main data services. It is a process that runs on a separate machine. It stores only the metadata of the files and directories. Programmers access files through it. For reliability of the file system, it keeps multiple copies of the same file blocks. Data Node: a Data Node is a process that runs on each individual machine of the cluster. The file blocks are stored in the local file systems of these nodes. Each Data Node periodically sends the metadata of its stored blocks to the Name Node.
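The division of labour between the two node types can be sketched in a few lines of Python. This is a toy model, not the HDFS API: the class and function names, the round-robin placement rule, the 4-byte block size, and the replication factor of 2 are all assumptions chosen to keep the example small. The point it illustrates is that the Name Node holds only metadata (which blocks make up a file and where the copies live), while the Data Nodes hold the block contents and can report what they store.

```python
class DataNode:
    """Stores block contents locally and can report which blocks it holds."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def block_report(self):
        """Metadata periodically sent to the Name Node."""
        return sorted(self.blocks)


class NameNode:
    """Keeps metadata only: file -> block ids, block id -> data node names."""

    def __init__(self, node_names, replication=2):
        self.node_names = list(node_names)
        self.replication = replication
        self.file_blocks = {}      # path -> [block_id, ...]
        self.block_locations = {}  # block_id -> [node name, ...]

    def allocate_block(self, path):
        """Pick target nodes for the next block (assumed round-robin rule)."""
        blocks = self.file_blocks.setdefault(path, [])
        block_id = f"{path}#{len(blocks)}"
        start = len(self.block_locations)
        targets = [
            self.node_names[(start + r) % len(self.node_names)]
            for r in range(self.replication)
        ]
        blocks.append(block_id)
        self.block_locations[block_id] = targets
        return block_id, targets

    def locate(self, path):
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def write_file(name_node, data_nodes, path, data, block_size=4):
    """Client side: ask the Name Node where to put each block, then store copies."""
    for offset in range(0, len(data), block_size):
        block_id, targets = name_node.allocate_block(path)
        for name in targets:
            data_nodes[name].store(block_id, data[offset:offset + block_size])


def read_file(name_node, data_nodes, path):
    """Client side: look up block locations, then read from the first replica."""
    return b"".join(
        data_nodes[targets[0]].blocks[block_id]
        for block_id, targets in name_node.locate(path)
    )


data_nodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
name_node = NameNode(data_nodes.keys())
write_file(name_node, data_nodes, "/logs/a.txt", b"hello hdfs")
print(read_file(name_node, data_nodes, "/logs/a.txt"))  # b'hello hdfs'
```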
  13. MapReduce Programming Model: MapReduce is the key concept behind Hadoop. It is a technique for dividing work across a distributed system. The user has to define only two functions: Map: it processes a (key, value) pair and returns a list of (intermediate key, value) pairs: map(k1, v1) → list(k2, v2). Reduce: it merges all intermediate values that share the same intermediate key: reduce(k2, list(v2)) → list(v3).
  14. MapReduce Programming Model (cont.): Execution phases of a MapReduce application:
      1 The MapReduce library splits the input files into M pieces and copies these pieces to multiple machines.
      2 The master picks idle workers and assigns each one a map task.
      3 Each map worker reads the key-value pairs of its input data, passes each pair to the user-defined map function, and produces the intermediate key-value pairs.
      4 The map worker buffers the output key-value pairs in local memory. It passes these memory locations to the master, which forwards them to the reducers.
      5 After reading the intermediate key-value pairs, each reducer sorts them by intermediate key.
      6 For each intermediate key, the user-defined reduce function is applied to the corresponding intermediate values.
  15. MapReduce Programming Model (cont.): 7 When all map tasks and reduce tasks have completed, the master gives the final output to the user. Figure: Execution phases of a generic MapReduce application.
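The seven steps can be traced in a single-process simulation. This is a sketch only: there is no real master, no network, and no failure handling; `split_input`, `run_map_task`, `run_reduce_task`, and `run_job` are hypothetical names; and partitioning intermediate keys by `hash(key) % R` is an assumption that the slides do not state. The toy job groups words by their length.

```python
from collections import defaultdict


def split_input(records, m):
    """Step 1: split the input into M pieces."""
    return [records[i::m] for i in range(m)]


def run_map_task(piece, map_fn, r):
    """Steps 3-4: apply map_fn and buffer the output in R partitions."""
    partitions = [[] for _ in range(r)]
    for key, value in piece:
        for k2, v2 in map_fn(key, value):
            # Assumed partitioning rule; stable within one Python process.
            partitions[hash(k2) % r].append((k2, v2))
    return partitions


def run_reduce_task(map_outputs, partition, reduce_fn):
    """Steps 5-6: read one partition from every map task, sort, then reduce."""
    pairs = sorted(
        (pair for output in map_outputs for pair in output[partition]),
        key=lambda pair: pair[0],
    )
    grouped = defaultdict(list)
    for k2, v2 in pairs:
        grouped[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in grouped.items()}


def run_job(records, map_fn, reduce_fn, m=3, r=2):
    pieces = split_input(records, m)                             # step 1
    map_outputs = [run_map_task(p, map_fn, r) for p in pieces]   # steps 2-4
    result = {}
    for partition in range(r):                                   # steps 5-6
        result.update(run_reduce_task(map_outputs, partition, reduce_fn))
    return result                                                # step 7


def words_by_length_map(line_no, text):
    for word in text.split():
        yield len(word), word


def words_by_length_reduce(length, words):
    return sorted(set(words))


records = [(0, "map reduce"), (1, "cloud"), (2, "hadoop map"), (3, "p2p")]
job_output = run_job(records, words_by_length_map, words_by_length_reduce)
print(job_output)
```

In a real deployment each map and reduce task would run on a different worker chosen by the master; here the list comprehension and the loop over partitions stand in for that scheduling.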
  16. Example: Word Count. The pseudocode of the map and reduce functions for the word count problem is:
      Algorithm 3.1: MAPPER(filename, file-contents)
        for each word ∈ file-contents do
          EMIT(word, 1)
      Algorithm 3.2: REDUCER(word, values)
        sum ← 0
        for each value ∈ values do
          sum ← sum + value
        EMIT(word, sum)
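The same word count can be written in Python in the style of Hadoop Streaming, where mapper and reducer exchange tab-separated `key<TAB>value` lines and the reducer receives its input sorted by key. This is a sketch, not the paper's code: the functions take iterables of lines rather than reading standard input so that the example runs locally, and `sorted` stands in for the framework's shuffle-and-sort phase.

```python
from itertools import groupby


def mapper(lines):
    """MAPPER: emit 'word<TAB>1' for every word of every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """REDUCER: sum the counts of consecutive lines that share a word."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


text = ["the quick fox", "the lazy dog", "the fox"]
# sorted() plays the role of the shuffle-and-sort between map and reduce.
counts = list(reducer(sorted(mapper(text))))
print(counts)  # ['dog\t1', 'fox\t2', 'lazy\t1', 'quick\t1', 'the\t3']
```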
  17. Example: Word Count (cont.): Figure: Word Count Execution.
  18. Issues: Fabrizio Marozzo, Domenico Talia, Paolo Trunfio, "P2P-MapReduce: Parallel data processing in dynamic Cloud environments". The MapReduce discussed so far is centralized and cannot deal with master failure. Since nodes join and leave the cloud dynamically, a P2P-MapReduce model is needed. This paper describes an adaptive P2P-MapReduce system that can handle master failure.
  19. Approach Used: P2P-MapReduce is a programming model in which nodes may join and leave the cluster dynamically. Each node acts as either a master or a slave at any given time. Nodes switch between the master and slave roles dynamically so that the master/slave ratio remains constant. To prevent loss of computation when a master fails, each master has some backup masters. The master responsible for a job J is referred to as the primary master for J. The primary master dynamically updates the job state on its backup nodes, which are referred to as the backup masters for J. When a primary master fails, its place is taken by one of its backup masters.
  20. Architecture: There are three types of nodes in the P2P-MapReduce architecture: User, Master, and Slave. The master and slave nodes form two logical peer-to-peer networks, M-net and S-net respectively. The composition of M-net and S-net changes dynamically. A user node submits the MapReduce job to one of the available master nodes. The master node is selected according to the current workload of the available master nodes.
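Workload-based master selection can be sketched as picking the least-loaded master. The slide does not say how workload is measured or how ties are broken, so both are assumptions here: workload is taken to be the number of jobs a master is currently managing, and ties go to the smallest node id. `select_master` is a hypothetical name.

```python
def select_master(master_loads):
    """Return the id of the least-loaded master (ties broken by node id)."""
    if not master_loads:
        raise ValueError("no master nodes available")
    return min(master_loads, key=lambda node: (master_loads[node], node))


# Assumed workload metric: number of jobs each master currently manages.
loads = {"master-a": 3, "master-b": 1, "master-c": 1}
chosen = select_master(loads)
loads[chosen] += 1  # the submitted job now counts towards that master's load
print(chosen)  # master-b
```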
  21. Architecture (cont.): Master nodes perform three types of operations [2]: Management: a master node that is acting as primary master for one or more jobs executes the management operation. Recovery: a master node that is acting as backup master for one or more jobs executes the recovery operation. Coordination: the coordination operation changes slaves into masters and vice versa, so as to keep the desired master/slave ratio. A slave executes the tasks assigned to it by one or more primary masters.
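The coordination operation can be illustrated with a function that promotes or demotes nodes until the desired master/slave ratio is met. This is a sketch under several assumptions that the slides do not state: the ratio is read as masters per slave, candidates are picked in node-id order, at least one master is always kept, and masters that are busy with jobs are not treated specially. `rebalance` is a hypothetical name.

```python
def rebalance(roles, master_slave_ratio):
    """Return new roles so that masters/slaves is close to the desired ratio."""
    total = len(roles)
    # With ratio = masters / slaves, masters = total * ratio / (1 + ratio).
    wanted = max(1, round(total * master_slave_ratio / (1 + master_slave_ratio)))
    masters = sorted(n for n, role in roles.items() if role == "master")
    slaves = sorted(n for n, role in roles.items() if role == "slave")
    new_roles = dict(roles)
    # Too few masters: promote slaves (assumed order: smallest id first).
    for node in slaves[: max(0, wanted - len(masters))]:
        new_roles[node] = "master"
    # Too many masters: demote the surplus.
    for node in masters[wanted:]:
        new_roles[node] = "slave"
    return new_roles


# Ten nodes with a single master; one master per four slaves is desired.
roles = {f"n{i:02d}": "slave" for i in range(1, 11)}
roles["n01"] = "master"
balanced = rebalance(roles, master_slave_ratio=0.25)
print(sorted(n for n, role in balanced.items() if role == "master"))
```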
  22. Architecture (cont.): For each job it manages, a primary master runs one Job Manager. Backup masters run Backup Job Managers. For each task assigned to it, a slave runs one Task Manager. The Task Manager keeps reporting to its Job Manager. The report includes the status of the slave (ACTIVE or DEAD) and how much of the computation has been done. If a master does not receive the signal from a Task Manager, it reschedules the assigned task on another idle slave. In addition, if a slave works slowly, the master also reschedules the assigned task on another idle slave, accepts whichever output arrives first, and discards the other.
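The rescheduling rule can be sketched as a check a Job Manager might run over the latest reports from its Task Managers. The thresholds are assumptions, since the slides give no numbers: a task counts as silent after 10 seconds without a report and as slow when its progress is below half of the average. `TaskStatus`, `tasks_to_reschedule`, and `accept_result` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    slave: str
    last_heartbeat: float  # time of the last report, in seconds
    progress: float        # fraction of the task completed, 0.0 to 1.0


def tasks_to_reschedule(statuses, now, timeout=10.0, slow_fraction=0.5):
    """Return ids of tasks whose slave is silent or much slower than average."""
    if not statuses:
        return []
    mean_progress = sum(s.progress for s in statuses.values()) / len(statuses)
    chosen = []
    for task_id, status in sorted(statuses.items()):
        silent = now - status.last_heartbeat > timeout
        slow = status.progress < slow_fraction * mean_progress
        if silent or slow:
            chosen.append(task_id)
    return chosen


def accept_result(results, task_id, output):
    """Keep the first output reported for a task; discard later duplicates."""
    if task_id in results:
        return False
    results[task_id] = output
    return True


statuses = {
    "t1": TaskStatus("s1", last_heartbeat=98.0, progress=0.9),
    "t2": TaskStatus("s2", last_heartbeat=80.0, progress=0.8),  # silent
    "t3": TaskStatus("s3", last_heartbeat=99.0, progress=0.1),  # slow
}
late = tasks_to_reschedule(statuses, now=100.0)
print(late)  # ['t2', 't3']

results = {}
first = accept_result(results, "t3", "output from the backup slave")
second = accept_result(results, "t3", "output from the slow slave")
```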
  23. System Mechanism: The behaviour of a generic node can be understood from a UML state diagram [2]. Figure: Behaviour of a generic node described by a UML state diagram.
  24. Example: Figure: P2P-MapReduce example.
  25. Example (cont.): The following recovery procedure takes place when the primary master Node1 fails [2]: Backup masters Node2 and Node3 detect the failure of Node1 and start a distributed procedure to elect the new primary master among them. Assuming that Node3 is elected as the new primary master, Node2 continues to act as a backup and, to keep the desired number of backup masters active, Node3 chooses another backup node. Node3 uses its local replica of the job state to proceed from the point where Node1 failed.
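The recovery procedure can be traced with a small model of one job. This is a sketch, not the paper's protocol: the election is reduced to "the backup with the highest node id wins" (an assumption chosen so that Node3 is elected, as in the example), failure detection is not modelled, and `Job`, `update_state`, and `recover` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    primary: str
    backups: list
    replicas: dict = field(default_factory=dict)  # node -> copy of job state

    def update_state(self, state):
        """The primary pushes the latest job state to every backup master."""
        for node in self.backups:
            self.replicas[node] = dict(state)


def recover(job, spare_masters, backups_wanted=2):
    """Handle failure of job.primary; return the state the new primary resumes from."""
    new_primary = max(job.backups)                # assumed election rule
    state = job.replicas.pop(new_primary, {})     # local replica of the job state
    job.primary = new_primary
    job.backups = [n for n in job.backups if n != new_primary]
    # The new primary recruits further backups to restore the desired count.
    for node in sorted(spare_masters):
        if len(job.backups) >= backups_wanted:
            break
        if node != new_primary and node not in job.backups:
            job.backups.append(node)
            job.replicas[node] = dict(state)
    return state


job = Job(primary="Node1", backups=["Node2", "Node3"])
job.update_state({"completed_tasks": 5})

# Node1 fails: Node3 takes over, Node2 stays a backup, Node4 is recruited.
resumed = recover(job, spare_masters=["Node4"])
print(job.primary, job.backups, resumed)
```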
  26. Comparison between the two papers:

      | | First Paper | Second Paper |
      |---|---|---|
      | Issues | To perform data-intensive computation in a Cloud environment in a reasonable amount of time. | To design a P2P MapReduce system that can handle all node failures, including master node failure. |
      | Approaches Used | A simple MapReduce (presented by Google) implementation is used. The implemented version is known as Hadoop, which is based on the Master-Slave model. | Peer-to-peer architecture is used to handle all the dynamic churn in a cluster. |
      | Advantages | Hadoop is scalable, reliable, and distributed, and able to handle enormous amounts of data. It can process big data in real time. | P2P-MapReduce can manage node churn, master failures, and job recovery in an effective way. |

      Table: Comparison between two Papers.
  27. Conclusion: MapReduce is a scalable, reliable computing model that exploits the distributed environment of the cloud. MapReduce optimizes system performance by rescheduling slow tasks on multiple slaves. P2P-MapReduce has all the properties of simple MapReduce. Because P2P-MapReduce also provides fault tolerance against master failures, it is more reliable. P2P-MapReduce prevents computation loss by keeping the job state at backup masters.
  28. References:
      [1] Gaizhen Yang, "The Application of MapReduce in the Cloud Computing", International Symposium on Intelligence Information Processing and Trusted Computing (IPTC), October 2011, pp. 154-156, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6103560.
      [2] Fabrizio Marozzo, Domenico Talia, Paolo Trunfio, "P2P-MapReduce: Parallel data processing in dynamic Cloud environments", Journal of Computer and System Sciences, vol. 78, issue 5, September 2012, pp. 1382-1402, http://dl.acm.org/citation.cfm?id=2240494.
      [3] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: simplified data processing on large clusters", OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, vol. 6, 2004, pp. 10-10, www.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf and http://dl.acm.org/citation.cfm?id=1251254.1251264.
  29. References (cont.):
      [4] The Racket Guide, http://docs.racket-lang.org/guide/.
      [5] Hadoop Tutorial - YDN, http://developer.yahoo.com/hadoop/tutorial/module4.html.
      [6] http://readwrite.com/2012/10/15/why-the-future-of-software-and-apps-is-serverless.
      [7] F. Marozzo, D. Talia, P. Trunfio, "A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud Environments", in: N. Antonopoulos, L. Gillam (eds.), Cloud Computing: Principles, Systems and Applications, Springer, Chapter 7, pp. 113-125, 2010.
      [8] IBM developerWorks, "Using MapReduce and load balancing on the cloud", http://www.ibm.com/developerworks/cloud/library/cl-mapreduce/.
  30. Thank you.