Distributed Computing: Presentation Transcript

    • Distributed Computing
      Varun Thacker
      Linux User’s Group Manipal
      April 8, 2010
    • Outline
      1 Introduction: LUG Manipal; Points To Remember
      2 Distributed Computing: Technologies to be covered; Idea; Data!!; Why Distributed Computing is Hard; Why Distributed Computing is Important; Three Common Distributed Architectures
      3 Distributed File System (GFS): What a Distributed File System Does; Google File System Architecture; GFS Architecture: Chunks; GFS Architecture: Master; GFS: Life of a Read; GFS: Life of a Write; GFS: Master Failure
      4 MapReduce: Do We Need It?; Bad News!; MapReduce; Map Reduce Paradigm; MapReduce Paradigm; Working; Under the hood: Scheduling; Robustness
      5 Hadoop: What is Hadoop; Who uses Hadoop?; Mapper; Combiners; Reducer; Some Terminology; Job Distribution
      6 Contact Information
      7 Attribution
      8 Copying
    • Who are we? Linux User’s Group Manipal
      Life, Universe and FOSS!!
      Believers in knowledge sharing
      The most technologically focused “group” in the University
      LUG Manipal is a non-profit “group”, alive only on voluntary work!!
      http://lugmanipal.org
    • Points To Remember!!!
      If you have problem(s), don’t hesitate to ask.
      Slides are based on documentation, so the discussions are what really matter; the slides are for later reference!!
      Please don’t treat the sessions as classes (classes are boring!!).
      The speaker is just like any person sitting next to you.
      Documentation is really important.
      Google is your friend.
      If you have questions after this workshop, mail me or come to LUG Manipal’s forums: http://forums.lugmanipal.org
    • Distributed Computing
    • Technologies to be covered
      Distributed computing refers to the use of distributed systems to solve computational problems.
      A distributed system consists of multiple computers that communicate through a network.
      MapReduce is a framework which implements the idea of distributed computing.
      GFS is the distributed file system on which distributed programs store and process data at Google. Its free implementation is HDFS.
      Hadoop is an open source framework written in Java which implements the MapReduce technology.
    • Idea
      While the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up.
      One-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
      The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data: working in parallel, we could read the data in under two minutes.
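      A quick back-of-the-envelope check of those figures (treating 1 TB as 10^12 bytes and assuming a sustained 100 MB/s transfer rate):

          \[
            t_{\text{single}} = \frac{10^{12}\,\text{B}}{10^{8}\,\text{B/s}} = 10^{4}\,\text{s} \approx 2.8\,\text{h},
            \qquad
            t_{\text{100 drives}} \approx \frac{10^{4}\,\text{s}}{100} = 100\,\text{s} < 2\,\text{min}.
          \]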
    • Data
      We live in the data age. An IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006.
      And by 2011 there will be a tenfold growth to 1.8 zettabytes.
      1 zettabyte is one million petabytes, or one billion terabytes.
      The New York Stock Exchange generates about one terabyte of new trade data per day.
      Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
      The Large Hadron Collider near Geneva produces about 15 petabytes of data per year.
    • Why Distributed Computing is Hard
      Computers crash.
      Network links crash.
      Talking is slow (even Ethernet has 300 microseconds of latency, during which time your 2 GHz PC can do 600,000 cycles).
      Bandwidth is finite.
      Internet scale: the computers and network are heterogeneous, untrustworthy, and subject to change at any time.
    • Why Distributed Computing is Important
      It can be more reliable.
      It can be faster.
      It can be cheaper (a $30 million Cray versus one hundred $1,000 PCs).
    • Three Common Distributed Architectures
      Hope: have N computers do separate pieces of work. Speed-up < N. Probability of failure = 1 − (1 − p)^N ≈ Np (where p is the probability of an individual crash).
      Replication: have N computers do the same thing. Speed-up < 1. Probability of failure = p^N.
      Master-servant: have 1 computer hand out pieces of work to N − 1 servants, and re-hand out pieces of work if servants fail. Speed-up < N − 1. Probability of failure ≈ p.
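      To get a feel for these formulas, take an illustrative (made-up) per-machine failure probability p = 0.001 and N = 100 machines:

          \[
            \text{Hope: } 1 - (1 - 0.001)^{100} \approx 0.095 \approx Np = 0.1,
            \qquad
            \text{Replication: } p^{N} = 10^{-300},
            \qquad
            \text{Master-servant: } \approx p = 0.001.
          \]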
    • GFS
    • What a Distributed File System Does
      1. The usual file system stuff: create, read, move and find files.
      2. Allow distributed access to files.
      3. Store the files themselves in a distributed way.
      If you just do #1 and #2, you are a network file system.
      To do #3, it’s a good idea to also provide fault tolerance.
    • GFS Architecture
      (architecture diagram)
    • GFS Architecture: Chunks
      Files are divided into 64 MB chunks (the last chunk of a file may be smaller).
      Each chunk is identified by a unique 64-bit id.
      Chunks are stored as regular files on local disks.
      By default, each chunk is stored thrice, preferably on more than one rack.
      To protect data integrity, each 64 KB block gets a 32-bit checksum that is checked on all reads.
      When idle, a chunkserver scans inactive chunks for corruption.
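      As a rough illustration of per-block checksumming (this is not GFS code, and CRC32 merely stands in for “a 32-bit checksum”; the class and method names are invented for this sketch):

          import java.util.zip.CRC32;

          public class BlockChecksums {
              static final int BLOCK_SIZE = 64 * 1024;              // 64 KB blocks, as in GFS

              // One 32-bit checksum per 64 KB block of a chunk.
              static long[] checksum(byte[] chunk) {
                  int blocks = (chunk.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
                  long[] sums = new long[blocks];
                  CRC32 crc = new CRC32();
                  for (int i = 0; i < blocks; i++) {
                      int off = i * BLOCK_SIZE;
                      int len = Math.min(BLOCK_SIZE, chunk.length - off);
                      crc.reset();
                      crc.update(chunk, off, len);
                      sums[i] = crc.getValue();                     // 32-bit value (held in a long)
                  }
                  return sums;
              }

              // On a read, recompute the block checksums and compare before returning data.
              static boolean verify(byte[] chunk, long[] expected) {
                  return java.util.Arrays.equals(checksum(chunk), expected);
              }
          }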
    • GFS Architecture: Master
      Stores all metadata (namespace, access control).
      Stores the (file → chunks) and (chunk → locations) mappings.
      Clients get the chunk locations for a file from the master, and then talk directly to the chunkservers for the data.
      Advantage of a single master: simplicity.
      Disadvantages of a single master:
        Metadata operations are bottlenecked.
        The maximum number of files is limited by the master’s memory.
    • GFS: Life of a Read
      The client program asks for 1 TB of file “A”, starting at the 200 millionth byte.
      The client’s GFS library asks the master for chunks 3, ..., 16387 of file “A”.
      The master responds with all of the locations of chunks 2, ..., 20000 of file “A”.
      The client caches all of these locations (with their cache time-outs).
      The client reads chunk 2 from the closest location.
      The client reads chunk 3 from the closest location.
      ...
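      The chunk indices fall straight out of the fixed 64 MB chunk size; a toy calculation for the read above (illustrative code only, not the GFS client library):

          public class ChunkRange {
              static final long CHUNK_SIZE = 64L * 1024 * 1024;          // 64 MB

              public static void main(String[] args) {
                  long startByte = 200_000_000L;                         // the 200 millionth byte
                  long length    = 1L << 40;                             // a 1 TB read
                  long first = startByte / CHUNK_SIZE;                   // zero-based index: 2
                  long last  = (startByte + length - 1) / CHUNK_SIZE;    // zero-based index: 16386
                  // Counting chunks from 1, as the slide does, this is chunks 3 .. 16387.
                  System.out.println("chunks " + first + " .. " + last);
              }
          }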
    • GFS: Life of a Write
      The client gets the locations of the chunk replicas as before.
      For each chunk, the client sends the write data to the nearest replica.
      That replica sends the data on to the nearest replica to it that has not yet received the data.
      When all of the replicas have received the data, it is safe for them to actually write it.
      Tricky details:
        The master hands out a short-term (about 1 minute) lease for a particular replica to be the primary one.
        This primary replica assigns a serial number to each mutation so that every replica performs the mutations in the same order.
    • GFS: Master Failure
      The master stores its state via periodic checkpoints and a mutation log.
      Both are replicated.
      Master election and notification are implemented using an external lock server.
      A new master restores its state from the checkpoint and log.
    • MapReduce
    • Do We Need It?
      Yes: otherwise some problems are too big.
      Example: 20+ billion web pages × 20 KB = 400+ terabytes.
      One computer can read 30-35 MB/s from disk, so it would take about four months to read the web.
      The same problem with 1,000 machines takes less than 3 hours.
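      Roughly checking that arithmetic (taking 400 TB as 4 × 10^14 bytes and about 35 MB/s per disk):

          \[
            \frac{4\times 10^{14}\,\text{B}}{3.5\times 10^{7}\,\text{B/s}} \approx 1.1\times 10^{7}\,\text{s} \approx 130\,\text{days},
            \qquad
            \frac{1.1\times 10^{7}\,\text{s}}{1000} \approx 1.1\times 10^{4}\,\text{s} \approx 3\,\text{h}.
          \]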
    • Bad News!
      Bad news I:
        communication and coordination
        recovering from machine failure (all the time!)
        debugging
        optimization
        locality
      Bad news II: repeat for every problem you want to solve.
      Good news I and II: MapReduce and Hadoop!
    • MapReduce
      A simple programming model that applies to many large-scale computing problems.
      It hides the messy details in the MapReduce runtime library:
        automatic parallelization
        load balancing
        network and disk transfer optimization
        handling of machine failures
        robustness
      Therefore we can write application-level programs and let MapReduce insulate us from many concerns.
    • Map Reduce Paradigm
      Read a lot of data.
      Map: extract something you care about from each record.
      Shuffle and sort.
      Reduce: aggregate, summarize, filter, or transform.
      Write the results.
    • MapReduce Paradigm
      Basic data type: the key-value pair (k, v). For example, key = URL, value = HTML of the web page.
      The programmer specifies two primary methods:
        Map: (k, v) → <(k1, v1), (k2, v2), (k3, v3), ..., (kn, vn)>
        Reduce: (k', <v'1, v'2, ..., v'n'>) → <(k', v''1), (k', v''2), ..., (k', v''n'')>
      All v' with the same k' are reduced together.
      (Remember the invisible “shuffle and sort” step.)
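      A tiny single-machine illustration of this (k, v) contract, using word count (plain Java streams, no Hadoop; the class and variable names are just for this sketch): Map emits (word, 1) for every word, the shuffle groups pairs by key, and Reduce sums the values.

          import java.util.AbstractMap.SimpleEntry;
          import java.util.Arrays;
          import java.util.List;
          import java.util.Map;
          import java.util.stream.Collectors;

          public class WordCountToy {
              public static void main(String[] args) {
                  List<String> docs = Arrays.asList("the cat sat", "the cat ran");

                  Map<String, Integer> counts = docs.stream()
                      // Map: each record -> a list of (word, 1) pairs
                      .flatMap(text -> Arrays.stream(text.split("\\s+"))
                                             .map(w -> new SimpleEntry<String, Integer>(w, 1)))
                      // Shuffle & sort + Reduce: group pairs by key, sum the values per key
                      .collect(Collectors.groupingBy(SimpleEntry::getKey,
                               Collectors.summingInt(SimpleEntry::getValue)));

                  System.out.println(counts);   // e.g. {the=2, cat=2, sat=1, ran=1} (order may vary)
              }
          }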
    • Working
      (two diagram slides)
    • Under the hood: Scheduling
      One master, many workers.
      The input data is split into M map tasks (typically 64 MB in size).
      The reduce phase is partitioned into R reduce tasks (the number of output files).
      Tasks are assigned to workers dynamically.
      The master assigns each map task to a free worker:
        it considers the locality of data to the worker when assigning the task;
        the worker reads the task input (often from local disk!);
        the worker produces R local files containing intermediate (k, v) pairs.
      The master assigns each reduce task to a free worker:
        the worker reads intermediate (k, v) pairs from the map workers;
        the worker sorts them and applies the user’s Reduce operation to produce the output.
      The user may specify a Partition function: which intermediate keys go to which reducer (see the sketch below).
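      The default partition rule in MapReduce-style systems is typically just a hash of the key modulo R; a minimal sketch of that idea (illustrative, not Hadoop’s actual class):

          public class HashPartition {
              // Route an intermediate key to one of R reduce tasks. All pairs that share
              // a key hash to the same partition, so they meet at the same reducer.
              static int partitionFor(Object key, int numReduceTasks) {
                  return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
              }

              public static void main(String[] args) {
                  int r = 4;
                  System.out.println(partitionFor("the", r));   // the same key always lands
                  System.out.println(partitionFor("the", r));   // in the same partition
              }
          }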
    • Robustness
      One master, many workers.
      Failures are detected via periodic heartbeats.
      On a worker failure, its completed and in-progress map tasks are re-executed.
      Its in-progress reduce tasks are re-executed.
      The master assigns each such task to a free worker.
      Master failure:
        state is checkpointed to a replicated file system;
        a new master recovers and continues.
      Very robust: Google reports once losing 1,600 of 1,800 machines, but the job finished fine.
    • Hadoop
    • What is Hadoop
      Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license.
      Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.
      A Map/Reduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner.
      The map output is then made the input to the reduce tasks.
      The framework takes care of scheduling tasks, monitoring them and re-executing the failed tasks.
    • Who uses Hadoop?
      Adobe
      AOL
      Baidu - the leading Chinese language search engine
      Cloudera, Inc. - provides commercial support and professional training for Hadoop
      Facebook
      Google
      IBM
      Twitter
      Yahoo!
      The New York Times, Last.fm, Hulu, LinkedIn
    • Mapper
      A Mapper maps input key/value pairs to a set of intermediate key/value pairs.
      The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat.
      Output pairs do not need to be of the same types as input pairs.
      Mapper implementations are passed the JobConf for the job.
      The framework then calls the map method for each key/value pair.
      Applications can use the Reporter to report progress.
      All intermediate values associated with a given output key are subsequently grouped by the framework and passed to the Reducer(s) to determine the final output.
      The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format.
      The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
      Users can optionally specify a combiner to perform local aggregation of the intermediate outputs. (A word-count Mapper is sketched below.)
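      A minimal Mapper sketch, modeled on the classic WordCount example that ships with Hadoop’s old org.apache.hadoop.mapred API (the class name WordCountMapper is just for illustration): it emits (word, 1) for every word it sees.

          import java.io.IOException;
          import java.util.StringTokenizer;

          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapred.MapReduceBase;
          import org.apache.hadoop.mapred.Mapper;
          import org.apache.hadoop.mapred.OutputCollector;
          import org.apache.hadoop.mapred.Reporter;

          public class WordCountMapper extends MapReduceBase
                  implements Mapper<LongWritable, Text, Text, IntWritable> {

              private static final IntWritable one = new IntWritable(1);
              private final Text word = new Text();

              // Called once per record: key = byte offset in the file, value = one line of text.
              public void map(LongWritable key, Text value,
                              OutputCollector<Text, IntWritable> output, Reporter reporter)
                      throws IOException {
                  StringTokenizer tokenizer = new StringTokenizer(value.toString());
                  while (tokenizer.hasMoreTokens()) {
                      word.set(tokenizer.nextToken());
                      output.collect(word, one);   // emit the intermediate pair (word, 1)
                  }
              }
          }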
    • Combiners
      When the map operation outputs its pairs, they are already available in memory.
      If a combiner is used, the map key-value pairs are not immediately written to the output.
      Instead they are collected in lists, one list per key.
      When a certain number of key-value pairs have been written, this buffer is flushed: all the values of each key are passed to the combiner’s reduce method, and the key-value pairs of the combine operation are output as if they were created by the original map operation. (The driver sketch under “Job Distribution” below shows a combiner being wired in.)
    • Reducer
      A Reducer reduces a set of intermediate values which share a key to a smaller set of values.
      Reducer implementations are passed the JobConf for the job.
      The framework then calls the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method for each <key, (list of values)> pair in the grouped inputs.
      The reducer has 3 primary phases:
        Shuffle: the input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
        Sort: the framework groups Reducer inputs by keys (since different mappers may have output the same key) in this stage.
        Reduce: in this phase the reduce method is called for each <key, (list of values)> pair in the grouped inputs.
      The generated output is a new value. (A word-count Reducer is sketched below.)
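      The matching Reducer sketch for the WordCount example above (again using the old org.apache.hadoop.mapred API; the class name is illustrative): it sums the 1s emitted for each word.

          import java.io.IOException;
          import java.util.Iterator;

          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapred.MapReduceBase;
          import org.apache.hadoop.mapred.OutputCollector;
          import org.apache.hadoop.mapred.Reducer;
          import org.apache.hadoop.mapred.Reporter;

          public class WordCountReducer extends MapReduceBase
                  implements Reducer<Text, IntWritable, Text, IntWritable> {

              // Called once per key with all of that key's grouped values (here: the word's 1s).
              public void reduce(Text key, Iterator<IntWritable> values,
                                 OutputCollector<Text, IntWritable> output, Reporter reporter)
                      throws IOException {
                  int sum = 0;
                  while (values.hasNext()) {
                      sum += values.next().get();
                  }
                  output.collect(key, new IntWritable(sum));   // emit (word, total count)
              }
          }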
    • Some Terminology
      Job – a “full program”: an execution of a Mapper and Reducer across a data set.
      Task – an execution of a Mapper or a Reducer on a slice of data.
      Task Attempt – a particular instance of an attempt to execute a task on a machine.
    • Job Distribution
      MapReduce programs are contained in a Java “jar” file plus an XML file containing serialized program configuration options.
      Running a MapReduce job places these files into HDFS and notifies the TaskTrackers where to retrieve the relevant program code.
      Data distribution: implicit in the design of MapReduce!
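      A driver that ties the earlier sketches together might look roughly like this (old-API sketch with illustrative class names; WordCountMapper and WordCountReducer are the sketches from the sections above, and the input/output paths are taken from the command line):

          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapred.FileInputFormat;
          import org.apache.hadoop.mapred.FileOutputFormat;
          import org.apache.hadoop.mapred.JobClient;
          import org.apache.hadoop.mapred.JobConf;

          public class WordCountDriver {
              public static void main(String[] args) throws Exception {
                  JobConf conf = new JobConf(WordCountDriver.class);
                  conf.setJobName("wordcount");

                  conf.setOutputKeyClass(Text.class);               // types of the reducer's output
                  conf.setOutputValueClass(IntWritable.class);

                  conf.setMapperClass(WordCountMapper.class);
                  conf.setCombinerClass(WordCountReducer.class);    // optional local aggregation
                  conf.setReducerClass(WordCountReducer.class);

                  FileInputFormat.setInputPaths(conf, new Path(args[0]));
                  FileOutputFormat.setOutputPath(conf, new Path(args[1]));

                  JobClient.runJob(conf);                           // submit and wait for completion
              }
          }

      Packaged into a jar, such a driver would typically be launched with something like “hadoop jar wordcount.jar WordCountDriver <in> <out>”; the exact invocation depends on how the jar and its main class are set up.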
    • Contact Information
      Varun Thacker
      Linux User’s Group Manipal
      varunthacker1989@gmail.com
      http://lugmanipal.org
      http://forums.lugmanipal.org
      http://varunthacker.wordpress.com
    • Attribution
      Google, under the Creative Commons Attribution-Share Alike 2.5 Generic license.
    • Copying
      Creative Commons Attribution-Share Alike 2.5 India License
      http://creativecommons.org/licenses/by-sa/2.5/in/