Andrii Vozniuk - Cloud infrastructure. Google File System and MapReduce

This presentation is based on two papers:
1) S. Ghemawat, H. Gobioff, S. Leung. The Google File System. SOSP, 2003
2) J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004

Presentation Transcript

• Cloud Infrastructure: GFS & MapReduce
  Andrii Vozniuk
  Uses slides by Jeff Dean and Ed Austin
  Data Management in the Cloud, EPFL, February 27, 2012
• Outline
  • Motivation
  • Problem Statement
  • Storage: Google File System (GFS)
  • Processing: MapReduce
  • Benchmarks
  • Conclusions
• Motivation
  • Huge amounts of data to store and process
  • Example (circa 2004):
    – 20+ billion web pages x 20 KB/page = 400+ TB
    – Reading from one disk at 30-35 MB/s:
      • Four months just to read the web
      • 1000 hard drives just to store the web
    – Even more complicated if we want to process the data
  • Exponential growth: the solution must be scalable (a back-of-the-envelope check follows below)
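
As a quick check of the slide's numbers, here is a small Python sketch reproducing the arithmetic (the 20 KB/page and 30-35 MB/s figures are taken from the slide; nothing else is assumed):

    # Rough reproduction of the slide's arithmetic (values from the slide).
    pages = 20e9                 # 20+ billion web pages
    page_size_kb = 20            # ~20 KB per page
    total_tb = pages * page_size_kb / 1e9   # KB -> TB (decimal units)
    print(f"Total: ~{total_tb:.0f} TB")     # ~400 TB

    read_mb_per_s = 32           # a single disk reads at roughly 30-35 MB/s
    seconds = total_tb * 1e6 / read_mb_per_s    # TB -> MB, then divide by MB/s
    print(f"Single-disk read time: ~{seconds / 86400:.0f} days")  # ~145 days, i.e. 4-5 months
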
• Motivation
  • Buy super fast, ultra reliable hardware?
    – Ultra expensive
    – Controlled by third party
    – Internals can be hidden and proprietary
    – Hard to predict scalability
    – Fails less often, but still fails!
    – No suitable solution on the market
• Motivation
  • Use commodity hardware? Benefits:
    – Commodity machines offer much better perf/$
    – Full control over and understanding of internals
    – Can be highly optimized for their workloads
    – Really smart people can do really smart things
  • Not that easy:
    – Fault tolerance: something breaks all the time
    – Application development
    – Debugging, optimization, locality
    – Communication and coordination
    – Status reporting, monitoring
  • Without shared infrastructure, all of these issues must be handled for every problem you want to solve
• Problem Statement
  • Develop a scalable distributed file system for large data-intensive applications running on inexpensive commodity hardware
  • Develop a tool for processing large data sets in parallel on inexpensive commodity hardware
  • Develop both in coordination for optimal performance
• Google Cluster Environment
  [Figure: servers grouped into racks, racks into clusters, clusters into datacenters, datacenters into the cloud]
• Google Cluster Environment
  • As of 2009:
    – 200+ clusters
    – 1000+ machines in many of them
    – 4+ PB file systems
    – 40 GB/s read/write load
    – Frequent hardware failures
    – 100s to 1000s of active jobs (1 to 1000 tasks each)
  • A cluster is 1000s of machines
    – Stuff breaks: with 1000 machines, expect roughly one failure per day; with 10000, roughly ten per day
    – How to store data reliably with high throughput?
    – How to make it easy to develop distributed applications?
• Google Technology Stack
  [Figure: the Google technology stack]
• Google Technology Stack
  [Figure: the same stack, with the focus of this talk highlighted]
• GFS: Google File System*
  • Inexpensive commodity hardware
  • High throughput favored over low latency
  • Large files (multi-GB)
  • Multiple clients
  • Workload:
    – Large streaming reads
    – Small random writes
    – Concurrent appends to the same file
  * The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003
• GFS Architecture
  • User-level process running on commodity Linux machines
  • Consists of a Master Server and Chunk Servers
  • Files are broken into chunks (typically 64 MB) with 3x redundancy (across clusters, datacenters); see the chunking sketch below
  • Data transfers happen directly between clients and Chunk Servers
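
As a rough illustration of the fixed-size chunking described above, here is a hypothetical helper (names are mine, not GFS client API) that maps a byte offset to the 64 MB chunk a client would ask the master about:

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide

    def chunk_index_and_offset(byte_offset: int) -> tuple[int, int]:
        """Map a byte offset within a file to (chunk index, offset inside that chunk).

        A client performs this translation locally, then asks the master for the
        chunk handle and replica locations of that chunk index.
        """
        return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

    # Example: byte 200,000,000 of a file lands in chunk 2.
    print(chunk_index_and_offset(200_000_000))  # (2, 65782272)
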
• GFS Master Node
  • Centralized for simplicity
  • Namespace and metadata management
  • Managing chunks:
    – Where they are (file -> chunks, replica locations)
    – Where to put new ones
    – When to re-replicate (failure, load balancing)
    – When and what to delete (garbage collection)
  • Fault tolerance:
    – Shadow masters
    – Monitoring infrastructure outside of GFS
    – Periodic snapshots
    – Mirrored operations log
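
A minimal sketch of the kind of in-memory tables the master keeps for the bookkeeping listed above; the class and field names are illustrative, not taken from the paper:

    from dataclasses import dataclass, field

    @dataclass
    class ChunkInfo:
        handle: int                                         # globally unique chunk handle
        version: int = 0                                    # used to detect stale replicas
        replicas: list[str] = field(default_factory=list)   # chunkserver addresses

    @dataclass
    class MasterMetadata:
        # namespace: file path -> ordered list of chunk handles for that file
        files: dict[str, list[int]] = field(default_factory=dict)
        # chunk handle -> replica locations (rebuilt from chunkserver reports at startup)
        chunks: dict[int, ChunkInfo] = field(default_factory=dict)

        def replicas_for(self, path: str, chunk_index: int) -> list[str]:
            handle = self.files[path][chunk_index]
            return self.chunks[handle].replicas
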
• GFS Master Node
  • Metadata is kept in memory, so it is fast
    – A 64 MB chunk needs less than 64 B of metadata, so 640 TB of data needs less than 640 MB of metadata
  • Asks Chunk Servers for chunk locations when:
    – The master starts
    – A Chunk Server joins the cluster
  • Operation log:
    – Used for serialization of concurrent operations
    – Replicated
    – The master responds to the client only after the log record is flushed both locally and remotely
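
The memory estimate on this slide works out as follows (reproducing the slide's own figures):

    # ~64 B of master metadata per 64 MB chunk (figures from the slide).
    data_tb = 640
    chunk_mb = 64
    metadata_per_chunk_b = 64

    num_chunks = data_tb * 1024 * 1024 // chunk_mb        # TB -> MB, then / 64 MB per chunk
    metadata_mb = num_chunks * metadata_per_chunk_b / (1024 * 1024)
    print(num_chunks, "chunks ->", metadata_mb, "MB of metadata")  # 10485760 chunks -> 640.0 MB
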
• GFS Chunk Servers
  • Store 64 MB chunks as plain Linux files
    – Reduces the size of the master's data structures
    – Reduces client-master interaction
    – Internal fragmentation => allocate space lazily
    – Possible hotspots => re-replicate
  • Fault tolerance:
    – Heartbeat messages to the master
    – If something is wrong, the master initiates re-replication
• GFS Mutation Order
  [Figure: write control flow]
  • The client asks the master which replica holds the current lease; the master replies with the identity of the primary and the locations of the replicas (cached by the client)
  • The client pushes the data to the replicas (steps 3a-3c in the figure)
  • The client sends the write request to the primary; the primary assigns a sequence number to the mutation, applies it, and forwards the write request to the secondaries
  • The client receives "operation completed" or an error report
• GFS Control & Data Flow
  • Decouple control flow and data flow
  • Control flow:
    – Master -> Primary -> Secondaries
  • Data flow:
    – Carefully picked chain of Chunk Servers (see the sketch below)
      • Forward to the closest replica first
      • Distance estimated from IP addresses
    – Fully utilize each machine's outbound bandwidth
    – Pipelining to exploit full-duplex links
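
A minimal sketch of the "forward to the closest first" idea, assuming network distance can be approximated from IP addresses as the slide suggests; the specific XOR-prefix heuristic and helper names below are illustrative, not GFS's actual topology logic:

    import ipaddress

    def ip_distance(a: str, b: str) -> int:
        """Approximate network distance: fewer shared leading bits => farther apart."""
        xa = int(ipaddress.ip_address(a))
        xb = int(ipaddress.ip_address(b))
        return (xa ^ xb).bit_length()   # 0 means identical addresses

    def forwarding_chain(client_ip: str, replicas: list[str]) -> list[str]:
        """Order replicas so each hop forwards to the closest not-yet-visited server."""
        chain, current, remaining = [], client_ip, set(replicas)
        while remaining:
            nxt = min(remaining, key=lambda r: ip_distance(current, r))
            chain.append(nxt)
            remaining.remove(nxt)
            current = nxt
        return chain

    print(forwarding_chain("10.0.1.5", ["10.2.0.9", "10.0.1.77", "10.0.2.3"]))
    # ['10.0.1.77', '10.0.2.3', '10.2.0.9']
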
• GFS Other Important Things
  • Snapshot operation: make a copy very fast
  • Smart chunk creation policy:
    – Prefer Chunk Servers with below-average disk utilization; limit the number of recent creations per server
  • Smart re-replication policy:
    – Under-replicated chunks first
    – Chunks that are blocking clients
    – Chunks of live files first (rather than deleted ones)
  • Rebalance and garbage-collect periodically
  • How do we process the data stored in GFS?
• MapReduce*
  • A simple programming model applicable to many large-scale computing problems
  • Divide-and-conquer strategy
  • Hide messy details in the MapReduce runtime library:
    – Automatic parallelization
    – Load balancing
    – Network and disk transfer optimizations
    – Fault tolerance
    – Part of the stack: improvements to the core library benefit all of its users
  * MapReduce: Simplified Data Processing on Large Clusters. J. Dean, S. Ghemawat. OSDI, 2004
• MapReduce Typical Problem
  • Read a lot of data and break it into parts
  • Map: extract something important from each part
  • Shuffle and sort
  • Reduce: aggregate, summarize, filter, or transform the Map results
  • Write the results
  • Jobs can be chained or cascaded
  • Implement Map and Reduce to fit the problem
• MapReduce Nice Example
  [Figure]
• MapReduce Other Suitable Examples
  • Distributed grep
  • Distributed sort (Hadoop MapReduce won TeraSort)
  • Term vector per host
  • Document clustering
  • Machine learning
  • Web access log statistics
  • Web link-graph reversal
  • Inverted index construction
  • Statistical machine translation
• MapReduce Model
  • The programmer specifies two primary methods:
    – Map(k, v) -> <k', v'>*
    – Reduce(k', <v'>*) -> <k', v'>*
  • All v' with the same k' are reduced together, in order
  • Usually the programmer also specifies how to partition k':
    – Partition(k', total partitions) -> partition for k'
      • Often a simple hash of the key
      • Allows reduce operations for different k' to be parallelized
• MapReduce Code
  Map:
    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");
  Reduce:
    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));
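
For readers who want something runnable, here is a small single-process Python sketch of the same word count, including a hash partitioner as on the model slide; it illustrates the programming model only and is not Google's implementation:

    from collections import defaultdict

    def map_fn(doc_name: str, contents: str):
        """Map: emit (word, 1) for every word in the document."""
        for word in contents.split():
            yield word, 1

    def reduce_fn(word: str, counts):
        """Reduce: sum all counts emitted for the same word."""
        yield word, sum(counts)

    def partition(key: str, num_partitions: int) -> int:
        """Simple hash partitioner, as mentioned on the model slide."""
        return hash(key) % num_partitions

    def run_mapreduce(docs: dict[str, str], num_partitions: int = 2):
        # Map phase + shuffle: group intermediate values by key, per partition.
        partitions = [defaultdict(list) for _ in range(num_partitions)]
        for name, contents in docs.items():
            for k, v in map_fn(name, contents):
                partitions[partition(k, num_partitions)][k].append(v)
        # Reduce phase: each partition could run on a different machine.
        results = {}
        for part in partitions:
            for k in sorted(part):   # values for a key are reduced together, in order
                results.update(dict(reduce_fn(k, part[k])))
        return results

    print(run_mapreduce({"d1": "the quick brown fox", "d2": "the lazy dog"}))
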
• MapReduce Architecture
  • One master, many workers
  • The infrastructure manages scheduling and distribution
  • The user implements Map and Reduce
• MapReduce Architecture
  [Figure: execution overview]
• MapReduce Architecture
  [Figure: execution overview, continued]
• MapReduce Architecture
  • Combiner = a local Reduce applied to a mapper's output before the shuffle (see the sketch below)
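
A minimal sketch of what "Combiner = local Reduce" means for the word-count example: the map side pre-aggregates its own output so fewer (word, count) pairs are shuffled over the network (function names are mine, not from the paper):

    from collections import Counter

    def map_with_combiner(doc_name: str, contents: str):
        """Map plus a local combine step: emit (word, partial_count) once per word
        instead of (word, 1) once per occurrence."""
        local_counts = Counter(contents.split())   # local Reduce over this mapper's output
        for word, partial_count in local_counts.items():
            yield word, partial_count

    # "the" is emitted once with count 2, instead of twice with count 1.
    print(dict(map_with_combiner("d1", "the quick brown fox jumps over the lazy dog")))
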
• MapReduce Important Things
  • Mappers are scheduled close to the data
  • Chunk replication improves locality
  • Reducers often run on the same machines as mappers
  • Fault tolerance:
    – Mapper crash: re-launch all map tasks of that machine
    – Reducer crash: repeat only the crashed task
    – Master crash: repeat the whole job
    – Skip bad records
  • Fight "stragglers" by launching backup tasks
  • Proven scalability: in September 2009 Google ran 3,467,000 MapReduce jobs, averaging 488 machines per job
  • Extensively used at Yahoo and Facebook via Hadoop
• Benchmarks: GFS
  [Figure]
• Benchmarks: MapReduce
  [Figure]
• Conclusions
  "We believe we get tremendous competitive advantage by essentially building our own infrastructure"
  -- Eric Schmidt
  • GFS & MapReduce:
    – Google achieved their goals
    – A fundamental part of their stack
  • Open source implementations:
    – GFS -> Hadoop Distributed File System (HDFS)
    – MapReduce -> Hadoop MapReduce
• Thank you for your attention!
  Andrii.Vozniuk@epfl.ch