MAULANA AZAD NATIONAL URDU UNIVERSITY
Topic
GFS
Map Reduce
GFS (GOOGLE FILE SYSTEM)
• A scalable distributed file system for large,
distributed, data-intensive applications
• Multiple GFS clusters had been deployed at Google by 2003.
• The largest ones (in 2003) had:
o 1000+ storage nodes
o 300+ terabytes of disk storage, heavily accessed
by hundreds of clients on distinct machines
THE DESIGN
• Cluster consists of a single master and multiple
chunkservers and is accessed by multiple clients
• Google organized the GFS into clusters of computers.
A cluster is simply a network of computers.
• Each cluster might contain hundreds or even thousands
of machines. Within a GFS cluster there are three kinds
of entities: clients, the master server and chunkservers.
CLIENT
• In the world of GFS, the term "client" refers to any
entity that makes a file request.
• Requests can range from retrieving and manipulating
existing files to creating new files on the system.
• Clients can be other computers or computer
applications. You can think of clients as the customers
of the GFS.
MASTER SERVERS
• The master server acts as the coordinator for the cluster.
• The master's duties include maintaining an operation log,
a persistent record of changes to the cluster's metadata.
• The operation log helps keep service interruptions to a minimum:
if the master server crashes, a replacement server that has
monitored the operation log can take its place.
• The master server also keeps track of metadata: the file
namespace, the mapping from files to chunks, and the locations
of each chunk's replicas.
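The role of the operation log in recovery can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Master` class, the record fields (`op`, `file`, `handle`) and the `recover` helper are hypothetical names invented for illustration, not part of GFS. The idea it shows is that every metadata change is logged before it is applied, so a replacement can rebuild the same state by replaying the log in order.

```python
import json


class Master:
    """Toy master (hypothetical): maps each file name to its chunk handles."""

    def __init__(self):
        self.files = {}  # file name -> list of chunk handles
        self.log = []    # append-only operation log of serialized records

    def apply(self, record):
        # Apply one logged metadata change to the in-memory state.
        if record["op"] == "create":
            self.files[record["file"]] = []
        elif record["op"] == "add_chunk":
            self.files[record["file"]].append(record["handle"])
        elif record["op"] == "delete":
            self.files.pop(record["file"], None)

    def mutate(self, record):
        # Log first, then apply, so the log always covers the current state.
        self.log.append(json.dumps(record))
        self.apply(record)


def recover(log):
    """Build a replacement master by replaying the operation log in order."""
    replacement = Master()
    for line in log:
        replacement.apply(json.loads(line))
    replacement.log = list(log)
    return replacement


primary = Master()
primary.mutate({"op": "create", "file": "/logs/a"})
primary.mutate({"op": "add_chunk", "file": "/logs/a", "handle": 1})
primary.mutate({"op": "add_chunk", "file": "/logs/a", "handle": 2})
primary.mutate({"op": "create", "file": "/tmp/b"})
primary.mutate({"op": "delete", "file": "/tmp/b"})

# A replacement that has followed the log ends up with the same metadata.
standby = recover(primary.log)
print(standby.files)  # {'/logs/a': [1, 2]}
```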
CHUNKSERVERS
• Chunkservers are the workhorses of the GFS.
• They store file data in fixed-size 64 MB chunks.
• The chunkservers don't send chunks to the master
server. Instead, they send requested chunks directly to
the client.
• The GFS copies every chunk multiple times (three replicas
by default) and stores the copies on different chunkservers.
Each copy is called a replica.
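The read path described above can be sketched as follows: the client turns a byte range into chunk indexes, asks the master only for replica locations, and then fetches the bytes directly from a chunkserver. This is a minimal illustration, not the GFS client library; `ToyMaster`, `ToyChunkserver`, `chunk_range` and `read_file` are hypothetical names, and the demo uses a 4-byte chunk size so that it stays small.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated above


def chunk_range(offset, length, chunk_size=CHUNK_SIZE):
    """Indexes of the chunks covering bytes [offset, offset + length); length > 0."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))


class ToyMaster:
    """Holds metadata only: (file, chunk index) -> chunkservers with a replica."""

    def __init__(self, locations):
        self.locations = locations

    def lookup(self, name, index):
        return self.locations[(name, index)]


class ToyChunkserver:
    """Holds the actual bytes: (file, chunk index) -> chunk contents."""

    def __init__(self, chunks):
        self.chunks = chunks

    def read(self, name, index, start, end):
        return self.chunks[(name, index)][start:end]


def read_file(master, servers, name, offset, length, chunk_size=CHUNK_SIZE):
    data = b""
    for index in chunk_range(offset, length, chunk_size):
        replicas = master.lookup(name, index)  # metadata request to the master
        server = servers[replicas[0]]          # any replica would do
        base = index * chunk_size
        start = max(offset, base) - base
        end = min(offset + length, base + chunk_size) - base
        data += server.read(name, index, start, end)  # bytes bypass the master
    return data


# Demo with a 4-byte chunk size: the file "f" holds b"abcdefghij".
master = ToyMaster({("f", 0): ["cs1", "cs2"],
                    ("f", 1): ["cs2", "cs3"],
                    ("f", 2): ["cs1", "cs3"]})
servers = {
    "cs1": ToyChunkserver({("f", 0): b"abcd", ("f", 2): b"ij"}),
    "cs2": ToyChunkserver({("f", 0): b"abcd", ("f", 1): b"efgh"}),
    "cs3": ToyChunkserver({("f", 1): b"efgh", ("f", 2): b"ij"}),
}
result = read_file(master, servers, "f", offset=2, length=6, chunk_size=4)
print(result)  # b'cdefgh'
```

Because every chunk lives on more than one chunkserver, the client could fall back to another entry in `replicas` if the first server were unavailable.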
WHAT IS MAPREDUCE?
• MapReduce is a programming model and processing technique for
distributed computing. Google's original implementation was written
in C++; Apache Hadoop provides a widely used Java implementation.
• The MapReduce algorithm contains two important tasks, namely
Map and Reduce. Map takes a set of data and converts it into
another set of data, where individual elements are broken down
into tuples (key/value pairs).
• The Reduce task takes the output of the Map task as its input
and combines those data tuples into a smaller set of tuples.
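Word counting is the usual illustration of the two tasks. The sketch below runs in a single Python process and only mimics the model; `map_fn`, `reduce_fn` and `map_reduce` are hypothetical names chosen for this example, and a real framework would run many map and reduce tasks on different machines.

```python
from collections import defaultdict


def map_fn(line):
    # Map: break one input record into (key, value) tuples.
    for word in line.lower().split():
        yield (word, 1)


def reduce_fn(word, counts):
    # Reduce: combine all values for one key into a single tuple.
    return (word, sum(counts))


def map_reduce(lines):
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # group map output by key
    return dict(reduce_fn(key, values) for key, values in groups.items())


result = map_reduce(["the quick fox", "the lazy dog", "The fox"])
print(result)  # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Eight input words become eight (word, 1) tuples after Map, and Reduce shrinks them to five tuples, one per distinct word.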
MAPREDUCE (CONTINUED)
• The major advantage of MapReduce is that it is easy to
scale data processing over multiple computing nodes.
• Under the MapReduce model, the data processing
primitives are called mappers and reducers.
• Decomposing a data processing application
into mappers and reducers is sometimes nontrivial.
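One reason scaling is easy is that map output can be split across reducers by key, so the number of reducers is just a parameter. The sketch below assumes a hash partitioner; `partition` and `shuffle` are hypothetical helper names, and `zlib.crc32` is used only because it gives a stable hash across runs. The final totals do not depend on how many reducers are used.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Every occurrence of a key is routed to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapped_pairs, num_reducers):
    # One bucket per reducer; each bucket groups values by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def run_reducers(buckets):
    totals = {}
    for bucket in buckets:  # each bucket could run on its own node
        for key, values in bucket.items():
            totals[key] = sum(values)
    return totals


pairs = [("the", 1), ("fox", 1), ("the", 1), ("dog", 1)]
results = {n: run_reducers(shuffle(pairs, n)) for n in (1, 2, 4)}
for n, totals in results.items():
    print(n, sorted(totals.items()))  # same totals for every n
```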
