Google File System
Presentation by
MBA 713 Group
Tennyson Sigauke M223098
Beauty Charamba M211405
Roselyn Moyana M223473
Sharon Zinyorewa M222266
Chipo Jekapu M222253
Ramadan Adadi M215900
Google File System (GFS)
• Google File System (GFS) is a distributed file system developed by Google to
store, manage, and process large amounts of data across a massive infrastructure.
• It is designed to be highly scalable, fault-tolerant, and optimized for
handling big data workloads (Ghemawat et al., 2003).
Distributed File System (DFS)
• A Distributed File System (DFS) is a system that enables files and directories
to be accessed and shared across multiple computers or nodes in a network.
It provides a unified and transparent view of distributed storage resources
by abstracting the underlying physical locations and complexities.
• A DFS typically offers features such as file replication, fault tolerance,
scalability, and distributed access control (Tanenbaum & Van Steen, 2006).
GFS was developed by Google to (Ghemawat et al., 2003):
• Store, manage, and process large amounts of data across a massive infrastructure.
• Be highly scalable, fault-tolerant, and optimized for handling big data workloads.
• GFS uses a chunk-based architecture, where files are divided into fixed-size chunks
and replicated across multiple servers for data redundancy.
• A master server maintains metadata about the file system and coordinates
operations across the distributed servers. GFS prioritizes high throughput for
streaming reads and writes, and it aims to minimize network overhead by placing
computation near the data. It provides a simple file system interface for
applications.
• GFS is built on top of commodity hardware, such as inexpensive servers and
disk arrays, and is designed to run on a cluster of servers that can scale
horizontally as the amount of data being stored grows.
• The system is designed to provide a single global namespace for all data
stored in the system, allowing applications to access and manipulate large
amounts of data in a consistent and reliable manner.
• GFS also uses a technique called "data replication" to ensure data is stored
redundantly across multiple servers, which helps protect against data loss in
the event of hardware failure or other types of system failures.
• Overall, GFS has been highly successful in scaling to support the massive
amounts of data that Google deals with on a daily basis, and has served as a
key inspiration for other distributed storage systems, most directly the Hadoop
Distributed File System (HDFS).
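The replication idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`place_replicas`, `under_replicated`, `re_replicate`), the server names, and the random placement policy are hypothetical and are not taken from GFS itself, which also weighs factors such as disk usage and rack placement.

```python
import random

REPLICATION_FACTOR = 3  # three replicas per chunk, as in the slides


def place_replicas(chunk_handle, servers, rng):
    """Pick REPLICATION_FACTOR distinct servers for one chunk.

    Random placement is an assumption made for this sketch.
    """
    return set(rng.sample(sorted(servers), REPLICATION_FACTOR))


def under_replicated(placement, live_servers):
    """Return handles of chunks with fewer live replicas than required."""
    return sorted(
        handle
        for handle, replicas in placement.items()
        if len(replicas & live_servers) < REPLICATION_FACTOR
    )


def re_replicate(placement, live_servers, rng):
    """Drop dead replicas and copy each affected chunk to other live servers."""
    for handle in under_replicated(placement, live_servers):
        survivors = placement[handle] & live_servers
        candidates = sorted(live_servers - survivors)
        missing = REPLICATION_FACTOR - len(survivors)
        placement[handle] = survivors | set(rng.sample(candidates, missing))


if __name__ == "__main__":
    rng = random.Random(0)
    servers = {"cs1", "cs2", "cs3", "cs4", "cs5"}
    placement = {handle: place_replicas(handle, servers, rng) for handle in range(4)}
    live = servers - {"cs2"}  # simulate one chunk server failing
    re_replicate(placement, live, rng)
    print(placement)
```

After one server fails, every chunk that lost a replica is copied to another live server, so the data stays available without manual intervention.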
Main components in the GFS architecture
The file system is divided into three main components:
• Master server,
• Chunk servers, and
• Client library.
The master server is the central coordinator of the file system: it holds the file
metadata and controls system-wide operations, while the chunk servers store the
chunk data itself (Ghemawat & Gobioff, 2006).
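How the three components cooperate on a read can be sketched as follows. This is a minimal toy model, not the real GFS interface: the class names, method names, and the in-memory dictionaries are assumptions chosen for illustration. The point it shows is that the master answers only the metadata question ("which chunk, on which servers?") and the data itself comes from a chunk server.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks


@dataclass
class ChunkServer:
    """Stores chunk contents keyed by chunk handle."""
    chunks: dict = field(default_factory=dict)

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


@dataclass
class Master:
    """Holds metadata only: file -> chunk handles, handle -> replica locations."""
    file_chunks: dict = field(default_factory=dict)
    locations: dict = field(default_factory=dict)

    def lookup(self, path, chunk_index):
        handle = self.file_chunks[path][chunk_index]
        return handle, self.locations[handle]


class Client:
    """Client library: asks the master where data lives, then reads from a chunk server."""

    def __init__(self, master, servers, chunk_size=CHUNK_SIZE):
        self.master = master
        self.servers = servers
        self.chunk_size = chunk_size

    def read(self, path, offset, length):
        # Reads that cross a chunk boundary are not handled in this sketch.
        chunk_index, chunk_offset = divmod(offset, self.chunk_size)
        handle, replicas = self.master.lookup(path, chunk_index)
        server = self.servers[replicas[0]]  # first replica, for simplicity
        return server.read(handle, chunk_offset, length)


if __name__ == "__main__":
    # A tiny chunk size of 8 bytes keeps the example readable.
    servers = {"cs1": ChunkServer({101: b"hello wo", 102: b"rld"})}
    master = Master({"/logs/a": [101, 102]}, {101: ["cs1"], 102: ["cs1"]})
    client = Client(master, servers, chunk_size=8)
    print(client.read("/logs/a", 8, 3))
```

Keeping file data off the master is the design choice that lets a single master coordinate many chunk servers.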
Features of GFS
• Namespace management and locking.
• Fault tolerance.
• Reduced client and master interaction because of the large chunk size.
• High availability.
• Critical data replication.
• Automatic and efficient data recovery.
• High aggregate throughput.
Use of Google File System (GFS) by Google:
• 1. Google Search: GFS is a critical component of Google's search infrastructure. It stores and
manages the vast index of web pages and documents that Google's search engine uses to
provide relevant search results to users.
• 2. Google Maps: GFS is used to store and serve the massive amount of geographical data that
powers Google Maps. This includes map tiles, satellite imagery, Street View images, and other
location-related data.
• 3. YouTube: GFS plays a crucial role in storing and delivering the enormous amount of video
content on YouTube. It allows for efficient storage, replication, and distribution of video files to
ensure smooth playback for millions of users worldwide.
• 4. Gmail: GFS is utilized for storing and managing the immense volume of user data in Gmail,
Google's popular email service. It ensures reliable and efficient storage of emails, attachments,
and other user-related data.
• 5. Google Cloud Platform: GFS serves as the underlying storage system for various services and
products offered by Google Cloud Platform (GCP). It provides scalable and resilient storage for
applications, databases, analytics, and other data-intensive workloads on the cloud platform.
These are just a few examples of how GFS is used within Google's ecosystem. It demonstrates the
system's ability to handle large-scale data storage, replication, and retrieval requirements for a
variety of applications and services.
Advantages of Google File System (GFS)
• 1. Scalability: GFS has been designed from the ground up to handle large amounts of data, making it
incredibly scalable. It can easily scale up or down to meet the changing needs of an organization.
• 2. Fault tolerance: GFS is designed to be highly fault-tolerant. It uses data replication and automatic data
recovery to keep data available even in the event of hardware failures.
• 3. Consistency: GFS offers a relaxed consistency model suited to its workloads. Namespace changes such as
file creation are atomic, and checksums on the chunk servers help detect data corruption.
• 4. Manageability: GFS provides a single global namespace for all data stored in the system, making it easy
to manage and access data across geographically dispersed locations.
• 5. Performance: GFS is optimized for high aggregate throughput. It splits files into large fixed-size chunks
("data chunking") to speed up large sequential reads and also provides built-in support for snapshots.
• 6. Low cost: GFS runs on inexpensive commodity hardware rather than specialized storage systems, which keeps
the cost of storage low. GFS itself is proprietary to Google and was not released as open source.
Overall, GFS is an effective and scalable file system that provides many benefits over
traditional file systems. It is highly fault-tolerant, manageable, and cost-effective, making it
well suited to large-scale data processing.
Disadvantages of GFS
1. Not the best fit for small files.
2. The single master may act as a bottleneck.
3. Random writes are not supported efficiently.
4. Best suited to data that is written once and afterwards only read
or appended to.
Key terms
CHUNK
❖ Files are divided into fixed-size blocks called chunks
❖ Each chunk is 64 MB, much larger than a typical file system block
❖ Each chunk is replicated three or more times
❖ Each chunk is identified by a 64-bit chunk handle
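The fixed chunk size makes it easy to work out which chunk holds a given byte. The short sketch below shows that arithmetic; the function names are hypothetical, and 64 MB is taken to mean 64 × 1024 × 1024 bytes, which is an assumption.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, assumed to mean 2**26 bytes


def chunk_count(file_size):
    """Number of chunks needed for file_size bytes (the last chunk may be partly filled)."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division


def locate(offset):
    """Translate a byte offset in a file to (chunk index, offset within that chunk)."""
    return divmod(offset, CHUNK_SIZE)


if __name__ == "__main__":
    mib = 1024 * 1024
    print(chunk_count(200 * mib))  # a 200 MB file needs 4 chunks
    print(locate(150 * mib))       # byte 150 MB falls in chunk index 2
```

A client can do this translation locally and only needs to ask the master for the handle and replica locations of that one chunk index.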
META DATA
❖ Metadata is data about the stored data, e.g. a picture carries background
data such as the location where it was taken, the date, time, event, etc.
❖ Three major types of metadata
➢ The file and chunk namespaces
➢ The mapping from files to chunks
➢ Locations of each chunk’s replicas
❖ All the metadata is kept in the Master’s memory
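❖ Each 64 MB chunk needs less than 64 bytes of metadata
❖ Chunk locations are not stored persistently; the master collects them from the
chunk servers at startup and keeps them current through heartbeat messages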
❖ Operation log contains a historical record of critical metadata changes.
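A rough calculation shows why keeping all metadata in the master's memory is feasible. The sketch below treats 64 bytes per 64 MB chunk as an upper bound and ignores namespace entries and per-replica bookkeeping, so it is an order-of-magnitude estimate rather than a measurement; the function name is hypothetical.

```python
CHUNK_SIZE = 64 * 1024**2   # 64 MB, assumed to mean 2**26 bytes
METADATA_PER_CHUNK = 64     # bytes per chunk, treated as an upper bound


def master_metadata_bytes(stored_bytes):
    """Estimate chunk metadata held in master memory for stored_bytes of file data."""
    chunks = -(-stored_bytes // CHUNK_SIZE)  # ceiling division
    return chunks * METADATA_PER_CHUNK


if __name__ == "__main__":
    one_pib = 1024**5
    estimate = master_metadata_bytes(one_pib)
    print(estimate / 1024**3, "GiB of chunk metadata for 1 PiB of file data")
```

Under these assumptions the ratio is about one byte of metadata per megabyte of data, which is also why many small files are a poor fit: each one still costs a full chunk entry.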
• Google no longer uses GFS. The company moved its search to a new software
foundation based on a revamped file system known as Colossus, and Urs Hölzle
• Colossus now underpins virtually all of Google's web services, from Gmail,
Google Docs, and YouTube to the Google Cloud Storage service the company
offers to third-party developers.
• Whereas GFS was built for batch operations -- i.e., operations that happen in
the background before they're actually applied to a live website -- Colossus is
specifically built for "realtime" services.
References
• Ghemawat, S., Gobioff, H., & Leung, S.T. (2003). The Google File System. ACM SIGOPS Operating
Systems Review.
• Ghemawat, S., & Gobioff, H. (2005). The Google File System: Evolutionary Iteration vs. Clean-Slate
Design: A Case Study. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on
Computer Systems (EuroSys '06), 1-10.
• Ghemawat, S., & Gobioff, H. (2006). Understanding the Performance of a Large-Scale Distributed File
System. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer
Systems (EuroSys '07), 1-8.
• Ghemawat, S., Gobioff, H., & Leung, S.T. (2018). The Google File System. ACM Transactions on
Storage (TOS), 12(4), 1-37.
• Tanenbaum, A. S., & Van Steen, M. (2006). Distributed Systems: Principles and Paradigms. Pearson
Education.
