Cloud File Systems: Google File System (GFS) and Hadoop Distributed File System (HDFS)
2. Introduction to Cloud File Systems
Old Ways Couldn't Keep Up
Traditional file systems struggled to handle the massive amounts of data being created, especially when that data was spread across many computers.
Using Regular Computers
GFS and HDFS changed the game. They were designed to use thousands
of normal, inexpensive computers instead of a few costly, special ones.
Made to Grow Big
These systems are built to easily grow, handle problems if parts break,
and process huge amounts of data smoothly across many connected
computers.
3. Google File System (GFS)
01
Main Controller
One main computer (the Master)
keeps track of where all the files are,
who can use them, and which parts
(chunks) are stored where.
02
Storage Servers Hold Data
Many other computers (Chunkservers) store the actual file parts, called 'chunks', which are typically 64 MB each. Each chunk is copied to several Chunkservers (three by default), so your data stays safe even if one machine fails.
03
Users Get Data Directly
When your computer (the Client) wants a file, it first asks the Master where it
is. Then, it goes straight to the Chunkservers to get the file parts quickly.
This system was built for Google to handle huge
amounts of data, especially when it's constantly
being used and updated.
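To make the three steps above concrete, here is a minimal, conceptual sketch of a GFS-style read in Java. GFS is internal to Google and has no public client API, so every name here (Master, Chunkserver, chunkLocations, readChunk) is a made-up illustration of the flow, not real Google code: the Master only answers the "where is it?" question, and the file bytes travel straight from a Chunkserver to the Client.

// Conceptual sketch of a GFS-style read path. All interfaces and names are
// hypothetical; GFS itself has no public API.
import java.util.List;
import java.util.Map;

public class GfsReadSketch {

    // The Master answers metadata questions only: which Chunkservers hold a given chunk.
    interface Master {
        List<String> chunkLocations(String fileName, long chunkIndex);
    }

    // A Chunkserver serves the actual chunk bytes directly to the Client.
    interface Chunkserver {
        byte[] readChunk(String chunkHandle, long offsetInChunk, int length);
    }

    static final long CHUNK_SIZE = 64L * 1024 * 1024; // 64 MB chunks

    // Client read: ask the Master where the chunk lives, then fetch the bytes
    // straight from one of its Chunkservers -- file data never passes through the Master.
    static byte[] read(Master master, Map<String, Chunkserver> chunkservers,
                       String fileName, long fileOffset, int length) {
        long chunkIndex = fileOffset / CHUNK_SIZE;      // which 64 MB chunk holds this offset
        long offsetInChunk = fileOffset % CHUNK_SIZE;   // position inside that chunk
        List<String> replicas = master.chunkLocations(fileName, chunkIndex);
        Chunkserver server = chunkservers.get(replicas.get(0)); // use the first listed replica
        return server.readChunk(fileName + "#" + chunkIndex, offsetInChunk, length);
    }
}

Because the Master handles only small metadata requests, it stays out of the way of the actual data traffic.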
4. Hadoop Distributed File System (HDFS)
Based on GFS Ideas
HDFS uses ideas from Google's file
system, but it's made for handling and
analyzing huge amounts of data in the
larger Hadoop system.
Main Parts: NameNode and
DataNodes
The NameNode keeps track of where files are stored. The DataNodes actually hold the pieces of data (called blocks, usually 128 MB each) across many regular computers.
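To see that split between the NameNode (metadata) and the DataNodes (data) in practice, the standard Hadoop Java client can ask which DataNodes hold each block of a file. A minimal sketch, assuming a reachable HDFS cluster and a hypothetical file /data/example.log:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();    // picks up the cluster settings on the classpath
        FileSystem fs = FileSystem.get(conf);        // client handle to HDFS

        Path file = new Path("/data/example.log");   // hypothetical example path
        FileStatus status = fs.getFileStatus(file);  // file metadata, served by the NameNode

        // Each BlockLocation is one block (usually 128 MB) plus the DataNodes that store it.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", datanodes " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}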
Designed for One-Time Writing
You write data to HDFS once, and then
you can read it many times. This is
perfect for analyzing big sets of data,
like running reports or processing logs.
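A minimal sketch of that write-once, read-many pattern using the same Hadoop client API (the path /reports/2024-01.csv is just an example): the file is written once, and afterwards it can only be read, or extended with fs.append, never rewritten in place.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path report = new Path("/reports/2024-01.csv");    // hypothetical example path

        // Write the file once; HDFS splits it into blocks and copies them to DataNodes.
        try (FSDataOutputStream out = fs.create(report)) {
            out.write("id,clicks\n42,7\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back as many times as needed; existing bytes are never rewritten.
        try (FSDataInputStream in = fs.open(report)) {
            byte[] buffer = new byte[32];
            int n = in.read(buffer);
            System.out.println(new String(buffer, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}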
5. Comparison: GFS vs. HDFS
Feature            | GFS                                       | HDFS
Data Block Size    | 64 MB                                     | 128 MB (standard)
Number of Copies   | You can choose how many                   | Usually 3 copies
How Data is Added  | Added once; more can be added to the end  | Added once; only new data can be added to the end
What it Runs On    | Works on Linux systems                    | Works on many systems (made with Java)
Main Purpose       | Fast-moving data, live analysis           | Analyzing large amounts of data all at once
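The block size and copy count in the table are defaults rather than fixed limits. On the HDFS side they are ordinary settings (dfs.blocksize and dfs.replication) that can be changed cluster-wide in hdfs-site.xml or per client; a small sketch of the per-client version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsDefaults {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB blocks (the usual default)
        conf.setInt("dfs.replication", 3);                 // 3 copies of each block (the usual default)

        // Files created through this client use the block size and replication set above.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("blocksize = " + conf.get("dfs.blocksize")
                + ", replication = " + conf.get("dfs.replication"));
        fs.close();
    }
}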
6. Applications
GFS Applications
GFS helps Google Search find information on billions of web pages. It also stores YouTube's huge video library and handles live information for Gmail and Google Maps.
HDFS Applications
HDFS is the main system for analyzing huge amounts of data with Hadoop. It's used by
companies like Yahoo and Facebook, and many others globally, to manage vast amounts of
internet usage records and data for AI programs.
7. Conclusion
Their Unique Strengths
GFS and HDFS helped create modern ways to store huge amounts of data across many computers. Each has special benefits for different big data tasks.
How to Pick the Best One
GFS is great for tasks that need data
right away, like live streaming. HDFS
is best for sorting and analyzing
large batches of data, often used in
big data warehouses.
Building for the Future
Knowing their different designs
helps engineers build strong
systems that can grow easily and
keep working even if parts fail. This
moves cloud computing and big data
forward.