2. GFS architecture
A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.
Ghemawat (2003), The Google File System
• The master maintains all file system metadata, including the locations of chunks.
• Chunkservers store the fixed-size chunks into which files are divided.
[Figure: GFS architecture, showing clients, the master holding the metadata, and chunks stored on chunkservers]
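To make the division of labour concrete, here is a minimal Python sketch of how a client could locate a chunk, following the read path described in the GFS paper: the client turns a byte offset into a chunk index using the fixed chunk size, and the master maps (file name, chunk index) to a chunk handle and the replica locations. The names `Master`, `ChunkInfo` and `locate`, the file path and the chunkserver addresses are hypothetical; this is an in-memory illustration, not the GFS interface.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size


@dataclass
class ChunkInfo:
    handle: int            # chunk handle assigned by the master
    locations: list[str]   # chunkservers holding a replica (hypothetical addresses)


@dataclass
class Master:
    # Hypothetical metadata table: (file name, chunk index) -> chunk handle and replicas
    chunk_table: dict[tuple[str, int], ChunkInfo] = field(default_factory=dict)

    def lookup(self, path: str, chunk_index: int) -> ChunkInfo:
        return self.chunk_table[(path, chunk_index)]


def locate(master: Master, path: str, offset: int) -> tuple[ChunkInfo, int]:
    """Client side: translate a byte offset into (chunk info, offset within that chunk)."""
    chunk_index = offset // CHUNK_SIZE
    return master.lookup(path, chunk_index), offset % CHUNK_SIZE


if __name__ == "__main__":
    master = Master()
    master.chunk_table[("/logs/web.0", 1)] = ChunkInfo(42, ["cs1:7000", "cs2:7000", "cs3:7000"])
    info, within = locate(master, "/logs/web.0", 100 * 1024 * 1024)
    print(info.handle, info.locations, within)
```

The master only answers the metadata question; the client would then read the bytes directly from one of the listed chunkservers, so file data does not flow through the master.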
3. Different points in the design space
A. Treatment of component failures as the norm
B. Optimization for huge files
C. Benefit from co-designing the applications and the file system API
4. A. Why not treat component failures as the exception?
Among hundreds of servers in a GFS cluster, some are bound to be unavailable at any given time.
Quality
• The system is built from many inexpensive commodity components.
Quantity
• Hundreds of servers in a GFS cluster
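The claim that some servers are always down follows from simple probability. Assuming, hypothetically, that each server is unavailable independently with probability p at a given moment, the chance that at least one of n servers is unavailable is 1 - (1 - p)^n, which approaches 1 quickly as n grows:

```python
def p_any_unavailable(n_servers: int, p_down: float) -> float:
    """Probability that at least one of n independent servers is unavailable."""
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be a probability")
    return 1.0 - (1.0 - p_down) ** n_servers


if __name__ == "__main__":
    # p_down = 0.01 is an illustrative value, not a measured GFS figure.
    for n in (1, 100, 500):
        print(f"{n:>4} servers: {p_any_unavailable(n, 0.01):.3f}")
```

With the illustrative 1% per-server figure, a 100-server cluster already has roughly a 63% chance that some server is down at any instant. Real failures can also be correlated (shared racks or power), which this independence model ignores.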
5. A. Fault tolerance
GFS provides fault tolerance through:
a. Constant monitoring
b. Replicating crucial data
c. Fast and automatic recovery
a. Exchanging HeartBeat messages
b. Chunk replication
[Figure: master and clients; termination is either normal (caused by killing the process) or abnormal (caused by an exception), and in both cases the state is restored in seconds]
c. No distinction between normal and abnormal termination
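Constant monitoring and chunk replication work together: the master learns which chunkservers are alive from their HeartBeat messages and re-replicates chunks whose live replica count falls below the goal (three replicas by default in GFS). The Python sketch below models only that bookkeeping. The 60-second timeout, the class `MasterState` and its methods are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 60.0  # seconds without a HeartBeat before a server counts as dead (hypothetical)
REPLICATION_GOAL = 3      # GFS stores three replicas by default


@dataclass
class MasterState:
    last_heartbeat: dict[str, float] = field(default_factory=dict)  # chunkserver -> last HeartBeat time
    replicas: dict[int, set[str]] = field(default_factory=dict)     # chunk handle -> chunkservers

    def on_heartbeat(self, server: str, chunks: list[int], now: float) -> None:
        """Record a HeartBeat and the chunk handles the server reports holding."""
        self.last_heartbeat[server] = now
        for handle in chunks:
            self.replicas.setdefault(handle, set()).add(server)

    def live_servers(self, now: float) -> set[str]:
        """Servers whose last HeartBeat is within the timeout."""
        return {s for s, t in self.last_heartbeat.items() if now - t <= HEARTBEAT_TIMEOUT}

    def under_replicated(self, now: float) -> dict[int, int]:
        """Chunk handle -> number of new replicas needed, counting only live servers."""
        live = self.live_servers(now)
        missing = {}
        for handle, servers in self.replicas.items():
            alive = len(servers & live)
            if alive < REPLICATION_GOAL:
                missing[handle] = REPLICATION_GOAL - alive
        return missing


if __name__ == "__main__":
    m = MasterState()
    for s in ("cs1", "cs2", "cs3"):
        m.on_heartbeat(s, [7], now=0.0)
    m.on_heartbeat("cs1", [7], now=100.0)
    m.on_heartbeat("cs2", [7], now=100.0)
    print(m.under_replicated(now=100.0))  # cs3 has gone silent, so chunk 7 needs one more replica
```

Keeping the liveness check separate from the replica map means a silent server's chunks are simply not counted; nothing has to be deleted when it disappears, and its replicas count again if it resumes sending HeartBeats.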
6. B. Optimization for huge files
A chunk size of 64 MB, much larger than typical file system block sizes, offers three advantages:
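• Reduce client-master interaction: applications mostly read and write large files sequentially, so a single chunk-location request to the master covers many operations.
• Reduce network overhead: a client is more likely to perform many operations on a given chunk, so it can keep a persistent TCP connection to the chunkserver.
• Reduce the size of the metadata stored on the master, which lets the master keep the metadata in memory.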
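The metadata advantage is easy to quantify. The GFS paper reports that the master keeps less than 64 bytes of metadata per 64 MB chunk; the Python sketch below takes 64 bytes as a round upper bound and, purely for comparison, applies the same per-unit cost to a hypothetical 4 KB block size. It also counts how many chunks a client reading a file sequentially would have to locate.

```python
KiB, MiB, GiB, PiB = 1 << 10, 1 << 20, 1 << 30, 1 << 50
METADATA_PER_CHUNK = 64  # bytes; round upper bound from the paper's "< 64 bytes per chunk"


def chunks_needed(nbytes: int, chunk_size: int) -> int:
    """Number of fixed-size chunks needed to hold nbytes (ceiling division)."""
    return -(-nbytes // chunk_size)


def master_metadata_bytes(total_bytes: int, chunk_size: int) -> int:
    """Approximate chunk metadata the master must hold in memory."""
    return chunks_needed(total_bytes, chunk_size) * METADATA_PER_CHUNK


if __name__ == "__main__":
    for label, size in (("64 MB chunks", 64 * MiB), ("4 KB blocks", 4 * KiB)):
        meta_gib = master_metadata_bytes(PiB, size) / GiB
        to_locate = chunks_needed(GiB, size)
        print(f"{label}: {meta_gib:,.0f} GiB metadata per PiB stored, "
              f"{to_locate:,} chunks to locate when reading 1 GiB")
```

With 64 MB chunks, 1 PiB of data needs about 1 GiB of chunk metadata (2^24 chunks at 64 bytes each), small enough to keep in one machine's memory; at 4 KB granularity the same estimate grows to 16 TiB, and a sequential 1 GiB read touches 262,144 units instead of 16.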
7. C. Benefit from co-designing
Record append operation
• Application: most files are mutated by appending new data rather than overwriting existing data.
• GFS mechanism: record append allows multiple clients to append data to the same file concurrently.
• API: multiple clients can append concurrently to a file without extra synchronization between them.
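The co-design point is that the file system, not the application, decides where each appended record goes, so producers need no locking among themselves. In the paper, GFS appends each record atomically at least once at an offset of its own choosing and returns that offset to the client. The toy, single-process Python model below keeps only that contract: `AppendOnlyFile` and `client` are hypothetical names, a lock inside the "file system" stands in for the ordering a GFS primary chunkserver imposes, and replication, retries, duplicate records and chunk-boundary padding are all left out.

```python
import threading


class AppendOnlyFile:
    """Toy in-memory file: the 'file system' chooses each record's offset."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()  # ordering is decided here, not by the clients

    def record_append(self, record: bytes) -> int:
        """Append the record as one contiguous unit and return the offset chosen for it."""
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def client(f: AppendOnlyFile, client_id: int, n: int, results: list) -> None:
    """Each client appends independently; there is no coordination between clients."""
    for i in range(n):
        record = f"client{client_id}-rec{i}\n".encode()
        results.append((f.record_append(record), record))


if __name__ == "__main__":
    f = AppendOnlyFile()
    results: list = []
    threads = [threading.Thread(target=client, args=(f, c, 100, results)) for c in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every record is intact at the offset the file system returned for it.
    assert all(f.read(off, len(rec)) == rec for off, rec in results)
    print(len(results), "records appended")
```

The clients never agree on offsets among themselves; they only learn where their record landed from the return value, which is what lets many producers share one file.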