2. GFS (GOOGLE FILE SYSTEM)
• A scalable distributed file system for large,
distributed, data-intensive applications
• Multiple GFS clusters are currently deployed.
• The largest ones (in 2003) have:
o 1000+ storage nodes
o 300+ terabytes of disk storage, heavily accessed
by hundreds of clients on distinct machines
3. THE DESIGN
• A GFS cluster consists of a single master and multiple
chunkservers, and is accessed by multiple clients.
• Google organizes GFS into clusters of computers.
A cluster is simply a network of machines.
• Each cluster might contain hundreds or even thousands
of machines. Within a GFS cluster there are three kinds
of entities: clients, master servers and chunkservers.
4. CLIENT
• In the world of GFS, the term "client" refers to any
entity that makes a file request.
• Requests can range from retrieving and manipulating
existing files to creating new files on the system.
• Clients can be other computers or computer
applications. You can think of clients as the customers
of the GFS.
5. MASTER SERVERS
• The master server acts as the coordinator for the cluster.
• The master's duties include maintaining an operation log,
which keeps track of the activities of the master's cluster.
• The operation log helps keep service interruptions to a minimum
-- if the master server crashes, a replacement server that has
monitored the operation log can take its place.
• The master server also keeps track of metadata: the
information that describes chunks, such as which chunks
make up a file and where their replicas are stored.
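The recovery role of the operation log can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`Master`, `recover`), not GFS's actual implementation: the master records every metadata mutation in its log, so a replacement master can rebuild identical metadata by replaying that log.

```python
# Minimal sketch (hypothetical names): a master records every metadata
# mutation in an operation log, so a replacement master can rebuild the
# same metadata by replaying the log after a crash.

class Master:
    def __init__(self):
        self.op_log = []    # ordered record of metadata mutations
        self.metadata = {}  # filename -> list of chunk handles

    def create_file(self, name):
        self.op_log.append(("create", name))
        self.metadata[name] = []

    def add_chunk(self, name, chunk_handle):
        self.op_log.append(("add_chunk", name, chunk_handle))
        self.metadata[name].append(chunk_handle)

def recover(op_log):
    """A replacement master replays the log to reach the same state."""
    replacement = Master()
    for entry in op_log:
        if entry[0] == "create":
            replacement.create_file(entry[1])
        elif entry[0] == "add_chunk":
            replacement.add_chunk(entry[1], entry[2])
    return replacement

primary = Master()
primary.create_file("/logs/web.log")
primary.add_chunk("/logs/web.log", "chunk-0001")

backup = recover(primary.op_log)      # the "monitoring" replacement
assert backup.metadata == primary.metadata
```

The key design point is that the log, not the in-memory state, is the source of truth: any machine that has seen the full log can become the master.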
6. CHUNKSERVERS
• Chunkservers are the workhorses of the GFS.
• They're responsible for storing the 64-MB file chunks.
• The chunkservers don't send chunks to the master
server. Instead, they send requested chunks directly to
the client.
• GFS copies every chunk multiple times (three by
default) and stores the copies on different
chunkservers. Each copy is called a replica.
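The division of labor above can be sketched as a read path. This is a simplified simulation with hypothetical names, assuming in-memory dictionaries stand in for the master's metadata and the chunkservers' disks: the client contacts the master only for metadata, then fetches the chunk bytes directly from a chunkserver.

```python
# Sketch of the GFS read path (hypothetical names, in-memory stand-ins):
# the master answers only metadata queries; chunk data flows directly
# from a chunkserver to the client.

CHUNK_SIZE = 64 * 1024 * 1024  # files are stored as 64-MB chunks

# Master-side metadata: (filename, chunk index) -> replica locations.
chunk_locations = {
    ("/logs/web.log", 0): ["chunkserver-a", "chunkserver-b", "chunkserver-c"],
}

# Chunkserver-side storage: (server, chunk handle) -> chunk bytes.
chunk_store = {
    ("chunkserver-a", ("/logs/web.log", 0)): b"GET /index.html ...",
}

def read(filename, offset):
    chunk_index = offset // CHUNK_SIZE      # which chunk holds this byte
    handle = (filename, chunk_index)
    replicas = chunk_locations[handle]      # 1) ask the master for metadata
    server = replicas[0]                    # pick one of the replicas
    return chunk_store[(server, handle)]    # 2) fetch data from chunkserver

data = read("/logs/web.log", 0)
```

Keeping bulk data off the master is what lets a single master coordinate hundreds of chunkservers without becoming a bandwidth bottleneck.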
7. WHAT IS MAPREDUCE?
• MapReduce is a processing technique and a programming
model for distributed computing. (Its best-known
open-source implementation, Hadoop, is written in Java.)
• The MapReduce algorithm contains two important tasks, namely
Map and Reduce. Map takes a set of data and converts it into
another set of data, where individual elements are broken down
into tuples (key/value pairs).
• The Reduce task takes the output of a map as its
input and combines those data tuples into a smaller
set of tuples.
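The two tasks above can be shown with the classic word-count example. This is an illustrative sketch in plain Python, not a real MapReduce framework: `map_task` turns each input line into (key, value) tuples, a sort-and-group step stands in for the shuffle, and `reduce_task` combines each key's tuples into one.

```python
# Word count as Map and Reduce (illustrative sketch, not a framework):
# Map emits (word, 1) tuples; Reduce sums the tuples sharing a key.

from itertools import groupby
from operator import itemgetter

def map_task(line):
    # Break a record into (key, value) tuples: one (word, 1) per word.
    return [(word, 1) for word in line.split()]

def reduce_task(word, counts):
    # Combine all tuples for one key into a single smaller tuple.
    return (word, sum(counts))

lines = ["the quick fox", "the lazy dog"]

# Map phase: every input line becomes a list of (word, 1) tuples.
mapped = [pair for line in lines for pair in map_task(line)]

# Shuffle: group tuples by key so each reduce call sees one word's values.
mapped.sort(key=itemgetter(0))
reduced = [reduce_task(word, [count for _, count in group])
           for word, group in groupby(mapped, key=itemgetter(0))]
# reduced == [("dog", 1), ("fox", 1), ("lazy", 1), ("quick", 1), ("the", 2)]
```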
8. MAPREDUCE (CONTINUED)
• The major advantage of MapReduce is that it is easy to
scale data processing over multiple computing nodes.
• Under the MapReduce model, the data processing
primitives are called mappers and reducers.
• Decomposing a data processing application
into mappers and reducers is sometimes nontrivial.
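One example of such a nontrivial decomposition, sketched in Python with hypothetical names: computing a per-key mean. A mean cannot be merged directly (averaging partial averages over unequal groups gives the wrong answer), so the mapper must emit (sum, count) pairs that the reducer can add up before dividing.

```python
# A nontrivial decomposition (illustrative sketch): per-city mean
# temperature. Mappers emit (sum, count) partials rather than means,
# because partial means from unequal groups cannot be combined.

def map_task(record):
    city, temp = record
    return (city, (temp, 1))          # partial sum and partial count

def reduce_task(city, partials):
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return (city, total / count)      # divide only once, at the end

records = [("oslo", 2.0), ("oslo", 4.0), ("oslo", 6.0), ("lima", 20.0)]

# Shuffle stand-in: gather each city's partials for its reducer.
partials = {}
for city, pair in (map_task(r) for r in records):
    partials.setdefault(city, []).append(pair)

means = dict(reduce_task(city, ps) for city, ps in partials.items())
# means == {"oslo": 4.0, "lima": 20.0}
```

Because every `map_task` call and every `reduce_task` call is independent, each could run on a different node; getting the intermediate representation right, as here, is the part that is sometimes nontrivial.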