Learning objectives
• Understand how to handle massive amounts of data using a data grid.
• Explain data replication and namespaces.
• Identify the various data access models.
Data Intensive Grid Service Model
1. CS6703- GRID AND CLOUD
COMPUTING
2.6. Data Intensive Grid Service Models
By
M.Gomathy Nayagam, AP(SG)/CSE
Ramco Institute of Technology, Rajapalayam
2. DATA INTENSIVE GRID SERVICE MODEL
Grid applications are grouped together as:
Computation intensive
Data intensive
Data-intensive applications have to deal with
massive amounts of data.
Example: the data produced by the Large Hadron Collider
can exceed several petabytes every year.
A data-intensive grid system is designed to discover,
transfer, and manipulate these massive data sets.
Transferring massive data sets is time-consuming.
Let us discuss some mechanisms for solving data
movement problems.
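To get a feel for why moving such data sets is time-consuming, the following Python sketch estimates the ideal transfer time for one petabyte. The link speeds are assumptions chosen only for illustration, and the function name transfer_days is hypothetical. Protocol overhead and contention are ignored, so real transfers would take longer.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal transfer time in days, ignoring overhead and contention."""
    seconds = size_bytes * 8 / link_bits_per_s
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # one decimal petabyte, in bytes

# Hypothetical link speeds, not taken from the slides.
for label, rate in [("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10), ("100 Gbit/s", 1e11)]:
    print(f"{label}: {transfer_days(PETABYTE, rate):.1f} days")
```

Under these assumptions, one petabyte needs about 92.6 days at 1 Gbit/s and about 9.3 days at 10 Gbit/s, which is why the mechanisms that follow try to avoid or shorten data movement.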
3. DATA REPLICATION AND UNIFIED NAMESPACE
Replication is a data access method that is also known as caching.
Caching enhances data efficiency.
Replication stores copies of the same data blocks and scatters
them across multiple regions of the grid.
Hence users can access the same data with locality of
reference.
Key data will not be lost in case of failure.
However, it increases storage requirements and network
bandwidth consumption.
Replication strategies determine when and where to
create a replica of the data.
The factors to consider include data demand, network
conditions, and transfer cost.
4. DATA REPLICATION AND UNIFIED NAMESPACE
Two types of replication strategies are:
Static replication
The locations and number of replicas are determined in
advance and will not be modified.
It cannot adapt to changes in demand,
bandwidth, and storage availability.
Dynamic replication
The locations and number of data replicas are adjusted
according to changes in conditions.
Frequent data-moving operations can result in
much more overhead than in static strategies.
The replication strategy must be optimized with
respect to the status of data replicas.
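The difference between the two strategies can be illustrated with a minimal Python sketch of a dynamic strategy. It assumes a simple demand threshold as the trigger for creating a replica; the slides do not prescribe any particular rule, and the class name DynamicReplicator, the threshold, and the replica limit are all hypothetical.

```python
from collections import Counter


class DynamicReplicator:
    """Creates a replica at a site once remote demand there reaches a threshold."""

    def __init__(self, threshold, max_replicas):
        self.threshold = threshold        # remote requests needed before copying
        self.max_replicas = max_replicas  # cap that bounds the storage cost
        self.replicas = {}                # data name -> set of sites holding it
        self.demand = Counter()           # (data name, site) -> remote requests

    def register(self, name, site):
        self.replicas.setdefault(name, set()).add(site)

    def access(self, name, site):
        """Record one request and report whether it was served locally."""
        sites = self.replicas[name]
        if site in sites:
            return "local"
        self.demand[(name, site)] += 1
        if self.demand[(name, site)] >= self.threshold and len(sites) < self.max_replicas:
            sites.add(site)               # this is the costly data-moving step
            del self.demand[(name, site)]
        return "remote"


replicator = DynamicReplicator(threshold=3, max_replicas=2)
replicator.register("run1", "siteA")
print([replicator.access("run1", "siteB") for _ in range(4)])
```

A static strategy would correspond to calling register once for each planned site and never adding sites afterwards. The replica limit stands in for the storage and bandwidth costs that the strategy has to weigh against demand.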
5. GRID DATA ACCESS MODEL
Multiple participants may want to share the same data
collection.
To retrieve any piece of data, a grid with a unique global
namespace is needed.
Similarly, file names must be unique.
So, we need to resolve inconsistencies among multiple data
objects bearing the same name.
Data needs to be protected to avoid leakage and damage.
Users who want to access data have to be authenticated first
and then authorized for access.
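The requirements above (a unique global name for every data object, and authentication before authorization) can be sketched in Python as a small catalog. This is only an illustration under stated assumptions: the class NamespaceCatalog, its methods, and the example names and locations are hypothetical and do not describe any particular grid middleware.

```python
class NamespaceCatalog:
    """Maps each globally unique logical name to its physical copies."""

    def __init__(self):
        self._locations = {}  # logical name -> list of physical locations
        self._readers = {}    # logical name -> set of authorized users

    def register(self, logical_name, location, readers):
        # Reject a second object under an existing name instead of
        # silently shadowing the first one.
        if logical_name in self._locations:
            raise ValueError(f"name already in use: {logical_name}")
        self._locations[logical_name] = [location]
        self._readers[logical_name] = set(readers)

    def add_replica(self, logical_name, location):
        self._locations[logical_name].append(location)

    def resolve(self, logical_name, user, authenticated):
        # Authentication is checked first, then authorization.
        if not authenticated:
            raise PermissionError("user is not authenticated")
        if user not in self._readers[logical_name]:
            raise PermissionError("user is not authorized for this data")
        return list(self._locations[logical_name])


catalog = NamespaceCatalog()
catalog.register("/grid/physics/run1.dat", "siteA:/store/run1.dat", readers={"alice"})
catalog.add_replica("/grid/physics/run1.dat", "siteB:/cache/run1.dat")
print(catalog.resolve("/grid/physics/run1.dat", "alice", authenticated=True))
```

Rejecting a duplicate name at registration time is one possible way to avoid inconsistencies among objects bearing the same name; renaming or versioning would be alternatives.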
There are 4 data access models:
Monadic Model
Hierarchical Model
Federation Model
Hybrid Model
6. GRID DATA ACCESS MODEL
Monadic model:
Centralized data repository
model.
Data is saved in a central data
repository.
Users have to submit requests
directly to the central
repository to access data.
No data is replicated for the
purpose of preserving data locality.
It is a simple model.
Data replication is permitted in
this model only when fault
tolerance is demanded.
7. GRID DATA ACCESS MODEL
Hierarchical model:
It is suitable for building a large
data grid
The data may be transferred
from the source to a
second-level (regional) center.
Then some data in the regional
center is transferred to the
third-level center.
After being forwarded several
times, specific data objects are
accessed directly by users.
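The tiered forwarding can be sketched in Python as a chain of centers. The sketch assumes a pull-on-demand scheme in which each center keeps a copy of whatever passes through it; the slide only says that data is forwarded from tier to tier, so this policy, the class name Center, and the tier names are assumptions.

```python
class Center:
    """One tier in the hierarchy; the source center has no parent."""

    def __init__(self, name, parent=None, data=None):
        self.name = name
        self.parent = parent
        self.store = dict(data or {})

    def fetch(self, key):
        """Return (value, path), where path lists the centers the data passed."""
        if key in self.store:
            return self.store[key], [self.name]
        if self.parent is None:
            raise KeyError(key)
        value, path = self.parent.fetch(key)
        self.store[key] = value  # keep a copy at this tier for later requests
        return value, path + [self.name]


source = Center("source", data={"run1": b"raw events"})
regional = Center("regional", parent=source)
local = Center("local", parent=regional)

print(local.fetch("run1")[1])  # first request travels down from the source
print(local.fetch("run1")[1])  # second request is served by the local tier
```

The first request is forwarded through every tier, while later requests for the same object are answered by the tier closest to the user.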
8. GRID DATA ACCESS MODEL
Federation Model:
It is suited for designing a data
grid with multiple sources of data
supplies.
This model is also known as a
mesh model.
The data sources are distributed
to many different locations.
Although the data is shared, the
data items are still owned and
controlled by their original owners.
According to predefined access
policies, only authenticated users
are authorized to request data
from any data source.
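A minimal Python sketch of this arrangement is given below. It assumes that each source enforces its own owner-defined policy while the federation only checks authentication and routes the request; the class names DataSource and Federation and the example users are hypothetical.

```python
class DataSource:
    """A data source whose owner keeps control of its items and policy."""

    def __init__(self, owner, items, allowed_users):
        self.owner = owner
        self._items = dict(items)
        self._allowed = set(allowed_users)  # the owner's predefined access policy

    def has(self, key):
        return key in self._items

    def request(self, user, key):
        if user not in self._allowed:
            raise PermissionError(f"{self.owner} denies access to {user}")
        return self._items[key]


class Federation:
    """Routes requests to member sources without taking ownership of the data."""

    def __init__(self, sources, authenticated_users):
        self._sources = list(sources)
        self._authenticated = set(authenticated_users)

    def request(self, user, key):
        if user not in self._authenticated:
            raise PermissionError("user is not authenticated")
        for source in self._sources:
            if source.has(key):
                return source.request(user, key)  # the owner's policy decides
        raise KeyError(key)


lab_a = DataSource("lab_a", {"genome": "ACGT"}, allowed_users={"alice"})
lab_b = DataSource("lab_b", {"climate": "42"}, allowed_users={"alice", "bob"})
grid = Federation([lab_a, lab_b], authenticated_users={"alice", "bob"})
print(grid.request("bob", "climate"))
```

Because the policy lives in each DataSource, an owner can change who may read its items without involving the other members of the federation.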
9. GRID DATA ACCESS MODEL
Hybrid Model
The model combines the best
features of the hierarchical
and mesh models.
Traditional data transfer
technology, such as FTP,
is suitable for networks with
lower bandwidth.
10. PARALLEL VERSUS STRIPED DATA
TRANSFERS
Parallel data transfer opens multiple data streams
for passing subdivided segments of a file
simultaneously.
Although the speed of each stream is the same as
in sequential streaming, the total time to move data
in all streams can be significantly reduced
compared to FTP transfer.
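The idea can be sketched in Python by splitting a byte string into segments and copying them concurrently. Threads and an in-memory copy stand in for real network streams here, so the sketch shows the structure of a parallel transfer rather than its speed; the function names split_ranges and parallel_copy are hypothetical. As a rough model, if a file of size S is moved over n streams that each sustain rate r and do not share a bottleneck, the transfer time is about S / (n * r) instead of S / r.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Divide [0, size) into `streams` contiguous (start, end) ranges."""
    base, extra = divmod(size, streams)
    ranges, start = [], 0
    for i in range(streams):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_copy(source, streams):
    """Copy `source` using one worker per segment, then return the result."""
    dest = bytearray(len(source))

    def copy_segment(bounds):
        start, end = bounds
        # Stands in for sending one segment over its own data stream.
        dest[start:end] = source[start:end]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(copy_segment, split_ranges(len(source), streams)))
    return bytes(dest)


payload = bytes(range(256)) * 4
assert parallel_copy(payload, streams=4) == payload
```

Each worker writes to a disjoint range of the destination buffer, so the segments can arrive in any order without extra coordination.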
In striped data transfer, a data object is partitioned
into a number of sections, and each section is
placed at an individual site in a data grid.
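The placement step can be sketched in Python as follows. The sketch assumes one contiguous section per site and equal section sizes except for the last one; the function names stripe and gather and the site names are hypothetical.

```python
def stripe(data, sites):
    """Partition `data` into one contiguous section per site."""
    size = -(-len(data) // len(sites))  # ceiling division
    return {site: data[i * size:(i + 1) * size] for i, site in enumerate(sites)}


def gather(placement, sites):
    """Reassemble the object by reading the sections in site order."""
    return b"".join(placement[site] for site in sites)


sites = ["siteA", "siteB", "siteC"]
placement = stripe(b"abcdefghij", sites)
print(placement)
assert gather(placement, sites) == b"abcdefghij"
```

Since every section lives at a different site, a reader can fetch the sections from several sites at once, which combines naturally with the multiple streams used in parallel transfer.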