IDL - International Digital Library of Technology & Research
Available at: www.dbpublications.org
International e-Journal for Technology and Research - 2018
IDL - International Digital Library | Page 1 | Copyright@IDL-2017
A Survey: Hybrid Job-Driven Meta Data Scheduling for Data Storage with an Internet Approach
Ms. BHANUPRIYA S V¹, Mrs. SHRUTHI G²
Department of Computer Science and Engineering
¹ M.Tech Student, DBIT, Bengaluru, India
² Guide and Professor, DBIT, Bengaluru, India
1. ABSTRACT
Cloud computing is a promising computing model that enables convenient, on-demand network access to a shared pool of configurable computing resources. One of the first cloud services offered is moving data into the cloud: data owners let cloud service providers host their data on cloud servers, and data consumers access that data from the cloud servers. This new paradigm of data storage introduces new security challenges, because data owners and data servers have different identities and different business interests, while map and reduce tasks from different jobs compete for the same resources. An independent auditing service is therefore required to make sure that the data is correctly hosted in the Cloud. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations are further introduced to separately achieve better map-data locality and faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network traffic without incurring significant overhead. In addition, the two variations suit different MapReduce workload scenarios and provide the best job performance among all tested algorithms for cloud data storage.
Keywords: Cloud Computing, Communication System, IaaS, Scheduling Process, Auditing Process.
2. INTRODUCTION
Cloud computing is a promising computing model that enables convenient, on-demand network access to a shared pool of computing resources. Cloud computing offers a group of services, including Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Cloud storage is an important cloud service that allows data owners to move data from their local computing systems to the Cloud, and more and more data owners are choosing to host their data there. By hosting their data in the Cloud, data owners avoid the initial investment in expensive infrastructure setup, large equipment, and daily maintenance costs; they pay only for the space they actually use, e.g., under a cost-per-gigabyte-stored model. Data owners can also rely on the Cloud to provide more reliable service, so that they can access their data from anywhere at any time. Individuals and small companies usually do not have the resources to keep their own servers as reliable as the Cloud does. Hosting data in the Cloud, however, introduces new security challenges.
Firstly, users must be authorized before they can store data in the Cloud, with storage requests handled by job scheduling over the Internet.
Secondly, data owners worry that their data could be lost in the Cloud, because data loss can happen in any infrastructure, no matter what highly reliable measures the cloud service providers take. Recent data-loss incidents include the Sidekick Cloud Disaster in 2009 and the breakdown of Amazon's Elastic Compute Cloud (EC2) in 2010. Cloud service providers may also be dishonest: they may discard data that is never or rarely accessed in order to save storage space, or keep fewer replicas than promised. Moreover, they may hide data loss and claim that the data is still correctly stored in the Cloud. As a result, data owners need to be convinced that their data is correctly stored. Checking on retrieval is a common method for verifying data integrity: data owners check the integrity of their data whenever they access it. This method has been used in peer-to-peer storage systems, network file systems, long-term archives, web-service object stores, and database systems. However, checking on retrieval is not sufficient to verify the integrity of all data stored in the Cloud. A large amount of data is typically stored, but only a small percentage is frequently accessed, so there is no guarantee for the data that is rarely accessed. An improved method generates virtual retrievals to check the integrity of rarely accessed data, but this causes heavy I/O overhead on the cloud servers and high communication cost due to the data retrieval operations.
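The checking-on-retrieval idea described above can be sketched in a few lines. This is a minimal illustration, assuming the owner keeps a local digest per file; the function names are hypothetical, not from any real cloud SDK.

```python
import hashlib

# Minimal sketch of "checking on retrieval", assuming the owner keeps
# a local digest per file. Names are hypothetical, not a real cloud API.

def store(local_index: dict, name: str, data: bytes) -> None:
    """Record the file's digest locally before uploading `data`."""
    local_index[name] = hashlib.sha256(data).hexdigest()

def verify_on_retrieval(local_index: dict, name: str, data: bytes) -> bool:
    """Recompute the digest of what the server returned and compare."""
    return hashlib.sha256(data).hexdigest() == local_index[name]

index = {}
store(index, "report.txt", b"quarterly figures")
assert verify_on_retrieval(index, "report.txt", b"quarterly figures")
assert not verify_on_retrieval(index, "report.txt", b"tampered figures")
```

Note the limitation discussed above: a file that is never retrieved is never checked, which is exactly the gap a standalone auditing service is meant to fill.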
Therefore, it is desirable to have a storage auditing service that assures data owners their data is correctly stored in the Cloud. Data owners, however, are not willing to perform such auditing themselves because of the heavy overhead and cost. In fact, it is not fair to let either side, the cloud service providers or the data owners, conduct the auditing, because neither can be guaranteed to provide an unbiased and honest result. Data storage auditing is also a very resource-demanding operation in terms of computation, memory space, and communication cost.
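A third-party audit of the kind motivated here can be sketched as a sampling-based challenge-response. This is a simplified illustration, assuming per-block HMAC tags and a key shared between the owner and the auditor; it is not the protocol of any specific paper, and real schemes use homomorphic tags so that whole blocks need not be sent back to the auditor.

```python
import hashlib
import hmac
import random

# Simplified sketch of a sampling-based third-party audit: the owner
# tags each block with an HMAC, and the auditor challenges a few random
# block indices instead of reading the whole file. All parameters here
# are toy values for illustration.

KEY = b"owner-secret-key"   # shared with the auditor, hidden from the server
BLOCK = 4                   # tiny block size, for illustration only

def tag_blocks(data: bytes):
    """Owner-side setup: split the file and tag each block with its index."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    tags = [hmac.new(KEY, bytes([i]) + b, hashlib.sha256).digest()
            for i, b in enumerate(blocks)]
    return blocks, tags     # both are handed to the cloud server

def audit(blocks, tags, sample=3):
    """Auditor challenges `sample` random indices, keeping I/O cost low."""
    for i in random.sample(range(len(blocks)), sample):
        expect = hmac.new(KEY, bytes([i]) + blocks[i], hashlib.sha256).digest()
        if not hmac.compare_digest(expect, tags[i]):
            return False
    return True

blocks, tags = tag_blocks(b"data hosted in the cloud, split into blocks")
assert audit(blocks, tags)                          # intact data passes
blocks[2] = b"XXXX"                                 # simulate silent corruption
assert not audit(blocks, tags, sample=len(blocks))  # a full check detects it
```

Sampling is what keeps the auditor's cost low: a partial check catches corruption only probabilistically, while the full check shown last is certain but expensive.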
3. SURVEYS
3.1 "Data storage auditing service in cloud computing: challenges, methods and opportunities"
In this survey, the authors observe that cloud computing is a promising computing model that enables convenient, on-demand network access to a shared pool of configurable computing resources. One of the first cloud services offered is moving data into the cloud: data owners let cloud service providers host their data on cloud servers, and data consumers access the data from those servers. This new paradigm of data storage also introduces new security challenges, because data owners and data servers have different identities and different business interests. Therefore, an independent auditing service is required to make sure that the data is correctly hosted in the Cloud. The paper investigates this problem and gives an extensive survey of storage auditing methods in the literature. First, the authors give a set of requirements for an auditing protocol for data storage in cloud computing. Then, they introduce some existing auditing schemes and analyze them in terms of security and performance. Finally, they discuss challenging issues in the design of efficient auditing protocols for data storage in cloud computing.
3.2 "Efficient Public Integrity Checking for Cloud Data Sharing with Multi-User Modification"
In past years, a body of data integrity checking techniques has been proposed for securing cloud data services. Most of this work assumes that only the data owner can modify cloud-stored data. Recently, a few attempts have considered more realistic scenarios by allowing multiple cloud users to modify data with integrity assurance. However, these attempts are still far from practical due to the tremendous computational cost imposed on cloud users; moreover, collusion between misbehaving cloud servers and revoked users is not considered. This paper proposes a novel data integrity checking scheme characterized by multi-user modification, collusion resistance, and a constant computational cost of integrity checking for cloud users, based on a novel design of polynomial-based authentication tags and proxy tag update techniques. The scheme also supports public checking and efficient user revocation, and is provably secure. Numerical analysis and extensive experimental results show the efficiency and scalability of the proposed scheme.
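The polynomial-based authentication tags mentioned above can be illustrated with a toy construction. This is a hedged sketch: the field modulus, the secret point, and the tagging rule are assumptions for illustration, not the surveyed paper's actual scheme. The property shown is linearity, which is what lets an auditor verify a random combination of blocks against a single aggregated tag.

```python
# Toy sketch of a polynomial-based authentication tag. The modulus p,
# the secret point alpha, and the tagging rule are assumptions for
# illustration. A block is read as the coefficient vector of a
# polynomial, evaluated at the secret point modulo a prime.

p = 2**61 - 1        # a Mersenne prime used as the field modulus
alpha = 123456789    # secret evaluation point, known only to the verifier

def tag(block):
    """Evaluate the block's polynomial at the secret point (mod p)."""
    return sum(c * pow(alpha, i, p) for i, c in enumerate(block)) % p

b1, b2 = [3, 1, 4], [2, 7, 1]   # two data blocks as field elements
r1, r2 = 5, 9                   # auditor's random challenge coefficients
combined = [(r1 * x + r2 * y) % p for x, y in zip(b1, b2)]

# Linearity enables aggregated checking: the tag of a random
# combination of blocks equals the same combination of the tags.
assert tag(combined) == (r1 * tag(b1) + r2 * tag(b2)) % p
```

Because tags aggregate this way, the server can answer a multi-block challenge with one combined block and one combined tag, which is how such schemes keep the user's checking cost constant.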
3.3 "Hybrid Job-Driven Meta Data Scheduling for BigData with MapReduce Clusters and Internet Approach"
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, the paper proposes a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task-level and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy for each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve better map-data locality and faster task assignment. Extensive experiments evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant cost. In addition, the two variations suit different MapReduce workload scenarios and provide the best job performance among all tested algorithms.
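The classification step described above, by job scale and job type, can be sketched as follows. The thresholds, class names, and policy strings are assumptions for illustration; the paper defines its own classification rules.

```python
# Illustrative sketch of job-driven classification in the spirit of
# JoSS. Thresholds, class names, and policies are assumptions for
# illustration, not the paper's actual rules.

def classify(input_size_gb: float, reduce_input_ratio: float) -> str:
    """Classify a MapReduce job by scale and by type."""
    scale = "large" if input_size_gb >= 10 else "small"
    kind = "reduce-heavy" if reduce_input_ratio >= 0.5 else "map-heavy"
    return f"{scale}/{kind}"

# Each class is mapped to its own scheduling policy.
POLICY = {
    "small/map-heavy":    "pack map tasks near their input blocks",
    "small/reduce-heavy": "co-locate reducers with intermediate data",
    "large/map-heavy":    "spread map tasks for locality, delay reducers",
    "large/reduce-heavy": "reserve slots so reducers do not starve",
}

job_class = classify(input_size_gb=2.0, reduce_input_ratio=0.8)
print(job_class, "->", POLICY[job_class])  # small/reduce-heavy -> ...
```

The point of the classification is that one policy cannot serve all workloads: a policy that helps a small map-heavy job can starve a large reduce-heavy one.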
3.4 "Hybrid Job-Driven Scheduling for Virtual MapReduce Clusters"
It is cost-efficient for a tenant with a limited
budget to establish a virtual MapReduce cluster by
renting multiple virtual private servers (VPSs) from a
VPS provider. To provide an appropriate scheduling
scheme for this type of computing environment, we
propose in this paper a hybrid job-driven scheduling
scheme (JoSS for short) from a tenant's perspective.
JoSS provides not only job-level scheduling, but also
map-task level scheduling and reduce-task level
scheduling. JoSS classifies MapReduce jobs based on
job scale and job type and designs an appropriate
scheduling policy to schedule each class of jobs. The
goal is to improve data locality for both map
tasks and reduce tasks, avoid job starvation, and
improve job execution performance. Two variations
of JoSS are further introduced to separately achieve a
better map-data locality and a faster task assignment.
We conduct extensive experiments to evaluate and
compare the two variations with current scheduling
algorithms supported by Hadoop. The results show
that the two variations outperform the other tested
algorithms in terms of map-data locality, reduce-data
locality, and network overhead without incurring
significant overhead. In addition, the two variations
are separately suitable for different MapReduce
workload scenarios and provide the best job
performance among all tested algorithms.
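The trade-off between the two JoSS variations, better map-data locality versus faster task assignment, can be illustrated with a toy assignment loop. Both functions and the waiting rule are assumptions for illustration, not the actual JoSS algorithms.

```python
# Toy contrast between the two trade-offs the JoSS variations target:
# briefly waiting for a node that holds the map input (better locality)
# versus taking the first free node (faster assignment). Illustrative
# only; not the paper's algorithms.

def assign_locality_first(task_input_node, free_nodes, max_wait_rounds=2):
    """Skip up to `max_wait_rounds` non-local offers, hoping the node
    holding the input block frees up; then give up and take what comes."""
    for round_no, node in enumerate(free_nodes):
        if node == task_input_node or round_no >= max_wait_rounds:
            return node
    return free_nodes[-1]

def assign_fastest(free_nodes):
    """Take the first offer unconditionally: lowest assignment latency."""
    return free_nodes[0]

offers = ["n3", "n1", "n7"]   # nodes becoming free, in order
print(assign_locality_first("n7", offers))  # waits two rounds, lands on n7
print(assign_fastest(offers))               # n3, immediately
```

Waiting buys locality at the cost of latency; which variation wins therefore depends on the workload, matching the paper's finding that the two suit different scenarios.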
3.5 "A New Approach to Internet Host Mobility" (ACM Computer Communication)
This paper describes a new approach to
Internet host mobility. We argue that by separating
local and wide area mobility, the performance of
existing mobile host protocols (e.g. Mobile IP) can be
significantly improved. We propose Cellular IP, a
new lightweight and robust protocol that is optimized
to support local mobility but efficiently interworks
with Mobile IP to provide wide area mobility
support. Cellular IP shows great benefit in
comparison to existing host mobility proposals for
environments where mobile hosts migrate frequently,
which we argue, will be the rule rather than the
exception as Internet wireless access becomes
ubiquitous. Cellular IP maintains distributed cache
for location management and routing purposes.
Distributed paging cache coarsely maintains the
position of "idle" mobile hosts in a service area.
Cellular IP uses this paging cache to quickly and
efficiently pinpoint "idle" mobile hosts that wish to
engage in "active" communications. This approach is
beneficial because it can accommodate a large
number of users attached to the network without
overloading the location management system.
Distributed routing cache maintains the position of
active mobile hosts in the service area and
dynamically refreshes the routing state in response to
the handoff of active mobile hosts. These distributed
location management and routing algorithms lend
themselves to a simple and low cost implementation
of Internet host mobility requiring no new packet
formats, encapsulations or address space allocation
beyond what is present in IP.
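The two distributed caches described above can be sketched with simple maps. The data structures and function names are assumptions for illustration; in real Cellular IP this state is maintained hop-by-hop inside the access network rather than in a single table.

```python
# Minimal sketch of Cellular IP's two caches, under assumed data
# structures: a coarse paging cache tracks which paging area an idle
# host is in, while a routing cache tracks the exact attachment point
# of active hosts and is refreshed on handoff.

paging_cache = {}    # host -> paging area (coarse, for idle hosts)
routing_cache = {}   # host -> base station (fine, for active hosts)

def handoff(host, base_station, area):
    """An active host refreshes its routing state as it moves."""
    routing_cache[host] = base_station
    paging_cache[host] = area

def go_idle(host):
    """Idle hosts keep only coarse state, so the routing cache stays small."""
    routing_cache.pop(host, None)

def deliver(host, packet):
    if host in routing_cache:     # active: route directly to its base station
        return f"forward {packet} via {routing_cache[host]}"
    if host in paging_cache:      # idle: page only its last known area
        return f"page area {paging_cache[host]} for {host}, then deliver {packet}"
    return "host unknown"

handoff("h1", "bs7", "area-2")
go_idle("h1")
print(deliver("h1", "pkt"))   # pages area-2, because h1 is idle
```

Keeping only coarse state for idle hosts is the scalability argument made above: the expensive, frequently refreshed routing entries exist only for the (few) active hosts.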
4. CONCLUSION
In this paper, we plan to investigate the auditing problem for data storage in cloud computing and propose a set of requirements for designing third-party auditing protocols. We apply a two-phase job scheduling process, which helps in comparing all types of processing issues in the storage system with respect to data and time. Finally, we plan to introduce some challenging issues in the design of efficient auditing protocols for data storage in cloud computing.