DATA PROVENANCE IN PUBLIC CLOUD

IJSRD - International Journal for Scientific Research & Development| Vol. 2, Issue 09, 2014 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 212
Data Provenance in Public Cloud
Govinda K¹, Md Fuzail C²
¹,²Department of Computer Science
¹,²SCSE, VIT University, Vellore, India
Abstract: Cloud computing is Internet-based computing, also
known as the "pay-per-use" model: customers pay only for
the resources they actually use. The key barrier to
widespread uptake of cloud computing is the lack of trust in
clouds by potential customers. While preventive controls for
security and privacy are actively being researched,
there is still little focus on detective controls related to cloud
accountability and auditability. The complexity resulting
from the sheer amount of virtualization and data distribution
carried out in current clouds has also revealed an urgent
need for research in cloud accountability, as has the shift in
focus of customer concerns from server health and
utilization to the integrity and safety of end-users' data. In
this paper we propose a method to store data provenance
using Amazon S3 and SimpleDB.
Key words: Provenance, Data, PASS, WAL Queue, S3, SimpleDB,
Cloud
I. INTRODUCTION
Cloud computing is an increasingly popular approach for the
processing of large data sets and computationally expensive
programs. This includes scenarios that have clear
requirements for maintaining the provenance of data,
including eScience and healthcare where assurance in the
quality and repeatability of results is essential. In addition,
clouds have their own application for provenance: the
identification of the origins of faults and security violations.
However, cloud systems are structured in a fundamentally
different way from other distributed systems, such as grids,
and therefore present new problems for the collection of
provenance data. In this paper we describe the challenges in
collecting provenance for cloud computing and identify
where additional work is needed. We believe that there is an
opportunity for creating new, practical provenance systems
for clouds to support scientific computing as well as
satisfying current requirements for better debugging,
auditing, forensics, billing and security. These systems must
reflect the unique nature of clouds and should take
advantage of existing research from grid computing.
Provenance is comparatively well defined. It generally
refers to information that "helps determine the derivation
history of a data product, starting from its original source".
This information is clearly valuable in data-intensive
computing scenarios, such as scientific computing, to
provide assurance in quality of results and ensure the
repeatability of experiments. We observe that the problem
(if not the concept) of provenance should also be familiar to
anyone involved in debugging IT systems. System
administrators must identify where an error originated, what
caused it and the effects it had. This is particularly true of
security violations, and provenance records are closely
related to data forensics. These tasks are usually supported
through logging and auditing. This is particularly difficult in
complex systems with multiple layers of interacting
software and hardware such as a cloud. Clouds are dynamic
and heterogeneous by definition [4], and involve several
components provided by different vendors which must
interoperate. Tracing the origins of faults on cloud
infrastructures involves the collection of evidence and data
from diverse sources, with causes and effects that are
difficult to determine.
Cloud computing therefore is a good example of a
situation where the introduction of better provenance data
could provide immediate benefits for system administrators
as well as users. As an aside, we note that logs and
provenance data are distinctly different. Logs provide a
sequential history of pre-defined actions usually relating to a
particular application. Provenance data refers to the history
of the origins of a particular data object, with perhaps
greater requirements for assurance and semantics.
Provenance goes beyond an individual application and may
refer to any pieces of equipment as well as people.
Throughout this paper we refer to logs as being a source of
provenance, primarily because in cloud systems, when
present, they are used in combination for a similar purpose.
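The distinction between a log and a provenance record can be illustrated with a minimal sketch. The record schema, the field names and the example file names below are assumptions made for illustration only; they are not taken from PASS or from any cloud service.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    """One node in a derivation history (hypothetical schema)."""
    object_id: str
    produced_by: str                                  # process, tool or person
    inputs: List[str] = field(default_factory=list)   # ids of source objects


def derivation_history(records: Dict[str, ProvenanceRecord],
                       object_id: str) -> List[str]:
    """Return the ids of all ancestors of object_id, nearest first."""
    history: List[str] = []
    seen = {object_id}
    queue = deque(records[object_id].inputs)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        history.append(current)
        if current in records:
            queue.extend(records[current].inputs)
    return history


# Provenance: a graph that links each data object back to its sources.
records = {
    "raw.csv": ProvenanceRecord("raw.csv", "sensor-upload"),
    "clean.csv": ProvenanceRecord("clean.csv", "clean.py", ["raw.csv"]),
    "report.pdf": ProvenanceRecord("report.pdf", "plot.py", ["clean.csv"]),
}

# A log, by contrast, is a time-ordered list of events from one application.
log = ["10:00 clean.py started", "10:01 clean.py finished"]
```

Walking the graph from a data object answers "where did this come from?", a question the flat log can only answer indirectly.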
II. LITERATURE REVIEW
The need for additional provenance information in cloud
storage has been well established by Muniswamy-Reddy et
al. [11, 12]. The authors discuss the requirements for adding
data provenance to cloud storage systems and analyse
several alternative implementations. This is in contrast
to our work, which considers the entire cloud infrastructure
and identifies that existing (but inadequate) tools are already
being used for provenance. The use of provenance for fault
tolerance has also been proposed before for grid computing
[7]. One aim is to avoid common-mode failures when using
multiple composite web services. The above work provides
useful motivation for the collection of provenance data,
but the move to cloud computing requires a
new analysis of current problems in the collection of
provenance data. Many promising tools could be adopted for
use in cloud environments. Muniswamy-Reddy et al. [11, 12]
have already evaluated the use of PASS, the
Provenance-Aware Storage System, for cloud provenance.
Reilly and Naughton [13] have proposed extending the
Condor batch execution system to capture data on execution
environments, machine identities, log files, file permissions
and more. While a cloud infrastructure poses significant new
challenges, the Provenance-Aware Condor system shows
that the right kind of provenance data can be collected.
III. PROPOSED METHOD
We describe three protocols for maintaining data provenance;
our approach implements the third.
Protocol 1: Both provenance and data are recorded
in a cloud object store (S3).
Protocol 2: Provenance is stored in a cloud
database (SimpleDB) and data is stored in a cloud store
(S3).

Protocol 3: Provenance is stored in a cloud database
(SimpleDB) which is hosted on MySQL and data is stored in
a cloud store (S3). A cloud messaging service (SQS) is used
to provide data-coupling and multi-object causal ordering.
Protocol 3 stores provenance and data in the same way as
Protocol 2, but differs in its use of the cloud messaging
service (SQS) and in the transactions it uses to achieve
data-coupling. Each client maintains an SQS queue, which is
used as a Write Ahead Log (WAL) and is processed by a
separate commit daemon. The commit daemon reads the
records of a transaction from the queue; a long transaction
takes more time to commit. If the client machine
crashes, SQS itself is not responsible for writing to cloud
storage; in that case the records already logged in the queue
take care of it. If a machine crashes before it
commits to storage, another machine takes over the
commit process and completes the transaction successfully.
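The write path of Protocol 3 can be sketched as follows. This is a minimal in-memory simulation under stated assumptions, not the authors' implementation: plain dictionaries stand in for S3 and SimpleDB, a deque stands in for the SQS queue, and the record layout and the names `InMemoryCloud`, `client_write` and `commit_daemon` are hypothetical. No real AWS calls are made.

```python
from collections import deque


class InMemoryCloud:
    """Hypothetical stand-ins: dicts for S3 and SimpleDB, a deque for SQS."""

    def __init__(self):
        self.object_store = {}     # stands in for S3: key -> data
        self.provenance_db = {}    # stands in for SimpleDB: key -> provenance
        self.wal_queue = deque()   # stands in for the client's SQS queue (WAL)


def client_write(cloud, txn_id, key, data, provenance):
    """Log the whole transaction to the WAL; nothing is stored yet."""
    cloud.wal_queue.append(("begin", txn_id, None))
    cloud.wal_queue.append(("data", txn_id, (key, data)))
    cloud.wal_queue.append(("prov", txn_id, (key, provenance)))
    cloud.wal_queue.append(("commit", txn_id, None))


def commit_daemon(cloud):
    """Apply only transactions whose commit record is present in the WAL.

    Because the queue lives in the cloud, any machine could run this
    function, which is how another machine can finish a crashed
    client's transaction. Records of transactions without a commit
    record are consumed and ignored (a simplification).
    """
    pending = {}
    committed = []
    while cloud.wal_queue:
        kind, txn_id, payload = cloud.wal_queue.popleft()
        if kind == "begin":
            pending[txn_id] = []
        elif kind in ("data", "prov"):
            pending.setdefault(txn_id, []).append((kind, payload))
        elif kind == "commit":
            for rec_kind, (key, value) in pending.pop(txn_id, []):
                if rec_kind == "data":
                    cloud.object_store[key] = value
                else:
                    cloud.provenance_db[key] = value
            committed.append(txn_id)
    return committed


cloud = InMemoryCloud()
client_write(cloud, "t1", "a.txt", b"hello", {"source": "sensor"})
# Simulate a client that crashes before logging its commit record.
cloud.wal_queue.append(("begin", "t2", None))
cloud.wal_queue.append(("data", "t2", ("b.txt", b"partial")))
committed = commit_daemon(cloud)
```

Data and provenance become visible together or not at all, which is the data-coupling property the protocol aims for: the half-logged transaction `t2` leaves no trace in either store.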
In Protocol 3, messages larger than 8 KB cannot be
stored. This is a major drawback, because large data
cannot be recorded in the WAL queue. To
overcome this drawback we write large data to a temporary
S3 object. When the commit daemon processes the WAL, the
entries inside the temporary S3 object are copied to the
original entries and the temporary entries are deleted.
Unwanted messages from uncommitted transactions that
are left behind in the temporary object are treated as
garbage and are deleted automatically once every four
days. The process that cleans up this data and backs up
messages older than four days is called the cleaner daemon.
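The handling of oversized records can be sketched in the same in-memory style. The 8 KB limit and the four-day period come from the text above; the `tmp/` key prefix, the record layout and the function names are assumptions for illustration, and only the deletion part of the cleaner daemon is modelled.

```python
MAX_MESSAGE_BYTES = 8 * 1024            # message size limit stated in the text
RETENTION_SECONDS = 4 * 24 * 60 * 60    # four days


def log_data(wal, temp_store, key, data, now):
    """Append a WAL record; spill oversized data to a temporary object."""
    if len(data) <= MAX_MESSAGE_BYTES:
        wal.append({"key": key, "inline": data})
    else:
        temp_key = "tmp/" + key              # hypothetical naming scheme
        temp_store[temp_key] = (data, now)   # remember the creation time
        wal.append({"key": key, "temp_key": temp_key})


def commit(wal, temp_store, object_store):
    """Copy each logged record to its final key and drop temporaries."""
    while wal:
        record = wal.pop(0)
        if "inline" in record:
            object_store[record["key"]] = record["inline"]
        else:
            data, _created = temp_store.pop(record["temp_key"])
            object_store[record["key"]] = data


def cleaner(temp_store, now):
    """Delete temporary objects older than the retention period."""
    stale = [k for k, (_data, created) in temp_store.items()
             if now - created > RETENTION_SECONDS]
    for k in stale:
        del temp_store[k]
    return stale


wal, temp_store, object_store = [], {}, {}
log_data(wal, temp_store, "small", b"x" * 10, now=0)
log_data(wal, temp_store, "big", b"y" * 9000, now=0)
spilled = sorted(temp_store)            # temporary objects before commit
commit(wal, temp_store, object_store)

# An orphan left behind by a transaction that never committed.
temp_store["tmp/orphan"] = (b"z", 0)
removed = cleaner(temp_store, now=RETENTION_SECONDS + 1)
```

Only a pointer to the temporary object travels through the queue, so the message stays under the size limit regardless of how large the data is.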
Fig. 1: Proposed architecture
IV. CONCLUSION
In this paper, we emphasized the necessity of transparency
and accountability for better data management in a cloud.
We argued that adding data provenance
mechanisms to cloud computing and storage offerings is a
key requirement for the further growth and sustained
adoption of cloud services. After
identifying key challenges and requirements for collecting
provenance in a cloud, we discussed our proposed approach,
which aims at enabling a provenance-supplemented
cloud for better integrity and safety of
consumers' data. Moving forward, provenance collection
must not be restricted to a single cloud service
provider's solutions; it should work across multiple
clouds.
REFERENCES
[1] ABBADI, I. M. Clouds' Infrastructure Taxonomy,
Properties, and Management Services. In Cloud
Computing '11: Proceedings of the International
Workshop on Cloud Computing: Architecture,
Algorithms and Applications (to appear) (July
2011), LNCS, Springer-Verlag.
[2] ABBADI, I. M. Middleware Services at Cloud
Virtual Layer. In DSOC 2011: Proceedings of the
2nd International Workshop on Dependable
Service-Oriented and Cloud Computing (to appear)
(Aug. 2011), IEEE Computer Society.
[3] ABBADI, I. M. Operational Trust in Clouds'
Environment. In MoCS 2011: Proceedings of the
Workshop on Management of Cloud Systems (to
appear) (June 2011), IEEE Computer Society.
[4] ABBADI, I. M. Toward Trustworthy Clouds'
Internet Scale Critical Infrastructure. In ISPEC '11:
Proceedings of the 7th Information Security
Practice and Experience Conference (to appear)
(June 2011), LNCS, Springer-Verlag.
[5] BOSE, R., AND FREW, J. Lineage retrieval for
scientific data processing: a survey. ACM
Comput. Surv. 37, 1 (2005), 1–28.
[6] CHENEY, J., Ed. First Workshop on the Theory
and Practice of Provenance, February 23, 2009,
San Francisco, CA, USA, Proceedings (2009),
USENIX.
[7] CRAWL, D., AND ALTINTAS, I. A Provenance-Based
Fault Tolerance Mechanism for Scientific
Workflows. In Provenance and Annotation of Data
and Processes, vol. 5272 of LNCS. Springer, 2008,
pp. 152–159.
[8] GIL, Y., DEELMAN, E., ELLISMAN, M.,
FAHRINGER, T., FOX, G., GANNON, D.,
GOBLE, C., LIVNY, M., MOREAU, L., AND
MYERS, J. Examining the Challenges of Scientific
Workflows. IEEE Computer 40, 12 (Dec. 2007),
26–34.
[9] LYLE, J., AND MARTIN, A. Trusted computing
and provenance: Better together. In TaPP '10:
Proceedings of the 2nd Workshop on the Theory
and Practice of Provenance (2010), USENIX.
[10] MELL, P., AND GRANCE, T. The NIST
Definition of Cloud Computing, 2009.
http://csrc.nist.gov/groups/SNS/cloudcomputing/
clouddefv15.doc.
[11] MUNISWAMY-REDDY, K.-K., MACKO, P.,
AND SELTZER, M. Provenance for the cloud. In
FAST '10: Proceedings of the 8th USENIX
Conference on File and Storage Technologies (2010),
USENIX, pp. 15–14.
[12] MUNISWAMY-REDDY, K.-K., MACKO, P.,
AND SELTZER, M. I. Making a cloud
provenance-aware. In Cheney [6].
[13] REILLY, C. F., AND NAUGHTON, J. F.
Transparently Gathering Provenance with
Provenance Aware Condor. In Cheney [6].
