International Journal of Modern Engineering Research (IJMER)
                  www.ijmer.com          Vol.3, Issue.1, Jan-Feb. 2013 pp-538-540      ISSN: 2249-6645

                                     A Novel Data Leakage Detection

                        Priyanka Barge, 1 Pratibha Dhawale, 2 Namrata Kolashetti3
                            Ass. Prof., Department of Computer Engineering, NIRMALA CHOUHAN

Abstract: This paper presents the concept of data leakage, its causes, and different techniques for detecting it. Data is
highly valuable, so it should not be leaked or altered. In the field of IT, large databases are in use, and they are shared
with many people at a time. During this sharing there is a high risk of data vulnerability, leakage or alteration. To
address these problems, a data leakage detection system is proposed. This paper gives a brief overview of data leakage
detection and a methodology for identifying the persons who leaked the data.

Keywords: Guilty agent, data distributor, watermarking, fake object, data leakage.

                                                          I. Introduction
          Data leakage is defined as the accidental or unintentional distribution of private or sensitive data to an unauthorized
entity. Sensitive data of companies and organizations includes intellectual property (IP), financial information, patient
information, personal credit-card data, and other information depending on the business and the industry.
          Furthermore, in many cases, sensitive data is shared among various stakeholders such as employees working from
outside the organizational premises (e.g., on laptops), business partners and customers. This increases the risk of confidential
information falling into unauthorized hands. Whether caused by malicious intent, or an inadvertent mistake, by an insider or
outsider, exposed sensitive information can seriously hurt an organization.
          The potential damage and adverse consequences of a data leak incident can be classified into the following two
categories: direct and indirect loss [3]. Direct loss refers to tangible damage that is easy to measure and estimate
quantitatively. Indirect loss, on the other hand, is much harder to quantify and has a much broader impact in terms of cost,
place and time. Direct loss includes violating regulations (such as those protecting customer privacy) resulting in
fine/settlement/customer compensation fees; litigation of lawsuits; loss of future sales; costs of investigation and
remedial/restoration fees. Indirect loss includes reduced share-price as a result of the negative publicity; damage to
company's goodwill and reputation; customer abandonment; and exposure of Intellectual Property (business plans, code,
financial reports, and meeting agendas) to competitors.

                                                      II. Existing System
          Traditionally, leakage detection is handled by watermarking: a unique code is embedded in each distributed
copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be
very useful in some cases, but they involve some modification of the original data. Furthermore, watermarks can
sometimes be destroyed if the data recipient is malicious.
          Sharing of sensitive data is common in practice. For example, a hospital may give patient records to researchers
who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing
customer data. Another enterprise may outsource its data processing, so data must be given to various other companies.
We call the owner of the data the distributor and the supposedly trusted third parties the agents.

                                                     III. Proposed System
           Our goal is to detect when the distributor’s sensitive data has been leaked by agents and, if possible, to identify the
agent that leaked the data. Perturbation is a very useful technique where the data is modified and made “less sensitive”
before being handed to agents. Because perturbation alters the original data, we instead propose to develop unobtrusive
techniques for detecting leakage of a set of objects or records.
           In this section, we propose to develop a model for assessing the “guilt” of agents. We also present algorithms for
distributing objects to agents, in a way that improves our chances of identifying a leaker. Finally, we also consider the option
of adding “fake” objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the
agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members
[2]. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more
confident that the agent is guilty.

                                                       IV. Methodology
4.1 Entities And Agents
             The distributor owns a set of data objects T = {t1, t2, …, tn}. The distributor has to share some of the objects with
 a set of agents U1, U2, …, Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of
 any type and size, e.g., they could be tuples in a relation or relations in a database.
 An agent Ui receives a subset Ri ⊆ T of the objects, determined either by a sample request or an explicit request.






    Evaluation of Explicit Data Request Algorithms
          In an explicit request, the agent sends a request together with a condition. The distributor processes the request
and returns the data that satisfies the condition, to which fake objects have been added using the watermarking technique.
Explicit request Ri = EXPLICIT(T, condi): Agent Ui receives all T objects that satisfy condi.

    Evaluation of Sample Data Request Algorithms
         In a sample request, the agent's request carries no condition. The agent states only how much data he needs and
receives that data with fake objects added using the watermarking technique.
         Sample request Ri = SAMPLE(T, mi): Any subset of mi records from T can be given to Ui.
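The two request types can be illustrated with a minimal Python sketch. The function names `explicit_request` and `sample_request`, the representation of objects as tuples, and the uniform random choice for SAMPLE are assumptions made for illustration only; the definition above allows any subset of mi records. The sketch returns real objects only and leaves the addition of fake objects to a later step.

```python
import random
from typing import Callable, Hashable, Iterable, Optional, Set


def explicit_request(T: Iterable[Hashable],
                     cond: Callable[[Hashable], bool]) -> Set[Hashable]:
    """R_i = EXPLICIT(T, cond_i): every object of T that satisfies cond_i."""
    return {t for t in T if cond(t)}


def sample_request(T: Iterable[Hashable], m: int,
                   rng: Optional[random.Random] = None) -> Set[Hashable]:
    """R_i = SAMPLE(T, m_i): some subset of m_i objects of T.

    Assumption: the subset is drawn uniformly at random.
    """
    pool = list(T)
    if m > len(pool):
        raise ValueError("agent requested more objects than T contains")
    rng = rng or random.Random()
    return set(rng.sample(pool, m))


# Hypothetical customer records: (name, state).
T = {("alice", "CA"), ("bob", "NY"), ("carol", "CA"), ("dave", "TX")}
R1 = explicit_request(T, lambda t: t[1] == "CA")   # agent U1: condition "state is CA"
R2 = sample_request(T, 2, random.Random(0))        # agent U2: any two records
```

Here R1 is fully determined by the condition, whereas R2 leaves the distributor free to choose which records to hand out; that freedom is what the allocation strategies discussed later rely on.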

4.2 Guilty Agents
           Suppose that after giving objects to agents, the distributor discovers that a set S ⊆ T has leaked. This means
 that a third party, called the target, has been caught in possession of S.
           We say an agent Ui is guilty if it contributes one or more objects to the target. We denote the event that agent Ui is
 guilty as Gi and the event that agent Ui is guilty for a given leaked set S as Gi|S. Our next step is to estimate Pr{Gi|S}, i.e.,
 the probability that agent Ui is guilty given evidence S.

4.3   Related Work
          Here we consider a watermarking algorithm that embeds the watermark bits in the least significant bits (LSB) of
selected attributes of a selected subset of tuples. This technique does not provide a mechanism for multibit watermarks;
instead, only a secret key is used. For each tuple, a secure message authentication code (MAC) is computed using the secret
key and the tuple’s primary key. The computed MAC is used to select the candidate tuples, the attributes and the LSB
position in the selected attributes. However, the watermark can be easily compromised by very trivial attacks. For example,
a simple manipulation of the data that shifts the LSBs by one position easily leads to watermark loss without much damage
to the data. Therefore the LSB-based data hiding technique is not resilient. Moreover, it assumes that the LSBs in any tuple
can be altered without checking data constraints. Simple unconstrained LSB manipulations can easily generate undesirable
results. Such a technique is also not resilient to deletion and insertion attacks.
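The following Python sketch illustrates the keyed LSB scheme described above and why it is fragile. It is a simplified illustration, not the exact published algorithm: the use of HMAC-SHA256 as the MAC, the parameters `gamma` (roughly one tuple in `gamma` is marked) and `n_lsb` (number of low-order bits that may carry a mark), the restriction to non-negative integer attributes, and all function names are assumptions. The attack shown flips the low-order bits, which stands in for the simple LSB manipulation mentioned in the text.

```python
import hashlib
import hmac
from typing import Dict, List, Optional, Tuple

Rows = Dict[str, List[int]]  # primary key -> non-negative integer attributes


def mac(key: bytes, primary_key: str) -> int:
    """MAC of the tuple's primary key under the secret key, as an integer."""
    digest = hmac.new(key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def mark_position(key: bytes, pk: str, gamma: int, n_attrs: int,
                  n_lsb: int) -> Optional[Tuple[int, int, int]]:
    """Return (attribute index, bit index, bit value) if the MAC selects this tuple."""
    h = mac(key, pk)
    if h % gamma != 0:
        return None  # tuple is not a candidate
    attr = (h // gamma) % n_attrs
    bit = (h // (gamma * n_attrs)) % n_lsb
    value = (h // (gamma * n_attrs * n_lsb)) % 2
    return attr, bit, value


def embed(rows: Rows, key: bytes, gamma: int = 2, n_lsb: int = 2) -> Rows:
    """Return a marked copy of rows; the input is left unchanged."""
    marked = {}
    for pk, attrs in rows.items():
        attrs = list(attrs)
        pos = mark_position(key, pk, gamma, len(attrs), n_lsb)
        if pos is not None:
            a, b, v = pos
            attrs[a] = (attrs[a] & ~(1 << b)) | (v << b)  # force bit b to v
        marked[pk] = attrs
    return marked


def detect(rows: Rows, key: bytes, gamma: int = 2, n_lsb: int = 2) -> Tuple[int, int]:
    """Return (matching marks, candidate tuples) for the given key."""
    matches = total = 0
    for pk, attrs in rows.items():
        pos = mark_position(key, pk, gamma, len(attrs), n_lsb)
        if pos is None:
            continue
        a, b, v = pos
        total += 1
        matches += int(((attrs[a] >> b) & 1) == v)
    return matches, total


def flip_low_bits(rows: Rows, n_lsb: int = 2) -> Rows:
    """Trivial attack: invert the n_lsb low-order bits of every attribute."""
    mask = (1 << n_lsb) - 1
    return {pk: [x ^ mask for x in attrs] for pk, attrs in rows.items()}


key = b"distributor-secret"  # hypothetical key
rows = {f"id{i}": [100 + i, 2000 + i] for i in range(50)}
marked = embed(rows, key)
attacked = flip_low_bits(marked)
print(detect(marked, key), detect(attacked, key))
```

On the marked copy every candidate tuple matches, while after the attack none does, even though each attribute value has changed by at most 3. The attacker needs no knowledge of the key, which is the lack of resilience the text refers to.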

                                                        V. User Model
           To compute Pr{Gi|S}, we need an estimate of the probability that values in S can be “guessed” by the target.
We call this estimate pt, the probability that object t can be guessed by the target.
           The probability pt is analogous to the probabilities used in designing fault-tolerant systems. That is, to estimate how
likely it is that a system will be operational throughout a given period, we need the probabilities that individual components
will or will not fail. A component failure in our case is the event that the target guesses an object of S. The component failure
probabilities are used to compute the overall system reliability, while we use the probabilities of guessing to identify agents
that have leaked information. The component failure probabilities are estimated based on experiments, just as we propose to
estimate the pt’s. Similarly, the component probabilities are usually conservative estimates, rather than exact numbers. For
example, say we use a component failure probability that is higher than the actual probability, and we design our system to
provide a desired high level of reliability. Then we will know that the actual system will have at least that level of reliability,
but possibly higher. In the same way, if we use pt’s that are higher than the true values, we will know that the agents will be
guilty with at least the computed probabilities.
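As an illustration of how pt enters the guilt estimate, the sketch below assumes the simplified estimator described in [1]: all objects share one guessing probability p, each leaked object that was not guessed is attributed with equal likelihood to any agent that received it, and objects are treated independently. Under these assumptions Pr{Gi|S} = 1 - Π over t in S ∩ Ri of (1 - (1 - p)/|Vt|), where Vt is the set of agents holding t. This section does not state the formula itself, so the code should be read as one possible instantiation; the function name and the data layout are hypothetical.

```python
from typing import Dict, Hashable, Set


def guilt_probability(agent: str, S: Set[Hashable],
                      R: Dict[str, Set[Hashable]], p: float) -> float:
    """Estimate Pr{G_i | S} for one agent.

    S: leaked set, R: agent -> objects it received (R_i),
    p: probability that the target guessed an object on its own.
    """
    not_guilty = 1.0
    for t in S & R[agent]:
        holders = sum(1 for Rj in R.values() if t in Rj)  # |V_t|
        # Probability that this agent did NOT supply t to the target.
        not_guilty *= 1.0 - (1.0 - p) / holders
    return 1.0 - not_guilty


R = {"U1": {"t1", "t2"}, "U2": {"t1"}}
S = {"t1", "t2"}
print(round(guilt_probability("U1", S, R, p=0.2), 2))  # 0.88
print(round(guilt_probability("U2", S, R, p=0.2), 2))  # 0.4
```

U1 scores higher because it is the only holder of t2. Raising p lowers every estimate, which matches the remark that overestimating the pt's gives conservative guilt probabilities.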

                                               VI. Data Allocation Problem
         The main focus of our paper is the data allocation problem: how can the distributor “intelligently” give data to
agents in order to improve the chances of detecting a guilty agent?

6.1 Fake Objects
          The idea of perturbing data to detect leakage is not new. In our case, we are perturbing the set of distributor objects
 by adding fake elements. In some applications, fake objects may cause fewer problems than perturbing real objects. For
 example, say the distributed data objects are medical records and the agents are hospitals. In this case, even small
 modifications to the records of actual patients may be undesirable. However, the addition of some fake medical records may
 be acceptable, since no patient matches these records, and hence no one will ever be treated based on fake records.
          Our use of fake objects is inspired by the use of “trace” records in mailing lists. In this case, company A sells to
 company B a mailing list to be used once (e.g., to send advertisements). Company A adds trace records that contain
 addresses owned by company A. Thus, each time company B uses the purchased mailing list, A receives copies of the
 mailing. These records are a type of fake object that helps identify improper use of data.
          The distributor creates and adds fake objects to the data that he distributes to agents. We let Fi ⊆ Ri be the subset of
 fake objects that agent Ui receives. Fake objects must be created carefully so that agents cannot distinguish them from real
 objects.
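A minimal sketch of how fake objects can point to a leaker is given below. It assumes that each agent receives fake objects that no other agent receives and that the distributor keeps a record of the sets Fi; the function names and the string identifiers are hypothetical. Generating fake records that are indistinguishable from real ones is the hard part and is not attempted here.

```python
from typing import Dict, Hashable, Set


def distribute(real: Dict[str, Set[Hashable]],
               F: Dict[str, Set[Hashable]]) -> Dict[str, Set[Hashable]]:
    """Build R_i as the agent's real objects plus its fake objects F_i."""
    return {agent: real[agent] | F.get(agent, set()) for agent in real}


def agents_with_leaked_fakes(S: Set[Hashable],
                             F: Dict[str, Set[Hashable]]) -> Dict[str, Set[Hashable]]:
    """Map each agent to the fake objects of F_i that appear in the leaked set S."""
    return {agent: S & fakes for agent, fakes in F.items() if S & fakes}


real = {"U1": {"t1", "t2"}, "U2": {"t1", "t3"}}
F = {"U1": {"fake-a"}, "U2": {"fake-b"}}   # distributor's private record of F_i
R = distribute(real, F)

S = {"t1", "fake-b"}                       # set found with the target
print(agents_with_leaked_fakes(S, F))      # {'U2': {'fake-b'}}
```

The real object t1 alone cannot separate U1 from U2, since both received it. The fake object resolves the tie because it was never given to U1 and cannot be obtained from any real-world source.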


6.2 Optimization Problem
          The distributor’s data allocation to agents has one constraint and one objective. The
distributor’s constraint is to satisfy agents’ requests, by providing them with the number of objects they request or with all
available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data.
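For sample requests, one way to pursue this objective is to keep the overlap between agents' sets small, so that a leaked set points to few agents. The sketch below is a heuristic in the spirit of the overlap-minimizing allocation in [1], not the algorithm of this paper: it scores an allocation by the sum over ordered pairs of |Ri ∩ Rj| / |Ri| and greedily hands each agent the objects that have been given out least often so far. Both the score and the greedy rule, as well as the function names, are assumptions for illustration.

```python
import itertools
from typing import Dict, Hashable, List, Set


def relative_overlap(R: Dict[str, Set[Hashable]]) -> float:
    """Sum over ordered pairs i != j of |R_i ∩ R_j| / |R_i| (lower is better)."""
    total = 0.0
    for i, j in itertools.permutations(R, 2):
        if R[i]:
            total += len(R[i] & R[j]) / len(R[i])
    return total


def allocate_samples(T: List[Hashable],
                     requests: Dict[str, int]) -> Dict[str, Set[Hashable]]:
    """Satisfy each SAMPLE(T, m_i) request with the least-shared objects."""
    counts = {t: 0 for t in T}  # how many agents already hold each object
    R: Dict[str, Set[Hashable]] = {}
    for agent, m in requests.items():
        if m > len(T):
            raise ValueError("request exceeds the number of available objects")
        # Stable sort: ties are broken by the order of objects in T.
        chosen = sorted(counts, key=counts.get)[:m]
        for t in chosen:
            counts[t] += 1
        R[agent] = set(chosen)
    return R


T = ["t1", "t2", "t3", "t4"]
requests = {"U1": 2, "U2": 2, "U3": 2}

greedy = allocate_samples(T, requests)
naive = {agent: {"t1", "t2"} for agent in requests}  # everyone gets the same objects
print(relative_overlap(greedy), relative_overlap(naive))  # 2.0 6.0
```

The constraint is met in both allocations, since every agent receives the two objects it asked for. With the naive allocation a leak of t1 implicates all three agents equally, whereas the greedy one leaves U2 with objects nobody else holds.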

                                                     VII. Conclusion
         In a perfect world there would be no need to hand over sensitive data to agents that may unknowingly or
maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so
that we could trace its origins with absolute certainty. However, in many cases we must indeed work with agents that may
not be 100% trusted, and we may not be certain if a leaked object came from an agent or from some other source, since
certain data cannot admit watermarks.
         In spite of these difficulties, we have shown it is possible to assess the likelihood that an agent is responsible for a
leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that
objects can be “guessed” by other means. Our model is relatively simple, but we believe it captures the essential trade-offs.
The algorithms we have presented implement a variety of data distribution strategies that can improve the distributor’s
chances of identifying a leaker. We have shown that distributing objects judiciously can make a significant difference in
identifying guilty agents, especially in cases where there is large overlap in the data that agents must receive.

7.1 Advantages
     It is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the
     leaked data and the data of other agents, and based on the probability that objects can be guessed by other means [1].

7.2 Disadvantages
     In the given watermarking algorithm, the watermark can be easily compromised by very trivial attacks. For example, a
     simple manipulation of the data that shifts the LSBs by one position easily leads to watermark loss without much
     damage to the data. Therefore the LSB-based data hiding technique is not resilient.

                                                  VIII. Acknowledgment
         For all the efforts behind this paper, we would first like to express our sincere thanks to the staff of the Dept. of
Computer Engg. for their extended help and suggestions at every stage. It is with a great sense of gratitude that we
acknowledge the support and timely suggestions of our guide Prof. Nirmala Chouhan, Prof. G. M. Bhandari (HOD) and
Prof. Jayant Jadhav (Project Co-ordinator).

                                                          References
[1]   P. Papadimitriou and H. Garcia-Molina, “Data Leakage Detection,” IEEE Transactions on Knowledge and Data
      Engineering, vol. 23, pp. 51-63, 2011.
[2]   R. Agrawal and J. Kiernan, “Watermarking Relational Databases,” Proc. 28th Int’l Conf. Very Large Data Bases
      (VLDB ’02), VLDB Endowment, pp. 155-166, 2002.
[3]   S. A. Kale and S. V. Kulkarni, “Data Leakage Detection: A Survey,” IOSR Journal of Computer Engineering
      (IOSRJCE), ISSN: 2278-0661, vol. 1, issue 6, pp. 32-35, July-Aug. 2012, www.iosrjournals.org.




                                                           www.ijmer.com                                               540 | Page

More Related Content

What's hot

Data Leakage Detectionusing Distribution Method
Data Leakage Detectionusing Distribution Method Data Leakage Detectionusing Distribution Method
Data Leakage Detectionusing Distribution Method IJMER
 
IRJET- Data Leakage Detection using Cloud Computing
IRJET- Data Leakage Detection using Cloud ComputingIRJET- Data Leakage Detection using Cloud Computing
IRJET- Data Leakage Detection using Cloud ComputingIRJET Journal
 
Jpdcs1(data lekage detection)
Jpdcs1(data lekage detection)Jpdcs1(data lekage detection)
Jpdcs1(data lekage detection)Chaitanya Kn
 
Modeling and Detection of Data Leakage Fraud
Modeling and Detection of Data Leakage FraudModeling and Detection of Data Leakage Fraud
Modeling and Detection of Data Leakage FraudIOSR Journals
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detectionAjitkaur saini
 
Privacy Preserving Based Cloud Storage System
Privacy Preserving Based Cloud Storage SystemPrivacy Preserving Based Cloud Storage System
Privacy Preserving Based Cloud Storage SystemKumar Goud
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detectionbunnz12345
 
A Robust Approach for Detecting Data Leakage and Data Leaker in Organizations
A Robust Approach for Detecting Data Leakage and Data Leaker in OrganizationsA Robust Approach for Detecting Data Leakage and Data Leaker in Organizations
A Robust Approach for Detecting Data Leakage and Data Leaker in OrganizationsIOSR Journals
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsIOSR Journals
 
modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches Venkat Projects
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data miningNeeda Multani
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data miningeSAT Publishing House
 
User friendly pattern search paradigm
User friendly pattern search paradigmUser friendly pattern search paradigm
User friendly pattern search paradigmMigrant Systems
 
data mining for security application
data mining for security applicationdata mining for security application
data mining for security applicationbharatsvnit
 
Cryptography for privacy preserving data mining
Cryptography for privacy preserving data miningCryptography for privacy preserving data mining
Cryptography for privacy preserving data miningMesbah Uddin Khan
 
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsCredit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsHariteja Bodepudi
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining14894
 
report
reportreport
reportbutest
 

What's hot (20)

Data Leakage Detectionusing Distribution Method
Data Leakage Detectionusing Distribution Method Data Leakage Detectionusing Distribution Method
Data Leakage Detectionusing Distribution Method
 
IRJET- Data Leakage Detection using Cloud Computing
IRJET- Data Leakage Detection using Cloud ComputingIRJET- Data Leakage Detection using Cloud Computing
IRJET- Data Leakage Detection using Cloud Computing
 
Jpdcs1(data lekage detection)
Jpdcs1(data lekage detection)Jpdcs1(data lekage detection)
Jpdcs1(data lekage detection)
 
Modeling and Detection of Data Leakage Fraud
Modeling and Detection of Data Leakage FraudModeling and Detection of Data Leakage Fraud
Modeling and Detection of Data Leakage Fraud
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
 
Sub1555
Sub1555Sub1555
Sub1555
 
Privacy Preserving Based Cloud Storage System
Privacy Preserving Based Cloud Storage SystemPrivacy Preserving Based Cloud Storage System
Privacy Preserving Based Cloud Storage System
 
709 713
709 713709 713
709 713
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
 
A Robust Approach for Detecting Data Leakage and Data Leaker in Organizations
A Robust Approach for Detecting Data Leakage and Data Leaker in OrganizationsA Robust Approach for Detecting Data Leakage and Data Leaker in Organizations
A Robust Approach for Detecting Data Leakage and Data Leaker in Organizations
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logs
 
modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data mining
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
User friendly pattern search paradigm
User friendly pattern search paradigmUser friendly pattern search paradigm
User friendly pattern search paradigm
 
data mining for security application
data mining for security applicationdata mining for security application
data mining for security application
 
Cryptography for privacy preserving data mining
Cryptography for privacy preserving data miningCryptography for privacy preserving data mining
Cryptography for privacy preserving data mining
 
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsCredit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining
 
report
reportreport
report
 

Viewers also liked

At2419231928
At2419231928At2419231928
At2419231928IJMER
 
Ijmer 46060714
Ijmer 46060714Ijmer 46060714
Ijmer 46060714IJMER
 
D0502 01 1722
D0502 01 1722D0502 01 1722
D0502 01 1722IJMER
 
Χαλίλ Γκιμπράν
Χαλίλ ΓκιμπράνΧαλίλ Γκιμπράν
Χαλίλ ΓκιμπράνPopi Kaza
 
A Comparison Of Smart Routings In Mobile Ad Hoc Networks(MANETs)
A Comparison Of Smart Routings In Mobile Ad Hoc  Networks(MANETs) A Comparison Of Smart Routings In Mobile Ad Hoc  Networks(MANETs)
A Comparison Of Smart Routings In Mobile Ad Hoc Networks(MANETs) IJMER
 
Ijmer 46052329
Ijmer 46052329Ijmer 46052329
Ijmer 46052329IJMER
 
Results from set-operations on Fuzzy soft sets
Results from set-operations on Fuzzy soft setsResults from set-operations on Fuzzy soft sets
Results from set-operations on Fuzzy soft setsIJMER
 
Ireland's Energy Futures
Ireland's Energy FuturesIreland's Energy Futures
Ireland's Energy Futuressaulcj
 
I0503 01 6670
I0503 01 6670I0503 01 6670
I0503 01 6670IJMER
 
Bx31302306
Bx31302306Bx31302306
Bx31302306IJMER
 
Investigation of Effects of impact loads on Framed Structures
Investigation of Effects of impact loads on Framed StructuresInvestigation of Effects of impact loads on Framed Structures
Investigation of Effects of impact loads on Framed StructuresIJMER
 
Cz31447455
Cz31447455Cz31447455
Cz31447455IJMER
 
Picture powerpoint
Picture powerpointPicture powerpoint
Picture powerpointpadisav17
 
Optimization of Surface Impedance for Reducing Surface Waves between Antennas
Optimization of Surface Impedance for Reducing Surface Waves between AntennasOptimization of Surface Impedance for Reducing Surface Waves between Antennas
Optimization of Surface Impedance for Reducing Surface Waves between AntennasIJMER
 
G04013845
G04013845G04013845
G04013845IJMER
 
Finite Element Analysis of Human RIB Cage
Finite Element Analysis of Human RIB CageFinite Element Analysis of Human RIB Cage
Finite Element Analysis of Human RIB CageIJMER
 
Liberty computerpub2
Liberty computerpub2Liberty computerpub2
Liberty computerpub2padisav17
 
De3211001104
De3211001104De3211001104
De3211001104IJMER
 

Viewers also liked (20)

Cell survival curve
Cell survival curveCell survival curve
Cell survival curve
 
At2419231928
At2419231928At2419231928
At2419231928
 
Ijmer 46060714
Ijmer 46060714Ijmer 46060714
Ijmer 46060714
 
D0502 01 1722
D0502 01 1722D0502 01 1722
D0502 01 1722
 
Χαλίλ Γκιμπράν
Χαλίλ ΓκιμπράνΧαλίλ Γκιμπράν
Χαλίλ Γκιμπράν
 
A Comparison Of Smart Routings In Mobile Ad Hoc Networks(MANETs)
A Comparison Of Smart Routings In Mobile Ad Hoc  Networks(MANETs) A Comparison Of Smart Routings In Mobile Ad Hoc  Networks(MANETs)
A Comparison Of Smart Routings In Mobile Ad Hoc Networks(MANETs)
 
Ijmer 46052329
Ijmer 46052329Ijmer 46052329
Ijmer 46052329
 
Results from set-operations on Fuzzy soft sets
Results from set-operations on Fuzzy soft setsResults from set-operations on Fuzzy soft sets
Results from set-operations on Fuzzy soft sets
 
Ireland's Energy Futures
Ireland's Energy FuturesIreland's Energy Futures
Ireland's Energy Futures
 
I0503 01 6670
I0503 01 6670I0503 01 6670
I0503 01 6670
 
Bx31302306
Bx31302306Bx31302306
Bx31302306
 
Investigation of Effects of impact loads on Framed Structures
Investigation of Effects of impact loads on Framed StructuresInvestigation of Effects of impact loads on Framed Structures
Investigation of Effects of impact loads on Framed Structures
 
Cz31447455
Cz31447455Cz31447455
Cz31447455
 
Picture powerpoint
Picture powerpointPicture powerpoint
Picture powerpoint
 
Optimization of Surface Impedance for Reducing Surface Waves between Antennas
Optimization of Surface Impedance for Reducing Surface Waves between AntennasOptimization of Surface Impedance for Reducing Surface Waves between Antennas
Optimization of Surface Impedance for Reducing Surface Waves between Antennas
 
Reality tv
Reality tvReality tv
Reality tv
 
G04013845
G04013845G04013845
G04013845
 
Finite Element Analysis of Human RIB Cage
Finite Element Analysis of Human RIB CageFinite Element Analysis of Human RIB Cage
Finite Element Analysis of Human RIB Cage
 
Liberty computerpub2
Liberty computerpub2Liberty computerpub2
Liberty computerpub2
 
De3211001104
De3211001104De3211001104
De3211001104
 

Similar to Detecting data leakage through watermarking and fake objects

IRJET- Data Leakage Detection System
IRJET- Data Leakage Detection SystemIRJET- Data Leakage Detection System
IRJET- Data Leakage Detection SystemIRJET Journal
 
Data Leakage Detection
Data Leakage DetectionData Leakage Detection
Data Leakage DetectionAshwini Nerkar
 
164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdf
164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdf164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdf
164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdfDrog3
 
Jpdcs1 data leakage detection
Jpdcs1 data leakage detectionJpdcs1 data leakage detection
Jpdcs1 data leakage detectionChaitanya Kn
 
83504808-Data-Leakage-Detection-1-Final.ppt
83504808-Data-Leakage-Detection-1-Final.ppt83504808-Data-Leakage-Detection-1-Final.ppt
83504808-Data-Leakage-Detection-1-Final.pptnaresh2004s
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detectionMohit Pandey
 
Data leakage detection
Data leakage detection Data leakage detection
Data leakage detection Suveeksha
 
10.1.1.436.3364.pdf
10.1.1.436.3364.pdf10.1.1.436.3364.pdf
10.1.1.436.3364.pdfmistryritesh
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detectionkalpesh1908
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detectionVikrant Arya
 
data-leakage-detection
data-leakage-detectiondata-leakage-detection
data-leakage-detectionNagendra Kumar
 
Secure Multimedia Content Protection and Sharing
Secure Multimedia Content Protection and SharingSecure Multimedia Content Protection and Sharing
Secure Multimedia Content Protection and SharingIRJET Journal
 
Privacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePrivacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePvrtechnologies Nellore
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detectiongaurav kumar
 
Data leakage detbxhbbhhbsbssusbgsgsbshsbsection.pdf
Data leakage detbxhbbhhbsbssusbgsgsbshsbsection.pdfData leakage detbxhbbhhbsbssusbgsgsbshsbsection.pdf
Data leakage detbxhbbhhbsbssusbgsgsbshsbsection.pdfnaresh2004s
 
dataleakagedetection-1811210400vgjcd01.pptx
dataleakagedetection-1811210400vgjcd01.pptxdataleakagedetection-1811210400vgjcd01.pptx
dataleakagedetection-1811210400vgjcd01.pptxnaresh2004s
 
Privacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePrivacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposureredpel dot com
 
Privacy Preserving Data Leak Detection for Sensitive Data
Privacy Preserving Data Leak Detection for Sensitive DataPrivacy Preserving Data Leak Detection for Sensitive Data
Privacy Preserving Data Leak Detection for Sensitive Datapaperpublications3
 
M privacy for collaborative data publishing
M privacy for collaborative data publishingM privacy for collaborative data publishing
M privacy for collaborative data publishingJPINFOTECH JAYAPRAKASH
 

Similar to Detecting data leakage through watermarking and fake objects (20)

IRJET- Data Leakage Detection System
IRJET- Data Leakage Detection SystemIRJET- Data Leakage Detection System
IRJET- Data Leakage Detection System
 
Data Leakage Detection
Data Leakage DetectionData Leakage Detection
Data Leakage Detection
 
164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdf
164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdf164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdf
164788616_Data_Leakage_Detection_Complete_Project_Report__1_.docx.pdf
 
Detecting data leakage through watermarking and fake objects

The potential damage and adverse consequences of a data leak incident fall into two categories: direct and indirect loss [3].
Direct loss refers to tangible damage that is easy to measure and estimate quantitatively. Indirect loss, on the other hand, is much harder to quantify and has a much broader impact in terms of cost, place and time. Direct loss includes violations of regulations (such as those protecting customer privacy) that result in fines, settlements or customer compensation fees; litigation; loss of future sales; and the costs of investigation and of remediation or restoration. Indirect loss includes a reduced share price as a result of negative publicity; damage to the company's goodwill and reputation; customer abandonment; and exposure of intellectual property (business plans, code, financial reports, and meeting agendas) to competitors.

II. Existing System
Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases, but they involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.
Sensitive data must nevertheless often be handed to third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. We call the owner of the data the distributor and the supposedly trusted third parties the agents.

III. Proposed System
Our goal is to detect when the distributor’s sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data. Perturbation is a very useful technique where the data is modified and made “less sensitive” before being handed to agents. We propose to develop unobtrusive techniques for detecting leakage of a set of objects or records.
In this section, we propose a model for assessing the “guilt” of agents. We also present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker. Finally, we consider the option of adding “fake” objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members [2]. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty.

IV. Methodology
4.1 Entities And Agents
The distributor owns a set of data objects T = {t1, t2, …, tn}. The distributor has to share some of the objects with a set of agents U1, U2, …, Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size, e.g., they could be tuples in a relation or relations in a database. An agent Ui receives a subset Ri of the objects in T, determined either by a sample request or an explicit request.
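As an illustration of the two request types just named, the following is a minimal Python sketch, not the authors' implementation. The function names explicit_request and sample_request, the record layout, and the choice of a uniform random sample are assumptions made for this example; the fake-object step is left out here.

```python
import random

def explicit_request(T, cond):
    """R_i = EXPLICIT(T, cond_i): every object of T that satisfies the condition."""
    return [t for t in T if cond(t)]

def sample_request(T, m, rng=None):
    """R_i = SAMPLE(T, m_i): any m_i objects of T (assumed here: a uniform random subset)."""
    if rng is None:
        rng = random.Random()
    return rng.sample(T, m)

# Hypothetical records of the form (id, city).
T = [(1, "Pune"), (2, "Mumbai"), (3, "Pune"), (4, "Delhi")]
R1 = explicit_request(T, lambda t: t[1] == "Pune")  # agent U1 asks with a condition
R2 = sample_request(T, 2)                           # agent U2 asks for any two records
```

Because a sample request leaves the choice of objects to the distributor, it is the case in which the distributor has the most freedom to decide which agent receives which objects.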
• Evaluation of Explicit Data Request Algorithms
In an explicit request, the agent sends the request together with a condition. The distributor selects the objects that satisfy the condition, adds fake objects using the watermarking technique, and returns the resulting data to the agent.
Explicit request Ri = EXPLICIT(T, condi): agent Ui receives all objects in T that satisfy condi.

• Evaluation of Sample Data Request Algorithms
In a sample request, the agent's request carries no condition. The distributor selects objects for the agent's query, adds fake objects using the watermarking technique, and returns the resulting data to the agent.
Sample request Ri = SAMPLE(T, mi): any subset of mi records from T can be given to Ui.

4.2 Guilty Agents
Suppose that after giving objects to agents, the distributor discovers that a set S ⊆ T has leaked. This means that some third party, called the target, has been caught in possession of S. We say an agent Ui is guilty if it contributes one or more objects to the target. We denote the event that agent Ui is guilty as Gi and the event that agent Ui is guilty for a given leaked set S as Gi|S. Our next step is to estimate Pr{Gi|S}, i.e., the probability that agent Ui is guilty given evidence S.

4.3 Related Work
The watermarking algorithm considered here embeds the watermark bits in the least significant bits (LSB) of selected attributes of a selected subset of tuples. This technique does not provide a mechanism for multibit watermarks; instead, only a secret key is used. For each tuple, a secure message authentication code (MAC) is computed using the secret key and the tuple’s primary key. The computed MAC is used to select candidate tuples, attributes, and the LSB position in the selected attributes. However, the watermark can be easily compromised by very trivial attacks.
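To make the mechanism concrete, the following Python sketch embeds and checks a keyed LSB mark in integer attributes. It is a simplified illustration under stated assumptions, not the algorithm of any cited paper: the parameter names gamma (roughly one tuple in gamma is marked) and n_lsb (number of candidate low bits), the way the MAC is sliced into choices, and the flip_low_bits attack are all hypothetical.

```python
import hashlib
import hmac

def mac(key: bytes, primary_key: int) -> int:
    """Keyed MAC of the tuple's primary key, returned as a large integer."""
    digest = hmac.new(key, str(primary_key).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def _target(m: int, n_attrs: int, gamma: int, n_lsb: int):
    """Derive (selected?, attribute index, bit position, mark bit) from the MAC."""
    selected = (m % gamma) == 0
    attr = (m >> 32) % n_attrs
    pos = (m >> 64) % n_lsb
    bit = (m >> 96) & 1
    return selected, attr, pos, bit

def embed(rows, key, gamma=4, n_lsb=2):
    """rows maps primary key -> list of non-negative integer attributes."""
    marked = {}
    for pk, attrs in rows.items():
        attrs = list(attrs)
        selected, a, pos, bit = _target(mac(key, pk), len(attrs), gamma, n_lsb)
        if selected:
            # Overwrite one low bit of one attribute with the mark bit.
            attrs[a] = (attrs[a] & ~(1 << pos)) | (bit << pos)
        marked[pk] = attrs
    return marked

def detect(rows, key, gamma=4, n_lsb=2):
    """Return (matching mark bits, number of tuples that should carry a mark)."""
    match = total = 0
    for pk, attrs in rows.items():
        selected, a, pos, bit = _target(mac(key, pk), len(attrs), gamma, n_lsb)
        if selected:
            total += 1
            match += int(((attrs[a] >> pos) & 1) == bit)
    return match, total

def flip_low_bits(rows, n_lsb=2):
    """A trivial attack (assumed for illustration): flip the n_lsb lowest bits everywhere."""
    mask = (1 << n_lsb) - 1
    return {pk: [v ^ mask for v in attrs] for pk, attrs in rows.items()}

rows = {pk: [1000 + pk, 2000 + pk, 3000 + pk] for pk in range(100)}
marked = embed(rows, b"secret")
print(detect(marked, b"secret"))                 # all marked bits match
print(detect(flip_low_bits(marked), b"secret"))  # no marked bit matches any more
```

The attack changes every value by at most 3, yet a detector that expects the marked bits to match no longer finds the mark.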
For example, a simple manipulation of the data, such as shifting the LSB by one position, easily leads to watermark loss without much damage to the data. Therefore, an LSB-based data hiding technique is not resilient. Moreover, it assumes that the LSBs in any tuple can be altered without checking data constraints. Simple unconstrained LSB manipulations can easily generate undesirable results. Thus, such a technique is not resilient to deletion and insertion attacks.

V. User Model
To compute Pr{Gi|S}, we need an estimate for the probability that values in S can be “guessed” by the target. We call this estimate pt, the probability that object t can be guessed by the target. The probability pt is analogous to the probabilities used in designing fault-tolerant systems. That is, to estimate how likely it is that a system will be operational throughout a given period, we need the probabilities that individual components will or will not fail. A component failure in our case is the event that the target guesses an object of S. The component failure probabilities are used to compute the overall system reliability, while we use the probability of guessing to identify agents that have leaked information. The component failure probabilities are estimated based on experiments, just as we propose to estimate the pt’s. Similarly, the component probabilities are usually conservative estimates, rather than exact numbers. For example, say we use a component failure probability that is higher than the actual probability, and we design our system to provide a desired high level of reliability. Then we will know that the actual system will have at least that level of reliability, but possibly higher. In the same way, if we use pt’s that are higher than the true values, we will know that the agents will be guilty with at least the computed probabilities.

VI. Data Allocation Problem
The main focus of our paper is the data allocation problem: how can the distributor “intelligently” give data to agents in order to improve the chances of detecting a guilty agent?

6.1 Fake Objects
The idea of perturbing data to detect leakage is not new. In our case, we are perturbing the set of distributor objects by adding fake elements. In some applications, fake objects may cause fewer problems than perturbing real objects. For example, say the distributed data objects are medical records and the agents are hospitals. In this case, even small modifications to the records of actual patients may be undesirable. However, the addition of some fake medical records may be acceptable, since no patient matches these records, and hence no one will ever be treated based on fake records.
Our use of fake objects is inspired by the use of “trace” records in mailing lists. In this case, company A sells to company B a mailing list to be used once (e.g., to send advertisements). Company A adds trace records that contain addresses owned by company A. Thus, each time company B uses the purchased mailing list, A receives copies of the mailing. These records are a type of fake object that helps identify improper use of data.
The distributor creates and adds fake objects to the data that he distributes to agents. We let Fi ⊆ Ri be the subset of fake objects that agent Ui receives. Fake objects must be created carefully so that agents cannot distinguish them from real objects.
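The role of the fake objects Fi can be sketched in a few lines of Python. This is an illustrative assumption of how a distributor might keep track of them, not the paper's allocation algorithm: the names make_fake, distribute and suspects, the record layout, and the rule of giving each agent its own distinct fake records are all hypothetical.

```python
import itertools

def make_fake(counter):
    """Hypothetical generator of a realistic-looking record that matches no real entity."""
    n = next(counter)
    return (n, "user%d@example.org" % n)

def distribute(requests, fakes_per_agent=1):
    """requests maps agent -> list of real objects; returns (allocation, fake_owner)."""
    counter = itertools.count(9000)
    allocation, fake_owner = {}, {}
    for agent, real in requests.items():
        fakes = [make_fake(counter) for _ in range(fakes_per_agent)]
        for f in fakes:
            fake_owner[f] = agent            # the distributor's private record of F_i
        allocation[agent] = list(real) + fakes   # R_i, with F_i as a subset
    return allocation, fake_owner

def suspects(leaked, fake_owner):
    """Agents whose fake objects appear in the leaked set S."""
    return {fake_owner[t] for t in leaked if t in fake_owner}

requests = {
    "U1": [(1, "a@example.com"), (2, "b@example.com")],
    "U2": [(2, "b@example.com"), (3, "c@example.com")],
}
allocation, fake_owner = distribute(requests)
print(suspects(allocation["U2"], fake_owner))       # only U2 holds that fake record
print(suspects([(2, "b@example.com")], fake_owner)) # a shared real object identifies nobody
```

The second call shows why fake objects help: a leaked real object held by several agents does not single anyone out, whereas a fake object given to one agent does.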
6.2 Optimization Problem
The distributor’s data allocation to agents has one constraint and one objective. The distributor’s constraint is to satisfy agents’ requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data.

VII. Conclusion
In a perfect world there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty. However, in many cases we must indeed work with agents that may not be 100% trusted, and we may not be certain if a leaked object came from an agent or from some other source, since certain data cannot admit watermarks. In spite of these difficulties, we have shown it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that objects can be “guessed” by other means. Our model is relatively simple, but we believe it captures the essential trade-offs. The algorithms we have presented implement a variety of data distribution strategies that can improve the distributor’s chances of identifying a leaker. We have shown that distributing objects judiciously can make a significant difference in identifying guilty agents, especially in cases where there is large overlap in the data that agents must receive.
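The likelihood assessment summarized above can be illustrated with the estimator given in [1], as we understand it: Pr{Gi|S} = 1 − ∏ over t in S ∩ Ri of (1 − (1 − p)/|Vt|), where Vt is the set of agents that hold object t. The sketch below assumes, as simplifications, a single guess probability p for all objects, independent sources for the leaked objects, and that each holder of an object is equally likely to have leaked it. The function name and data layout are hypothetical.

```python
def guilt_probability(agent, S, R, p):
    """Estimate Pr{G_i | S}.

    R maps each agent to the set of objects it received, S is the leaked set,
    and p is the assumed probability that the target guessed any single object.
    """
    prob_innocent = 1.0
    for t in S & R[agent]:
        holders = sum(1 for objs in R.values() if t in objs)  # |V_t|
        # The agent did not leak t: either t was guessed, or another holder leaked it.
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent

R = {"U1": {"t1", "t2"}, "U2": {"t1", "t3"}}
S = {"t1", "t2"}
print(guilt_probability("U1", S, R, p=0.2))  # about 0.88: U1 holds both leaked objects
print(guilt_probability("U2", S, R, p=0.2))  # about 0.40: U2 only shares t1 with U1
```

The example reflects the point about overlap: object t1, held by both agents, contributes little evidence against either one, while t2, held only by U1, contributes much more.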
7.1 Advantages
• It is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that objects can be guessed by other means [1].

7.2 Disadvantages
• In the given watermarking algorithm, the watermark can be easily compromised by very trivial attacks. For example, a simple manipulation of the data, such as shifting the LSB by one position, easily leads to watermark loss without much damage to the data. Therefore, an LSB-based data hiding technique is not resilient.

VIII. Acknowledgement
For all the efforts behind this paper, we would first like to express our sincere thanks to the staff of the Dept. of Computer Engg. for their extended help and suggestions at every stage. It is with a great sense of gratitude that we acknowledge the support and timely suggestions of our guide Prof. Nirmala Chouhan, Prof. G. M. Bhandari (HOD) and Prof. Jayant Jadhav (Project Co-ordinator).

References
[1] P. Papadimitriou and H. Garcia-Molina, “Data Leakage Detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, pp. 51-63, 2011.
[2] R. Agrawal and J. Kiernan, “Watermarking Relational Databases,” Proc. 28th Int’l Conf. Very Large Data Bases (VLDB ’02), VLDB Endowment, pp. 155-166, 2002.
[3] Sandip A. Kale and S. V. Kulkarni, “Data Leakage Detection: A Survey,” IOSR Journal of Computer Engineering (IOSRJCE), ISSN: 2278-0661, vol. 1, no. 6, pp. 32-35, July-Aug. 2012.