The goal of this seminar is to show how to detect when the distributor’s sensitive data has been leaked by agents, and how to estimate the probability that a particular agent leaked the data. We study unobtrusive techniques for detecting leakage of a set of objects or records.
1. Data Leakage Detection
Presented By: Miss. Ashwini A. Nerkar
Guided By: Prof. S. R. Sontakke
Computer Science & Engineering Department,
P. R. Patil College of Engg. & Technology, Amravati.
2. Contents:
What is Data Leakage?
How does Data Leakage take place?
Current System & limitations
Problem Setup & Notation
Addition of Fake Objects
Data Allocation Strategy
Data Distribution Strategy
Optimization Problem
Calculation of Probability
Conclusion
References
3. What is Data Leakage?
Data leakage is the unauthorized transmission of sensitive data or
information from within an organization to an external destination or
recipient.
Sensitive data of companies and organizations includes
intellectual property,
financial information,
patient information,
personal credit card data,
and other information depending upon the business and the
industry.
4. How does data leakage take place?
In the course of doing business, sometimes data must be handed over to
trusted third parties for some enhancement or operations.
Sometimes these trusted third parties may act as points of data leakage.
Examples:
A hospital may give patient records to researchers who will devise
new treatments.
A company may have partnerships with other companies that require
sharing of customer data.
An enterprise may outsource its data processing, so data must be
given to various other companies.
5. The owner of the data is termed the distributor, and the third parties
are called the agents.
In case of data leakage, the distributor must assess or judge
the likelihood that the leaked data came from one or more
agents, as opposed to having been independently gathered by
other means.
6. Current System & Its Limitations:
The current technique used for Data Leakage Detection is
‘Watermarking’.
A unique code is embedded in each distributed copy. If that copy is
later discovered in the hands of an unauthorized party, the leaker
can be identified.
Limitations:
It requires modifying the data, i.e., altering some of its attributes,
which is unacceptable when the distributor’s data must not be altered.
Second, watermarks can sometimes be destroyed if the recipient is
malicious.
7. Thus we need a data leakage detection
technique that fulfils the following objective
and abides by the following constraint.
CONSTRAINT:
To satisfy agent requests by providing them with the number of objects
they request or with all available objects that satisfy their conditions,
without perturbing the original data before handing it to agents.
OBJECTIVE:
To be able to detect an agent who leaks any portion of his data.
8. Problem Setup And Notation
Entities and Agents:
A distributor owns a set T = {t1, ..., tm} of valuable data objects. The
distributor wants to share some of the objects with a set of agents U1,
U2, ..., Un, but does not wish the objects to be leaked to other third parties.
The objects in T could be of any type and size, e.g., they could be
tuples in a relation, or relations in a database. An agent Ui receives a
subset Ri of objects, determined either by
* Sample request
or
* Explicit request
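The two request types can be sketched in Python. This is a minimal illustration, assuming objects are hashable records (here hypothetical `(id, state)` tuples), that a sample request only fixes how many objects m the agent wants, and that an explicit request supplies a condition; `sample_request` and `explicit_request` are hypothetical names.

```python
import random
from typing import Callable, Hashable, Optional, Sequence, Set

def sample_request(T: Sequence[Hashable], m: int,
                   rng: Optional[random.Random] = None) -> Set[Hashable]:
    """Sample request (hypothetical helper): the agent asks for m objects
    and the distributor is free to choose which ones."""
    if m > len(T):
        raise ValueError("agent requested more objects than T contains")
    rng = rng if rng is not None else random.Random()
    return set(rng.sample(list(T), m))

def explicit_request(T: Sequence[Hashable],
                     cond: Callable[[Hashable], bool]) -> Set[Hashable]:
    """Explicit request (hypothetical helper): the agent must receive
    every object in T that satisfies its condition."""
    return {t for t in T if cond(t)}

# Hypothetical records: (id, state) tuples standing in for relation tuples.
T = [("t1", "CA"), ("t2", "NY"), ("t3", "CA"), ("t4", "TX")]
R1 = sample_request(T, 2, random.Random(0))        # any 2 objects
R2 = explicit_request(T, lambda t: t[1] == "CA")   # all CA records
print(sorted(R2))  # [('t1', 'CA'), ('t3', 'CA')]
```

With a sample request the distributor controls which objects each agent sees; with an explicit request it does not, which is why the two cases call for different strategies.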
11. Addition of Fake Objects:
The distributor can add fake objects in order to improve
its effectiveness in detecting the guilty agent.
Fake objects are objects generated by the distributor that are
not in the original set.
They are designed to appear realistic, and are
distributed among the agents along with the original objects.
Different fake objects may be added to the data sets of
different agents in order to increase the chances of detecting
agents that leak data.
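A minimal Python sketch of this idea follows. It assumes the same hypothetical `(id, state)` records, generates fake IDs in the same format that never collide with real ones, and keeps a distributor-private registry mapping each fake object to the single agent that received it. `add_fake_objects` and `agents_exposed_by_fakes` are hypothetical names, and the "realism" of the generated values is deliberately simplistic.

```python
import itertools
import random
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # hypothetical (id, state) layout

def add_fake_objects(allocation: Dict[str, Set[Record]], fakes_per_agent: int,
                     states: List[str], seed: int = 0
                     ) -> Tuple[Dict[str, Set[Record]], Dict[Record, str]]:
    """Give each agent its own fake records; return the new allocation and
    a distributor-private registry {fake record: agent}."""
    rng = random.Random(seed)
    used_ids = {rec[0] for objs in allocation.values() for rec in objs}
    ids = (f"t{n}" for n in itertools.count(1000))  # same ID format as real data
    new_alloc: Dict[str, Set[Record]] = {}
    registry: Dict[Record, str] = {}
    for agent, objs in allocation.items():
        fakes: Set[Record] = set()
        while len(fakes) < fakes_per_agent:
            fid = next(ids)
            if fid in used_ids:               # never reuse a real or earlier fake ID
                continue
            used_ids.add(fid)
            fake = (fid, rng.choice(states))  # plausible-looking attribute value
            fakes.add(fake)
            registry[fake] = agent
        new_alloc[agent] = set(objs) | fakes
    return new_alloc, registry

def agents_exposed_by_fakes(leaked: Set[Record],
                            registry: Dict[Record, str]) -> Set[str]:
    """Any fake record found in the leaked set points straight at its recipient."""
    return {registry[t] for t in leaked if t in registry}

allocation = {"U1": {("t1", "CA"), ("t2", "NY")},
              "U2": {("t1", "CA"), ("t3", "CA")}}
new_alloc, registry = add_fake_objects(allocation, 1, ["CA", "NY", "TX"])
fake_for_u2 = next(f for f, a in registry.items() if a == "U2")
leaked = {("t1", "CA"), fake_for_u2}
print(agents_exposed_by_fakes(leaked, registry))  # {'U2'}
```

The shared real record t1 cannot distinguish U1 from U2, but the fake record was given only to U2, so its presence in the leaked set singles U2 out.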
12. Data Allocation Strategy:
The distributor intelligently gives data to agents in order to improve the
chances of detecting a guilty agent.
There are four instances of this problem, depending on the type of data
requests made by agents (sample or explicit) and whether “fake objects” are allowed.
13. Data Distribution Strategy:
Sample Data Request:
The distributor has the freedom to select which data items to provide to the
agents.
General Idea:
-- Provide agents with data sets that are as disjoint as possible.
Explicit Data Request:
The distributor must provide agents with the data they request.
General Idea:
-- Add fake objects to the distributed sets to reduce the relative overlap of the agents’ data.
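For sample requests, the "as disjoint as possible" idea can be sketched with a simple greedy rule: hand each agent the objects that have been given out least often so far. This is an illustrative heuristic chosen for this sketch, not necessarily the exact allocation algorithm of the cited papers; `allocate_samples` and the request-dictionary format are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set

def allocate_samples(T: List[Hashable],
                     requests: Dict[str, int]) -> Dict[str, Set[Hashable]]:
    """Greedy sketch: each agent gets the m objects handed out least often
    so far, so sets stay disjoint while total demand fits in T."""
    usage: Counter = Counter()  # missing objects count as 0 uses
    allocation: Dict[str, Set[Hashable]] = {}
    for agent, m in requests.items():
        if m > len(T):
            raise ValueError(f"{agent} requested more objects than T contains")
        # sorted() is stable, so ties are broken by position in T.
        chosen = sorted(T, key=lambda t: usage[t])[:m]
        usage.update(chosen)
        allocation[agent] = set(chosen)
    return allocation

T = ["t1", "t2", "t3", "t4", "t5", "t6"]
alloc = allocate_samples(T, {"U1": 3, "U2": 3, "U3": 2})
print({a: sorted(objs) for a, objs in alloc.items()})
# {'U1': ['t1', 't2', 't3'], 'U2': ['t4', 't5', 't6'], 'U3': ['t1', 't2']}
```

U3 must overlap with someone because the total demand (8 objects) exceeds |T| (6); the greedy rule places that unavoidable overlap on the least-used objects.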
14. Optimization Problem:
The distributor’s data allocation to agents has one
constraint and one objective.
The distributor’s constraint is to satisfy agents’
requests.
The distributor’s objective is to be able to detect an agent who leaks
any portion of his data, by maximizing the difference between the guilt
probability of the leaking agent and those of the other agents.
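Evaluating guilt probabilities for every candidate allocation is expensive, so a simpler proxy can be used: keep the relative overlap |Ri ∩ Rj| / |Ri| between agents' sets small. The sketch below scores an allocation by summing that ratio over all ordered pairs of distinct agents (lower is better). Treating this sum as the objective is an assumption made for illustration, and `overlap_score` is a hypothetical name. The example also shows why fake objects help with explicit requests: they cannot remove the shared real objects, but they shrink the ratio.

```python
from itertools import permutations
from typing import Dict, Hashable, Set

def overlap_score(allocation: Dict[str, Set[Hashable]]) -> float:
    """Sum of |Ri ∩ Rj| / |Ri| over ordered pairs of distinct agents.
    Assumes every Ri is non-empty; lower scores make leaks easier to attribute."""
    return sum(len(allocation[i] & allocation[j]) / len(allocation[i])
               for i, j in permutations(allocation, 2))

same = {"U1": {"t1", "t2"}, "U2": {"t1", "t2"}}   # explicit requests overlap fully
padded = {"U1": {"t1", "t2", "f1", "f2"},         # same real data plus
          "U2": {"t1", "t2", "f3", "f4"}}         # two distinct fakes each
print(overlap_score(same), overlap_score(padded))  # 2.0 1.0
```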
15. Calculation Of Probability:
The request of every agent is evaluated, and the probability
of each agent being guilty is calculated.
Pr{Gi | Ri = S} is the probability that agent Ui is guilty
(event Gi) if the distributor discovers a leaked table S
that contains all the objects Ri given to Ui.
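A minimal sketch of this calculation, assuming the guilt model described by Papadimitriou and Garcia-Molina (2011): each leaked object was either obtained independently by the target with some probability p, or leaked by one of the agents holding it, each holder being equally likely, and different objects are treated independently. Under that model, Pr{Gi | S} = 1 − ∏ over t in S ∩ Ri of (1 − (1 − p) / |Vt|), where Vt is the set of agents that received t. The function name `guilt_probability` and the value p = 0.5 are illustrative assumptions.

```python
from typing import Dict, Hashable, Set

def guilt_probability(agent: str, allocation: Dict[str, Set[Hashable]],
                      leaked: Set[Hashable], p: float) -> float:
    """Pr{G_agent | S} under the assumed model: each leaked object was guessed
    by the target with probability p, otherwise leaked by one of its holders,
    each holder equally likely; objects are treated independently."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    prob_not_guilty = 1.0
    for t in leaked & allocation[agent]:
        holders = sum(1 for objs in allocation.values() if t in objs)  # |V_t|
        prob_not_guilty *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_not_guilty

allocation = {"U1": {"t1", "t2"}, "U2": {"t1"}}
S = allocation["U1"]  # the leaked table contains all of U1's objects (S = R1)
for agent in allocation:
    print(agent, guilt_probability(agent, allocation, S, p=0.5))
# U1 0.625
# U2 0.25
```

Even though U2 holds one of the leaked objects, U1 is far more suspicious because it is the only holder of t2; widening this gap is what the allocation strategies aim for.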
16. Conclusion:
In a perfect world, there would be no need to hand over sensitive data to
agents who may unknowingly or maliciously leak it.
However, in many cases we must indeed work with agents that may
not be 100 percent trusted, and we may not be certain whether a leaked
object came from an agent or from some other source.
In spite of these difficulties, it is possible to assess the likelihood that
an agent is responsible for a leak, based on the overlap of his data
with the leaked data.
A variety of data distribution strategies can improve the
distributor’s chances of identifying a leaker.
17. References:
P. Papadimitriou and H. Garcia-Molina, “Data Leakage Detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, pp. 51-63, 2011.
A. Koneru, G. Siva Nageswara Rao and J. Venkata Rao, “Data Leakage Detection Using Encrypted Fake Objects,” International Journal of P2P Network Trends and Technology, vol. 3, issue 2, p. 104, 2013, ISSN: 2249-2615, http://www.internationaljournalssrg.org
S. A. Kale and S. V. Kulkarni, “Data Leakage Detection,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 1, issue 9, November 2012.
S. Yadav, Y. Eswara Rao, V. Shanmukha Rao and R. Vasantha, “Data Allocation Strategies for Detecting Data Leakage,” International Journal of Computer Trends and Technology, vol. 3, issue 1, 2012, ISSN: 2231-2803, http://www.internationaljournalssrg.org