By:
Vishal Patil
Paresh Rawat
Pratik Nikam
Satish Patil
Under The Guidance Of
Prof. Rucha Samant
Agenda
PROBLEM DEFINITION
INTRODUCTION
ISSUES
SCOPE
ANALYSIS
DESIGN
IMPLEMENTATIONS
Problem Definition
• To detect whether data has been leaked by agents.
• To prevent data leakage.
Introduction
In the course of doing business, sometimes sensitive data must
be handed over to supposedly trusted third parties.
Our goal is to detect when the distributor's sensitive
data has been leaked by agents, and if possible to identify the
agent that leaked the data.
Existing System
Proposed System
• In this system, data leakage is detected by generating fake objects.
• Data leakage prevention and the detection of guilty agents are handled by e-mail filtering.
Types of employees that put your company at risk
The security illiterate
The gadget nerds
The unlawful residents
The malicious/disgruntled employees
Problem Setup and Notation
• Distributor (D) is the system that distributes data to agents.
• Valuable Data (T) is the set of sensitive data objects that the system sends to the agents.
• Agents (U) is the set of agents to whom the system sends sensitive data.
• A request from a client (agent) is either a sample request or an explicit request.
Analysis
Explicit Data Requests
1. The distributor holds data T = {t1, t2}.
2. Agent requests (R):
R1 = {t1, t2}, R2 = {t1}
R1 gets both t1 and t2.
R2 gets t1.
Therefore the value of the sum objective is:
R1 + R2 = 2/2 + 1/2 = 1.5
3. Select an agent using a randomized selection function:
SelectAgent{R1, …, R2}
4. e-optimal solution, with running time
O(n + n^2 B) = O(n^2 B)
where n = number of agents,
B = number of fake objects.
In this algorithm, the agent receives every data object that satisfies the condition of the agent's data request.
[Figure: Explicit Data Request algorithm]
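The sum objective and the choice of which agent receives a fake object can be sketched in a few lines. This is only an illustration: it assumes the sum-objective definition from Papadimitriou and Garcia-Molina's "Data Leakage Detection" (for each agent, the overlap with every other agent divided by the size of its own set) and a greedy selection rule in the spirit of the e-optimal approach. Under that definition the two terms are 1/2 and 1/1, so the individual terms differ from those written on the slide, but the total is the same 1.5. The names `sum_objective`, `select_agent_greedy`, and `fake_1` are hypothetical.

```python
from typing import Dict, Set


def overlap_with_others(requests: Dict[str, Set[str]], agent: str) -> int:
    """Total number of objects that `agent` shares with each other agent."""
    own = requests[agent]
    return sum(len(own & other) for name, other in requests.items() if name != agent)


def sum_objective(requests: Dict[str, Set[str]]) -> float:
    """Assumed definition: sum over agents of overlap_with_others / |R_i|."""
    return sum(overlap_with_others(requests, a) / len(requests[a]) for a in requests)


def select_agent_greedy(requests: Dict[str, Set[str]]) -> str:
    """Pick the agent for which one extra (fake) object lowers the objective most."""
    def gain(agent: str) -> float:
        size = len(requests[agent])
        return (1 / size - 1 / (size + 1)) * overlap_with_others(requests, agent)
    return max(requests, key=gain)


# Example from the slide: T = {t1, t2}, R1 = {t1, t2}, R2 = {t1}.
requests = {"U1": {"t1", "t2"}, "U2": {"t1"}}
before = sum_objective(requests)                    # 1/2 + 1/1

chosen = select_agent_greedy(requests)              # agent that gets the fake object
requests[chosen] = requests[chosen] | {"fake_1"}    # hypothetical fake object
after = sum_objective(requests)

print(before, chosen, after)
```

Each greedy choice scans all agents and, for each one, its overlap with all others, which is where the n^2 factor per fake object comes from.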
Sample Data Requests
With sample data requests, agents are not interested in
particular objects.
In this algorithm, the agent receives only a subset of the
data objects that can be given. The Sample Data Request
algorithm otherwise works the same way as the Explicit
Data Request algorithm.
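A sample request can be pictured as "give me any m objects from T". The sketch below assumes the simplest possible policy, a uniform random sample without replacement; the slides do not say how the subset is chosen, so `sample_request` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional


def sample_request(data: List[str], m: int, seed: Optional[int] = None) -> List[str]:
    """Return m distinct objects drawn at random from the distributor's data."""
    if not 0 <= m <= len(data):
        raise ValueError("m must be between 0 and the number of available objects")
    rng = random.Random(seed)   # a seed makes the allocation reproducible
    return rng.sample(data, m)


T = ["t1", "t2", "t3", "t4"]
R1 = sample_request(T, 2, seed=1)   # agent U1 asks for any two objects
print(R1)
```

Because different agents may receive different subsets, the distributor has some freedom to keep the overlap between agents small.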
ARCHITECTURE DIAGRAM:
[Figure: Agents requesting sensitive data from the Data Distributor]
ARCHITECTURE DIAGRAM:
[Figure: Data Distributor sending the sensitive data to the Agents]
[Figure: An agent tries to leak the sensitive data over the Internet]
The system has the following modules:
• Data Allocation
-- approach similar to watermarking
-- less sensitive
-- adds fake objects in some cases
• Fake Object
-- are real-looking objects
-- should not affect the real data
-- limit on fake object insertion (e-mail inbox)
-- CREATEFAKEOBJECT(Ri, Fi, CONDi)
• Optimization
-- one constraint and one objective
-- maximize the probability difference
• Data Distributor
• e-mail Filtering
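The "probability difference" in the Optimization module refers to how clearly one agent's guilt stands out from the others'. A minimal sketch of such an estimate follows. It assumes the guilt model from Papadimitriou and Garcia-Molina's "Data Leakage Detection": each leaked object was guessed by the outside party with probability p, and otherwise leaked by one of the agents holding it, each equally likely. The function name `guilt_probability` and the value p = 0.2 are illustrative assumptions.

```python
from typing import Dict, Set


def guilt_probability(
    allocations: Dict[str, Set[str]], leaked: Set[str], agent: str, p: float
) -> float:
    """Estimate Pr{agent is guilty | leaked set} under the assumed model."""
    prob_innocent = 1.0
    for obj in leaked & allocations[agent]:
        holders = sum(1 for objs in allocations.values() if obj in objs)
        # With probability (1 - p) the object came from one of its holders.
        prob_innocent *= 1 - (1 - p) / holders
    return 1 - prob_innocent


allocations = {"U1": {"t1", "t2"}, "U2": {"t1"}}
leaked = {"t1", "t2"}
guilt = {a: guilt_probability(allocations, leaked, a, p=0.2) for a in allocations}
print(guilt)
```

Here U1 holds an object that nobody else has, so its estimate is well above U2's. Allocation strategies, including fake objects, aim to make this gap as large as possible.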
Algorithm:
1. Identify the data.
2. Remove spam and stop words.
3. Remove or replace synonyms.
4. Calculate the priority of each word depending on the sensitivity of the data.
5. Compare the data with the predefined company data sets.
6. Filter the e-mail if it contains the company's important data.
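The six steps above can be sketched as a small keyword-scoring filter. Everything concrete here is an assumption made for illustration: the stop-word list, the synonym map, the priority weights, and the blocking threshold are hypothetical stand-ins for the company's predefined data sets, and the project itself targets ASP.NET/C# rather than Python.

```python
import re

# Hypothetical company-specific configuration.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "for", "in"}
SYNONYMS = {"wage": "salary", "pay": "salary", "customer": "client"}
PRIORITY = {"salary": 3, "client": 2, "password": 5}   # sensitivity weights
THRESHOLD = 5                                          # block at or above this score


def sensitivity_score(text: str) -> int:
    words = re.findall(r"[a-z0-9]+", text.lower())       # 1. identify the data
    words = [w for w in words if w not in STOP_WORDS]    # 2. drop stop words
    words = [SYNONYMS.get(w, w) for w in words]          # 3. normalize synonyms
    return sum(PRIORITY.get(w, 0) for w in words)        # 4-5. weigh against data set


def may_send(text: str) -> bool:
    """6. Allow the e-mail only if its score stays below the threshold."""
    return sensitivity_score(text) < THRESHOLD


print(may_send("Lunch at noon in the cafeteria"))
print(may_send("Attached is the customer wage list and the admin password"))
```

A keyword score is easy to evade and prone to false positives, so it is best read as a first-pass filter rather than a complete prevention mechanism.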
[Figure: An agent sends an e-mail. If the attached data is not sensitive, the e-mail is sent successfully. If the attached data is sensitive, the e-mail is not sent.]
O/S : Windows XP
Language : ASP.NET, C#
Database : SQL Server 2005
System : Pentium IV, 2.4 GHz
Hard Disk : 40 GB
Monitor : 15 VGA colour
Mouse : Logitech
Keyboard : 110 keys, enhanced
RAM : 256 MB
In an ideal scenario there would be no need to hand over sensitive data to
agents who may unknowingly or maliciously leak it.
However, in many cases we must work with agents that may not be
100 percent trusted, and we may not be certain whether a leaked object came
from an agent or from some other source.
In spite of these difficulties, it is possible to assess the likelihood that an
agent is responsible for a leak, based on the overlap of that agent's data with
the leaked data.
The algorithms we have presented implement a variety of data distribution
strategies that can improve the distributor's chances of identifying a leaker.
83504808-Data-Leakage-Detection-1-Final.ppt


Editor's Notes

  • #17 Our approach and watermarking are similar in the sense of providing agents with some kind of receiver-identifying information. However, by its very nature, a watermark modifies the item being watermarked. If the object to be watermarked cannot be modified, then a watermark cannot be inserted. The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. However, fake objects may impact the correctness of what agents do, so they may not always be allowable. Perturbing data to detect leakage is not new.