This document summarizes a seminar on data leakage detection. It introduces the topics of data leakage, the objectives of detecting and identifying the source of leaked data, existing watermarking techniques, and a proposed system using perturbation to detect leakage and assess guilt of agents. The proposed system aims to distribute data across agents in a way that improves the ability to identify the source of any leaks. The document also discusses types of employees that increase leakage risk and the impact of leaks on organizations.
3. INTRODUCTION
Data leakage is defined as the accidental or unintentional distribution
of private or sensitive data to an unauthorized entity.
Data leakage poses a serious issue for companies as the number of
incidents and the cost to those experiencing them continue to increase.
Data leakage is made worse by the fact that transmitted data, including
emails, instant messaging, website forms, and file transfers among
others, are largely unregulated and unmonitored on their way to their
destinations.
4. OBJECTIVE
A data distributor has given sensitive data to a set of
supposedly trusted agents (third parties).
Some of the data is leaked and found in an unauthorized place
(e.g., on the web or somebody’s laptop).
The distributor must assess the likelihood that the leaked data
came from one or more agents, as opposed to having been
independently gathered by other means.
We propose data allocation strategies (across the agents) that
improve the probability of identifying leakages.
5. EXISTING SYSTEM
Traditionally, leakage detection is handled by watermarking,
e.g., a unique code is embedded in each distributed copy.
If that copy is later discovered in the hands of an unauthorized
party, the leaker can be identified.
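The idea of embedding a unique code in each copy can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field name `ref`, the key `SECRET_KEY`, and the helper names `agent_code`, `make_copy`, and `identify_leaker` are hypothetical, and real watermarking schemes usually hide the code far less visibly than an extra field.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"distributor-secret"  # hypothetical key known only to the distributor


def agent_code(agent_id: str) -> str:
    """Derive a short per-agent code from the distributor's secret."""
    digest = hmac.new(SECRET_KEY, agent_id.encode(), hashlib.sha256).hexdigest()
    return digest[:12]


def make_copy(record: dict, agent_id: str) -> dict:
    """Return a copy of the record with the agent's code embedded in it."""
    marked = dict(record)
    marked["ref"] = agent_code(agent_id)  # the embedded code (assumed field name)
    return marked


def identify_leaker(leaked: dict, agent_ids: list) -> Optional[str]:
    """Match the code found in a leaked copy against every known agent."""
    for agent_id in agent_ids:
        if hmac.compare_digest(leaked.get("ref", ""), agent_code(agent_id)):
            return agent_id
    return None  # code missing or altered: no agent can be identified


agents = ["agent_a", "agent_b", "agent_c"]
record = {"patient": "P-1001", "diagnosis": "asthma"}
copies = {a: make_copy(record, a) for a in agents}

# A leaked copy that still carries its code points back to its recipient.
print(identify_leaker(copies["agent_b"], agents))

# A copy whose code has been stripped can no longer be traced.
stripped = {k: v for k, v in copies["agent_b"].items() if k != "ref"}
print(identify_leaker(stripped, agents))
```

Note that each distributed copy differs from the original record by the added code, and that tracing works only while the code survives in the leaked copy.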
6. DISADVANTAGES OF EXISTING SYSTEMS
Watermarks can be very useful in some cases, but they involve
some modification of the original data.
Furthermore, watermarks can sometimes be destroyed if the data
recipient is malicious.
Yet sensitive data must often be handed to third parties. For example,
a hospital may give patient records to
researchers who will devise new treatments.
Similarly, a company may have partnerships with other
companies that require sharing customer data.
Another enterprise may outsource its data processing, so data
must be given to various other companies.
We call the owner of the data the distributor and the supposedly
trusted third parties the agents.
7. PROPOSED SYSTEM
Our goal is to detect when the distributor's sensitive data has
been leaked by agents, and if possible to identify the agent that
leaked the data.
Perturbation is a very useful technique where the data is
modified and made "less sensitive" before being handed to agents.
We develop unobtrusive techniques for detecting leakage of a set
of objects or records.
We develop a model for assessing the "guilt" of agents.
We also present algorithms for distributing objects to agents, in
a way that improves our chances of identifying a leaker.
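To make the guilt model and the allocation idea concrete, here is a minimal sketch under stated assumptions, not the seminar's actual algorithms. It assumes that each leaked object was obtained independently by the unauthorized party with probability `p`, and was otherwise leaked by one of the agents holding it, each equally likely. The function names, the value of `p`, and the greedy allocation rule are all hypothetical.

```python
def guilt_probability(agent, allocation, leaked, p=0.2):
    """Estimate the probability that `agent` leaked at least one leaked object.

    Assumed model: an object is guessed independently with probability p;
    otherwise one of its holders leaked it, all holders equally likely.
    """
    prob_innocent = 1.0
    for obj in leaked & allocation[agent]:
        holders = sum(1 for objs in allocation.values() if obj in objs)
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent


def allocate_low_overlap(objects, agents, per_agent):
    """Greedy sketch: give each agent the objects held by the fewest agents so far."""
    counts = {obj: 0 for obj in objects}
    allocation = {}
    for agent in agents:
        chosen = sorted(objects, key=lambda o: (counts[o], o))[:per_agent]
        for obj in chosen:
            counts[obj] += 1
        allocation[agent] = set(chosen)
    return allocation


leaked = {1, 2, 3}

# Both agents hold the same objects: the leak cannot tell them apart.
same = {"A": {1, 2, 3}, "B": {1, 2, 3}}
print({a: round(guilt_probability(a, same, leaked), 3) for a in same})

# Low-overlap allocation: the leaked set matches only one agent's data.
spread = allocate_low_overlap([1, 2, 3, 4, 5, 6], ["A", "B"], per_agent=3)
print({a: round(guilt_probability(a, spread, leaked), 3) for a in spread})
```

With identical allocations both agents receive the same guilt estimate, while the low-overlap allocation separates them sharply. This is the sense in which the way objects are distributed can improve the chances of identifying a leaker.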
8. TYPES OF EMPLOYEES THAT PUT OUR COMPANY AT RISK
The security illiterate
The unlawful residents
The malicious/disgruntled employees
9. IMPACT ON ORGANIZATIONS
Financial & reputational loss
Small leaks accumulate into large losses
Loss of customer & employee private information
Loss of competitive position
Lawsuits or regulatory consequences
10. MODULES
Admin Module
The administrator must log on to the system.
Admin can add/view/delete/edit the user details.
User Module
A user must log in to use the services.
A user can accept/reject data sharing requests from other users.
11. DATA LOSS PREVENTION
To protect against confidential data theft and loss, a multi-layered security
foundation is needed:
Control/limit access to the data: firewalls, remote access controls, network
access controls, physical security controls
Secure information from threats: protect perimeter and endpoints from
malware, botnets, viruses, DoS, etc. with security technology
Control use of sensitive data once access is granted: policy-based content
inspection, acceptable use, encryption
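As an illustration of policy-based content inspection, the following minimal sketch scans outgoing text against a small set of patterns. The policy names, the patterns, and the `inspect` function are hypothetical and chosen only for this example. Production DLP tools use much richer detection than two regular expressions.

```python
import re

# Hypothetical policies: each name maps to a pattern that flags sensitive content.
POLICIES = {
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),  # 16 digits, optional separators
    "confidential_marker": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


def inspect(text):
    """Return the sorted names of all policies that the text violates."""
    return sorted(name for name, pattern in POLICIES.items() if pattern.search(text))


def allow_outgoing(text):
    """Allow a message only if no policy matches it."""
    return not inspect(text)


print(inspect("Card 4111 1111 1111 1111 attached"))
print(inspect("CONFIDENTIAL: Q3 plan"))
print(allow_outgoing("Lunch at noon?"))
```

A message that matches any policy would be blocked or held for review before it leaves the network. What happens after a match is a policy decision outside this sketch.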
Cisco’s Solution for Data Loss Prevention
Build a secure foundation with a Self-Defending Network
Integrate DLP controls into security devices to protect data and increase
visibility.
12. CONCLUSION
Ideally, there would be no need to hand over sensitive data to
agents who may unknowingly or maliciously leak it.
Although leakers can be identified using the traditional technique of
watermarking, certain data cannot admit watermarks.
In spite of these difficulties, it is possible to assess the likelihood that
an agent is responsible for a leak, based on the overlap of the agent's data
with the leaked data.