Data leakage detection

Submitted by:
SUVEEKSHA JAIN
Mtech I Sem
SJEC

• Data leakage is the unauthorized transmission of sensitive data or information from within an organization to an external destination or recipient.
• Sensitive data of companies and organizations includes:
  • intellectual property,
  • financial information,
  • patient information,
  • personal credit card data,
  and other information, depending on the business and the industry.

• In the course of doing business, data must sometimes be handed over to trusted third parties for enhancement or other operations.
• These trusted third parties may themselves become points of data leakage.
• Examples:
a) A hospital may give patient records to researchers who will devise new treatments.
b) A company may have partnerships with other companies that require sharing customer data.
c) An enterprise may outsource its data processing, so data must be given to various other companies.

Figure: channels through which data is shared with third parties (development chains, supply chains, outsourcing, business hubs, demand chains).

Data leakage incidents
• Sept. 2011, Science Applications International Corp: backup tapes containing 5,117,799 patients’ names, phone numbers, Social Security numbers, and medical information were stolen from a car.
• July 2008, Google: data were stolen, not from Google offices, but from the headquarters of an HR outsourcing company, Colt Express.
• July 2009, American Express: a DBA stole a laptop containing thousands of American Express card numbers. The DBA reported it stolen.
• Aug. 2007, Nuclear Laboratory in Los Alamos: an employee of the U.S. nuclear laboratory in Los Alamos transmitted confidential information by email.

• The owner of the data is called the distributor, and the third parties are called agents.
• When data leaks, the distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means.

Watermarking
Overview:
A unique code is embedded in each distributed
copy. If that copy is later discovered in the hands of an
unauthorized party, the leaker can be identified.
Mechanism:
The main idea is to generate a watermark W(x; y) using a secret key chosen by the sender, such that W(x; y) is indistinguishable from random noise for any entity that does not know the key (i.e., the recipients).

• The sender adds the watermark W(x; y) to the information object I(x; y), forming a transformed object TI(x; y) = I(x; y) + W(x; y) before sharing it with the recipient(s).
• It is then hard for any recipient to guess the watermark W(x; y) (and subtract it from the transformed object TI(x; y)).
• The sender, on the other hand, can easily extract and verify the watermark, because it knows the key.
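The embed-and-verify cycle above can be illustrated with a minimal Python sketch. It assumes numeric data, an integer key per recipient that seeds Python's `random.Random` to produce a ±1 noise pattern, and a simple correlation test for detection. The names `make_watermark`, `embed`, `detect`, and `STRENGTH` are hypothetical and not part of the original system.

```python
import random

# Hypothetical amplitude of the watermark noise (assumption, not from the source).
STRENGTH = 1.0


def make_watermark(key: int, n: int) -> list[float]:
    """Generate W: a key-seeded +/-STRENGTH pattern that looks like noise without the key."""
    rng = random.Random(key)
    return [rng.choice((-STRENGTH, STRENGTH)) for _ in range(n)]


def embed(data: list[float], key: int) -> list[float]:
    """Form TI = I + W by adding the watermark to each value of the information object."""
    w = make_watermark(key, len(data))
    return [x + wx for x, wx in zip(data, w)]


def detect(original: list[float], suspect: list[float], key: int) -> float:
    """Correlate (suspect - original) with the watermark for this key.

    A result near STRENGTH**2 means this key's watermark is present;
    a result near 0 means it is not.
    """
    w = make_watermark(key, len(original))
    diff = [s - o for s, o in zip(suspect, original)]
    return sum(d * wx for d, wx in zip(diff, w)) / len(w)


if __name__ == "__main__":
    rng = random.Random(0)
    original = [rng.uniform(0, 100) for _ in range(1000)]
    copy_for_agent = embed(original, key=42)      # copy handed to one agent
    print(detect(original, copy_for_agent, 42))   # close to 1.0: this agent's mark
    print(detect(original, copy_for_agent, 7))    # close to 0.0: a different key
```

Giving each agent its own key makes each copy traceable to its recipient. Note that the sketch changes every value it marks, which is exactly the kind of data modification that rules watermarking out when the original data must be handed over unperturbed.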

• First, watermarking involves modifying the data, making it less sensitive by altering its attributes.
• Second, watermarks can sometimes be destroyed if the recipient is malicious.

• Using data allocation strategies, the distributor gives data to agents in a way that improves the chances of detecting guilty agents.
• Fake objects are added to the distributed data to help identify the guilty party.
• With this evidence, the distributor can be more confident about which agent leaked the data, and may stop doing business with that agent.

ARCHITECTURAL VIEW OF THE SYSTEM

Thus we need a data leakage detection technique that fulfils the following objective and abides by the given constraints.
CONSTRAINTS
Satisfy agent requests by providing them with the number of objects they request, or with all available objects that satisfy their conditions.
Avoid perturbing the original data before handing it to agents.
OBJECTIVE
Be able to detect an agent who leaks any portion of his data.

Different watermarking systems:
• Embedding and extraction
• Secure spread spectrum watermarking
• DCT-based watermarking
• Spread spectrum watermarking
• Wavelet-based watermarking
• Robust watermarking techniques
• Invisible watermarking
• Watermarking of digital audio and images using MATLAB
• Watermarking while preserving the critical path
• Buyer-seller watermarking protocols
• Watermarking using the cellular automata transform

• Data Allocation Module
• Fake Object Module
• Data Distributor Module
• Agent Guilt Module

• Data Allocation: The main focus of our project is the data allocation problem: how can the distributor “intelligently” give data to agents in order to improve the chances of detecting a guilty agent?
• Fake Object: Fake objects are objects generated by the distributor in order to increase the chances of detecting agents that leak data. The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. Our use of fake objects is inspired by the use of “trace” records in mailing lists.
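The allocation and fake-object ideas can be sketched in Python. This sketch assumes the simplest case, sample requests where agent i asks for m_i objects from a set T, and uses a greedy heuristic (hand out the least-distributed objects first) rather than an optimal algorithm. The overlap score, the sum over agents i of |R_i ∩ R_j| / |R_i| over all other agents j, is one assumed way to measure how hard agents are to tell apart (lower is better). The function names and the fake-object id format are hypothetical.

```python
from collections import Counter


def allocate_samples(objects: list[str], requests: dict[str, int]) -> dict[str, set[str]]:
    """Give each agent the m objects handed out least so far (ties broken by id).

    Spreading objects evenly keeps the sets R_i of different agents from
    overlapping more than necessary. Assumes 1 <= m <= len(objects), so each
    request is satisfied in full.
    """
    usage = Counter()
    allocation = {}
    for agent, m in requests.items():
        chosen = sorted(objects, key=lambda t: (usage[t], t))[:m]
        usage.update(chosen)  # count one more copy of each chosen object
        allocation[agent] = set(chosen)
    return allocation


def add_fake_objects(
    allocation: dict[str, set[str]], budget: int, per_agent: int
) -> dict[str, set[str]]:
    """Return a copy where each agent also gets up to per_agent fake records unique
    to it, until the distributor's total budget of fake objects is used up."""
    result = {}
    remaining = budget
    for agent, objs in allocation.items():
        k = min(per_agent, remaining)
        fakes = {f"fake-{agent}-{j}" for j in range(k)}  # hypothetical id format
        result[agent] = objs | fakes
        remaining -= k
    return result


def overlap_score(allocation: dict[str, set[str]]) -> float:
    """Sum over agents i of |R_i & R_j| / |R_i| for all j != i (lower is better)."""
    total = 0.0
    for i, r_i in allocation.items():
        shared = sum(len(r_i & r_j) for j, r_j in allocation.items() if j != i)
        total += shared / len(r_i)
    return total


if __name__ == "__main__":
    alloc = allocate_samples(["t1", "t2", "t3", "t4"], {"U1": 3, "U2": 3})
    print(overlap_score(alloc))  # about 1.33: U1 and U2 share t1 and t2
    with_fakes = add_fake_objects(alloc, budget=3, per_agent=2)
    print(overlap_score(with_fakes))  # about 0.9
```

Fake records unique to each agent lower the overlap score without perturbing any real record, so they respect the no-perturbation constraint; the `budget` parameter stands in for a limit on the number of fake objects.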

• Data Distributor: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or on somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means.
• Agent Guilt: To compute $\Pr\{G_i \mid S\}$, the probability that agent $U_i$ is guilty given the leaked set of objects $S$, we need an estimate for the probability that values in $S$ can be “guessed” by the target.
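One such estimate can be sketched in Python. The sketch assumes the guilt model used by Papadimitriou and Garcia-Molina for this problem: each leaked object t was either guessed by the target with probability p, or leaked by one of the |V_t| agents that received it, each equally likely, independently across objects. That gives $\Pr\{G_i \mid S\} = 1 - \prod_{t \in S \cap R_i} \left(1 - \frac{1-p}{|V_t|}\right)$. Treating fake objects as unguessable (p = 0 for them) is a further assumption of this sketch, and `guilt_probability` is a hypothetical name.

```python
def guilt_probability(
    agent: str,
    leaked: set[str],
    allocation: dict[str, set[str]],
    p: float,
    fakes: frozenset[str] = frozenset(),
) -> float:
    """Estimate Pr{G_i | S} for one agent, given the leaked set S.

    p is the assumed probability that the target guessed a real object on its
    own; objects listed in `fakes` are assumed impossible to guess (p = 0).
    """
    not_guilty = 1.0
    for t in leaked & allocation[agent]:
        v_t = sum(1 for objs in allocation.values() if t in objs)  # |V_t| >= 1
        p_t = 0.0 if t in fakes else p
        not_guilty *= 1 - (1 - p_t) / v_t
    return 1 - not_guilty


if __name__ == "__main__":
    allocation = {"U1": {"t1", "t2"}, "U2": {"t1"}}
    leaked = {"t1", "t2"}
    for agent in allocation:
        print(agent, guilt_probability(agent, leaked, allocation, p=0.5))
    # U1 0.625 (only U1 holds t2), U2 0.25
```

If a leaked fake record is held by only one agent and cannot be guessed, the product becomes zero and that agent's guilt estimate reaches 1, which is why the allocation step hands out fake objects.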

• A cloud is a large group of interconnected computers. Authorized users can access applications hosted in the cloud from any computer over the Internet.
Key properties of cloud computing:
• User-centric
• Task-centric
• Powerful
• Accessible
• Intelligent
• Programmable

• Rights protection for relational data
• Watermarking techniques for multimedia data
• Achieving k-anonymity privacy protection
• Watermarking relational databases
• Lineage tracing for general data warehouse transformations

• The presented strategies assume that there is a fixed set of agents with requests known in advance.
• The distributor may have a limit on the number of fake objects.

• Cloud computing technology enables data to be stored in the cloud and allows users both inside and outside the company to access the same data, which increases the usefulness of the data.

• It helps detect whether the distributor’s sensitive data has been leaked by trusted or authorized agents.
• It helps identify the agents who leaked the data.
• It can help reduce cybercrime.

• Although leakers are traditionally identified using watermarking, certain data cannot admit watermarks.
• In spite of these difficulties, it is possible to assess the likelihood that an agent is responsible for a leak.
• We observed that distributing data judiciously, using the different data allocation strategies, can make a significant difference in identifying guilty agents.
