2. INTRODUCTION
Data leakage is defined as the accidental or unintentional distribution of private or sensitive data to an unauthorized entity.
Data leakage poses a serious issue for companies as the number
of incidents and the cost to those experiencing them continue to
increase.
The problem is exacerbated by the fact that transmitted data, including email, instant messages, website forms, and file transfers, among others, is largely unregulated and unmonitored on its way to its destination.
3. The main scope of this module is to provide complete information about the data/content accessed by users within the website.
The Forms Authentication technique is used to secure the website and prevent data leakage.
Access is observed continuously and automatically, and the information is sent to the administrator so that he can identify whenever data is leaked.
Above all, the most important aspect is providing proof against the guilty agents. The following techniques are used:
Fake object generation.
Watermarking.
DATA LEAKAGE DETECTION
4. Data leakage incidents
Sept. 2011 | Science Applications International Corp | Backup tapes stolen from a car, containing 5,117,799 patients' names, phone numbers, Social Security numbers, and medical information.
July 2008 | Google | Data were stolen, not from Google offices, but from the headquarters of an HR outsourcing company, Colt Express.
July 2009 | American Express | A DBA stole a laptop containing thousands of American Express card numbers; the DBA reported it stolen.
Aug. 2007 | Nuclear Laboratory in Los Alamos | An employee of the U.S. nuclear laboratory in Los Alamos transmitted confidential information by email.
5. EXISTING SYSTEM
DATA LEAKAGE DETECTION IS HANDLED BY WATERMARKING
A watermark, a unique code, is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified.
Watermarks can be very useful in some cases, but they involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.
Hence this technique alone proves unreliable.
Example: A company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies.
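The per-copy watermarking idea can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names and the trailing-marker format are assumptions of this sketch.

```python
# Minimal sketch of per-copy watermarking: each distributed copy carries a
# unique embedded code, so a recovered copy can be traced to its recipient.
# Marker format and names are illustrative only.

def embed_watermark(document, code):
    """Return a copy of the document carrying a unique embedded code."""
    return document + "\n<!-- wm:" + code + " -->"

def identify_leaker(leaked_copy, codes):
    """Match the code embedded in a leaked copy back to the agent it was issued to."""
    for agent, code in codes.items():
        if "wm:" + code in leaked_copy:
            return agent
    return None  # watermark destroyed, or the copy came from elsewhere

codes = {"agent_a": "7f3a", "agent_b": "91c2"}
copy_b = embed_watermark("customer records ...", codes["agent_b"])
print(identify_leaker(copy_b, codes))                  # agent_b
print(identify_leaker("customer records ...", codes))  # None: marker stripped
```

The last call shows the weakness noted above: a malicious recipient who strips or corrupts the embedded code defeats identification entirely.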
6. PROPOSED SYSTEM
ADDITION OF FAKE OBJECTS
The distributor may be able to add fake objects to the
distributed data in order to improve his effectiveness in
detecting guilty agents.
Fake objects are objects generated by the distributor that
are not in the original set.
These objects are designed to appear realistic, and are distributed among the agents along with the original objects.
Different fake objects may be added to the data
sets of different agents in order to increase the chances of
detecting agents that leak data.
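The allocation step above can be sketched as follows; the record fields and values are invented for illustration and do not reflect this project's schema.

```python
# Sketch of fake-object allocation: each agent receives the real records plus
# a few fabricated records unique to that agent, recorded by the distributor.

def allocate(real_records, agents, fakes_per_agent=2):
    """Return per-agent data sets plus a registry of which fakes went to whom."""
    allocation, fake_registry = {}, {}
    counter = 0
    for agent in agents:
        fakes = []
        for _ in range(fakes_per_agent):
            counter += 1
            # A realistic-looking but fabricated record, unique to this agent.
            fakes.append({"name": "Customer %d" % (1000 + counter),
                          "card": "4999-0000-0000-%04d" % counter})
        allocation[agent] = list(real_records) + fakes
        fake_registry[agent] = fakes
    return allocation, fake_registry

real = [{"name": "Alice Rao", "card": "4111-0000-0000-0001"}]
alloc, registry = allocate(real, ["agent_a", "agent_b"])
print(len(alloc["agent_a"]))  # 3: one real record plus two unique fakes
```

Because no two agents share the same fakes, any fake record later found in a leak points at exactly one recipient.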
8. Distributor:
The distributor is the main owner of the data.
Agents:
These are supposedly trusted third parties who can make
requests for data to the distributor.
Guilty Agent:
The agent who leaks the sensitive data of the distributor to an unauthorised party.
Target:
The unauthorised party who receives the distributor's sensitive data leaked by the guilty agent.
The distributor can send data to these agents by inserting
different fake objects into the data sets of different agents.
Now, suppose the distributor discovers his sensitive data
at an unauthorised party.
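The discovery step can be sketched as a simple lookup against the planted fakes (record values invented for illustration):

```python
# Sketch of leak attribution via planted fakes: fakes found in the leaked set
# implicate the agent they were planted with, since no two agents received
# the same fakes.

def suspects_from_leak(leaked_records, fake_registry):
    """Return the agents whose planted fake records appear in the leaked set."""
    leaked = {tuple(sorted(r.items())) for r in leaked_records}
    return [agent for agent, fakes in fake_registry.items()
            if any(tuple(sorted(f.items())) in leaked for f in fakes)]

fake_registry = {
    "agent_a": [{"card": "4999-0000-0000-0001"}],
    "agent_b": [{"card": "4999-0000-0000-0002"}],
}
leak = [{"card": "4111-0000-0000-0007"}, {"card": "4999-0000-0000-0002"}]
print(suspects_from_leak(leak, fake_registry))  # ['agent_b']
```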
9. Database Maintenance:
The sensitive data which is to be handed over to the agents is stored in the database.
Agent Maintenance:
The registration details of the agents, as well as the data given to them by the distributor, are maintained.
Addition of Fake Objects:
The distributor is able to add fake objects in order to improve the effectiveness in detecting the guilty agent.
Data Allocation:
In this module, the original records fetched according to the agent’s request are combined with the fake
records generated by the administrator.
Calculation of Probability:
In this module, the request of every agent is evaluated and the probability of each agent being guilty is calculated.
MODULE DESCRIPTION
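The Calculation of Probability module can be sketched with a simple overlap-based estimate. The guess probability p and the formula below follow the common overlap-based guilt model from the data leakage detection literature; they are assumptions of this sketch, not necessarily this project's exact computation.

```python
# Hedged sketch of guilt estimation: an agent is suspected in proportion to
# the overlap between his data set and the leaked set. Model assumption: each
# leaked object was either guessed by the target (probability p) or leaked,
# with equal chance, by one of the agents holding it.

def guilt_probability(agent_set, leaked_set, all_agent_sets, p=0.2):
    """Estimate Pr[agent is guilty | leaked set] under the model above."""
    prob_innocent = 1.0
    for obj in leaked_set & agent_set:
        holders = sum(1 for s in all_agent_sets.values() if obj in s)
        # Chance this object was NOT leaked by this particular agent:
        prob_innocent *= 1 - (1 - p) / holders
    return 1 - prob_innocent

sets = {"agent_a": {1, 2, 3}, "agent_b": {3, 4}}
leak = {2, 3}
pa = guilt_probability(sets["agent_a"], leak, sets)
pb = guilt_probability(sets["agent_b"], leak, sets)
print(pa > pb)  # True: agent_a holds part of the leak exclusively
```

Objects held by only one agent weigh heavily against that agent, while objects held by many agents spread the suspicion thin; this is why unique fake objects sharpen the estimate.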
11. Hardware Required:
System : Pentium IV, 2.4 GHz
Hard Disk : 40 GB
Floppy Drive : 1.44 MB
Monitor : 15" VGA colour
Mouse : Logitech
Keyboard : 110 keys, enhanced
RAM : 256 MB
Software Required:
O/S : Windows XP
Language : ASP.NET, C#
Database : SQL Server 2005
IDE : Visual Studio 2008
12. Data Loss Prevention (DLP)
DLP: Security measures to
protect confidential and
private data
in-use
in-motion
at-rest
From both intentional and
accidental loss of data
13. • Data-In-Motion
Email
Network access
Wireless
• Data-at-Rest
Portable/Removable media (USB)
Authorized abuse
• Data-In-Use
IM
File share
Web uploads
• Compliance Regulations
Customer credit card information
Medical Information
Financial Information
DLP SOLUTIONS: FOUR FOCUS AREAS
14. To protect against confidential data theft and loss, a multi-layered security foundation is needed:
Control/limit access to the data: firewalls, remote access controls, network access controls, physical security controls.
Secure information from threats: protect the perimeter and endpoints from malware, botnets, viruses, DoS, etc. with security technology.
Control use of sensitive data once access is granted: policy-based content inspection, acceptable use, encryption.
Cisco’s Solution for Data Loss Prevention
Build a secure foundation with a Self-Defending Network
Integrate DLP controls into security devices to protect data and increase visibility
while decreasing the complexity and total cost of ownership of DLP deployments
DATA LOSS PREVENTION
An Integrated Approach to Data Loss Prevention through Security
15. In an ideal scenario, there would be no need to hand over sensitive data to agents who may unknowingly or maliciously leak it.
However, in many cases, we must indeed work with agents that may not be
100 percent trusted, and we may not be certain if a leaked object came from an
agent or from some other source.
In spite of these difficulties, it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data.
The algorithms we have presented implement a variety of data distribution
strategies that can improve the distributor’s chances of identifying a leaker.
CONCLUSION