ROBUST DATA LEAKAGE AND EMAIL FILTERING SYSTEM

OBJECTIVE:
The main objective of this project is to identify the guilty agents who are leaking sensitive information or data, using the explicit data request and sample data request algorithms. E-mail filtering is done using the K-nearest neighbor algorithm.

PROBLEM DEFINITION:
- Data leakage detection is traditionally handled by the technique of watermarking.
- If watermarking is used, the code embedded in the data can be modified, which makes it very difficult to identify the guilty agent or leaker.
- Data allocation problem.
- Data allocation depends on the request made by the agent and on whether the system can add fake objects to the allocated data.

ABSTRACT:
Data leakage occurs when a data distributor has given sensitive data to a set of supposedly trusted agents and some of the data is leaked and found in an unauthorized place. An enterprise data leak is a scary proposition. Security practitioners have always had to deal with data leakage issues that arise through channels such as e-mail, IM and other Internet channels. In case of data leakage from trusted agents, the distributor must assess the likelihood that the leaked data came from one or more agents. This can be done by using a system which can identify the parties guilty of such leakage even when the data is altered. For this, the system can use data allocation strategies or can inject "realistic but fake" data records to improve identification of leakage.

Moreover, data can also be leaked from within an organization through e-mails. Hence, there is also a need to filter these e-mails. This can be done by blocking e-mails which contain images, videos or data that is sensitive for the organization. The principle used in e-mail filtering is to classify e-mail based on the fingerprints of message bodies, the white and black lists of e-mail addresses, and the words specific to spam.

EXISTING SYSTEM:
- In the existing system, the watermarking technique is used to identify the guilty agents who are leaking sensitive information or data.
- Watermarking is one of the older techniques and relies on a unique code. This unique code is embedded in each copy, which is then distributed to the clients by the user.
- If any particular client leaks the given data to third parties, i.e. unauthorized users, then the leaked data and the leaker can be identified by means of the watermark.
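
As a rough illustration of the existing approach, the following Python sketch embeds a unique code in each distributed copy and looks that code up when a copy is found. The function names (`watermark_copy`, `identify_leaker`), the `_wm` field and the sample records are hypothetical and not taken from the project; real watermarking schemes hide the code far less visibly.

```python
import secrets


def watermark_copy(records, registry, client):
    """Return a copy of `records` carrying a unique code for `client`.

    Assumption: the code is stored in a hypothetical `_wm` field of every
    record; `registry` maps each issued code back to its client.
    """
    code = secrets.token_hex(8)
    registry[code] = client
    return [dict(record, _wm=code) for record in records]


def identify_leaker(leaked_records, registry):
    """Return the client whose code appears in the leaked copy, or None."""
    for record in leaked_records:
        client = registry.get(record.get("_wm"))
        if client is not None:
            return client
    return None


registry = {}
data = [{"name": "Alice", "salary": 52000}, {"name": "Bob", "salary": 61000}]
copy_a = watermark_copy(data, registry, "client_a")
copy_b = watermark_copy(data, registry, "client_b")

print(identify_leaker(copy_b, registry))  # client_b
```

In this sketch every distributed record differs from the original because of the added field, and a recipient who strips the `_wm` field leaves nothing to look up.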

DISADVANTAGES:
- Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. However, in some cases it is important not to alter the original distributor's data.
- If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified.
- Watermarks can be very useful in some cases, but again, they involve some modification of the original data.
- Watermarks can sometimes be destroyed if the data recipient is malicious.

PROPOSED SYSTEM:
The main aim of the proposed system is to find out when the sensitive data was leaked and who leaked it. The proposed system implements the concept of "fake objects". Suppose the director of a company wants to share some sensitive data (records) with clients of his company, but does not want the data to be leaked anywhere in between. Before the sensitive data is sent to the clients, the proposed system adds fake objects (records) to the database which look exactly like the original data. The client is unaware of these fake objects. Only the director of the company knows where and how many fake objects are inserted. The system uses different modules for adding fake objects and for detecting fake objects.

In the worst case, if an agent modifies the database (for example, deletes some records) and then leaks the data, the system is designed so that fake objects are still present to detect the guilty agent.

The system also implements an e-mail filtering module. If an agent tries to e-mail sensitive data, the request first goes to the server, which checks for fake objects in that data. If the data is sensitive, the e-mail is automatically dropped and the client is not able to send it.

ADVANTAGES:
- If the distributor sees "enough evidence" that an agent leaked data, he may stop doing business with him, or may initiate legal proceedings.
- In this project we develop a model for assessing the "guilt" of agents.
- We also present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker.
- Finally, we also consider the option of adding "fake" objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents.
- If it turns out an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty.
- An agent will not be able to send sensitive data through e-mail.
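
The fake-object idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the helper names (`make_fake_record`, `allocate`, `suspects`), the record layout, the fake id range and the choice of three fake records per agent are all hypothetical. Each agent receives the real records plus fake records unique to that agent, and a leaked set is traced by the fake records it still contains.

```python
import random


def make_fake_record(rng, serial):
    """Build a plausible-looking record that matches no real entity.

    Assumption: ids from 900000 upward are reserved for fake records.
    """
    return {
        "id": 900000 + serial,
        "name": rng.choice(["J. Smith", "R. Kumar", "L. Chen", "M. Garcia"]),
        "balance": rng.randrange(1000, 90000),
    }


def allocate(real_records, agents, fakes_per_agent=3, seed=0):
    """Give every agent the real records plus fake records unique to that agent.

    Returns (allocation, fake_owner): the shuffled record list per agent and
    a map from fake record id to the agent that received it. Only the
    distributor keeps `fake_owner`.
    """
    rng = random.Random(seed)
    allocation, fake_owner, serial = {}, {}, 0
    for agent in agents:
        fakes = []
        for _ in range(fakes_per_agent):
            fake = make_fake_record(rng, serial)
            fake_owner[fake["id"]] = agent
            fakes.append(fake)
            serial += 1
        records = list(real_records) + fakes
        rng.shuffle(records)  # mix the fake records in with the real ones
        allocation[agent] = records
    return allocation, fake_owner


def suspects(leaked_records, fake_owner):
    """Count, per agent, how many of that agent's fake records were leaked."""
    counts = {}
    for record in leaked_records:
        agent = fake_owner.get(record["id"])
        if agent is not None:
            counts[agent] = counts.get(agent, 0) + 1
    return counts


real = [{"id": i, "name": f"Customer {i}", "balance": 1000 * i} for i in range(1, 6)]
allocation, fake_owner = allocate(real, ["agent_1", "agent_2"])

# agent_2 deletes its first two records before leaking the rest
leak = allocation["agent_2"][2:]
print(suspects(leak, fake_owner))
```

Because agent_2 holds three fake records, deleting two records before leaking cannot remove all of them, so the leaked set still points back to agent_2.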

ALGORITHM USED:
1. Explicit Data Request
2. Sample Data Request
3. A-Priori Algorithm

ARCHITECTURE DIAGRAM:
[Figure: architecture diagram. Components shown: Client 1, Client 2, Client 3, ... Client n; Data Leakage Detection System; E-Mail Filtering Module; branches labeled "Non-sensitive data" and "Sensitive data"; Firewall; outcomes "E-Mail Sent Successfully" and "E-Mail Discarded"; Internet.]
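
The e-mail path in the architecture diagram can be sketched in Python as below. This is a hedged illustration only: `filter_outgoing`, the `Email` type, the marker set and the blocked attachment extensions are hypothetical, and a simple substring check stands in for the K-nearest neighbor classifier named in the objective. An outgoing message is discarded when it carries an image or video attachment or when its body contains a known sensitive marker (for example, a value from a fake record); otherwise it is passed on.

```python
from dataclasses import dataclass, field

# Assumed file extensions for image and video attachments.
BLOCKED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".avi")


@dataclass
class Email:
    sender: str
    body: str
    attachments: list = field(default_factory=list)


def filter_outgoing(email, sensitive_markers):
    """Return "discarded" for a sensitive e-mail, otherwise "sent"."""
    for name in email.attachments:
        if name.lower().endswith(BLOCKED_EXTENSIONS):
            return "discarded"
    body = email.body.lower()
    if any(marker.lower() in body for marker in sensitive_markers):
        return "discarded"
    return "sent"


markers = {"ACC-900001", "ACC-900002"}  # e.g. identifiers of fake records

print(filter_outgoing(Email("agent_1", "Meeting moved to 3 pm."), markers))    # sent
print(filter_outgoing(Email("agent_1", "Sending acc-900001 now."), markers))   # discarded
print(filter_outgoing(Email("agent_2", "See photo.", ["site.PNG"]), markers))  # discarded
```

The check runs on the server side before the message reaches the firewall, so a discarded message never leaves the organization.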

SYSTEM REQUIREMENTS:
Hardware Requirements:
- Intel Pentium IV
- 256/512 MB RAM
- 1 GB free disk space or greater
- 1 network interface card (NIC)

Software Requirements:
- MS Windows XP
- MS IE Browser 6.0 or later
- MS .NET Framework 4.0
- MS Visual Studio 2010
- Internet Information Services (IIS)
- MS SQL Server 2005
- Language: ASP.NET (C#)

APPLICATIONS:
1. Consultancy
2. Real Estate
3. Banks
