ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology, Volume 1, Issue 4, June 2012

EVALUATE FAKE OBJECTS TO PREDICT UNAUTHENTICATED DEVELOPERS

A. Santhi Lakshmi, Roll no: 10G21D2502, M.Tech (SE), ASCET, Gudur, Andhra Pradesh, India. Mail: santhilakshmi14@gmail.com. Ph: 09493357739.
Prof. C. Rajendra, Head, Dept. of CSE, ASCET, Gudur, Andhra Pradesh, India. Mail: hod.cse@audisankara.com. Ph: 09248748404.

Abstract: Modern business activities rely on extensive email exchange. Email leakage has become widespread, and the severe damage caused by such leakage is a disturbing problem for organizations. In this paper we study the following problem: in the course of doing business, a data distributor sometimes gives sensitive data to trusted third parties. Some of the data is later leaked and found in an unauthorized place. The distributor cannot blame an agent without evidence. Current approaches can detect the leakers, but the evidence they produce is often too weak for the organization to proceed legally. In this work, we implement and analyze a guilt model that detects guilty agents using allocation strategies, without modifying the original data, and identifies the agent who leaked the data with sufficient evidence. The main focus of this paper is to distribute the sensitive data "intelligently" to agents in order to improve the chances of detecting a guilty agent with strong evidence. The objective of our work is to improve the probability of identifying leakage using data allocation strategies across the agents, and to identify the guilty party by injecting "realistic but fake" data records.

Keywords: Allocation strategies, sensitive data, data leakage, fake records, third parties.

1. INTRODUCTION

In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. There always remains a risk of the data being leaked by an agent.

Leakage detection is traditionally handled by watermarking: a unique code is embedded in each distributed copy, and if that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. But this again requires modifying the data, and watermarks can sometimes be destroyed if the data recipient is malicious. Our goal is to detect when the distributor's sensitive data has been leaked by agents and, if possible, to identify the agent that leaked it.

Perturbation is a very useful technique in which the data is modified and made "less sensitive" before being handed to agents. For example, one can add random noise to certain attributes, or replace exact values by ranges. But this technique, too, requires modification of the data. We instead develop unobtrusive techniques for detecting leakage of a set of objects or records. We develop a model for assessing the "guilt" of agents, and we present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker. Finally, we also consider the option of adding "fake" objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual member. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty.

2. PROBLEM SETUP AND NOTATION

Entities and agents. A distributor owns a set T = {t1, ..., tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, ..., Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size; e.g., they could be tuples in a relation, or relations in a database.

An agent Ui receives a subset of objects Ri ⊆ T, determined either by a sample request or an explicit request:

  • Sample request Ri = SAMPLE(T, mi): any subset of mi records from T can be given to Ui.
  • Explicit request Ri = EXPLICIT(T, condi): agent Ui receives all objects in T that satisfy condi.
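The two request types above can be sketched in Python (the language the authors later say they used for their experiments). The dict-based record layout and the function names are our own illustrative choices, not the paper's:

```python
import random

def sample_request(T, m, rng=random):
    """R_i = SAMPLE(T, m_i): any m_i records chosen from T."""
    return rng.sample(T, m)

def explicit_request(T, cond):
    """R_i = EXPLICIT(T, cond_i): every record of T satisfying cond_i."""
    return [t for t in T if cond(t)]

# Toy customer table: even ids live in California.
T = [{"id": i, "state": "CA" if i % 2 == 0 else "NY"} for i in range(10)]

R1 = sample_request(T, 4)                               # survey agency U1
R2 = explicit_request(T, lambda t: t["state"] == "CA")  # billing agent U2

assert len(R1) == 4
assert len(R2) == 5 and all(t["state"] == "CA" for t in R2)
```

Note that, as the paper assumes, the sample request places no constraint on *which* records are returned, while the explicit request is fully determined by the condition.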
Example. Say that T contains customer records for a given company A. Company A hires a marketing agency U1 to do an online survey of customers. Since any customers will do for the survey, U1 requests a sample of 1,000 customer records. At the same time, company A subcontracts with agent U2 to handle billing for all California customers. Thus, U2 receives all T records that satisfy the condition "state is California."

Although we do not discuss it here, our model can easily be extended to requests for a sample of objects that satisfy a condition (e.g., an agent wants any 100 California customer records). Also note that we do not concern ourselves with the randomness of a sample. (We assume that if a random sample is required, there are enough records in T so that the object selection schemes presented later can pick random records from T.)

3. RELATED WORK

The guilt detection approach we present is related to the data provenance problem [3]: tracing the lineage of the leaked objects S essentially amounts to detecting the guilty agents. Tutorial [4] provides a good overview of the research conducted in this field. Suggested solutions are domain specific, such as lineage tracing for data warehouses [5], and assume some prior knowledge of the way a data view is created out of data sources. Our problem formulation with objects and sets is more general and simplifies lineage tracing, since we do not consider any data transformation from the Ri sets to S.

As far as the data allocation strategies are concerned, our work is mostly relevant to watermarking, which is used as a means of establishing original ownership of distributed objects. Watermarks were initially used in images [16], video [8], and audio data [6], whose digital representation includes considerable redundancy. Recently, [1], [17], [10], [7], and other works have also studied mark insertion in relational data. Our approach and watermarking are similar in the sense of providing agents with some kind of receiver-identifying information. However, by its very nature, a watermark modifies the item being watermarked. If the object to be watermarked cannot be modified, then a watermark cannot be inserted. In such cases, methods that attach watermarks to the distributed data are not applicable.

4. DATA ALLOCATION PROBLEM

The main focus of this paper is the data allocation problem: how can the distributor "intelligently" give data to agents in order to improve the chances of detecting a guilty agent? As illustrated in Fig. 2, we address four instances of this problem, depending on the type of data requests made by agents and on whether "fake objects" are allowed. The two types of requests we handle were defined in Section 2: sample and explicit. Fake objects are objects generated by the distributor that are not in set T. They are designed to look like real objects and are distributed to agents together with T objects, in order to increase the chances of detecting agents that leak data. We discuss fake objects in more detail in Section 4.1. As shown in Fig. 2, we represent the four problem instances with the names EF̄, EF, SF̄, and SF, where E stands for explicit requests, S for sample requests, F for the use of fake objects, and F̄ for the case where fake objects are not allowed.

4.1 Fake Objects

The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. However, fake objects may impact the correctness of what agents do, so they may not always be allowable. The idea of perturbing data to detect leakage is not new, e.g., [1]. However, in most cases, individual objects are perturbed, e.g., by adding random noise to sensitive salaries, or by adding a watermark to an image. In our case, we perturb the set of distributor objects by adding fake elements.

4.2 Optimization Problem

The distributor's data allocation to agents has one constraint and one objective. The distributor's constraint is to satisfy agents' requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data.
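A minimal sketch of the fake-object idea from Section 4.1, assuming one distributor-generated fake record per agent so that a leaked fake directly implicates its recipient. The paper only requires that fakes look realistic; the uuid-based generator and all names below are hypothetical:

```python
import uuid

def make_fake_record():
    # Fabricated but realistic-looking customer record (hypothetical format).
    return {"id": "F-" + uuid.uuid4().hex[:8], "state": "CA"}

def distribute_with_fakes(requests):
    """requests maps each agent to its real records R_i; returns the
    augmented allocations and a map from fake-record id to agent."""
    allocations, fake_owner = {}, {}
    for agent, records in requests.items():
        fake = make_fake_record()
        fake_owner[fake["id"]] = agent
        allocations[agent] = records + [fake]
    return allocations, fake_owner

def implicated_agents(leaked, fake_owner):
    """Agents whose fake objects appear in the leaked set S."""
    return {fake_owner[t["id"]] for t in leaked if t["id"] in fake_owner}

allocations, owner = distribute_with_fakes(
    {"U1": [{"id": 1}], "U2": [{"id": 1}, {"id": 2}]})
# If U1's full allocation surfaces, U1's fake implicates U1 but not U2,
# even though both agents hold the real record with id 1.
assert implicated_agents(allocations["U1"], owner) == {"U1"}
```

This illustrates why fakes act as a set-level watermark: no real record is modified, yet the leaked set as a whole carries receiver-identifying information.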
We consider the constraint as strict. The distributor may not deny serving an agent request, as in [13], and may not provide agents with different perturbed versions of the same objects, as in [1]. We consider fake object distribution as the only possible constraint relaxation.

Our detection objective is ideal and intractable: detection would be assured only if the distributor gave no data object to any agent (Mungamuru and Garcia-Molina [11] discuss that to attain "perfect" privacy and security, we have to sacrifice utility). We use instead the following objective: maximize the chances of detecting a guilty agent that leaks all his data objects.

5. ALLOCATION STRATEGIES

In this section, we describe allocation strategies that solve exactly or approximately the scalar versions of (8) for the different instances presented in Fig. 2. We resort to approximate solutions in cases where it is inefficient to solve the optimization problem exactly. We deal with problems with explicit data requests and with problems with sample data requests.

5.1 Explicit Data Requests

In problems of class EF̄, the distributor is not allowed to add fake objects to the distributed data, so the data allocation is fully defined by the agents' data requests and there is nothing to optimize. In EF problems, objective values are initialized by agents' data requests. Say, for example, that T = {t1, t2} and there are two agents with explicit data requests such that R1 = {t1, t2} and R2 = {t1}. The value of the sum-objective is in this case

  Σ_{i=1}^{2} (1/|Ri|) Σ_{j≠i} |Ri ∩ Rj| = 1/2 + 1/1 = 1.5.

5.2 Sample Data Requests

With sample data requests, each agent Ui may receive any subset of T out of (|T| choose mi) different ones. Hence, there are Π_{i=1}^{n} (|T| choose mi) different object allocations. In every allocation, the distributor can permute the objects of T and keep the same chances of guilty agent detection, because the guilt probability depends only on which agents have received the leaked objects and not on the identity of the leaked objects. Therefore, from the distributor's perspective, there are Π_{i=1}^{n} (|T| choose mi) / |T|! different allocations.

5.2.1 Random

An object allocation that satisfies requests and ignores the distributor's objective is to give each agent Ui a randomly selected subset of T of size mi. We denote this algorithm by s-random and use it as our baseline. We present s-random in two parts: Algorithm 1 is a general allocation algorithm that is used by other algorithms in this section. In line 6 of Algorithm 1, there is a call to function SELECTOBJECT(), whose implementation differentiates the algorithms that rely on Algorithm 1. Algorithm 2 shows function SELECTOBJECT() for s-random.

Algorithm 1: Allocation for Sample Data Requests (SF̄)
Input: m1, ..., mn, |T|  ▷ assuming mi ≤ |T|
Output: R1, ..., Rn
 1: a ← 0^|T|  ▷ a[k]: number of agents who have received object tk
 2: R1 ← ∅, ..., Rn ← ∅
 3: remaining ← Σ_{i=1}^{n} mi
 4: while remaining > 0 do
 5:   for all i = 1, ..., n : |Ri| < mi do
 6:     k ← SELECTOBJECT(i, Ri)  ▷ may also use additional parameters
 7:     Ri ← Ri ∪ {tk}
 8:     a[k] ← a[k] + 1
 9:     remaining ← remaining − 1

Algorithm 2: Object Selection for s-random
 1: function SELECTOBJECT(i, Ri)
 2:   k ← select at random an element from the set {k' | tk' ∉ Ri}
 3:   return k

In s-random, we introduce a vector a ∈ N^|T| that shows the object sharing distribution; in particular, element a[k] shows the number of agents who receive object tk. Algorithm s-random allocates objects to agents in a round-robin fashion. After the initialization of the vector a and the sets Ri in lines 1 and 2 of Algorithm 1, the main loop in lines 4-9 is executed while there are still data objects (remaining > 0) to be allocated to agents. In each iteration of this loop (lines 5-9), the algorithm uses function SELECTOBJECT() to find a random object to allocate to agent Ui. This loop iterates over all agents who have not yet received the number of data objects they requested.

The running time of the algorithm is O(τ Σ_{i=1}^{n} mi) and depends on the running time τ of the object selection function SELECTOBJECT(). In the case of random selection, we can have τ = O(1) by keeping in memory a set {k' | tk' ∉ Ri} for each agent Ui. Algorithm s-random may yield a poor data allocation. Say, for example, that the distributor set T has three objects and there are three agents who request one object each. It is possible that s-random provides all three agents with the same object.
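Algorithms 1 and 2 can be rendered directly in Python (the implementation language the paper reports using); variable names follow the pseudocode, while the surrounding glue is our own sketch. Note that the list comprehension in `select_object` makes τ = O(|T|); achieving the O(1) variant mentioned above would require maintaining each agent's complement set incrementally.

```python
import random

def select_object(i, R_i, T_size, rng):
    # Algorithm 2: choose uniformly among objects agent U_i does not yet hold.
    return rng.choice([k for k in range(T_size) if k not in R_i])

def s_random(m, T_size, seed=None):
    """Algorithm 1: allocation for sample data requests (each m_i <= |T|)."""
    rng = random.Random(seed)
    a = [0] * T_size                 # a[k]: number of agents holding t_k
    R = [set() for _ in m]           # R_1, ..., R_n start empty
    remaining = sum(m)
    while remaining > 0:             # round-robin over unsatisfied agents
        for i in range(len(m)):
            if len(R[i]) < m[i]:
                k = select_object(i, R[i], T_size, rng)
                R[i].add(k)
                a[k] += 1
                remaining -= 1
    return R, a

# Three agents requesting one object each from |T| = 3: a valid but possibly
# poor allocation -- all three agents may end up with the same object.
R, a = s_random([1, 1, 1], 3, seed=0)
assert all(len(R_i) == 1 for R_i in R)
assert sum(a) == 3
```

The final example reproduces the worst case discussed above: nothing in s-random prevents every agent from drawing the identical object, which is exactly why the paper treats it only as a baseline.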
6. EXPERIMENTAL RESULTS

We implemented the presented allocation algorithms in Python and conducted experiments with simulated data leakage problems to evaluate their performance. We present the evaluation for explicit data requests and sample requests, respectively.

6.1 Explicit Requests

In the first place, the goal of these experiments was to see whether fake objects in the distributed data sets yield significant improvement in our chances of detecting a guilty agent. In the second place, we wanted to evaluate our optimal algorithm relative to a random allocation.

6.2 Sample Requests

With sample data requests, agents are not interested in particular objects. Hence, object sharing is not explicitly defined by their requests. The distributor is "forced" to allocate certain objects to multiple agents only if the number of requested objects Σ_{i=1}^{n} mi exceeds the number of objects in set T. The more data objects the agents request in total, the more recipients, on average, an object has; and the more objects are shared among different agents, the more difficult it is to detect a guilty agent. Consequently, the parameter that primarily defines the difficulty of a problem with sample data requests is the ratio Σ_{i=1}^{n} mi / |T|.

7. CONCLUSION

In a perfect world, there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty. However, in many cases we must indeed work with agents that may not be 100 percent trusted, and we may not be certain whether a leaked object came from an agent or from some other source, since certain data cannot admit watermarks.

Our future work includes the investigation of agent guilt models that capture leakage scenarios not studied in this paper. For example, what is the appropriate model for cases where agents can collude and identify fake tuples? A preliminary discussion of such a model is available in [14]. Another open problem is the extension of our allocation strategies so that they can handle agent requests in an online fashion (the presented strategies assume that there is a fixed set of agents with requests known in advance).

REFERENCES

[1] R. Agrawal and J. Kiernan, "Watermarking Relational Databases," Proc. 28th Int'l Conf. Very Large Data Bases (VLDB '02), VLDB Endowment, pp. 155-166, 2002.
[2] P. Bonatti, S.D.C. di Vimercati, and P. Samarati, "An Algebra for Composing Access Control Policies," ACM Trans. Information and System Security, vol. 5, no. 1, pp. 1-35, 2002.
[3] P. Buneman, S. Khanna, and W.C. Tan, "Why and Where: A Characterization of Data Provenance," Proc. Eighth Int'l Conf. Database Theory (ICDT '01), J.V. den Bussche and V. Vianu, eds., pp. 316-330, Jan. 2001.
[4] P. Buneman and W.-C. Tan, "Provenance in Databases," Proc. ACM SIGMOD, pp. 1171-1173, 2007.
[5] Y. Cui and J. Widom, "Lineage Tracing for General Data Warehouse Transformations," The VLDB J., vol. 12, pp. 41-58, 2003.
[6] S. Czerwinski, R. Fromm, and T. Hodes, "Digital Music Distribution and Audio Watermarking," http://www.scientificcommons.org/43025658, 2007.
[7] F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li, "An Improved Algorithm to Watermark Numeric Relational Data," Information Security Applications, pp. 138-149, Springer, 2006.
[8] F. Hartung and B. Girod, "Watermarking of Uncompressed and Compressed Video," Signal Processing, vol. 66, no. 3, pp. 283-301, 1998.
[9] S. Jajodia, P. Samarati, M.L. Sapino, and V.S. Subrahmanian, "Flexible Support for Multiple Access Control Policies," ACM Trans. Database Systems, vol. 26, no. 2, pp. 214-260, 2001.
[10] Y. Li, V. Swarup, and S. Jajodia, "Fingerprinting Relational Databases: Schemes and Specialties," IEEE Trans. Dependable and Secure Computing, vol. 2, no. 1, pp. 34-45, Jan.-Mar. 2005.
[11] B. Mungamuru and H. Garcia-Molina, "Privacy, Preservation and Performance: The 3 P's of Distributed Data Management," technical report, Stanford Univ., 2008.
[12] V.N. Murty, "Counting the Integer Solutions of a Linear Equation with Unit Coefficients," Math. Magazine, vol. 54, no. 2, pp. 79-81, 1981.
[13] S.U. Nabar, B. Marthi, K. Kenthapadi, N. Mishra, and R. Motwani, "Towards Robustness in Query Auditing," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB '06), VLDB Endowment, pp. 151-162, 2006.
[14] P. Papadimitriou and H. Garcia-Molina, "Data Leakage Detection," technical report, Stanford Univ., 2008.
[15] P.M. Pardalos and S.A. Vavasis, "Quadratic Programming with One Negative Eigenvalue Is NP-Hard," J. Global Optimization.
[16] J.J.K.O. Ruanaidh, W.J. Dowling, and F.M. Boland, "Watermarking Digital Images for Copyright Protection," IEE Proc. Vision Signal and Image Processing, vol. 143, no. 4, pp. 250-256, 1996.
[17] R. Sion, M. Atallah, and S. Prabhakar, "Rights Protection for Relational Data," Proc. ACM SIGMOD, pp. 98-109, 2003.
[18] L. Sweeney, "Achieving K-Anonymity Privacy Protection Using Generalization and Suppression," http://en.scientificcommons.org/43196131, 2002.

All Rights Reserved © 2012 IJARCET