Incentive Compatible Privacy Preserving Data Analysis



Nowadays, data management applications have evolved from pure storage and retrieval of information to finding interesting patterns and associations in large amounts of data. With the advancement of Internet and networking technologies, more and more computing applications, including data mining programs, must be run across multiple data sources scattered over different sites, which jointly carry out the computation to reach a common result. However, because of legal constraints and competitive concerns, privacy issues arise in distributed data mining, which has drawn interest from the data mining research community.

In this project, each party participates in a protocol to learn the output of some function f over the joint inputs of all parties. We focus mainly on the Deterministic Non-Cooperative Computation (DNCC) model rather than a probabilistic extension. The DNCC model still needs to be extended to account for possible collusion.
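
To make this setting concrete, here is a minimal sketch, not taken from the slides, of parties submitting inputs to a trusted third party (TTP) that evaluates f. It uses XOR over single bits as a hypothetical f, chosen because it shows the failure case the NCC model is meant to rule out: a party can misreport its bit and still recover the correct result alone, while the honest parties receive a wrong one. The names `xor_all`, `ttp_evaluate`, and `run_round` are illustrative assumptions.

```python
from functools import reduce
from typing import Callable, Dict, List


def xor_all(bits: List[int]) -> int:
    """Hypothetical joint function f: XOR (parity) of all submitted bits."""
    return reduce(lambda a, b: a ^ b, bits, 0)


def ttp_evaluate(f: Callable[[List[int]], int], reported: List[int]) -> int:
    """Stand-in for the trusted third party: it sees only the reported inputs."""
    return f(reported)


def run_round(true_inputs: List[int], liar: int) -> Dict[str, int]:
    """Party `liar` flips its bit before submission, then undoes the flip locally."""
    reported = list(true_inputs)
    reported[liar] ^= 1                       # input modification (the liar's t_i)
    published = ttp_evaluate(xor_all, reported)
    recovered = published ^ 1                 # output correction (the liar's g_i)
    return {
        "correct": xor_all(true_inputs),      # result on the true inputs
        "published": published,               # what the honest parties learn
        "recovered_by_liar": recovered,       # what the liar computes privately
    }


result = run_round([1, 0, 1], liar=1)
print(result)
```

In this toy run the liar ends up with the correct value while everyone else does not, so XOR offers no incentive to report truthfully. Functions for which no such lie-and-correct strategy exists are the ones of interest under the DNCC model.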

Published in: Career, Technology

  • The use case diagram contains the actors (the parties, the NCC model, and the data mining model); the remaining elements are use cases such as the private inputs and the function over the joint inputs. A party sends its inputs to the NCC actor, which assigns the work to the TTP. The TTP computes over the inputs of all parties and sends the function result back to every party. When a party later sends a request related to its input, the model hands that work to the data mining component.
  • The class diagram contains classes for the parties, the NCC model, data mining, and the competitive model. The data mining class stores each party's information as either a vertical or a horizontal partition. After choosing a partition and storing their data, all parties send their input data to the NCC class, which assigns the work to the TTP. The TTP computes over all the inputs and sends the result back to the parties.
  • The sequence diagram depicts the objects and classes involved in the scenario and the sequence of messages exchanged between them to carry out its functionality. After both the party and the data analysis node log in, data mining first identifies the party nodes and the data analysis nodes. The work is then split and allocated, and the inputs are sent to the TTP nodes. The TTP performs the computation, and the finished work is reported back to data mining.
  • Activity diagrams are graphical representations of workflows of stepwise activities and actions, with support for choice, iteration, and concurrency. An activity diagram shows the overall flow of control. This diagram shows the activity processed in the party node. The first dot marks the starting point, where the party node starts. Control then moves to validation: if the login is not valid, it returns to the login page and stops; if it is valid, the data analysis activity continues, followed by the data mining activity, which identifies the requested inputs and their distribution.
  • Incentive Compatible Privacy Preserving Data Analysis

    1. Incentive Compatible Privacy-Preserving Data Analysis M.V.Rupa Sri 310204120033
    2. ABSTRACT • In many cases, competing parties who have private data may collaboratively conduct privacy-preserving distributed data analysis (PPDA) tasks to learn beneficial data models or analysis results. The field of privacy has seen rapid advances in recent years because of increases in the ability to store data. In particular, recent advances in the data mining field have led to increased concerns about privacy. • It is often highly valuable for organizations to have their data analyzed by external agents. However, any program that computes on potentially sensitive data risks leaking information through its output. Differential privacy provides a theoretical framework for processing data while protecting the privacy of individual records in a dataset.
    3. EXISTING SYSTEM • SECURE MULTIPARTY COMPUTATION • Definition: In the existing system, we generally assume that participating parties provide truthful inputs. This assumption is usually justified by the fact that learning the correct data analysis models or results is in the best interest of all participating parties. If a party does not want to learn the data models and analysis results, it should not participate in the protocol.
    4. PROPOSED SYSTEM • The term incentive compatible means that participating parties have the incentive or motivation to provide their actual inputs when they compute a functionality. Although SMC-based privacy-preserving data analysis protocols (under the malicious adversary model) can prevent participating parties from modifying their inputs once the protocols are initiated, they cannot prevent the parties from modifying their inputs before execution. On the other hand, parties are expected to provide their true inputs in order to correctly evaluate a function that satisfies the NCC model. Therefore, any functionality that satisfies the NCC model is inherently incentive compatible, under the assumption that participating parties prefer to learn the function result correctly and, if possible, exclusively. The question now is which functionalities or data analysis tasks satisfy the NCC model.
    5. ADVANTAGES IN PROPOSED SYSTEM • Each of these deals with the problem of ensuring truthfulness in data mining. However, each one requires the ability to verify the data after the calculation. • Although verification-based techniques are very useful, there are cases where verification is not feasible because of legal, social, and privacy concerns.
    6. MODULES • User Interface Design • Create Multiple Organizations • Data Analysis and Integration • Inputs Computation Model • Association Data Mining
    7. Module Description • USER INTERFACE DESIGN: • In this module we create a user page with a graphical user interface (GUI), which connects the user to the server: through it the client sends requests to the server and the server sends responses back, so the web page establishes communication between client and server. • A GUI is a program interface that takes advantage of the computer's graphics capabilities to make the program easier to use. Well-designed graphical user interfaces free the user from learning complex command languages. On the other hand, many users find that they work more effectively with a command-driven interface, especially if they already know the command language. The goal of a GUI is to enhance the efficiency and ease of use of the underlying logical design of a stored program. The user interacts with information by manipulating visual widgets that allow interactions appropriate to the kind of data they hold. The widgets of a well-designed interface are selected to support the actions necessary to achieve the user's goals.
    8. Module Description (contd.) • CREATE MULTIPLE ORGANIZATIONS: This is the second module of our project. Here we create a number of parties. Each party has its own database in which to store its information. All n parties send their inputs to a single Data Analysis module, which stores them as either horizontal or vertical partitions.
    9. Module Description (contd.) • DATA ANALYSIS AND INTEGRATION: This is the third module of our project. Our Data Analysis module is designed using cryptographic techniques. Data are generally assumed to be either vertically or horizontally partitioned. In the case of horizontally partitioned data, different sites collect the same set of information about different entities. In the case of vertically partitioned data, different sites collect different information about the same set of entities. A party can store its input data as either a vertical or a horizontal partition. If parties choose horizontal partitioning, each party's input data covers different individuals. If they choose vertical partitioning, each party's input data covers different attributes of the same individuals.
    10. Module Description (contd.) • INPUTS COMPUTATION MODEL • This is the fourth module of our project. This model is designed to compute over the truthful inputs of all participating parties. It rests on two assumptions: first, the top priority of every participating party is to learn the correct result; second, if possible, every participating party prefers to learn the correct result exclusively.
    11. Module Description (contd.) • ASSOCIATION DATA MINING • This is the last module of our project. Here we summarize association rule mining and analyze whether it can be done in an incentive-compatible manner over a horizontally or vertically partitioned database. When a query is received, the module determines whether the requested data is in a horizontal or a vertical partition, retrieves the result from that partition, and sends it to the requesting party.
    12. TECHNIQUE USED • ASSOCIATION RULE MINING ALGORITHM • The above definition simply states which functions can be computed deterministically in the NCC setting (i.e., the computation result is correct with probability one): no party can compute the correct result once it lies about its inputs in a way that changes the original function result. In other words, if party $i$ replaces its true input $v_i$ with $v'_i$ and $f(v'_i, v_{-i}) \neq f(v_i, v_{-i})$, then party $i$ should not be able to calculate the correct $f(v_i, v_{-i})$ from $f(v'_i, v_{-i})$ and $v_i$. Note that a strategy $(t_i, g_i)$ consists of the way the input is modified, denoted by $t_i$, and the way the output is calculated, denoted by $g_i$. Here $t_i$ can be considered as choosing a value different from the actual input, and $g_i$ as the way the correct $\mu$ and $s^2$ are computed. Another implication of the definition is that for any $t_i$, the corresponding $g_i$ should be deterministic, because each party wants to compute the “correct” result exactly.
    13. • A two-party protocol is proposed to securely compute JC. The protocol consists of two stages.
    14. SYSTEM ARCHITECTURE [Diagram labels: User login, DB, Validate, NCC Model, Parties, TTP, Data analysis, Vertical partition, Horizontal partition, Rule mining]
    15. System Architecture Description • The diagram contains client Login, Database, Work Allocation, Worker Page, Computing, Reporting, and Work Grouping. The computation node starts running first. A party node then enters a user name and password, which the compatible node validates. Next, the computation node assigns the work to the data mining nodes. Each data mining node finishes its work and reports it to the compatible node. The TTP collects the parties' inputs and groups them by the particular work item presented by the party nodes.
    16. USE CASE DIAGRAM [Diagram labels: party1, party2, party3, private inputs, TTP, input, compute the input data, function over joint inputs, NCC model, vertical partition, horizontal partition, Data mining]
    17. CLASS DIAGRAM
    18. SEQUENCE DIAGRAM [Diagram labels: parties, data analysis, NCC Model, Rule mining; messages: store data either vertically or horizontally, sending the inputs, all the inputs are computed, different inputs of parties stored, sending requested data to NCC, response]
    19. ACTIVITY DIAGRAM [Diagram labels: parties, Data Mining, vertical partition, horizontal partition, NCC Model]
    20. LOGIN FORM
    21. Organization Login
    23. Participating parties:
    24. Data Sharing:
    25. Conclusion • Even though privacy-preserving data analysis techniques guarantee that nothing other than the final result is disclosed, whether participating parties provide truthful input data cannot be verified. In this paper, we have investigated what kinds of PPDA tasks are incentive compatible under the NCC model. Based on our findings, there are several important PPDA tasks that are incentive driven. Table II classifies the common data analysis tasks studied in this paper into DNCC or Non-DNCC categories. Most often, the data partition scheme can make a difference in determining the DNCC or Non-DNCC classification.
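
The horizontal and vertical partitioning described in the Data Analysis and Integration module, and the support and confidence counts that underlie association rule mining, can be sketched as follows. This is a toy illustration in plain Python with no cryptographic protection. The transaction data, the rule {bread} → {milk}, and all helper names are assumptions made for the example, not part of the project.

```python
from typing import Dict, List, Set, Tuple

Site = Dict[int, Set[str]]  # transaction id -> items held at one site

# Hypothetical toy data set.
TRANSACTIONS: Site = {
    1: {"bread", "milk"},
    2: {"bread", "butter"},
    3: {"bread", "milk", "butter"},
    4: {"milk"},
}


def horizontal_split(data: Site, ids_a: Set[int]) -> Tuple[Site, Site]:
    """Each site holds all item columns, but for different transactions."""
    site_a = {t: items for t, items in data.items() if t in ids_a}
    site_b = {t: items for t, items in data.items() if t not in ids_a}
    return site_a, site_b


def vertical_split(data: Site, items_a: Set[str]) -> Tuple[Site, Site]:
    """Each site holds every transaction, but only some item columns."""
    site_a = {t: items & items_a for t, items in data.items()}
    site_b = {t: items - items_a for t, items in data.items()}
    return site_a, site_b


def local_count(site: Site, itemset: Set[str]) -> int:
    """Number of local transactions that contain every item in `itemset`."""
    return sum(1 for items in site.values() if itemset <= items)


def rule_stats(sites: List[Site], lhs: Set[str], rhs: Set[str]) -> Tuple[float, float]:
    """Global support and confidence of lhs -> rhs over horizontal partitions."""
    total = sum(len(site) for site in sites)
    both = sum(local_count(site, lhs | rhs) for site in sites)
    lhs_count = sum(local_count(site, lhs) for site in sites)
    support = both / total if total else 0.0
    confidence = both / lhs_count if lhs_count else 0.0
    return support, confidence


def vertical_count(site_a: Site, site_b: Site, itemset: Set[str]) -> int:
    """Count over vertical partitions by rejoining rows on the transaction id."""
    return sum(1 for t in site_a if itemset <= (site_a[t] | site_b[t]))


h_a, h_b = horizontal_split(TRANSACTIONS, {1, 2})
support, confidence = rule_stats([h_a, h_b], {"bread"}, {"milk"})
v_a, v_b = vertical_split(TRANSACTIONS, {"bread"})
joint = vertical_count(v_a, v_b, {"bread", "milk"})
print(support, confidence, joint)
```

With horizontal partitions each site can count locally and only the counts need to be combined, giving a support of 0.5 and a confidence of 2/3 for this toy rule. With vertical partitions the count depends on matching rows across sites, which is the step a privacy-preserving protocol would have to compute securely instead of rejoining the data in the clear as done here.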