Towards Privacy-aware OpenSocial Applications

Social-networking sites have grown tremendously in popularity in recent years. Services such as Facebook and MySpace allow millions of users to create online profiles and to share details of their personal lives with vast networks of friends, and often, strangers. Inevitably, the disclosure of personal information has implications on users’ privacy: digital stalking and identity theft are some of the most common threats. Unfortunately, even sophisticated users who value privacy will often compromise it to improve their *presence* in the virtual world. They know that loss of control over their personal information poses a long-term threat, but they cannot assess the overall and long-term risk accurately enough to compare it to the short-term gain. Even worse, setting the privacy preferences in online services is often a complicated and time-consuming task that users usually skip. To address these issues, we (IBM Research) are developing mechanisms and platforms to measure and monitor users’ privacy risks and help them easily manage their information sharing. In this talk, I am going to introduce our work in this area, and also discuss how the work can be incorporated with OpenSocial.

  • Good afternoon, folks. My name is … It is my great pleasure to visit Google and introduce to you our work on privacy and risk management on social networks. As you know, social-networking sites have grown tremendously in popularity in recent years. Services such as Facebook and MySpace allow millions of users to create online profiles and to share details of their personal lives with vast networks of friends, and often, strangers. Inevitably, the disclosure and sharing of personal information have caused many privacy concerns. Open a newspaper or a web browser and you are certain to encounter a spate of stories about the misuse or loss of data and how it puts personal information at risk. To address these issues, we are developing mechanisms and platforms to measure and monitor users’ privacy risks and help them easily manage their information sharing. In this talk, I am going to introduce our work in this area, and also discuss how the work can be incorporated with OpenSocial.
  • Before I start, I’d like to introduce the team. These are the great people I’ve been working with on this project. Ty – my manager, who oversees this project. K – this is me. Max – who leads the architecture and system part of the project. Evi – who brings deep expertise in algorithms and data mining. We also have three engineers from SVL who helped to convert many ideas from concepts to practice.
  • So how about something like a privacy score? It indicates the potential privacy risks of you as a social-networking user. It tells you what sensitive info you have shared, and who can view that info. It guides you towards a better privacy configuration that makes your online environment safer and more comfortable. As I am giving this talk, I hope that you think about this question: do you want to create a privacy score in OpenSocial that will have the same impact as the credit score?
  • Next I am going to discuss why you should want to do this, and how to do it if you really want to. First, I will elaborate on the motivation and goal. Then I will describe the theory behind privacy score computation and its applications. After that, I will discuss how this can be integrated with OpenSocial. Finally, as a proof of concept, Max and I will demonstrate a Facebook application we have developed – called PaMP. This application has adopted many of the privacy concepts that I will be covering today.
  • Although all major online social networks provide privacy-enhancing functionalities (explain …), the majority of users typically accept the default settings (which usually leave their information open to the public), and do not revisit their options until damage is done [12]. This is due either to poor user-interface design or to the common belief that sharing personal information online is more cool than harmful. Users are overwhelmed by these settings, or simply ignore them; the consequences of sharing the current level of information are either unknown, under-estimated or ignored. Meanwhile, the current set of popular social networks provides an ever-increasing set of privacy controls, the number of online social networks is also increasing, and many users belong to more than one social network. Likely conclusion: the explosion in privacy settings/controls in each network, the growth in the number of networks, the lack of awareness about the effects of these settings, and ever graver public privacy breaches will herald the end of the social network as we know it.
  • Unfortunately, even sophisticated users who value privacy will often compromise it to improve their digital *presence* in the virtual world. They know that loss of control over their personal information poses a long-term threat, but they cannot assess the overall and long-term risk accurately enough to compare it to the short-term gain. Even worse, setting the privacy controls in online services is often a complicated and time-consuming task that many users feel confused about and usually skip.
  • To address these issues, we are developing mechanisms and platforms to measure and monitor users’ privacy risks. Our goal is to boost public awareness of privacy and help users to easily manage their information sharing. It is very important to note that we are not trying to prevent people from sharing information online. We believe that simple and effective privacy and risk management techniques can make the online environment safer and more comfortable, which eventually facilitates the sharing, flow and integration of information. To achieve this goal, we developed the notion of a privacy score, which indicates the potential privacy risks of online social-networking users. With this score, there could be many applications. For example, similar to a credit report, we can provide an information-sharing report to help a user understand what sensitive info he has shared and who can view that info. The user can compare his score with other users in the network and see where he stands. In the case where the overall privacy risk of comparable users is lower than that of this user, the system may recommend a better privacy setting for him automatically. I am sure you can think of other applications, just as people do with the credit score.
  • This is the life cycle of a privacy score. The system takes as input the current privacy settings of the user’s profile items and calculates the privacy score. The score is delivered to the user as a privacy-o-meter. With this score, the user can monitor his privacy and take a more active role in safeguarding his information if necessary. The user can also compare his privacy risk score with the rest of the population to know where he stands. In the case where the overall privacy risks of a user’s social graph are lower than those of the user, the system can recommend stronger privacy settings to the user based on the information from his social neighbors. Privacy settings for a user profile control what subset of profile items is accessible by whom. For example, friends have full access, but strangers have restricted access to a user profile.
  • There could be many different ways to compute a privacy score. Here is ours. It exhibits several advantages which I will discuss later. From the technical point of view, our definition of privacy score satisfies the following properties: 1) the more sensitive information a user reveals, the higher his privacy risk; and 2) the more visible the disclosed information becomes in the network, the higher his privacy risk. Next I am going to show how to combine these two factors in the calculation of the privacy risk score.
  • To calculate a user’s privacy score, we first look at his profile items. These items include but are not limited to the user’s real name, email, hometown, mobile-phone number, relationship status, sexual orientation, IM screen name, etc. The contribution of a single profile item i to the overall privacy score of a user j is a function of the sensitivity of the item and the visibility it gets: PR(i, j) = β_i × V(i, j). Note that × can be an arbitrary combination function as long as PR(i, j) is monotonically increasing in β_i and V(i, j).
  • To compute the overall privacy score of user j, denoted by PR(j), we simply sum this user’s privacy scores due to all the different profile items: PR(j) = Σ_i PR(i, j).
  • How do we compute the sensitivity and visibility? Here I am going to present two different ways. For simplicity, let’s consider a dichotomous case, where for a user and a profile item, the user will either share it with the public or not at all. Now we can represent the information sharing of all users and all profile items using a single big table. Each row represents one profile item, and each column a user. If the cell located at the i-th row and j-th column is white, user j discloses item i; if grey, user j does NOT disclose item i.
  • Intuitively speaking, if a profile item is very sensitive, then very few people will share it. Thus, we simply compute the proportion of users that are reluctant to disclose item i, and use this value as the sensitivity. The sensitivity computed this way takes values between 0 and 1; the higher the value of β_i, the more sensitive item i is, and the more people are reluctant to disclose it.
  • The simplest way to compute the visibility of an item i belonging to a user j is to use the explicit setting from the i-th row and j-th column of the table – i.e., the corresponding cell value R(i, j), which is either 1 or 0, meaning share or not share. From a statistician’s point of view, what we have observed is just a sample from some underlying probability distribution, and this table is no exception. Therefore, we are more interested in the expected value of each cell, not just the observed value. In this case, the visibility of an item i belonging to user j becomes the probability that j will share i – that is, that the cell value is 1. Then, what is the probability that the value of a cell R(i, j) is equal to 1? Assuming independence between items and individuals, we can compute P_ij as the proportion of 1s in the i-th row times the proportion of 1s in the j-th column. In other words, if user j has a high tendency to disclose lots of his profile items, he is more likely to disclose item i too. Also, if many other users have shared this item, its sensitivity is low, and it is very likely that user j will also share it.
  • French psychologist Alfred Binet and physician Theodore Simon developed a method of identifying intellectually deficient children for their placement in special education programs. An item of interest was administered to a number of kids with ages ranging from 5 to 12 years. The free response of each kid to the item was scored as right or wrong. The proportion of correct responses at each age level was obtained and presented in tabular form. In his 1916 revision of the Binet-Simon scale, the Stanford psychologist Lewis Terman plotted the proportion of correct responses as a function of age and fitted a smooth line to these points by graphical means. This fitted line, shown in this figure, is called the ICC. From this example, it can be seen that the ICC is the functional relationship between the proportion of correct responses to an item and a criterion variable. This relationship is characterized by the location and shape of the ICC. The curve has the appearance of a cumulative distribution function. The ICC can be defined as a member of a family of two-parameter monotonic functions of the ability variable. Let us next formalize the ICC using appropriate statistical notation.
  • The Naïve approach adopts the very coarse assumption that users and profile items are independent. Now I am going to show you a more refined model based on Item Response Theory (IRT). The basic idea is to explicitly model the interactions between users and profile items using a logistic function. As a generative model, this approach fits the observed data very well. In this model, the parameter β_i quantifies the sensitivity of profile item i: the higher the sensitivity, the lower the probability that a user will share the item. The parameter θ_j represents the attitude of user j – how concerned he is about his privacy; a low value indicates a conservative user, while a high value indicates an extroverted/careless user. The higher the attitude value, the more likely the user is to disclose the item. IRT has its origins in psychometrics, where it is used to analyze data from questionnaires and tests. The goal there is to measure the abilities of the examinees, the difficulty of the questions, and the probability of an examinee correctly answering a given question.
  • The problem is that we do not know these parameters. We only know the observations, and the objective is to use the observed data to estimate these parameters. IRT defines for each user and each profile item, how likely this user is to disclose/share this item. From this perspective, IRT can be viewed as a generative model. We can compute the log likelihood of the data using this generative model and then use MLE or EM to estimate the parameters.
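As a minimal sketch, the two-parameter logistic model and the log-likelihood that MLE/EM would maximize can be written as follows; the function names are mine, and the actual optimizer is omitted:

```python
import numpy as np

def p_disclose(theta_j, alpha_i, beta_i):
    """2PL IRT: probability that user j (attitude theta_j) discloses
    item i (sensitivity beta_i, discrimination alpha_i)."""
    return 1.0 / (1.0 + np.exp(-alpha_i * (theta_j - beta_i)))

def log_likelihood(R, theta, alpha, beta):
    """Log-likelihood of the observed response matrix R (items x users);
    this is the quantity MLE/EM would maximize over theta, alpha, beta."""
    P = p_disclose(theta[None, :], alpha[:, None], beta[:, None])
    return np.sum(R * np.log(P) + (1 - R) * np.log(1 - P))
```

Parameter values that make the observed disclosures probable yield a higher log-likelihood, which is exactly how the estimation procedure picks them.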
  • The IRT model may seem arbitrary – it seems that any model could be used. Why IRT? The advantages of the IRT framework can be summarized as follows: 1) IRT defines, for each user and each profile item, how likely this user is to disclose/share this item. From this perspective, IRT can be viewed as a generative model. Our experiments show that this generative model fits the real-world data very well in terms of the χ2 goodness-of-fit test; that is, real data does follow the distributions defined by IRT. 2) The quantities IRT computes, i.e., sensitivity, attitude and visibility, have an intuitive interpretation. For example, the attitude can serve as a psychometric instrument that sociologists can use to study the online behaviors of people. The sensitivity of information can be used to send early alerts to users when the sensitivities of their shared profile items are outside the comfortable region. 3) Due to some mild assumptions, many of the computations in MLE can be parallelized, which makes the algorithm practically efficient. Most importantly, the estimates obtained from the IRT framework are sample independent. This property is also called group invariance.
  • To evaluate the model, we conducted experiments on real data gathered from our user study. We collected the information-sharing preferences of 153 users on 49 profile items such as name, gender, birthday, political views, address, phone number, degree, job, etc. For each profile item, we asked the user to specify whether he wants to keep the item confidential, or share it with friends, or friends of friends, etc. In the figure we visualize, using a tag cloud, the sensitivity of the profile items as computed from the survey. The larger the font used to represent a profile item in the tag cloud, the higher its estimated sensitivity value. It is easily observed that Mother’s Maiden Name is the most sensitive item, while one’s Gender, which sits just above the letter “h” of “Mother”, has the lowest sensitivity; too small to be visually identified. We computed the privacy scores of the 153 respondents using the IRT-based computations. We then grouped the respondents based on their geographic location. The figure shows the average values of the users’ privacy scores per location. The results indicate that people from North America and Europe have higher privacy risk than people from Asia and Australia. This experimental finding indicates that people from North America and Europe are more comfortable revealing personal information on the social networks they participate in. This can be a result of either inherent attitude or social pressure. Since online social networking is more widespread in these regions, one can assume that people in North America and Europe succumb to the social pressure to reveal things about themselves online in order to appear “cool” and become popular. Restate that under social pressure people tend to share more and more information; at some point they start to worry about their privacy, but by then it is too late: they have already lost control of their information.
  • So for instance, if I select to change my privacy to a score of 25 from whatever score, we would analyze the population of users and select a partition of users with scores around 25, say from 20 to 30 (various heuristics could be used for making 'around' more precise), and then from these users determine the average settings for each data item and apply them, thus getting the score close to 25. Taking the arithmetic average is one approach; others could be used given more information about the current user, the population, and more importantly about the sensitivity of the data the settings will affect. An exact score of 25 could then be achieved by iteratively adjusting some settings and recalculating the score until the actual score reaches 25 +/- some delta. Very likely the average settings and partitioning of the user population would need to be done in batch mode (e.g., MapReduce calculations) and the results saved in the database for quick access and score computation.
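The heuristic above can be sketched as follows; all names are hypothetical, and a real system would precompute the score partitions in batch rather than scanning users on demand:

```python
def recommend_settings(users, target, window=5.0):
    """Hypothetical sketch: average the per-item settings of users whose
    privacy score falls within `window` of the target score.
    `users` maps user_id -> (privacy_score, {item: 0/1 setting})."""
    peers = [settings for score, settings in users.values()
             if abs(score - target) <= window]
    if not peers:
        return {}
    # Majority vote (rounded arithmetic mean) per profile item.
    return {item: round(sum(p[item] for p in peers) / len(peers))
            for item in peers[0]}
```

An exact target score could then be approached by applying the recommended settings, recalculating, and iterating until the score lands within the desired delta.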
  • Having described the models and applications of the privacy score, I am now going to discuss how we can work together to create a privacy-aware OpenSocial environment.
  • As a first step, the simplest approach is to provide a native implementation of privacy score calculations in OpenSocial, as well as APIs that enable developers to implement their own privacy scores.
  • To accomplish the first step, we need the following API support …
  • OAuth allows the user to authorize access to his or her privacy settings stored in social networks. The basic flow is: 1) A user logs in to a website/application and performs some action that requires the privacy settings of the user’s profile items. 2) The website/application directs the user to a web page hosted on the social network’s domain. This web page asks the user if the external website/application should be able to access his or her privacy settings. 3) If the user agrees, the website/application will receive an OAuth authorization token. 4) The website/application can then include this token in requests made with the OpenSocial REST and RPC protocols.
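The authorization flow can be illustrated with the social network mocked out; real endpoints, scopes and token formats are defined by the OAuth specification and the hosting network, not by this toy sketch:

```python
import secrets

class MockSocialNetwork:
    """Stand-in for the social network's OAuth endpoints (illustrative only)."""

    def __init__(self):
        self._granted = {}

    def authorize(self, user, app):
        # Steps 1-3: the app sends the user to the network's domain,
        # the user approves, and the network issues an access token.
        token = secrets.token_hex(8)
        self._granted[token] = (user, app)
        return token

    def get_privacy_settings(self, token):
        # Step 4: the app includes the token in its REST/RPC request;
        # requests without a granted token are rejected.
        if token not in self._granted:
            raise PermissionError("missing or invalid OAuth token")
        user, _app = self._granted[token]
        return {"owner": user, "phone": 0, "hometown": 1}  # toy settings

network = MockSocialNetwork()
token = network.authorize("alice", "privacy-meter")
settings = network.get_privacy_settings(token)
```

The point of the mock is the shape of the handshake: the application never sees the user's credentials, only a token it can later present.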

Presentation Transcript

  • Towards Privacy-aware OpenSocial Applications IBM Research May 19, 2009 © 2009 IBM Corporation
  • The Team – IBM Almaden Research Center: Tyrone Grandison, Kun Liu, Michael (Max) Maximilien, Evimaria Terzi. IBM Silicon Valley Lab: Sherry Guo, Dwayne Richardson, Tony Sun. IBM Almaden Research Center -- http://www.almaden.ibm.com/cs/projects/iis/ppn/ © 2009 IBM Corporation
  • Wait a minute, how come the talk is related to Credit Score?
  • From humble beginnings in 1956, Fair Isaac Corp.'s credit score has come to loom over consumer finance like no other statistical measure ever has. The ubiquitous three-digit FICO score now helps determine everything from the interest rates people pay on their credit cards to their attractiveness as job candidates.
  • So, how about a Privacy Score that indicates the privacy risks of online social-networking users?
  • Do you want to design and integrate the privacy score and other advanced privacy and risk management modules in OpenSocial so that 50 years (or much less) later, people will appreciate your effort?
  • Roadmap: Motivation and Goal; Privacy Score and Its Applications; Privacy Score and OpenSocial; Proof of Concept: The Privacy-Aware Market Place; Conclusions
  • Motivation: Millions of users share details of their personal lives with vast networks of friends, and often, strangers. Disclosure of personal info exposes the users to identity theft, digital stalking, etc. [Images courtesy of http://www.contrib.andrew.cmu.edu/%7Egct/mygroup.html and http://getyourfirstmortgage.com/wp-content/uploads/2008/08/identity-theft-protect-yourself-300x225.jpg]
  • Motivation (Cont.) [Slide shows users asking:] “My God! What information have I shared all these years, and who can view this information?” “How to prevent my ex from seeing my status updates?” “All my friends have shared their hometown and phone number, maybe I should also do this?” “How to hide my friend list in the search results?” “I enjoyed sharing my daily activities with the World! But any adverse effects?” “How to prevent the applications my friends installed from accessing my information?”
  • Goal: Our goal is to develop a mechanism and a platform that will measure and monitor the privacy risks of online social-networking users, boost public awareness of privacy, and help users to easily manage their information sharing. Our goal is NOT to prevent people from sharing information. How to achieve this goal? Privacy score calculation; comprehensive information sharing report; privacy settings recommendation; and more …
  • Roadmap: Motivation and Goal; Privacy Score and Its Applications; Privacy Score and OpenSocial; Proof of Concept: The Privacy-Aware Market Place; Conclusions
  • Privacy Score Overview: Privacy Score measures the potential privacy risks of online social-networking users. [Diagram: Privacy Settings → Privacy Score Calculation → privacy score of the user → Utilize Privacy Scores: Privacy Risk Monitoring, Comprehensive Privacy Report, Privacy Settings Recommendation]
  • How is Privacy Score Calculated? – Basic Premises. Sensitivity: the more sensitive the information revealed by a user, the higher his privacy risk (mother’s maiden name is more sensitive than mobile-phone number). Visibility: the wider the information about a user spreads, the higher his privacy risk (home address known by everyone poses higher risks than known by friends only).
  • Privacy Score Calculation. Profile items: name, gender, birthday, address, phone number, degree, job, etc. Privacy score of user j due to profile item i: PR(i, j) = β_i × V(i, j), where β_i is the sensitivity of profile item i and V(i, j) is the visibility of profile item i.
  • Privacy Score Calculation. Overall privacy score of user j: PR(j) = Σ_i PR(i, j) = Σ_i β_i × V(i, j).
  • The Naïve Approach. [Table: rows are profile items (Profile Item_1 = birthday, …, Profile Item_i = cell phone #, …, Profile Item_n); columns are users User_1 … User_N; R(i, j) = 1 if user j shares item i, R(i, j) = 0 if not.]
  • The Naïve Approach. Let |R_i| = Σ_j R(i, j) be the number of users who share item i. Sensitivity: β_i = (N − |R_i|) / N.
  • The Naïve Approach. Let |R_j| = Σ_i R(i, j) be the number of items user j shares. Visibility: V(i, j) = Pr{R(i, j) = 1}, with P_ij = Pr{R(i, j) = 1} = (|R_i| / N) × (|R_j| / n) = (1 − β_i) × |R_j| / n.
  • The Naïve Approach. Privacy score: PR(j) = Σ_i β_i × V(i, j).
  • Item Response Theory (IRT). IRT (Lawley, 1943 and Lord, 1952) has its origin in psychometrics. It is used to analyze data from questionnaires and tests. It is the foundation of Computerized Adaptive Tests like the GRE and GMAT. [Figure: item characteristic curves plotted against Ability]
  • The Item Response Theory (IRT) Approach. For the same response matrix R(i, j): P_ij = Pr{R(i, j) = 1} = 1 / (1 + e^(−α_i (θ_j − β_i))), where α_i is the profile item’s discrimination, β_i the profile item’s sensitivity, θ_j the user’s attitude (e.g., conservative or extrovert), and P_ij the profile item’s visibility.
  • Calculating Privacy Score using IRT. Overall privacy score of user j: PR(j) = Σ_i β_i × V(i, j). Sensitivity: β_i. Visibility: V(i, j) = Pr{R(i, j) = 1} = 1 / (1 + e^(−α_i (θ_j − β_i))). Byproducts: the profile item’s discrimination and the user’s attitude. All the parameters can be estimated using Maximum Likelihood Estimation and EM.
  • Advantages of the IRT Model: the mathematical model fits the observed data well; the quantities IRT computes (i.e., sensitivity, attitude and visibility) have intuitive interpretations; computation is parallelizable using e.g. MapReduce; privacy scores calculated within different social networks are comparable.
  • Interesting Results from User Study. [Figures: Sensitivity of the Profile Items Computed by the IRT Model; Average Privacy Scores Grouped by Geographical Regions] Survey: we collected the information-sharing preferences of 153 users on 49 profile items such as name, gender, birthday, political views, address, phone number, degree, job, etc. Statistics: 49 profile items; 153 users from 18 countries/regions; 53.3% are male and 46.7% are female; 75.4% are in the age of 23 to 39; 91.6% hold a college degree or higher; 76.0% spend 4+ hours online per day.
  • Utilize Privacy Scores: Privacy Risk Monitoring; Privacy (Information Sharing) Report; Privacy Settings Recommendation. [Mock-ups show scores in the range 100 ~ 150]
  • Roadmap: Motivation and Goal; Privacy Score and Its Applications; Privacy Score and OpenSocial; Proof of Concept: The Privacy-Aware Market Place; Conclusions
Privacy Score and OpenSocial
 Provide a native implementation of Privacy Score calculations.
 Enable application developers to implement their own Privacy Scores.
 Enable application developers to build Information Sharing Report modules and Privacy Settings Recommendation modules.
Suggested APIs for OpenSocial

double getPrivacyScore(userID, key)
• Description: Gets the user's privacy score, as calculated by the native implementation. The key specifies the mathematical model, i.e., Naïve or IRT.

double getPrivacyScore(userID)
• Description: Gets the user's privacy score

int getPrivacySetting(userID, profileItemID)
• Description: Gets the user's privacy setting for the profile item

boolean setPrivacySetting(userID, profileItemID, privacySetting)
• Description: Sets the user's privacy setting for the profile item

profileItemID can be chosen from opensocial.Person.Field, opensocial.Address.Field, opensocial.Email.Field, opensocial.Name.Field, opensocial.Organization.Field, opensocial.Phone.Field, etc.
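Since these APIs are only proposed, not part of any shipping OpenSocial container, a minimal in-memory stand-in can illustrate the intended behavior. The settings store, the most-restrictive default, and the naive score below are all assumptions made for illustration; a real container would persist settings server-side.

```javascript
// Illustrative encoding: 0 = private, 1 = friends only, 2 = public.
const settingsStore = {}; // userID -> { profileItemID -> setting }

function getPrivacySetting(userID, profileItemID) {
  const userSettings = settingsStore[userID] || {};
  // Default to the most restrictive setting when nothing is stored.
  return profileItemID in userSettings ? userSettings[profileItemID] : 0;
}

function setPrivacySetting(userID, profileItemID, privacySetting) {
  if (!settingsStore[userID]) settingsStore[userID] = {};
  settingsStore[userID][profileItemID] = privacySetting;
  return true; // boolean result, per the proposed signature
}

// A "Naive" getPrivacyScore variant: sum the stored settings,
// so more openly shared items contribute more to the score.
function getPrivacyScore(userID) {
  const userSettings = settingsStore[userID] || {};
  return Object.values(userSettings).reduce((sum, v) => sum + v, 0);
}
```

An application would call these through the container, e.g. `setPrivacySetting(id, opensocial.Phone.Field.NUMBER, 0)` to lock down a phone number and then re-read `getPrivacyScore(id)` to show the user the effect.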
Privacy Settings Themselves Are Private/Sensitive

OAuth allows the user to authorize access to his or her privacy settings stored in social networks.

Flow between USER, APPLICATION, and SOCIAL NETWORK:
1. The application requests access to the user's protected information.
2. The application directs the user to the social network for authorization.
3. The social network prompts the user to provide authorization.
4. The user authorizes access to the private data.
5. The social network grants an access token and directs the user back to the application.
6. The application uses the token to access the protected information.
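The six steps above can be condensed into a sketch. The in-memory `network` object and its token strings are stand-ins invented for illustration; a real flow involves signed HTTP requests to the provider's OAuth endpoints and browser redirects for the user-facing steps.

```javascript
// Stand-in for the social network's side of the OAuth flow.
const network = {
  issueRequestToken: () => "request-token",            // steps 1-2
  // Steps 3-5: the user is prompted and either grants or denies access.
  authorize: (requestToken, userConsents) =>
    userConsents ? { accessToken: "access-token" } : null,
  // Step 6: protected data is served only for a valid access token.
  getPrivacySettings: (accessToken) =>
    accessToken === "access-token" ? { birthday: "friends-only" } : null,
};

// The application's side: request a token, send the user off to
// authorize, then exchange the grant for the protected settings.
function fetchProtectedSettings(userConsents) {
  const requestToken = network.issueRequestToken();
  const grant = network.authorize(requestToken, userConsents);
  if (!grant) return null; // the user declined authorization
  return network.getPrivacySettings(grant.accessToken);
}
```

The key property the slide is after: the application never sees the user's credentials, and access to the privacy settings can be revoked by invalidating the token.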
Roadmap
 Motivation and Goal
 Privacy Score and Its Applications
 Privacy Score and OpenSocial
 Proof of Concept: The Privacy-Aware Marketplace
 Conclusions
Privacy-aware Marketplace (PaMP)
 PaMP allows one to create posts related to items for sale, housing, and jobs.
 PaMP adopts the privacy models previously discussed.
 PaMP empowers users to control all aspects of their data and enables privacy settings to be configured with minimal effort.
 PaMP is a significant differentiator from other online marketplace offerings.
Privacy-aware Marketplace (PaMP)
http://apps.facebook.com/p_a_m_p
Conclusions

In this talk, we have discussed:
 the importance of privacy scores and privacy management
 two ways to compute a privacy score from privacy settings
 how to integrate the scores with OpenSocial

Our goal is to develop a mechanism and a platform that will:
 measure and monitor the privacy risks of online social-networking users
 boost public awareness of privacy risks
 help users easily manage information sharing

We believe that simple and effective privacy management makes users feel safe and comfortable about sharing information online, which will ultimately facilitate information sharing and integration.
Next Steps

The initial question to you:
 Do you want to design and integrate the privacy score and other advanced privacy and risk management modules into OpenSocial, so that 50 years (or much less) from now, people will appreciate your effort?

If your answer is yes:
 Let's collaborate on a privacy management roadmap for OpenSocial.
Thank you. Questions?