site like Foursquare, Google Latitude or Facebook Places. In this category...

CHAPTER 4
SYSTEM ANALYSIS

4.1 EXISTING SYSTEM
Privacy issues are catego...

Secondly, a piece of media in question must contain harmful content for th...

there are also certainly users who do not wish their SN to know when and w...

CHAPTER 5
THREAT ANALYSIS

5.1 Awareness of Damaging Media in Big Dataset...

5.2 Analysis of Service Privacy
The different types of watchdog services...

TOR can be used to ensure none of the participating nodes gets enough inf...

Instagram and PicPlz are services/mobile apps that allow posting images in...

CHAPTER 6
SURVEY OF METADATA IN SOCIAL MEDIA

To underpin the growing pre...

Fig. 4. Public privacy-related metadata in 5.7k random and 1k mobile user ...

This is unsurprising since most compact and DSLR cameras currently do n...

A 5k dataset of random photos from Locr was collected, and analyzed the m...

CHAPTER 7
CONCLUSION AND FUTURE WORK

An analysis of the threat to an ind...

These two apps show the willingness of users to share this kind of informat...

Squicciarini et al.'s prototype is implemented as a Facebook app and is base...

REFERENCES

[1] S. Ahern, D. Eckles, N. Good, S. King, M. Naaman, and R. N...
[13] B. Gedik and L. Liu. Protecting Location Privacy with Personalized k-...
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Belagavi, Karnataka

A Seminar Report on
“BIG DATA PRIVACY ISSUES IN PUBLIC SOCIAL MEDIA”

Submitted in partial fulfillment of the requirements for the award of the Degree of
Master of Technology in Computer Science and Engineering

Submitted by
SUPRIYA R (1VI14SCS11)

Guided by
Mrs. MARY VIDYA JOHN
Assistant Professor, Dept. of CSE
Vemana IT, Bengaluru – 34

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
VEMANA INSTITUTE OF TECHNOLOGY
Koramangala, Bengaluru – 560 034
2014-2015
Karnataka Reddyjana Sangha®
VEMANA INSTITUTE OF TECHNOLOGY
Koramangala, Bengaluru – 34
(Affiliated to Visvesvaraya Technological University, Belagavi)
Department of Computer Science & Engineering

Certificate

This is to certify that SUPRIYA R, bearing U.S.N 1VI14SCS11, a student of I semester, Computer Science & Engineering, has completed the seminar entitled “BIG DATA PRIVACY ISSUES IN PUBLIC SOCIAL MEDIA” in partial fulfillment of the requirements for the award of the degree of Master of Technology in Computer Science and Engineering of the Visvesvaraya Technological University, Belagavi, during the academic year 2014-2015. The seminar report has been approved as it satisfies the academic requirements in respect of seminar work prescribed for the said degree.

________________                    ______________
Signature of the Guide              Signature of HOD
MRS. MARY VIDYA JOHN                DR. NARASIMHA MURTHY K.N

Name of the Examiner                Signature with date
1. ____________________             ________________
2. ____________________             ________________
ACKNOWLEDGEMENT

I am very happy to gratefully acknowledge the many people who lent their help to make my seminar “Big Data Privacy Issues in Public Social Media” a successful one. Salutations to my beloved and esteemed institute, Vemana Institute of Technology, for its well-qualified staff and labs furnished with the necessary equipment. I express my gratitude to DR. P. GIRIDHARA REDDY, Principal, Vemana IT, for providing the best facilities; without his constructive criticism and suggestions I would not have been able to complete this curriculum. I express my sincere gratitude to DR. NARASIMHA MURTHY K.N, Professor and H.O.D (PG), Dept. of Computer Science and Engineering, a great motivator and think-tank, for providing me the opportunity to present my seminar. I also express my gratitude to my internal guide, Mrs. MARY VIDYA JOHN, Assistant Professor, for her encouragement and useful guidance in preparing the report and presenting the paper. I also thank my parents and friends, whose moral support and constant guidance made my efforts fruitful.

Date:                               SUPRIYA R
Place: Bangalore                    1VI14SCS11
ABSTRACT

Big Data is a new label given to a diverse field of data-intensive information in which the datasets are so large that they are difficult to work with effectively. The term has mainly been used in two contexts: first, as a technological challenge when dealing with data-intensive domains such as high-energy physics, astronomy, or internet search; and second, as a sociological problem when data about us is collected and mined by companies such as Facebook and Google, mobile phone companies, and governments. The concentration here is on the second issue from a new perspective, namely how the user can gain awareness of the personally relevant part of the Big Data that is publicly available on the social web. The amount of user-generated media uploaded to the web is expanding rapidly, and it is beyond the capabilities of any human to sift through it all to see which media impact our privacy. Based on an analysis of social media in Flickr, Locr, Facebook, and Google+, the privacy implications and potential of the emerging trend of geo-tagged social media are discussed. Finally, a concept is presented with which users can stay informed about which parts of the social Big Data are relevant to them.
CONTENTS

ABSTRACT
1 INTRODUCTION
  1.1 Overview
2 LITERATURE SURVEY
  2.1 Location privacy in pervasive computing
  2.2 Facebook’s privacy train-wreck: exposure and invasion
  2.3 Facebook’s privacy settings: Who cares?
  2.4 Protecting location privacy with k-anonymity
3 PROBLEM STATEMENT
4 SYSTEM ANALYSIS
  4.1 Existing system
  4.2 Proposed system
5 THREAT ANALYSIS
  5.1 Awareness of damaging media in big datasets
  5.2 Analysis of service privacy
6 SURVEY OF METADATA IN SOCIAL MEDIA
7 CONCLUSION AND FUTURE WORK
REFERENCES
LIST OF FIGURES

Figure 1  Mix zone arrangement
Figure 2  Facebook message in Dec 2009
Figure 3  Facebook’s simplified settings
Figure 4  Public privacy metadata from Flickr
Figure 5  Public privacy metadata from Locr
CHAPTER 1
INTRODUCTION

In this electronic age, an increasing number of organizations are facing the problem of an explosion of data, and the size of the databases used in today’s enterprises has been growing at exponential rates. Data is generated through many sources such as business processes, transactions, social networking sites, and web servers, and exists in structured as well as unstructured form. Today’s business applications have enterprise features: they are large-scale, data-intensive, web-oriented, and accessed from diverse devices, including mobile devices. Processing or analyzing this huge amount of data and extracting meaningful information from it is a challenging task.

1.1 Overview

Big Data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. It is becoming a hot topic in many areas where datasets are so large that they can no longer be handled effectively or even completely. Put differently, any task that is comparatively easy to execute when operating on a small set of data, but becomes unmanageable when dealing with the same problem on a large dataset, can be classified as a Big Data problem. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, and wireless sensor networks. Typical problems encountered when dealing with Big Data include capture, storage, dissemination, search, analytics, and visualization.

The traditional data-intensive sciences such as astronomy, high-energy physics, meteorology, genomics, and biological and environmental research, in which peta- and exabytes of data are generated, are common domain examples. Here even the capture and storage of the data is a challenge. But there are also new domains emerging under the Big Data paradigm, such as data warehousing, internet and social web search, finance, and business informatics. Here datasets can be small compared to those of the previous domains, but the complexity of the data can still lead to their classification as a Big Data problem.

M.Tech, 1st sem, Dept. of CSE, VIT 2014-15
Traditional Big Data applications such as astronomy and other e-sciences usually operate on non-personal information and as such usually do not have significant privacy issues. The privacy-critical Big Data applications lie in the new domains of the social web, consumer and business analytics, and governmental surveillance. In these domains, Big Data research is being used to create and analyze profiles of users, for example for market research, targeted advertisement, workflow improvement, or national security. These are very contentious issues, since it is entirely up to the controller of the Big Data sets whether the information obtained from various sources is used for criminal purposes or not.

In particular, in the context of the social web there is an increasing awareness of the value, potential, and risk of the personal data which users voluntarily upload to the web. However, the Big Data debate in this area has focused almost entirely on what the controlling companies do with this information. These concerns are being addressed by calls for regulatory intervention, i.e., regulating what companies are allowed to do with the data users give them or what data they are allowed to gather about users.

This report examines the social media Big Data issue. The growing proliferation and capabilities of mobile devices, which are creating a deluge of social media that affects our privacy, are discussed. Due to the vast amounts of data being uploaded every day, it is next to impossible to be aware of everything that affects us. A concept which can be used to regain control of some of the Big Data deluge created by other social web users is then discussed.
CHAPTER 2
LITERATURE SURVEY

2.1 Location Privacy in Pervasive Computing

Mix zones. Most theoretical models of anonymity and pseudonymity originate from the work of David Chaum, whose pioneering contributions include two constructions for anonymous communications (the mix network and the dining cryptographers algorithm) and the notion of anonymity sets. A new entity for location systems, the mix zone, which is analogous to a mix node in communication systems, is introduced. By using the mix zone to model the spatiotemporal problem, useful techniques from the anonymous communications field can be adopted.

Mix networks and mix nodes. A mix network is a store-and-forward network that offers anonymous communication facilities. The network contains normal message-routing nodes alongside special mix nodes. Even hostile observers who can monitor all the links in the network cannot trace a message from its source to its destination without the collusion of the mix nodes. In its simplest form, a mix node collects n equal-length packets as input and reorders them by some metric (for example, lexicographically or randomly) before forwarding them, thus providing unlinkability between incoming and outgoing messages. For brevity, some essential details about layered encryption of packets are omitted; interested readers should consult Chaum’s original work. The number of distinct senders in a batch provides a measure of the unlinkability between the messages coming into and going out of the mix. Chaum later abstracted this last observation into the concept of an anonymity set. As paraphrased by Andreas Pfitzmann and Marit Köhntopp, whose survey generalizes and formalizes the terminology on anonymity and related topics, the anonymity set is “the set of all possible subjects who might cause an action.”
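The batching-and-reordering behaviour of a mix node described above can be sketched as follows (a minimal illustration that ignores the layered encryption a real mix would use; all names and the batch size are hypothetical):

```python
import random

class MixNode:
    """Toy mix node: buffers n equal-length packets, then flushes them
    in shuffled order so arrival order cannot be linked to departure order."""

    def __init__(self, batch_size, rng=None):
        self.batch_size = batch_size
        self.buffer = []           # (sender, packet) pairs awaiting the flush
        self.rng = rng or random.Random()

    def submit(self, sender, packet):
        """Accept a packet; return (packets, anonymity_set) once the batch is full."""
        self.buffer.append((sender, packet))
        if len(self.buffer) < self.batch_size:
            return None
        batch = self.buffer
        self.buffer = []
        self.rng.shuffle(batch)                 # reorder: breaks in/out linkability
        anonymity_set = {s for s, _ in batch}   # distinct senders in this batch
        return [p for _, p in batch], anonymity_set

mix = MixNode(batch_size=3, rng=random.Random(42))
mix.submit("alice", b"m1")
mix.submit("bob", b"m2")
packets, senders = mix.submit("alice", b"m3")
# Only two distinct senders contributed, so the anonymity set has size 2,
# even though three packets left the mix.
print(sorted(senders), len(packets))
```

The anonymity set here is smaller than the batch size, which is exactly why the number of distinct senders, not the packet count, measures the unlinkability the mix provides.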
In anonymous communications, the action is usually that of sending or receiving a message. The larger the anonymity set’s size, the greater the anonymity offered. Conversely, when the anonymity set reduces to a singleton (the anonymity set cannot become empty for an action that was actually performed), the subject is completely exposed and loses all anonymity.

Mix zones. A mix zone for a group of users is defined as a connected spatial region of maximum size in which none of these users has registered any application callback; for a given group of users there might be several distinct mix zones. In contrast, an application zone is an area where a user has registered for a callback. The middleware system can define the mix zones and calculate them separately for each group of users as the spatial areas currently not in any application zone. Because applications do not receive any location information while users are in a mix zone, the identities are “mixed”. Assuming users change to a new, unused pseudonym whenever they enter a mix zone, applications that see a user emerging from the mix zone cannot distinguish that user from any other who was in the mix zone at the same time, and cannot link people going into the mix zone with those coming out of it. If a mix zone has a diameter much larger than the distance a user can cover during one location update period, it might not mix users adequately.
Figure 1. A sample mix zone arrangement with three application zones. The airline agency (A) is much closer to the bank (B) than the coffee shop (C). Users leaving A and C at the same time might be distinguishable on arrival at B.

Figure 1 provides a plan view of a single mix zone with three application zones around the edge: an airline agency (A), a bank (B), and a coffee shop (C). Zone A is much closer to B than C is, so if two users leave A and C at the same time and a user reaches B at the next update period, an observer will know that the user emerging from the mix zone at B is not the one who entered the mix zone at C. Furthermore, if nobody else was in the mix zone at the time, the user can only be the one from A. If the size of the mix zone exceeds the distance a user covers in one period, mixing will be incomplete. The degree to which the mix zone anonymizes users is therefore smaller than one might believe from looking at the anonymity set size alone.
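The timing attack illustrated by Figure 1 can be sketched numerically. The example below mirrors the A/B/C layout with fabricated coordinates, speed, and update period (all values are hypothetical), and checks which entry points could plausibly have produced an exit at B within one update period:

```python
import math

# Hypothetical plan-view coordinates of the application zones (metres).
ZONES = {"A": (0.0, 0.0), "B": (20.0, 0.0), "C": (100.0, 0.0)}
MAX_SPEED = 2.0          # assumed walking speed, m/s
UPDATE_PERIOD = 30.0     # seconds between location updates

def plausible_origins(exit_zone, candidates):
    """Entry zones from which a user could reach `exit_zone` in one period."""
    ex, ey = ZONES[exit_zone]
    reach = MAX_SPEED * UPDATE_PERIOD   # 60 m reachable radius
    return [z for z in candidates if math.dist(ZONES[z], (ex, ey)) <= reach]

# Users entered the mix zone at A and at C; one emerges at B one period later.
origins = plausible_origins("B", ["A", "C"])
print(origins)  # C is 100 m from B, beyond the 60 m radius, so only A remains
```

Even though the anonymity set nominally contains both users, the geometry shrinks it to one: the observer can exclude C as an origin, which is the incomplete mixing the text describes.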
2.2 Facebook’s Privacy Train-wreck: Exposure, Invasion, and Social Convergence

The tech world has a tendency to view the concept of ‘private’ as a single bit that is either 0 or 1: data are either exposed or not. When companies decide to make data visible in a more ‘efficient’ manner, it is often startling, prompting users to speak of a disruption of ‘privacy’. Facebook did not make anything public that was not already public, but search disrupted the social dynamics. The reason for this is that privacy is not simply about the state of an inanimate object or set of bytes; it is about the sense of vulnerability that an individual experiences when negotiating data.

On Facebook, people were wandering around accepting others as Friends, commenting on others’ pages, checking out what others posted, and otherwise participating in the networked environment. Now, imagine that everyone involved notices all of this because it is displayed when they log in. By aggregating all this information and projecting it to everyone’s News Feed, Facebook made what was previously obscure difficult to miss (and even harder to forget). Those data were all there before, but they were not efficiently accessible; they were not aggregated. The acoustics changed, and the social faux pas was suddenly very visible. Participants had to reconsider each change that they made because they knew it would be broadcast to all their Friends. With Facebook, participants have to consider how others might interpret their actions, knowing that any action will be broadcast to everyone with whom they consented to digital Friendship. For many, it is hard even to remember whom they listed as Friends, let alone assess the different ways in which those Friends may interpret information. Without being able to see their Friends’ reactions, they are not even aware of when their posts have been misinterpreted.

Facebook implies that people can maintain infinite numbers of friends provided they have digital management tools. While having hundreds of Friends on social network sites is not uncommon, users are not actually keeping up with the lives of all of those people. These digital Friends are not necessarily close friends, and friendship management tools are not good enough for building and maintaining close ties. While Facebook assumes that all
Friends are friends, participants have varied reasons for maintaining Friendship ties on the site that have nothing to do with daily upkeep. The News Feed does not distinguish between these: all Friends are treated equally, and updates come from all Friends, not just those whom an individual deems to be close friends. To complicate matters, when data are there, people want to pay attention, even if it does not help them. People relish personal information because it is the currency of social hierarchy and connectivity. Cognitive addiction to social information is great for Facebook, because the News Feed makes Facebook sticky.

Privacy is not simply about zeros and ones; it is about how people experience their relationship with others and with information. Privacy is a sense of control over information, the context where sharing takes place, and the audience who can gain access. Information is not private because no one knows it; it is private because the knowing is limited and controlled. In most scenarios, the limitations are more social than structural. Secrets are considered the most private of social information because keeping knowledge private is far more difficult than spreading it. There is an immense gray area between secrets and information intended to be broadcast as publicly as possible. By and large, people treated Facebook as being in that gray zone. Participants were not likely to post secrets, but they often posted information that was only relevant in certain contexts. The assumption was that if you were visiting someone’s page, you could access the information in context. When snippets and actions were broadcast to the News Feed, they were taken out of context and made far more visible than seemed reasonable. In other words, with the News Feed, Facebook obliterated the gray zone.

Social convergence occurs when disparate social contexts are collapsed into one. Even in public settings, people are accustomed to maintaining discrete social contexts separated by space. How one behaves is typically dependent on the norms in a given social context; how one behaves in a pub differs from how one behaves in a family park, even though both are ostensibly public. Social convergence requires people to handle disparate audiences simultaneously, without a social script. While social convergence allows information to be spread more efficiently, this is not always what people desire. As with other forms of convergence, control is lost with social convergence. Can we celebrate
people’s inability to control private information in the same breath as we celebrate mainstream media’s inability to control what information is broadcast to us? Privacy is not an inalienable right; it is a privilege that must be protected socially and structurally in order to exist. The question remains whether privacy is something that society wishes to support. When Facebook launched the News Feed, it altered the architecture of information flow. This was its goal, and with a few tweaks it was able to convince users that the advantages of the News Feed outweighed security through obscurity. Users quickly adjusted to the new architecture; they began taking actions solely so that those actions would be broadcast to their Friends’ News Feeds.

2.3 Facebook’s privacy settings: Who cares?

Facebook’s approach to privacy was initially network-centric. By default, students’ content was visible to all other students on the same campus, but to no one else. Through a series of redesigns, Facebook provided users with controls for determining what could be shared with whom, initially allowing them to share with “No One”, “Friends”, “Friends-of-Friends”, or a specific “Network”. When Facebook became a platform upon which other companies could create applications, users’ content could then be shared with third-party developers who used Facebook data as part of the “Apps” that they provided. The company introduced privacy settings to allow users to determine which third parties could access what content; users encountered a message whenever they chose to add an application. Over time, Facebook introduced the ability to share content with “Everyone” (inside Facebook or not). Increasingly, the controls got more complex, and media reports suggest that users found themselves uncertain about what the controls meant (Bilton, 2010).

Recognizing the validity of this point, Facebook eventually simplified its privacy settings page (Zuckerberg, 2010). At each point when Facebook introduced new options for sharing content, the default was to share broadly. Following a series of redesigns, in December 2009 Facebook added a prompt when users logged on, asking them to reconsider their privacy settings for various types of content on the site, including “Posts I Create: Status Updates, Likes, Photos, Videos, and Notes” (see Figure 2). For each item, users were given two options: “Everyone” or “Old Settings.” The toggle buttons defaulted to “Everyone.”
This message appeared when users logged into their accounts, and it was impossible to go to the rest of the site without addressing the prompt. Faced with this obligatory prompt, many users may well have just clicked through, accepting the defaults that Facebook had chosen. In doing so, these users made much of their content more accessible than was previously the case. As part of these changes, everyone’s basic profile and Friends list became available to the public, regardless of whether or not they had previously chosen to restrict access; this was later retracted. When Facebook was challenged by the Federal Trade Commission to defend this approach, the company representative noted that 35 percent of users who had never before edited their settings did so when prompted. Facebook used these data to highlight that more people engaged with Facebook privacy settings than the industry average of 5-10 percent. While 35 percent may be significantly more than the industry average, and Facebook did not specify what percentage of users had never adjusted their settings, there is still likely a sizeable majority that accepted the site’s defaults each time changes were implemented.
Figure 2: The message Facebook users saw in December 2009.

As privacy advocates and regulators investigated what these changes meant, Facebook moved toward another series of modifications that depended on users’ willingness to share information. In April 2010, at its f8 conference, Facebook announced Instant Personalization and Social Plugins, two services that allowed partners to leverage the social graph (the information about one’s relationships on the site that the user makes available to the system) and provide a channel for sharing information between Facebook and third parties. For example, websites could implement a Like button on their own pages that enables users to share content from that site with the user’s connections on Facebook. Sites could also implement an Activity Feed that publicizes what a person’s Friends do on that site. These tools were built for developers, and likely few users, journalists, or regulators had any sense of what data from their accounts and actions on Facebook were being shared with whom
under what circumstances. As the discussions became more vociferous, Facebook was increasingly pressured to respond. On 26 May 2010, Zuckerberg announced that Facebook had heard the concerns and believed that the major issue on the table was Facebook’s confusing privacy settings page. Facebook unveiled a new privacy settings page that, while simpler, also removed many of the controls that had allowed users to limit who could see what content (see Figure 3). This quelled much of the news coverage, but it did not end the controversy: in July 2010, a Canadian law firm filed a class action suit against Facebook over privacy issues.

Figure 3: Facebook’s “simplified” privacy settings, July 2010.
2.4 Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms

There are two popular approaches to protecting location privacy in the context of Location-Based Services (LBS): policy-based and anonymity-based approaches. In policy-based approaches, mobile clients specify their location privacy preferences as policies and completely trust that the third-party LBS providers adhere to these policies. In anonymity-based approaches, the LBS providers are assumed to be semi-honest instead of completely trusted. The paper develops a k-anonymity-preserving management of location information: efficient and scalable system-level facilities that protect location privacy by ensuring location k-anonymity. It is assumed that anonymous location-based applications do not require user identities in order to provide service.

The concept of k-anonymity was originally introduced in the context of relational data privacy. It addresses the question of “how a data holder can release its private data with guarantees that the individual subjects of the data cannot be identified whereas the data remain practically useful”. For instance, a medical institution may want to release a table of medical records with the names of the individuals replaced with dummy identifiers. However, some set of attributes can still lead to identity breaches. These attributes are referred to as the quasi-identifier. For instance, the combination of birth date, zip code, and gender attributes in the disclosed table can uniquely determine an individual. By joining such a medical record table with some publicly available information source, such as a voter list table, the medical information can easily be linked to individuals.
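The quasi-identifier risk just described can be made concrete with a small check (a sketch over fabricated example records; the attribute names are illustrative, not from the paper):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifier, k):
    """True iff every combination of quasi-identifier values
    appears in at least k records of the released table."""
    counts = Counter(tuple(r[a] for a in quasi_identifier) for r in records)
    return all(c >= k for c in counts.values())

# Names removed, but (birth_year, zip, gender) can still single out a person.
table = [
    {"birth_year": 1980, "zip": "560034", "gender": "F", "diagnosis": "flu"},
    {"birth_year": 1980, "zip": "560034", "gender": "F", "diagnosis": "cold"},
    {"birth_year": 1975, "zip": "560001", "gender": "M", "diagnosis": "asthma"},
]
qi = ("birth_year", "zip", "gender")
print(is_k_anonymous(table, qi, 2))  # False: the 1975/560001/M row is unique
```

The third record fails the check because its quasi-identifier combination is unique, so a join with a public voter list would re-identify that individual despite the dummy identifiers.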
k-anonymity prevents such a privacy breach by ensuring that each individual record can only be released if there are at least k - 1 distinct individuals whose associated records are indistinguishable from it in terms of their quasi-identifier values. In the context of LBSs and mobile clients, location k-anonymity refers to the k-anonymity of location information. A subject is considered location k-anonymous if and only if the location information sent from a mobile client to an LBS is indistinguishable from the location information of at least k - 1 other mobile clients. The paper proposes that location perturbation is an effective technique for supporting location k-anonymity and
dealing with the location privacy breaches exemplified by the location inference attack scenarios. If the location information sent by each mobile client is perturbed by replacing the position of the mobile client with a coarser-grained spatial range such that there are k - 1 other mobile clients within that range (k > 1), then the adversary will have uncertainty in matching the mobile client to a known location-identity association or to an external observation of the location-identity binding. This uncertainty increases with the value of k, providing a higher degree of privacy for mobile clients. Even if the LBS provider knows that person A was reported to visit location L, it cannot match a message to this user with certainty, because there are at least k different mobile clients within the same location L. As a result, it cannot perform further tracking without ambiguity.
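The perturbation idea can be sketched as follows. This is a minimal illustration with invented coordinates and client IDs (the paper's actual system uses more sophisticated grid-based cloaking): grow a square around the querying client until it covers at least k clients, and report the box instead of the exact position:

```python
def cloak_location(client_id, positions, k, step=0.001):
    """Replace the client's exact position with the smallest square
    (grown in `step` increments) containing at least k clients,
    including the querying client itself."""
    x, y = positions[client_id]
    r = 0.0
    while True:
        inside = [cid for cid, (px, py) in positions.items()
                  if abs(px - x) <= r and abs(py - y) <= r]
        if len(inside) >= k:
            return (x - r, y - r, x + r, y + r)  # coarse bounding box
        r += step

clients = {"a": (0.000, 0.000), "b": (0.001, 0.000), "c": (0.010, 0.010)}
box = cloak_location("a", clients, k=2)
print(box)  # a small square covering clients "a" and "b"
```

The LBS then only learns that the sender is one of the (at least) k clients inside the box.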
CHAPTER 3
PROBLEM STATEMENT

The amount of social media being uploaded to the web is growing rapidly, and there is still no end to this trend in sight. The ease of use of modern smartphones and the proliferation of high-speed mobile networks are facilitating a culture of spontaneous and carefree uploading of user-generated content. To give an idea of the scale of this phenomenon: just in the last two years, the number of photos uploaded to Facebook per month has risen from 2 billion to over 6 billion. From a personal perspective, an overwhelming majority of these photos have no privacy relevance for oneself; finding the few that are relevant is a daunting task. While one's own media is uploaded consciously, the flood of media uploaded by others is so huge that it is almost impossible to stay aware of all media in which one might be depicted. This can be classified as a Big Data problem on the user's side, though not on the provider side.

Current social networks and photo-sharing sites mainly focus on the privacy of users' own media in terms of access control, but do little to deal with the privacy implications created by other users' media. There are ever more complex settings allowing a user to decide who is allowed to see what content, but only for content owned by the user. The issue of staying on top of what others are uploading (mostly in good faith) that might also be relevant to the user is still very much outside the control of that user. Social networks which allow tagging of users usually inform affected users when they are tagged. However, if no such tagging is done by the uploader or a third party, there are currently no mechanisms to inform users of relevant media. A second important emerging trend is the capability of many modern devices to embed geo-data and other metadata into the created content.
While the privacy issues of location-based services such as Foursquare or Qype have been discussed at great length, the privacy issues of location information embedded into uploaded media have not yet received much attention. There is one very significant difference between these two categories. In the first, users reveal their current location to access online services such as Google Maps, Yelp or Qype, or actually publish their location on a social network
site like Foursquare, Google Latitude or Facebook Places. In this category, the user mainly affects his own privacy. There is a large body of work examining privacy-preserving techniques to protect a user's own privacy, ranging from solutions installed locally on the user's mobile device to solutions using online services that rely on group-based anonymisation algorithms, such as mix zones or k-anonymity. The second category is created by media which contain location information. This can have all the same privacy implications for the creator of the media; however, a critical and hitherto often overlooked issue is the fact that the location and other metadata contained in pictures and videos can also affect people other than the uploader himself. This is a critical oversight and an issue which will gain importance as the mobile smart-device boom continues.
CHAPTER 4
SYSTEM ANALYSIS

4.1 EXISTING SYSTEM

Privacy issues are categorized into two classes. Firstly, homegrown problems: someone uploads a piece of compromising media of himself with insufficient protection or forethought, which causes damage to his own privacy. A prime example of this category is someone uploading compromising pictures of himself into a public album instead of a private one, or onto his Timeline instead of into a message. The damage done in these cases is very obvious, since the link between the content and the user is direct and the audience (often the peer circle) has direct interest in the content. One special facet of this problem is that what is considered damaging content by the user can, and often does, change over time. While this is a serious problem, especially amongst the Facebook generation, it is a small-data problem and thus is not the focus of this work.

Secondly, there are the Big Data problems created by others: an emerging threat to users' online privacy comes from other users' media. What makes this threat particularly bad is that the person harmed is not involved in the uploading process, and thus cannot take any pre-emptive precautions, and that the amount of data being uploaded is so vast it cannot be manually reviewed. There are also currently no countermeasures, except after-the-fact legal ones, to prevent others from uploading potentially damaging content about someone. There are two requirements for this form of privacy threat to have an effect. Firstly, to cause harm to a person, a piece of media needs to be associable with or linkable to the person in some way. This link can either be non-technical, such as being recognizable in a photo, or technical, such as a profile being (hyper-)linked to a photo. There is also the grey area of textual references to a person near the photo or embedded in the metadata of the photo.
This metadata does not directly create a technical link to a profile, but it opens the possibility for search engines to index the information and make it searchable, thus creating a technical link.
Secondly, the piece of media in question must contain content harmful to the person linked to it. This can again be non-technical, such as being depicted in a compromising way. More interestingly, however, it can also be technical: in these cases metadata or associated data causes the harm. For instance, time and location data can indicate that a person has been at an embarrassing location, took part in a political event, or was not where he said he was. Since the uploading of this type of damaging media cannot be effectively prevented, awareness is the key issue in combating this emerging privacy problem.

4.2 PROPOSED SYSTEM

The privacy awareness concept consists of a watchdog client and a server-side watchdog service. Using a GPS-enabled mobile device, a user can activate the watchdog client to locally track his position during times he considers relevant to his privacy. Whenever the user is interested in the state of his online privacy, his device can then request the privacy watchdog to show him media that could potentially affect him. For this, the watchdog client sends the location traces to the watchdog server or servers, which then respond with a list of media taken in the vicinity (both spatially and temporally) of the user.

The watchdog service can be operated in three different ways. The first two would be value-added services offered by the media-sharing sites or social networks (SN) themselves. In both cases the existing services would need to be extended by an API to allow users to search for media by time and location. The first type of service would do this via the regular user account. Thus, it would be able to see all public pictures, but also all pictures restricted in scope yet visible to the user. The benefit of this service type is that, through the integration with the user's account, pictures which aren't publicly available can be searched.
These private pictures are typically from social network friends, and thus the likelihood of pictures involving the user is higher and the scope of people able to view the pictures more relevant. This type of service also has the benefit-of-sorts that the location information is valuable to the SN, so it has an incentive to offer this kind of value-added service. The privacy downside of searching with the user's account is that the SN receives the information of when and where the user was. While there are certainly users who would not mind this information being sent to their SN if it means they get to see and remove embarrassing photos in a timely manner,
there are also certainly users who do not wish their SN to know when and where they were, particularly amongst the clientele who wish to protect their online privacy.

The second type of watchdog service would also be operated by the SN. However, it does not require a user account for the search and can be queried anonymously. This type of service would thus have a smaller scope, since it can only access publicly available media. A further drawback of this type of service is that there is less of an incentive for the SN provider to implement it. While there are sites such as Locr that allow such queries, most SN sites do not; without outside pressure there is less intrinsic value for them to include such a service compared to the first type.

The third type of service would be a stand-alone service operated by a third party. The stand-alone service operates like an indexing search engine, which crawls publicly available media and its metadata and allows this database to be queried. Possible incentive models for this approach include pay-per-use, subscription, ad-based or community services. The visibility scope would be the same as for the second type of service.

All three types of service are mainly focused on detecting relevant media events and breaking the Big Data problem down to humanly manageable sizes. The concept is mainly focused on bringing possibly relevant media to the attention of the user without overburdening him. The system does not explicitly protect against malicious uploads with which the uploader intentionally tries to harm another person while concealing the activity, even though the watchdog service proposed here could make such subterfuge harder for the malicious uploader.
But even without full protection from malicious activity, such a watchdog would improve the current state of the art by enabling users to gain better awareness of the relevant part of the social media Big Dataset.
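The watchdog interaction described above can be sketched as follows. The trace format, media records and distance/time thresholds are illustrative assumptions, not the paper's actual API:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def watchdog_query(trace, media, max_dist_m=100.0, max_dt_s=3600.0):
    """Return media items taken near (spatially and temporally) any
    point of the user's location trace."""
    hits = []
    for item in media:
        for (lat, lon, ts) in trace:
            if (abs(item["ts"] - ts) <= max_dt_s and
                    haversine_m(item["lat"], item["lon"], lat, lon) <= max_dist_m):
                hits.append(item)
                break
    return hits

trace = [(12.9352, 77.6245, 1000.0)]  # (lat, lon, timestamp) logged by the client
media = [
    {"id": "A", "lat": 12.9353, "lon": 77.6246, "ts": 1500.0},  # ~15 m, recent
    {"id": "B", "lat": 13.5000, "lon": 77.6000, "ts": 1200.0},  # ~60 km away
]
print([m["id"] for m in watchdog_query(trace, media)])  # -> ['A']
```

The server-side service would run the same vicinity filter over its media index and return only the matching items, shrinking the Big Data problem to a reviewable list.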
CHAPTER 5
THREAT ANALYSIS

5.1 Awareness of Damaging Media in Big Datasets

Most popular social networks and media-sharing sites allow users to tag objects and people in their uploaded media. Additionally, some services also extract embedded metadata and use this information for indexing and linking. Media is annotated with names and comments, or is directly linked to users' profiles. In particular, the direct linking of profiles to pictures was initially met with an outcry of privacy concerns, since it greatly facilitated finding information about people. For this reason, social networks quickly introduced the option to prevent people from linking them in media. However, there is also a positive side to this form of direct linking, since the linked person is usually made aware of the linked media and can then take action to have unwanted content removed or to restrict the visibility of the link. While the privacy mechanisms of current services are still limited, hidden and often confusing, once the link is made the affected people can take action.

A more critical case is the non-linked tagging of photos, where a free-text tag contains identifying information and/or malicious comments. There is no automated mechanism to inform a user that he was named in or near a piece of media; the named person might not even be a member of the service where the media was uploaded. The threat of this kind of linking is significantly different from the one depicted above. While the immediate damage can be smaller, since no automated notification is sent to friends of the user, the threat can remain hidden far longer. The person can remain unaware of this media while others stumble upon it or are informed by peers. The final case of damaging media does not contain any technical link.
Without any link to the person in question, this kind of media can only cause harm if someone who has some contact with that person stumbles across it and makes the connection. While the likelihood of causing noticeable harm is smaller, it is still possible: the viral spreading of media has caused serious embarrassment and harm in real-world cases. The critical issue here is that there is currently no way for a person to proactively search for this kind of media in the Big Data deluge to mitigate this threat.
5.2 Analysis of Service Privacy

The different types of watchdog services discussed in the proposed system can help to reduce the number of relevant pieces of media a user needs to keep an eye on if the user does not want uncontrolled media of himself to be online. However, the devil is in the detail, since this form of service can itself have serious privacy implications if designed in the wrong way. Care must be taken to accommodate the different usage and privacy requirements of different users. The critical issue is the fact that, to request the relevant media, a user must send location information to the watchdog service.

When using the first type of service, there is little that can be done to protect the location privacy of the user, since the correlation between the location query and the user account is direct. One option to protect privacy to a degree is an obfuscation approach: for every true query, a number of fake queries could also be sent, making it less easy (but far from impossible) for the SN provider to ascertain the true location. However, this approach does not scale well for two reasons. Firstly, it creates a higher load on the SN. More critically, the likelihood of deducing the true location rises as more queries are sent, unless great care is taken in creating the fake paths and masking the source IP addresses.

Protecting the user's privacy in the second case is simpler. Since the queries do not require an account, the only way a user could be tracked directly is via his IP address. Using an anonymizing service such as Tor, sequential queries cannot be linked together, and creating a tracking profile becomes significantly harder. The anonymous trace data can of course still be used by the SN, but the missing link to the user makes it less critical for the user himself.
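The obfuscation approach mentioned above can be sketched minimally. Note the same caveat the text raises: independently drawn random fakes are easy to filter out over repeated queries, so a real system would have to generate plausible fake paths and mask source IPs. Function and parameter names here are our own:

```python
import random

def decoy_queries(true_lat, true_lon, n_fakes=4, spread=0.05, rng=None):
    """Mix the real location query with n_fakes random nearby fakes and
    shuffle, so the provider cannot tell which query is genuine.
    Weakness: independent random fakes do not form plausible movement
    paths, so a provider with historical data can often unmask them."""
    rng = rng or random.Random()
    queries = [(true_lat, true_lon)]
    queries += [(true_lat + rng.uniform(-spread, spread),
                 true_lon + rng.uniform(-spread, spread))
                for _ in range(n_fakes)]
    rng.shuffle(queries)
    return queries

qs = decoy_queries(12.9352, 77.6245, n_fakes=4, rng=random.Random(42))
print(len(qs), (12.9352, 77.6245) in qs)  # 5 queries, true one among them
```

Each decoy also multiplies the query load on the SN, which is the scaling problem noted in the text.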
The third type of service is probably the most interesting privacy-wise, since the economic model behind the service will significantly impact the privacy techniques which can be applied to it. Most payment models would require user credentials to log in and would thus allow the watchdog service provider to track the user. In this case the user would have to trust the reputation of the service provider, similar to the case of commercial anonymizing proxies. In an ad-based approach no user credentials are needed, so it would be possible to use the service anonymously via Tor. If so, the watchdog client would need to be open source to ensure that no secret tracking information is stored there. In community-based approaches, a privacy model like that in
Tor can be used to ensure that none of the participating nodes gets enough information to track a single user. Each of these proposed service types has different privacy benefits and disadvantages.

The following overviews our privacy analysis of different media-hosting sites, covering media access control as well as metadata handling.

Flickr provides the most fine-grained privacy/access control settings of all analyzed services. Privacy settings can be defined on metadata as well as on the image itself. One particularly interesting feature of Flickr is the geo-fence, which enables users to define privacy regions on a map by putting a pin on it and setting a radius. Access to GPS data of the user's photos inside these regions is only allowed for a restricted set of users (friends, family, and contacts). Flickr allows its users to tag and add people to images. If a user revokes a person tag of himself in an image, no one can add that person to the image again.

Facebook extracts the title and description of an image from metadata during the upload process of some clients. All photos are resized after upload and metadata is stripped off. Facebook uses face recognition for friend-tagging suggestions based on already-tagged friends. Access to images is restricted by Facebook's ever-changing, complex and sometimes abstruse privacy settings.

Picasa Web & Google+ store the complete EXIF metadata of all images, accessible by everyone who can access the image. Access to images is regulated on a per-album basis and can be set to public, restricted to people who know the secret URL of the album, or to the owner only. A noteworthy feature is that geo-location data can be protected separately. Google+ and Picasa Web allow the tagging of people in images.

Locr is a geo-tagging-focused photo-sharing site. As such, location information is included in most images.
By default, all metadata is retained in all images. Access control is set on a per-image basis; anybody who can see an image can also see the metadata. There are also extensive location-based search options. Geo-data is extracted from uploaded files or set by people on the Locr website. Locr uses reverse geocoding to add textual location information to images in its database.
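Flickr's geo-fence behaviour described above might be approximated as follows. This is a sketch under our own assumptions (field names, trust model); the haversine radius test is a standard technique, not Flickr's published implementation:

```python
import math

def in_geofence(photo_lat, photo_lon, fence_lat, fence_lon, radius_m):
    """True if a photo's GPS position falls inside a circular privacy
    region defined by a pin and a radius, as in Flickr's geo-fence."""
    r = 6371000.0  # Earth radius in metres
    p1, p2 = math.radians(fence_lat), math.radians(photo_lat)
    dp = math.radians(photo_lat - fence_lat)
    dl = math.radians(photo_lon - fence_lon)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a)) <= radius_m

def visible_location(photo, fences, viewer_is_trusted):
    """Hide GPS data from untrusted viewers when the photo lies inside
    any of the user's privacy regions."""
    lat, lon = photo["lat"], photo["lon"]
    fenced = any(in_geofence(lat, lon, f["lat"], f["lon"], f["radius_m"])
                 for f in fences)
    return (lat, lon) if (viewer_is_trusted or not fenced) else None

home_fence = [{"lat": 12.9352, "lon": 77.6245, "radius_m": 500.0}]
photo = {"lat": 12.9355, "lon": 77.6248}  # ~45 m from the pin
print(visible_location(photo, home_fence, viewer_is_trusted=False))  # -> None
```

Only viewers in the trusted set (friends, family, contacts) would receive the coordinates.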
Instagram and PicPlz are services/mobile apps that allow posting images in a Twitter-like way. Resized images, stripped of metadata but with optional location data, are stored by the services. Additionally, they allow the posting of photos to different services like Flickr, Facebook, Dropbox, Foursquare and more. Depending on the service used, metadata is stored or discarded. For instance, when uploading a photo to Flickr, metadata is stripped from the actual file, but the title, description and geo-location are extracted from the image and can be set by the user. In contrast, the Hipstamatic mobile app preserves in-file metadata when uploading images to Flickr.

(Figure: logos of the different social media apps.)
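The metadata-stripping behaviour that these services apply on upload can be illustrated with a sketch. Metadata is modelled here as a plain dict with invented field names; note that a cautious stripper removes not only GPS coordinates but also reverse-geocoded textual tags, which many real strippers leave intact:

```python
# Hypothetical location-bearing fields (illustrative names only).
LOCATION_KEYS = {"gps_latitude", "gps_longitude", "gps_altitude",
                 "city", "street", "country", "place_name"}

def strip_location(metadata):
    """Return a copy of the metadata with all location-bearing fields
    removed; other tags (camera model, timestamp) are kept."""
    return {k: v for k, v in metadata.items() if k not in LOCATION_KEYS}

exif = {"camera": "iPhone 4", "taken": "2011-10-05 14:12",
        "gps_latitude": 12.9352, "gps_longitude": 77.6245,
        "city": "Bengaluru"}
print(strip_location(exif))  # only camera and timestamp remain
```

A stripper that only deleted the `gps_*` keys would still leak the city via the reverse-geocoded text tag.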
CHAPTER 6
SURVEY OF METADATA IN SOCIAL MEDIA

To underpin the growing prevalence of privacy-relevant metadata, and location data in particular, and to judge potential dangers and benefits based on real-world data, we analyzed a set of 20,000 publicly available Flickr images and their metadata. Flickr was chosen as the premier photo-sharing website because it can be legally crawled, offers the full extent of privacy mechanisms, and does not remove metadata in general. One photo each was crawled from 20k random Flickr users. Of these, 68.8% were Pro users, for whom the original file could be accessed as well. For the others, only the metadata available via the Flickr API was accessed. This includes data automatically extracted from EXIF data during upload and data manually added via the website or semi-automatically set by client applications. 23% of the 20k users denied access to their extracted EXIF data in the Flickr database. A set of 3,000 images taken with camera phones from 3k random mobile Flickr users was also collected. 46.8% of the mobile users were Pro users, and only 2% denied access to EXIF data in the Flickr database.

GPS location data was present in 19% of the 20k dataset and in 34% of the 3k mobile-phone dataset. While Flickr hosts many semi-professional DSLR photos, mobile phones are becoming the dominant photo-generation tool, with the iPhone 4 currently being the most common camera on Flickr. Textual location information like street or city names is currently not used much on Flickr. However, as reverse geocoding becomes more common in client applications, this will change (cf. Locr in Figure 4). To evaluate the potential privacy impact, we manually checked which photos contained people and a geo-reference, but no user profile tags – i.e. images which could contain people who are unaware of the photo.
In the set of 20k images we found 16%, and in the set of 3k mobile photos we found 28%, fulfilling these criteria.
Fig. 5. Public privacy-related metadata in 5.7k random and 1k mobile-user original Flickr photos

Further, the subset of images available from Pro users was analyzed, since these can contain the unaltered metadata from the camera. From the 20k dataset, 5,761 images contained in-file metadata; from the 3k dataset, 1,050 images did. Figure 5 shows the percentage values for the different types of metadata contained in the files. For the rest of the images, the metadata was either manually removed by the uploader or the image never had any in the first place. In the 20k dataset only 3% of the in-file metadata contained GPS data, compared to 32% in the mobile 3k dataset. This shows a clear dominance of mobile devices when it comes to publishing GPS metadata.
This is unsurprising, since most compact and DSLR cameras currently do not have GPS receivers and only few photographers add external GPS devices to these cameras – but this will likely change with future cameras. However, combined with the fact that mobile phones are becoming the dominant type of camera when it comes to the number of published pictures, it is to be expected that the amount of GPS data available for scrutiny, use and abuse will rise further.

Fig. 4. Public privacy-related metadata of 5k random photos from the Locr service
A 5k dataset of random photos from Locr was collected, and the metadata in the images plus the images' HTML pages built from the Locr database were analyzed. Figure 4 shows the results from this dataset. Particularly interesting is the high rate of non-GPS-based location information. This is a trend to watch, since most location-stripping mechanisms only strip GPS information and leave other text-based tags intact. Furthermore, the amount of camera-ID metadata is notable, since these IDs can be used to link different pictures and infer metadata even if data has been stripped from some of the photos.

To summarize: one third of the pictures taken by the dominant camera devices contain GPS information, and about one third of these images depict people. Thus, about 10% of all photos could harm other people's privacy without them knowing about it.
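The closing estimate follows directly from multiplying the survey's two one-third shares:

```python
# ~1/3 of photos from the dominant (mobile) camera devices carry GPS
# data, and ~1/3 of those depict people, so roughly 1/9 of all photos
# could silently affect bystanders' privacy.
gps_share = 1 / 3
people_share = 1 / 3
at_risk = gps_share * people_share
print(f"{at_risk:.0%}")  # -> 11%, i.e. about 10%
```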
CHAPTER 7
CONCLUSION AND FUTURE WORK

An analysis of the threat to an individual's privacy created by other people's social media was discussed, along with a brief overview of the privacy capabilities of common social media services regarding their ability to protect users from other people's activities. Based on the survey, the Big Data privacy implications and the potential of the emerging trend of geo-tagged social media were analyzed. Three concepts for how location information can actually help users stay in control of the flood of potentially harmful or interesting social media uploaded by others were then presented.

Applications can be built or modified to use pseudonyms rather than true user identities, a route toward greater location privacy. Because different applications can collude and share information about user sightings, users should adopt different pseudonyms for different applications. Furthermore, to withstand more sophisticated attacks, users should change pseudonyms frequently, even while being tracked. Drawing on the methods developed for anonymous communication, the conceptual tools of mix zones and anonymity sets can be used to analyze location privacy. Although anonymity sets provide a first quantitative measure of location privacy, the measure is only an upper-bound estimate. A better, more accurate metric, developed using an information-theoretic approach, uses entropy, taking into account the a priori knowledge that an observer can derive from historical data. The entropy measurement shows a more pessimistic picture, in which the user has less privacy, because it models a more powerful adversary who might use historical data to de-anonymize pseudonyms more accurately.

Two services worth mentioning which collect the type of information needed for a privacy watchdog are Social Camera and Locaccino.
Social Camera is a mobile app that detects faces in a picture and tries to recognize them with the help of the Facebook profile pictures of persons in the user's friends list. Recognized people can be automatically tagged and pictures instantly uploaded to Facebook. Locaccino is a Foursquare-type application which allows users to upload location-based information to Facebook.
These two apps show the willingness of users to share this kind of information in the social web.

Ahern et al. analyze the privacy decisions of mobile users in the photo-sharing process. They identify relationships between the location of the photo capture and the corresponding privacy settings, and recommend the use of context information to help users set privacy preferences and to increase users' awareness of information aggregation. Work by Fang and LeFevre focuses on helping the user find appropriate privacy settings in social networks. They present a system where the user initially only needs to set up a few rules; through the use of active machine-learning algorithms, the system then helps the user protect private information based on individual behavior and taste. Mannan et al. address the problem that private user data is shared not only within social networks, but also through personal web pages. Their work focuses on privacy-enabled web content sharing and utilizes existing instant-messaging friendship relations to create and enforce access policies.

The three works above all focus on protecting a user's privacy against dangers created by the user himself while sharing media; they do not discuss how users can be protected from other people's media. This is the prevailing focus of most research in this area. Besmer et al. present work which allows users who are tagged in photos to send a request to the owner to hide the linked photo from certain people. This approach also follows the idea that forewarned is forearmed, and that creating awareness of critical content is the first step towards solving the problem. However, the work relies on direct technical tags and as such does not cover the same scope as the privacy watchdog suggested in the paper. Work that also takes into account other users' media is presented by Squicciarini et al.,
who postulate that most shared data does not belong to a single user only. They therefore propose a system to share the ownership of media items and thereby strive to establish collaborative privacy management for shared content.
The prototype of Squicciarini et al. is implemented as a Facebook app and is based on game theory, rewarding users who promote co-ownership of media items. While the work does take other users' media into account, unlike the approach proposed here it does not cope with previously unknown and unrelated users.
REFERENCES

[1] S. Ahern, D. Eckles, N. Good, S. King, M. Naaman, and R. Nair. Overexposed?: privacy patterns and considerations in online and mobile photo sharing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 357–366, 2007.
[2] C. A. Ardagna, M. Cremonini, S. De Capitani di Vimercati, and P. Samarati. An obfuscation-based approach for protecting location privacy. IEEE Transactions on Dependable and Secure Computing, 8(1):13–27, June 2009.
[3] A. Beresford and F. Stajano. Location privacy in pervasive computing. IEEE Pervasive Computing, 2(1):46–55, Jan. 2003.
[4] A. Besmer and H. Richter Lipford. Moving beyond untagging. In Proceedings of the 28th International Conference on Human Factors in Computing Systems – CHI '10, page 1563, Apr. 2010.
[5] D. Boyd. Facebook's privacy trainwreck: Exposure, invasion, and social convergence. Convergence: The International Journal of Research into New Media Technologies, 14(1):13–20, 2008.
[6] d. Boyd and K. Crawford. Six provocations for Big Data. SSRN eLibrary, 2011.
[7] D. Boyd and E. Hargittai. Facebook privacy settings: Who cares? First Monday, 15(8), 2010.
[8] Carnegie Mellon's Mobile Commerce Laboratory. Locaccino – a user-controllable location-sharing tool. http://locaccino.org/, 2011.
[9] E. Eldon. New Facebook statistics show big increase in content sharing, local business pages. http://goo.gl/ebGQH, February 2010.
[10] Facebook. Statistics. http://www.facebook.com/press/info.php?statistics, 2011.
[11] L. Fang and K. LeFevre. Privacy wizards for social networking sites. In Proceedings of the 19th International Conference on World Wide Web – WWW '10, page 351. ACM Press, Apr. 2010.
[12] Flickr. Camera Finder. http://www.flickr.com/cameras, October 2011.
[13] B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing, 7(1):1–18, Jan. 2008.
[14] M. Mannan and P. C. van Oorschot. Privacy-enhanced sharing of personal content on the web. In Proceedings of the 17th International Conference on World Wide Web – WWW '08, page 487. ACM Press, April 2008.
[15] Metadata Working Group. Gartner special report – pattern-based strategy: Getting value from big data. www.gartner.com/patternbasedstrategy, 2012.
[16] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks. In Proceedings of the 18th International Conference on World Wide Web – WWW '09, page 521. ACM Press, Apr. 2009.
[17] Viewdle. SocialCamera. http://www.viewdle.com/products/mobile/index.html.
