• Save
Greschbach 2012
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Greschbach 2012

on

  • 527 views

 

Statistics

Views

Total Views
527
Views on SlideShare
527
Embed Views
0

Actions

Likes
0
Downloads
0
Comments
0

0 Embeds 0

No embeds

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Greschbach 2012 Document Transcript

  • 1. The Devil is in the Metadata –New Privacy Challenges in Decentralised Online Social NetworksBenjamin Greschbach, Gunnar Kreitz and Sonja BucheggerKTH Royal Institute of TechnologySchool of Computer Science and CommunicationStockholm, Sweden{bgre, gkreitz, buc}@csc.kth.seAbstract—Decentralised Online Social Networks (DOSN) areevolving as a promising approach to mitigate design-inherentprivacy flaws of logically centralised services such as Facebook,Google+ or Twitter. A common approach to build a DOSN is touse a peer-to-peer architecture. While the absence of a singlepoint of data aggregation strikes the most powerful attackerfrom the list of adversaries, the decentralisation also removessome privacy protection afforded by the central party’s in-termediation of all communication. As content storage, accessright management, retrieval and other administrative tasks ofthe service become the obligation of the users, it is non-trivial tohide the metadata of objects and information flows, even whenthe content itself is encrypted. Such metadata is, deliberatelyor as a side effect, hidden by the provider in a centralisedsystem.In this work, we aim to identify the dangers arising or mademore severe from decentralisation, and show how inferencesfrom metadata might invade users’ privacy. Furthermore, wediscuss general techniques to mitigate or solve the identifiedissues.I. INTRODUCTIONAs people use Social Network Services (SNS) to organisetheir social life, privacy issues are an inherent concern inthese services. Currently, a user must trust the SNS providerto enforce access rights management, not to misuse theprovided content, and to be sufficiently secured against third-party attacks. For today’s popular SNS providers, however,“people are not customers, but primarily products” [1]. Theirbusiness model is based on targeted advertisements, andthey have an infamous history of data leakages and privacybreaches.In response to these shortcomings, Decentralised OnlineSocial Networks (DOSN) have been proposed. There is awide range of designs spanning from centralised to de-centralised network architectures. In the completely decen-tralised approaches, the users themselves form a peer-to-peer(P2P) network in order to collaboratively provide the storageand communication infrastructure for the social networkservice. Access control for published content is enforced bycryptographic means so that users need not rely on policiesor the benignity of a central provider. In addition, userskeep the physical ownership of their content, which preventscensorship, yields higher resilience with respect to networkoutages, and facilitates data portability.When solving the privacy issues of the centralised systemby moving to a decentralised design, however, new pri-vacy challenges arise. Simply encrypting the content is notenough to hide all sensitive information from attackers, andalthough a powerful central provider is not present in thesekinds of systems, several other adversary models becomerelevant.We remark that although several of the issues raised inthis paper have been previously mentioned in the literature,the focus in SNS privacy research has been on contentconfidentiality and on removing the threat that central SNSproviders pose. While this was an important development,we believe that the logical next step is to systematicallystudy the effect of distributing the power of the providerover several entities and examining the possibilities forinferences that persist despite content encryption, includingtraffic analysis issues in this new context. Failing to protectagainst even a single one of these threats can lead to seriousprivacy breaches in an otherwise secure system.A. Our ContributionsIn this paper we highlight the new privacy challenges thatarise once a centralised SNS is replaced by a DOSN. Specif-ically, we systematically discuss possible privacy breachesstemming not from the content itself but from its metadata(like size or structure) or data handling (such as communi-cation flows). Furthermore, we discuss the role of differentadversaries in DOSNs. Finally, we summarise approaches tomitigate these problems, including those suggested by pro-posed DOSN implementations. To the best of our knowledgethere is no solution dealing with the whole range of theproblems we discuss.B. Paper OutlineThe rest of the paper is organised as follows. Afterreferring to related work in Section II, we sketch the differentmodels of SNS implementations including relevant attackersin Section III. Next, Section IV lists the possible metadataprivacy leakages, that is sensitive information which can beFourth International Workshop on SECurity and SOCial Networking, Lugano (19 March 2012)978-1-4673-0907-3/12/$31.00 ©2012 IEEE 333
  • 2. inferred even when the content does not leak. Section Vdiscusses countermeasures to approach these new challengesbefore Section VI concludes the paper.II. RELATED WORKResearch related to the scope of this paper can be foundin mainly three areas: privacy issues in SNS, decentralisedonline social networks, and metadata privacy in general.The impact of SNS on their users’ privacy has beenextensively studied. Gross et al. [2] have identified severalthreats of SNS usage such as stalking; de-anonymisationof external sensitive sources such as anonymised medicalrecords; identity theft, e. g. by social insurance numberreconstruction; user profiling by building a digital dossierand simplified social engineering. Danezis et al. [3] point outthat the position of a user in a social network reveals charac-teristics about the person, such as their status and potentialinfluence reach. Paul et al. [4] underline the consequencesof massive central data aggregation in conjunction with anadvertising-based business model of major SNS providers.They warn against the risks of direct misuse or unintendedleakage of this data that is not appropriately protected andhard to anonymise. Krishnamurthy and Wills [5] show thatrelevant leaks do occur in practice.One main approach to address the privacy issues in SNSis decentralisation. Buchegger et al. [6] propose the PeerSoNsystem where (encrypted) content is distributed using aP2P network formed by the users of the SNS. Aiello andRuffo [7] elaborate on a Distributed Hash Table (DHT)based architectural framework supporting SNS functionality.They propose authentication on the routing level and discussimplementations of SNS requirements such as access con-trol, reputation management and search operations. Cutilloet al. [8] introduce Safebook, an architectural approach fo-cusing on communication anonymisation. Content is storedat trusted friend nodes and requests are routed through amix-network formed by social links to obfuscate informationflow. Sharma and Datta [9] describe SuperNova, that is basedon a hybrid network architecture where highly available“super peers” are used for crucial tasks such as helping newusers to join the network. The Persona project by Badenet al. [10] proposes the use of an attribute-based encryptionscheme for social network operations. Finally, Bodriagovand Buchegger [11] scrutinise the proposals for DOSN-tailored encryption schemes and evaluate their performancefor SNS operations.One instance of a metadata privacy leakage in the contextof SNS is mentioned by Anderson et al. [12]. They point outthe threat of a friend learning about the existence of contentshe does not have access to. Chew et al. [13] identify threepossible leakages in SNS that are not caused directly by thedisclosure of content: entries in the user’s activity stream thatwere automatically generated based on the user’s activities(also on third party sites); unwanted linkage of different setsof user data; and identifying inferences by merging socialgraphs. Traffic analysis, extracting and inferring informationfrom network metadata, (see e. g. Danezis and Clayton [3]for an overview) is one of the attack techniques we consider.III. DECENTRALISING SOCIAL NETWORKSOn an abstract level and following a minimalistic def-inition (e. g. [14]), we assume an SNS to be merely anintegration of user generated content with social relationshipinformation.The latter is used mainly for access control, data pre-sentation, and friendship announcements. Content comprisesall active contributions of a user to the system, static (suchas profile attributes) as well as more dynamic ones (suchas status updates, text-, picture-, video- or link-posts). Italso includes interactions such as comments or simple like-indications in response to posts, enrichments of posts withsocial links (e. g. tags in pictures) as well as asynchronousor synchronous messaging (e. g. private messaging, chats).Timestamped notifications about this data are usually auto-matically pushed to the user in a “news feed”.In a concrete implementation of such a system, the degreeof centralisation of control over user data is an architecturaldesign choice that impacts both possible privacy leaks andtypes of attackers.A. ArchitecturesConsidering the proposed DOSN implementations in theliterature, one can observe a broad range of topologiesrather than a bipartite division between fully centralised andfully decentralised systems. Several hybrid approaches (e. g.Diaspora1) use dedicated, semi-trusted nodes to addressavailability, bootstrapping and other issues that are difficultto solve in a flat P2P network. For the rest of this paper,however, we focus on the differences between the twoextreme cases of logically centralised designs (Facebook orGoogle+) and completely decentralised approaches (Peer-SoN, Safebook, or Persona).In the centralised case, the relevant agents are the SNSprovider and the users of the SNS, with all communicationbetween users relayed by the central provider. In the decen-tralised approach, the provider is replaced by a P2P networkformed by the users.P2P approaches are varied, but here we sketch a simplifiedexample system. A user herself hosts all content she posts.To ensure availability even when she is offline, her content isalso replicated by a number of storage nodes. Access controlcan be implemented either by having the user and storagenodes requiring authorisation to serve data, or by encryptingobjects such that only authorised users can decrypt thecontent. We focus on the latter type of design, where anyuser can request any encrypted data. This means that storage1http://diasporaproject.org/334
  • 3. (a) All communication is relayedby the central provider.(b) Besides direct peer communica-tion, several nodes can be involved.Figure 1. Communication flows in a) centralised and b) decentralised SNSnodes need not be trusted and need not be informed of ACLrules for the content they replicate.Another type of service provided by nodes to each otherin the P2P scenario is relaying traffic in the overlay. Thismay include forwarding traffic for two nodes who cannotdirectly connect due to firewall restrictions, implementing aDHT, or to anonymise communication. We refer to nodesacting in such capacities as relay nodes. We illustrate thedifferent abstract communication flows in Figure 1.B. Privacy Advantages of DecentralisationThe most important privacy advantage of a decentralisedsystem is the absence of a central point of data aggregation.In the case of a centralised system with unencrypted storage,the provider can mine the data without limitations and inferinformation from both the content and metadata. In additionto deliberate privacy infringements, also by disclosure tothird parties, centralised data collections are vulnerable toaccidental leaks caused by inadvertent insider behaviour orattacks on the system. In a centralised system that employsuser-side encryption to protect the content (e. g. Pidder2), theprovider only observes metadata. When drawing inferencesfrom it, the provider is, however, in the best position possibleas it has a complete view of all users of the system at alltimes.In a P2P system, data might be replicated by friends (e. g.Safebook) or random strangers (e. g. SuperNova), but nosystematic accumulation of user data occurs.C. New Challenges from DecentralisationWhile removing the single point of data aggregationconstitutes a general advantage of more decentralised archi-tectures, there are also several drawbacks and new privacychallenges when building on a P2P network. In a centralisedsystem the users’ content is entrusted to a single partythat only gives access to entitled principals. Deliberatelyor as a side effect, this intermediation procedure hidesmetadata information from requesting users. Thus, whilethe operator of a centralised system can learn significantinformation from metadata (and content, if not encrypted),such information is hidden from everyone else.2http://pidder.com/In the decentralised systems we consider, several partiesare involved in storing and communicating user content, andauthorisation is performed via encryption of the data. Thisaggravates the problem of metadata privacy leakage becausemore parties can access such information as illustrated inseveral examples in Section IV. Unless the decentralisedsystem is carefully designed, it may admit similar privacyinvasions from peers in the system or third parties requestinglarge amounts of data as were possible by the centralprovider, thus weakening the privacy motivation for selectinga decentralised design.A new threat that arises from metadata in a decentralisedsystem is that of a more powerful friend adversary (an at-tacker that exploits its social ties to the user). One feature ofthe friend adversary contributes eminently to this problem:friends have more background knowledge related to the user– not necessarily acquired only via SNS communication –that enables them to accomplish effective inference attackseven on sparse raw data. If a friend for example knows abouta couple of preferred places the user usually visits, coarseIP address based location information suffices to determinethe user’s exact geographic location with high probability.Additionally, traffic analysis yields more information ina decentralised system where information is exchangeddirectly between communicating parties. The intermediationof very high volumes of communication via a few datacentres by centralised solutions serves to hide traffic patternsagainst outside adversaries (but not against the provider).In a fully decentralised setting, it is also more difficult toenforce a limit on the rate at which data can be requested.This may allow multiple third parties to collect significantamounts of public information from the DOSN. While suchinformation is by definition public, aggregating and indexinga massive amount of it can constitute a privacy invasion.D. Adversary ModelsWe distinguish between different adversaries in the con-text of SNS by their functional power resulting from theirrole and position in the network.Relay nodes and storage nodes can make use of theirspecial role and position in the network. Relay nodes caneasily observe all traffic they forward for other nodes, andstorage nodes can analyse the data entrusted to them as wellas log all requests they receive. Friends of a user – or othersocially close nodes like friends of a friend – can try toobtain more information than what the user chose to sharewith them. This can be done by exploiting the way datastorage, encryption and communication is implemented ina DOSN. Having additional background knowledge aboutthe user and possibly incentives for targeted attacks canturn a friend into a powerful attacker. Network sniffers whoobserve communication traffic at an arbitrary location in thenetwork constitute another category of possible attackers.335
  • 4. Harvesters are nodes that simply request data from thesystem to learn from the metadata they receive.In order to compare the decentralised system architecturewith the centralised SNS, we also list the central SNSprovider as an adversary. If present, it constitutes the mostpowerful attacker possible because it observes all contentand communication from all users of the system. Even ifthe content is encrypted, the provider still has a completepicture of communication traffic and content metadata.The adversary types discussed here have access to dif-ferent data. Here, we consider five categories of data thatan adversary may exploit. An adversary may learn accesspatterns, that is information about when content is requestedor modified. She may be able to access ciphertext repre-sentations of content. She may have intimate backgroundknowledge about the victim. She may see all or a fractionof the victim’s network traffic, and be able to relate itto the victim, which we refer to as micro-scale networkaccess. Finally, she may have a global but incomplete viewof network traffic in the system that we call macro-scale.Table IADVERSARY CAPABILITIESRelay Stor. Friend Sniff. Harv. Cent.Access patternCiphertextsBackgr. know.Net, microNet, macroWe summarise the capabilities of adversaries in Table I.From this overview, it can be seen that no single class ofattacker is as powerful as the central adversary, but unlessthe system design adequately addresses metadata concerns,the new attackers may be as powerful as the central one.Moreover, it is significantly easier to position oneself as anadversary in a decentralised system than in a centralised one.Collusions of several agents in the network also need tobe considered. This includes a single agent having severalof the roles outlined above (e. g. being both a friend and astorage node), and an adversary paying the cost to operatea large number of nodes.IV. INFERENCES FROM METADATABy metadata privacy leakages we mean disclosures ofsensitive personal information that do not stem from thecontent of published data but from properties of it (such assize or structure) or information generated while managingit (like communication flows).Possible inferences from metadata can invade a user’sprivacy in the same way as sensitive personal informationobtained from posted content. This includes identifyinginformation (directly or indirectly), general descriptive data(interests, political attitudes, health condition, etc.) as wellas more SNS-specific information such as social relation-ship data (number of friends, nature of relations, etc.), orbehavioural data (activity, location, etc.).While the DOSN approach is a substantial improvementcompared to common centralised systems, we want to il-lustrate which threats to the users’ privacy still remain andwhich new challenges arise. In the following we assumethe SNS to be decentralised with all content and commu-nication encrypted. We further assume that it is correctlyimplemented and perfectly protects the content. Besides that,we only consider a na¨ıve design of the DOSN and individualworst cases in order to give a comprehensive overview of thepossible problems. That implies that there are easy fixes forsome of the raised issues – this, however, is discussed laterin Section V. We have chosen not to study any particularproposed system, as source code for these is not generallyavailable, or only in beta version.A. Inferences from Stored ContentWhile encrypting the content solves many important pri-vacy concerns, there still remain possibilities of privacyleakages from the stored data. The size, structure, and(implicit) modification time of the ciphertexts may revealinformation that the user originally intended to hide. In thefollowing we give examples for each of these properties.1) Size: The size of an object’s ciphertext is an indicatorfor the content type of the stored object (e. g. like-flag,text, image, video). Additionally, characteristics such asan estimated word count or the length of a video can beinferred. Moreover, the size of larger files, such as video,may be reasonably unique (at least among objects postedduring a given time period or from a specific region). Suchuniqueness could allow the ISP of a regime sniffing thenetwork to trace which users have re-shared a forbiddenvideo on the DOSN by simply looking for posts of objectswith the exact same size as the video at issue. This typeof attack requires access to ciphertexts or network traffic,either micro-scale or macro-scale.2) Structure: An adversary may infer not only the sizeof single objects, but also statistical information about a setof objects of a certain kind, like the number of objects ina list (e. g. unencrypted documents with data references inPersona). Linking this knowledge with information aboutthe content type leads to another form of metadata privacyleakage – revealing the number of pictures in an album or thenumber of comments to a post. This attack requires accessto ciphertexts.Assume for example a user sharing pictures of her recentholidays with friends. Being asked for them at work, shedecides to grant access to a subset of them to a colleague.Inferring from the data structure that there are more objectsin the album than he can decrypt, the colleague learns theexact number of pictures that are hidden from him.336
  • 5. 3) Modification History: Once the storage location of aspecific object and some general information about its typeare identified by an adversary, monitoring the ciphertext forchanges reveals possibly sensitive information. The modi-fication history can for example tell something about thefrequency of a user’s status updates, the intensity of hercommenting activity, or other general usage patterns. Thisattack requires access pattern information, or, with polling,access to ciphertexts.Assume a friend observes frequent modifications of anobject, identified as the user’s encrypted status update rep-resentation. While the version displayed to her does notchange, the friend learns that she is excluded from at leastparts of the user’s updates.B. Inferences from Access Control MechanismsOne reason for a user to provide social relationshipinformation is the realisation of fine-grained access controlmechanisms. Depending on the implementation of thesemechanisms, the chosen access right settings might allowconclusions about a user’s social relationships to be drawn,as we outline in this section.1) Encryption Header: If an access control list (ACL)or other cryptographic key material is stored together withthe encrypted object – e. g. in a prepended header – thesize of this header can allow inferences about the identityor number of individuals who can access the content. Thisattack requires access to ciphertexts.Exploiting the same feature either for a central object ofa user – like her wall representation – or a representativeset of content objects belonging to her, can reveal the totalnumber of friends the user has.2) Key Distribution: Adding a new friend or revokingaccess rights of an existing friend will – depending on theencryption scheme and implementation – trigger re-keyingand/or key distribution mechanisms that can be observedeven by users who are not subject to the relationship changeitself. This attack requires micro-scale network access.When combined with background knowledge, significantlymore revealing conclusions can be drawn.Assume a user expels another user from her circle offriends. If a new group key is sent to her remaining friends,an adversary observing this revocation can, together withbackground information about the user’s social relationships,infer the specific person that was removed.3) Key Reutilisation: If the same key or encryptionheader is used for several objects, even adversaries whocannot decrypt the content, trivially learn that the sameaccess rights are in place for these objects. This informationcan be exploited in several ways. Mapping out relations fora large number of objects might allow inferences about thestructure of a user’s friend circle. A friend, who has accessto the objects and observes another user reacting to one ofthem (e. g. by a comment or a like-flag), immediately learnsthat this user has access to all the other objects as well.This attack requires access to ciphertexts. Backgroundknowledge enhances the attack.C. Inferences from Communication FlowsAn adversary can gain additional insights into a user’sactivities by capturing network traffic that is related to theuser. This might be performed by an external network snifferas well as persons related to the user, e. g. a node that ishosting some of the user’s content and observes the accesslogs.1) Direct Connections: SNS-related network traffic canalready on a very low protocol level (e. g. IP headerinformation) reveal sensitive information. In the case ofdirect communication with the user’s device – a commonscenario in P2P architectures – the IP address of this user istrivially obtained and can be tracked over time. This allowscorrelating with activities of the user on other internet ser-vices like file sharing or voice-over-IP (possible even whenlocated behind a NAT, see [15]), determining geographiclocation information about the user via geo-IP mappings, orinferring general usage patterns, such as the user’s onlinetimes or working habits (when does the user connect fromwhich device). This attack makes use of network access,either micro-scale or macro-scale. Background knowledgeallows more precise conclusions to be drawn.2) Content Requests: The access logs of content thatan adversary is hosting or providing to the user disclosethe user’s requests for specific objects – therefore actingas implicit reading receipts for new content – and mightallow general profiling of the user’s interests. Moreover,observing a set of users’ access patterns has the potentialto identify the ownership as well as possible access rightsof content objects. Companies may find it profitable tooperate a large number of storage nodes in order to monitorrequests. For instance, an insurance company may attemptto identify users accessing content posted in groups relatedto cancer or other diseases. This attack requires accesspattern information, or network access, either micro-scaleor macro-scale.3) Content Sharing: Storage nodes as well as sniffers thatcapture traffic to these can easily observe upload activity.This includes the frequency of changes to stored content andmight allow similar conclusions as sketched in Section IV-A.Furthermore, timing-based inferences are a possible way toinfer access rights if, for example, the distribution of keymaterial to a set of other users is observed shortly after anew content object was uploaded. By monitoring the uploadactivity of several users, sniffers might moreover learn own-ership relations between the stored content and the uploadingusers. This attack requires access pattern information, ornetwork access, either micro-scale or macro-scale.4) Control Messages: Depending on the protocol imple-mentation, specific user operations such as login, adding337
  • 6. friends, search requests, etc. can yield certain patterns ofcontrol messages a sniffer can observe and thus infer thekind of operation. The login procedure of a user maycomprise polling friends for updates that happened whilethe user was offline, communicating with storage nodes orsimilar characteristic sequences of administrative operations.This attack requires network access, either micro-scale ormacro-scale.V. COUNTERMEASURESThere exist several approaches to mitigate the describedmetadata privacy leakages but to the best of our knowledgeno comprehensive concept to cope with them all. In thefollowing, we discuss solutions from the DOSN literatureas well as from other fields.A. Stored ContentTo hinder inferences based on the size of ciphertexts,padding is one way to obfuscate the exact content length.That might help against fingerprinting objects by size butmay still allow inferences about the content type from theorder of magnitude. Another strategy could be to split upcontent objects into blocks of uniform sizes and hide theirconnection (e. g. [12]). The latter is, however, non-trivialespecially against an adversary performing communicationflow analysis.To hide the structure of composite or related storageobjects, an encryption scheme that conceals not only thecontent of the single objects but also indices and links is onesolution. To not solely rely on encryption for authorisation,but use it as one of several layers is another approach. Ifsemi-trusted storage nodes perform additional access rightvalidations before delivering encrypted objects, adversariesnot involved in storage cannot retrieve the ciphertext or themetadata information. However, this comes with the trade-off that the storage nodes must be given more explicit access-right information about the objects they keep. Additionally,dummy list entries and placeholder values for fixed fieldscan prevent an adversary from determining if values havebeen set in a user profile.Assuming an insider adversary model (e. g. the storagenode itself), hiding the modification history of a contentobject is very difficult. Baden et al. [10] propose to ob-fuscate the role of a storage object (e. g. a status updatedocument) for that reason. Another way to conceal user-triggered changes can be the introduction of noise in themodification process, but dummy-change operations can bequite costly in terms of performance for an SNS system.B. Access Control MechanismsMost of the presented access control related leakagescan be approached with more sophisticated cryptographicschemes. In Persona attribute-based encryption (ABE) isused to realise group encryption without encrypting thesymmetric content key with the public keys of all recipients.Groups defined by one user can even be reused by otherusers without them learning the explicit recipient list (andthereby enabling friend-of-a-friend access schemes). Theattribute access structure stored with the object, however,might still allow inferences about the audience, e. g. by theattribute names carrying semantic meaning. The PeerSoNproject suggest to use broadcast encryption schemes thathave hidden access structures. Encryption headers in thatcase do not reveal anything about the audience of thecontent.C. Communication FlowsIn the literature, several protection mechanisms againstcommunication flow analysis can be found – general onesas well as some explicitly related to SNS. Mix-networklike communication anonymisation is a central part of theapproach of the Safebook project. Information flows areobfuscated by routing them through a mix of socially relatednodes, starting by those that are assumed to be most trusted.Caching can also mitigate communication flow leakages byminimising message exchange in general and decorrelatingit from specific user actions. Obfuscation by noise – e. g.introducing dummy traffic – comes with the cost of highernetwork load but might be required in situations where othermeans are not applicable or not effective. Careful protocoldesign can help mitigate leakages as well by making controlmessages indistinguishable from content bearing messages.Another approach is to make the DOSN protocol andcommunication patterns difficult to distinguish from someexisting high-volume P2P protocol, such as BitTorrent.VI. CONCLUSIONTable II summarises the critical metadata in DOSNs andpossible privacy leakages identified in Section IV as wellas the approaches to tackle these problems discussed inSection V.We conclude that while DOSNs have great potential tomitigate inherent privacy flaws of today’s centralised SNS,simply encrypting the content is not sufficient. Metadatainformation like inferences from storage objects, accesscontrol mechanisms or traffic has the potential to expose theuser to severe privacy threats. Furthermore, new adversariesenter the stage when the SNS has no single provider, asthe decentralised network architecture exhibits more diversepoints of attack. Some of these attacks are easy to protectagainst but when implementing a DOSN, these issues haveto be considered in a systematic manner in order to offercomprehensive privacy protection.For future work, we plan to further investigate the specialcharacteristics of the friend-adversary model. The aim isto gain a better insight into which inferences are possiblefor a socially close attacker in a DOSN setting where onlysparse sensitive data but extensive background knowledge is338
  • 7. Table IISUMMARY OF METADATA LEAKAGESCategory Metadata Possible Inferences CountermeasuresStored Content Size content type (image, text), content property (word-count), content fingerprintpadding, uniform block sizesStructure list-sizes (number of comments, number of picturesin an album, ...)clean encryption headers, two-layer authorisation,placeholder entriesModification History frequencies of status updates, commenting, etc. noise (dummy change operations)Access ControlMechanismsEncryption Header content audience size or even identities, number offriendsadapted encryption schemes (attribute-basedencryption, broadcast encryption)Key Distribution friend status changesKey Reutilisation same content audienceCommunicationFlowsDirect Connections IP address (usage of other services, geographic loca-tion), usage patterns (online times, working habits)communication anonymisation (mix-networks),caching, noise (dummy traffic), careful protocoldesignContent Requests reading receipts, interest profiling, information ob-ject ownershipContent Sharing upload activities, access rights (by timing-basedattacks), information object ownershipControl Messages specific user operations (login, friend management)available. Furthermore, we plan to evaluate the efficacy ofthe discussed countermeasures for concrete DOSN imple-mentations when more mature code becomes available.VII. ACKNOWLEDGEMENTSThis research has been funded by the Swedish Founda-tion for Strategic Research grant SSF FFL09-0086 and theSwedish Research Council grant VR 2009-3793.REFERENCES[1] S. G¨urses and al., “SPION D2.1 - State of the Art,” in SBOSecurity and Privacy for Online Social Networks, S. G¨urses,Ed., 2011.[2] R. Gross and A. Acquisti, “Information revelation and privacyin online social networks,” in WPES. ACM, 2005, pp. 71–80.[3] G. Danezis and R. Clayton, “Introducing traffic analysis,”in Digital Privacy: Theory, Technologies, and Practices,A. Acquisti, S. Gritzalis, C. Lambrinoudakis, and S. D. C.di Vimercati, Eds. Auerbach, 2007, ch. 5, pp. 95–116.[4] T. Paul, S. Buchegger, and T. Strufe, “Decentralized socialnetworking services,” in Trustworthy Internet, L. Salgarelli,G. Bianchi, and N. Blefari-Melazzi, Eds. Springer, 2011,ch. 14, pp. 187–199.[5] B. Krishnamurthy and C. E. Wills, “On the leakage of per-sonally identifiable information via online social networks,”SIGCOMM Comput. Commun. Rev., vol. 40, pp. 112–117,2010.[6] S. Buchegger, D. Schi¨oberg, L.-H. Vu, and A. Datta, PeerSoN:P2P social networking: early experiences and insights. ACMPress, 2009.[7] L. M. Aiello and G. Ruffo, “Secure and flexible frameworkfor decentralized social network services,” IEEE PERCOM,pp. 594–599, 2010.[8] L. A. Cutillo, R. Molva, and T. Strufe, “Safebook: A privacy-preserving online social network leveraging on real-life trust,”IEEE Communications Magazine, vol. 47, no. 12, pp. 94–101,2009.[9] R. Sharma and A. Datta, “Supernova: Super-peers basedarchitecture for decentralized online social networks,” ArXive-prints, 2011.[10] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, andD. Starin, “Persona: an online social network with user-defined privacy,” SIGCOMM Comput. Commun. Rev., vol. 39,pp. 135–146, 2009.[11] O. Bodriagov and S. Buchegger, “Encryption for peer-to-peersocial networks,” in IEEE SPSN, 2011.[12] J. Anderson, C. Diaz, J. Bonneau, and F. Stajano, “Privacy-enabling social networking over untrusted networks,” inWOSN. ACM, 2009.[13] M. Chew, D. Balfanz, and B. Laurei, “(Under)mining privacyin social networks,” in IEEE W2SP, 2008.[14] A. Lenhart and M. Madden, “Social networking websites andteens: An overview,” Pew Internet project data memo, vol. 13,2007.[15] S. L. Blond, Z. Chao, A. Legout, K. Ross, and W. Dabbous,“I know where you are and what you are sharing: ExploitingP2P communications to invade users’ privacy,” in InternetMeasurement Conf., 2011.339