Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy
Mathieu d'Aquin, Salman Elahi, Enrico Motta
Knowledge Media Institute, The Open University, UK
Stating the obvious
Personal information exchange on the Web is:
- Big
- Heterogeneous
- Distributed
- Fragmented
- Sometimes implicit
Challenges to individuals
- Lack of control over personal information
- In sum, we don’t know the most important things about our personal data:
  - What are all the websites that know my e-mail address?
  - What does amazon.co.uk or the website of my favorite airline know about me?
Why this is important
- Because these things are useful to know in general
- Because these things can tell us a lot about our own behavior and our attitudes towards information sharing and exchange
- Because this behavior has strong implications for privacy and defines our trust relationships with websites online
So, what do we do?
- Unrestricted monitoring of an individual user's information exchange on the Web
- Building semantically represented, processable datasets of what was shared and with whom
- Analyzing these datasets to build models of:
  - the user’s behavior related to privacy
  - the levels of trust given to websites
  - the levels of criticality associated with different pieces of data
[Figure: architecture diagram. Components: local Web agents (e.g., browser), local logging proxy, external Web sites. HTTP requests and HTTP responses pass through the proxy. Outputs: Web exchange RDF logs (described with an HTTP ontology), interaction patterns, personal information.]
<Request rdf:about="#request-1257949232709-1257949233757">
  <startedAt>1257949232709</startedAt>
  <endedAt>1257949233757</endedAt>
  <origin rdf:resource="127.0.0.1" />
  <onPort>80</onPort>
  <toHost rdf:resource="api.facebook.com" />
  <method rdf:resource="POST"/>
  <toURL rdf:resource="http://api.facebook.com/restserver.php" />
  <HTTPVersion rdf:resource="HTTP-1.1" />
  <Host rdf:resource="api.facebook.com" />
  <Content-Type rdf:resource="application--x-www-form-urlencoded" />
  <User-Agent rdf:resource="Mozilla--5.0_(Macintosh;_U;_Intel_Mac_OS_X;_en)_AppleWebKit--526.9+_(KHTML._like_Gecko)_AdobeAIR--1.5.2" />
  <Referer rdf:resource="app:--TweetDeck.swf" />
  <X-Flash-Version rdf:resource="10.0.32.18" />
  <Accept rdf:resource="*--*" />
  <Accept-Language rdf:resource="en-us" />
  <Accept-Encoding rdf:resource="gzip._deflate" />
  <Cookie rdf:resource="__qca=1239783354-42963995-12118014;___utma=87286159.357565716.1239892196.1252686326.1257582307.16;___utmz=87286159.1257582307.16.16.utmccn=(referral)|utmcsr=facebook.com|utmcct=--tos.php|utmcmd=referral;_c_user=605559235;_cur_max_lag=2;_datr=1239398136-0711bf1215821a9c58848bf0ffd0020ec8450cfa7154b9e228c29;_lsd=P3Zpn;_lxe=metm.daquin%40virgin.net;_lxs=3;_s_vsn_facebookpoc_1=9874874320812" />
  <Content-Length rdf:resource="984" />
  <Connection rdf:resource="keep-alive" />
  <Proxy-Connection rdf:resource="keep-alive" />
  <data rdf:resource="data_c22b691f691dabd5ae893b9cb2f8add7" />
  <response>
    <Response rdf:about="#response-1257949232709--1257949233757">
      <HTTPVersion rdf:resource="HTTP--1.0" />
      <responseCode rdf:resource="200_OK" />
      <Cache-Control rdf:resource="private._no-store._no-cache._must-revalidate._post-check=0._pre-check=0" />
      <Content-Type rdf:resource="application--json" />
      <Expires rdf:resource="Mon._26_Jul_1997_05:00:00_GMT" />
      <Pragma rdf:resource="no-cache" />
      <Content-Encoding rdf:resource="gzip" />
      <Content-Length rdf:resource="5943" />
      <X-Cache rdf:resource="MISS_from_roeburn.open.ac.uk" />
      <Proxy-Connection rdf:resource="keep-alive" />
      <data rdf:resource="data_5ccf6054fd0fba3ee7eb444e178eaf19" />
    </Response>
  </response>
</Request>
- Run over a period of 2.5 months, the proxy yielded around 100 million triples, representing about 3 million HTTP requests.
- The log encodes all the information related to HTTP requests and responses.
- Data sent and data received are stored separately.
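The header values in the log excerpt appear to be rewritten before being stored as resource identifiers: "/" shows up as "--", "," as ".", and spaces as "_" (for example, "gzip._deflate" and "application--json"). The following is a minimal sketch of such an encoding, inferred only from the example values; the function name and the substitution table are assumptions, not the authors' implementation.

```python
# Substitutions inferred from the logged example values only (an assumption):
# "/" -> "--", "," -> ".", " " -> "_".
_SUBSTITUTIONS = (("/", "--"), (",", "."), (" ", "_"))


def encode_header_value(value: str) -> str:
    """Rewrite an HTTP header value in the style seen in the RDF log excerpt."""
    for raw, encoded in _SUBSTITUTIONS:
        value = value.replace(raw, encoded)
    return value


if __name__ == "__main__":
    for header_value in ("application/json", "gzip, deflate", "*/*"):
        print(header_value, "->", encode_header_value(header_value))
```

Note that this rewriting is lossy: a "." in the stored value may come from either a "." or a "," in the original header, so the original value cannot always be recovered from the log.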
Focusing on personal data exchange
- Extract information sent through the parameters of HTTP requests. Examples:
http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F
format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Ftime%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20message%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%20type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20FROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27newsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28created%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=12565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9e1a5ec6c5768%2D605559235
- Map this data onto a representation of a user profile (a set of attributes of personal data)
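As an illustration of the extraction step, the sketch below pulls the receiving domain and the decoded parameters out of the first example URL using Python's standard `urllib.parse` module. The function name `extract_parameters` and the shape of the returned dictionary are assumptions made for this example; the slides do not say how the authors' extractor is implemented.

```python
from urllib.parse import parse_qsl, urlsplit


def extract_parameters(url: str) -> dict:
    """Return the receiving domain and the percent-decoded query parameters."""
    parts = urlsplit(url)
    # parse_qsl decodes percent-escapes; dict() keeps the last value if a
    # parameter name is repeated, which is enough for this sketch.
    parameters = dict(parse_qsl(parts.query, keep_blank_values=True))
    return {"domain": parts.hostname, "parameters": parameters}


example_url = (
    "http://uk.search.yahoo.com/beacon/module"
    "?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F"
)
result = extract_parameters(example_url)

if __name__ == "__main__":
    print(result["domain"])
    for name, value in result["parameters"].items():
        print(name, "=", value)
```

The same `parse_qsl` call can be applied directly to a form-encoded string such as the second example, since it uses the same `name=value&...` syntax.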
Tool used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data.
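Once such mappings exist, applying them to the extracted parameters is mechanical. The sketch below assumes a mapping is a table from (domain, parameter name) to a profile attribute; this table format, the attribute names, and the `example.org` entries are all hypothetical, chosen only to show how a profile could be reconstructed from logged data.

```python
# Hypothetical mapping table: (domain, parameter name) -> profile attribute.
MAPPINGS = {
    ("uk.search.yahoo.com", "p"): "search query",
    ("example.org", "email"): "e-mail address",
}


def map_to_profile(observations, mappings):
    """Group observed values by profile attribute.

    observations: iterable of (domain, parameter name, value) tuples.
    Returns {attribute: {(value, domain), ...}}; unmapped parameters are skipped.
    """
    profile = {}
    for domain, parameter, value in observations:
        attribute = mappings.get((domain, parameter))
        if attribute is None:
            continue  # no mapping defined for this parameter
        profile.setdefault(attribute, set()).add((value, domain))
    return profile


observations = [
    ("uk.search.yahoo.com", "p", "idiocracy"),
    ("example.org", "email", "user@example.org"),
    ("example.org", "session", "abc123"),  # not mapped, so ignored
]
profile = map_to_profile(observations, MAPPINGS)

if __name__ == "__main__":
    for attribute, values in profile.items():
        print(attribute, sorted(values))
```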
What this tells us about trust and criticality of data
- 36 attributes, 1,080 values, sent to 123 domains
- A model of which piece of personal information was sent where (it can answer the questions raised earlier)
- Taking the point of view of an external observer, we can derive an observed model of trust and criticality of data:
  - If a piece of data is critical to you and you give it to Bob, you must trust Bob
  - If you give a piece of data to many untrusted people, you probably don’t consider it critical
- The goal is to help the user better understand his own behavior
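A model of what was sent where makes the two questions raised earlier ("which websites know my e-mail address?" and "what does a given website know about me?") simple lookups. The sketch below indexes (attribute, value, domain) records in both directions; the record format and the sample data are hypothetical.

```python
from collections import defaultdict


def build_indexes(records):
    """Index (attribute, value, domain) records by attribute and by domain."""
    by_attribute = defaultdict(set)  # attribute -> domains that received it
    by_domain = defaultdict(set)     # domain -> (attribute, value) pairs it received
    for attribute, value, domain in records:
        by_attribute[attribute].add(domain)
        by_domain[domain].add((attribute, value))
    return by_attribute, by_domain


# Hypothetical records, for illustration only.
records = [
    ("e-mail address", "user@example.org", "shop.example"),
    ("e-mail address", "user@example.org", "airline.example"),
    ("postcode", "AB1 2CD", "shop.example"),
]
by_attribute, by_domain = build_indexes(records)

if __name__ == "__main__":
    print("Sites that know the e-mail address:", sorted(by_attribute["e-mail address"]))
    print("What shop.example knows:", sorted(by_domain["shop.example"]))
```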
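The model formally
- Trust in a domain is the maximum criticality of the data it received: $T(w) = \max_{d \in D(w)} C(d)$, where $D(w)$ is the set of data items sent to website $w$
- Criticality of a piece of data: $C(d) = \frac{1}{1 + \sum_{w \in W(d)} (1 - T(w))}$, where $W(d)$ is the set of websites that received $d$
- Obviously, these two formulas are interdependent. They are treated as a sequence, with all values initialized at 0.5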
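The two interdependent formulas can be evaluated by repeated substitution. The sketch below makes several assumptions that the slide does not state: the criticality formula is read as 1 / (1 + Σ (1 - trust)), both quantities are updated together from the previous iteration's values, and the number of iterations is fixed. The function name and the sample data are hypothetical.

```python
def observed_model(sent, iterations=50, initial=0.5):
    """Iterate the trust/criticality formulas from initial values of 0.5.

    sent: dict mapping each data item to the set of domains that received it.
    Returns (trust, criticality) dictionaries.
    """
    # Invert the relation: domain -> data items it received.
    received = {}
    for item, domains in sent.items():
        for domain in domains:
            received.setdefault(domain, set()).add(item)

    trust = {domain: initial for domain in received}
    criticality = {item: initial for item in sent}

    for _ in range(iterations):
        # Trust in a domain = max criticality of the data it received.
        new_trust = {
            domain: max(criticality[item] for item in items)
            for domain, items in received.items()
        }
        # Criticality = 1 / (1 + sum of (1 - trust) over the receiving domains).
        new_criticality = {
            item: 1.0 / (1.0 + sum(1.0 - trust[domain] for domain in domains))
            for item, domains in sent.items()
        }
        trust, criticality = new_trust, new_criticality
    return trust, criticality


# Hypothetical data: the e-mail address went to three sites, the card number to one.
sent = {
    "e-mail address": {"a.example", "b.example", "c.example"},
    "card number": {"a.example"},
}
trust, criticality = observed_model(sent)

if __name__ == "__main__":
    print(trust)
    print(criticality)
```

In this example the card number, shared with a single site, ends up with a higher observed criticality than the widely shared e-mail address, and the site that received it ends up with higher observed trust than the others.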
Interacting with the model
- Expose the user to his own behavior as observed, so that he can try to align it with his intended behavior
What we can do with this
- Help a user understand his own data exchange
- Compare websites and data in terms of the observed trust and criticality
- “Correct” the model by re-aligning it with the intended behavior
- Detect fundamental conflicts between the observed behavior and the intended behavior
- Observe correlations in the data
Where that leads us
- First tools exploiting logs of personal Web activity
- Demonstrates the need for better approaches to personal information management, seen as personal Web data exchange
- Need to exploit and integrate local and external sources of data to create new mechanisms supporting individuals in interpreting, understanding and managing their information online