Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy


Presentation at the SPOT 2010 workshop on Privacy and Trust on the Social and Semantic Web.

Published in: Technology, Education

  1. 1. Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy<br />Mathieu d'Aquin, Salman Elahi, Enrico Motta<br />Knowledge Media Institute, The Open University, UK<br />
  2. 2. Stating the obvious<br />Personal information exchange on the Web is <br />Big<br />Heterogeneous<br />Distributed<br />Fragmented<br />Sometimes implicit<br />
  3. 3. Challenges to individuals<br />Lack of control over personal information<br />In sum, we don’t know the most important things about our personal data<br />What are all the websites that know my e-mail address?<br />What does the website of my favorite airline know about me?<br />
  4. 4. Why this is important<br />Because these things are useful to know in general<br />Because these things can tell us a lot about our own behavior and our attitudes towards information sharing and exchange<br />Because this behavior has strong implications in terms of privacy and defines our trust relationships with websites online<br />
  5. 5. So, what do we do?<br />Unrestricted monitoring of information exchange on the Web by an individual user<br />Build a semantically represented and processable dataset of what was shared and with whom<br />Analyze this dataset to build models of the user’s behavior related to privacy: <br />levels of trust given to websites <br />levels of criticality associated with different pieces of data <br />
  6. 6. Local Logging<br />[Diagram labels: Local Web Agents (e.g., browser); Proxy; External Web Sites; HTTP Requests; HTTP Responses; Web Exchange RDF Logs; HTTP Ontology; Interaction Patterns; Personal Information]<br />
  7. 7. <Request rdf:about="#request-1257949232709-1257949233757"><br /> <startedAt>1257949232709</startedAt><br /> <endedAt>1257949233757</endedAt><br /> <origin rdf:resource="" /><br /> <onPort>80</onPort><br /> <toHost rdf:resource="" /><br /> <method rdf:resource="POST"/><br /> <toURL rdf:resource="" /><br /> <HTTPVersion rdf:resource="HTTP-1.1" /><br /> <Host rdf:resource="" /><br /> <Content-Type rdf:resource="application--x-www-form-urlencoded" /><br /> <User-Agent rdf:resource="Mozilla--5.0_(Macintosh;_U;_Intel_Mac_OS_X;_en)_AppleWebKit--526.9+_(KHTML._like_Gecko)_AdobeAIR--1.5.2" /><br /> <Referer rdf:resource="app:--TweetDeck.swf" /><br /> <X-Flash-Version rdf:resource="" /><br /> <Accept rdf:resource="*--*" /><br /> <Accept-Language rdf:resource="en-us" /><br /> <Accept-Encoding rdf:resource="gzip._deflate" /><br /> <Cookie rdf:resource="__qca=1239783354-42963995-12118014;___utma=87286159.357565716.1239892196.1252686326.1257582307.16;___utmz=87286159.1257582307.16.16.utmccn= (referral)||utmcct=--tos.php|utmcmd=referral;_c_user=605559235;_cur_max_lag=2;_datr=1239398136-0711bf1215821a9c58848bf0ffd0020ec8450cfa7154b9e228c29;_lsd=P3Zpn;;_lxs=3;_s_vsn_facebookpoc_1=9874874320812" /><br /> <Content-Length rdf:resource="984" /><br /> <Connection rdf:resource="keep-alive" /><br /> <Proxy-Connection rdf:resource="keep-alive" /><br /> <data rdf:resource="data_c22b691f691dabd5ae893b9cb2f8add7" /><br /> <response><br /> <Response rdf:about="#response-1257949232709--1257949233757"><br /> <HTTPVersion rdf:resource="HTTP--1.0" /><br /> <responseCode rdf:resource="200_OK" /><br /> <Cache-Control rdf:resource="private._no-store._no-cache._must-revalidate._post-check=0._pre-check=0" /><br /> <Content-Type rdf:resource="application--json" /><br /> <Expires rdf:resource="Mon._26_Jul_1997_05:00:00_GMT" /><br /> <Pragma rdf:resource="no-cache" /><br /> <Content-Encoding rdf:resource="gzip" /><br /> <Content-Length rdf:resource="5943" /><br /> <X-Cache rdf:resource="" /><br /> <Proxy-Connection rdf:resource="keep-alive" /><br /> <data rdf:resource="data_5ccf6054fd0fba3ee7eb444e178eaf19" /><br /> </Response></response><br /></Request><br />Running the logger over a period of 2.5 months yielded around 100 million triples, representing about 3 million HTTP requests. <br />The log encodes all the information related to HTTP requests and responses.<br />Data sent and received are stored separately.<br />
  8. 8. Basic analytics<br />
  9. 9. Focusing on personal data exchange<br />Extract information sent through parameters of HTTP Requests<br /><br />format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Ftime%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20message%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%20type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20FROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27newsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28created%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=12565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9e1a5ec6c5768%2D605559235<br />Map this data onto a representation of a user profile (set of attributes of personal data)<br />
  10. 10. Tool used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data<br />
  11. 11. What this tells us about Trust and Criticality of data<br />36 attributes, 1,080 values, sent to 123 domains<br />A model of which piece of personal information was sent where (it can answer the questions raised earlier)<br /> Taking the point of view of an external observer, we can derive an observed model of trust and criticality of data<br />If this piece of data is critical to you and you give it to bob, you must trust bob<br />If you give this piece of data to many untrusted people, you probably don’t consider it critical<br />The goal is to help users better understand their own behavior<br />
  12. 12. The model formally<br />Trust in a domain = <br />max of the criticality of the data it received<br />Criticality of a piece of data = <br />1 / (1 + Σ (1 − trust in each website that received the data))<br />These two formulas are interdependent, so we treat them as an iterative sequence, with all initial values set to 0.5<br />
  13. 13. Interacting with the model<br />Expose users to their own behavior as observed, so that they can try to align it with their intended behavior <br />
  14. 14. What we can do with this<br />Help a user understand his own data exchange<br />Compare websites and data in terms of the observed trust and criticality<br />“Correct” the model by re-aligning it with the intended behavior<br />Detect fundamental conflicts between the observed behavior and the intended behavior<br />Observe correlations in the data<br />
  15. 15. Where that leads us<br />First tools exploiting logs of personal Web activity <br />Demonstrates the need for better approaches to personal information management, seen as personal Web data exchange <br />Need to exploit and integrate local and external sources of data to create new mechanisms supporting individuals in interpreting, understanding and managing their information online<br />
  16. 16. Thank you<br /><br />@mdaquin<br />
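
The extraction step on slide 9 (pulling data out of the parameters of HTTP requests and mapping it onto profile attributes) can be sketched roughly as follows. This is only an illustration, not the tool shown on slide 10: the function names, the `PARAM_TO_ATTRIBUTE` table, the sample e-mail value and the `site.example` domain are all assumptions made for the example.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from request parameter names to profile attributes.
# In the talk, such mappings are created by the user with a dedicated tool.
PARAM_TO_ATTRIBUTE = {
    "email": "e-mail address",
    "session_key": "session identifier",
}


def extract_parameters(body):
    """Decode an application/x-www-form-urlencoded body into {name: [values]}."""
    return parse_qs(body, keep_blank_values=True)


def map_to_profile(domain, body, mapping=PARAM_TO_ATTRIBUTE):
    """Return (attribute, value, domain) triples for the parameters we can map."""
    observations = []
    for name, values in extract_parameters(body).items():
        attribute = mapping.get(name)
        if attribute is not None:
            observations.extend((attribute, value, domain) for value in values)
    return observations


# Made-up request body in the same style as the one on slide 9.
body = "format=JSON&method=fql%2Emultiquery&v=1%2E0&email=jane%40example.org"
params = extract_parameters(body)
observations = map_to_profile("site.example", body)
```

Percent-encoded names and values are decoded by `parse_qs`, so a parameter sent as `session%5Fkey` would be matched under the name `session_key`.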
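
The two interdependent formulas on slide 12 can be iterated as in the sketch below. Several things here are assumptions rather than details from the slides: the function name `observed_model`, the order of the two updates within each round (criticality first, then trust), the fixed number of iterations, and the example data.

```python
def observed_model(sent, iterations=50, initial=0.5):
    """Iterate the observed trust/criticality model.

    `sent` maps each piece of data to the set of domains that received it.
    Returns two dicts: trust per domain and criticality per piece of data.
    """
    domains = set().union(*sent.values())
    trust = dict.fromkeys(domains, initial)
    criticality = dict.fromkeys(sent, initial)
    for _ in range(iterations):
        # Criticality = 1 / (1 + sum of (1 - trust) over the receiving domains).
        criticality = {
            item: 1.0 / (1.0 + sum(1.0 - trust[d] for d in receivers))
            for item, receivers in sent.items()
        }
        # Trust in a domain = max criticality of the data it received.
        trust = {
            d: max(criticality[item] for item, receivers in sent.items() if d in receivers)
            for d in domains
        }
    return trust, criticality


# Made-up example: an e-mail address shared widely, a passport number shared once.
sent = {
    "email": {"a.example", "b.example", "c.example"},
    "passport": {"a.example"},
}
trust, criticality = observed_model(sent)
```

With this example the widely shared e-mail address ends up less critical than the passport number, and the domain that received the passport number ends up more trusted than the others, which matches the intuition stated on slide 11.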