Making sense of users' Web activities


Keynote at the Personal Semantic Data (PSD) workshop, collocated with EKAW 2010


  1. Making sense of Users’ Web activities<br />Mathieu d'Aquin<br />Knowledge Media Institute, The Open University, UK<br />
  2. A bit of sci-fi to start with<br />“… from people who are afraid that someone else knows information that they don’t and is gaining an unfair advantage by it. For all the claims one hears about the liberating impact of the data-net, the truth is that it wished on most of us a brand-new reason for paranoia” <br /> John Brunner, <br /> The Shockwave Rider, 1975<br />
  3. What we don’t know that they know<br />Simple important things:<br />What are all the websites that know my e-mail address?<br />And more complex important things…<br />What does amazon.co.uk or the website of my favorite airline know about me?<br />
  4. Is this Personal Information Management?<br />Yes, but…<br />Looking at an individual user’s information exchange and, more generally, activities on the Web<br />This is:<br />Big<br />Heterogeneous<br />Distributed<br />Fragmented<br />Sometimes implicit<br />And hard to collect!<br />
  5. So, what do we do?<br />Unrestricted monitoring of information exchange on the Web by an individual user<br />
  6. Local Logging<br />[Diagram: Local Web Agents (e.g., browser) and External Web Sites exchange HTTP Requests and HTTP Responses through a Proxy; the Proxy records each Web Exchange as RDF Logs]<br />
  7. <Request rdf:about="#request-1257949232709-1257949233757"><br /> <startedAt>1257949232709</startedAt><br /> <endedAt>1257949233757</endedAt><br /> <origin rdf:resource="127.0.0.1" /><br /> <onPort>80</onPort><br /> <toHost rdf:resource="api.facebook.com" /><br /> <method rdf:resource="POST"/><br /> <toURL rdf:resource="http://api.facebook.com/restserver.php" /><br /> <HTTPVersion rdf:resource="HTTP-1.1" /><br /> <Host rdf:resource="api.facebook.com" /><br /> <Content-Type rdf:resource="application--x-www-form-urlencoded" /><br /> <User-Agent rdf:resource="Mozilla--5.0_(Macintosh;_U;_Intel_Mac_OS_X;_en)_AppleWebKit--526.9+_(KHTML._like_Gecko)_AdobeAIR--1.5.2" /><br /> <Referer rdf:resource="app:--TweetDeck.swf" /><br /> <X-Flash-Version rdf:resource="10.0.32.18" /><br /> <Accept rdf:resource="*--*" /><br /> <Accept-Language rdf:resource="en-us" /><br /> <Accept-Encoding rdf:resource="gzip._deflate" /><br /> <Cookie rdf:resource="__qca=1239783354-42963995-12118014;___utma=87286159.357565716.1239892196.1252686326.1257582307.16;___utmz=87286159.1257582307.16.16.utmccn=(referral)|utmcsr=facebook.com|utmcct=--tos.php|utmcmd=referral;_c_user=605559235;_cur_max_lag=2;_datr=1239398136-0711bf1215821a9c58848bf0ffd0020ec8450cfa7154b9e228c29;_lsd=P3Zpn;_lxe=metm.daquin%40virgin.net;_lxs=3;_s_vsn_facebookpoc_1=9874874320812" /><br /> <Content-Length rdf:resource="984" /><br /> <Connection rdf:resource="keep-alive" /><br /> <Proxy-Connection rdf:resource="keep-alive" /><br /> <data rdf:resource="data_c22b691f691dabd5ae893b9cb2f8add7" /><br /> <response><br /> <Response rdf:about="#response-1257949232709--1257949233757"><br /> <HTTPVersion rdf:resource="HTTP--1.0" /><br /> <responseCode rdf:resource="200_OK" /><br /> <Cache-Control rdf:resource="private._no-store._no-cache._must-revalidate._post-check=0._pre-check=0" /><br /> <Content-Type rdf:resource="application--json" /><br /> <Expires rdf:resource="Mon._26_Jul_1997_05:00:00_GMT" /><br /> <Pragma rdf:resource="no-cache" /><br /> <Content-Encoding rdf:resource="gzip" /><br /> <Content-Length rdf:resource="5943" /><br /> <X-Cache rdf:resource="MISS_from_roeburn.open.ac.uk" /><br /> <Proxy-Connection rdf:resource="keep-alive" /><br /> <data rdf:resource="data_5ccf6054fd0fba3ee7eb444e178eaf19" /><br /> </Response></response><br /></Request><br />2.5 months = <br />3 Million HTTP Requests<br />100 Million RDF Triples<br />
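The `startedAt` and `endedAt` values of a log entry like the one on this slide can be turned into something readable. The sketch below assumes both are Unix epoch times in milliseconds, which the slides do not state; `request_summary` is a hypothetical helper name, not part of the logging tool.

```python
from datetime import datetime, timezone


def request_summary(started_ms: int, ended_ms: int) -> dict:
    """Summarise one logged request (assumes epoch-millisecond timestamps)."""
    # Integer division avoids floating-point rounding on the seconds part.
    started = datetime.fromtimestamp(started_ms // 1000, tz=timezone.utc)
    return {
        "started_utc": started.isoformat(),
        "duration_ms": ended_ms - started_ms,
    }


if __name__ == "__main__":
    # Values copied from the <startedAt> / <endedAt> elements of the log entry.
    print(request_summary(1257949232709, 1257949233757))
```

Under that assumption, the request shown on the slide took about one second from start to end.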
  8. What this talk is about<br />Using ontologies and external datasets to:<br />Generate abstractions of this low-level data<br />Enrich it with external knowledge and models<br />Interpret it to give back useful information to the user<br />
  9. Online Activities Ontology <br />HTTP Ontology <br />Parameters and Website info.<br />Personal Information<br />Web Site Information<br />Trust Model<br />Location Information<br />
  10. HTTP Ontology<br />Built bottom-up from the data<br />Can help infer simple things from it<br />And answer questions through SPARQL queries<br />InternetPoint<br /> time: DateTime<br />origine<br />Request<br /> time: DateTime<br />toURL: URL<br />referer: URL<br />toHost<br />WebHost<br /> domain: String<br />User-Agent<br />WebAgent<br /> ID: String<br />hasResponse<br />Content<br />Content-Type<br />Response<br /> time: DateTime<br />responseCode: int<br />DataFile<br /> ID: String<br />Content<br />Content-Type<br />DataFormat<br />MineID: String<br />
  11. Simple examples<br />Requests per time of day<br />Requests per User Agent<br />Requests per Host<br />
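The three aggregations on this slide amount to counting requests by a key. The talk answers them with SPARQL over the RDF logs; the sketch below shows the same idea in plain Python, assuming each request has already been loaded as a dictionary with `startedAt`, `toHost` and `User-Agent` keys (names taken from the log format) and that `startedAt` is in epoch milliseconds. Both function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def requests_per_hour(records):
    """Count requests per hour of day (UTC, assuming epoch-ms timestamps)."""
    return Counter(
        datetime.fromtimestamp(r["startedAt"] // 1000, tz=timezone.utc).hour
        for r in records
    )


def requests_per(records, key):
    """Count requests per value of `key`, e.g. 'toHost' or 'User-Agent'."""
    return Counter(r[key] for r in records)


if __name__ == "__main__":
    # Illustrative records only; hosts other than api.facebook.com are made up.
    sample = [
        {"startedAt": 1257949232709, "toHost": "api.facebook.com", "User-Agent": "AdobeAIR"},
        {"startedAt": 1257949233000, "toHost": "api.facebook.com", "User-Agent": "AdobeAIR"},
        {"startedAt": 1257952832709, "toHost": "www.example.org", "User-Agent": "Browser"},
    ]
    print(requests_per_hour(sample))
    print(requests_per(sample, "toHost"))
```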
  12. Integrating basic info<br />Domain name<br />IP<br />Location<br />“What!? What requests have I made to websites in Nigeria? What Data did I send?”<br />Can be answered in a SPARQL query<br />
  13. More information about websites<br />The linked data cloud is full of it.<br />Using the domain name to address this information.<br />CONSTRUCT <br />{<domain_name> ?p ?y}<br />WHERE {{{?x dbpedia:homepage <http://domain_name>}.<br /> {?x ?p ?y}}<br />UNION {{?x owl:sameAs ?z}.<br /> {?x dbpedia:homepage <http://domain_name>}.<br /> {?x ?p ?y}}}<br />
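Since the query on this slide is a template with `domain_name` as a placeholder, one way to use it is to fill it in per logged host. The sketch below is only an illustration: `build_site_query` is a hypothetical helper, the template is a lightly regrouped copy of the slide's query, and the `PREFIX` declarations for `dbpedia:` and `owl:` are left out because the slide does not give them.

```python
import re

# Template following the slide's query; DOMAIN stands in for "domain_name".
QUERY_TEMPLATE = """CONSTRUCT { <DOMAIN> ?p ?y }
WHERE {
  { ?x dbpedia:homepage <http://DOMAIN> . ?x ?p ?y }
  UNION
  { ?x owl:sameAs ?z . ?x dbpedia:homepage <http://DOMAIN> . ?x ?p ?y }
}"""


def build_site_query(domain: str) -> str:
    """Fill the query template for one domain name taken from the logs."""
    # Reject anything that is not a plain host name, so that logged data
    # cannot inject extra SPARQL into the query.
    if not re.fullmatch(r"[A-Za-z0-9.-]+", domain):
        raise ValueError(f"not a plain domain name: {domain!r}")
    return QUERY_TEMPLATE.replace("DOMAIN", domain)


if __name__ == "__main__":
    print(build_site_query("www.youtube.com"))
```

The validation step matters here because the domain names come from logged traffic rather than from a trusted source.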
  14. Examples<br />Google Services<br />Entertainment Websites<br />Web Analytics<br />Internet Search Engine<br />subject/category<br />Video sharing<br />Video Hosting<br />www.google-analytics.com<br />Company<br />developer<br />Web Search Engine<br />Search Engine<br />type<br />subject/category<br />google<br />owner<br />subsediaryOf<br />www.youtube.com<br />www.google.com<br />parent<br />DBpedia<br />freebase<br />
  15. Activities<br />Can we now understand the user activities?<br />Based on website categories and on their parameters:<br />GET http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F<br />POST format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v =1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM %20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1 %29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Ftime%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20message%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%20type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20FROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27newsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28created%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=12565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9e1a5ec6c5768%2D605559235<br />
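Reading activities off request parameters starts with decoding them. As a minimal sketch, the GET request above can be split into its host and parameters with Python's standard library; `request_parameters` is a hypothetical helper name, and treating `p` as the search keyword is an interpretation of this one example rather than a documented rule.

```python
from urllib.parse import parse_qs, urlsplit


def request_parameters(url: str):
    """Return the host and the decoded query parameters of a logged URL."""
    parts = urlsplit(url)
    # parse_qs percent-decodes values; keep the first value of each parameter.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return parts.hostname, params


if __name__ == "__main__":
    url = (
        "http://uk.search.yahoo.com/beacon/module?p=idiocracy"
        "&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F"
    )
    host, params = request_parameters(url)
    print(host, params["p"], params["url"])
```

Here the `p` parameter carries the search keyword and `url` the result that was followed, which is the kind of evidence the activity categories are built on.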
  16. Activities in an Ontology<br />Derived in a bottom-up way from categories of activities/requests<br />Can be used to characterize overall activities, individual activities or correlations between activities <br />ActivityBasedRequest<br />ImplicitActivity<br />ExplicitActivity<br />ReportToAnalytics<br />Search<br />CheckStatusFeed<br />SearchVideo<br />SearchImage<br />AutoCheckStatusFeed<br />FollowLink<br />ManualCheckStatusFeed<br />FollowSearchResult<br />
  17. Example Activity: Search<br />Search keywords<br />
  18. Example Activity: Search<br />inverseOf(link-followed, referer)<br />InformationalSearch = SearchRequest and min 2 link-followed<br />NavigationalSearch = SearchRequest and =1 link-followed<br />Prominence of Navigational Searches<br />IndexedSite = exists referer NavigationalSearch<br />IndexedSite(?x), NavigationalSearch(?y), referer(?x, ?y), searchTerm(?y, ?z) → IndexedWithKeyword(?x, ?z)<br />
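The two class definitions on this slide can be read as a counting rule: a search request is informational when at least two requests name it as their referer, and navigational when exactly one does. The sketch below applies that rule outside a reasoner, purely for illustration; the function name, the request identifiers and the dictionary representation are assumptions, and the talk itself expresses this in the ontology.

```python
from collections import defaultdict


def classify_searches(search_ids, referers):
    """Classify search requests by how many followed links point back to them.

    `referers` maps a request id to the id of the request it came from,
    so the inverse mapping plays the role of the link-followed property.
    """
    followed = defaultdict(list)
    for request_id, referer_id in referers.items():
        followed[referer_id].append(request_id)

    classes = {}
    for search_id in search_ids:
        count = len(followed.get(search_id, []))
        if count >= 2:
            classes[search_id] = "InformationalSearch"
        elif count == 1:
            classes[search_id] = "NavigationalSearch"
        else:
            classes[search_id] = "SearchRequest"  # no link followed at all
    return classes


if __name__ == "__main__":
    # Hypothetical ids: s1 led to two pages, s2 to one, s3 to none.
    referers = {"r1": "s1", "r2": "s1", "r3": "s2"}
    print(classify_searches(["s1", "s2", "s3"], referers))
```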
  19. Example Activity: Search<br />Search Keywords<br />OpenCalais<br />Topics of interest<br />
  20. Personal data exchange<br />Request Parameters<br />Personal Information (Profile)<br />Trust Model<br />
  21. Tool used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data<br />
  22. User profile re-constructed from Web activities<br />36 attributes, 1,080 values, to 123 domains<br />A model of which piece of personal information was sent where (it can answer the questions asked at the start)<br />
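Once the profile is reconstructed, questions such as "which websites know my e-mail address?" become simple lookups. The sketch below assumes the model can be flattened into a mapping from profile attribute to the set of domains that received it; that representation, the function names and the sample data are all hypothetical.

```python
def domains_knowing(sent, attribute):
    """Domains that received at least one value of the given attribute."""
    return sorted(sent.get(attribute, set()))


def known_by(sent, domain):
    """Profile attributes that the given domain has received."""
    return sorted(attr for attr, domains in sent.items() if domain in domains)


if __name__ == "__main__":
    # Made-up data using reserved example domains.
    sent = {
        "email": {"shop.example", "social.example"},
        "postcode": {"shop.example"},
    }
    print(domains_knowing(sent, "email"))
    print(known_by(sent, "shop.example"))
```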
  23. What that tells us about trust<br />Taking the point of view of an external observer, we can derive an observed model of trust and criticality of data<br />If this piece of data is critical to you and you give it to Bob, you must trust Bob<br />If you give this piece of data to many untrusted people, you probably don’t consider it critical<br />
  24. Formally<br />Trust in a domain = <br />max of the criticality of the data it received<br />Criticality of a piece of data = <br />1 / (1 + Σ (1 - trust in each website <br />that received the data))<br />Obviously, these two formulas are interdependent. We treat them as a sequence, with initial values set to 0.5<br />
  25. Interacting with the model<br />Expose the user to his own behavior as observed, so that he can try to align it with his intended behavior <br />
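The interdependent definitions can be computed by repeated substitution. The sketch below reads criticality as 1 / (1 + Σ (1 - trust)), summed over the domains that received the data, starts every value at 0.5, and updates trust first and criticality second in each round. The update order, the fixed number of rounds and the name `observed_model` are assumptions; the slides only say the formulas are treated as a sequence.

```python
def observed_model(sent, rounds=20):
    """Iterate the trust and criticality formulas from initial values of 0.5.

    `sent` maps each piece of data to the set of domains that received it.
    """
    domains = set().union(*sent.values())
    criticality = {item: 0.5 for item in sent}
    trust = {domain: 0.5 for domain in domains}

    for _ in range(rounds):
        # Trust in a domain: the most critical piece of data it received.
        trust = {
            domain: max(criticality[item] for item, ds in sent.items() if domain in ds)
            for domain in domains
        }
        # Criticality: lower when the data went to many little-trusted domains.
        criticality = {
            item: 1.0 / (1.0 + sum(1.0 - trust[domain] for domain in ds))
            for item, ds in sent.items()
        }
    return trust, criticality


if __name__ == "__main__":
    # Made-up example: the e-mail went to three sites, the card number to one.
    sent = {"email": {"a.example", "b.example", "c.example"}, "card": {"a.example"}}
    trust, criticality = observed_model(sent)
    print(trust)
    print(criticality)
```

In this made-up example the card number, given to a single site, ends up more critical than the widely shared e-mail address, and the site that received it ends up more trusted than the others, which matches the two intuitions on the previous slide.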
  26. Demo<br />
  27. Conclusion<br />First set of tools exploiting logs of personal Web activity <br />Demonstrate the need for ways to abstract and interpret activity data, to support Web users<br />Demonstrate the ability of semantic technologies, ontologies and enrichment through external data to provide such abilities<br />
  28. So much more to do<br />Can I collect this tweet? From HTTPS? From my mobile phone?<br />Can I link it to where I am?<br />To what I’m doing? To what I have been doing?<br />To the abstract of the presentation? To the slides on SlideShare.net? To blogs mentioning it?<br />Can I cope with the scale of all this information? Can I decide what to share? Can I store all this securely? Can I get usable access to it? Can I learn something from it?<br />
  29. Thank you<br />m.daquin@open.ac.uk<br />@mdaquin<br />
