Making sense of users' Web activities
Keynote at the Personal Semantic Data (PSD) workshop, collocated with EKAW 2010

Published in: Technology


Transcript

  • 1. Making sense of Users’ Web activities
    Mathieu d'Aquin
    Knowledge Media Institute, The Open University, UK
  • 2. A bit of sci-fi to start with
    “… from people who are afraid that someone else knows information that they don’t and is gaining an unfair advantage by it. For all the claims one hears about the liberating impact of the data-net, the truth is that it wished on most of us a brand-new reason for paranoia”
    John Brunner,
    The Shockwave Rider, 1975
  • 3. What we don’t know that they know
    Simple important things:
    What are all the websites that know my e-mail address?
    And more complex important things…
    What does amazon.co.uk or the website of my favorite airline know about me?
  • 4. Is this Personal Information Management?
    Yes, but…
    Looking at an individual user’s information exchange and, more generally, activities on the Web
    This is:
    Big
    Heterogeneous
    Distributed
    Fragmented
    Sometimes implicit
    And hard to collect!
  • 5. So, what do we do?
    Unrestricted monitoring of information exchange on the Web by an individual user
  • 6. Local Logging
    [Diagram: a local logging proxy sits between local Web agents (e.g., browser) and external Web sites; HTTP requests and HTTP responses pass through it, and each Web exchange is recorded in RDF logs]
  • 7. <Request rdf:about="#request-1257949232709-1257949233757">
    <startedAt>1257949232709</startedAt>
    <endedAt>1257949233757</endedAt>
    <origin rdf:resource="127.0.0.1" />
    <onPort>80</onPort>
    <toHost rdf:resource="api.facebook.com" />
    <method rdf:resource="POST"/>
    <toURL rdf:resource="http://api.facebook.com/restserver.php" />
    <HTTPVersion rdf:resource="HTTP-1.1" />
    <Host rdf:resource="api.facebook.com" />
    <Content-Type rdf:resource="application--x-www-form-urlencoded" />
    <User-Agent rdf:resource="Mozilla--5.0_(Macintosh;_U;_Intel_Mac_OS_X;_en)_AppleWebKit--526.9+_(KHTML._like_Gecko)_AdobeAIR--1.5.2" />
    <Referer rdf:resource="app:--TweetDeck.swf" />
    <X-Flash-Version rdf:resource="10.0.32.18" />
    <Accept rdf:resource="*--*" />
    <Accept-Language rdf:resource="en-us" />
    <Accept-Encoding rdf:resource="gzip._deflate" />
    <Cookie rdf:resource="__qca=1239783354-42963995-12118014;___utma=87286159.357565716.1239892196.1252686326.1257582307.16;___utmz=87286159.1257582307.16.16.utmccn=(referral)|utmcsr=facebook.com|utmcct=--tos.php|utmcmd=referral;_c_user=605559235;_cur_max_lag=2;_datr=1239398136-0711bf1215821a9c58848bf0ffd0020ec8450cfa7154b9e228c29;_lsd=P3Zpn;_lxe=metm.daquin%40virgin.net;_lxs=3;_s_vsn_facebookpoc_1=9874874320812" />
    <Content-Length rdf:resource="984" />
    <Connection rdf:resource="keep-alive" />
    <Proxy-Connection rdf:resource="keep-alive" />
    <data rdf:resource="data_c22b691f691dabd5ae893b9cb2f8add7" />
    <response>
    <Response rdf:about="#response-1257949232709--1257949233757">
    <HTTPVersion rdf:resource="HTTP--1.0" />
    <responseCode rdf:resource="200_OK" />
    <Cache-Control rdf:resource="private._no-store._no-cache._must-revalidate._post-check=0._pre-check=0" />
    <Content-Type rdf:resource="application--json" />
    <Expires rdf:resource="Mon._26_Jul_1997_05:00:00_GMT" />
    <Pragma rdf:resource="no-cache" />
    <Content-Encoding rdf:resource="gzip" />
    <Content-Length rdf:resource="5943" />
    <X-Cache rdf:resource="MISS_from_roeburn.open.ac.uk" />
    <Proxy-Connection rdf:resource="keep-alive" />
    <data rdf:resource="data_5ccf6054fd0fba3ee7eb444e178eaf19" />
    </Response></response>
    </Request>
    2.5 months =
    3 Million HTTP Requests
    100 Million RDF Triples
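The logged values above appear to be escaped so that they fit into an attribute ('--' where a header had '/', '._' for ', ' and '_' for a space), and the timestamps look like milliseconds since the Unix epoch. Both readings are inferred from the excerpt rather than documented anywhere in the talk. Under those assumptions, a minimal Python sketch for reading a record back could look like this (the helper names are hypothetical):

```python
from datetime import datetime, timezone


def decode_value(encoded: str) -> str:
    """Undo the escaping seen in the excerpt: '--' -> '/', '._' -> ', ', '_' -> ' '.

    The mapping is inferred from the sample and is lossy: a literal
    underscore in the original header would also come back as a space.
    """
    return encoded.replace("--", "/").replace("._", ", ").replace("_", " ")


def to_utc(millis: int) -> datetime:
    """Interpret a startedAt/endedAt value as milliseconds since the Unix epoch."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Values copied from the request shown above.
started_at, ended_at = 1257949232709, 1257949233757
duration_ms = ended_at - started_at
content_type = decode_value("application--x-www-form-urlencoded")
accept_encoding = decode_value("gzip._deflate")
started = to_utc(started_at)

print(started.isoformat(), duration_ms, content_type, accept_encoding)
```

Decoding is only needed for display or matching; the raw values can stay in the log as they are.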
  • 8. What this talk is about
    Using ontologies and external datasets to
    Generate abstractions of this low level data
    Enrich it with external knowledge and models
    Interpret to give back useful information to the user
  • 9. Online Activities Ontology
    HTTP Ontology
    Parameters and Website info.
    Personal Information
    Web Site Information
    Trust Model
    Location Information
  • 10. HTTP Ontology
    Built bottom-up from the data
    Can help infer simple things from it
    And answer questions through SPARQL queries
    [Diagram: classes InternetPoint (time: DateTime), Request (time: DateTime, toURL: URL, referer: URL), WebHost (domain: String), WebAgent (ID: String), Response (time: DateTime, responseCode: int), DataFile (ID: String) and DataFormat (MineID: String), connected by the properties origine, toHost, User-Agent, hasResponse, Content and Content-Type]
  • 11. Simple examples
    Requests per time of day
    Requests per User Agents
    Requests per Host
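The three breakdowns listed on this slide are simple group-and-count aggregations. The talk answers them with SPARQL over the RDF logs; the Python sketch below only illustrates the same counting over a few hypothetical records, assuming each record carries a start time in milliseconds, a host and a user agent.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical records: (startedAt in ms since the epoch, host, user agent).
requests = [
    (1257949232709, "api.facebook.com", "AdobeAIR"),
    (1257949233757, "api.facebook.com", "AdobeAIR"),
    (1257952833000, "www.google.com", "Firefox"),
]


def hour_of_day(millis: int) -> int:
    """Hour (0-23, UTC) at which a request started."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).hour


per_hour = Counter(hour_of_day(ts) for ts, _, _ in requests)
per_host = Counter(host for _, host, _ in requests)
per_agent = Counter(agent for _, _, agent in requests)

print(per_hour, per_host, per_agent)
```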
  • 12. Integrating basic info
    Domain name
    IP
    Location
    “What!? What requests have I made to websites in Nigeria? What data did I send?”
    Can be answered with a SPARQL query
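The chain from domain name to IP to location can be followed with plain lookups. The sketch below is in Python rather than SPARQL and uses hypothetical lookup tables, hosts and data identifiers; in practice the IP would come from DNS resolution and the country from an IP geolocation source, neither of which is specified in the talk.

```python
# Hypothetical lookup tables (documentation-range IPs, example hosts).
HOST_TO_IP = {
    "social.example.com": "192.0.2.10",
    "shop.example.ng": "198.51.100.7",
}
IP_TO_COUNTRY = {
    "192.0.2.10": "United States",
    "198.51.100.7": "Nigeria",
}

# Hypothetical logged requests, reduced to the fields needed here.
logged_requests = [
    {"toHost": "social.example.com", "method": "POST", "data": "data_0000"},
    {"toHost": "shop.example.ng", "method": "POST", "data": "data_0001"},
    {"toHost": "shop.example.ng", "method": "GET", "data": None},
]


def country_of(host):
    """Follow domain name -> IP -> country; returns None if either step is unknown."""
    return IP_TO_COUNTRY.get(HOST_TO_IP.get(host))


def requests_to(country, requests):
    """All requests whose target host resolves to the given country."""
    return [r for r in requests if country_of(r["toHost"]) == country]


nigeria = requests_to("Nigeria", logged_requests)
data_sent = [r["data"] for r in nigeria if r["data"]]
print(len(nigeria), data_sent)
```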
  • 13. More information about websites
    The linked data cloud is full of it.
    Using the domain name to address this information.
    CONSTRUCT
    {<domain_name> ?p ?y}
    WHERE {{{?x dbpedia:homepage <http://domain_name>}.
    {?x ?p ?y}}
    UNION {{?x owl:sameAs ?z}.
    {?x dbpedia:homepage <http://domain_name>}.
    {?x ?p ?y}}}
  • 14. Examples
    [Diagram: information retrieved from DBpedia and freebase about www.google.com, www.youtube.com and www.google-analytics.com, using properties such as type, subject/category, owner, developer, parent and subsidiaryOf, with values including Google Services, Entertainment Websites, Web Analytics, Internet Search Engine, Web Search Engine, Search Engine, Video sharing, Video Hosting, Company and google]
  • 15. Activities
    Can we now understand the user activities?
    Based on website categories and on their parameters:
    GET http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F
    POST format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Ftime%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20message%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%20type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20FROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27newsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28created%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=12565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9e1a5ec6c5768%2D605559235
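Reading parameters out of requests like the two above is mostly URL decoding. A minimal sketch with Python's standard `urllib.parse`, using the GET URL from the slide and a shortened POST body (only the `format`, `method` and `v` parameters are kept here for brevity):

```python
from urllib.parse import parse_qs, urlsplit

# GET request from the slide: a search beacon carrying the keywords and the followed link.
get_url = (
    "http://uk.search.yahoo.com/beacon/module"
    "?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F"
)
query = parse_qs(urlsplit(get_url).query)
search_terms = query["p"][0]
followed_url = query["url"][0]

# Shortened form-encoded POST body (the full body is shown on the slide).
post_body = "format=JSON&method=fql%2Emultiquery&v=1%2E0"
post_params = parse_qs(post_body)
api_method = post_params["method"][0]

print(search_terms, followed_url, api_method)
```

The host (a search engine, a social network API) together with the decoded parameter names and values is what lets a request be assigned to an activity category.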
  • 16. Activities in an Ontology
    Derived in a bottom-up way from categories of activities/request
    Can be used to characterize overall activities, individual activities or correlations between activities
    [Diagram of activity classes: ActivityBasedRequest, ImplicitActivity, ExplicitActivity, ReportToAnalytics, Search, CheckStatusFeed, SearchVideo, SearchImage, AutoCheckStatusFeed, FollowLink, ManualCheckStatusFeed and FollowSearchResult]
  • 17. Example Activity: Search
    Search keywords
  • 18. Example Activity: Search
    inverseOf(link-followed, referer)
    InformationalSearch = SearchRequest and min 2 link-followed
    NavigationalSearch = SearchRequest and =1 link-followed
    Prominence of Navigational Searches
    IndexedSite = exists referer NavigationalSearch
    IndexedSite(?x), NavigationalSearch(?y), referer(?x, ?y), searchTerm(?y, ?z) → IndexedWithKeyword(?x, ?z)
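The definitions on this slide can be read procedurally: count the links followed from a search, classify the search, and attach the search term to sites reached through a navigational search. The sketch below restates that reading in Python over hypothetical data; it is not the ontology reasoning used in the talk, and the record layout is an assumption.

```python
def classify_search(links_followed: int) -> str:
    """Apply the slide's cardinality restrictions on link-followed."""
    if links_followed >= 2:
        return "InformationalSearch"
    if links_followed == 1:
        return "NavigationalSearch"
    return "SearchRequest"  # a search from which no result was followed


# Hypothetical searches: search term plus the sites whose request had the search as referer.
searches = {
    "s1": {"term": "idiocracy", "followed": ["www.imdb.com"]},
    "s2": {"term": "semantic web", "followed": ["www.w3.org", "en.wikipedia.org"]},
    "s3": {"term": "ekaw 2010", "followed": []},
}

kinds = {sid: classify_search(len(s["followed"])) for sid, s in searches.items()}

# IndexedWithKeyword(site, term): the site was reached from a navigational search for term.
indexed_with_keyword = {
    (site, s["term"])
    for sid, s in searches.items()
    if kinds[sid] == "NavigationalSearch"
    for site in s["followed"]
}

print(kinds, indexed_with_keyword)
```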
  • 19. Example Activity: Search
    Search Keywords
    OpenCalais
    Topics of interest
  • 20. Personal data exchange
    Request Parameters
    Personal Information (Profile)
    Trust Model
  • 21. Tool used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data
  • 22. User profile re-constructed from Web activities
    36 attributes, 1,080 values, to 123 domains
    A model of which piece of personal information was sent where (it can answer the questions raised at the start)
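Once profile attributes are mapped to the values found in request parameters, a question such as "which websites know my e-mail address?" becomes a lookup. The sketch below assumes exact matching of hypothetical profile values against form-encoded parameters; the mappings in the talk were created with a dedicated tool, so this is only an illustration of the resulting model.

```python
from collections import defaultdict
from urllib.parse import parse_qsl

# Hypothetical profile: attribute -> known values.
profile = {
    "email": {"jane@example.org"},
    "first_name": {"Jane"},
}

# Hypothetical logged exchanges: (domain, form-encoded parameters sent to it).
sent = [
    ("shop.example.com", "user=jane%40example.org&name=Jane"),
    ("news.example.net", "q=weather"),
    ("forum.example.org", "mail=jane%40example.org"),
]


def build_exchange_model(profile, sent):
    """Map each profile attribute to the set of domains that received one of its values."""
    model = defaultdict(set)
    for domain, body in sent:
        values = {value for _, value in parse_qsl(body)}
        for attribute, known_values in profile.items():
            if known_values & values:
                model[attribute].add(domain)
    return model


model = build_exchange_model(profile, sent)
print(sorted(model["email"]))
```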
  • 23. What that tells us about trust
    Taking the point of view of an external observer, we can derive an observed model of trust and criticality of data
    If this piece of data is critical to you and you give it to Bob, you must trust Bob
    If you give this piece of data to many untrusted people, you probably don’t consider it critical
  • 24. Formally
    Trust in a domain =
    max of the criticality of the data it received
    Criticality of a piece of data =
    1 / (1 + Σ (1 − trust in the websites that received the data))
    Obviously, these 2 formulas are interdependent. They are treated as a sequence, with initial values at 0.5
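The two interdependent formulas can be evaluated by alternating updates from the initial value 0.5. The sketch below assumes that each round first recomputes trust from the current criticality and then criticality from the updated trust, and that the sum runs over every domain that received the data item; the slide does not spell out the update order or a stopping rule, so both (and the fixed number of rounds) are assumptions.

```python
def observed_trust(received, rounds=20, initial=0.5):
    """Iterate the trust/criticality formulas.

    received maps each domain to the set of data items it was sent.
    Returns (trust per domain, criticality per data item).
    """
    items = set().union(*received.values())
    trust = {domain: initial for domain in received}
    criticality = {item: initial for item in items}
    for _ in range(rounds):
        # Trust in a domain = max criticality of the data it received.
        trust = {
            domain: max((criticality[i] for i in got), default=initial)
            for domain, got in received.items()
        }
        # Criticality = 1 / (1 + sum of (1 - trust) over the domains that received it).
        criticality = {
            item: 1.0 / (1.0 + sum(1.0 - trust[d] for d, got in received.items() if item in got))
            for item in items
        }
    return trust, criticality


# Hypothetical example: a card number sent to one site, an e-mail address sent to five.
received = {
    "bank.example": {"card", "email"},
    "a.example": {"email"},
    "b.example": {"email"},
    "c.example": {"email"},
    "d.example": {"email"},
}
trust, criticality = observed_trust(received)
print(trust, criticality)
```

In this example the widely shared e-mail address ends up less critical than the card number, and the only site that received the card number ends up more trusted than the others, which matches the two informal rules on the previous slide.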
  • 25. Interacting with the model
    Expose the user to his own behavior as observed, so that he can try to align it with his intended behavior
  • 26. Demo
  • 27. Conclusion
    First set of tools exploiting logs of personal Web activity
    Demonstrate the need for ways to abstract and interpret activity data, to support Web users
    Demonstrate the ability of semantic technologies, ontologies and the enrichment through external data, to provide such abilities
  • 28. So much more to do
    Can I collect this tweet? From HTTPS? From my mobile phone?
    Can I link it to where I am?
    To what I’m doing? To what I have been doing?
    To the abstract of the presentation? To the slides on SlideShare.net? To blogs mentioning it?
    Can I cope with the scale of all this information? Can I decide what to share? Can I store all this securely? Can I get usable access to it? Can I learn something from it?
  • 29. Thank you
    m.daquin@open.ac.uk
    @mdaquin
