UCIAD overview

Overview of the UCIAD (User Centric Integration of Activity Data) project, presented during the JISC visit on 20/04/2011

Published in: Technology, Education

  1. 1. User Centric Integration of Activity Data<br />Mathieu d’Aquin, Stuart Brown, Salman Elahi, Enrico Motta<br />The Open University<br />
  2. 2. Agenda<br />Introduction of the Team<br />Objectives and Hypothesis<br />Overview of technical realization<br />Challenges<br />Summary of results so far and dissemination<br />
  3. 3. Team<br />Dr Mathieu d’Aquin – Research fellow, KMi – project director<br />Stuart Brown – Web developments and online communities, communication services – member of the steering group, liaison with online services<br />Salman Elahi – Research assistant and PhD student, KMi – developer/researcher<br />Prof Enrico Motta – Professor of knowledge technologies, KMi – Chair of the steering group<br />
  4. 4. Objectives and Hypothesis<br />Hypothesis<br />Taking a user centric point of view can allow different types of analysis of logs/activity data, which are valuable to the organisation and the user<br />Ontologies and ontology-based reasoning can support the integration, consolidation and interpretation of activity data from multiple sources<br />
  5. 5. Organisation Centric Activity Data<br />[Diagram. Labels: Users; Website 1 to 4; Logs 1 to 4; Consolidation; Analytics = aggregated stats; Organisation]<br />
  6. 6. At the Open University<br />An analytics system building aggregated data from the university’s various websites<br />Based on manually defined sitemaps<br />Good for website optimization, marketing campaigns, etc.<br />But because the data is pre-aggregated, what can be done with it is limited<br />Limited control<br />No user view<br />
  7. 7. User Centric Activity Data<br />Activity analysis for and by individual users<br />[Diagram. Labels: Users; Website 1 to 4; Logs 1 to 4; Ontologies; Consolidation, Integration, Interpretation; Organisation]<br />
  8. 8. Ontologies<br />Formal conceptual models of a domain<br />Here, the domain is online user activity <br />At the basis of Semantic Web technologies<br />Standard languages for expressing ontologies and ontological data (RDF, OWL)<br />Tools to manipulate and work with ontologies and semantic data (NeOn Toolkit, OWLIM)<br />Many ontologies to reuse (cf. Watson)<br />Adhere to a logical formalism<br />Enable inferences on the data<br />
  9. 9. Objectives and Deliverables<br />Build the technical infrastructure that can hold traces of activity data as semantic data<br />Includes a triple store with reasoning capability, log parsers for different log formats, and renderers producing semantic data (RDF)<br />Build the ontologies to interpret and reason upon activity data<br />Including various aspects of activity data in a way which is extensible<br />Tools to support users in analyzing their own activity data<br />Recognize a user from the different settings and provide a view of his/her own data<br />Allow him/her to customize the view, by customizing the ontology<br />Test, validate, deploy, distribute<br />
  10. 10. Technical infrastructure<br />[Diagram. Labels: Server1, Server2, Server3; Application; Log; Parser/RDF renderer; Daily RDF traces; Scheduler/Manager; Semantic Triple Store]<br />
  11. 11. Technical infrastructure<br />Development of parsers for different kinds of log formats<br />Currently handles Apache web server log files, parameterized from the Apache configuration<br />Easily extensible for dedicated log formats<br />Provides a common data structure, serialized in RDF by the RDF renderer<br />Each server produces a daily extract from the logs in RDF, which is used to populate the semantic triple store<br />The triple store includes multiple repositories and sub-spaces depending on time/user/server<br />
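
The parser/renderer step on this slide can be illustrated with a minimal sketch. It assumes the Apache "combined" log format is hard-coded (the slide says the real parser is parameterized from the Apache configuration), and the namespace `http://example.org/uciad/` and the property names `fromIP`, `userAgent`, `requestedPath`, `responseStatus` and `atTime` are hypothetical, not taken from the UCIAD ontologies.

```python
import re
from datetime import datetime

# Apache "combined" log format, hard-coded here for brevity (an assumption).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

EX = "http://example.org/uciad/"  # hypothetical namespace, not the project's


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def literal(value):
    """Quote a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(record, trace_id):
    """Render one parsed record as N-Triples lines describing a trace."""
    trace = f"<{EX}trace/{trace_id}>"
    rows = [
        (trace, "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", f"<{EX}Trace>"),
        (trace, f"<{EX}fromIP>", literal(record["ip"])),
        (trace, f"<{EX}userAgent>", literal(record["agent"])),
        (trace, f"<{EX}requestedPath>", literal(record["path"])),
        (trace, f"<{EX}responseStatus>", literal(record["status"])),
        (trace, f"<{EX}atTime>", literal(record["time"].isoformat())),
    ]
    return [f"{s} {p} {o} ." for s, p, o in rows]


SAMPLE = ('127.0.0.1 - - [20/Apr/2011:10:15:32 +0100] '
          '"GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"')
for row in to_ntriples(parse_line(SAMPLE), trace_id=1):
    print(row)
```

Keeping the parser (log line to record) separate from the renderer (record to RDF) mirrors the slide's split: a new log format would only need a new parser that fills the same record structure.
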
  12. 12. Ontologies<br />Key concepts to be represented:<br />Actors (human users and robots)<br />Sitemaps<br />Traces (broad notion of logs)<br />Activities<br />Reusing existing ontologies<br />FOAF: for people and documents<br />Time Ontology: for traces<br />Action ontology: for traces and activities<br />(Planned) OPO: Online presence<br />(Planned) SIOC: Online communities<br />
  13. 13.
  14. 14. Iterative and extensible construction of the ontologies<br />Provide a base with actors, sitemaps and traces<br />Specific extensions with typologies of activities, depending on user and site<br />Dynamically building and integrating<br />
  15. 15. Tool for analysis<br />Need a tool which, given<br />A set of ontologies<br />A data repository (which can be the overall one, one restricted by time, or one for a given user)<br />can provide a meaningful and interactive overview of the activity data<br />To be used to<br />Provide an ontology-specific view of data analytics<br />Support the iterative development of the ontologies<br />Provide a user centric view of the data<br />
  16. 16. Tools for analysis<br />
  17. 17. Example<br />In the ontology:<br />/robots.txt is a RobotTXT page<br />A Spider is a RobotAgent (ActorAgent)<br />An agent used to access a RobotTXT is a Spider<br />An AutomaticActivity is a Trace realized by a RobotAgent<br />Result:<br />Thousands of traces automatically classified as automatic activities.<br />
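
The two inference rules on this slide can be mimicked in plain Python to show their effect. This is only a sketch of the logic: UCIAD expresses the rules in the ontology and lets the reasoner apply them. The dictionary keys `agent` and `path` and the function name `classify_automatic` are hypothetical, and the path of the robots page is left as a parameter.

```python
def classify_automatic(traces, robots_path="/robots.txt"):
    """Apply the two rules from the slide to a list of trace records."""
    # Rule 1: an agent used to access the robots page is a Spider (a RobotAgent).
    spiders = {t["agent"] for t in traces if t["path"] == robots_path}
    # Rule 2: any trace realized by a RobotAgent is an AutomaticActivity.
    automatic = [t for t in traces if t["agent"] in spiders]
    return spiders, automatic


traces = [
    {"agent": "ExampleBot/1.0", "path": "/robots.txt"},
    {"agent": "ExampleBot/1.0", "path": "/blog/post-1"},
    {"agent": "Mozilla/5.0", "path": "/blog/post-1"},
]
spiders, automatic = classify_automatic(traces)
print(sorted(spiders))  # ['ExampleBot/1.0']
print(len(automatic))   # 2
```

The second trace is classified as automatic even though it never touches the robots page, which is the point of the example: one observed access is enough to reclassify every trace from the same agent.
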
  18. 18. Example<br />In the ontology:<br />UCIAD-Blog and LUCERO-Blog are Blogs (Website)<br />A BlogPage is a page which is part of a Blog<br />An activity onBlog is an activity happening on a Blog Page<br />Result:<br />Can look specifically at activities happening on a Blog and specialize them (same applies to Wikis, and other types of websites)<br />
  19. 19. Example<br />In the ontology:<br />A SPARQLEndpoint is a specific type of Webpage<br />AccessingSPARQLEndpoint is an activity on a SPARQLEndpoint<br />SPARQLQueryParameter is a parameter with the name “query” used in an AccessingSPARQLEndpoint activity<br />ExecutingSPARQLQuery is an AccessingSPARQLEndpoint activity attached to a SPARQLQueryParameter<br />Result:<br />Can explore the specific activity of executing SPARQL queries and its parameters<br />Can combine: detect the activity of automatically accessing a SPARQL endpoint, i.e. both an automatic activity and an access to a SPARQL endpoint.<br />
  20. 20. Next step: User support<br />Allow users <br />to log-in<br />detect setting <br />bring up the relevant data <br />explore it<br />But also, <br />to customize the view of the data<br />to extend the ontologies to provide a personalized analysis of activity data<br />to export (interpreted) activity data for reuse<br />
  21. 21. User support<br />User logs in or registers<br />Detect setting (agent+IP)<br />unknown setting<br />It is the first time you log into UCIAD with this setting (detail). Do you want to attach it to your account?<br />Check setting non-ambiguous<br />non-ambiguous<br />ambiguous<br />known setting for user<br />Add setting to known settings<br />Register setting as ambiguous<br />Display Activity Data related to all known settings of the user<br />yes<br />no<br />
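
One possible reading of this flowchart is sketched below. The transcript does not preserve the arrows, so the order of the steps is an assumption, as is the definition of "ambiguous" used here (a setting that another account has already claimed). The class name `Registry` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    known: dict = field(default_factory=dict)    # user -> set of (agent, ip) settings
    ambiguous: set = field(default_factory=set)  # settings claimed by several users

    def login(self, user, agent, ip, attach=True):
        """Handle one login and return the settings whose data would be displayed."""
        setting = (agent, ip)  # detect setting (agent+IP)
        mine = self.known.setdefault(user, set())
        if setting not in mine and attach:  # unknown setting, user answered "yes"
            claimed_by_other = any(
                setting in settings
                for other, settings in self.known.items()
                if other != user
            )
            if claimed_by_other or setting in self.ambiguous:
                self.ambiguous.add(setting)  # register setting as ambiguous
            else:
                mine.add(setting)            # add setting to known settings
        return set(mine)


registry = Registry()
print(registry.login("alice", "Firefox", "10.0.0.1"))  # alice's first setting is attached
print(registry.login("bob", "Firefox", "10.0.0.1"))    # same setting, now ambiguous for bob
```

Whether an ambiguous setting should also be withdrawn from the first user is left open here, in line with the open question the slides raise about shared computers.
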
  22. 22. User support: data for a user<br />For a user <u> the SPARQL query <br /> Construct {?trace ?p ?y. <br /> ?y ?q ?z} where<br /> {<u> actor:hasKnownSetting ?s. <br /> ?trace trace:hasSetting ?s. <br /> ?trace ?p ?y. ?y ?q ?z}<br />builds the traces of activities around the known settings of <u><br />Used to populate a specific repository with sub-spaces for each registered user<br />
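
To make the shape of the extracted sub-graph concrete, here is a small in-memory analogue of the query in Python. It assumes the query is meant to collect every triple about a trace plus the triples of the resources the trace points to. Unlike a strict CONSTRUCT, this sketch also keeps trace triples whose object has no further description. Apart from `actor:hasKnownSetting` and `trace:hasSetting`, the identifiers in the sample data are hypothetical.

```python
def traces_for_user(triples, user,
                    has_known_setting="actor:hasKnownSetting",
                    has_setting="trace:hasSetting"):
    """Return the triples describing the traces made from a user's known settings."""
    # <u> actor:hasKnownSetting ?s
    settings = {o for s, p, o in triples if s == user and p == has_known_setting}
    # ?trace trace:hasSetting ?s
    traces = {s for s, p, o in triples if p == has_setting and o in settings}
    # ?trace ?p ?y
    first_hop = {(s, p, o) for s, p, o in triples if s in traces}
    # ?y ?q ?z
    neighbours = {o for _, _, o in first_hop}
    second_hop = {(s, p, o) for s, p, o in triples if s in neighbours}
    return first_hop | second_hop


triples = {
    ("user:alice", "actor:hasKnownSetting", "setting:1"),
    ("trace:1", "trace:hasSetting", "setting:1"),
    ("trace:1", "trace:hasPage", "page:blog"),
    ("page:blog", "rdf:type", "BlogPage"),
    ("trace:2", "trace:hasSetting", "setting:2"),
    ("trace:2", "trace:hasPage", "page:wiki"),
    ("page:wiki", "rdf:type", "WikiPage"),
}
for triple in sorted(traces_for_user(triples, "user:alice")):
    print(triple)
```

Only the triples reachable from `trace:1` are returned; `trace:2` belongs to a setting the user has not registered and is left out.
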
  23. 23. Deployment, test, validation<br />At the moment, testing on websites of projects and events hosted on KMi servers:<br />sssw.org, sssw09.org, loted.eu, lucero-project.info, uciad.info, data.open.ac.uk, lucero.open.ac.uk, …<br />Next level up, websites/systems from the main Open University website:<br />www.open.ac.uk, study at the OU, podcasts.open.ac.uk, VLE<br />Extend to deployment of instances for specific projects with distributed websites<br />
  24. 24. Challenges<br />Scalability<br />OWLIM triple store can handle billions of triples<br />But struggles with millions when inference is “on”<br /> 1 repository without inference with all historical data, 1 with inference with 1 week of data only, and 1 with inference for registered users<br />User management and privacy<br />Ensuring that the user who logs in from a particular setting is the one having the activity is difficult (e.g., in the case of shared computers)<br />Is this really a problem?<br />Check ambiguity – ask verification questions – moderate?<br />Distribution and IPR<br />Code and ontologies under open licenses (small uncertainty regarding code developed in other projects)<br />Overall data: privacy issues (is k-anonymity actually applicable? Would it work?)<br />Overall data: institutional issues (can we show the traffic on our websites to everybody?)<br />User data export: what license?<br />
  25. 25. Summary and dissemination<br />Promising initial results<br />Can create new ways of analysis at run-time by editing the ontologies!<br />Mechanisms to provide personal views on own activity data across websites<br />First version of the ontologies: ongoing task<br />First version of the tools: test and validate!<br />Dissemination<br />Blog / Twitter #uciad<br />KMi’s internal newsletter (KMi Planet)<br />Salman’s paper at the ESWC 2011 PhD symposium: “Personal Semantics: Personal information management in the Web with Semantic Technologies”<br />Position paper at the W3C Web tracking and privacy workshop: “Self-Tracking on the Web: Why and How”<br />Submission to the Personal Semantic Data workshop at K-CAP 2011<br />
  26. 26. More info<br />UCIAD Blog: http://uciad.info<br />Code base: http://github.com/uciad<br />Twitter: #uciad<br />@mdaquin<br />