The talk "Federated Architecture with Provenance and Access Control to realize Open Digital Data for MGI" was given by Prof. Amit Sheth at the ICMSE-MGI Digital Data Workshop, held at the Kno.e.sis Center on November 13-14, 2013. It emphasized two important issues that materials scientists encounter when publishing data: provenance and access control.
workshop page: http://wiki.knoesis.org/index.php/ICMSE-MGI_Digital_Data_Workshop
May 2012 JaxDUG presentation by Zachary Gramana on using the Lucene.NET library to add search functionality to .NET applications. Contains an overview of search/information retrieval concepts and highlights some common use cases.
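At the core of libraries like Lucene is the inverted index, which maps each term to the documents that contain it. The sketch below is a minimal Python illustration of that idea. It is not the Lucene.NET API. The tokenizer, the boolean-AND query, and the term-frequency ranking are simplifying assumptions chosen for clarity.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Very simple analyzer: lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> {doc_id: term frequency in that document}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query: str) -> list[int]:
        terms = tokenize(query)
        if not terms:
            return []
        # Boolean AND: keep only documents that contain every query term.
        candidates = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            candidates &= set(self.postings.get(term, {}))

        def score(doc_id: int) -> int:
            # Rank by summed term frequency (a stand-in for Lucene's real scoring).
            return sum(self.postings[t][doc_id] for t in terms)

        # Highest score first; ties broken by document id.
        return sorted(candidates, key=lambda d: (-score(d), d))
```

For example, after indexing "Lucene.NET adds search to .NET apps" as document 1 and "search search search" as document 2, the query "search" ranks document 2 first. The query "net search" matches only document 1. Real engines use more elaborate ranking (for example, TF-IDF or BM25 variants).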
Work in Progress on the Standardization of Online Laboratories for Education | Miguel R. Artacho
In recent years we have witnessed the increasing use of remote laboratories in education, driven by the development of technology-enhanced learning from K-12 to higher education. In this context there is a need, on the one hand, to define and establish a common consensus on the structure and operation of remote laboratories (RL) from a learning-technologies perspective, as open educational resources (OER) independent of LMSs. On the other hand, cloud computing and the concept of XaaS make it meaningful to consider laboratories as another Lego piece of an educational resource in the cloud. The focus is on potential standardized information models and protocols as a common basis for instructional interoperability, based on search, retrieval, labeling, and services for educational purposes.
IEEE ISM 2008: Kalman Graffi: A Distributed Platform for Multimedia Communities | Kalman Graffi
Online community platforms and multimedia content delivery have been merging in recent years. Current platforms like Facebook and YouTube are client-server based, which results in high administration costs for the provider. In contrast, peer-to-peer systems offer scalability and low costs but are limited in their functionality. In this paper we present a framework for peer-to-peer based multimedia online communities. We identified the key challenges for this new application of the peer-to-peer paradigm and built a plugin-based, easily extensible, multifunctional framework. Further, we identified distributed linked lists as a valuable data structure for implementing user profiles, friend lists, groups, photo albums, and more. Our framework aims to provide the functionality of common online community platforms combined with the multimedia delivery capabilities of modern peer-to-peer systems, e.g., direct multimedia delivery and access to a distributed multimedia pool.
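To make the idea of a distributed linked list more concrete, here is a minimal Python sketch. Each list element is stored as its own entry in a DHT, and a head entry points at the first element. The abstract gives no implementation details, so everything here is an assumption and the paper's actual design may differ. `ToyDHT` is a hypothetical in-memory stand-in for a real DHT, with no networking or replication. The key-derivation scheme is also illustrative.

```python
import hashlib
import json
from typing import Optional


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT: put/get by hashed key only."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    @staticmethod
    def key(name: str) -> str:
        return hashlib.sha1(name.encode("utf-8")).hexdigest()

    def put(self, key: str, value: str) -> None:
        self.store[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)


class DistributedLinkedList:
    """Each element is a separate DHT entry holding a pointer to the next one.
    The head entry stores the key of the first element and the list size."""

    def __init__(self, dht: ToyDHT, name: str) -> None:
        self.dht = dht
        self.head_key = dht.key(f"list:{name}")

    def prepend(self, item: str) -> None:
        raw = self.dht.get(self.head_key)
        head = json.loads(raw) if raw else {"first": None, "size": 0}
        # Derive a unique element key from the list key and the element's position.
        elem_key = self.dht.key(f"{self.head_key}:{head['size']}")
        self.dht.put(elem_key, json.dumps({"item": item, "next": head["first"]}))
        # Point the head at the new first element.
        self.dht.put(self.head_key, json.dumps({"first": elem_key, "size": head["size"] + 1}))

    def items(self) -> list[str]:
        raw = self.dht.get(self.head_key)
        key = json.loads(raw)["first"] if raw else None
        out = []
        while key is not None:  # follow next-pointers; each hop is one DHT lookup
            node = json.loads(self.dht.get(key))
            out.append(node["item"])
            key = node["next"]
        return out
```

A friend list could then be `DistributedLinkedList(dht, "alice/friends")`. Because the elements are spread over separate keys, they can live on different peers. The cost is one lookup per element when traversing the list.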
A presentation on the SageCite project given at the JISC MRD International Workshop in March 2011. Describes the application domain and citation challenges in SageCite.
Introduction to Biological Network Analysis and Visualization with Cytoscape ... | Keiichiro Ono
Introduction to biological network analysis and visualization with Cytoscape (using the latest version 3.4).
This is the first half of the Applied Bioinformatics lecture at TSRI.
WoSC19: Serverless Workflows for Indexing Large Scientific Data | University of Chicago
The use and reuse of scientific data is ultimately dependent on the ability to understand what those data represent, how they were captured, and how they can be used. In many ways, data are only as useful as the metadata available to describe them. Unfortunately, due to growing data volumes, large and distributed collaborations, and a desire to store data for long periods of time, scientific “data lakes” quickly become disorganized and lack the metadata necessary to be useful to researchers. New automated approaches are needed to derive metadata from scientific files and to use these metadata for organization and discovery. Here we describe one such system, Xtract, a service capable of processing vast collections of scientific files and automatically extracting metadata from diverse file types. Xtract relies on function-as-a-service (FaaS) models to enable scalable metadata extraction by orchestrating the execution of many short-running extractor functions. To reduce data transfer costs, Xtract can be configured to deploy extractors centrally or near the data (i.e., at the edge). We present a prototype implementation of Xtract and demonstrate that it can derive metadata from a 7 TB scientific data repository.
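The orchestration pattern described above can be sketched as follows. Many small, independent extractor functions are picked per file type and run in parallel. This is a minimal Python illustration and not Xtract's API. The extractors are hypothetical, and a local thread pool stands in for a real function-as-a-service platform. Placement of extractors near the data is not modeled.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical extractors: each takes file contents and returns a metadata dict.
def extract_csv(content: str) -> dict:
    rows = list(csv.reader(io.StringIO(content)))
    return {"type": "tabular", "columns": rows[0] if rows else [], "rows": max(len(rows) - 1, 0)}


def extract_json(content: str) -> dict:
    data = json.loads(content)
    return {"type": "json", "keys": sorted(data) if isinstance(data, dict) else []}


def extract_text(content: str) -> dict:
    return {"type": "text", "lines": len(content.splitlines()), "words": len(content.split())}


EXTRACTORS: dict[str, Callable[[str], dict]] = {
    ".csv": extract_csv,
    ".json": extract_json,
    ".txt": extract_text,
}


def pick_extractor(path: str) -> Callable[[str], dict]:
    # Choose an extractor by file extension; unknown types fall back to plain text.
    for suffix, fn in EXTRACTORS.items():
        if path.endswith(suffix):
            return fn
    return extract_text


def extract_all(files: dict[str, str], max_workers: int = 8) -> dict[str, dict]:
    # The thread pool stands in for a FaaS platform: each file becomes one
    # independent, short-running invocation.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(pick_extractor(path), text) for path, text in files.items()}
        return {path: fut.result() for path, fut in futures.items()}
```

Keeping each extractor small and stateless is what makes this style easy to fan out. Each call can be scheduled wherever compute is available.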
Using Implicit Preference Relations to Improve Content-based Recommendations,... | Ladislav Peska
Slides from the paper by Ladislav Peska and Peter Vojtas, "Using Implicit Preference Relations to Improve Content-based Recommendations," presented at the EC-Web 2015 conference (a DEXA event), Valencia, Spain.
Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Semantic Web in Action: Ontology-driven information search, integration and a... | Amit Sheth
Amit Sheth's keynote talk “Semantic Web in Action: Ontology-driven information search, integration and analysis,” given at Net Object Days 2003 and MATES03, Erfurt, Germany, September 23, 2003. http://knoesis.org
Note: slides 51-55 have audio.
Presents the foundational aspects of web analytics and some specifics, such as the hotel problem. Discusses trace data, behaviorism, and other web analytics topics.
Marios Chatziangelou presents the EGI applications database | OSFair2017 Workshop
Workshop overview:
This collaborative workshop comes in the context of coordinating EOSC related activities across large European infrastructures at European and national level. The workshop will offer an opportunity for cross-pollination on issues ranging from open scholarship to technical service provision, training, community engagement and support. OpenAIRE NOADs, EGI NGIs, GEANT NRENs and other national e-Infrastructure representatives will discuss gaps, synergies, coordination and service integration opportunities.
DAY 3 - PARALLEL SESSION 6 & 7
BioCatalogue talk by Carole Goble. In these slides she outlines the reasons behind the BioCatalogue project and presents BioCatalogue and its goals.
Author's workflow and the role of open access | Paola Gargiulo
This is a presentation made at the 10th Fiesole Collection Development Retreat Series. It describes some tools and solutions that make self-archiving easier for authors.
CREW (Collaborative Research Events on the Web) aims to improve access to research event content by capturing and publishing the scholarly communication that occurs at events like conferences and workshops. This is a Virtual Research Environment funded by JISC within the UK.
This slide show describes release 5 of the development. See site: http://www.crew-vre.net/
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c... | IEEEMEMTECHSTUDENTPROJECTS
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first-class, publishable Research Objects, just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible) and citable, so that authors are credited fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences, using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020; 2(1-2): 108-121. https://doi.org/10.1162/dint_a_00033
Tracking user activity logs using Loggastic #ApiPlatformCon | Paula Čučuk
Meet Loggastic: an open source library for easily tracking and storing user activity logs in Elasticsearch. Built on top of Symfony, Loggastic comes with API Platform support. We will explore the library’s concepts and the different paths we took before arriving at our current approach.
You’ll learn how to integrate Loggastic into your application, adapt it to your needs, and discover how to scale it for large amounts of data.
eMadrid seminar on "New experiences in remote laboratories". Stand... | eMadrid network
eMadrid seminar on "New experiences in remote laboratories". Standardization of online laboratories for education based on smart learning objects. Miguel Rodríguez Artacho, UNED. 26/02/2016.
Getting Started with Splunk Enterprise
What is Splunk? At the end of this session you’ll have a high-level understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll see practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
Archonnex is a new software architecture developed by ICPSR for digital asset management systems. It is built on a modern technology stack to meet the current and emerging needs of social science research.
Site up an open access-ICAR
Institutional Repository-Hardware, Software, Policies and Personnel.
ICAR Initiatives
Under the NATP project, the Integrated National Agricultural Resources Information System (INARIS) (Rai et al., 2007), a Central Data Warehouse (CDW) of agricultural resources, was established at IASRI.
The project involved collaboration with 13 other ICAR organizations.
Accordingly, 13 different data marts were designed.
The project was available at http://agdw.iasri.res.in.
My outlook: the country should have an agri-search engine.
An agri-search engine should be developed in the country to aggregate information from the internet and provide it to farmers in a meaningful manner using ICT tools.
The agri-search engine should be coordinated with the Govt. of India’s agricultural websites to monitor each website daily.
UiPath Test Automation using UiPath Test Suite series, part 3 | DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
State of ICS and IoT Cyber Threat Landscape Report 2024 preview | Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across more than 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities. These serve as sinks to attract and engage sophisticated threat actors, as well as newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
JMeter webinar - integration with InfluxDB and Grafana | RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... | UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T... | BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You Need? LLM & Knowledge Graph | Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... | Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Neuro-symbolic is not enough, we need neuro-*semantic* | Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
PHP Frameworks: I want to break free (IPC Berlin 2024) | Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Essentials of Automations: Optimizing FME Workflows with Parameters | Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality | Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools, like the ChatGPT plugin and the Azure OpenAI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UCIAD overview
1. User Centric Integration of Activity Data. Mathieu d’Aquin, Stuart Brown, Salman Elahi, Enrico Motta, The Open University
2. Agenda: introduction of the team; objectives and hypothesis; overview of technical realization; challenges; summary of results so far and dissemination.
3. Team: Dr Mathieu d’Aquin, research fellow, KMi, project director; Stuart Brown, web developments and online communities, communication services, member of the steering group, liaison with online services; Salman Elahi, research assistant and PhD student, KMi, developer/researcher; Prof Enrico Motta, Professor of Knowledge Technologies, KMi, chair of the steering group.
4. Objectives and Hypothesis. Hypothesis: taking a user-centric point of view can allow different types of analysis of logs/activity data, which are valuable to both the organisation and the user. Ontologies and ontology-based reasoning can support the integration, consolidation and interpretation of activity data from multiple sources.
6. At the Open University: an analytics system builds aggregated data from the university’s various websites, based on manually defined sitemaps. It is good for website optimization, marketing campaigns, etc. However, because the data is pre-aggregated, the system is limited in what it can do: it offers limited control and no user view.
7. User Centric Activity Data: activity analysis for and by individual users. [Diagram: logs 1-4 from websites 1-4 are consolidated, integrated and interpreted through ontologies, serving both the organisation and its users.]
8. Ontologies: formal conceptual models of a domain; here, the domain is online user activity. Ontologies are at the basis of Semantic Web technologies. There are standard languages for expressing ontologies and ontological data (RDF, OWL), tools to manipulate and work with ontologies and semantic data (NeOn Toolkit, OWLIM), and many ontologies to reuse (cf. Watson). Ontologies adhere to a logical formalism and enable inferences on the data.
9. Objectives and Deliverables:
- Build the technical infrastructure that can hold traces of activity data as semantic data. This includes a triple store with reasoning capability, log parsers for different log formats, and renderers producing semantic data (RDF).
- Build the ontologies to interpret and reason upon activity data, covering various aspects of activity data in an extensible way.
- Build tools to support users in analyzing their own activity data: recognize a user across different settings, provide a view of his/her own data, and allow him/her to customize the view by customizing the ontology.
- Test, validate, deploy, distribute.
11. Technical infrastructure: development of parsers for different kinds of log formats. The parsers currently handle Apache web server log files, parameterized from the Apache configuration, and are easily extensible to dedicated log formats. They provide a common data structure, which the RDF renderer serializes in RDF. Each server produces a daily extract from its logs in RDF, which is used to populate the semantic triple store. The triple store includes multiple repositories and sub-spaces depending on time/user/server.
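As an illustration of the parse-then-render-as-RDF step, here is a minimal Python sketch. It turns one Apache "combined" format log line into N-Triples. It is not UCIAD's parser. The `http://example.org/uciad#` namespace and the property names are hypothetical placeholders, not the project's ontology IRIs. Lines in other formats are simply skipped.

```python
import re
from datetime import datetime

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

NS = "http://example.org/uciad#"  # hypothetical namespace, not the project's real IRI
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_DATETIME = "<http://www.w3.org/2001/XMLSchema#dateTime>"


def literal(value: str) -> str:
    # N-Triples string literal; escapes only backslashes and double quotes,
    # which is enough for single-line log fields.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def log_line_to_ntriples(line: str, trace_id: str) -> list[str]:
    m = COMBINED.match(line)
    if m is None:
        return []  # not combined format; a dedicated parser would be needed
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").isoformat()
    subject = f"<{NS}trace/{trace_id}>"
    props = {
        "host": literal(m["host"]),
        "method": literal(m["method"]),
        "path": literal(m["path"]),
        "status": literal(m["status"]),
        "agent": literal(m["agent"]),
        "time": f'"{when}"^^{XSD_DATETIME}',
    }
    triples = [f"{subject} {RDF_TYPE} <{NS}Trace> ."]
    triples += [f"{subject} <{NS}{name}> {obj} ." for name, obj in props.items()]
    return triples
```

A daily extract would then be the concatenation of these lines for all of a server's log entries. The result is ready to load into a triple store.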
12. Ontologies. Key concepts to be represented: actors (human users and robots), sitemaps, traces (broad notion of logs), activities. Reusing existing ontologies: FOAF for people and documents; Time Ontology for traces; Action ontology for traces and activities; (planned) OPO for online presence; (planned) SIOC for online communities.
14. Iterative and extensible construction of the ontologies: provide a base with actors, sitemaps and traces; add specific extensions with typologies of activities, depending on user and site; build and integrate them dynamically.
15. Tool for analysis: we need a tool which, given a set of ontologies and a data repository (the overall one, one restricted by time, or one for a given user), can provide a meaningful and interactive overview of the activity data. The tool is to be used to provide an ontology-specific view of data analytics, support the iterative development of the ontologies, and provide a user-centric view of the data.
17. Example. In the ontology: /robots.txt is a RobotTXT page; a Spider is a RobotAgent (ActorAgent); an agent used to access a RobotTXT is a Spider; an AutomaticActivity is a Trace realized by a RobotAgent. Result: thousands of traces are automatically classified as automatic activities.
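The effect of the robots.txt axioms above can be illustrated with three hand-coded rules in Python. In UCIAD these are ontology axioms evaluated by a reasoner over the triple store. The sketch only mimics the resulting inferences. The `Trace` record, and identifying an agent by its user-agent string, are simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    agent: str  # user-agent string identifying the actor (simplifying assumption)
    path: str   # requested page
    types: set[str] = field(default_factory=set)


def classify(traces: list[Trace]) -> set[str]:
    # Rules 1 and 2: an agent that accessed the RobotTXT page (/robots.txt)
    # is a Spider, and therefore a RobotAgent.
    spiders = {t.agent for t in traces if t.path == "/robots.txt"}
    # Rule 3: every trace realized by a RobotAgent is an AutomaticActivity,
    # including that agent's traces on ordinary pages.
    for t in traces:
        if t.agent in spiders:
            t.types.add("AutomaticActivity")
    return spiders
```

Rule 3 applies to all of a spider's traces, not just its robots.txt request. This is why a handful of axioms can classify thousands of traces.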
18. Example: In the ontology: UCIAD-Blog and LUCERO-Blog are Blogs (Website); a BlogPage is a page which is part of a Blog; an activity onBlog is an activity happening on a BlogPage. Result: we can look specifically at activities happening on a Blog and specialize them (the same applies to Wikis and other types of websites).
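The same idea can be sketched for website types: pages inherit the type of the site they belong to, and activities are selected by that type. The host-to-site mapping below is hypothetical (the UCIAD blog is at uciad.info; the LUCERO entry is an assumption), and any host not listed defaults to a generic Website.

```python
# Hypothetical mapping from host names to (site name, website type).
SITES = {
    "uciad.info": ("UCIAD-Blog", "Blog"),
    "lucero-project.info": ("LUCERO-Blog", "Blog"),
}

def site_type(host):
    """Return the website type of a host, defaulting to the generic 'Website'."""
    return SITES.get(host, (None, "Website"))[1]

def activities_on(traces, kind):
    """Return the traces whose page belongs to a website of the given type (e.g. 'Blog')."""
    return [t for t in traces if site_type(t["host"]) == kind]
```

Adding Wikis or other site types would only require new entries in the mapping, in the same spirit as extending the ontology with new subclasses of Website.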
19. Example: In the ontology: a SPARQLEndpoint is a specific type of Webpage; AccessingSPARQLEndpoint is an activity on a SPARQLEndpoint; SPARQLQueryParameter is a parameter with the name “query” used in an AccessingSPARQLEndpoint activity; ExecutingSPARQLQuery is an AccessingSPARQLEndpoint activity attached to a SPARQLQueryParameter. Result: we can explore the specific activity of executing SPARQL queries and its parameters. Definitions can also be combined: the activity of automatically accessing a SPARQL endpoint is one that is both an automatic activity and an access to a SPARQL endpoint.
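A minimal sketch of the same classification over raw request URLs, assuming a hypothetical list of endpoint pages. The "query" parameter test corresponds to SPARQLQueryParameter, and the combined label `AutomaticAccessingSPARQLEndpoint` is a hypothetical class name used only for illustration; the `automatic` flag would come from a classification such as the robots.txt example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical set of (host, path) pairs that are SPARQLEndpoint pages.
SPARQL_ENDPOINTS = {("data.open.ac.uk", "/sparql")}

def classify_sparql(host, url, automatic=False):
    """Return (labels, query_text) for one request; query_text is None if absent."""
    parts = urlsplit(url)
    if (host, parts.path) not in SPARQL_ENDPOINTS:
        return set(), None
    labels = {"AccessingSPARQLEndpoint"}
    params = parse_qs(parts.query)            # URL-decodes parameter names and values
    query = params.get("query", [None])[0]    # value of the parameter named "query"
    if query is not None:
        labels.add("ExecutingSPARQLQuery")
    if automatic:  # the trace was already classified as an AutomaticActivity
        labels.add("AutomaticAccessingSPARQLEndpoint")  # hypothetical combined class
    return labels, query
```

Returning the decoded query text alongside the labels is what makes it possible to explore the parameters of executed queries, not just count them.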
20. Next step: User support. Allow users to log in, detect their setting, bring up the relevant data and explore it; but also to customize the view of the data, to extend the ontologies to provide a personalized analysis of activity data, and to export (interpreted) activity data for reuse.
21. User support (flowchart): The user logs in or registers, and the setting (agent + IP) is detected. For an unknown setting, the user is asked: "It is the first time you log into UCIAD with this setting (detail). Do you want to attach it to your account?" (yes/no). If yes, the setting is checked: a non-ambiguous setting is added to the user's known settings, while an ambiguous one is registered as ambiguous. The flow ends by displaying activity data related to all known settings of the user.
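The login flow can be sketched as follows. The flowchart does not say how ambiguity is decided, so this sketch assumes that a setting is ambiguous when another user has already attached it (for example a shared computer); `User`, `Registry` and `login` are hypothetical names, and the user's yes/no answer is modeled as a `confirm` callback.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    known_settings: set = field(default_factory=set)   # (agent, ip) pairs

@dataclass
class Registry:
    claims: dict = field(default_factory=dict)   # setting -> names of users who attached it
    ambiguous: set = field(default_factory=set)  # settings attached by several users

def login(user, agent, ip, registry, confirm):
    """Run the login flow and return the settings whose activity data are displayed."""
    setting = (agent, ip)
    # Unknown setting: ask the user whether to attach it to the account.
    if setting not in user.known_settings and confirm(setting):
        others = registry.claims.get(setting, set()) - {user.name}
        if others:
            # Assumption: a setting already claimed by someone else is ambiguous.
            registry.ambiguous.add(setting)
        else:
            user.known_settings.add(setting)
        registry.claims.setdefault(setting, set()).add(user.name)
    # Display activity data related to all known settings of the user.
    return sorted(user.known_settings)
```

Keeping ambiguous settings out of the known set is one conservative choice; the slides leave open whether such data should be shown, verified with questions, or moderated.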
22. User support: data for a user. For a user <u>, the SPARQL query CONSTRUCT {?trace ?p ?y. ?y ?q ?z} WHERE {<u> actor:hasKnownSetting ?s. ?trace trace:hasSetting ?s. ?trace ?p ?y. OPTIONAL {?y ?q ?z}} builds the traces of activities around the known settings of <u>. It is used to populate a specific repository with sub-spaces for each registered user.
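As a plain-Python illustration of what this query is meant to compute, the sketch below works on a set of (subject, predicate, object) string tuples: it finds the user's known settings, the traces attached to them, every triple about those traces, and one further hop describing the resources they point to. Prefixed names are treated as plain strings and `trace:onPage` is a hypothetical property; a triple store would of course evaluate the SPARQL directly.

```python
def user_traces(triples, user,
                has_known_setting="actor:hasKnownSetting",
                has_setting="trace:hasSetting"):
    """Mimic CONSTRUCT {?trace ?p ?y . ?y ?q ?z} for the traces of the user's known settings."""
    settings = {o for s, p, o in triples if s == user and p == has_known_setting}
    traces = {s for s, p, o in triples if p == has_setting and o in settings}
    # ?trace ?p ?y : everything said about the selected traces
    first = {(s, p, o) for s, p, o in triples if s in traces}
    # ?y ?q ?z : one hop of description for the resources the traces point to
    objects = {o for _, _, o in first}
    second = {(s, p, o) for s, p, o in triples if s in objects}
    return first | second
```

The second hop is what keeps, for instance, the page type of each visited page, so that the user's sub-space can still be classified by the ontologies.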
23. Deployment, test, validation: At the moment, testing on websites of projects and events hosted on KMi servers: sssw.org, sssw09.org, loted.eu, lucero-project.info, uciad.info, data.open.ac.uk, lucero.open.ac.uk, … Next level up: websites/systems from the main Open University website (www.open.ac.uk, Study at the OU, podcasts.open.ac.uk, VLE). Then extend to deployment of instances for specific projects with distributed websites.
24. Challenges: Scalability: the OWLIM triple store can handle billions of triples, but struggles with millions when inference is “on”; 1 repository without inference with all historical data, 1 with inference with 1 week of data only, and 1 with inference for registered users. User management and privacy: ensuring that the user who logs in from a particular setting is the one who performed the activity is difficult (e.g., in the case of shared computers). Is this really a problem? Check ambiguity, ask verification questions, moderate? Distribution and IPR: code and ontologies under open licenses (small uncertainty regarding code developed in other projects). Overall data: privacy issues (is k-anonymity actually applicable? Would it work?). Overall data: institutional issues (can we show the traffic on our websites to everybody?). User data export: what license?
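The three-repository workaround for inference cost can be sketched as a routing decision made when each daily extract is loaded, plus a periodic eviction from the one-week repository. The repository names are hypothetical and traces are reduced to a timestamp and a setting; this only illustrates the layout described above, not the OWLIM configuration itself.

```python
from datetime import datetime, timedelta, timezone

WEEK = timedelta(days=7)

def target_repositories(trace_time, setting, now, registered_settings):
    """Return the (hypothetical) names of the repositories a trace is loaded into."""
    repos = ["all-history"]                 # no inference, all historical data
    if now - trace_time <= WEEK:
        repos.append("last-week")           # inference on, 1 week of data only
    if setting in registered_settings:
        repos.append("registered-users")    # inference on, registered users' traces
    return repos

def expired(trace_times, now):
    """Return the timestamps that should be evicted from the one-week repository."""
    return [t for t in trace_times if now - t > WEEK]
```

Only the two inference-enabled repositories stay small, which is the point of the design: full reasoning is applied where it is affordable, while the complete history remains queryable without inference.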
25. Summary and dissemination: Promising initial results. We can create new ways of analysis at run-time by editing the ontologies! Mechanisms to provide personal views on one's own activity data across websites. First version of the ontologies: ongoing task. First version of the tools: test and validate! Dissemination: Blog / Twitter #uciad; KMi’s internal newsletter (KMi Planet); Salman’s paper at the ESWC 2011 PhD symposium: “Personal Semantics: Personal information management in the Web with Semantic Technologies”; position paper at the W3C Web tracking and privacy workshop: “Self-Tracking on the Web: Why and How”; submission to the Personal Semantic Data workshop at K-CAP 2011.
26. More info UCIAD Blog: http://uciad.info Code base: http://github.com/uciad Twitter: #uciad @mdaquin