This document contains the slides of a training module on RDF, SPARQL and semantic repositories, delivered by Marin Dimitrov (Ontotext) in Montreal in August 2010 as part of the 3rd GATE training course. It outlines the topics covered in the module, including RDF/S and OWL formal semantics and profiles, querying RDF data with SPARQL, semantic repositories and OWLIM, and benchmarking triplestores.
1. RDF, SPARQL and Semantic Repositories
3rd GATE Training Course @ Montreal
Module 14
Marin Dimitrov (Ontotext)
August 2010
2. 3rd GATE Training Course and Developer Sprint @ Montreal, Aug 2010
• https://gate.ac.uk/sale/images/gate-art/fig-posters/fig3-poster.pdf
3. Module 14 programme
9.45-11.00
• RDF/S and OWL formal semantics and profiles
• Querying RDF data with SPARQL
11.00-11.15
• Coffee break
11.15-12.30
• Semantic Repositories
• OWLIM overview
12.30-14.00 Lunch break
14.00-16.00
• Benchmarking triplestores
• Distributed approaches to RDF materialisation
• From RDBMS to RDF
• Other RDF tools
16.00-16.30 Coffee
4. RDF/S AND OWL FORMAL SEMANTICS, DIALECTS & PROFILES
5. Resource Description Framework (RDF)
• A simple data model for
– Formally describing the semantics of information in a machine accessible way
– Representing meta-data (data about data)
• A set of representation syntaxes
– XML (standard) but also N3, Turtle, …
• Building blocks
– Resources (with unique identifiers)
– Literals
– Named relations between pairs of resources (or a resource and a literal)
6. RDF (2)
• Everything is a triple
– Subject (resource), Predicate (relation), Object (resource or literal)
• The RDF graph is a collection of triples
[Diagram: subject —predicate→ object; e.g. Concordia University —locatedIn→ Montreal, Montreal —hasPopulation→ 1620698]
7. RDF graph example
[Diagram: dbpedia:Concordia_University with two hasName literals, “Concordia University” and “Université Concordia”]
Subject                                            Predicate   Object
http://dbpedia.org/resource/Concordia_University   hasName     “Concordia University”
http://dbpedia.org/resource/Concordia_University   hasName     “Université Concordia”
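• The same two triples, written in the Turtle syntax mentioned on slide 5 (a minimal sketch – the hasName property is shown under an assumed ex: namespace, since the slide does not give its full URI):
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix ex:      <http://example.org/vocab#> .       # assumed namespace for hasName
dbpedia:Concordia_University
    ex:hasName "Concordia University" ,
               "Université Concordia" .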
10. RDF advantages
• Simple but expressive data model
• Global identifiers of all resources (URIs)
– Reduces ambiguity
• Easier incremental data integration
– Can handle incomplete information (Open World Assumption)
• Schema agility
• Graph structure
– Suitable for a large class of tasks
– Data merging is easier
11. RDF Schema (RDFS)
• RDFS provides means for:
– Defining Classes and Properties
– Defining hierarchies (of classes and properties)
• RDFS differs from XML Schema (XSD)
– Open World Assumption vs. Closed World Assumption
– RDFS is about describing resources, not about validation
• Entailment rules (axioms)
– Infer new triples from existing ones
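• A small illustration of such an entailment rule in Turtle (the ex: class and property names are assumed for the example):
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/vocab#> .           # assumed example namespace
ex:University rdfs:subClassOf ex:Organisation .       # class hierarchy (schema)
ex:Concordia  rdf:type        ex:University .         # explicit instance triple
# the rdfs:subClassOf entailment rule infers the new triple:
# ex:Concordia rdf:type ex:Organisation .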
14. RDFS – inferring new triples
[Diagram: an example RDF graph over the classes ptop:Agent, ptop:Person and ptop:Woman, the properties ptop:parentOf, ptop:childOf and owl:relativeOf, and the instances myData:Ivan and myData:Maria, showing triples inferred through rdf:type, rdfs:subPropertyOf, rdfs:range, owl:inverseOf and owl:SymmetricProperty]
15. Web Ontology Language (OWL)
• More expressive than RDFS
– Identity equivalence/difference
• sameAs, differentFrom, equivalentClass/Property
• More expressive class definitions
– Class intersection, union, complement, disjointness
– Cardinality restrictions
• More expressive property definitions
– Object/Datatype properties
– Transitive, functional, symmetric, inverse properties
– Value restrictions
16. OWL (2)
• What can be done with OWL?
– Consistency checks – are there contradictions in the logical model?
– Satisfiability checks – are there classes that cannot have any instances?
– Classification – what is the type of a particular instance?
• OWL sublanguages
– OWL Lite – low expressiveness / low computational complexity
– OWL DL – high expressiveness / decidable & complete
– OWL Full – max expressiveness / no guarantees
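• As an illustration of a consistency check, a reasoner would reject the following (made-up) fragment, because the two classes are declared disjoint yet share an instance:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/vocab#> .            # assumed example namespace
ex:University owl:disjointWith ex:Person .            # no resource may be both
ex:Concordia  rdf:type ex:University ,
                       ex:Person .                    # contradiction – the model is inconsistent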
18. The cost of “semantic clarity”
(c) Mike Bergman
19. OWL sublanguages – OWL Lite
• OWL Lite
– low expressiveness / low computational complexity
– All RDFS features
– sameAs/differentFrom, equivalent class/property
– inverse/symmetric/transitive/functional properties
– property restrictions, cardinality (0 or 1)
– class intersection
20. OWL sublanguages – OWL DL & OWL Full
• OWL DL
– high expressiveness / decidable & complete
– All OWL Lite features
– Class disjointness
– Complex class expressions
– Class union & complement
• OWL Full
– max expressiveness / no guarantees
– Same vocabulary as OWL DL but fewer restrictions
• In OWL DL, a Class cannot also be an Individual or a Property
21. OWL 2 profiles
• Goals
– sublanguages that trade expressiveness for efficiency of reasoning
– Cover important application areas
– Easier to understand by non-experts
• OWL 2 EL
– Best for large ontologies / small instance data (TBox reasoning)
– Computationally optimal
• Satisfiability checks in PTime
22. OWL 2 profiles (2)
• OWL 2 QL
– Quite limited expressive power, but very efficient for query answering with large instance data
– Can exploit query rewriting techniques
• Data storage & query evaluation can be delegated to an RDBMS
• OWL 2 RL
– Balance between scalable reasoning and expressive power (ABox reasoning)
– OWL 2 RL rules can be expressed in RIF BLD
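• A minimal sketch of the query rewriting idea behind OWL 2 QL, assuming a schema in which ex:University is a subclass of ex:Organisation – the query over the superclass is rewritten into a UNION that the underlying store can evaluate directly, without materialising inferred triples:
PREFIX ex: <http://example.org/vocab#>                # assumed example namespace
# original query
SELECT ?org WHERE { ?org a ex:Organisation }
# rewritten query – no reasoning needed at evaluation time
SELECT ?org WHERE {
  { ?org a ex:Organisation }
  UNION
  { ?org a ex:University }                            # added because ex:University rdfs:subClassOf ex:Organisation
}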
23. OWL 2 profiles (3)
(c) Axel Polleres
24. QUERYING RDF DATA WITH SPARQL
25. SPARQL Protocol and RDF Query Language (SPARQL)
• SQL-like query language for RDF data
• Simple protocol for querying remote databases over HTTP
• Query types
– select – projections of variables and expressions
– construct – create triples (or graphs) based on query results
– ask – whether a query returns results (result is true/false)
– describe – describe resources in the graph
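• To complement the SELECT and DESCRIBE examples on the next slides, minimal ASK and CONSTRUCT queries against the same DBpedia vocabulary (the dbp-ont:locatedIn property used in the constructed triples is assumed for this sketch):
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
# ask – is there at least one university in Montreal? (returns true/false)
ASK { ?uni a dbpedia:University ; dbp-ont:city dbpedia:Montreal }
# construct – build new triples from the matched graph pattern
CONSTRUCT { ?uni dbp-ont:locatedIn dbpedia:Montreal }
WHERE     { ?uni a dbpedia:University ; dbp-ont:city dbpedia:Montreal }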
26. Example - describing resources
• Go to www.FactForge.net and execute (in the “SPARQL query” tab):
PREFIX dbpedia: <http://dbpedia.org/resource/>
DESCRIBE dbpedia:Montreal
27. SPARQL select
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?university ?students
WHERE {
?university rdf:type dbpedia:University .
?university dbp-ont:numberOfStudents ?students .
?university dbp-ont:city dbpedia:Montreal .
FILTER (?students > 5000)
}
ORDER BY DESC (?students)
28. Triple patterns
• Whitespace separated list of Subject, Predicate, Object
– ?x dbp-ont:city dbpedia:Montreal
– dbpedia:Concordia_University dbp-ont:city ?y
• Triple patterns with common Subject
?uni rdf:type dbpedia:University .
?uni dbp-ont:city dbpedia:Montreal .
?uni rdf:type dbpedia:University ;
dbp-ont:city dbpedia:Montreal .
29. Triple patterns (2)
• Triple patterns with common Subject and Predicate
?city rdfs:label "Montreal"@en .
?city rdfs:label "Montréal"@fr .
?city rdfs:label "Montreal"@en , "Montréal"@fr .
• “a” can be used as an alternative for rdf:type
?uni rdf:type dbpedia:University .
?uni a dbpedia:University .
30. Graph Patterns
• Basic Graph Pattern
– A conjunction of triple patterns
• Group Graph Pattern
– A group of 1+ graph patterns
– Patterns are enclosed in { }
– FILTERs can constrain the whole group
{
?uni a dbpedia:University ;
dbp-ont:city dbpedia:Montreal ;
dbp-ont:numberOfStudents ?students .
FILTER (?students > 5000)
}
31. Graph Patterns (2)
• Optional Graph Pattern
– Optional parts of a pattern
– pattern OPTIONAL {pattern}
SELECT ?uni ?students
WHERE {
?uni a dbpedia:University ;
dbp-ont:city dbpedia:Montreal .
OPTIONAL {
?uni dbp-ont:numberOfStudents ?students
}
}
32. Graph Patterns (3)
• Alternative Graph Pattern
– Combine results of several alternative patterns
– {pattern} UNION {pattern}
SELECT ?uni
WHERE {
?uni a dbpedia:University .
{
{ ?uni dbp-ont:city dbpedia:Vancouver }
UNION
{ ?uni dbp-ont:city dbpedia:Montreal }
}
}
33. Anatomy of a SPARQL query
• List of namespace prefixes
– PREFIX xyz: <URI>
• List of variables
– ?x, $y
• Graph patterns + filters
– Simple / group / alternative / optional
• Modifiers
– ORDER BY, DISTINCT, OFFSET/LIMIT
34. Anatomy of a SPARQL query (2)
SELECT ?var1 ?var2
WHERE {
triple-pattern1 .
triple-pattern2 .
{{triple-pattern3} UNION {triple-pattern4}}
OPTIONAL {triple-pattern5}
FILTER (filter-expr)
}
ORDER BY DESC (?var1)
LIMIT 100
35. SPARQL example
• List of namespace prefixes
• Go to www.FactForge.net (“SPARQL query” tab)
• Find all universities (and their number of students) which are located in Quebec
• geo-ont:parentFeature, geo-ont:name, geo-ont:alternativeName
36. SPARQL example (2)
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
SELECT DISTINCT ?university ?city ?students
WHERE {
?university rdf:type dbpedia:University ;
dbp-ont:city ?city .
?city geo-ont:parentFeature ?province .
?province geo-ont:name "Québec" .
OPTIONAL {
?university dbp-ont:numberOfStudents ?students
}
}
ORDER BY ASC (?city)
38. Semantic Repositories
• Semantic repositories combine the features of:
– Database management systems (DBMS) and
– Inference engines
• Rapid progress in the last 5 years
– Every couple of years the scalability increases by an order of magnitude
• “Track-laying machines” for the Semantic Web
– Extending the reach of the “data railways” and changing the data-economy by allowing more complex data to be managed at lower cost
39. Semantic Repositories as Track-laying Machines
40. Semantic Repositories as Track-laying Machines (2)
41. Semantic Repositories vs. RDBMS
• The major differences with the DBMS are:
– They use ontologies as semantic schemata, which allows them to automatically reason about the data
– They work with a more generic data model, which makes them more agile with respect to updates and extensions in the schemata (i.e. in the structure of the data)
43. Major characteristics
• Easy integration of multiple data sources
– Once the schemata of the data sources are semantically aligned, the inference capabilities of the engine assist the interlinking and combination of facts from different sources
• Easy querying against rich or diverse data schemata
– Inference is applied to match the semantics of the query to the semantics of the data, regardless of the vocabulary and data modeling patterns used for encoding the data
44. Major characteristics (2)
• Great analytical power
– Semantics will be thoroughly applied even when this requires recursive inferences over multiple steps
– Discover facts by interlinking long chains of evidence
• the vast majority of such facts would remain hidden in the DBMS
• Efficient data interoperability
– Importing RDF data from one store to another is straightforward, based on the usage of globally unique identifiers
45. Reasoning Strategies
• The strategies for rule-based inference are:
– Forward-chaining: start from the known (explicit) facts and perform inference in an inductive manner until the complete closure is inferred
– Backward-chaining: start from a particular fact and verify it against the knowledge base using deductive reasoning
• the reasoner decomposes the query (or the fact) into simpler facts that are available in the KB or can be proven through further recursive decompositions
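• One way to picture a single forward-chaining step is as a SPARQL CONSTRUCT query implementing an entailment rule (an illustrative sketch, not how any particular engine is implemented); the example encodes the rdfs:subClassOf instance rule (rdfs9):
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# rdfs9: (?c1 rdfs:subClassOf ?c2), (?x rdf:type ?c1)  =>  (?x rdf:type ?c2)
CONSTRUCT { ?x rdf:type ?c2 }
WHERE {
  ?c1 rdfs:subClassOf ?c2 .
  ?x  rdf:type        ?c1 .
}
# forward-chaining applies such rules repeatedly, adding the produced
# triples, until no new triples appear (the inferred closure is reached)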
46. Reasoning Strategies (2)
• Inferred closure
– the extension of a KB (a graph of RDF triples) with all the implicit facts (triples) that could be inferred from it, based on the pre-defined entailment rules
• Materialization
– Maintaining an up-to-date inferred closure
47. Pros and cons of forward-chaining based materialization
• Relatively slow upload/store/addition of new facts
– inferred closure is extended after each transaction
• Deletion of facts is slow
– facts that are no longer true are removed from the inferred closure
• The maintenance of the inferred closure requires considerable resources
• Querying and retrieval are fast
– no reasoning is required at query time
– RDBMS-like query evaluation & optimisation techniques are applicable
48. OWLIM OVERVIEW
49. Semantic Repository for RDFS and OWL
• OWLIM is a family of semantic repositories
– SwiftOWLIM – fast in-memory operations, scales to ~100M statements
– BigOWLIM – optimized for data integration, massive query loads and critical applications, scales up to 20B statements
• OWLIM is designed and tuned to provide
– Efficient management, integration and analysis of heterogeneous data
– Light-weight, high-performance reasoning
50. Naïve OWL Fragments Map
[Diagram: OWL fragments plotted by reasoning complexity against expressivity – RDFS, OWL DLP, OWL Lite- / DHL, OWL Horst / Tiny, OWL 2 RL, Datalog, OWL/WSML Flight, OWL Lite, OWL DL, SWRL and OWL Full – grouped into Rules/LP and DL families, with the expressivity supported by OWLIM highlighted]
51. BigOWLIM performance
• BigOWLIM is the only engine that can reason with more than 10B statements, on a $10,000 server
– It passes LUBM(90000), indexing over 20B explicit and implicit statements and being able to efficiently answer queries
– it offers the most efficient query evaluation - the only RDF database for which full-cycle benchmarking results are published for the LUBM(8000) benchmark or higher
52. Key Features
• Clustering support brings resilience, failover and horizontally scalable parallel query processing
• Optimized owl:sameAs handling
• Integrated full-text search
• High performance retraction of statements and their inferences
• Consistency checking mechanisms
• RDF rank for ordering query results by relevance
• Notification mechanism, to allow clients to react to updates in the data stream
53. Replication Cluster
• Improve scalability with respect to concurrent user requests
• How does it work?
– Each data write request is multiplexed to all repository instances
– Each read request is dispatched to one instance only
– To ensure load-balancing, read requests are sent to the instance with the shortest execution queue
54. RDF Rank
• BigOWLIM uses a modification of PageRank over RDF graphs
• The computation of the RDF Ranks for FactForge (a couple of billion statements) takes just a few minutes
• Results are available through a system predicate
– Example: get the 100 most “important” nodes in the RDF graph
SELECT ?node {?node onto:hasRDFRank ?rank}
ORDER BY DESC(?rank) LIMIT 100
55. example
• Go to www.FactForge.net (“SPARQL query” tab)
• Find the 10 “most important” universities located in Quebec
– geo-ont:parentFeature, geo-ont:name, geo-ont:alternativeName, om:hasRDFRank
56. example (2)
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX owlim: <http://www.ontotext.com/owlim/>
SELECT DISTINCT ?university ?rank
WHERE {
?university rdf:type dbpedia:University ;
dbp-ont:city ?city ;
owlim:hasRDFRank ?rank .
?city geo-ont:parentFeature ?province .
?province geo-ont:name "Québec" .
}
ORDER BY DESC (?rank)
LIMIT 10
58. RDF Search – advanced full-text search in RDF graphs
• Objective
– Search in an RDF graph by keywords
– Get more usable results (standalone literals are not useful in most cases)
• What to index:
– URIs
– The RDF molecule for a URI
• the description of the node, including all outgoing statements
• Results returned
– List of URIs, ranked by FTS + RDF Rank metric
59. RDF Search (2)
• The ranking is based on the standard vector-space-model relevance, boosted by RDF Rank
PREFIX gossip: <http://www.....gossipdb.owl#>
PREFIX onto: <http://www.ontotext.com/>
SELECT * WHERE {
?person gossip:name ?name .
?name onto:luceneQuery "American AND life~".
}
60. owl:sameAs optimisation
• owl:sameAs declares that two different URIs denote one and the same resource or object in the world
– it is used to align different identifiers of the same real-world entity used in different data sources
• Example: five different URIs for Montreal in four different datasets
dbpedia:Montreal owl:sameAs geonames:6077244
dbpedia:Montreal owl:sameAs geonames:6077243
geonames:6077243 owl:sameAs fbase:guid.9202a8c04000641f8000000000028aa7
fbase:guid.9202a8c04000641f8000000000028aa7 owl:sameAs nytimes:N59179828586486930801
61. owl:sameAs optimisation (2)
• According to the standard semantics of owl:sameAs:
– It is a transitive and symmetric relationship
– Statements asserted using one of the equivalent URIs should be inferred to appear with all equivalent URIs placed in the same position
– Thus the four statements in the previous example lead to ten inferred statements
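• For illustration, a few of the triples that follow from the four asserted statements on the previous slide, by symmetry and transitivity of owl:sameAs (not an exhaustive listing):
geonames:6077244 owl:sameAs dbpedia:Montreal .                              # symmetry
dbpedia:Montreal owl:sameAs fbase:guid.9202a8c04000641f8000000000028aa7 .   # transitivity
dbpedia:Montreal owl:sameAs nytimes:N59179828586486930801 .                 # transitivity
geonames:6077244 owl:sameAs geonames:6077243 .                              # symmetry + transitivity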
63. owl:sameAs optimisation (4)
• BigOWLIM features an optimisation that allows it to use a single master-node in its indices to represent a class of sameAs-equivalent URIs
• Avoids inflating the indices with multiple equivalent statements
• Optionally expands query results
– The owl:sameAs equivalence leads to multiplication of the bindings of the variables in the process of query evaluation (both forward- and backward-chaining)
65. Tasks to be benchmarked
• Data loading
– parsing, persistence, and indexing
• Query evaluation
– query preparation and optimization, fetching
• Data modification
– May involve changes to the ontologies and schemata
• Inference is not a first-level activity
– Depending on the implementation, it can affect the performance of the other activities
66. Performance factors for data loading
• Materialization
– Whether forward-chaining is performed at load time & the complexity of forward-chaining
• Data model complexity
– Support for extended RDF data models (e.g. named graphs) is computationally more expensive
• Indexing specifics
– Repositories can apply different indexing strategies depending on the data loaded, usage patterns, etc.
• Transaction Isolation
67. Performance factors for query evaluation
• Deduction
– Whether and how complex backward-chaining is involved
• Size of the result-set
– Fetching large result-sets can take considerable time
• Query complexity
– Number of constraints (e.g. triple-pattern joins)
– Semantics of the query (e.g. negation- and disjunction-related clauses)
– Use of operators that cannot be optimized (e.g. LIKE)
• Number of concurrent clients
68. Performance dimensions
• Scale
– The size of the repository (number of triples)
• Schema and data complexity
– The complexity of the ontology/logical language
– The specific ontology (or schema) and dataset
• E.g. a highly interconnected dataset, with long chains of transitive properties, can appear quite challenging for reasoning
– Sparse versus dense datasets
– Presence and size of literals
– Number of predicates used
– Use of owl:sameAs
69. Scalability metrics
• Number of inserted statements (NIS)
• Number of stored statements (NSS)
– How many statements have been stored and indexed?
– Duplicates can make NSS smaller than NIS
– For engines using forward-chaining and materialization,
the volume of data to be indexed includes the inferred
triples
• Number of retrievable statements (NRS)
– How many different statements can be retrieved?
– This number can be different from NSS when the
repository supports some sort of backward-chaining
RDF, SPARQL and Semantic Repositories Aug 2010 #69
70. Full-cycle benchmarking
• We call full-cycle benchmarking any methodology that
provides a complete picture of the performance across
the full “life cycle” of the data within the engine
– Loading and query evaluation performance are reported
within a single experiment or benchmark run
– Full-cycle benchmarking requires load performance data to
be matched with query evaluation performance
• “5 billion triples of LUBM were loaded in 30 hours”
• “… and the evaluation of the 14 queries took 1 hour on a warm
database.”
RDF, SPARQL and Semantic Repositories Aug 2010 #70
71. Full-cycle benchmarking (2)
• Typical set of activities to be covered:
1. Loading input RDF files from the storage system
2. Parsing the RDF files
3. Indexing and storing the triples
4. Forward-chaining and materialization (optional)
5. Query parsing
6. Query optimization
• Query re-writing (optional)
7. Query evaluation, involving
• Backward-chaining (optional)
• Fetching the results
RDF, SPARQL and Semantic Repositories Aug 2010 #71
77. Distributed RDF materialisation with MapReduce
• Distributed approach by Urbani et al., ISWC’2009
– “Scalable Distributed Reasoning using MapReduce”
• 64 node Hadoop cluster
• MapReduce
– Map phase – partitions the input space by some key
– Reduce phase – performs some aggregated processing on
each partition (from the Map phase)
• The partition contains all elements for a particular key
• A skewed key distribution leads to uneven load on the Reduce nodes
• A balanced Reduce load is almost impossible to achieve (a major
M/R drawback)
RDF, SPARQL and Semantic Repositories Aug 2010 #77
80. RDF materialisation with MapReduce – “naïve”
approach
• Applying all RDFS rules iteratively on the input until
no new data is derived (fixpoint)
– Rules with one antecedent are easy
– Rules with two antecedents require a join
• Map function
– Key is S, P or O, value is original triple
– 3 key/value pairs generated for each input triple
• Reduce function – performs the join (see the sketch after this slide)
RDF, SPARQL and Semantic Repositories Aug 2010 #80
(c) Urbani et al.
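As a rough sketch of the naïve scheme with the Hadoop Java API, here specialised to the rdfs9 join (rdf:type joined with rdfs:subClassOf): the mapper emits each triple under its subject, predicate and object, and the reducer performs the join. The tab-separated triple encoding and the restriction to a single rule are assumptions made for the illustration; this is not Urbani et al.'s actual code.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Naive scheme: every triple is emitted under its subject, predicate and object. */
class NaiveTripleMapper extends Mapper<Object, Text, Text, Text> {
    protected void map(Object key, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] spo = line.toString().split("\t");   // assumed input: S \t P \t O
        for (String part : spo) {
            ctx.write(new Text(part), line);           // 3 key/value pairs per input triple
        }
    }
}

/** The reducer sees all triples sharing a term and performs the join, here rdfs9. */
class NaiveRdfs9Reducer extends Reducer<Text, Text, Text, Text> {
    protected void reduce(Text term, Iterable<Text> triples, Context ctx)
            throws IOException, InterruptedException {
        List<String[]> subClassOf = new ArrayList<String[]>();   // (term rdfs:subClassOf D)
        List<String[]> typeOf = new ArrayList<String[]>();       // (x rdf:type term)
        for (Text t : triples) {
            String[] spo = t.toString().split("\t");
            if (spo[1].equals("rdfs:subClassOf") && spo[0].equals(term.toString()))
                subClassOf.add(spo);
            if (spo[1].equals("rdf:type") && spo[2].equals(term.toString()))
                typeOf.add(spo);
        }
        // rdfs9: (x rdf:type C) + (C rdfs:subClassOf D) => (x rdf:type D)
        for (String[] inst : typeOf)
            for (String[] sc : subClassOf)
                ctx.write(new Text(inst[0]), new Text("rdf:type\t" + sc[2]));
    }
}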
81. RDF materialisation with MapReduce – optimised
approach
• Problems with the “naïve” approach
– One iteration is not enough
– Too many duplicates generated
• Ratio of unique:duplicate triples is around 1:50
• Optimised approach
– Load schema triples in memory (0.001-0.01% of triples)
• On each node joins are made between a very small set of schema
triples and a large set of instance triples
• Only the instance triples are streamed by the MapReduce pipeline
RDF, SPARQL and Semantic Repositories Aug 2010 #81
82. RDF materialisation with MapReduce – optimised
approach (2)
• Data grouping to avoid duplicates
– Map phase: use as key those parts of the input (S/P/O) that
also appear in the derived triple, so that all input triples which
can produce the same derived triple are sent to the same Reducer
(sketched after this slide)
– Join schema triples during the Reduce phase to reduce
duplicates
• Ordering the sequence of rule application
– Analyse the ruleset and determine which rules may
trigger other rules
– Build a rule dependency graph and apply the rules
bottom-up in an optimal order
RDF, SPARQL and Semantic Repositories Aug 2010 #82
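A sketch of how the two optimisations change the same rdfs9 example: only instance triples flow through the pipeline, the map key is the part of the input that also appears in every derived triple (the instance's subject), and the small rdfs:subClassOf closure is held in memory on each reducer, which also removes duplicates before writing. The loadSubClassClosure() helper and the input encoding are placeholders, not the authors' code.

import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Only instance triples (x rdf:type C) are streamed; the key is the subject x,
 *  i.e. the part of the input that reappears in every derived triple. */
class OptimisedTypeMapper extends Mapper<Object, Text, Text, Text> {
    protected void map(Object key, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] spo = line.toString().split("\t");            // assumed input: S \t P \t O
        if (spo[1].equals("rdf:type"))
            ctx.write(new Text(spo[0]), new Text(spo[2]));      // key = subject, value = class
    }
}

/** The (small) schema closure is kept in memory; the join happens here and
 *  duplicates are removed per subject before anything is written out. */
class OptimisedTypeReducer extends Reducer<Text, Text, Text, Text> {
    private Map<String, Set<String>> superClasses;              // C -> all superclasses of C

    protected void setup(Context ctx) {
        superClasses = loadSubClassClosure();                    // placeholder: e.g. read from the distributed cache
    }

    protected void reduce(Text subject, Iterable<Text> classes, Context ctx)
            throws IOException, InterruptedException {
        Set<String> derived = new HashSet<String>();
        for (Text c : classes) {
            Set<String> supers = superClasses.get(c.toString());
            if (supers != null) derived.addAll(supers);
        }
        for (String d : derived)   // rdfs9: (x rdf:type C) + (C rdfs:subClassOf D) => (x rdf:type D)
            ctx.write(subject, new Text("rdf:type\t" + d));
    }

    private Map<String, Set<String>> loadSubClassClosure() {
        return new HashMap<String, Set<String>>();               // stand-in for the in-memory schema triples
    }
}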
83. RDF materialisation with MapReduce – rule reordering
(c) Urbani et al.
RDF, SPARQL and Semantic Repositories Aug 2010 #83
84. RDF materialisation with MapReduce – benchmarks
• Performance benchmarks
• 4.3 million triples / sec (30 billion in ~2h)
(c) Urbani et al.
RDF, SPARQL and Semantic Repositories Aug 2010 #84
85. FROM RDBMS TO RDF
RDF, SPARQL and Semantic Repositories Aug 2010 #85
86. RDB2RDF
• RDB2RDF working group @ W3C
– http://www.w3.org/2001/sw/rdb2rdf/
– standardize a language (R2RML) for mapping relational
data / database schemas into RDF / OWL
• Existing RDF-izers
– Triplify, D2RQ, RDF Views
RDF, SPARQL and Semantic Repositories Aug 2010 #86
87. RDB2RDF – table-to-class mapping approach
• Each RDBMS table is an RDF class
• Each RDBMS record is an RDF node
• Each PK value is an RDF subject URI
• Each non-PK column name in an RDBMS table is an RDF
predicate
• Each RDBMS table cell value (non-PK) is an RDF object
(a JDBC sketch of this mapping follows the example slide)
RDF, SPARQL and Semantic Repositories Aug 2010 #87
88. RDB2RDF mapping example
RDF, SPARQL and Semantic Repositories Aug 2010 #88
(c) RDB2RDF @ W3C.
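Since the original slide's figure is not reproduced here, a minimal JDBC sketch of the table-to-class mapping may help. The person table, its primary key id, the base URI and the output format (N-Triples on standard output) are all invented for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

/** Emits N-Triples following the table-to-class idea: table -> class, record -> node,
 *  PK value -> subject URI, non-PK column -> predicate, cell value -> object. */
public class DirectMapper {
    private static final String RDF_TYPE =
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    public static void main(String[] args) throws SQLException {
        String base = "http://example.org/db/";                   // assumed base URI
        Connection con = DriverManager.getConnection(args[0]);    // JDBC URL passed in
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SELECT * FROM person");   // hypothetical table
        ResultSetMetaData md = rs.getMetaData();
        while (rs.next()) {
            String subject = "<" + base + "person/" + rs.getString("id") + ">";
            // each record becomes a node typed by the class derived from the table
            System.out.println(subject + " <" + RDF_TYPE + "> <" + base + "person> .");
            for (int i = 1; i <= md.getColumnCount(); i++) {
                String col = md.getColumnName(i);
                if (col.equalsIgnoreCase("id")) continue;          // the PK value yields the subject URI
                System.out.println(subject + " <" + base + "person#" + col + "> \""
                        + rs.getString(i) + "\" .");
            }
        }
        con.close();
    }
}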
89. OTHER RDF TOOLS
RDF, SPARQL and Semantic Repositories Aug 2010 #89
90. Ontology editors
• TopBraid Composer
– http://www.topquadrant.com
RDF, SPARQL and Semantic Repositories Aug 2010 #90
91. Ontology editors (2)
• Altova SemanticWorks
– http://www.altova.com/semanticworks.html
RDF, SPARQL and Semantic Repositories Aug 2010 #91
92. Ontology editors (3)
• Protégé
– http://protege.stanford.edu
– http://webprotege.stanford.edu
RDF, SPARQL and Semantic Repositories Aug 2010 #92
93. RDF-izers
• Triplify
– http://triplify.org
– Transform relational data into RDF / Linked Data
• D2RQ platform
– http://www4.wiwiss.fu-berlin.de/bizer/d2rq/index.htm
– D2RQ mapping language
– D2RQ plugin for Sesame/Jena
– D2R server
– Linked Data & SPARQL endpoint
RDF, SPARQL and Semantic Repositories Aug 2010 #93
94. RDF APIs
• Jena
– http://jena.sourceforge.net
– RDF/OWL API (Java)
– In-memory or persistent storage
– SPARQL query engine
• OpenRDF (Sesame)
– http://www.openrdf.org
– RDF API (Java), high performance parser
– Persistent storage
– SPARQL query engine
RDF, SPARQL and Semantic Repositories Aug 2010 #94
95. Sesame API
• Architecture
– RDF Model
– RDF I/O (parsers & serializers)
– Storage & Inference Layer (for reasoners & databases)
– High-level Repository API
– HTTP & REST service frontend
(c) OpenRDF.org
RDF, SPARQL and Semantic Repositories Aug 2010 #95
96. Sesame API (2)
• RDF model
– Create statements, get S/P/O
• SAIL API
– Initialise, shut-down repositories
– Add/remove/iterate statements
– Queries
• Repository API (see the example after this slide)
– Create a repository (in-memory, filesystem), connect
– Add, retrieve, remove triples
– Queries
RDF, SPARQL and Semantic Repositories Aug 2010 #96
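A short example against the Sesame 2.x Repository API outlined on this slide: create and initialise an in-memory repository, add a triple, and evaluate a SPARQL query. The example URIs are placeholders.

import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class SesameExample {
    public static void main(String[] args) throws Exception {
        // Create and initialise an in-memory repository
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();

        ValueFactory vf = repo.getValueFactory();
        URI montreal = vf.createURI("http://example.org/Montreal");   // placeholder URIs
        URI name = vf.createURI("http://example.org/name");

        RepositoryConnection con = repo.getConnection();
        try {
            // Add a triple
            con.add(montreal, name, vf.createLiteral("Montreal"));

            // Evaluate a SPARQL query
            TupleQuery query = con.prepareTupleQuery(QueryLanguage.SPARQL,
                    "SELECT ?s ?o WHERE { ?s <http://example.org/name> ?o }");
            TupleQueryResult result = query.evaluate();
            while (result.hasNext()) {
                BindingSet bs = result.next();
                System.out.println(bs.getValue("s") + " -> " + bs.getValue("o"));
            }
            result.close();
        } finally {
            con.close();
        }
        repo.shutDown();
    }
}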
97. Sesame API – REST interface
• /repositories
– GET: list available repositories
• /repositories/ID
– GET / POST: query evaluation (parameters: query text,
bindings, inference; see the example after this slide)
• /repositories/ID/statements
– GET: get statements (parameters: S, P, O, inference)
– POST: adds/updates statements
– PUT: modifies existing statements in the repository
– DELETE: deletes statements (parameters: S, P, O)
• /repositories/ID/namespaces
– GET: list namespace definitions
– DELETE: deletes all namespace definitions
• /repositories/ID/namespaces/PREF
– GET: gets the namespace for a prefix
– PUT: defines/updates the namespace for a prefix
– DELETE: removes a namespace declaration
RDF, SPARQL and Semantic Repositories Aug 2010 #97
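A query can be evaluated over the HTTP interface from the table above with nothing more than the JDK. The server URL, repository ID and Accept header below are assumptions about a typical Sesame server deployment, not values from the original slides.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class RestQueryExample {
    public static void main(String[] args) throws Exception {
        String server = "http://localhost:8080/openrdf-sesame";     // placeholder server URL
        String query = URLEncoder.encode(
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10", "UTF-8");

        // GET /repositories/ID?query=... evaluates the query (see the table above)
        URL url = new URL(server + "/repositories/myRepo?query=" + query);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/sparql-results+xml");

        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}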
98. Sesame API – OpenRDF Workbench
RDF, SPARQL and Semantic Repositories Aug 2010 #98
99. Summary of this module
• Benefits of using RDF
– Simple data model, global identifiers, schema agility,
incremental data integration
• RDFS/OWL entailment rules make it possible to infer
new triples (knowledge) from existing ones
• OWL language
– Identity equivalence, class intersection / union /
complement, cardinality restrictions, object / datatype
properties, transitive / symmetric / inverse / functional
properties
RDF, SPARQL and Semantic Repositories Aug 2010 #99
100. Summary of this module (2)
• OWL languages and profiles
– OWL Lite, OWL DL, OWL Full
– OWL 2 EL, OWL 2 QL, OWL 2 RL
• SPARQL is a SQL-like language for querying RDF data
• Semantic repositories combine features of DBMS and
logic inference engines
• Forward-chaining vs. backward-chaining strategies
• OWLIM features
– Massively scalable, replication cluster, owl:sameAs
optimisation, RDF search, RDF Rank
RDF, SPARQL and Semantic Repositories Aug 2010 #100
101. Summary of this module (3)
• Benchmarking
– Performance factors and dimensions
• MapReduce can be used for massively distributed
RDF materialisation
• RDBMS-to-RDF mapping approaches and tools
• RDF tools
– Ontology editors, RDF-izers, RDF APIs
RDF, SPARQL and Semantic Repositories Aug 2010 #101