This tutorial explains the Data Web vision, some preliminary standards and technologies, and some tools and technological building blocks developed by the AKSW research group at Universität Leipzig.
A Semantic Data Model for Web Applications (Armin Haller)
This presentation gives a short overview of the Semantic Web, RDFa and Linked Data. The second part briefly discusses ActiveRaUL, our model and system for developing form-based Web applications using Semantic Web technologies.
Lecture at the advanced course on Data Science of the SIKS research school, May 20, 2016, Vught, The Netherlands.
Contents
- Why do we create Linked Open Data? Example questions from the Humanities and Social Sciences
- Introduction to Linked Open Data
- Lessons learned about the creation of Linked Open Data (link discovery, knowledge representation, evaluation)
- Accessing Linked Open Data
For everybody who gets tired of questions like “when is the Semantic Web actually going to happen”, or any other suggestion that the Semantic Web programme is “only vision, no progress”.
https://doi.org/10.6084/m9.figshare.11854626.v1
Presented at the Dutch National Librarian/Information Professional Association annual conference 2011 (NVB2011)
November 17, 2011
My Linked Data tutorial, presented at SemTech 2012.
http://semtechbizsf2012.semanticweb.com/sessionPop.cfm?confid=65&proposalid=4724
Linked data the next 5 years - From Hype to Action (Andreas Blumauer)
How can we shape the future of Linked Data and the Semantic Web so that they become even more widespread in enterprises and other organizations? Which developments around linked data technologies should we expect, and how can we implement various use cases successfully?
Presentation created for the CILIP Cataloguing Interest Group event on Linked Data, 25th November 2013 (http://www.cilip.org.uk/cataloguing-and-indexing-group/events/linked-data-what-cataloguers-need-know-cig-event)
Linked Open Data Principles, benefits of LOD for sustainable development (Martin Kaltenböck)
Presentation held on 18.09.2013 at the OKCon 2013 in Geneva, Switzerland in the course of the workshop: How Linked Open data supports Sustainable Development and Climate Change Development by Martin Kaltenböck (SWC), Florian Bauer (REEEP) and Jens Laustsen (GBPN).
This book explains the Linked Data domain by adopting a bottom-up approach: it introduces the fundamental Semantic Web technologies and building blocks, which are then combined into methodologies and end-to-end examples for publishing datasets as Linked Data, and use cases that harness scholarly information and sensor data. It presents how Linked Data is used for web-scale data integration, information management and search. Special emphasis is given to the publication of Linked Data from relational databases as well as from real-time sensor data streams. The authors also trace the transformation from the document-based World Wide Web into a Web of Data. Materializing the Web of Linked Data is addressed to researchers and professionals studying software technologies, tools and approaches that drive the Linked Data ecosystem, and the Web in general.
An Approach for the Incremental Export of Relational Databases into RDF Graphs (Nikolaos Konstantinou)
Several approaches have been proposed in the literature for offering RDF views over databases. In addition to these, a variety of tools exist that allow exporting database contents into RDF graphs. The approaches in the latter category have often been shown to perform better than those in the former. However, when database contents are exported into RDF, it is not always optimal or even necessary to export (or dump, as this procedure is often called) the whole database contents every time. This paper investigates the problem of incremental generation and storage of the RDF graph that is the result of exporting relational database contents. In order to express mappings that associate tuples from the source database to triples in the resulting RDF graph, an implementation of the R2RML standard is subject to testing. Next, a methodology is proposed and described that enables incremental generation and storage of the RDF graph that originates from the source relational database contents. The performance of this methodology is assessed through an extensive set of measurements. The paper concludes with a discussion regarding the authors' most important findings.
Incremental Export of Relational Database Contents into RDF Graphs (Nikolaos Konstantinou)
In addition to tools offering RDF views over databases, a variety of tools exist that allow exporting database contents into RDF graphs; in many cases these tools have been shown to perform better than the former. However, when database contents are exported into RDF, it is not always optimal or even necessary to dump the whole database contents every time. In this paper, the problem of incremental generation and storage of the resulting RDF graph is investigated. An implementation of the R2RML standard is used in order to express mappings that associate tuples from the source database to triples in the resulting RDF graph. Next, a methodology is proposed that enables incremental generation and storage of an RDF graph based on a source relational database, and it is evaluated through a set of performance measurements. Finally, a discussion is presented regarding the authors’ most important findings and conclusions.
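To make the idea of an incremental export concrete, here is a minimal sketch in Python. It is not the authors' methodology, which the abstract does not spell out; it assumes a simple change-detection scheme in which each row is fingerprinted and only rows whose fingerprint changed are re-exported. The `http://example.org/` IRIs, the function names, and the in-memory `set` used as a graph are all hypothetical.

```python
import hashlib


def subject_iri(table, key):
    """Hypothetical IRI scheme for the row of `table` identified by `key`."""
    return f"http://example.org/{table}/{key}"


def row_to_triples(table, pk, row):
    """Map one relational row (a dict) to a set of (s, p, o) triples."""
    subject = subject_iri(table, row[pk])
    return {
        (subject, f"http://example.org/{table}#{col}", str(val))
        for col, val in row.items()
        if val is not None
    }


def row_digest(row):
    """Stable fingerprint of a row, used to detect changes between exports."""
    canonical = repr(sorted(row.items())).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def incremental_export(table, pk, rows, graph, digests):
    """Update `graph` in place; return how many rows were re-exported."""
    regenerated = 0
    seen = set()
    for row in rows:
        key = row[pk]
        seen.add(key)
        digest = row_digest(row)
        if digests.get(key) == digest:
            continue  # row unchanged since the last export: keep its triples
        subject = subject_iri(table, key)
        # Drop the stale triples of this row, then add the fresh ones.
        graph.difference_update({t for t in graph if t[0] == subject})
        graph.update(row_to_triples(table, pk, row))
        digests[key] = digest
        regenerated += 1
    # Rows that disappeared from the source lose their triples as well.
    for key in set(digests) - seen:
        subject = subject_iri(table, key)
        graph.difference_update({t for t in graph if t[0] == subject})
        del digests[key]
    return regenerated
```

Calling `incremental_export` twice with the same rows re-exports nothing the second time; a full dump would instead rebuild every triple on every run.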
Transient and persistent RDF views over relational databases in the context o... (Nikolaos Konstantinou)
As far as digital repositories are concerned, numerous benefits emerge from publishing their contents as Linked Open Data (LOD). This leads more and more repositories in this direction. However, several factors need to be taken into account in doing so, among which is whether the transition needs to be materialized in real time or at asynchronous time intervals. In this paper we provide the problem framework in the context of digital repositories, we discuss the benefits and drawbacks of both approaches and draw our conclusions after evaluating a set of performance measurements. Overall, we argue that in contexts with infrequent data updates, as is the case with digital repositories, persistent RDF views are more efficient than real-time SPARQL-to-SQL rewriting systems in terms of query response times, especially when expensive SQL queries are involved.
This chapter introduces the semantic modeling procedure, detailing its technical characteristics, possibilities and limitations. First, we present the languages that are used for semantic description. We present RDF, RDFS and OWL, describe their expressiveness in terms of describing Web Resources, and the abilities they provide in order to describe, query, administer and manage resources at a semantic layer. Next, we present the vocabularies that are used in order to provide common grounds in understanding and communicating ideas and concepts. The technologies, together with the vocabularies used, comprise the modern landscape of Semantic Web/Linked Data applications and serve as the basis for maintaining and analyzing datasets and building applications on top of them.
In this Chapter, we summarize and discuss the material presented throughout this book. We recapitulate what is presented and discussed in each Chapter. We discuss the most interesting aspects of the Web of Data landscape, highlighting its main contributions, and then continue with a discussion, mentioning our most important observations, including domain-specific benefits in the LOD domain. We conclude the Chapter with a discussion of open research challenges in the Linked Data domain.
This chapter provides an overview of the methodologies and technologies that support Linked Data design and publishing. More specifically, this chapter starts with a presentation of the rationale and a discussion about how data can be opened up (i.e. published under an open license). Basic principles are first introduced regarding the cases in which content can be opened up, and the most common approaches to accomplishing this are presented. Next, we discuss how data can be modeled, authored, serialized and stored. In this chapter we also provide an overview of the most common technical solutions and widely used software tools that can serve this purpose. Overall, the chapter aims to provide an analysis of the sub-problems into which the Linked Open Data publishing task is to be broken down, namely opening, modeling, linking, processing, and visualizing content, followed by a presentation of the most representative software solutions.
In this chapter, we introduce and discuss the problems that Linked Data solve and the concepts that are related to these problems. We introduce and analyze the basic concepts that are related to the generation of Linked Data and the Semantic Web in general. We provide a brief history of the Semantic Web and the associated evolution of concepts, problem frameworks and solution approaches, all targeted at offering efficient and intelligent solutions to information representation, management and exploitation. More specifically, we introduce the main reasons for the creation of the Semantic Web and the problems that it addresses. Next, we discuss the distinctions between basic terms such as data, information, knowledge, metadata, ontologies, semantic annotations etc. We introduce the notions of interoperability, integration, merging, mapping, and continue with introducing ontologies, reasoners, knowledge bases, all fundamental concepts in the Linked Data ecosystem.
Entity Linking in Queries: Tasks and Evaluation (Faegheh Hasibi)
Slides for the ICTIR 2015 paper "Entity Linking in Queries: Tasks and Evaluation"
Annotating queries with entities is one of the core problem areas in query understanding. While seeming similar, the task of entity linking in queries is different from entity linking in documents and requires a methodological departure due to the inherent ambiguity of queries. We differentiate between two specific tasks, semantic mapping and interpretation finding, discuss current evaluation methodology, and propose refinements. We examine publicly available datasets for these tasks and introduce a new manually curated dataset for interpretation finding. To further deepen the understanding of task differences, we present a set of approaches for effectively addressing these tasks and report on experimental results.
The Semantic Web is about to grow up. By efforts such as the Linked Open Data initiative, we finally find ourselves at the edge of a Web of Data becoming reality. Standards such as OWL 2, RIF and SPARQL 1.1 shall allow us to reason with and ask complex structured queries on this data, but still they do not play together smoothly and robustly enough to cope with huge amounts of noisy Web data. In this talk, we discuss open challenges relating to querying and reasoning with Web data and raise the question: can the emerging Web of Data ever catch up with the now ubiquitous HTML Web?
Integrating Semantic Web with the Real World - A Journey between Two Cities ... (Juan Sequeda)
(The original version of this talk was a Keynote at KCAP2017. This is the final version of the slides after giving this talk 14 times in 2018)
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific and engineering challenges that require attention.
Integrating Semantic Web in the Real World: A Journey between Two Cities (Juan Sequeda)
Keynote at The 9th International Conference on Knowledge Capture (KCAP2017), Austin, Texas, Dec 2017
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific challenges that require attention.
Integrating Relational Databases with the Semantic Web: A Reflection (Juan Sequeda)
This is a lecture given at the 2017 Reasoning Web Summer School
It has been clear from the beginning that the success of the Semantic Web hinges on integrating the vast amount of data stored in Relational Databases. In 2007, the W3C organized a workshop on RDF Access to Relational Databases. In 2012, two standards were ratified that map relational data to RDF: Direct Mapping and R2RML.
In this lecture, I will reflect on the last 10 years of research results and systems to integrate Relational Databases with the Semantic Web. I will provide an answer to the following question: how and to what extent can Relational Databases be integrated with the Semantic Web? I will review how these standards and systems are being used in practice for data integration and discuss open challenges.
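As an illustration of what mapping relational data to RDF means, the following Python sketch turns one row into triples in the style of the W3C Direct Mapping: the row becomes a subject IRI built from the table name and primary key, each non-NULL column becomes a predicate, and the table becomes the row's `rdf:type`. It is a simplified sketch, not a conforming implementation: foreign-key references, literal datatypes, and tables without a primary key are left out, and the function name and the base IRI in the usage note are assumptions.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, pk_cols, row):
    """Return (subject, predicate, object) triples for one row.

    `base` is the base IRI of the database, `pk_cols` the list of
    primary-key column names, and `row` a dict of column name -> value.
    """
    table_iri = base + quote(table, safe="")
    # Row IRI: <Table/col=value>, with ';' between the parts of a composite key.
    key = ";".join(
        f"{quote(col, safe='')}={quote(str(row[col]), safe='')}" for col in pk_cols
    )
    subject = f"{table_iri}/{key}"
    triples = [(subject, RDF_TYPE, table_iri)]
    for col, val in row.items():
        if val is None:
            continue  # NULL values produce no triple
        triples.append((subject, f"{table_iri}#{quote(col, safe='')}", val))
    return triples
```

For a hypothetical `People` row with `ID` 7 and `fname` "Bob" under the base `http://foo.example/DB/`, the subject is `http://foo.example/DB/People/ID=7` and the name is attached through the predicate `http://foo.example/DB/People#fname`. R2RML exists for the cases where this default shape is not what the publisher wants.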
Graph Query Languages: update from LDBC (Juan Sequeda)
The Linked Data Benchmark Council (LDBC) is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. The Graph Query Language task force of LDBC is studying query languages for graph data management systems, and specifically those systems storing so-called Property Graph data. The goals of the GraphQL task force are to:
- Devise a list of desired features and functionalities of a graph query language.
- Evaluate a number of existing languages (i.e. Cypher, Gremlin, PGQL, SPARQL, SQL), and identify possible issues.
- Provide a better understanding of the design space and state-of-the-art.
- Develop proposals for changes to existing query languages or even a new graph query language.
This query language should cover the needs of the most important use-cases for such systems, such as social network and Business Intelligence workloads.
This talk will present an update of the work accomplished by the LDBC GraphQL task force. We also look for input from the graph community.
Virtualizing Relational Databases as Graphs: a multi-model approach (Juan Sequeda)
Talk given at Smart Data 2017
Relational Databases are inflexible due to the rigid constraints of the relational data model. If you have new data that doesn’t fit your schema, you will need to alter your schema (add a column or a new table). This is a task that is not always possible. IT departments don't have time, or they won't allow it - just more nulls that can lead to query performance degradation, etc.
A goal of graph databases is to address this problem with their schema-less graph data model. However, many businesses have large investments in commercial RDBMSs and their associated applications and can't expect to move all of their data to a graph database.
In this talk, I will present a multi-model graph/relational architecture solution. Keep your relational data where it is, virtualize it as a graph, and then connect it with additional data stored in a graph database. This way, both graph and relational technologies can seamlessly interact together.
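The "keep the data where it is, virtualize it as a graph" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture presented in the talk: the `employee` table, the edge labels, and every function name are hypothetical, and an in-memory SQLite database stands in for the commercial RDBMS.

```python
import sqlite3


def demo_connection():
    """Create a tiny in-memory relational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
    conn.executemany(
        "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
        [(1, "Ann", "R&D"), (2, "Bob", "Sales")],
    )
    return conn


def relational_edges(conn):
    """Expose rows as graph edges on the fly; nothing is copied or stored."""
    query = "SELECT id, name, dept FROM employee ORDER BY id"
    for emp_id, name, dept in conn.execute(query):
        node = f"employee/{emp_id}"
        yield (node, "name", name)
        yield (node, "worksIn", f"dept/{dept}")


def neighbours(node, *edge_sources):
    """Follow the outgoing edges of `node` across any number of edge sources."""
    return [
        (label, target)
        for source in edge_sources
        for start, label, target in source
        if start == node
    ]
```

A query such as `neighbours("employee/1", relational_edges(conn), extra_edges)` then sees the relational rows and the extra edges held in a separate graph store (here just a list of tuples) as one graph.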
Presentation at Data/Graph Day Texas Conference.
Austin, Texas
January 14, 2017
This talk grew out of Juan Sequeda's office hours following the Seattle Graph Meetup. Some of the questions posed were: How do I recognize a problem best solved with a graph solution? How do I determine the best type of graph to solve the problem? How do I manage the data where both graph and relational operations will be performed? Juan did such a great job of explaining the options that we asked him to develop his responses into a formal talk.
Consuming Linked Data by Humans - WWW2010 (Juan Sequeda)
These are the Consuming Linked Data by Humans slides that we presented at the Consuming Linked Data tutorial at WWW2010 in Raleigh, NC on April 26, 2010
Consuming Linked Data by Machines - WWW2010 (Juan Sequeda)
These are the Consuming Linked Data by Machines slides that we presented at the Consuming Linked Data tutorial at WWW2010 in Raleigh, NC on April 26, 2010. These slides are originally by Patrick Sinclair from the BBC.
These are the Linked Data Applications slides that we presented at the Consuming Linked Data tutorial at WWW2010 in Raleigh, NC on April 26, 2010.
This slide set was not part of our tutorial that was presented at ISWC2009
Open Research Problems in Linked Data - WWW2010 (Juan Sequeda)
These are the Open Research Problems of Linked Data slides that we presented at the Consuming Linked Data tutorial at WWW2010 in Raleigh, NC on April 26, 2010
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Consuming Linked Data SemTech2010
1. Consuming Linked Data. Juan F. Sequeda, Department of Computer Science, University of Texas at Austin. SemTech 2010
2. How many people are familiar with: RDF, SPARQL, Linked Data, Web Architecture (HTTP, etc.)?
3. History
- Linked Data Design Issues by TimBL: July 2006
- Linked Open Data Project: WWW2007
- First LOD Cloud: May 2007
- 1st Linked Data on the Web Workshop: WWW2008
- 1st Triplification Challenge: 2008
- How to Publish Linked Data Tutorial: ISWC2008
- BBC publishes Linked Data: 2008
- 2nd Linked Data on the Web Workshop: WWW2009
- NY Times announcement: SemTech2009 - ISWC09
- 1st Linked Data-a-thon: ISWC2009
- 1st How to Consume Linked Data Tutorial: ISWC2009
- Data.gov.uk publishes Linked Data: 2010
- 2nd How to Consume Linked Data Tutorial: WWW2010
- 1st International Workshop on Consuming Linked Data: COLD2010
- …
17. The Modigliani Test: Show me all the locations of all the original paintings of Modigliani. Daniel Koller (@dakoller) showed that you can find this with a SPARQL query on DBpedia. Thanks Richard MacManus - ReadWriteWeb
19. Results of the Modigliani Test: Atanas Kiryakov from Ontotext used LDSR (Linked Data Semantic Repository) over DBpedia, Freebase, Geonames, UMBEL, and Wordnet. Published April 26, 2010: http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php
34. So what is the problem? We aren’t always interested in documents. We are interested in THINGS. These THINGS might be in documents. We can read an HTML document rendered in a browser and find what we are searching for. This is hard for computers: they have to guess (even though they are pretty good at it).
35. What do we need to do? Make it easy for computers/software to find THINGS.
36. How can we do that? Besides publishing documents on the web, which computers can’t understand easily, let’s publish something that computers can understand.
49. Resource Description Framework (RDF): a data model, i.e. a way to model data (e.g. relational databases use the relational data model). RDF is a triple data model: a labeled graph of Subject, Predicate, Object statements, for example: <Juan> <was born in> <California>; <California> <is part of> <the USA>; <Juan> <likes> <the Semantic Web>.
50. RDF can be serialized in different ways: RDF/XML, RDFa (RDF in HTML), N3, Turtle, JSON.
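As a minimal sketch of the triple data model, the three example statements can be held as subject/predicate/object values and filtered by subject. The class and method names below (TripleDemo, Triple, about) are made up for illustration and are not the API of any RDF library.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hand-rolled triple type, not the API of any RDF library.
final class TripleDemo {
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        // Render in the <s> <p> <o> notation used on the slide.
        @Override
        public String toString() {
            return "<" + subject + "> <" + predicate + "> <" + object + ">";
        }
    }

    // The three example statements from the slide.
    static List<Triple> exampleGraph() {
        List<Triple> graph = new ArrayList<>();
        graph.add(new Triple("Juan", "was born in", "California"));
        graph.add(new Triple("California", "is part of", "the USA"));
        graph.add(new Triple("Juan", "likes", "the Semantic Web"));
        return graph;
    }

    // Return every triple whose subject matches: the outgoing edges of one node.
    static List<Triple> about(List<Triple> graph, String subject) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Triple t : about(exampleGraph(), "Juan")) {
            System.out.println(t);
        }
    }
}
```

Because the object of one triple (California) is the subject of another, the list of triples already forms a graph; no separate table schema is needed.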
51. So does that mean that I have to publish my data in RDF now?
55. Databases back up documents. THINGS have PROPERTIES: a book has a title, an author, … This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …
56. Let’s represent the data in RDF: <book> <title> "Programming the Semantic Web"; <book> <author> "Toby Segaran"; <book> <isbn> "978-0-596-15381-6"; <book> <publisher> <Publisher>; <Publisher> <name> "O’Reilly".
57. Remember that we are on the web Everything on the web is identified by a URI
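58. And now let’s link the data to other data: <http://…/isbn978> <title> "Programming the Semantic Web"; <http://…/isbn978> <author> "Toby Segaran"; <http://…/isbn978> <isbn> "978-0-596-15381-6"; <http://…/isbn978> <publisher> <http://…/publisher1>; <http://…/publisher1> <name> "O’Reilly".
59. And now consider the data from Revyu.com: <http://…/isbn978> <hasReview> <http://…/review1>; <http://…/review1> <description> "Awesome Book"; <http://…/review1> <reviewer> <http://…/reviewer>; <http://…/reviewer> <name> "Juan Sequeda".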
60. Let’s start to link data: [Figure: the Revyu review graph (hasReview, description, hasReviewer, name) and the book graph (title, author, isbn, publisher, name) connected by a sameAs link between their two <http://…/isbn978> book URIs]
61. Juan Sequeda publishes data too: <http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>; <http://juansequeda.com/id> <name> "Juan Sequeda".
62. Let’s link more data: [Figure: the Revyu review graph, with the reviewer <http://…/reviewer> connected by a sameAs link to <http://juansequeda.com/id>, which <livesIn> <http://dbpedia.org/Austin>]
63. And more: [Figure: the book data, the Revyu review data, and Juan Sequeda’s own data joined into one graph by sameAs links]
64. Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA
65. Linked Data Principles Use URIs as names for things Use HTTP URIs so that people can look up (dereference) those names. When someone looks up a URI, provide useful information. Include links to other URIs so that they can discover more things.
67. I can query a database with SQL. Is there a way to query Linked Data with a query language?
68. Yes! There is actually a standardized language for that: SPARQL.
69. FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin
70. [Figure: the linked graph of book data, Revyu review data, and Juan Sequeda’s data (joined by sameAs links) over which the query is evaluated]
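To make the query on slide 69 concrete, here is a rough in-memory sketch of how it can be answered by walking the linked graph. It is not SPARQL and not the deck's code: the prefixed names such as pub:isbn978 and revyu:review1 are hypothetical stand-ins for the URIs the slides abbreviate, and sameAs is followed only in the direction in which it is stated here.

```java
import java.util.ArrayList;
import java.util.List;

final class AustinReviews {
    // Each row is {subject, predicate, object}. Prefixed names are hypothetical
    // stand-ins for the abbreviated URIs on the slides.
    static final String[][] GRAPH = {
        {"pub:isbn978", "title", "Programming the Semantic Web"},
        {"revyu:isbn978", "sameAs", "pub:isbn978"},
        {"revyu:isbn978", "hasReview", "revyu:review1"},
        {"revyu:review1", "description", "Awesome Book"},
        {"revyu:review1", "hasReviewer", "revyu:reviewer"},
        {"revyu:reviewer", "sameAs", "http://juansequeda.com/id"},
        {"http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin"},
    };

    // All objects o such that (s, p, o) is in the graph.
    static List<String> objects(String s, String p) {
        List<String> out = new ArrayList<>();
        for (String[] t : GRAPH) {
            if (t[0].equals(s) && t[1].equals(p)) out.add(t[2]);
        }
        return out;
    }

    // All subjects s such that (s, p, o) is in the graph.
    static List<String> subjects(String p, String o) {
        List<String> out = new ArrayList<>();
        for (String[] t : GRAPH) {
            if (t[1].equals(p) && t[2].equals(o)) out.add(t[0]);
        }
        return out;
    }

    // Descriptions of reviews of the titled book whose reviewer lives in the city.
    static List<String> reviewsBy(String title, String city) {
        List<String> out = new ArrayList<>();
        for (String book : subjects("title", title)) {
            for (String revyuBook : subjects("sameAs", book)) {          // book-to-book link
                for (String review : objects(revyuBook, "hasReview")) {
                    for (String reviewer : objects(review, "hasReviewer")) {
                        for (String person : objects(reviewer, "sameAs")) { // reviewer-to-person link
                            if (objects(person, "livesIn").contains(city)) {
                                out.addAll(objects(review, "description"));
                            }
                        }
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reviewsBy("Programming the Semantic Web", "http://dbpedia.org/Austin"));
    }
}
```

The two sameAs hops are what let a single question span three datasets; without them the title, the review, and the place of residence would sit in disconnected graphs.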
71. This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
72. What was your incentive to publish an HTML page in 1990?
73. 1) Share data in documents. 2) Because your neighbor was doing it.
79. Publishing Linked Data. Legacy data in relational databases: D2R Server, Virtuoso, Triplify, Ultrawrap. CMS: Drupal 7. Native RDF stores (databases for RDF, i.e. triple stores): AllegroGraph, Jena, Sesame, Virtuoso. Talis Platform (Linked Data in the cloud). In HTML with RDFa.
87. Google and Yahoo are starting to crawl RDFa! The Semantic Web is a reality!
88. The Reality: Yahoo is crawling data that is in RDFa and Microformats under specific vocabularies (FOAF, GoodRelations, …). Google is crawling RDFa and Microformats that use the Google vocabulary.
90. Linked Data Browsers: not actually separate browsers; they run inside HTML browsers. They show, in tabular form, the data that is returned after looking up a URI. (IMO) the UI lacks usability.
103. SPARQL Endpoints: Linked Data sources usually provide a SPARQL endpoint for their dataset(s). A SPARQL endpoint is a SPARQL query processing service that supports the SPARQL protocol (http://www.w3.org/TR/rdf-sparql-protocol/). Send your SPARQL query, receive the result.
104. Where can I find SPARQL Endpoints? DBpedia: http://dbpedia.org/sparql; Musicbrainz: http://dbtune.org/musicbrainz/sparql; U.S. Census: http://www.rdfabout.com/sparql; Semantic Crunchbase: http://cb.semsol.org/sparql; more at http://esw.w3.org/topic/SparqlEndpoints
105. Accessing a SPARQL Endpoint: SPARQL endpoints are RESTful Web services. Issuing a SPARQL query to a remote SPARQL endpoint is basically an HTTP GET request to the endpoint with a parameter named query, whose value is the URL-encoded string with the SPARQL query:
    GET /sparql?query=PREFIX+rd... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
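The URL-encoding step can be sketched with the Java standard library alone. The class name SparqlGetUrl and the sample query are assumptions for illustration; only the DBpedia endpoint address comes from the slides.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SparqlGetUrl {
    // Build the GET URL: endpoint + "?query=" + URL-encoded SPARQL text.
    static String build(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample query chosen for illustration only.
        String url = build("http://dbpedia.org/sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5");
        System.out.println(url);
    }
}
```

Spaces become `+` and characters such as `?`, `{`, and `}` are percent-encoded, which is why the request line on the slide starts with `query=PREFIX+rd...`.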
106. Query Result Formats: SPARQL endpoints usually support different result formats: XML, JSON, plain text (for ASK and SELECT queries); RDF/XML, NTriples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries).
110. Query Result Formats: use the Accept header to request the preferred result format:
    GET /sparql?query=PREFIX+rd... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
    Accept: application/sparql-results+json
111. Query Result Formats: as an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter named out:
    GET /sparql?out=json&query=... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
112. Accessing a SPARQL Endpoint: more convenient is to use a library. SPARQL JavaScript Library: http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html; ARC for PHP: http://arc.semsol.org/; RAP (RDF API for PHP): http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
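As a sketch of the same request built programmatically, the JDK's java.net.http.HttpRequest (Java 11+) can carry the Accept header. The class name SparqlJsonRequest is an assumption, and the request is only constructed here, not sent; sending it would additionally need an HttpClient.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SparqlJsonRequest {
    // Build (but do not send) a GET request that asks for SPARQL JSON results.
    static HttpRequest build(String urlWithQuery) {
        return HttpRequest.newBuilder(URI.create(urlWithQuery))
                .header("Accept", "application/sparql-results+json")
                .header("User-Agent", "my-sparql-client/0.1")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // The query parameter is already URL-encoded; ASK {} is a sample query.
        HttpRequest request = build("http://dbpedia.org/sparql?query=ASK+%7B%7D");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Whether an endpoint honors the header depends on the endpoint; the out parameter described on the next slide is one implementation-specific fallback.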
114. Accessing a SPARQL Endpoint: example with Jena/ARQ
    import com.hp.hpl.jena.query.*;

    String service = "..."; // address of the SPARQL endpoint
    String query = "SELECT ..."; // your SPARQL query
    QueryExecution e = QueryExecutionFactory.sparqlService(service, query);
    ResultSet results = e.execSelect();
    while ( results.hasNext() ) {
        QuerySolution s = results.nextSolution();
        // ...
    }
    e.close();
115. Querying a single dataset is quite boring compared to issuing SPARQL queries over multiple datasets. How can you do this? (1) Issue follow-up queries to different endpoints. (2) Query a central collection of datasets. (3) Build a store with copies of relevant datasets. (4) Use a query federation system.
116. Follow-up Queries. Idea: issue follow-up queries over other datasets based on results from previous queries, substituting placeholders in query templates.
117. Follow-up Queries: example with Jena/ARQ
    String s1 = "http://cb.semsol.org/sparql";
    String s2 = "http://dbpedia.org/sparql";
    // template for the follow-up query; %s is replaced by a URI from the first result
    String qTmpl = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?c WHERE { <%s> rdfs:comment ?c }";
    // find a list of companies, filtered by some criteria, and return DBpedia URIs for them
    String q1 = "SELECT ?s WHERE { ...";
    QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
    ResultSet results1 = e1.execSelect();
    while ( results1.hasNext() ) {
        QuerySolution sol = results1.nextSolution();
        String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
        QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
        ResultSet results2 = e2.execSelect();
        while ( results2.hasNext() ) {
            // ...
        }
        e2.close();
    }
    e1.close();
118. Follow-up Queries. Advantage: queried data is up to date. Drawbacks: requires the existence of a SPARQL endpoint for each dataset; requires program logic; very inefficient.
119. Querying a Collection of Datasets. Idea: use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets. Example: SPARQL endpoints over a majority of datasets from the LOD cloud at http://uberblic.org and http://lod.openlinksw.com/sparql
120. Querying a Collection of Datasets. Advantage: no need for specific program logic. Drawbacks: queried data might be out of date; not all relevant datasets are in the collection.
121. Own Store of Dataset Copies. Idea: build your own store with copies of relevant datasets and query it. Possible stores: Jena TDB (http://jena.hpl.hp.com/wiki/TDB), Sesame (http://www.openrdf.org/), OpenLink Virtuoso (http://virtuoso.openlinksw.com/), 4store (http://4store.org/), AllegroGraph (http://www.franz.com/agraph/), etc.
122. Populating Your Store: get the RDF dumps provided for the datasets, or use (focused) crawling. ldspider (http://code.google.com/p/ldspider/): multithreaded API for focused crawling; crawling strategies (breadth-first, load-balancing); flexible configuration with callbacks and hooks.
123. Own Store of Dataset Copies. Advantages: no need for specific program logic; can include all datasets; independent of the existence, availability, and efficiency of SPARQL endpoints. Drawbacks: requires effort to set up and to operate the store; ideally, data sources provide RDF dumps, but what if they do not? How do you keep the copies in sync with the originals? Queried data might be out of date.
124. Federated Query Processing. Idea: query a mediator which distributes sub-queries to relevant sources and integrates the results.
125. Federated Query Processing. Instance-based federation: each thing is described by only one data source; atypical for the Web of Data. Triple-based federation: no restrictions; requires more distributed joins. Statistics about the datasets are required in both cases.
126. Federated Query Processing. DARQ (Distributed ARQ), http://darq.sourceforge.net/: query engine for federated SPARQL queries; extension of ARQ (the query engine for Jena); last update: June 28, 2006. Semantic Web Integrator and Query Engine (SemWIQ), http://semwiq.sourceforge.net/: actively maintained.
127. Federated Query Processing. Advantages: no need for specific program logic; queried data is up to date. Drawbacks: requires the existence of a SPARQL endpoint for each dataset; requires effort to set up and configure the mediator.
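The mediator idea can be illustrated with a toy, in-memory sketch in the triple-based style: every sub-query (one predicate) is sent to every source, the answers are unioned, and the join happens at the mediator. This is not how DARQ or SemWIQ are implemented; the arrays stand in for remote endpoints, and all names and sample data (review2, ann, Boston) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class FederationSketch {
    // Each array stands in for a remote SPARQL endpoint; rows are {subject, predicate, object}.
    static final String[][] REVIEWS = {
        {"review1", "hasReviewer", "juan"},
        {"review2", "hasReviewer", "ann"},
    };
    static final String[][] PEOPLE = {
        {"juan", "livesIn", "Austin"},
        {"ann", "livesIn", "Boston"},
    };
    static final String[][][] SOURCES = {REVIEWS, PEOPLE};

    // Sub-query against one source: all {subject, object} pairs for a predicate.
    static List<String[]> ask(String[][] source, String predicate) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : source) {
            if (t[1].equals(predicate)) out.add(new String[] {t[0], t[2]});
        }
        return out;
    }

    // Mediator, step 1: distribute the sub-query to every source and union the answers.
    static List<String[]> askAll(String predicate) {
        List<String[]> out = new ArrayList<>();
        for (String[][] source : SOURCES) {
            out.addAll(ask(source, predicate));
        }
        return out;
    }

    // Mediator, step 2: join the two sub-results locally on reviewer = person.
    static List<String> reviewsByResidentsOf(String city) {
        List<String> out = new ArrayList<>();
        for (String[] r : askAll("hasReviewer")) {
            for (String[] p : askAll("livesIn")) {
                if (r[1].equals(p[0]) && p[1].equals(city)) out.add(r[0]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reviewsByResidentsOf("Austin"));
    }
}
```

Asking every source for every pattern is exactly what dataset statistics help avoid: a mediator that knows which source holds livesIn triples can skip the others.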
128. In any case, you have to know the relevant data sources: when developing the app using follow-up queries; when selecting an existing SPARQL endpoint over a collection of dataset copies; when setting up your own store with a collection of dataset copies; when configuring your query federation system. You restrict yourself to the selected sources.
129. There is an alternative: remember, URIs link to data.
130. Automated Link Traversal. Idea: discover further data by looking up relevant URIs in your application. Can be combined with the previous approaches.
131. Link Traversal Based Query Execution applies the idea of automated link traversal to the execution of SPARQL queries. Idea: intertwine query evaluation with the traversal of RDF links, and discover data that might contribute to query results during query execution. Alternate between two steps: evaluate parts of the query, then look up the URIs in the intermediate solutions.
142. Link Traversal Based Query Execution. Advantages: no need to know all data sources in advance; no need for specific programming logic; queried data is up to date; does not depend on the existence of SPARQL endpoints provided by the data sources. Drawbacks: not as fast as a centralized collection of copies; unsuitable for some queries; results might be incomplete (do we care?).
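A heavily simplified sketch of that alternation, assuming a simulated Web (a map from URI to the triples returned when the URI is looked up) instead of real HTTP, and handling only path-shaped queries given as a chain of predicates. The ex: URIs and all names are hypothetical; this is not how SWClLib or SQUIN are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LinkTraversalSketch {
    // Simulated Web: looking up a URI returns the triples stored under it.
    static final Map<String, String[][]> WEB = new HashMap<>();
    static {
        WEB.put("ex:book", new String[][] {{"ex:book", "hasReview", "ex:review1"}});
        WEB.put("ex:review1", new String[][] {{"ex:review1", "hasReviewer", "ex:juan"}});
        WEB.put("ex:juan", new String[][] {{"ex:juan", "livesIn", "ex:Austin"}});
    }

    // Evaluate a chain of predicates starting at a seed URI. Before each pattern is
    // matched, the URIs bound so far are looked up to discover the data to match against.
    static List<String> follow(String seed, String... predicates) {
        List<String> frontier = List.of(seed);
        for (String predicate : predicates) {
            List<String> next = new ArrayList<>();
            for (String uri : frontier) {
                // Look up the URI from the intermediate solution (empty if unknown).
                String[][] data = WEB.getOrDefault(uri, new String[0][]);
                // Evaluate this part of the query over the newly discovered data.
                for (String[] t : data) {
                    if (t[0].equals(uri) && t[1].equals(predicate)) next.add(t[2]);
                }
            }
            frontier = next;
        }
        return frontier;
    }

    public static void main(String[] args) {
        System.out.println(follow("ex:book", "hasReview", "hasReviewer", "livesIn"));
    }
}
```

Only the seed URI is known up front; the review and reviewer documents are discovered while the query runs, which is also why data that no followed link points to never contributes to the result.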
143. Implementations: Semantic Web Client library (SWClLib) for Java, http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/; SWIC for Prolog, http://moustaki.org/swic/
144. Implementations: SQUIN (http://squin.org) provides SWClLib functionality as a Web service, accessible like a SPARQL endpoint. Install package: unzip and start (less than 5 minutes). Convenient access with the SQUIN PHP tools:
    $s = 'http:// ...'; // address of the SQUIN service
    $q = new SparqlQuerySock( $s, '... SELECT ...' );
    $res = $q->getJsonResult(); // or getXmlResult()
147. What is a Linked Data application? A software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets.
149. Discover further information by following the links between different data sources: the fourth principle enables this.
150. Combine the consumed linked data with data from sources (not necessarily Linked Data)
151. Expose the combined data back to the web following the Linked Data principles
153. Hot Research Topics: Interlinking Algorithms; Provenance and Trust; Dataset Dynamics; UI; Distributed Query Evaluation. “You want a good thesis? IR is based on precision and recall. The minute you add semantics, it is a meaningless feature. Logic is based on soundness and completeness. We don’t want soundness and completeness. We want a few good answers quickly.” – Jim Hendler at WWW2009 during the LOD gathering. Thanks Michael Hausenblas.
154. THANKS. Juan Sequeda, www.juansequeda.com, @juansequeda, #cold, www.consuminglinkeddata.org. Acknowledgements: Olaf Hartig, Patrick Sinclair, Jamie Taylor. Slides for Consuming Linked Data with SPARQL by Olaf Hartig.