The document discusses Semantic Web technologies including RDF, SPARQL and ontologies. It provides:
1) An introduction to the Semantic Web vision of machines being able to understand and respond to complex requests based on meaning. This requires information to be semantically structured.
2) A brief overview of key concepts in RDF including triples, nodes, blank nodes, and predefined RDF structures like bags and lists.
3) An explanation of the SPARQL query language, which is similar to SQL but interrogates the Semantic Web. SPARQL clauses like SELECT, CONSTRUCT, DESCRIBE and ASK are covered.
4) A discussion of ontological representations including R
Two graph data models: RDF and Property Graphs (andyseaborne)
Talk given at ApacheConEU Big Data 2015.
This talk describes the two common graph data approaches, RDF and Property Graphs. It concludes with observations about the different emphasis of each and where each is focused.
VALA Tech Camp 2017: Intro to Wikidata & SPARQL (Jane Frazier)
A hands-on introduction to interrogation of Wikidata content using SPARQL, the query language used to query data represented in RDF, SKOS, OWL, and other Semantic Web standards.
Presented by myself and Peter Neish, Research Data Specialist @ University of Melbourne.
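For flavour, this is the sort of Wikidata query such a hands-on session covers (a minimal sketch; `wdt:P31` is Wikidata's "instance of" property and `wd:Q146` is the item for "house cat"):

```sparql
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .                  # items that are instances of "house cat"
  SERVICE wikibase:label {                 # Wikidata's label service fills in ?itemLabel
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 10
```

Run as-is against the public Wikidata Query Service endpoint, this returns ten cat items with their English labels.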
Although RDF can be considered the cornerstone of the semantic web and knowledge graphs, it has not been embraced by everyday programmers and software architects who want to safely create and access well-structured data. There is a lack of common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL). In the talk, we will briefly introduce both technologies using some examples and compare them. We will also present some challenges and applications related to RDF data shapes.
Talk given at: KTH Royal Institute of Technology, School of Industrial Engineering and Management, Mechatronics Division, 7th February, 2020
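To give a flavour of the kind of constraint these languages express, here is a toy rule ("a person has exactly one name, a string") written as a SHACL shape (the `ex:` namespace and shape names are illustrative, not from the talk):

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;           # validate every ex:Person node
    sh:property [
        sh:path     ex:name ;            # the property being constrained
        sh:datatype xsd:string ;
        sh:minCount 1 ;                  # exactly one value:
        sh:maxCount 1 ;                  # minCount 1 and maxCount 1
    ] .
```

The equivalent shape in ShEx's compact syntax is just `ex:PersonShape { ex:name xsd:string }`, since ShEx cardinalities default to exactly one.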
Semantic Web technologies (such as RDF and SPARQL) excel at bringing together diverse data in a world of independent data publishers and consumers. Common ontologies help to arrive at a shared understanding of the intended meaning of data.
However, they don’t address one critically important issue: What does it mean for data to be complete and/or valid? Semantic knowledge graphs without a shared notion of completeness and validity quickly turn into a Big Ball of Data Mud.
The Shapes Constraint Language (SHACL), an upcoming W3C standard, promises to help solve this problem. By keeping semantics separate from validity, SHACL makes it possible to resolve a slew of data quality and data exchange issues.
Presented at the Lotico Berlin Semantic Web Meetup.
Although RDF is a cornerstone of the semantic web and knowledge graphs, it has not been embraced by everyday programmers and software architects who need to safely create and access well-structured data. There is a lack of common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL). In the talk, we will review the history and motivation of both technologies. We will also enumerate some challenges and future work with regard to RDF validation.
Talk given at ClojureD conference, Berlin
Apache Spark is an engine for efficiently processing large amounts of data. We show how to apply the elegance of Clojure to Spark - fully exploiting the REPL and dynamic typing. There will be live coding using our gorillalabs/sparkling API.
In the presentation, we will of course introduce the core concepts of Spark, like resilient distributed data sets (RDD). And you will learn how the Spark concepts resembles those well-known from Clojure, like persistent data structures and functional programming.
Finally, we will provide some Do’s and Don’ts for you to kick off your Spark program based upon our experience.
About Paulus Esterhazy and Christian Betz
Being a LISP hacker for several years, and a Java-guy for some more, Chris turned to Clojure for production code in 2011. He’s been Project Lead, Software Architect, and VP Tech in the meantime, interested in AI and data-visualization.
Now, working on the heart of data driven marketing for Performance Media in Hamburg, he turned to Apache Spark for some Big Data jobs. Chris released the API-wrapper ‘chrisbetz/sparkling’ to fully exploit the power of his compute cluster.
Paulus Esterhazy
Paulus is a philosophy PhD turned software engineer with an interest in functional programming and a penchant for hammock-driven development.
He currently works as Senior Web Developer at Red Pineapple Media in Berlin.
At the Dublin Fashion Insights Centre, we are exploring methods of categorising the web into a set of known fashion related topics. This raises questions such as: How many fashion related topics are there? How closely are they related to each other, or to other non-fashion topics? Furthermore, what topic hierarchies exist in this landscape? Using Clojure and MLlib to harness the data available from crowd-sourced websites such as DMOZ (a categorisation of millions of websites) and Common Crawl (a monthly crawl of billions of websites), we are answering these questions to understand fashion in a quantitative manner.
The latest generation of big data tools such as Apache Spark routinely handle petabytes of data while also addressing real-world realities like node and network failures. Spark's transformations and operations on data sets are a natural fit with Clojure's everyday use of transformations and reductions. Spark MLlib's excellent implementations of distributed machine learning algorithms puts the power of large-scale analytics in the hands of Clojure developers. At Zalando's Dublin Fashion Insights Centre, we're using the Clojure bindings to Spark and MLlib to answer fashion-related questions that until recently have been nearly impossible to answer quantitatively.
Hunter Kelly @retnuh
tech.zalando.com
Mapping Hierarchical Sources into RDF using the RML Mapping Language (andimou)
Incorporating structured data in the Linked Data cloud is still complicated, despite the numerous existing tools. In particular, hierarchical structured data (e.g., JSON) are underrepresented, due to their processing complexity. A uniform mapping formalisation for data in different formats, which would enable reuse and exchange between tools and applied data, is missing. This paper describes a novel approach of mapping heterogeneous and hierarchical data sources into RDF using the RML mapping language, an extension over R2RML (the W3C standard for mapping relational databases into RDF). To facilitate those mappings, we present a toolset for producing RML mapping files using the Karma data modelling tool, and for consuming them using a prototype RML processor. A use case shows how RML facilitates the mapping rules’ definition and execution to map several heterogeneous sources.
http://rml.io
https://github.com/mmlab/RMLProcessor
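A sketch of what an RML mapping for a hierarchical (JSON) source looks like — the source file name, iterator path, and `ex:` terms below are illustrative, while the `rr:`/`rml:`/`ql:` vocabulary terms are the ones RML defines as its extension of R2RML:

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.org/> .

<#PersonMapping> a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "people.json" ;               # hierarchical input file
        rml:referenceFormulation ql:JSONPath ;   # how references are interpreted
        rml:iterator "$.people[*]"               # iterate over each person object
    ] ;
    rr:subjectMap [ rr:template "http://example.org/person/{id}" ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rml:reference "name" ]    # take the "name" field as the object
    ] .
```

Whereas R2RML fixes the logical source to a relational table, RML's `rml:logicalSource` generalises it to any format with a reference formulation, which is what lets one mapping language cover JSON, XML, and CSV alike.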
DISABILITY, CIVIC-MINDEDNESS AND PARKING
As an occupational therapy student, the question of raising public awareness of disability and of respect for the rights of people with disabilities is very close to my heart. That is why I want to speak out against drivers who persist in parking in spaces reserved for people with reduced mobility. Where does this need to always park as close as possible come from? Too lazy to walk a little, a tight schedule, or plain selfishness? I don't know, but to verify my point it suffices to observe the surroundings of a school when classes let out, or the entrance of a supermarket; if the doors were wide enough, I would suspect some of trying to park in the aisles.
This presentation was given at the Balisage 2017 conference, and provides an overview of three key RDF standards for constraint modeling, annotation and the use of data frames and cubes in RDF.
This is lecture note #10 for my class at the Graduate School of Yonsei University, Korea.
It describes SPARQL to retrieve and manipulate data stored in Resource Description Framework format
Sesam4 project presentation SPARQL - April 2011 (Robert Engels)
This slide set is provided by the SESAM4 consortium as one of three Technology Primers on Semantic Web technology. This Primer is on SPARQL and gives a short introduction to its constructs, followed by some examples. The accompanying slide set can be found on YouTube under SESAM4.
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
As of Drupal 7, RDFa markup ships in core. In this session I will:
-explain what the implications are of this and why this matters
-give a short introduction to the Semantic web, RDF, RDFa and SPARQL in human language
-give a short overview of the RDF modules that are available in contrib
-talk about some of the potential use cases of all these magical technologies
This is a talk from the Drupal track at Fosdem 2010.
The Semantic Web is about to grow up. By efforts such as the Linked Open Data initiative, we finally find ourselves at the edge of a Web of Data becoming reality. Standards such as OWL 2, RIF and SPARQL 1.1 shall allow us to reason with and ask complex structured queries on this data, but still they do not play together smoothly and robustly enough to cope with huge amounts of noisy Web data. In this talk, we discuss open challenges relating to querying and reasoning with Web data and raise the question: can the emerging Web of Data ever catch up with the now ubiquitous HTML Web?
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S... (BigDataEverywhere)
Paco Nathan, Director of Community Evangelism at Databricks
Apache Spark is intended as a fast and powerful general purpose engine for processing Hadoop data. Spark supports combinations of batch processing, streaming, SQL, ML, Graph, etc., for applications written in Scala, Java, Python, Clojure, and R, among others. In this talk, I'll explore how Spark fits into the Big Data landscape. In addition, I'll describe other systems with which Spark pairs nicely, and will also explain why Spark is needed for the work ahead.
In this webinar Thomas Cook, Sales Director, AnzoGraph DB, provides a history lesson on the origins of SPARQL, including its roots in the Semantic Web, and how linked open data is used to create Knowledge Graphs. Then, he dives into "What is RDF?", "What is a URI?" and "What is SPARQL?", wrapping up with a real-world demonstration via a Zeppelin notebook.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Compressed Sparse Row (CSR) is an adjacency-list based graph representation used by graph algorithms such as PageRank.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
2. Introduction
The Semantic Web, as originally envisioned, is a system that enables machines to "understand" and respond to complex human requests based on their meaning. Such an "understanding" requires that the relevant information sources be semantically structured.
Tim Berners-Lee originally expressed the vision of the Semantic Web as follows:
"I have a dream for the Web [in which computers] become capable of analysing all the data on the Web – the content, links, and transactions between people and computers. A 'Semantic Web', which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines."
3. Some other thoughts…
"Are you a member of the SPARQL cult?" – Alex Karp, CEO, Palantir
"It's about graphs, not trees." – Pascal Hitzler, Professor and Director of Data Science at the Department of Computer Science and Engineering at Wright State University in Dayton, Ohio
"In the six degrees of separation, not all degrees are equal." – Malcolm Gladwell, The Tipping Point: How Little Things Can Make a Big Difference
4. The Talk Structure
Triples and RDF
◦ What is the Big Deal?
Resource Description Framework (RDF)
◦ A formal way to represent information
◦ Some nomenclature
◦ Some data structures
SPARQL
◦ A language not dissimilar to SQL, but one that interrogates the Semantic Web
Ontological Representations
◦ A formal way to describe structure
Analytics
◦ Some SNA stuff
The Triple Store Implementations
Some of my stuff – Dealing with Massive RDF Lists
7. The Triple
A triple has three parts:
Subject: Resource or Blank
Predicate: Resource
Object: Resource, Literal or Blank
Note: The URL can identify data in the cloud or on premise.
Note: No need for NULLs
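As a concrete illustration, triples of this shape can be written in Turtle (the `ex:` namespace and terms are illustrative):

```turtle
@prefix ex: <http://example.org/> .

ex:Amazon_River ex:flowsThrough ex:Brazil .    # subject, predicate, object: all resources
ex:Amazon_River ex:lengthKm     "6992" .       # object as a literal
ex:Amazon_River ex:discoveredBy _:someone .    # object as a blank node
```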
8. RDF
• RDF - Resource Description Framework: a flexible, schema-less data model
• Standards based – W3C
• RDF Syntax
o N3/Turtle
o XML
• Predefined RDF Structures
o Bag
o Seq
o Alt
o List
Note: These are the only predefined data structures.
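A sketch of how one of these containers looks in Turtle (the `ex:` data is illustrative):

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .

ex:shoppingList a rdf:Bag ;    # an unordered container
    rdf:_1 "milk" ;            # members use the numbered rdf:_i properties
    rdf:_2 "bread" ;
    rdf:_3 "eggs" .
```

A Seq is written the same way with `rdf:Seq` (member order is significant), and an Alt with `rdf:Alt` (members are alternatives).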
10. RDF Property Constants
Note: A Predicate is also referred to as a property, used when the object is a Literal.
Property  | Description             | Usage
rdf:first | First element in a list | rdf:Property
rdf:rest  | Rest of the list        | rdf:Property
rdf:_i    | List sequence           | rdf:Property
rdf:nil   | End of the list         | rdf:Resource
Note: If a Predicate is a number, the number value is preceded by an underscore _.
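These constants are what Turtle's collection syntax expands to. A hand-written sketch (resource names illustrative):

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .

# The collection form  ex:recipe ex:steps (ex:wash ex:rinse) .  expands to:
ex:recipe ex:steps _:l1 .
_:l1 rdf:first ex:wash ;  rdf:rest _:l2 .      # first cell of the linked list
_:l2 rdf:first ex:rinse ; rdf:rest rdf:nil .   # rdf:nil terminates the list
```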
13. SPARQL
SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation, and SPARQL 1.1 in March 2013.
Note: Source Wikipedia
14. The Language Structure
PREFIX abc: <nul://sparql/exampleOntology#>
SELECT ?capital ?country WHERE {
  ?x abc:cityname ?capital ;
     <nul://sparql/exampleOntology#isCapitalOf> ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
Note: It is a bit like an SQL SELECT statement.
Slide callouts: abc:cityname is a CURIE, <nul://sparql/exampleOntology#isCapitalOf> a fully qualified resource, ?x and ?capital variables, and each subject-predicate-object pattern a triple.
16. Describe
DESCRIBE ?x WHERE {
  ?x a txn:Occurrence .
  ?x dcterms:date "2010-09-29" .
}
LIMIT 10
Note: Describe always returns RDF! Here ?x is the node to describe.
17. Ask
ASK {
  ?x a txn:Occurrence .
  ?x dcterms:date "2010-09-29" .
}
Note: ASK returns Yes or No.
18. Select
SELECT ?person ?name ?email WHERE {
  ?person foaf:email ?email .
  ?person foaf:name ?name .
  ?person foaf:skill "internet" .
}
LIMIT 50
Note: Results are returned in a tabular format
19. Results from the Select
person name email
<http://www.w3.org/People/karl/karl-foaf.xrdf#me> "Karl Dubost" <mailto:karl@w3.org>
<http://www.w3.org/People/card#amy> "Amy van der Hiel" <mailto:amy@w3.org>
<http://www.w3.org/People/card#edd> "Edd Dumbill" <mailto:edd@xmlhack.com>
<http://www.w3.org/People/card#dj> "Dean Jackson" <mailto:dean@w3.org>
<http://www.w3.org/People/card#edd> "Edd Dumbill" <mailto:edd@usefulinc.com>
<http://www.aaronsw.com/about.xrdf#aaronsw> "Aaron Swartz" <mailto:me@aaronsw.com>
<http://www.w3.org/People/card#i> "Timothy Berners-Lee" <mailto:timbl@w3.org>
<http://www.w3.org/People/EM/contact#me> "Eric Miller" <mailto:em@w3.org>
<http://www.w3.org/People/card#edd> "Edd Dumbill" <mailto:edd@xml.com>
<http://www.w3.org/People/card#dj> "Dean Jackson" <mailto:dino@grorg.org>
<http://www.w3.org/People/card#libby> "Libby Miller" <mailto:libby.miller@bristol.ac.uk>
<http://www.w3.org/People/Connolly/#me> "Dan Connolly" <mailto:connolly@w3.org>
21. The Optional Clause Result
(Diagram: a result graph of mo:MusicArtist nodes. Every artist has
foaf:name and foaf:based_near, while foaf:homepage and foaf:img appear
only for the artists that have them, which is what OPTIONAL permits.)
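The result shown on this slide could have been produced by a query along these lines (a sketch; the mo: dataset is assumed): names and locations always match, while homepage and image are returned only where present.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo:   <http://purl.org/ontology/mo/>

SELECT ?name ?based_near ?homepage ?img WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name ;
          foaf:based_near ?based_near .
  # OPTIONAL keeps the solution even when these triples are absent.
  OPTIONAL { ?artist foaf:homepage ?homepage }
  OPTIONAL { ?artist foaf:img ?img }
}
```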
22. Select - Filter
PREFIX prop: <http://dbpedia.org/property/>
ASK WHERE {
  <http://dbpedia.org/resource/Amazon_River> prop:length ?amazon .
  <http://dbpedia.org/resource/Nile> prop:length ?nile .
  FILTER(?amazon > ?nile)
}
Note: Filters are applied after the graph patterns have been matched
Note: Filters can appear anywhere within the WHERE clause
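The same FILTER mechanism works in a SELECT; a hedged sketch against DBpedia (the length threshold and assumption of numeric values are illustrative):

```sparql
PREFIX prop: <http://dbpedia.org/property/>

# Rivers longer than a threshold; FILTER prunes matched solutions.
SELECT ?river ?length WHERE {
  ?river prop:length ?length .
  FILTER(?length > 6000)
}
LIMIT 10
```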
23. Alternatives - Union
PREFIX obo: <http://www.obofoundry.org/ro/ro.owl#>
PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?label ?process (COUNT(*) AS ?count)
WHERE {
  { ?process obo:part_of go:GO_0007165 }
  UNION
  { ?process rdfs:subClassOf go:GO_0007165 }
  ?process rdfs:label ?label
}
GROUP BY ?label ?process ORDER BY DESC(?count)
24. SPARQL - Update
PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA {
  <http://example/book1> dc:title "A new book" ;
                         dc:creator "A.N.Other" .
}
PREFIX dc: <http://purl.org/dc/elements/1.1/>
DELETE DATA {
  <http://example/book2> dc:title "David Copperfield" ;
                         dc:creator "Edmund Wells" .
}
Note: INSERT DATA and DELETE DATA work only on concrete triples
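SPARQL 1.1 Update also supports pattern-based modification via DELETE/INSERT ... WHERE, which rewrites whatever the pattern matches rather than fixed triples. A sketch (the titles are hypothetical):

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Rename a title wherever the old one occurs.
DELETE { ?book dc:title "A new book" }
INSERT { ?book dc:title "A newer book" }
WHERE  { ?book dc:title "A new book" }
```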
25. Quads – The Graph
A quad extends a triple with a fourth element naming the graph it
belongs to. The GRAPH keyword scopes a pattern to a named graph;
patterns outside it match the default graph.
SELECT ?s ?o WHERE {
  GRAPH :g { ?s :p ?o }
}
SELECT ?p WHERE {
  GRAPH <http://purl.org/obo/owl/GO#> { :s ?p :o }
}
Note: A named graph, identified by a URI, can be seen as a schema in a
relational database
27. Property Paths
Syntax Form | Matches
uri | A URI or a prefixed name; a path of length one.
^elt | Inverse path (object to subject).
(elt) | A group path elt; brackets control precedence.
elt1 / elt2 | A sequence path of elt1 followed by elt2.
elt1 ^ elt2 | Shorthand for elt1 / ^elt2, that is, elt1 followed by the inverse of elt2.
elt1 | elt2 | An alternative path of elt1 or elt2 (all possibilities are tried).
elt* | A path of zero or more occurrences of elt.
elt+ | A path of one or more occurrences of elt.
elt? | A path of zero or one elt.
elt{n,m} | A path of between n and m occurrences of elt.
elt{n} | Exactly n occurrences of elt; a fixed-length path.
elt{n,} | n or more occurrences of elt.
elt{,n} | Between 0 and n occurrences of elt.
SELECT ?value WHERE {
  :list rdf:rest* [ rdf:first ?value ]
}
Note: the blank node brackets [ ] tell the SPARQL parser that the two
triple patterns share a common (unnamed) resource; every node reached
via rdf:rest* contributes its rdf:first as ?value.
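Property paths make transitive queries trivial; a sketch using foaf:knows (the :me resource is hypothetical):

```sparql
PREFIX :     <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Friends, friends-of-friends, and so on, to any depth.
SELECT DISTINCT ?friend WHERE {
  :me foaf:knows+ ?friend .
}
```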
32. RDFS
RDF Schema (or RDFS) defines classes and properties.
The resources in the RDFS vocabulary have URIrefs beginning with
http://www.w3.org/2000/01/rdf-schema#
ex:Vehicle rdf:type rdfs:Class.
ex:Car rdfs:subClassOf ex:Vehicle .
ex:Van rdfs:subClassOf ex:Vehicle .
ex:Truck rdfs:subClassOf ex:Vehicle .
ex:MiniVan rdfs:subClassOf ex:Van .
ex:MiniVan rdfs:subClassOf ex:Car .
(Diagram: a class hierarchy with Vehicle at the top; Van, Truck and Car
as subclasses; MiniVan as a subclass of both Van and Car.)
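Beyond subclassing, RDFS can declare a property's domain and range, which a reasoner uses to infer types. A sketch (ex:hasDriver, ex:Person and the instances are hypothetical names):

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

# Hypothetical property with a declared domain and range.
ex:hasDriver a rdf:Property ;
             rdfs:domain ex:Vehicle ;
             rdfs:range  ex:Person .

# From the single triple below, an RDFS reasoner can infer that
# ex:myCar is an ex:Vehicle and ex:alice is an ex:Person.
ex:myCar ex:hasDriver ex:alice .
```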
33. OWL
The Web Ontology Language (OWL) is a family of knowledge
representation languages for authoring ontologies. Ontologies are a formal
way to describe taxonomies and classification networks, essentially defining
the structure of knowledge for various domains: the nouns representing
classes of objects and the verbs representing relations between the objects.
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Class>
<owl:oneOf rdf:parseType="Collection">
<owl:Thing rdf:about="#Tosca" />
<owl:Thing rdf:about="#Salome" />
</owl:oneOf>
</owl:Class>
<owl:Class>
<owl:oneOf rdf:parseType="Collection">
<owl:Thing rdf:about="#Turandot" />
<owl:Thing rdf:about="#Tosca" />
</owl:oneOf>
</owl:Class>
</owl:intersectionOf>
</owl:Class>
OWL is intended to be used when the information
contained in documents needs to be processed by
applications, as opposed to situations where the
content only needs to be presented to humans. OWL
can be used to explicitly represent the meaning of
terms in vocabularies and the relationships between
those terms. OWL has more facilities for expressing
meaning and semantics than XML, RDF, and RDFS, and
thus OWL goes beyond these languages in its ability to
represent machine interpretable content on the Web.
35. Ontologies
The Friend Of A Friend (FOAF) ontology
Project homepage: http://www.foaf-project.org/
Namespace: http://xmlns.com/foaf/0.1/
Typical prefix: foaf:
Documentation: http://xmlns.com/foaf/spec/
The Dublin Core (DC) ontology
Project homepage: http://dublincore.org/
Namespace: http://purl.org/dc/elements/1.1/ and http://purl.org/dc/terms/
Typical prefix: dc: and dcterms:
Documentation: http://dublincore.org/specifications/
Description: this is a light weight RDFS vocabulary for describing generic
metadata.
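As a sketch of how these two vocabularies combine (the document and person URIs are hypothetical):

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

# A document described with Dublin Core, its creator with FOAF.
<http://example.org/doc> dc:title   "An Example Document" ;
                         dc:creator <http://example.org/people#jane> .

<http://example.org/people#jane> a foaf:Person ;
    foaf:name "Jane Example" ;
    foaf:mbox <mailto:jane@example.org> .
```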
38. Aduna Sesame
Aduna Sesame
◦ Multiple back-end relational database support
◦ MySQL
◦ PostgreSQL
◦ Oracle (provided by Oracle)
◦ Various third-party implementations
◦ Limited support for property path expressions
◦ Not great for massive graph retrievals
◦ Comes with a compliant REST interface
◦ Excellent management console
39. Sesame Back-ends
Ontotext GraphDB™
Ontotext GraphDB™ (formerly OWLIM) is a leading RDF triplestore built on OWL (Web Ontology
Language) standards, and fully compatible with the Sesame APIs. GraphDB handles massive loads,
queries and OWL inferencing in real time. Ontotext offers three versions: GraphDB™ Lite, GraphDB™
Standard and GraphDB™ Enterprise.
CumulusRDF
CumulusRDF is an RDF store on a cloud-based architecture, fully compatible with the
Sesame APIs. CumulusRDF provides a REST-based API with CRUD operations to manage
RDF data. The current version uses Apache Cassandra as storage backend.
Systap Blazegraph™
Blazegraph™ (formerly known as Bigdata) is an enterprise graph database by Systap, LLC
that provides a horizontally scaling, fully Sesame-compatible, storage and retrieval
solution for very large volumes of RDF.
40. Apache Jena
Apache Jena
◦ Multiple back-end relational database support
◦ Does support property paths
◦ OK on large graph retrievals
ARQ (SPARQL)
Query your RDF data using ARQ, a SPARQL 1.1-compliant engine. ARQ supports remote federated
queries and free-text search.
Fuseki
Expose your triples as a SPARQL end-point accessible over HTTP. Fuseki provides REST-style interaction
with your RDF data.
Inference API
Reason over your data to expand and check the content of your triple store. Configure your own
inference rules or use the built-in OWL and RDFS reasoners.
Note: Originally developed by Hewlett Packard
42. Oracle
Has two implementations:
◦ A relational back-end (part of the Spatial pack)
Not so good (not good for large graphs)
◦ Built on Oracle Big Data/NoSQL technology
◦ Utilises Apache Jena
◦ Gets Neil's thumbs up
Oracle has nearly two decades of experience working with spatial and graph
database technologies. We have combined this with cutting edge research
from Oracle Labs to deliver advanced analytics for the NoSQL and Hadoop
platform.
Oracle Big Data Spatial and Graph- Q&A with James Steiner,
VP of product management
Melli Annamalai, PhD
43. Bench Mark – Load Times
(Chart: load time in minutes, over an increasing number of triples, for
(Systap) Bigdata, (Sesame) Postgres, (Oracle) Spatial and (Sesame) File.)
44. Bench Mark – Retrieval Times
(Chart: triples retrieved, in thousands, against time in minutes for
(Systap) Bigdata, (Sesame) Postgres, (Oracle) Spatial and (Sesame) File.)
46. Bongo’s Goals
Focus on Tabular Data
◦ Develop an efficient RDF list structure
◦ Creation and Extraction
◦ Integrates with other RDF Implementations
◦ Property Path expression support
◦ Nice Thin and Thick GUIs with an accompanying Command Line Interface
(Diagram: a triple of Subject, Predicate and Object; a key is kept only
for literal values, not resources.)
47. Retrieval Patterns
(○ = a fixed term, Any = unconstrained)
Subject | Predicate | Object | Description
○ | Any | Any | Retrieve all the triples for a given subject.
○ | ○ | Any | Retrieve all the triples for a given subject and predicate.
Any | ○ | Any | Retrieve all the triples for a given predicate.
Any | Any | ○ | Retrieve all the triples for a given object.
Any | ○ | ○ | Retrieve all the triples for a specific predicate and object combination.
○ | Any | ○ | Return all the triples for a given subject and object.
Any | Any | Any | Return all the triples contained within a graph.
○ | ○ | ○ | Determine whether a specific triple exists within a graph.
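These patterns map directly onto SPARQL: each fixed position becomes a concrete term and each Any becomes a variable. A sketch with hypothetical ex: terms (three separate queries):

```sparql
PREFIX ex: <http://example.org/>

# Given subject: all its triples.
SELECT ?p ?o WHERE { ex:subj ?p ?o }

# Given predicate and object: the matching subjects.
SELECT ?s WHERE { ?s ex:pred ex:obj }

# Fully specified pattern: does the triple exist?
ASK { ex:subj ex:pred ex:obj }
```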
52. Conclusion
This talk covered:
RDF
RDF Structures
The SPARQL language
SPARQL Analytical Tools
Property Paths
R Integration
Ontology and Ontological Support
Triple Store Implementations
My Stuff
Bongo
Snail
Note: Read "Foundations of Semantic Web Technologies" by
Pascal Hitzler, Markus Krötzsch and Sebastian Rudolph
RDF defines a reification vocabulary which provides for describing RDF statements without stating them. These descriptions of statements can be queried by using the defined vocabulary.
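In Turtle, a reified statement looks like this sketch (the ex: terms and the ex:source annotation property are hypothetical): the triple "ex:bob ex:age 42" is described, and annotated with a provenance note, without itself being asserted.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .

# A resource standing for the statement itself.
ex:stmt1 a rdf:Statement ;
         rdf:subject   ex:bob ;
         rdf:predicate ex:age ;
         rdf:object    42 ;
         ex:source     ex:censusReport .   # hypothetical annotation
```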