The Semantic Web and Linked Open Data: An Introduction

Pete DeVries
TaxonConcept.org
http://www.taxonconcept.org/
Department of Entomology
University of Wisconsin - Madison

What is the Semantic Web and how
does it Work?
Let's Look at the Traditional Way
Taxon Table

Location Table

This data structure is really only interpretable within the context of this specific database

Data Islands

The result is a set of database islands that contain a lot of redundant data, each of which is independently curated.

Each effort benefits little from the other efforts.

Data Sets often Overlap

What they don’t have is a common set of field names or IDs

Each Data Set has its own “Vocabulary”

Different Fields
Different Names for the Same Fields
Same Names for Different Fields
Different ways of Interpreting those Fields

These nuances in meaning are often only understood by the
designers of each individual data set.

Consider how differently people interpret the meaning of
what seem to be the same terms

Where the Semantic Web Helps
Tim Berners-Lee’s 4 Rules

1. Use URIs* as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.

*URI = Uniform Resource Identifier
http://www.w3.org/DesignIssues/LinkedData.html

Use URIs as Names for Things?

Instead of “Door County” use
http://sws.geonames.org/5250768/

For Humans this URI Dereferences to a
Human Interpretable Web Page

For Machines this URI Dereferences to a
Machine Interpretable File

As N-Triples
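
The human and machine views of the same URI are usually selected through HTTP content negotiation: the client states the format it wants in the Accept header. Below is a minimal sketch using only the Python standard library. The default media type and the assumption that the server honors it are illustrative; which RDF formats a given server actually offers has to be checked against that server.

```python
import urllib.request

# Door County URI taken from the slide above.
GEONAMES_URI = "http://sws.geonames.org/5250768/"


def build_rdf_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) a request that asks for RDF instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch_rdf(uri, accept="application/rdf+xml", timeout=10):
    """Dereference the URI and return the response body as text.

    Needs network access; the server may ignore the Accept header
    (assumption: it supports the requested format).
    """
    request = build_rdf_request(uri, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_rdf_request(GEONAMES_URI)
print(request.full_url, request.get_header("Accept"))
```

Calling fetch_rdf(GEONAMES_URI) would perform the actual lookup; a browser sending "Accept: text/html" to the same URI would get the web page instead.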

Why Would Anyone Think this Made Sense?

Now, each of these different databases is using an ID with a shared meaning.

A meaning that can be determined by dereferencing the URI.

All the data sets that use this vocabulary are now connectable.

All the data sets that are linked to this URI are now also linked to each other.

Life Sciences Example

Example: Two databases with county records

One uses “La Crosse County,” the other lists “La Crosse” for La
Crosse County, Wisconsin

You want to link and merge those records so that it is clear that you
mean a particular species was observed in a particular county

Normalize the Meaning between Data Sources
Use this shared vocabulary to integrate these two data sources

Use that shared vocabulary to find and link to other relevant data
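
The normalization step can be sketched in a few lines of Python. The two record sets, their field names, and the hand-built lookup table are hypothetical; only the GeoNames URI for La Crosse County comes from these slides. The sketch also assumes that "La Crosse" in the second database means the county and not the city.

```python
# GeoNames URI used for La Crosse County elsewhere in this talk.
LA_CROSSE_COUNTY = "http://sws.geonames.org/5258961/"

# Hypothetical hand-curated mapping from local county strings to the shared URI.
COUNTY_URIS = {
    "la crosse county": LA_CROSSE_COUNTY,
    "la crosse": LA_CROSSE_COUNTY,
}

# Hypothetical records from two databases with different field names.
database_a = [{"species": "Ochlerotatus triseriatus", "county": "La Crosse County"}]
database_b = [{"taxon": "Boloria selene", "locality": "La Crosse"}]


def normalize(records, species_field, county_field):
    """Rewrite records to a common shape, replacing county names with URIs."""
    normalized = []
    for record in records:
        key = record[county_field].strip().lower()
        normalized.append(
            {"species": record[species_field], "county_uri": COUNTY_URIS.get(key)}
        )
    return normalized


merged = normalize(database_a, "species", "county") + normalize(
    database_b, "taxon", "locality"
)

# Group species by the shared county URI.
by_county = {}
for record in merged:
    by_county.setdefault(record["county_uri"], []).append(record["species"])

print(by_county)
```

Once both sources point at the same URI, records from either one can be grouped or joined without comparing county strings.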

As More Data Sets Adopt these Principles

The individual datasets are no longer islands, but are one interconnected knowledge base

Other Benefits

Reduced duplication of effort and a better separation of concerns
It would be more efficient for me to simply link to a bibliographic
reference URI on a site that specializes in that than to create my own
bibliographic database.

Similarly, it would be more efficient for the bibliographic database to link
to a URI in a nomenclatural database than to curate that aspect separately.

When represented as URIs in a Semantic Web database or “Triple Store”,
information can be encoded more efficiently (~32 bytes per statement)

Enabling usable knowledge bases that scale to billions of “facts”

Example: The Linked Open Data Cloud

Over 55 billion triples and rising

What is Linked Open Data?

1. data representation using open standards
2. use of hyperlinks to make it work on the global web

Wikipedia Images linked to my Species Concepts

TaxonConcept <=> DBpedia <=> Wikimedia Commons Images
Virtuoso OpenSource and Microsoft Pivot
(some images are too large to display)

How do I Mark up my Data?
Your data set can continue to exist in its current relational
database form, but you need to expose it to the semantic web in a
different form

The goal is to make structured data accessible and discoverable via
hyperlinks.
It also includes the use of hyperlinks to denote properties/predicates
that have well-defined semantics.
These semantics are what ontologies and vocabularies deliver with
more fidelity than what's available in a typical RDBMS.
Thus, the Semantic Web isn't a destination - it is the effect of
publishing data in line with a set of principles as outlined in TimBL's
meme.

Knowledge as Triples
Statements are represented in a triple structure

Subject ➜ Predicate ➜ Object

• An English text version of a triple might look like

• Ochlerotatus triseriatus expected in La Crosse County, WI

Machine Processable Version
Ochlerotatus triseriatus is expected in La Crosse County, WI

Now represented as the following triple*

http://lod.taxonconcept.org/ses/iuCXz#Species

http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn

http://sws.geonames.org/5258961/

*Not Meant for Human Consumption
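
As a rough illustration, the triple above can be held in a small Python structure and written out as one N-Triples line. The Triple class and the to_ntriples helper are names made up for this sketch, and the helper only handles the case where all three terms are URIs (no literals or blank nodes).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object, all given as URIs."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple):
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


triple = Triple(
    subject="http://lod.taxonconcept.org/ses/iuCXz#Species",
    predicate="http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn",
    obj="http://sws.geonames.org/5258961/",
)

line = to_ntriples(triple)
print(line)
```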

Expressing RDF

RDF = Resource Description Framework

Ways to Express RDF (Serialization Formats)

RDF/XML
http://www.w3.org/TR/REC-rdf-syntax/
Notation 3 (N3)
http://www.w3.org/DesignIssues/Notation3.html

Subsets of N3
Turtle (Terse RDF Triple Language)
N-Triples

The Same Triple in Different Formats
RDF/XML (.rdf)

N3 (.n3)

Turtle (.ttl)

You might find one of these forms easier to create.
There are various tools that will allow you to convert between one form and another.
If you need RDF/XML but can create N3, author in N3 and then convert those files to RDF/XML.
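
To give a feel for how the terser formats differ from fully spelled-out URIs, here is a minimal sketch that writes the "expected in" triple from this talk as Turtle with a prefix declaration. The prefix name "txn" and both helper functions are assumptions made for the example; a real project would normally rely on an RDF library to handle literals, escaping, and the full Turtle grammar.

```python
# Assumed prefix name; the namespace is taken from the predicate URI in this talk.
PREFIXES = {"txn": "http://lod.taxonconcept.org/ontology/txn.owl#"}


def abbreviate(uri, prefixes):
    """Return prefix:local when the URI is in a known namespace, else <uri>.

    Only plain ASCII alphanumeric local names are abbreviated, which is
    stricter than Turtle requires but keeps the sketch safe.
    """
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            local = uri[len(namespace):]
            if local.isascii() and local.isalnum():
                return f"{prefix}:{local}"
    return f"<{uri}>"


def to_turtle(subject, predicate, obj, prefixes):
    """Serialize one URI-only triple as a small Turtle document."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    terms = (abbreviate(term, prefixes) for term in (subject, predicate, obj))
    statement = " ".join(terms) + " ."
    return "\n".join(header + ["", statement])


turtle = to_turtle(
    "http://lod.taxonconcept.org/ses/iuCXz#Species",
    "http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn",
    "http://sws.geonames.org/5258961/",
    PREFIXES,
)
print(turtle)
```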

How do I tell the Semantic Web
about my Data?

PingtheSemanticWeb
http://pingthesemanticweb.com/
Semantic Sitemaps
http://sw.deri.org/2007/07/sitemapextension/

PingtheSemanticWeb.com
Enter the URL for your RDF documents

Semantic SiteMaps

http://site.example.com/sitemap.xml
http://site.example.com/sitemap.xml.gz
Refer to the sitemap.xml file in your site's robots.txt file
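
Crawlers conventionally discover a sitemap through a "Sitemap:" line in robots.txt. The following sketch adds that line to existing robots.txt content; the function name and the sample robots.txt content are made up for the example, and the sitemap URL is the placeholder from the slide.

```python
def add_sitemap_directive(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap line, adding it only once."""
    directive = f"Sitemap: {sitemap_url}"
    if directive in robots_txt.splitlines():
        return robots_txt
    # Make sure the directive starts on its own line.
    separator = "" if robots_txt.endswith("\n") or not robots_txt else "\n"
    return f"{robots_txt}{separator}{directive}\n"


# Hypothetical existing robots.txt content.
existing = "User-agent: *\nDisallow:\n"
updated = add_sitemap_directive(existing, "http://site.example.com/sitemap.xml")
print(updated)
```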

How can I Find other Potentially Useful
Data Sets?
CKAN Comprehensive Knowledge Archive Network
http://ckan.net/

Ask the LOD Cloud

Enter a term or name like “Quercus alba” to see what entities contain that term or name

How can I set up my own Knowledge Base?
Virtuoso Open-Source Edition
http://virtuoso.openlinksw.com/

How can I Query a Knowledge Base?
SPARQL
http://en.wikipedia.org/wiki/SPARQL
http://www.w3.org/TR/rdf-sparql-query/
Query using the Web Interface
Query using your own script or web application
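
Querying from a script usually means sending the query text to a SPARQL endpoint over HTTP, with the query passed in a "query" parameter. A minimal sketch that builds such a request URL is shown here; the endpoint address is a placeholder, and the species URI is the one used for the triple example in this talk.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the SPARQL endpoint of your knowledge base.
ENDPOINT = "http://example.org/sparql"

# Species concept URI used earlier in this talk.
SPECIES_URI = "http://lod.taxonconcept.org/ses/iuCXz#Species"


def describe_query(uri):
    """Build a SPARQL DESCRIBE query for a single resource."""
    return f"DESCRIBE <{uri}>"


def sparql_get_url(endpoint, query):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, describe_query(SPECIES_URI))
print(url)
```

Fetching that URL with any HTTP client, ideally with an Accept header naming the desired result format, returns the description of the resource if the endpoint knows about it.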

Example

“Describe those occurrences of the species concept Boloria selene”

iSPARQL Query Example Web Interface

More Elaborate SPARQL Query

Query for those mammals that are “expected in” Wisconsin.

* use optional keyword for those attributes that may not exist
* the query includes those attributes that should be returned
The result set will be fed through Microsoft Pivot for browsing
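
A query of this shape might be assembled as follows. This is only a sketch: it uses the txn:isExpectedIn predicate from this talk, but the choice of rdfs:label and foaf:depiction as the optional attributes is an assumption, the La Crosse County URI stands in for the place, and restricting the results to mammals would need a class or rank term that these slides do not show.

```python
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"


def expected_in_query(place_uri, limit=100):
    """Build a SELECT query for species expected in a place.

    Label and image are wrapped in OPTIONAL so that species lacking
    either attribute still appear in the results.
    """
    return f"""PREFIX txn: <{TXN}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?species ?label ?image
WHERE {{
  ?species txn:isExpectedIn <{place_uri}> .
  OPTIONAL {{ ?species rdfs:label ?label . }}
  OPTIONAL {{ ?species foaf:depiction ?image . }}
}}
LIMIT {limit}"""


# La Crosse County URI from the triple example; used here as a stand-in place.
query = expected_in_query("http://sws.geonames.org/5258961/")
print(query)
```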

Result View

Live Query of the LOD Cloud Data Set

Efforts to Align Vocabularies

http://labs.mondeca.com/dataset/lov/index.html

Early EoL LOD

Knowledge Base View

What does the Future hold for the
Semantic Web and Linked Open Data?

Improvements in the quantity and quality of LOD data sets.
Improved Alignment of Vocabularies
Improvements in SPARQL and Quadstores
Human and Machine Interpretable Views Merged in RDFa
Better Visualization and Analysis Tools

Other Resources
Linked Open Data http://linkeddata.org/
W3C.org http://esw.w3.org/Main_Page
public-lod email list http://lists.w3.org/Archives/Public/public-lod/
TaxonConcept.org http://www.taxonconcept.org/
TaxonConcept.org Examples http://bit.ly/bundles/pjdlinkeddata/

SlideShare Talks
Evolution Towards Web 3.0: The Semantic Web
http://www.slideshare.net/LeeFeigenbaum/evolution-towards-web-30-the-semantic-web

Recommendations
Try using and experimenting with existing vocabularies before creating
your own.
Although these technologies allow you to run queries that you might not
have anticipated, thinking about use cases etc. will provide some guidance
on the best way to mark up your data.
Start with simple models and representations and add complexity as you
gain experience.
You may not want or be able to expose all your data to the LOD Cloud,
but exposing the metadata in commonly used vocabularies will make your
data more “findable”
Some vocabularies* are still under development and discussion, but in
many cases you can modify your SQL to RDF export to accommodate
changes.
* For instance, it is not clear to me what is the “best” vocabulary for
representing publications.

Acknowledgments
Kingsley Idehen
http://www.openlinksw.com/blog/~kidehen/

David “Paddy” Patterson mbl.edu

Anne Thessen mbl.edu

Dmitry Mozzherin mbl.edu

Han Wang rpi.edu

Patrick Leary eol.org
