Intro to the Semantic Web Landscape - 2011

The Semantic Web Landscape
A Practical Introduction

Contact:
Lee Feigenbaum
lee@cambridgesemantics.com
VP Technology
Co-chair, W3C SPARQL Working Group
©2011 Cambridge Semantics Inc. All rights reserved.

Example: Alzheimer’s Drug Discovery

What genes are involved in signal transduction
and are related to pyramidal neurons?


General search: 223,000 hits, 0 results


Domain-limited search: Still 2,580 potential results


Specific databases: Too many silos!


Linked Scientific Data: 32 targeted results


What’s the trick?

1. Agreement on common terms and
relationships
2. Incremental, flexible data structure
3. Good-enough modeling
4. Query interface tailored to the data model


WHAT IS THE SEMANTIC WEB?


Names


Branding

• Semantic Web
• Web of Data
• Giant Global Graph
• Data Web
• Web 3.0
• Linked Data Web
• Semantic Data Web
• Enterprise Information Web


Semantic Web – 1st View

• “The Semantic Web”
– Link explicit data on the World Wide Web in a machine-
readable fashion
• …government data
• …commercial data
• …scientific data
• …social data
– In order to enable applications such as…
• …targeted, semantic search
• …data browsing
• …automated agents

World Wide Web : Web pages :: The Semantic Web : Data


Semantic Web – 2nd View

• “Semantic Web technologies”
– A family of technology standards that ‘play nice together’,
including:
• Flexible data model
• Expressive ontology language
• Distributed query language
– Drive enterprise applications, including:
• Data integration & virtualization
• Business intelligence
• Large knowledgebases
• …
The technologies enable us to build applications and solutions that
were not possible, practical, or feasible traditionally.


A Common & Coherent Set of Technology Standards

• A common set of technologies:
– ...enables diverse uses
– ...encourages interoperability
• A coherent set of technologies:
– …encourages incremental application
– …provides a substantial base for innovation
• A standard set of technologies:
– ...reduces proprietary vendor lock-in
– ...encourages many choices for tool sets


The (In)Famous Layer Cake


Semantic Web Technology Timeline

[Timeline figure spanning 1999, 2001, 2004, 2007, 2008, and 2011; the only label that survives is RIF.]


2011: Where we are

As technologies & tools have evolved, Semantic Web
advocates have progressed through stages:

Report on…                            Execute on…
Semantic Web vision                   Initial experiments
Experiments                           Technology standards
Technology standards                  Software packages
Software packages                     Proofs of concept
Proofs of concept                     Initial production implementations
Initial production implementations    2nd, 3rd, … implementations (network effect)


2011: Where we’re not
Image from Trey Ideker via Enoch Huang

Semantic Web technologies are not a ‘magic crank’ for discovering
new drugs (or solving other problems, for that matter)!


2011: Where we’re not (cont’d)

• “Ontology” vs. “ontology”?
• XML vs. RDF?
• Semantic Web vs. Linked Data?
• Data integration vs. reasoning vs. KBs vs. search vs. app. development vs. …
• RDFa vs. microformats vs. microdata vs. schema.org

The Semantic Web still suffers from confusing and conflicting
messages, each of which claims to be “correct”.

2011: Where we’re not (cont’d)

We don’t yet have standard solutions for privacy, trust, probability,
and other elements of the Semantic Web vision.


What do Semantic Web solutions look like?


RDF is…

Resource Description Framework


RDF is…

The data model of the Semantic Web.


RDF is…

A flexible data model that features unambiguous
identifiers and named relations between pairs of
resources.


RDF is…

A labeled, directed graph of relations between
resources and literal values.

• RDF graphs are collections of triples
• Triples are made up of a subject, a predicate, and an
object

subject --predicate--> object

• Resources and relationships are named with URIs
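
As a rough illustration of the triple structure, the following Python sketch models a graph as a set of (subject, predicate, object) tuples. The `EX` namespace and the term names are placeholders chosen for this example, not identifiers from a published vocabulary, and real applications would normally use an RDF library instead.

```python
from typing import NamedTuple, Union

EX = "http://example.org/myvocab/"  # placeholder namespace (assumption)


class Triple(NamedTuple):
    subject: str              # URI naming the resource being described
    predicate: str            # URI naming the relationship
    object: Union[str, int]   # another URI, or a literal value


# An RDF graph is just a collection (here: a set) of triples.
graph = {
    Triple(EX + "LeeFeigenbaum", EX + "employer", EX + "CambridgeSemantics"),
    Triple(EX + "LeeFeigenbaum", EX + "birthYear", 1978),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {t.object for t in graph
            if t.subject == subject and t.predicate == predicate}
```

Calling `objects(graph, EX + "LeeFeigenbaum", EX + "birthYear")` walks the set and collects the matching literal values.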


Example RDF triples

• “Lee Feigenbaum works for Cambridge Semantics”
Lee Feigenbaum --works for--> Cambridge Semantics

• “Lee Feigenbaum was born in 1978”
Lee Feigenbaum --born in--> 1978

• “Cambridge Semantics is headquartered in Massachusetts”
Cambridge Semantics --headquartered--> Massachusetts


Triples connect to form graphs

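Lee Feigenbaum --works for--> Cambridge Semantics
Lee Feigenbaum --born in--> 1978
Lee Feigenbaum --lives in--> Massachusetts
Cambridge Semantics --headquartered--> Massachusetts
Massachusetts --capital--> Boston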


Why RDF? What’s different here?

• The graph data structure makes merging data with
shared identifiers trivial
• Triples act as a least common denominator for
expressing data
• URIs for naming remove ambiguity
– …the same identifier means the same thing
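
To make the merging claim concrete, here is a minimal Python sketch in which two independently produced sets of triples are combined with a plain set union. The two source names (`hr_data`, `company_data`) and the `EX` namespace are invented for illustration.

```python
EX = "http://example.org/myvocab/"  # placeholder namespace (assumption)

# Two hypothetical sources that happen to use the same URIs.
hr_data = {
    (EX + "LeeFeigenbaum", EX + "employer", EX + "CambridgeSemantics"),
}
company_data = {
    (EX + "CambridgeSemantics", EX + "headquarters", EX + "Massachusetts"),
    # The same fact as in hr_data; it collapses on merge.
    (EX + "LeeFeigenbaum", EX + "employer", EX + "CambridgeSemantics"),
}

# Merging graphs is set union: no schema alignment step, no join keys.
merged = hr_data | company_data


def describe(graph, resource):
    """Return every triple that mentions `resource` as subject or object."""
    return {t for t in graph if resource in (t[0], t[2])}
```

Because both sources name the company with the same URI, `describe(merged, EX + "CambridgeSemantics")` returns facts that originated in different places.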


Why RDF? Coping With Change

Flexible Graph Model
URIs for naming
On-the-fly Agility
The World Changes

Traditionally:
Change is costly
Semantics:
Change is cheap

RDB 1 RDB 2


Why RDF? Add Meaning to Data

With traditional technology:

Inside the database:
Cust ID   Name             Referred By     Work Phone
29212     Travis Ember     Janet Cassy     212-555-5001
30012     Jessica Evalta   Brian Meedly    617-555-2325
59235     Hector Samton    Agatha Browne   732-555-8715

Outside the database:
29212     Travis Ember     Janet Cassy     212-555-5001
30012     Jessica Evalta   Brian Meedly    617-555-2325
59235     Hector Samton    Agatha Browne   732-555-8715

No one knows what these numbers and names mean!

Why RDF? Add Meaning to Data

With Semantic Web technology:
Data description:
Person --name--> Text
Person --referred by--> Text
Person --mobile phone--> Text

Data, wherever it appears:
Person2912 --name--> Travis Ember
Person2912 --referred by--> Janet Cassy
Person2912 --mobile phone--> 212-555-5001

The meaning always travels with the data

What does RDF look like?

• RDF is the model, for which there are several
concrete syntaxes:
– RDF/XML – standard, complex XML syntax
– Turtle – common, textual, triples-oriented syntax
• …currently being standardized by the RDF working group
– N3 – more expressive superset of Turtle
– N-Triples – textual, line-oriented, useful for streaming

When writing RDF by hand and in many guides, examples,
and discussions these days, you’ll see Turtle most often.


A Bit of Turtle

• Write a triple by writing its parts separated by spaces
(subject predicate object)

@prefix ex: <http://example.org/myvocab/> .
@prefix geo: <http://geonames.example/> .

ex:LeeFeigenbaum ex:employer ex:CambridgeSemantics .
ex:LeeFeigenbaum ex:birthYear 1978 .
ex:CambridgeSemantics ex:headquarters geo:BostonMA .
geo:BostonMA ex:population 574000 .
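
For comparison with the line-oriented N-Triples syntax, the sketch below shows how a single triple might be written out as one N-Triples line in Python. The function name `to_ntriples_line` is made up for this example, and it only handles URIs and integer literals; a real serializer would also need to cover strings, escaping, language tags, and blank nodes.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/myvocab/"  # same prefix URI as the Turtle example


def to_ntriples_line(subject, predicate, obj):
    """Format one triple as an N-Triples line (URIs and integers only)."""
    if isinstance(obj, int):
        # A bare Turtle number like 1978 is an xsd:integer literal.
        obj_text = f'"{obj}"^^<{XSD_INTEGER}>'
    else:
        obj_text = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_text} ."


line = to_ntriples_line(EX + "LeeFeigenbaum", EX + "birthYear", 1978)
```

N-Triples spells out every URI in full, which is why Turtle's prefixes are easier to read and write by hand.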

SPARQL is…

SPARQL Protocol And RDF Query Language


SPARQL is…

The query language of the Semantic Web.


SPARQL is…

A SQL-like language for querying sets of RDF
graphs.


SPARQL is…

A simple protocol for issuing queries and
receiving results over HTTP. So…

Every SPARQL client works with every SPARQL
server!
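
The interoperability claim rests on the protocol being plain HTTP. As a hedged sketch, the Python below prepares (but does not send) a SPARQL protocol GET request using only the standard library. The endpoint URL is a placeholder, and the helper name `build_sparql_request` is an assumption of this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Prepare an HTTP GET that carries `query` in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL XML results format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


# http://example.org/sparql is a hypothetical endpoint, not a real service.
req = build_sparql_request("http://example.org/sparql",
                           "SELECT * WHERE { ?s ?p ?o } LIMIT 5")

# Round-trip check: decode the query string back out of the URL.
sent_query = parse_qs(urlsplit(req.full_url).query)["query"][0]
```

Passing `req` to `urllib.request.urlopen` would send it; any endpoint that implements the protocol should accept a request of this shape.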


Why SPARQL?

SPARQL lets us:
• Pull information from structured and semi-structured
data.
• Explore data by discovering unknown relationships.
• Query and search an integrated view of disparate
data sources.
• Glue separate software applications together by
transforming data from one vocabulary to another.
• Update RDF data in bulk.


Data sources: Dealer 1, Dealer 2, Dealer 3, Employee Directory, ERP / Budget System, EPA Fuel Efficiency Spreadsheet (Web)

SPARQL Query Engine

What automobiles get more than 25 miles per gallon and can be purchased at a
dealer located within 10 miles of one of my employees?

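# ex: and geo: reuse the URIs from the Turtle slide; epa: is a placeholder.
PREFIX ex:  <http://example.org/myvocab/>
PREFIX geo: <http://geonames.example/>
PREFIX epa: <http://epa.example/>

SELECT ?automobile
WHERE {
  ?automobile a ex:Car ;
              epa:mpg ?mpg ;
              ex:dealer ?dealer .
  ?employee a ex:Employee ;
            geo:loc ?loc .
  ?dealer geo:loc ?dealerloc .
  # geo:dist stands in for an engine-specific distance function (in miles).
  FILTER(?mpg > 25 &&
         geo:dist(?loc, ?dealerloc) <= 10)
}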
Web dashboard SPARQL query
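
To show roughly what a query engine does with the triple patterns in a WHERE clause, here is a toy pattern matcher in Python. It is a sketch of the general idea only (variables are strings starting with `?`, data is a set of tuples, the filter is ordinary Python), not how any particular SPARQL engine is implemented. The sample cars and property names are invented.

```python
def match(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` matches `triple`."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # constant term does not match
    return result


def query(graph, patterns):
    """Return all variable bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := match(pattern, t, b)) is not None]
    return solutions


graph = {
    ("car1", "type", "Car"), ("car1", "mpg", 31),
    ("car2", "type", "Car"), ("car2", "mpg", 19),
}
patterns = [("?car", "type", "Car"), ("?car", "mpg", "?mpg")]

# The FILTER step: keep only solutions with more than 25 mpg.
efficient = sorted(s["?car"] for s in query(graph, patterns) if s["?mpg"] > 25)
```

Sharing the variable `?car` across both patterns is what joins them, in the same way `?dealer` links the car and location patterns in the slide's query.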

The SPARQL 1.1 Landscape Includes

• A query language
– Now with aggregates, subqueries, property
paths, negation, & more
• An update language
• An HTTP protocol for issuing SPARQL queries &
updates
• A REST protocol for reading/writing RDF data
• A service description mechanism & vocabulary
• Basic federated query extensions
• Standard semantics for mixing query with reasoning


From the explicit to the inferred

• 3 pieces of the Semantic Web technology stack are
about describing a domain well enough to capture
(some of) the meaning of resources and relationships
in the domain
– RDF Schema
– OWL
– RIF

Apply knowledge to data to get more data.


RDFS is…

RDF Schema


RDF Schema is…

• Elements of:
– Vocabulary (defining terms)
• I define a relationship called “prescribed dose.”

– Schema (defining types)
• “prescribed dose” relates “treatments” to “dosages”
– (my prescribed dose is 2mg; therefore 2mg is a dosage)

– Taxonomy (defining hierarchies)
• Any “doctor” is a “medical professional”
– (therefore Dr. Brown is a medical professional)
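
The two parenthetical inferences above can be sketched as a small forward-chaining loop in Python. This covers only two rules (subclass membership and property range), which is a small subset of RDF Schema. The `ex:` names are invented for the example, and `rdfs:range` is used here on the assumption that the “dosage” side of “prescribed dose” is its range.

```python
TYPE, SUBCLASS, RANGE = "rdf:type", "rdfs:subClassOf", "rdfs:range"


def rdfs_closure(triples):
    """Apply two RDFS-style rules repeatedly until nothing new appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Taxonomy: x type C, C subClassOf D  =>  x type D
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # Schema: x p y, p range C  =>  y type C
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


data = {
    ("ex:Doctor", SUBCLASS, "ex:MedicalProfessional"),
    ("ex:DrBrown", TYPE, "ex:Doctor"),
    ("ex:prescribedDose", RANGE, "ex:Dosage"),
    ("ex:myTreatment", "ex:prescribedDose", "ex:2mg"),
}
closure = rdfs_closure(data)
```

The closure contains the original four triples plus the derived facts, which is the sense in which applying knowledge to data yields more data.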


OWL is…

Web Ontology Language


OWL is…

• Elements of ontology
– Same/different identity
• “author” and “auteur” are the same relation
• two resources with the same “ISBN” are the same “book”
– More expressive type definitions
• A “cycle” is a “vehicle” with at least one “wheel”
• A “bicycle” is a “cycle” with exactly two “wheels”
– More expressive relation definitions
• “sibling” is a symmetric predicate
• the value of the “favorite dwarf” relation must be one of “happy”,
“sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
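
Two of these ideas, symmetric relations and identity through a shared key, can be sketched in a few lines of Python. This is an illustration of the inferences only, not an OWL reasoner. The `ex:` names and the ISBN value are invented, and `owl:sameAs` is written as a plain string.

```python
def apply_symmetric(triples, symmetric_props):
    """Add the reversed triple for every use of a symmetric property."""
    return set(triples) | {(o, p, s) for s, p, o in triples
                           if p in symmetric_props}


def same_by_key(triples, key_prop):
    """Infer sameAs links between resources sharing a value for `key_prop`."""
    by_value = {}
    for s, p, o in triples:
        if p == key_prop:
            by_value.setdefault(o, set()).add(s)
    return {(a, "owl:sameAs", b)
            for group in by_value.values()
            for a in group for b in group if a < b}


data = {
    ("ex:alice", "ex:sibling", "ex:bob"),
    ("ex:book1", "ex:isbn", "isbn-123"),   # made-up key value
    ("ex:book2", "ex:isbn", "isbn-123"),
}
with_symmetry = apply_symmetric(data, {"ex:sibling"})
identities = same_by_key(data, "ex:isbn")
```

In OWL itself these behaviours are declared in the ontology (for example by marking a property as symmetric) and a reasoner derives the extra triples.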


OWL: Rich Class Definitions

• A class is a (named) collection of things with similar
attributes

Image courtesy of Fabien Gandon




Why Ontologies? Put Data Within Reach of Domain Experts

High-fidelity mappings
make data reusable for
many situations


RIF is…

Rule Interchange Format


RIF is…

• Standard representation for exchanging sets of
logical and business rules
• Logical rules
– A buyer buys an item from a seller if the seller sells the
item to the buyer
– A customer becomes a "Gold" customer as soon as his
cumulative purchases during the current year top $5000
• Production rules
– Customers that become "Gold" customers must be notified
immediately, and a golden customer card will be printed
and sent to them within one week
– For shopping carts worth more than $1000, "Gold"
customers receive an additional discount of 10% of the
total amount
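
RIF has its own rule syntaxes, which these slides do not show. Purely to illustrate the logic of the “Gold” customer rules, here is a Python sketch. The function and constant names are assumptions, and the thresholds come from the rule text above.

```python
GOLD_THRESHOLD = 5000        # cumulative purchases this year, in dollars
CART_THRESHOLD = 1000        # cart value above which the discount applies
GOLD_DISCOUNT_PERCENT = 10


def is_gold(purchases_this_year):
    """Logical rule: a customer is Gold once purchases top $5000."""
    return sum(purchases_this_year) > GOLD_THRESHOLD


def cart_total(cart_value, gold):
    """Production rule: Gold customers get 10% off carts over $1000."""
    if gold and cart_value > CART_THRESHOLD:
        return cart_value * (100 - GOLD_DISCOUNT_PERCENT) / 100
    return cart_value
```

The point of a rule interchange format is that rules like these can be exchanged between rule engines instead of being hard-coded in one application, as they are in this sketch.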

Fantasy Land Architecture

[Diagram labels: Ontology / Schema; six Custom UI boxes]


Reality

[Diagram labels: Internet, DB2, XML, LDAP Directory, Oracle RDB; six Custom UI boxes]


R2RML is…

RDB to RDF Mapping Language


R2RML is…

An RDF vocabulary for specifying mappings from
relational data to RDF data (and SPARQL).

The following R2RML slides are courtesy of Alex Miller:
http://www.slideshare.net/alexmiller/releasing-relational-data-to-the-semantic-web-7634727


GRDDL is…

Gleaning Resource Descriptions from Dialects of
Languages


GRDDL is…

A method for authoritatively getting RDF data
from XML and XHTML documents.


Linked Data is…

• A simple set of 4 guidelines for publishing RDF data on
the Web (over HTTP)
– Developed by Tim Berners-Lee in 2006

1. Use URIs as names for things
• Globally unique identity
2. Use HTTP URIs
• Everyone has a Web browser/client
3. When someone looks up a URI, provide useful
information
• …in the form of RDF data
4. Include links to other URIs
• Foster discovery of additional information
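
Guidelines 2 and 3 amount to an ordinary HTTP GET with content negotiation. As a sketch, the Python below prepares (without sending) such a request. The URI is a placeholder, and the helper name `linked_data_request` is an assumption of this example.

```python
from urllib.request import Request


def linked_data_request(uri):
    """Prepare an HTTP GET that asks for an RDF description of `uri`."""
    # Prefer Turtle, but accept RDF/XML as a fallback.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


# http://example.org/people/lee is a hypothetical resource URI.
req = linked_data_request("http://example.org/people/lee")
```

Sending `req` with `urllib.request.urlopen` against a real Linked Data URI would typically return RDF whose links (guideline 4) can be followed in the same way.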


The LOD Cloud, 2007


The LOD Cloud, 2008


The LOD Cloud, 2009


LOD, 2011


RDFa is…

RDF in Attributes


RDFa is…

A collection of HTML attributes that allow RDF to
be embedded directly in Web pages.


RDFa Example


My name is Manu Sporny
and you can give me a ring via
1-800-555-0155.
<img rel="image" src="http://manu.sporny.org/images/manu.png" />
I have a
<a rel="foaf:weblog" href="http://manu.sporny.org/">blog</a>.


Example courtesy of Manu Sporny:
http://manu.sporny.org/2011/rdfa-lite/
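
To give a feel for how a tool reads these attributes, the Python sketch below pulls `rel` and target pairs out of the two tags shown above, using the standard-library HTML parser. It is a toy attribute scan, not a conforming RDFa processor: it ignores subjects, vocabularies, prefix resolution, and `property` attributes. The class name `RelCollector` is made up for this example.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collect (rel, target) pairs from tags with a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The link target lives in href (for <a>) or src (for <img>).
        target = attributes.get("href") or attributes.get("src")
        if "rel" in attributes and target:
            self.links.append((attributes["rel"], target))


snippet = '''<img rel="image" src="http://manu.sporny.org/images/manu.png" />
<a rel="foaf:weblog" href="http://manu.sporny.org/">blog</a>'''

collector = RelCollector()
collector.feed(snippet)
```

Each collected pair corresponds to the predicate and object of a triple; a real RDFa processor would also work out the subject and expand `foaf:weblog` to a full URI.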

Why RDFa?

• Don’t Repeat Yourself (DRY)
• In-context metadata (copy & paste)
• Authoritative (no screen scraping)


RDFa in action


SEMANTIC WEB LANDSCAPE TODAY


Semantic Web Tools

In 2011, there is a wide variety of open-source
and commercial Semantic Web tools available.


Types of RDF Tools

• Triple stores
– Built on a relational database (increasingly less common)
– Native RDF store
• Development libraries
• Full-featured application servers

Most RDF tools contain some elements of each of
these.


Finding RDF Tools

• Community-maintained lists
– http://esw.w3.org/topic/SemanticWebTools
• Emphasis on large triple stores
– http://esw.w3.org/topic/LargeTripleStores
• Michael Bergman’s Sweet Tools searchable list:
– http://www.mkbergman.com/?page_id=325
• Community forums:
– http://answers.semanticweb.com
– #swig on irc.freenode.net
– semantic-web@w3.org


Types of SPARQL Tools

• Query engines
– Things that can run queries
– Most RDF stores provide a SPARQL engine
• Query rewriters
– E.g. to query relational databases (more later)
• Endpoints
– Things that accept queries on the Web and return results
• Client libraries
– Things that make it easy to ask queries


Finding SPARQL Tools

• Community-maintained list of query engines
– http://esw.w3.org/topic/SparqlImplementations
• Publicly accessible SPARQL endpoints
– http://esw.w3.org/topic/SparqlEndpoints
• Michael Bergman’s Sweet Tools searchable list:
– http://www.mkbergman.com/?page_id=325
• Community forums:
– http://answers.semanticweb.com
– #swig on irc.freenode.net
– semantic-web@w3.org


OWL Tools and Infrastructure

• Editors/environments
– Protégé, TopBraid, OilEd, OntoTrack, …

• Reasoning systems
– Pellet, FaCT++, HermiT, Racer, CEL, …

• Reasoning integrated into RDF databases
– OWLIM, Oracle RDF, Stardog, Virtuoso


Visualizing and Publishing Vocabularies


Reusable, public ontologies

FOAF

The Event Ontology

Measurement Units Ontology


What about… everything else?

Standards don’t yet exist, but many tools exist to
derive RDF and/or run SPARQL queries against
other sources of data.


LDAP Directories

Squirrel RDF
http://jena.sourceforge.net/SquirrelRDF/


Excel spreadsheets

Anzo for Excel
http://www.cambridgesemantics.com/products/anzo_for_excel


Web-based data sources

Virtuoso Sponger Cartridges
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger


Unstructured Text

Calais
http://www.opencalais.com/


Unstructured Text

Zemanta Web Service
http://developer.zemanta.com/


Semantic Web In Use: Social Data

• People, relationships
– Friend Of A Friend (“FOAF”) – foaf:knows
– Self-published or site-published (LiveJournal, hi5, …)
• Blogs, discussion forums, mailing lists
– Semantically Interlinked Online Communities (“SIOC”)
– Plug-ins for popular blogging & CMS platforms
• Calendars, vCards, reviews, …
– One-offs


Social Data Example

• Facebook Open Graph Protocol


Semantic Web In Use: Scientific Data


Semantic Web In Use: Enterprises on the Web

• Thesis: Describe your business more precisely and
drive more (and better) traffic to your site
• Example: NYTimes publishes their article
classification scheme as linked data
• Example: Best Buy, Overstock.com use RDFa to
annotate product listings


Measurable Results

• 30% increase in search-engine traffic
• 15% increase in click-through-rate for search ads


Semantic Web In Use: Inside the Enterprise

• Many and Varied Applications Across Industries
– Health care and pharma
• integration, classification, ontologies
– Oil & Gas
• integration, classification
– Finance
• structured data, ontologies, XBRL
– Publishing
• metadata
– Libraries & museums
• metadata, classification
– IT
• rapid application development & evolution

Targeting High-Potential Opportunities in Pharma

[Diagram labels: Regional Analyst (Territory, Profile, Preferred targets, ...); Universe of considered opportunities; Per-analyst relevance filter; High-potential opportunities; Mobile device]


Delivering Dynamic, Data-driven Websites

“The development of this new high-performance dynamic semantic
publishing stack is a great innovation for the BBC as we are the first to
use this technology on such a high-profile site. It also puts us at the
cutting edge of development for the next phase of the Internet, Web 3.0.”

Semantic Web In Use: Government data

– Since January 2010, 2,500 (large) datasets published as
Linked Data

– Since May 2009, 250,000 (smaller)
datasets published (CSV, XML, …)
– RPI project to convert datasets to
Linked Data


TAKE-AWAY ADVICE


Where do Semantic Web technologies shine?

• These are horizontal, enabling technologies.
• But they apply particularly well to problems with
these characteristics:
– Heterogeneous data from multiple, diverse sources
• Increasing reliance on connections within this data
– Rapidly changing information needs
– Significant early-mover advantage
– Cross-organizational collaboration
– Large amounts of data that would benefit from
classification


Getting Started with Semantic Web technologies

Don’t boil the ocean.


Getting Started with Semantic Web technologies

• Goal: quick tactical wins on the path to large
strategic value
• Be sure to consider the operational ramifications
– Who does what differently?
• Ideal Semantic Web projects/applications have an
incremental path towards broad deployment that
generates demonstrable value along the way


Choose practical, enterprise-ready tools

• Look beyond the core Semantic Web capabilities and
consider:
– integration with existing enterprise systems
– development & extension models
– deployment, logging, maintenance, backup
– tooling
– user experience

If you choose to build new components and
assemble existing components together, it’s quite
likely you’ll end up reinventing the wheel.


Plan for Acquiring Expertise

• What level of expertise is necessary?
– Technologies only?
– Technologies + API?
– Technologies + tooling?
– Tooling only?
– …
• How will we acquire the expertise?
– In-house (and if so, how?)
– Vendor services
– 3rd-party services
– Open-source community


Thanks & Discussion

• I’m always happy to field questions & engage in
discussion:

lee@cambridgesemantics.com

