Bernadette Hyland SemTech 2011 West - Linked Data Cookbook

The Joy of Data
A cookbook for publishing
Linked Data on the Web
Bernadette Hyland, CEO
3 Round Stones, Inc
bhyland@3roundstones.com

A pragmatic
approach to
publishing & consuming
Linked Data

Agenda
• Setting the scene

• Ingredients ... we use a cooking analogy

• Open standards & best practices

• Data modeling without context

• Social contract as a publisher

• Next steps

Setting the scene ...
where should we
focus?

We’ll review
• Converting data into RDF

• The social contract publishers
make

• The importance of announcing

• Where to turn for guidance

Why should we care?
• We pretend our organizations are hierarchical -- they aren’t

• Information is power.

• Combining information from different sources is very
powerful.

• The US data warehouse market in 2010 was $10B

• It is expected to grow to $13.5B in 2012

World changing phenomenon

• Using a Linked Data approach, we can begin to address the
non-hierarchical nature of our organizations

• We can combine information sources

• The W3C has defined standards that enable interoperability
and allow us to freely move data

We are sowing the
seeds for nothing
short of a
revolution

What does it take?
• The ingredients list ...

• Thinking differently about your
data

• Modeling for re-use

• Summary of process in 7 steps

“The change from atoms to bits is irrevocable
and unstoppable”
Being Digital by Nicholas Negroponte

We use URIs to describe both bits & atoms ...

Information resources are things that
computers understand, e.g., Web pages, images,
CSS files, etc.

Non-information resources are atoms, e.g.,
people, places, events, things, concepts, etc.

• A different way of thinking about
data

• The Open World Assumption

• Lots of URIs

• To be a citizen of the world (not
everyone speaks English)

• To publish useful information &
announce it!

and Human Readable (or edible)

Publish machine & human
readable content
• Machine readable format
• Human-readable descriptions of your data set
• Increase visibility with search engines
• Include RDFa or microformats
• Publish a voID description of your RDF dataset
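
A voID description can be quite small. The following is a minimal sketch, not a complete voID profile: the function name `void_description` and every `example.org` URI are hypothetical, and the title is assumed to contain no characters that need escaping in a Turtle literal.

```python
def void_description(dataset_uri, title, sparql_endpoint, data_dump, triples):
    """Render a minimal voID description of a dataset as Turtle text."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',          # human-readable name
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:dataDump <{data_dump}> ;",       # where a full copy can be fetched
        f"    void:triples {triples} .",            # approximate size of the dataset
    ])


# All values below are placeholders for illustration only.
text = void_description(
    dataset_uri="http://example.org/dataset/agencies",
    title="Example agency directory",
    sparql_endpoint="http://example.org/sparql",
    data_dump="http://example.org/dumps/agencies.nt",
    triples=1200,
)
print(text)
```

Publishing this file alongside the data gives both people and crawlers one place to learn what the dataset is and how to reach it.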

[Chart: marketing channels plotted by usage (vertical axis, 30% to 100%) against Marketers Reporting “Great” Return on Investment (horizontal axis, 0% to 60%). Channels shown: house email, SEO, paid search, banners/buttons, text-link ads, affiliate marketing, behavioral targeting, contextual targeting, rented email lists, rich media/video, pop-ups/pop-unders.]

There is a Process

Identify → Model → Name → Describe → Convert → Publish → Maintain

Preparation
1. Leverage what exists
• Request a copy of the logical and physical model of the
database(s)
• Obtain data extracts (i.e., databases and/or spreadsheets)
or create data in a way that can be replicated.

Modeling the data
2. Model data without context to allow for
reuse and easier merging of data sets

• Traditional DBAs organize data for specified
Web services or applications.

• With LD, application logic does not drive the
data schema, concepts, etc.

Modeling the data
3. Look for real world objects of interest (e.g., people, places,
things, locations, etc.) and model them.
• Investigate how others are already modeling similar or
related data.
• Look for duplication and normalize the data
• Use common sense to decide whether or not to make a link

Modeling the data ...
4. Connect data from different sources and authoritative
vocabularies (see list of popular vocabularies below).
• Use URIs as names for your objects
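
As a rough illustration of naming objects with URIs, the sketch below mints an http URI from an object's type and label. The base namespace `http://example.org/id/` and the helper `mint_uri` are assumptions made for this example, not part of the talk.

```python
from urllib.parse import quote

# Hypothetical base namespace; substitute a domain you control.
BASE = "http://example.org/id/"


def mint_uri(kind: str, label: str) -> str:
    """Build a stable http URI for a real-world object from its type and label."""
    # Normalize whitespace and case so the same label always yields the same URI.
    slug = "-".join(label.strip().lower().split())
    # Percent-encode anything that is not safe in a URI path segment.
    return f"{BASE}{quote(kind, safe='')}/{quote(slug, safe='')}"


print(mint_uri("place", "San Francisco"))
```

Deriving the URI deterministically from the source data means that re-running a conversion produces the same names, which matters once other datasets start linking to them.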

Modeling the data ...

• Put aside immediate needs of any application
• Don’t think about how an application will use your data
• Do think about time and how the data will change over
time.

Convert, Publish & Maintain

5. Write a script or process to convert the data set
repeatedly

6. Publish to the Web and announce it! (more details shortly)

7. Maintenance strategy (more details in the social contract at
the end)
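
Step 5 could look something like the sketch below, which converts a CSV extract into N-Triples using only the Python standard library. The column names (`id`, `name`, `homepage`), the sample rows and the `example.org` namespace are invented for illustration; a real conversion would follow your own model and vocabularies.

```python
import csv
import io

# Hypothetical namespace for minted subject URIs.
BASE = "http://example.org/id/agency/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def convert(csv_text: str) -> list:
    """Turn each CSV row into N-Triples lines, sorted so reruns are identical."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(row["name"])}" .')
        if row["homepage"]:  # skip the triple when the cell is empty
            lines.append(f"{subject} <{FOAF_HOMEPAGE}> <{row['homepage']}> .")
    return sorted(lines)


# Made-up extract standing in for a database or spreadsheet export.
SAMPLE = """id,name,homepage
a1,Example Agency,http://example.org/a1
a2,Another Agency,
"""

output = convert(SAMPLE)
print("\n".join(output))
```

Because the script has no hidden state and sorts its output, it can be rerun whenever the source extract changes and the results compared line by line.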

Take the plunge ... Be forgiving

• Simplistic data models can still be useful

• Better to make progress with something rather than do
nothing because we cannot be comprehensive and
complete

Take an iterative approach
1. Review of modeling decisions

2. Review vocabularies chosen and developed

3. Modify/update data conversion scripts

4. Do a maintenance walk-through with real use cases

5. Show how to explore data with SPARQL and
visualizations

6. Discuss a persistent identifier strategy (think PURLs)

Data stewards should....

• Make data accessible via the Web’s standard
access mechanism, specifically http URIs
• Represent data in a common format,
such as RDF/XML, Notation-3 (N3), Turtle,
N-Triples, RDFa, and RDF/JSON
• Provide self describing data

Linked Data Formats
• RDF/XML - RDF for XML pipelines

• Turtle - Human-readable RDF

• XHTML with GRDDL transformation

• XHTML with embedded RDFa

• RDF Schema - Describing structure

In a tart, smoothie or
margarita ... berries
can be combined in
different ways

Guidelines for merging

• URIs name the resources we are describing
• Two people using the same URI are describing the same
thing
• The same URI in two datasets means the same thing
• Graphs from several different sources can be merged;
• Resources with the same URI are considered identical;
• No limitations on which graphs can be merged.
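
The merging guidelines above can be sketched by treating each graph as a set of (subject, predicate, object) triples, so that a merge is simply a set union. This is a simplification under stated assumptions: it ignores blank nodes, which need renaming before a real merge, and the person and place URIs are hypothetical.

```python
# A graph is modeled here as a set of (subject, predicate, object) tuples.
# The person and place URIs are hypothetical placeholders.
ALICE = "http://example.org/id/person/alice"
CITY = "http://example.org/id/place/springfield"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"

# Two sources that both describe the same URI.
graph_a = {(ALICE, FOAF_NAME, '"Alice"')}
graph_b = {(ALICE, FOAF_BASED_NEAR, CITY), (ALICE, FOAF_NAME, '"Alice"')}


def merge(*graphs):
    """Merge graphs by set union; duplicate triples collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect everything the graph says about one URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge(graph_a, graph_b)
for predicate, obj in describe(merged, ALICE):
    print(predicate, obj)
```

No schema alignment step is needed: because both sources used the same URI for the same person, their statements line up on their own.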

Announcing the
finished
product!

• Inform the LOD developer community
(linkeddata.org, W3 lists)
• Announce to search engines
(RDFa hints, register to make accessible)
• Publish human readable descriptions
• Encourage interlinking
• Publish schema as voID
• Include SPARQL endpoint

ACCEPTABLE ROI FOR IT

[Pie chart. Categories: 6 months, 12 months, 18 months, 24 months, more than 24 months. Slice values shown: 4%, 17%, 13%, 16%, 49%.]

The Social Contract ...
The not so fine print

• LOD is a social contract to provide the public with information
• Follow best practices for modeling
• Carefully consider your URI strategy
• Ensure that your LOD remains available where you say it will be
• Publish voID description
• For a government agency ... a data policy is “a must”
• Specify data quality and retention, treatment of data through
secondary sources, restrictions for use, frequency of updates,
public participation, and applicability of this data policy

We’ve created
something quite
beautiful

Reading

http://linkeddatabook.com/editions/1.0/

http://3roundstones.com/linking-enterprise-data/

This work is Copyright © 2011 3 Round Stones Inc.
It is licensed under the Creative Commons Attribution 3.0 Unported License
Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:
Attribution. You must attribute the work in the manner specified by the
author or licensor (but not in any way that suggests that they endorse
you or your use of the work).
• For any reuse or distribution, you must make clear to others the license terms of this work.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.
• Some Content in the work may be licensed under different terms, this is noted separately.
