Introduction to the Semantic Web and Linking Data



                      Eric Axel Franzon
                      Vice President
                      Semantic Universe/
                      Wilshire Conferences
About Me
• Professional
     • Wilshire Conferences
     • Semantic Universe
     • W3C
     • Guidewire Group
• Coach / Consultant / Trainer
• Geek
Today we will talk about:
• Semantic Technologies
• Semantic Web & Web 3.0
• Linked Data
  – Linked Open Data
  – Linked Enterprise Data
• Use cases
• That harmonica on the first slide
[Diagram: Semantic Technologies, Semantic Web, Web Technologies, World Wide Web]
Semantic Web = Web 3.0 = Web of Data
(Image: www.geekandpoke.com)
What is the Web of Data Not?

          • A software package
          • Something that will ever
            “be complete”
          • A replacement for the
            current Web
          • A pipe dream
          • A silver bullet
It’s also not…
• HAL 9000
It’s also not…
• Skynet
What is the Web of Data?

• A Web-scale architecture
• A metadata technology
• A layer of meaning on the
 existing Web
• In use TODAY!
Web of Data
Q: What does Linked Data have
to do with the Semantic Web?
Web 1.0 – Linking Documents
Web 1.0

“I see: characters
+ formatting
+ images”
 --my Computer
Web 1.0 – Linking Documents
Web 2.0 – Linking People
Web 2.0

“I see: characters
+ formatting
+ images”
 --my Computer
Web 1.0 – Linking Documents
Web 2.0 – Linking People
Web 3.0 – Linking Data
Web 3.0 – Linking Data
[Diagram: a book linked to its Author, Title, Publisher, Format, Price and Cover]
Web 3.0 – Linking Data
[Diagram: a book linked to its Author, Title, Publisher, Format, Price and Cover]

“I see: things + relationships.
This information is about a book.”
[Diagram: Semantic Technologies, Semantic Web, Linked Open Data]
Linking Open Data Project
[LOD cloud diagram, May 2007]
[LOD cloud diagram, March 2009]
Data from these trusted sources
        is available for you
to use in your applications TODAY.

      Data you can LINK to.

       And not just data…
Semantic data that is not only
    machine READABLE,

but machine UNDERSTANDABLE!
Disambiguation


    mole, n.
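A bare string such as “mole” cannot tell a machine which meaning is intended; a URI can. A minimal Python sketch of that idea, assuming hypothetical example.org URIs and a made-up `isA` predicate (neither comes from the slides or from any published dataset):

```python
# Hypothetical URIs: example.org is a placeholder domain, not real Linked Data.
ANIMAL = "http://example.org/id/mole-animal"
UNIT = "http://example.org/id/mole-unit"
SPY = "http://example.org/id/mole-spy"
IS_A = "http://example.org/vocab/isA"  # hypothetical predicate

# (subject, predicate, object) triples: the label is shared, the URIs are not.
triples = [
    (ANIMAL, "label", "mole"),
    (UNIT, "label", "mole"),
    (SPY, "label", "mole"),
    (ANIMAL, IS_A, "small burrowing mammal"),
    (UNIT, IS_A, "SI base unit for amount of substance"),
    (SPY, IS_A, "undercover agent"),
]


def things_labelled(word):
    """All subjects carrying this label: the string alone is ambiguous."""
    return sorted(s for s, p, o in triples if p == "label" and o == word)


def describe(uri):
    """What the data says about one URI: the identifier is unambiguous."""
    return [o for s, p, o in triples if s == uri and p == IS_A]


print(len(things_labelled("mole")))  # 3
print(describe(UNIT))  # ['SI base unit for amount of substance']
```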
But…
Metadata
Doctorow’s Criticisms  →  LOD/LED Response
“People lie”  →  Allow users to choose a social trust model
“People are lazy”  →  Automate where possible and encourage authoring where needed
“People are stupid”  →  Automate where possible, check where possible
“Mission Impossible: know thyself”  →  Allow multiple sources of metadata
“Schemas aren’t neutral”  →  Allow multiple schemas
“Metrics influence results”  →  Allow multiple metrics
“There’s more than one way to describe something”  →  Allow multiple descriptions
LOD/LED is flexible
How does LOD/LED work?

1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS
3. By using TRIPLES
How does LOD/LED work?

1. By uniquely identifying THINGS

            So, what’s a THING?
A THING is anything that can be uniquely
identified by a URI or a literal (string)

Me                                     http://twitter.com/ericaxel
My postal code       http://www.city-data.com/zips/90043.html
The White House                    Lat: 38.89859 Long: -77.035971
L.A. County’s sales tax rate                              9.750 %
                               http://ericfranzon.com/operator.jpg
This is a collection of THINGS:

                t_people
  Name City            State Post code
  David Fredericksburg VA    22408
  Eric Culver City     CA    90230
Trees and Tables
                 t_people
Name City                    State        Post code
David   Fredericksburg       VA           22408
Eric    Culver City          CA           90230

[Tree: people → David (City Fredericksburg, State VA, Post code 22408) and Eric (City Culver City, State CA, Post code 90230)]
Trees and Tables – Problem 1

                 t_people
Name    City             State   Post code   flag
David   Fredericksburg   VA      22408       1
Eric    Culver City      CA      90230

Adding partial data to tables leads to sparseness.

[Tree: people → David (flag 1; City Fredericksburg, State VA, Post code 22408) and Eric (City Culver City, State CA, Post code 90230)]
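A small Python sketch of the sparseness problem, using the slide's t_people rows plus the partial `flag` column. Using `None` for an empty cell and the Name column as the subject key are assumptions made for illustration:

```python
columns = ["Name", "City", "State", "Post code", "flag"]
t_people = [
    ["David", "Fredericksburg", "VA", "22408", "1"],
    ["Eric", "Culver City", "CA", "90230", None],  # no flag for Eric
]

# Table view: the flag column forces an empty cell on every unflagged row.
empty_cells = sum(cell is None for row in t_people for cell in row)

# Triple view: store only the facts that exist, keyed by the Name column.
triples = {
    (row[0], column, value)
    for row in t_people
    for column, value in zip(columns[1:], row[1:])
    if value is not None
}

print(empty_cells)   # 1
print(len(triples))  # 7: three per person, plus David's flag
```

Adding more partial columns grows `empty_cells` for every row that lacks the value, while the triple set only grows by the facts actually recorded.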
Trees and Tables – Problem 2
                 t_people
Name    City          State   Post code
David   Culver City   CA      90230
Eric    Culver City   CA      90230

Common data leads to (lots of!) duplication.

[Tree: people → David (City Culver City, State CA, Post code 90230) and Eric (City Culver City, State CA, Post code 90230)]
Graphs
[Graph: people → David (flag 1) and Eric; both link by City, State and Post code to single shared nodes Culver City, CA and 90230]
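In the graph, each shared value is stored once and both people point to it. A Python sketch that counts table cells versus graph nodes for the slide's data; node and edge labels come from the slides, and treating literal values as nodes is a simplification:

```python
# The "Problem 2" rows: two people at the same address.
rows = [
    ("David", "Culver City", "CA", "90230"),
    ("Eric", "Culver City", "CA", "90230"),
]

# Table view: every row repeats the shared address values.
table_cells = sum(len(row) for row in rows)

# Graph view: each distinct value becomes one node; rows become edges.
# The Graphs slide also gives David a flag of 1.
edges = {("David", "flag", "1")}
for name, city, state, post_code in rows:
    edges.add((name, "City", city))
    edges.add((name, "State", state))
    edges.add((name, "Post code", post_code))

nodes = {s for s, _, _ in edges} | {o for _, _, o in edges}

print(table_cells)  # 8
print(len(nodes))   # 6: David, Eric, Culver City, CA, 90230 and "1"
print(len(edges))   # 7
```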
How does LOD/LED work?

1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS

            Who’s your daddy?
Is Father of
mailto:ericaxel@yahoo.com




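  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:owl="http://www.w3.org/2002/07/owl#">
    <owl:ObjectProperty rdf:ID="isFather">
      <rdfs:domain rdf:resource="#Person"/>
      <rdfs:range rdf:resource="#Person"/>
    </owl:ObjectProperty>
  </rdf:RDF>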
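In RDFS and OWL, `rdfs:domain` and `rdfs:range` let software infer types rather than reject data: if A isFather B, then A and B are both Persons. A minimal Python sketch of those two inference rules, with prefixed names kept as plain strings; the child's URI is a hypothetical placeholder:

```python
RDF_TYPE = "rdf:type"

# Schema from the slide: isFather has domain Person and range Person.
domain = {"isFather": "Person"}
range_of = {"isFather": "Person"}

# Instance data; the object URI is hypothetical.
data = {("mailto:ericaxel@yahoo.com", "isFather", "http://example.org/id/child")}


def infer_types(triples):
    """Apply the RDFS domain and range rules once; return only new triples."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, RDF_TYPE, domain[p]))
        if p in range_of:
            inferred.add((o, RDF_TYPE, range_of[p]))
    return inferred - triples


for triple in sorted(infer_types(data)):
    print(triple)
```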
1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS
3. By using TRIPLES



         What’s a triple?
Triples? It’s Elementary! (School)
[Sentence diagram: “book has title.” with its relationship labeled as the predicate]

         That is a Triple!
Triples? It’s Elementary!
         “This book has a title.”

       “Eric wrote this Web page.”

      “This article is about moles.”

             “I like blues.”

           “I like B.L.U.E.S.”

“This image can be used non-commercially.”

“My email address is ericaxel@yahoo.com.”
Triples
[Diagram: Subjects, Predicates and Objects. Book → Has Title → “Title”; Eric → Created → Webpage; Image → Has License → CC Non-Commercial]

[Diagram: a Book linked to its Author, Title, ISBN and Publisher]
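Each arrow in the diagram is one (subject, predicate, object) triple, so a question becomes a pattern with blanks. A minimal Python sketch using the diagram's labels; using `None` as the wildcard is an assumption of this sketch, not RDF syntax:

```python
triples = [
    ("Book", "Has Title", "Title"),
    ("Eric", "Created", "Webpage"),
    ("Image", "Has License", "CC Non-Commercial"),
]


def match(s=None, p=None, o=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(match(s="Eric"))         # [('Eric', 'Created', 'Webpage')]
print(match(p="Has License"))  # [('Image', 'Has License', 'CC Non-Commercial')]
```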
The Trouble with Triples
Cytoscape.org
Our Data are Multiplying.



            Review of the Review
Trends in data growth

• Vast amounts of digital data being
  produced daily.
  –Wal-Mart produces 1 million
   transactions every hour. DBs
   estimated at > 2.5 petabytes
• US National Archives creating > 10
  million digital assets annually
Data Inflation
•   Megabyte (MB) = 2^20 bytes
•   Gigabyte (GB) = 2^30 bytes
•   Terabyte (TB) = 2^40 bytes
•   Petabyte (PB) = 2^50 bytes, or 1,024 TB
•   Exabyte (EB) = 2^60 bytes, or 1,024 PB
•   Zettabyte (ZB) = 2^70 bytes, or 1,024 EB
•   Yottabyte (YB) = 2^80 bytes, or 1,024 ZB
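With binary prefixes, each unit is 2^10 = 1,024 times the previous one. A quick Python check; the kilobyte (KB = 2^10 bytes) is assumed as the starting point, since the slide begins at MB:

```python
units = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

# Binary convention: KB = 2**10 bytes, MB = 2**20 bytes, and so on.
size_in_bytes = {unit: 2 ** (10 * (i + 1)) for i, unit in enumerate(units)}

# Every step up the scale multiplies by exactly 1,024.
for smaller, larger in zip(units, units[1:]):
    assert size_in_bytes[larger] == 1024 * size_in_bytes[smaller]

print(size_in_bytes["PB"] == 2 ** 50)               # True
print(size_in_bytes["PB"] // size_in_bytes["TB"])   # 1024
```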
Acceleration
–Decoding the human genome involves
 analyzing 3 billion base pairs

 • What took 10 years to process in
   2003 takes a week today
A brand new professional has emerged ...

The data scientist, who combines the skills of
  software programmer, statistician and
      storyteller/artist to extract the
nuggets of gold hidden under mountains of data.
  - The Economist, “Data, data everywhere”, Feb 27th 2010
When we come back…
S – T – R – E – T – C – H
         Break!
Linked Data
is like a harmonica


• It’s easy to play
Facebook
• Unique Visitors*: 540,000,000
• Page Views: 570,000,000,000
* Per month

Source: Google - The 1000 most-visited sites on the web
FOAF: Friend-Of-A-Friend




http://www.foaf-project.org/
FOAF-a-Matic
http://www.ldodds.com/foaf/foaf-a-matic
semantictweet.com

Can create four FOAF files:
    • Friends (who I follow)
    • Followers
    • All
    • Just Me
Linked Data
   is like a harmonica


• It’s easy to play
• It’s a “real” instrument
The Technologies of RDBMS


• Data
• Schemas
• Query Language
RDBMS Data
              t_people
Name City            State Post code
David Fredericksburg VA    22408
Eric Culver City     CA    90230
RDBMS Schema
RDBMS Query Language: SQL

   SELECT isbn,
          title,
          price,
          price * 0.06 AS sales_tax
     FROM Book
    WHERE price > 100.00
    ORDER BY title;
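The query can be tried against an in-memory SQLite database from Python's standard library. The `Book` rows below are hypothetical sample data, and the column types are assumptions; only the SELECT itself comes from the slide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")

# Hypothetical sample rows; the isbn values are placeholders, not real books.
conn.executemany(
    "INSERT INTO Book (isbn, title, price) VALUES (?, ?, ?)",
    [
        ("isbn-0001", "Semantic Sample", 120.00),
        ("isbn-0002", "Cheap Sample", 45.00),
        ("isbn-0003", "Another Sample", 150.00),
    ],
)

# The query from the slide, unchanged apart from layout.
rows = conn.execute(
    """
    SELECT isbn, title, price, price * 0.06 AS sales_tax
      FROM Book
     WHERE price > 100.00
     ORDER BY title
    """
).fetchall()

for isbn, title, price, sales_tax in rows:
    print(isbn, title, price, round(sales_tax, 2))
conn.close()
```

The 45.00 book is filtered out by the WHERE clause, and the two remaining rows come back sorted by title.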
The Technologies of LOD/LED


• Data
• Schemas
• Query Language
The Data Language

Resource
Description
Framework
RDF Triples
Subject                                                    Predicate     Object
http://plushbeautybar.com                                  dc:creator    http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html   dc:location   N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel                                foaf:knows    “Brian Sletten”
RDF Triple Components
Subject                                                    Predicate     Object
http://plushbeautybar.com                                  dc:creator    http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html   dc:location   N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel                                foaf:knows    “Brian Sletten” or http://twitter.com/bsletten

(URI)                                                      (URI)         (URI or String Literal)
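The component rules can be written down as a small check. A Python sketch, assuming simple `URI` and `Literal` wrapper classes (hypothetical, not taken from any RDF library); blank nodes, which RDF also allows as subjects and objects, are left out, as on the slide:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def make_triple(subject, predicate, obj):
    """Build a triple, enforcing the URI / URI / URI-or-literal rule."""
    if not isinstance(subject, URI):
        raise TypeError("subject must be a URI")
    if not isinstance(predicate, URI):
        raise TypeError("predicate must be a URI")
    if not isinstance(obj, (URI, Literal)):
        raise TypeError("object must be a URI or a literal")
    return (subject, predicate, obj)


KNOWS = URI("http://xmlns.com/foaf/0.1/knows")
eric = URI("http://twitter.com/ericaxel")

# Both forms of the slide's last row are structurally valid RDF.
t1 = make_triple(eric, KNOWS, URI("http://twitter.com/bsletten"))
t2 = make_triple(eric, KNOWS, Literal("Brian Sletten"))

try:
    make_triple(Literal("Brian Sletten"), KNOWS, eric)
except TypeError as err:
    print(err)  # subject must be a URI
```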
“RDF is good for distributing data
across the Web and pretending
it’s in one place.”

-Dean Allemang, TopQuadrant
Just so you know…
There are many ways of representing RDF:

    • RDF/XML        • N-Triples
    • N3             • Turtle
    • JSON           • RDFa

Each serialization has pros and cons, but
all of them connect
THINGS and RELATIONSHIPS into TRIPLES.
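N-Triples is the simplest of these formats: one triple per line, URIs in angle brackets, literals in double quotes, and a closing period. A Python sketch of a minimal writer; it handles only plain string literals (no language tags or datatypes), and the `("uri", ...)` / `("literal", ...)` tuple representation is an assumption of this sketch:

```python
def term(node):
    """Render ('uri', value) or ('literal', value) in N-Triples syntax."""
    kind, value = node
    if kind == "uri":
        return f"<{value}>"
    # Escape backslashes first, then quotes and line breaks.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """One line per triple: subject, predicate, object, then ' .'"""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


triples = [
    (("uri", "http://twitter.com/ericaxel"),
     ("uri", "http://xmlns.com/foaf/0.1/knows"),
     ("literal", "Brian Sletten")),
]
print(to_ntriples(triples))
# <http://twitter.com/ericaxel> <http://xmlns.com/foaf/0.1/knows> "Brian Sletten" .
```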
The Schemata

Linked Data schemas consist of:

 Your RDF relationships (predicates)
                  +
      Relationship descriptions
LOD/LED Schemata

[Diagram: Data (a table row: id 1, First Name Tony, Last Name Shaw) becomes an Initial Schema of relationships hasID → 1, hasFirstName → Tony, hasLastName → Shaw; a Relationship description links hasSurname and hasLastName with owl:sameAs]
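Because relationships are data too, a relationship description can map a new name onto an existing one without touching the stored triples. A Python sketch of the slide's Tony Shaw example; the `person/1` subject name and the dictionary-based lookup are hypothetical. The slide links the two relationships with owl:sameAs; OWL also provides owl:equivalentProperty specifically for properties.

```python
row = {"id": 1, "First Name": "Tony", "Last Name": "Shaw"}

# Initial schema: one relationship per column.
subject = f"person/{row['id']}"  # hypothetical naming scheme for subjects
triples = {
    (subject, "hasID", row["id"]),
    (subject, "hasFirstName", row["First Name"]),
    (subject, "hasLastName", row["Last Name"]),
}

# Relationship description added later: hasSurname means hasLastName.
same_relationship = {"hasSurname": "hasLastName"}


def values(subj, predicate):
    """Look up values, resolving any declared synonym for the predicate."""
    predicate = same_relationship.get(predicate, predicate)
    return {o for s, p, o in triples if s == subj and p == predicate}


print(values(subject, "hasSurname"))   # {'Shaw'}
print(values(subject, "hasLastName"))  # {'Shaw'}
```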
Choosing Relationships
• Reuse popular vocabularies
 –FOAF (Friend-of-a-friend)
 –Dublin Core (library/publisher
   metadata)
 –SIOC (Semantically-Interlinked Online
   Communities)
• ...or make up your own!
RDF Triples
Subject                                                    Predicate     Object
http://plushbeautybar.com                                  dc:creator    http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html   dc:location   N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel                                foaf:knows    “David Wood”
Relationship Descriptions
1. Resource Description Framework Schema
(RDFS): Simple, hierarchical classes

2. Simple Knowledge Organization System
(SKOS): Port taxonomies to the Semantic Web

3. Web Ontology Language (OWL): Complex
logical relationships
Combine vocabularies and descriptions
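Even simple RDFS class hierarchies support useful reasoning: `rdfs:subClassOf` is transitive, and an instance of a class is also an instance of its superclasses. A minimal Python sketch; the Dog, Mammal and Animal classes and the instance `rex` are hypothetical:

```python
# Hypothetical class hierarchy: class -> direct superclasses.
sub_class_of = {"Dog": {"Mammal"}, "Mammal": {"Animal"}}
types = {"rex": {"Dog"}}


def superclasses(cls):
    """All ancestors of cls, following rdfs:subClassOf transitively."""
    result, stack = set(), [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def all_types(instance):
    """Asserted types plus every type inferred through the hierarchy."""
    asserted = types.get(instance, set())
    return asserted.union(*(superclasses(c) for c in asserted))


print(sorted(all_types("rex")))  # ['Animal', 'Dog', 'Mammal']
```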
LOD/LED Schemata
• Put as much work into creating your
  LED schema as you put into creating
  your relational schemas
• ... maybe even a bit more (due to links
  between your data and others’).
New York Times -SKOS


SKOS STUFF
The query language
SPARQL
     SPARQL
     Protocol
     And
     RDF
     Query
     Language
SPARQL Example #1
FOAF (some people that Eric Franzon knows)

     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?name
     FROM <http://ericaxel.com/eric.rdf>
     WHERE {
         ?knower foaf:knows ?known .
         ?known foaf:name ?name .
     }
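The WHERE clause is a basic graph pattern: the two triple patterns share the variable ?known, so solutions are joined on it. A Python sketch of that matching over hypothetical in-memory triples (the `ex:` identifiers and the names are illustrative, not the contents of eric.rdf):

```python
# Hypothetical FOAF data standing in for http://ericaxel.com/eric.rdf.
triples = [
    ("ex:eric", "foaf:knows", "ex:brian"),
    ("ex:eric", "foaf:knows", "ex:david"),
    ("ex:brian", "foaf:name", "Brian Sletten"),
    ("ex:david", "foaf:name", "David Wood"),
    ("ex:eric", "foaf:name", "Eric Franzon"),
]


def match(pattern, binding):
    """Yield extended bindings for one triple pattern ('?x' marks a variable)."""
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind a new variable, or reject a conflicting existing binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def evaluate(patterns):
    """Join the patterns left to right, like a simple SPARQL engine might."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings


results = evaluate([("?knower", "foaf:knows", "?known"),
                    ("?known", "foaf:name", "?name")])
names = sorted(b["?name"] for b in results)
print(names)  # ['Brian Sletten', 'David Wood']
```

Eric's own name is not returned, because nobody in this sample data knows ex:eric.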
SPARQL Example #1
Example #1 - Results
SPARQL Example #2
             Querying two FOAF Profiles
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name
FROM NAMED <http://ericaxel.com/eric.rdf>
FROM NAMED <http://zepheira.com/team/dave/dave.rdf>
WHERE {
  GRAPH <http://ericaxel.com/eric.rdf> {
    ?x rdf:type foaf:Person .
    ?x foaf:name ?name .
  } .
  GRAPH <http://zepheira.com/team/dave/dave.rdf> {
    ?y rdf:type foaf:Person .
    ?y foaf:name ?name .
  } .
}
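Both GRAPH blocks bind the same ?name, so the query returns names that appear on a foaf:Person in both files, even when each file uses a different URI for that person. A Python sketch with hypothetical graph contents under the slide's graph URIs; Python sets drop duplicate solutions, as SELECT DISTINCT would:

```python
# Hypothetical contents for the two named graphs from the query.
graphs = {
    "http://ericaxel.com/eric.rdf": [
        ("ex:eric", "rdf:type", "foaf:Person"),
        ("ex:eric", "foaf:name", "Eric Franzon"),
        ("ex:dave", "rdf:type", "foaf:Person"),
        ("ex:dave", "foaf:name", "David Wood"),
    ],
    "http://zepheira.com/team/dave/dave.rdf": [
        ("ex:dw", "rdf:type", "foaf:Person"),
        ("ex:dw", "foaf:name", "David Wood"),
        ("ex:bs", "rdf:type", "foaf:Person"),
        ("ex:bs", "foaf:name", "Brian Sletten"),
    ],
}


def person_names(graph):
    """Names of subjects typed foaf:Person within one graph."""
    people = {s for s, p, o in graph if p == "rdf:type" and o == "foaf:Person"}
    return {o for s, p, o in graph if p == "foaf:name" and s in people}


eric_graph = graphs["http://ericaxel.com/eric.rdf"]
dave_graph = graphs["http://zepheira.com/team/dave/dave.rdf"]
common = person_names(eric_graph) & person_names(dave_graph)
print(sorted(common))  # ['David Wood']
```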
Example #2 - Results
SPARQL Example #3
    Bart Simpson's chalkboard gags (DBpedia)

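PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia2: <http://dbpedia.org/property/>
SELECT ?episode ?chalkboard_gag
WHERE { ?episode skos:subject ?season .
        ?season rdfs:label ?season_title .
        ?episode dbpedia2:blackboard ?chalkboard_gag .
        FILTER (regex(?season_title, "The Simpsons episodes, season")) .
}
ORDER BY ?season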
Example #3 - Results
http://www.milinkito.com/swf/bart.php
Are *real* companies using
       Linked Data?
Easy to play; takes work to master.
…and many more!
E-Commerce




  A vocabulary to describe products,
services, and other e-commerce terms.
Who is using GoodRelations?
           1100+ Best Buy stores
Phase 2
~640,000 “next-gen” product detail pages
21 Open Box Products
  listed at this store!
Who is using GoodRelations?
With RDFa + GoodRelations, but no
additional SEO work,
PlushBeautyBar.com was indexed
by Google within one week.
[Diagram: Semantic (Web) Technologies, Semantic Web, Linked Open Data, Linked Enterprise Data, RDBMS, CRM, Calendars]
MIXING private and public data?



Absolutely! And it’s really useful to do so!
Example:

iConcertCal
Public + Private Data: iConcertCal
Example:

  Siri
Siri.com

Siri is a Virtual Assistant.

I ask it to do things for me.

It does, by mixing data,
by disambiguating, and
by reasoning.
Example:
•   Largest broadcasting corp. in the world
•   8 national TV channels
•   10 national radio stations
•   40 local radio stations
•   An extensive website, bbc.co.uk
• Broadcasts 1,000-1,500 programs per day.
• Publishes information in several formats:
  audio, video, textual.
• Needed to relate information across media
  for both users and third-party developers
• Approach: Create a Web presence for each
    • Broadcast
    • Artist
    • Species (and other biological ranks),
      habitat and adaptation
 –that the BBC has an interest in.
"Creating web identifiers for every item
the BBC has an interest in, and
considering those as aggregations of
BBC content about that item, allows us
to enable very rich cross-domain user
journeys."
-- Yves Raimond
• BBC Music is underpinned by the
  Musicbrainz music database and
  Wikipedia.
• “BBC Music takes the approach that the
  Web itself is its content management
  system. [BBC] editors directly
  contribute to Musicbrainz and
  Wikipedia.”
BBC
• Wildlife Finder links existing LOD data with
  BBC content to make pages about each
  species, habitat and adaptation:
• Wildlife programmes (clips and episodes) are
  identified by tagging the clip or episode with
  the appropriate DBpedia URI.
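Tagging content with a shared URI turns that URI into an aggregation point: a species page is simply everything tagged with the species' identifier. A Python sketch; the clip and episode identifiers are hypothetical, and the DBpedia-style URIs are used only for illustration:

```python
from collections import defaultdict

# Hypothetical (content id, DBpedia URI) tags.
tags = [
    ("clip-001", "http://dbpedia.org/resource/Lion"),
    ("clip-002", "http://dbpedia.org/resource/Cheetah"),
    ("episode-17", "http://dbpedia.org/resource/Lion"),
]

# One aggregation per URI: all content about that species.
pages = defaultdict(list)
for content_id, uri in tags:
    pages[uri].append(content_id)

print(pages["http://dbpedia.org/resource/Lion"])  # ['clip-001', 'episode-17']
```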
"The RDF representations of these
web identifiers allow developers to
use our data to build applications."
-- Yves Raimond
A few final thoughts
A little bit can be very powerful!
[Word clouds: RDF, RDFS, RDFa, OWL, OWL-Lite, OWL-DL, OWL-Full, OWL2, triple, triplestore, subject, predicate, object, SPARQL, SKOS, Dublin Core, Linked Data, LOD, LED, Web 3.0 = Semantic Web, URI, PURLs, vocabulary, ontology, taxonomy, folksonomy, microformats, microdata, GRDDL, REST, NLP, entity extraction, Artificial Intelligence, reasoning engine, open world reasoning, cloud computing, data portability]
Further Reading




…and more to come!
THANK YOU!
   Questions?
   Operators are standing by.



   EricAxel@yahoo.com
Semantic Technology Conference
www.Semantic-Conference.com
       June 21-25, 2010




       Semantic Universe
  Free Informational Resource
  www.SemanticUniverse.com
Resources
http://geekandpoke.typepad.com/

http://richard.cyganiak.de/2007/10/lod/

http://iconcertcal.com

http://siri.com

http://data.nytimes.com

http://freedigitalphotos.com

http://aldobucchi.com

http://www.milinkito.com/swf/bart.php
http://www.flickr.com/photos/kellyhogaboom/4369774518/

http://www.flickr.com/photos/zenera/56677048/

http://www.flickr.com/photos/97964364@N00/59780745/

http://www.flickr.com/photos/starwarsblog/793008715/

http://www.flickr.com/photos/peterpearson/871254091/

http://www.flickr.com/photos/birdfarm/60946474/

http://www.flickr.com/photos/entropy1138/173847148/

http://www.flickr.com/photos/wainwright/351684037/

http://data.nytimes.com/50891932523096258603.rdf

Introduction to the Semantic Web and Linked Data