              The Palestinian eGovernment Academy
                          www.egovacademy.ps

Tutorial II: Data Integration and Open Information Systems

                     Session 15.1
     The Data Web and Linked Data

                    Dr. Mustafa Jarrar
                       University of Birzeit
                       mjarrar@birzeit.edu
                         www.jarrar.info


                             PalGov © 2011                1
About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement
511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
             Birzeit University, Palestine (Coordinator)
             Palestine Polytechnic University, Palestine
             Palestine Technical University, Palestine
             Ministry of Telecom and IT, Palestine
             Ministry of Interior, Palestine
             Ministry of Local Government, Palestine
             University of Trento, Italy
             Vrije Universiteit Brussel, Belgium
             Université de Savoie, France
             University of Namur, Belgium
             TrueTrust, UK


Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O. Box 14, Birzeit, Palestine
Telfax: +972 2 2982935
mjarrar@birzeit.edu
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.


No part of this tutorial may be reproduced or modified in any form or by
any means without prior written permission from the project, which holds
the full copyright on the material.




                 Attribution-NonCommercial-ShareAlike
                              CC-BY-NC-SA

This license lets others remix, tweak, and build upon your work
non-commercially, as long as they credit you and license their new
creations under the identical terms.

Tutorial Map

Intended Learning Objectives

A: Knowledge and Understanding
 2a1: Describe tree and graph data models.
 2a2: Understand the notation of XML, RDF, RDFS, and OWL.
 2a3: Demonstrate knowledge of querying techniques for data models, such as SPARQL and XPath.
 2a4: Explain the concepts of identity management and Linked Data.
 2a5: Demonstrate knowledge of the integration and fusion of heterogeneous data.
B: Intellectual Skills
 2b1: Represent data using tree and graph data models (XML and RDF).
 2b2: Describe data semantics using RDFS and OWL.
 2b3: Manage and query data represented in RDF, XML, and OWL.
 2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
 2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF data.
D: General and Transferable Skills
 2d1: Work in a team.
 2d2: Present and defend ideas.
 2d3: Use creativity and innovation in problem solving.
 2d4: Develop communication skills and logical reasoning abilities.

Sessions

Topic                                                             Hours
Session 1: XML Basics and Namespaces                              3
Session 2: XML DTDs                                               3
Session 3: XML Schemas                                            3
Session 4: Lab-XML Schemas                                        3
Session 5: RDF and RDFS                                           3
Session 6: Lab-RDF and RDFS                                       3
Session 7: OWL (Web Ontology Language)                            3
Session 8: Lab-OWL                                                3
Session 9: Lab-RDF Stores: Challenges and Solutions               3
Session 10: Lab-SPARQL                                            3
Session 11: Lab-Oracle Semantic Technology                        3
Session 12_1: The problem of Data Integration                     1.5
Session 12_2: Architectural Solutions for the Integration Issues  1.5
Session 13_1: Data Schema Integration                             1
Session 13_2: GAV and LAV Integration                             1
Session 13_3: Data Integration and Fusion using RDF               1
Session 14: Lab-Data Integration and Fusion using RDF             3
Session 15_1: Data Web and Linked Data                            1.5
Session 15_2: RDFa                                                1.5
Session 16: Lab-RDFa                                              3
Module ILOs


After completing this module, students will be
able to:
   - Explain the concepts of identity management and
     linked data.
   - Understand the basic concepts of the Data Web.
   - Integrate and fuse heterogeneous data.




Semantic Web / Data Web / Web 3.0?



“The goal of the Semantic Web is
to create a universal medium for the
exchange of DATA”, W3C.




“The Semantic Web is a web of data, in
some ways like a global database”,
Tim Berners-Lee – Inventor of the WWW.




Web of Data

• The Data Web envisions the Web as a worldwide space of
  interlinked structured data.

• The Web as we know it today is a global information space
  of linked documents.

• The same vision is applied to data: publishing and
  connecting structured data on the web.




Classical Web
                                      Diagram Source: Christian Bizer



                • The classical Web is a global
                  information space of linked
                  documents.
                • The primary units of the hypertext
                  Web are:
                     – HTML documents,
                     – connected by hyperlinks.
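To make the point concrete, the following minimal Python sketch (standard library only; the HTML page and its URLs are hypothetical) extracts the hyperlinks from a small HTML document. A program can learn that this page links to two others, but not what kind of relationship each link expresses.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href target of every <a> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page: two hyperlinks, but nothing states what each link means.
page = """
<html><body>
  <p>Read about <a href="http://example.org/bethlehem.html">Bethlehem</a>
  and its <a href="http://example.org/history.html">history</a>.</p>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)

# All the program recovers is "this document -> that document": untyped links.
for target in parser.links:
    print("this page ->", target)
```

The only machine-readable relation here is "links to"; the meaning of each link lives in the surrounding human-readable text.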




The challenge

• The problem is that the information on the
  classical web is not structured.
   – Programs cannot use such information in a useful way.


• The solution is to increase the structure of
  published information.




Web APIs and Mashups
                                         Diagram Source: Christian Bizer



• Many major data sources such as Amazon, Yahoo!, eBay,
  and Google provide access to their data through APIs.

• As of 14 September 2011, programmableweb.com listed 3,891 APIs
  and 6,101 mashups.


               [Diagram: three APIs feeding a single mashup]


Web APIs and Mashups
                                                         Picture Source: Christian Bizer


•   However,
     – APIs provide proprietary interfaces.
     – Data retrieved from these APIs is represented using different formats
       (different data models).
     – Mashups created using these APIs are based on a fixed set of data
       sources, because entities in different APIs are not linked.
     – You cannot set hyperlinks between entities in different APIs.
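The following Python sketch illustrates the problem with two hypothetical API responses (the field names, identifiers and formats are invented for illustration): each API describes the same city in its own format under its own local identifier, so the mashup needs a hand-written adapter per API and can only guess, for example by comparing names, that the two records refer to the same thing.

```python
import json

# Hypothetical responses from two different web APIs about the same city.
# Each uses its own proprietary format and its own local identifier.
api_a_response = json.loads('{"id": 17, "name": "Bethlehem"}')
api_b_response = json.loads('{"place": {"key": "X-99", "label": "Bethlehem"}}')


def from_api_a(record):
    """Adapter written by hand for API A's format."""
    return {"source": "A", "local_id": str(record["id"]), "name": record["name"]}


def from_api_b(record):
    """Adapter written by hand for API B's (different) format."""
    place = record["place"]
    return {"source": "B", "local_id": place["key"], "name": place["label"]}


entities = [from_api_a(api_a_response), from_api_b(api_b_response)]

# With no shared global identifier, the mashup can only guess that the records
# describe the same entity, e.g. by comparing names (fragile: names are ambiguous).
same_by_name = entities[0]["name"] == entities[1]["name"]
same_by_id = entities[0]["local_id"] == entities[1]["local_id"]
print(same_by_name, same_by_id)  # True False
```

Adding a third API means writing a third adapter; nothing in the data itself tells the mashup where related entities live.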




                                                 [Diagram: APIs separate data]




Beyond Web APIs and Mashups:
      The Data Web and Linked Data
• The Data Web envisions the Web as a worldwide space of
  interlinked structured data.

• Linked data refers to the set of best practices for
  publishing and connecting structured data on the web.

• The adoption of the Linked Data best practices has led to the extension of
  the Web with a global data space connecting data from diverse domains, such as:
   – people, companies, books, scientific publications, films, music, television
     and radio programs, genes, proteins, drugs, clinical trials, online
     communities, statistical and scientific data, reviews, …




The Data Web and Linked Data
                                                         Diagram Source: Christian Bizer


•   While the primary units of the hypertext Web are HTML documents
    connected by untyped hyperlinks, Linked Data relies on documents
    containing data in RDF. Rather than simply connecting these
    documents, Linked Data uses RDF to make typed statements that link
    arbitrary things in the world.
•   The result is a web of things in the world, described by data on the Web.
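The contrast between untyped hyperlinks and typed RDF statements can be sketched by holding triples as (subject, predicate, object) tuples of URIs in Python. Every URI under example.org, including the vocabulary terms `worksFor`, `livesIn` and `locatedIn`, is a hypothetical placeholder; a real dataset would use published vocabularies.

```python
# Hypothetical namespaces: example.org is used only as a placeholder.
EX = "http://example.org/resource/"
VOCAB = "http://example.org/vocab#"

# Each triple is a typed link: the predicate URI says what the link means,
# and the subject/object identify things (a person, an organisation, a place),
# not documents.
triples = {
    (EX + "Alice", VOCAB + "worksFor", EX + "ExampleUniversity"),
    (EX + "Alice", VOCAB + "livesIn", EX + "ExampleCity"),
    (EX + "ExampleUniversity", VOCAB + "locatedIn", EX + "ExampleTown"),
}


def links_of_type(graph, predicate):
    """Return the sorted (subject, object) pairs connected by the given typed link."""
    return sorted((s, o) for (s, p, o) in graph if p == predicate)


for s, o in links_of_type(triples, VOCAB + "worksFor"):
    print(s, "worksFor", o)
```

Because the link type is data, a program can ask for one kind of relationship only, something an HTML hyperlink does not allow.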




The Data Web and Linked Data

Berners-Lee (2006) outlined a set of 'rules' for publishing
data on the Web in a way that all published data becomes
part of a single global data space:

    1. Use URIs as names for things
    2. Use HTTP URIs so that people can look up those
       names
    3. When someone looks up a URI, provide useful
       information, using the standards (RDF, SPARQL)
    4. Include links to other URIs, so that they can discover
       more things
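Rules 2 and 3 can be illustrated with a minimal Python sketch using only the standard library. It assumes the server supports HTTP content negotiation, so that an Accept header listing RDF media types (Turtle, RDF/XML) asks for RDF instead of HTML; whether a given server honours this, and which format it returns, depends on that server. The function names `build_lookup_request` and `look_up` are hypothetical.

```python
import urllib.request

# RDF media types we prefer, in order (q-values express preference).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri):
    """Rule 2: names are HTTP URIs, so they can be looked up with a plain HTTP GET.
    Rule 3: ask the server for RDF by sending an Accept header (content negotiation)."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("rule 2 expects an HTTP(S) URI, got: " + uri)
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def look_up(uri, timeout=10.0):
    """Dereference `uri` and return (final URL, Content-Type, body text).

    A Linked Data server may answer a request for a thing's URI with a redirect
    (for example 303 See Other) to a document describing it; urlopen follows
    such redirects automatically, so the final URL may differ from `uri`."""
    with urllib.request.urlopen(build_lookup_request(uri), timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.geturl(), content_type, body
```

For example, `look_up("http://dbpedia.org/resource/Bethlehem")` would dereference the DBpedia URI used in the linking examples of this session (this requires network access).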



Properties of the Web of Linked Data

• Anyone can publish data to the Web of Linked Data

• Entities are connected by links
   – creating a global data graph that spans data sources and enables
     the discovery of new data sources.


• Data is self-describing
   – If an application encounters data represented using an unfamiliar
     vocabulary, the application can resolve the URIs that identify
     vocabulary terms in order to find their RDFS or OWL definition.


• The Web of Data is open
   – meaning that applications can discover new data sources at
     runtime by following links.
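The last two properties can be sketched in Python: given some triples an application has just retrieved, it can collect (a) the predicate URIs, which it could resolve to find their RDFS or OWL definitions, and (b) object URIs on hosts it has not seen before, which point to new data sources it could visit at runtime. Triples are assumed to be plain (subject, predicate, object) string tuples, and `discover` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def discover(triples, known_hosts):
    """Return (new data-source hosts, vocabulary term URIs) found in the triples."""
    new_hosts = set()
    vocabulary_terms = set()
    for _subject, predicate, obj in triples:
        # Predicates are vocabulary terms: resolving them may yield RDFS/OWL definitions.
        vocabulary_terms.add(predicate)
        # HTTP object URIs on unseen hosts lead to new data sources; literals are skipped.
        if obj.startswith(("http://", "https://")):
            host = urlsplit(obj).hostname
            if host and host not in known_hosts:
                new_hosts.add(host)
    return new_hosts, vocabulary_terms


triples = [
    ("http://dbpedia.org/resource/Bethlehem", OWL_SAME_AS, "http://sws.geonames.org/284315/"),
    ("http://dbpedia.org/resource/Bethlehem", RDFS_LABEL, "Bethlehem"),
]
hosts, terms = discover(triples, known_hosts={"dbpedia.org"})
print(sorted(hosts))  # ['sws.geonames.org']
```

A crawler would add each new host to `known_hosts` after visiting it, so the set of data sources grows as links are followed.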
Realization: Linking Open Data Project





• Grassroots community effort to:
   – publish existing open license datasets as Linked Data
     on the Web
   – interlink things between different data sources


• By September 2010 the cloud had grown to 25
  billion RDF triples, interlinked by around 395
  million RDF links.
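Relating the two figures above gives a rough sense of scale (a derived ratio, not a number stated in the source, and approximate because the link count is approximate): RDF links make up about 1.6% of the triples, roughly one link for every 63 triples.

```latex
\frac{395 \times 10^{6}\ \text{RDF links}}{25 \times 10^{9}\ \text{RDF triples}} = 0.0158 \approx 1.6\%,
\qquad
\frac{25 \times 10^{9}}{395 \times 10^{6}} \approx 63\ \text{triples per link}
```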




Linking Data

    •    How are the same entities described in different datasets linked?
    •    Again: by linking their global identifiers, that is, their URIs!
    •    Let’s have a look at real examples from real datasets:

        @prefix owl: <http://www.w3.org/2002/07/owl#> .
        <http://dbpedia.org/resource/Bethlehem> owl:sameAs
        <http://sws.geonames.org/284315/> .
 • Linking the entity “Bethlehem” between the DBpedia dataset and the GeoNames dataset
   in the Linking Open Data cloud.
 • This is done by linking the URIs of “Bethlehem” in both datasets using owl:sameAs.

        <http://dbpedia.org/resource/Tim_Berners-Lee> owl:sameAs
        <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
 • Linking the entity “Tim Berners-Lee” between the DBpedia dataset and the DBLP dataset.
 • This is done by linking the URIs of “Tim Berners-Lee” in both datasets using owl:sameAs.


NOTE: The student is encouraged to visit the URIs specified above.
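Because owl:sameAs is symmetric and transitive, an application that consumes these links can group equivalent URIs and fuse what each dataset says about the entity. The following Python sketch does this with a union-find structure over plain (subject, predicate, object) tuples. The two descriptive triples (an rdfs:label from DBpedia and a gn:name from GeoNames) are illustrative assumptions, not copies of the live datasets, and `merge_descriptions` is a hypothetical helper.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GN_NAME = "http://www.geonames.org/ontology#name"


class UnionFind:
    """Groups URIs into equivalence classes (owl:sameAs is symmetric and transitive)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_descriptions(triples):
    """Fuse statements about URIs declared owl:sameAs into one description per entity.
    Only subjects are canonicalized; sameAs URIs in object position are left as-is."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            uf.union(s, o)
    merged = {}
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged.setdefault(uf.find(s), set()).add((p, o))
    return merged


DBPEDIA = "http://dbpedia.org/resource/Bethlehem"
GEONAMES = "http://sws.geonames.org/284315/"
triples = [
    (DBPEDIA, OWL_SAME_AS, GEONAMES),
    (DBPEDIA, RDFS_LABEL, "Bethlehem"),  # illustrative statement from dataset 1
    (GEONAMES, GN_NAME, "Bethlehem"),    # illustrative statement from dataset 2
]
merged = merge_descriptions(triples)
for entity, statements in merged.items():
    print(entity)
    for p, o in sorted(statements):
        print("   ", p, o)
```

A real consumer would also have to decide how to handle conflicting values from the fused datasets; this sketch simply collects all statements.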
Resources



                                        http://dbpedia.org/resource/Bethlehem
                                      (Bethlehem URI in DBpedia)




http://sws.geonames.org/284315/
(Bethlehem URI in GeoNames)

Applications
                                   Diagram Source: Christian Bizer



• What can I do with this?




Let’s draw a graph of our example!



[Figure: RDF graph of the example. Labels recoverable from the figure: v:Person,
v:nickname, v:Address, v:city; literals “George Mousa”, “Geno”, “Nablus”, “Palestine”.]



References

•   Christian Bizer: The Emerging Web of Linked Data. Presentation at
    SRI International, Artificial Intelligence Center. Menlo Park, USA. 2009.

•   W3C: www.w3.org

•   Linking Open Data:
    http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData



