from the Semantic Web to the Web of Data: ten years of linking up

Presentation given at Lugano Java User Group. 30 March 2010

  1. 1. from the Semantic Web to the Web of Data ten years of linking up Lugano 30-03-2010 Davide Palmisano - Fondazione Bruno Kessler
  2. 2. a short ToC: story of a buzzword; concepts and ideas behind it; Linked Data: four rules, billions of opportunities; the server side of the triple: Java and the Semantic Web; successes, failures and hopes
  3. 3. story of a buzzword “To a computer, the Web is a flat, boring world devoid of meaning.” “A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities” “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning (...)”
  4. 4. story of a buzzword
  5. 5. story of a buzzword
  6. 6. story of a buzzword
  7. 7. story of a buzzword “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”
  8. 8. story of a buzzword typed objects and relationships; machine-readable content metadata; with shared semantics. The Web as a global giant decentralized database
  9. 9. concepts and ideas behind it
  10. 10. concepts and ideas behind it How to represent knowledge?
  11. 11. concepts and ideas behind it How to represent knowledge? The world’s academic communities have dealt with knowledge representation for years. Artificial intelligence, natural language processing, model management and many other research fields contributed heavily. Some ancestors paved the way
  12. 12. concepts and ideas behind it SHOE[1] “SHOE is an extension to HTML which allows authors to annotate their web pages with machine-readable knowledge”
        <USE-ONTOLOGY ID="cs-dept-ontology" VERSION="1.0" PREFIX="cs"
                      URL="http://www.cs.umd.edu/projects/plus/SHOE/cs.html">
        <CATEGORY NAME="cs.Professor" FOR="http://www.cs.umd.edu/users/hendler/">
        <RELATION NAME="cs.member">
            <ARG POS=1 VALUE="http://www.cs.umd.edu/projects/plus/">
            <ARG POS=2 VALUE="http://www.cs.umd.edu/users/hendler/">
        </RELATION>
        <RELATION NAME="cs.name">
            <ARG POS=2 VALUE="Dr. James Hendler">
        </RELATION>
  13. 13. concepts and ideas behind it John Sowa’s Conceptual Graphs [2] (...) they express meaning in a form that is logically precise, humanly readable, and computationally tractable (...) BOY AGNT WALK “boy walking”
  14. 14. concepts and ideas behind it adapting such approaches to an unpredictable, decentralized, potentially incoherent environment such as the Web has been the goal of a standardization effort mainly led by the W3C
  15. 15. concepts and ideas behind it Resource Description Framework (RDF): cornerstone of the Semantic Web technology stack; first published in 1999; directed, labeled graphs as data model
  16. 16. concepts and ideas behind it everything is uniquely identifiable with a Uniform Resource Identifier: a web page, a person, a book, an intangible thing. http://dpalmisano.myopenid.com http://dbpedia.org/resource/Lugano http://dbtune.org/myspace/coldplay
  17. 17. concepts and ideas behind it relationships between things can be expressed with a directed, labeled graph where nodes can be resources or XMLSchema-typed values and relationships are also identified by URIs
  18. 18. concepts and ideas behind it http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/
  19. 19. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ it’s an RDF triple
  20. 20. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ http://www.geonames.org/ontology#name Trento
  21. 21. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ http://www.geonames.org/ontology#name Trento http://www.geonames.org/ontology#population 104946
  22. 22. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ XML serialization
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:foaf="http://xmlns.com/foaf/0.1/">
          <rdf:Description rdf:about="http://dpalmisano.myopenid.com/">
            <foaf:based_near rdf:resource="http://sws.geonames.org/3165243/"/>
          </rdf:Description>
        </rdf:RDF>
  23. 23. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ Turtle serialization
        @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        <http://dpalmisano.myopenid.com/> foaf:based_near <http://sws.geonames.org/3165243/> .
  24. 24. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ N3 serialization
        <http://dpalmisano.myopenid.com/> <http://xmlns.com/foaf/0.1/based_near> <http://sws.geonames.org/3165243/> .
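A minimal Java sketch of the statement used on these slides may help to fix the idea: a triple is just three URIs, and the fully written-out line-based form is a plain concatenation of them. The class name TripleSketch and its method are hypothetical and belong to no library; literal objects such as "Trento" are deliberately left out.

```java
/** Hypothetical helper (not part of Jena or any other library): one RDF statement made of three URIs. */
final class TripleSketch {
    final String subject;
    final String predicate;
    final String object;

    TripleSketch(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    /** Writes the statement on one line: each URI in angle brackets, closed by " ." */
    String toNTriples() {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    public static void main(String[] args) {
        TripleSketch basedNear = new TripleSketch(
                "http://dpalmisano.myopenid.com/",
                "http://xmlns.com/foaf/0.1/based_near",
                "http://sws.geonames.org/3165243/");
        System.out.println(basedNear.toNTriples());
    }
}
```

A real application would rather rely on a library such as Jena, which also takes care of literals, blank nodes and character escaping.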
  25. 25. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ JSON serialization
        {
          "http://dpalmisano.myopenid.com" : {
            "http://xmlns.com/foaf/0.1/based_near" : [
              { "type" : "uri", "value" : "http://sws.geonames.org/3165243/" }
            ]
          }
        }
  26. 26. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ this triple represents a relationship between two resources, but how can we represent the meaning of that relationship? By defining vocabularies and ontologies: RDFSchema and OWL
  27. 27. concepts and ideas behind it a “Hello World” RDFSchema vocabulary rdf:type http://helloworld.com/ontology/Person http://helloworld.com/ontology/father rdf:type rdf:type rdf:type rdfs:Class rdf:Property
  28. 28. concepts and ideas behind it RDFSchema entailment: inferring new statements http://helloworld.com/ontology/Person http://helloworld.com/resource/Michele rdf:type http://helloworld.com/ontology/father http://helloworld.com/resource/Davide
  29. 29. concepts and ideas behind it RDFSchema entailment: inferring new statements http://helloworld.com/ontology/Person rdf:type http://helloworld.com/resource/Michele rdf:type http://helloworld.com/ontology/father http://helloworld.com/resource/Davide
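The kind of inference hinted at on the two entailment slides can be sketched in plain Java. The sketch assumes, since the flattened diagrams do not say so, that the father property declares Person as its rdfs:range and that the asserted statement is "Michele father Davide". It applies only the rdfs:domain and rdfs:range rules in a single pass, so it is not a complete RDFS reasoner, and the class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RdfsEntailmentSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain";
    static final String RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range";

    /** Returns the asserted triples plus the rdf:type triples derived from rdfs:domain and rdfs:range. */
    static Set<List<String>> infer(Set<List<String>> asserted) {
        Set<List<String>> result = new LinkedHashSet<>(asserted);
        for (List<String> schema : asserted) {
            boolean isDomain = schema.get(1).equals(RDFS_DOMAIN);
            boolean isRange = schema.get(1).equals(RDFS_RANGE);
            if (!isDomain && !isRange) {
                continue;
            }
            String property = schema.get(0);
            String cls = schema.get(2);
            for (List<String> data : asserted) {
                if (!data.get(1).equals(property)) {
                    continue;
                }
                // domain types the subject of the statement, range types its object
                String instance = isDomain ? data.get(0) : data.get(2);
                result.add(List.of(instance, RDF_TYPE, cls));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String ns = "http://helloworld.com/";
        Set<List<String>> asserted = new LinkedHashSet<>();
        // Assumption: the range declaration is not shown on the slides
        asserted.add(List.of(ns + "ontology/father", RDFS_RANGE, ns + "ontology/Person"));
        asserted.add(List.of(ns + "resource/Michele", ns + "ontology/father", ns + "resource/Davide"));
        for (List<String> triple : infer(asserted)) {
            System.out.println(String.join(" ", triple));
        }
    }
}
```

Under these assumptions the new statement is that Davide has rdf:type Person, although nobody asserted it explicitly.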
  30. 30. concepts and ideas behind it OWL allows you to specify further axioms: property cardinality restrictions, class disjointness, property transitivity, cardinality constraints. But beware: more expressivity means more reasoning complexity. Interested in these topics? Give [3] a try
  31. 31. concepts and ideas behind it describe everything... and more...
  32. 32. concepts and ideas behind it RDFa: Bridging the traditional Web with the Semantic Web
        <div rel="dc:creator">
          <span typeof="foaf:Person" about="http://foafbuilder.qdos.com/people/dpalmisano.myopenid.com/foaf.rdf#me">
            <a property="foaf:name" rel="foaf:homepage" href="http://dpalmisano.myopenid.com/">Davide Palmisano</a>
            <a rel="foaf:workplaceHomepage" href="http://www.fbk.eu">Fondazione Bruno Kessler</a>
          </span>
        </div>
  33. 33. concepts and ideas behind it SPARQL: querying the Semantic Web. SPARQL Protocol and RDF Query Language; based on graph pattern matching; four query forms: SELECT, DESCRIBE, ASK and CONSTRUCT
  34. 34. concepts and ideas behind it SPARQL: querying the Semantic Web
        SELECT ?person
        WHERE {
          ?person a foaf:Person .
          ?person ex:age ?age .
          FILTER(?age > 18)
        }
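To give a feeling of what "graph pattern matching" means, here is a naive Java sketch that evaluates the query above against a handful of in-memory triples. It is not how a real SPARQL engine is implemented: terms are opaque strings, prefixed names such as ex:age are never expanded, the data (ex:alice, ex:bob) is invented, and every ex:age value is assumed to be an integer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PatternMatchSketch {
    /** Tries to extend a binding so that the pattern equals the triple; returns null on mismatch. */
    static Map<String, String> unify(String[] pattern, String[] triple, Map<String, String> binding) {
        Map<String, String> extended = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                // A variable: bind it, or check it against the value bound earlier
                String bound = extended.get(term);
                if (bound == null) {
                    extended.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return extended;
    }

    /** Matches all patterns conjunctively against the data, like a basic graph pattern. */
    static List<Map<String, String>> match(List<String[]> data, List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>());
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : solutions) {
                for (String[] triple : data) {
                    Map<String, String> extended = unify(pattern, triple, binding);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    /** Runs the query of the slide: every foaf:Person whose ex:age is above 18. */
    static List<String> adults(List<String[]> data) {
        List<String[]> patterns = new ArrayList<>();
        patterns.add(new String[] {"?person", "rdf:type", "foaf:Person"});
        patterns.add(new String[] {"?person", "ex:age", "?age"});
        List<String> persons = new ArrayList<>();
        for (Map<String, String> solution : match(data, patterns)) {
            // FILTER(?age > 18), assuming integer-valued ages
            if (Integer.parseInt(solution.get("?age")) > 18) {
                persons.add(solution.get("?person"));
            }
        }
        return persons;
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"ex:alice", "rdf:type", "foaf:Person"});
        data.add(new String[] {"ex:alice", "ex:age", "30"});
        data.add(new String[] {"ex:bob", "rdf:type", "foaf:Person"});
        data.add(new String[] {"ex:bob", "ex:age", "12"});
        System.out.println(adults(data));
    }
}
```

The nested loops make the cost grow quickly with the number of patterns, which is one reason why real engines index triples and reorder patterns before matching.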
  35. 35. concepts and ideas behind it SPARQL: querying the Semantic Web “At which universities did the founders of successful IT companies study?” and order them by frequency...
  36. 36. concepts and ideas behind it
        SELECT ?almaMater (COUNT(?almaMater) AS ?frequency)
        WHERE {
          { { ?company a dbpedia-owl:Company }
            UNION { ?company a yago:InternetCompaniesOfTheUnitedStates }
            UNION { ?company a yago:CompaniesBasedInSiliconValley }
            UNION { ?company a yago:CompaniesListedOnNASDAQ } }
          ?company dbpedia-owl:numberOfEmployees ?numberOfEmpl .
          FILTER (?numberOfEmpl > 0)
          OPTIONAL { ?company dbpedia-owl:keyPerson ?keyPerson }
          ?keyPerson dbpprop:almaMater ?almaMater .
        }
        GROUP BY ?almaMater
        ORDER BY DESC(?frequency)
  37. 37. Linked Data: four rules, billions of opportunities 1. Use URIs to identify things. 2. Use HTTP URIs so that these things can be referred to and looked up ("dereference") by people and user agents. 3. Provide useful information (i.e., a structured description - metadata) about the thing when its URI is dereferenced. 4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
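Rules 2 and 3 can be sketched from the client side with the HTTP client shipped with the JDK (Java 11 or later): the same URI that identifies a thing is requested, and the Accept header asks for a structured description instead of an HTML page. The choice of application/rdf+xml and the class name are assumptions made for illustration; the request is only built and printed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class DereferenceSketch {
    /** Builds a GET request asking the server for an RDF/XML description of the resource. */
    static HttpRequest describe(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = describe("http://dbpedia.org/resource/Lugano");
        // Only printed here: sending it needs java.net.http.HttpClient and network access
        System.out.println(request.method() + " " + request.uri()
                + " Accept: " + request.headers().firstValue("Accept").orElse(""));
    }
}
```

Following rule 4 then amounts to repeating the same step for the URIs found in the returned description.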
  38. 38. Linked Data: four rules, billions of opportunities DBpedia: Wikipedia as a database extract such structured info and represent it with RDF
  39. 39. Linked Data: four rules, billions of opportunities let’s do it also for: Internet Movie Database, BBC /programmes, CiteSeer, GeoNames, MusicBrainz, CIA Factbook and all imaginable data-intensive traditional Web sites...
  40. 40. Linked Data: four rules, billions of opportunities
  41. 41. the server side of the triple: Java and the Semantic Web
  42. 42. the server side of the triple: Java and the Semantic Web RDF is the model; SPARQL is the query language; RDFa is our Trojan horse; Linked Data is the paradigm. How does it fit with Java?
  43. 43. the server side of the triple: Java and the Semantic Web general-purpose open source Semantic Web libraries: Jena[3] - The Semantic Web Java framework - an RDF API - parsing and writing RDF in RDF/XML, N3 and N-Triples - an OWL API - in-memory storage and persistence layer - SPARQL query engine - Schemagen: Java classes from an RDFSchema vocabulary
  44. 44. the server side of the triple: Java and the Semantic Web Jena: creating a model
        // Jena 2.x package name; Apache Jena 3 and later use org.apache.jena.rdf.model
        import com.hp.hpl.jena.rdf.model.*;

        // URI declarations
        String familyUri = "http://family/";
        String relationshipUri = "http://purl.org/vocab/relationship/";
        // Create an empty Model
        Model model = ModelFactory.createDefaultModel();
        // Create a Resource for each family member, identified by their URI
        Resource adam = model.createResource(familyUri + "adam");
        Resource beth = model.createResource(familyUri + "beth");
        // Create properties for the different types of relationship to represent
        Property siblingOf = model.createProperty(relationshipUri, "siblingOf");
        // Add properties to adam describing relationships to other family members
        adam.addProperty(siblingOf, beth);
  45. 45. the server side of the triple: Java and the Semantic Web Jena: querying the model
        // ARQ, Jena's SPARQL engine, is used here in place of the deprecated RDQL API
        import com.hp.hpl.jena.query.*;
        import com.hp.hpl.jena.rdf.model.*;

        // Parse the String containing the SPARQL query to execute
        Query query = QueryFactory.create(queryString);
        // Bind the query to the model to run it against
        QueryExecution qe = QueryExecutionFactory.create(query, model);
        try {
            // Execute the SELECT query and walk through its solutions
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution binding = results.nextSolution();
                RDFNode definition = binding.get("definition");
                System.out.println(definition.toString());
                Resource concept = binding.getResource("concept");
                // wordForm is a Property declared elsewhere
                List<RDFNode> wordforms = model.listObjectsOfProperty(concept, wordForm).toList();
            }
        } finally {
            qe.close();
        }
  46. 46. the server side of the triple: Java and the Semantic Web other valuable alternatives Sesame[4] - a generic open source Java framework for storage and querying of RDF data - easy, elegant and well documented jRDF[5] - an RDF library for Java - notable for IoC support (Spring 2)
  47. 47. the server side of the triple: Java and the Semantic Web getting RDF data Any23[6] - Anything to Triples - a library - a Web service - a CLI - extracts RDF from various sources: - Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License and XFN - RDF/XML, Turtle and Notation3 - RDF/XML, N3, Turtle and content-negotiated serialization supported
  48. 48. the server side of the triple: Java and the Semantic Web Any23: rdf extraction
        Any23 runner = new Any23();
        runner.setHTTPUserAgent("test-user-agent");
        HTTPClient httpClient = runner.getHTTPClient();
        DocumentSource source = new HTTPDocumentSource(
            httpClient,
            "http://www.rentalinrome.com/semanticloft/semanticloft.htm"
        );
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TripleHandler handler = new NTriplesWriter(out);
        runner.extract(source, handler);
        String n3 = out.toString("UTF-8");
  49. 49. the server side of the triple: Java and the Semantic Web Any23 deals with documents that already contain some RDF metadata. Extracting semantics from free text and disambiguating terms with links into the Linked Data cloud is another story: a plethora of different services exist - AlchemyAPI[7] - OpenCalais[8]
  50. 50. the server side of the triple: Java and the Semantic Web The world's largest maker of solar inverters announced Monday that it will locate its first North American manufacturing plant in Denver. "We see a huge market coming in the U.S.," said Pierre-Pascal Urbon, the company's chief financial officer. The company, based in Kassel, north of Frankfurt, Germany, boasts growing sales of about $1.2 billion a year.
  51. 51. the server side of the triple: Java and the Semantic Web The world's largest maker of solar inverters announced Monday that it will locate its first North American manufacturing plant in Denver. "We see a huge market coming in the U.S.," said Pierre-Pascal Urbon, the company's chief financial officer. The company, based in Kassel, north of Frankfurt, Germany, boasts growing sales of about $1.2 billion a year. http://dbpedia.org/resource/Frankfurt http://dbpedia.org/resource/Denver http://dbpedia.org/resource/Kassel
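What happens between the plain text and the three DBpedia URIs can be illustrated with a deliberately naive Java sketch: a dictionary lookup of place names. This is not how AlchemyAPI or OpenCalais work, since those services also detect entity boundaries and disambiguate among candidates; the gazetteer content and the class name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GazetteerLinkerSketch {
    /** Returns the URI of every gazetteer entry whose surface form occurs in the text. */
    static List<String> link(String text, Map<String, String> gazetteer) {
        List<String> uris = new ArrayList<>();
        for (Map.Entry<String, String> entry : gazetteer.entrySet()) {
            // Plain substring test: no tokenization, no disambiguation
            if (text.contains(entry.getKey())) {
                uris.add(entry.getValue());
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        // Hypothetical gazetteer: surface form mapped to a DBpedia resource
        Map<String, String> gazetteer = new LinkedHashMap<>();
        gazetteer.put("Frankfurt", "http://dbpedia.org/resource/Frankfurt");
        gazetteer.put("Denver", "http://dbpedia.org/resource/Denver");
        gazetteer.put("Kassel", "http://dbpedia.org/resource/Kassel");
        String text = "The company, based in Kassel, north of Frankfurt, Germany, "
                + "will locate its first North American manufacturing plant in Denver.";
        for (String uri : link(text, gazetteer)) {
            System.out.println(uri);
        }
    }
}
```

A lookup like this breaks as soon as a name is ambiguous, for example a city and a person sharing the same label, which is exactly the problem the dedicated services address.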
  52. 52. the server side of the triple: Java and the Semantic Web exposed as HTTP Web services, they provide responses in XML, RDF/XML, RDFa or JSON. Apache UIMA comes with two annotators for AlchemyAPI and OpenCalais[9]
  53. 53. the server side of the triple: Java and the Semantic Web indexing RDF data SIREn[10]: Efficient semi-structured Information Retrieval for Lucene - a plugin for Lucene - extends the Lucene query model - semi-structured search - structure aware full-text search - ranked semi-structured search: most relevant results returned first - sub-linear average response time - flexible semi-structured indexing
  54. 54. the server side of the triple: Java and the Semantic Web storing RDF data commonly known as “triple-stores”[11] “let me insert triples and run SPARQL queries over them” - OpenLink Virtuoso - 4Store - Redland - Jena or Sesame over an RDBMS
  55. 55. the server side of the triple: Java and the Semantic Web JDBC and Virtuoso
        boolean more = stmt.execute("sparql select * from <gr> where { ?x ?y ?z }");
        ResultSetMetaData data = stmt.getResultSet().getMetaData();
        while (more) {
            ResultSet rs = stmt.getResultSet();
            while (rs.next()) { ... }
            more = stmt.getMoreResults();
        }
  56. 56. the server side of the triple: Java and the Semantic Web Empire[12]: JPA for RDF - Object Triples Mapper - 4Store, Sesame and Jena support - small annotation framework for tying Java beans to RDF - automatically generates Java interfaces for the classes described in an OWL ontology, based on domain and range constraints and cardinality restrictions - runtime implementation generation - IoC with Google Guice
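The idea of tying Java beans to RDF with annotations can be sketched with nothing but reflection. The @RdfProperty annotation, the Person bean and the toTriples method below are hypothetical and are not Empire's API; values are written as plain literals without any escaping.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class BeanToTriplesSketch {
    /** Hypothetical annotation (not Empire's API): ties a bean field to a property URI. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RdfProperty {
        String value();
    }

    /** Example bean, made up for illustration. */
    static final class Person {
        @RdfProperty("http://xmlns.com/foaf/0.1/name")
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    /** Emits one triple line per annotated, non-null field of the bean. */
    static List<String> toTriples(String subjectUri, Object bean) {
        List<String> lines = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            RdfProperty property = field.getAnnotation(RdfProperty.class);
            if (property == null) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(bean);
                if (value != null) {
                    lines.add("<" + subjectUri + "> <" + property.value() + "> \"" + value + "\" .");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + field.getName(), e);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Person me = new Person("Davide Palmisano");
        for (String line : toTriples("http://dpalmisano.myopenid.com/", me)) {
            System.out.println(line);
        }
    }
}
```

A real object-triples mapper also has to go the other way, from triples back to beans, and to deal with typed literals and references between objects.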
  57. 57. the server side of the triple: Java and the Semantic Web crawl the Web extract RDF from RDFa and Microformats with Any23 index the data with SIREn store the data on HBase in one word: Sindice.com
  58. 58. successes, failures and hopes Linked Data and RDFa seem to be the right ways to trigger the “network effect” for the adoption of Semantic Web technologies data.gov.uk
  59. 59. successes, failures and hopes Twine.com it was the first mainstream consumer application of the Semantic Web. raised nearly $24 million of venture capital over 2 rounds. gaining users rapidly - faster than Twitter did in its early years. Twine.com is going to be acquired by Evri.com
  60. 60. successes, failures and hopes Twine.com “I can truly say they present significant challenges both to developers and to end-users. These challenges all stem from one underlying problem: Data storage.” - Nova Spivack CEO
  61. 61. successes, failures and hopes GoodRelations: e-commerce on the Web of Data huge impact on traditional search engines ranking enabling cross-site product and offerings retrieval Google rich snippets
  62. 62. successes, failures and hopes GoodRelations: e-commerce on the Web of Data GoodRelations and RDFa could heavily affect traditional SEO techniques; this may provide really powerful traction for widespread usage of RDFa and semi-structured data on the Web
  63. 63. /me Technologist @ Fondazione Bruno Kessler Web of Data research Unit twitter.com/dpalmisano davidepalmisano.wordpress.com wed.fbk.eu
  64. 64. a bunch of references
        [1] http://www.cs.umd.edu/projects/plus/SHOE/
        [2] http://www.jfsowa.com/cg/
        [3] http://jena.sourceforge.net/
        [4] http://www.openrdf.org/
        [5] http://jrdf.sourceforge.net/
        [6] http://developers.any23.org/
        [7] http://alchemyapi.com
        [8] http://opencalais.com
        [9] http://incubator.apache.org/uima/
        [10] http://siren.sindice.com/
        [11] http://en.wikipedia.org/wiki/Triplestore/
        [12] http://clarkparsia.com/weblog/2010/02/03/empire-0-6/