From the Semantic Web to the Web of Data: ten years of linking up

Presentation given at Lugano Java User Group. 30 March 2010

From the Semantic Web to the Web of Data: ten years of linking up Presentation Transcript

  • 1. from the Semantic Web to the Web of Data ten years of linking up Lugano 30-03-2010 Davide Palmisano - Fondazione Bruno Kessler
  • 2. a short ToC: story of a buzzword; concepts and ideas behind it; Linked Data: four rules, billions of opportunities; the server side of the triple: Java and the Semantic Web; successes, failures and hopes
  • 3. story of a buzzword “To a computer, the Web is a flat, boring world devoid of meaning.” “A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities” “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, ”
  • 7. story of a buzzword “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”
  • 8. story of a buzzword: typed objects and relationships; machine-readable content; metadata with shared semantics; the Web as a giant, global, decentralized database
  • 9. concepts and ideas behind it
  • 10. concepts and ideas behind it How to represent knowledge?
  • 11. concepts and ideas behind it How to represent knowledge? The world’s academic communities have dealt with knowledge representation for years; artificial intelligence, natural language processing, model management and many other research fields have contributed substantially; some ancestors paved the way
  • 12. concepts and ideas behind it SHOE[1] “SHOE is an extension to HTML which allows authors to annotate their web pages with machine-readable knowledge” <USE-ONTOLOGY ID="cs-dept-ontology" VERSION="1.0" PREFIX="cs" URL= "http://www.cs.umd.edu/projects/plus/SHOE/cs.html"> <CATEGORY NAME="cs.Professor" FOR="http://www.cs.umd.edu/users/hendler/"> <RELATION NAME="cs.member">     <ARG POS=1 VALUE="http://www.cs.umd.edu/projects/plus/">     <ARG POS=2 VALUE="http://www.cs.umd.edu/users/hendler/"> </RELATION> <RELATION NAME="cs.name">    <ARG POS=2 VALUE="Dr. James Hendler"> </RELATION>
  • 13. concepts and ideas behind it John Sowa’s Conceptual Graphs [2] (...) they express meaning in a form that is logically precise, humanly readable, and computationally tractable (...) BOY AGNT WALK “boy walking”
  • 14. concepts and ideas behind it adapting such approaches to an unpredictable, decentralized, potentially incoherent environment such as the Web has been the goal of a standardization effort mainly led by the W3C
  • 15. concepts and ideas behind it Resource Description Framework (RDF): cornerstone of the Semantic Web technology stack; first published in 1999; directed, labeled graphs as data model
  • 16. concepts and ideas behind it everything is uniquely identifiable with a Uniform Resource Identifier: a web page, a person, a book, an intangible thing http://dpalmisano.myopenid.com http://dbpedia.org/resource/Lugano http://dbtune.org/myspace/coldplay
  • 17. concepts and ideas behind it relationships between things can be expressed with a directed, labeled graph where nodes are resources or XML Schema-typed values and relationships are also identified by URIs
  • 18. concepts and ideas behind it http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/
  • 19. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ it’s an RDF triple
  • 20. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ http://www.geonames.org/ontology#name Trento
  • 21. concepts and ideas behind it http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/ http://www.geonames.org/ontology#population 104946 http://www.geonames.org/ontology#name Trento
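The small graph built up over the last few slides can be sketched in plain Java, without any RDF library. This is only an illustration under assumptions: the `Triple` record and the `objectsOf` helper are hypothetical names invented here, and literals are kept as plain strings rather than typed values.

```java
import java.util.List;
import java.util.stream.Collectors;

class TripleGraphSketch {
    // Hypothetical minimal triple type; not part of any RDF library.
    record Triple(String subject, String predicate, String object) {}

    static final String ME = "http://dpalmisano.myopenid.com";
    static final String TRENTO = "http://sws.geonames.org/3165243/";
    static final String BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near";
    static final String NAME = "http://www.geonames.org/ontology#name";
    static final String POPULATION = "http://www.geonames.org/ontology#population";

    // The three statements shown on the slides.
    static List<Triple> graph() {
        return List.of(
            new Triple(ME, BASED_NEAR, TRENTO),
            new Triple(TRENTO, NAME, "Trento"),
            new Triple(TRENTO, POPULATION, "104946"));
    }

    // Follow one labeled edge: every object reachable from subject via predicate.
    static List<String> objectsOf(List<Triple> graph, String subject, String predicate) {
        return graph.stream()
            .filter(t -> t.subject().equals(subject) && t.predicate().equals(predicate))
            .map(Triple::object)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Triple> g = graph();
        // Walk two edges: person -> place -> name of the place.
        for (String place : objectsOf(g, ME, BASED_NEAR)) {
            System.out.println(place + " is named " + objectsOf(g, place, NAME));
        }
    }
}
```

Because both nodes and edge labels are URIs, following an edge is just a lookup on two strings; a real RDF API adds typed literals, blank nodes and indexing on top of the same idea.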
  • 22. concepts and ideas behind it http://xmlns.com/foaf/0.1/based_near http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/ XML serialization <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/"> <rdf:Description rdf:about="http://dpalmisano.myopenid.com/"> <foaf:based_near rdf:resource="http://sws.geonames.org/3165243/"/> </rdf:Description> </rdf:RDF>
  • 23. concepts and ideas behind it http://xmlns.com/foaf/0.1/based_near http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/ Turtle serialization @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . <http://dpalmisano.myopenid.com/> foaf:based_near <http://sws.geonames.org/3165243/> .
  • 24. concepts and ideas behind it http://xmlns.com/foaf/0.1/based_near http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/ N3 serialization <http://dpalmisano.myopenid.com/> <http://xmlns.com/foaf/0.1/based_near> <http://sws.geonames.org/3165243/> .
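The one-statement-per-line form on this slide is simple enough to produce by hand. The following is a minimal sketch, assuming only absolute URIs and plain string literals; the helper names `uri`, `literal` and `statement` are made up for this example, and the escaping covers just the most common characters rather than the full syntax.

```java
class NTriplesSketch {
    // Wrap an absolute URI in angle brackets, as the line-based syntax requires.
    static String uri(String value) {
        return "<" + value + ">";
    }

    // Quote a plain literal, escaping backslashes, double quotes and line breaks.
    static String literal(String value) {
        String escaped = value
            .replace("\\", "\\\\")
            .replace("\"", "\\\"")
            .replace("\n", "\\n")
            .replace("\r", "\\r");
        return "\"" + escaped + "\"";
    }

    // One statement per line: subject, predicate, object, then a terminating dot.
    static String statement(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    public static void main(String[] args) {
        System.out.println(statement(
            uri("http://dpalmisano.myopenid.com/"),
            uri("http://xmlns.com/foaf/0.1/based_near"),
            uri("http://sws.geonames.org/3165243/")));
        System.out.println(statement(
            uri("http://sws.geonames.org/3165243/"),
            uri("http://www.geonames.org/ontology#name"),
            literal("Trento")));
    }
}
```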
  • 25. concepts and ideas behind it http://xmlns.com/foaf/0.1/based_near http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/ JSON serialization { "http://dpalmisano.myopenid.com" : { "http://xmlns.com/foaf/0.1/based_near": [ { "type" : "uri" , "value" : "http://sws.geonames.org/3165243/" } ] } }
  • 26. concepts and ideas behind it http://xmlns.com/foaf/0.1/based_near http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/ this triple represents a relationship between two resources, but how can we represent the meaning of that relationship? by defining vocabularies and ontologies: RDFSchema and OWL
  • 27. concepts and ideas behind it a “Hello World” RDFSchema vocabulary rdf:type http://helloworld.com/ontology/Person http://helloworld.com/ontology/father rdf:type rdf:type rdf:type rdfs:Class rdf:Property
  • 28. concepts and ideas behind it RDFSchema entailment: inferring new statements http://helloworld.com/ontology/Person http://helloworld.com/resource/Michele rdf:type http://helloworld.com/ontology/father http://helloworld.com/resource/Davide
  • 29. concepts and ideas behind it RDFSchema entailment: inferring new statements http://helloworld.com/ontology/Person rdf:type http://helloworld.com/resource/Michele rdf:type http://helloworld.com/ontology/father http://helloworld.com/resource/Davide
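How such an inference could work can be sketched in a few lines of plain Java. The transcript does not show which schema statement drives the entailment or the direction of the `father` edge, so this sketch assumes that `father` declares `rdfs:domain` and `rdfs:range` of `Person` and that the edge goes from Michele to Davide. The `Triple` record, the `hw:` and `res:` abbreviations for the helloworld.com URIs, and the single-pass `infer` method are all illustrative, not a complete RDFS reasoner.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class RdfsEntailmentSketch {
    // Hypothetical minimal triple type with abbreviated names instead of full URIs.
    record Triple(String s, String p, String o) {}

    static Set<Triple> sample() {
        Set<Triple> g = new LinkedHashSet<>();
        g.add(new Triple("hw:father", "rdfs:domain", "hw:Person")); // assumed schema statement
        g.add(new Triple("hw:father", "rdfs:range", "hw:Person"));  // assumed schema statement
        g.add(new Triple("res:Michele", "rdf:type", "hw:Person"));
        g.add(new Triple("res:Michele", "hw:father", "res:Davide")); // assumed edge direction
        return g;
    }

    // Apply the domain and range rules once and return only the new statements.
    static Set<Triple> infer(Set<Triple> asserted) {
        Set<Triple> inferred = new LinkedHashSet<>();
        for (Triple schema : asserted) {
            boolean isDomain = schema.p().equals("rdfs:domain");
            boolean isRange = schema.p().equals("rdfs:range");
            if (!isDomain && !isRange) {
                continue;
            }
            for (Triple t : asserted) {
                if (!t.p().equals(schema.s())) {
                    continue; // statement does not use the constrained property
                }
                // Domain types the subject, range types the object.
                String typed = isDomain ? t.s() : t.o();
                Triple candidate = new Triple(typed, "rdf:type", schema.o());
                if (!asserted.contains(candidate)) {
                    inferred.add(candidate);
                }
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        infer(sample()).forEach(System.out::println);
    }
}
```

Under these assumptions the only new statement is that Davide is a `Person`, since Michele's type was already asserted.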
  • 30. concepts and ideas behind it OWL allows further axioms to be specified: property cardinality restrictions, class disjointness, property transitivity, cardinality constraints; but beware: more expressivity means more reasoning complexity; interested in these topics? give [3] a try
  • 31. concepts and ideas behind it describe everything... and more...
  • 32. concepts and ideas behind it RDFa: Bridging the traditional Web with the Semantic Web <div rel="dc:creator"> <span typeof="foaf:Person" about="http://foafbuilder.qdos.com/people/dpalmisano.myopenid.com/foaf.rdf#me"> <a property="foaf:name" rel="foaf:homepage" href="http://dpalmisano.myopenid.com/">Davide Palmisano</a> <a rel="foaf:workplaceHomepage" href="http://www.fbk.eu">Fondazione Bruno Kessler</a> </span> </div>
  • 33. concepts and ideas behind it SPARQL: querying the Semantic Web; SPARQL Protocol and RDF Query Language; based on graph pattern matching; four query forms: SELECT, DESCRIBE, ASK and CONSTRUCT
  • 34. concepts and ideas behind it SPARQL: querying the Semantic Web SELECT ?person WHERE { ?person a foaf:Person. ?person ex:age ?age. FILTER(?age > 18) }
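What "graph pattern matching" means for this particular query can be shown with a hand-written matcher over an in-memory list of triples. This is a sketch for this one query only, not a SPARQL engine: the `Triple` record, the sample data and the `adults` method are invented for illustration, and ages are assumed to be stored as integer strings.

```java
import java.util.List;
import java.util.stream.Collectors;

class PatternMatchSketch {
    // Hypothetical minimal triple type with abbreviated names instead of full URIs.
    record Triple(String s, String p, String o) {}

    static List<Triple> sample() {
        return List.of(
            new Triple("ex:alice", "rdf:type", "foaf:Person"),
            new Triple("ex:alice", "ex:age", "30"),
            new Triple("ex:bob", "rdf:type", "foaf:Person"),
            new Triple("ex:bob", "ex:age", "12"),
            new Triple("ex:acme", "rdf:type", "foaf:Organization"),
            new Triple("ex:acme", "ex:age", "50"));
    }

    static List<String> adults(List<Triple> data) {
        return data.stream()
            // Pattern 1: ?person a foaf:Person
            .filter(t -> t.p().equals("rdf:type") && t.o().equals("foaf:Person"))
            .map(Triple::s)
            // Pattern 2 joined on ?person: ?person ex:age ?age, with FILTER(?age > 18)
            .filter(person -> data.stream().anyMatch(t ->
                t.s().equals(person)
                    && t.p().equals("ex:age")
                    && Integer.parseInt(t.o()) > 18))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(adults(sample()));
    }
}
```

Each triple pattern becomes a filter over the data, the shared variable `?person` becomes a join condition, and the `FILTER` becomes an ordinary predicate on the bound value.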
  • 35. concepts and ideas behind it SPARQL: querying the Semantic Web “At which universities did the founders of successful IT companies study?” and order them by frequency...
  • 36. concepts and ideas behind it SELECT ?almaMater (COUNT(?almaMater) AS ?frequency) WHERE { { {?company a dbpedia-owl:Company} UNION {?company a yago:InternetCompaniesOfTheUnitedStates} UNION {?company a yago:CompaniesBasedInSiliconValley} UNION {?company a yago:CompaniesListedOnNASDAQ} } ?company dbpedia-owl:numberOfEmployees ?numberOfEmpl . FILTER (?numberOfEmpl > 0) OPTIONAL { ?company dbpedia-owl:keyPerson ?keyPerson } ?keyPerson dbpprop:almaMater ?almaMater . } GROUP BY ?almaMater ORDER BY DESC(?frequency)
  • 37. Linked Data: four rules, billions of opportunities 1. Use URIs to identify things. 2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents. 3. Provide useful information (i.e., a structured description - metadata) about the thing when its URI is dereferenced. 4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
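Rules 2 and 3 can be tried out with the HTTP client that ships with Java 11 and later. The sketch below asks for an RDF description of a thing by sending an `Accept` header; the class and method names are made up for this example, and whether a given server honors the header or redirects to a data document is an assumption that has to be checked per data source.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DereferenceSketch {
    // Build a GET request that asks for RDF/XML instead of an HTML page.
    static HttpRequest rdfRequest(String thingUri) {
        return HttpRequest.newBuilder(URI.create(thingUri))
            .header("Accept", "application/rdf+xml")
            .GET()
            .build();
    }

    // Look the URI up, following redirects from the thing to its description.
    static String dereference(String thingUri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpResponse<String> response =
            client.send(rdfRequest(thingUri), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Requires network access; the URI is one of the examples used earlier in the talk.
        System.out.println(dereference("http://dbpedia.org/resource/Lugano"));
    }
}
```

The returned description typically contains further HTTP URIs, which is what rule 4 relies on: each of them can be dereferenced in the same way.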
  • 38. Linked Data: four rules, billions of opportunities DBpedia: Wikipedia as a database extract such structured info and represent it with RDF
  • 39. Linked Data: four rules, billions of opportunities let’s do it also for: Internet Movie Database, BBC /programmes, CiteSeer, GeoNames, MusicBrainz, CIA Factbook; and for all imaginable data-intensive traditional Web sites...
  • 40. Linked Data: four rules, billions of opportunities
  • 41. the server side of the triple: Java and the Semantic Web
  • 42. the server side of the triple: Java and the Semantic Web RDF is the model; SPARQL is the query language; RDFa is our Trojan horse; Linked Data is the paradigm; how does it fit with Java?
  • 43. the server side of the triple: Java and the Semantic Web general-purpose open source Semantic Web libraries: Jena[3] - The Semantic Web Java framework - an RDF API - parsing and writing RDF in RDF/XML, N3 and N-Triples - an OWL API - in-memory storage and persistence layer - SPARQL query engine - Schemagen: Java classes from an RDFSchema vocabulary
  • 44. the server side of the triple: Java and the Semantic Web Jena: creating a model // URI declarations String familyUri = "http://family/"; String relationshipUri = "http://purl.org/vocab/relationship/"; // Create an empty Model Model model = ModelFactory.createDefaultModel(); // Create a Resource for each family member, identified by their URI Resource adam = model.createResource(familyUri+"adam"); Resource beth = model.createResource(familyUri+"beth"); // Create properties for the different types of relationship to represent Property siblingOf = model.createProperty(relationshipUri,"siblingOf"); // Add properties to adam describing relationships to other family members adam.addProperty(siblingOf,beth);
  • 45. the server side of the triple: Java and the Semantic Web Jena: querying the model // Create a new query passing a String containing the RDQL to execute Query query = new Query(queryString); // Set the model to run the query against query.setSource(model); // Use the query to create a query engine QueryEngine qe = new QueryEngine(query); // Use the query engine to execute the query QueryResults results = qe.exec(); while (results.hasNext()) { ResultBinding binding = (ResultBinding)results.next(); RDFNode definition = (RDFNode) binding.get("definition"); System.out.println(definition.toString()); Resource concept = (Resource)binding.get("concept"); List wordforms = concept.listObjectsOfProperty(wordForm); }
  • 46. the server side of the triple: Java and the Semantic Web other valuable alternatives Sesame[4] - a generic open source Java framework for storage and querying of RDF data - easy, elegant and well documented jRDF[5] - an RDF library for Java - notable for IoC support (Spring 2)
  • 47. the server side of the triple: Java and the Semantic Web getting RDF data Any23[6] - Anything to Triples - a library - a Web service - a CLI - extracts RDF from various sources: - Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License and XFN - RDF/XML, Turtle and Notation3 - RDF/XML, N3, Turtle and content-negotiated serialization supported
  • 48. the server side of the triple: Java and the Semantic Web Any23: rdf extraction /*1*/ Any23 runner = new Any23(); /*2*/ runner.setHTTPUserAgent("test-user-agent"); /*3*/ HTTPClient httpClient = runner.getHTTPClient(); /*4*/ DocumentSource source = new HTTPDocumentSource(          httpClient,          "http://www.rentalinrome.com/semanticloft/semanticloft.htm"       ); /*5*/ ByteArrayOutputStream out = new ByteArrayOutputStream(); /*6*/ TripleHandler handler = new NTriplesWriter(out); /*7*/ runner.extract(source, handler); /*8*/ String n3 = out.toString("UTF-8");
  • 49. the server side of the triple: Java and the Semantic Web Any23 deals with documents that already contain some RDF metadata; extracting the semantics from free text and disambiguating terms with links to a Linked Data cloud is another story; a plethora of different services: - AlchemyAPI[7] - OpenCalais[8]
  • 50. the server side of the triple: Java and the Semantic Web The world's largest maker of solar inverters announced Monday that it will locate its first North American manufacturing plant in Denver. "We see a huge market coming in the U.S.," said Pierre-Pascal Urbon, the company's chief financial officer. The company, based in Kassel, north of Frankfurt, Germany, boasts growing sales of about $1.2 billion a year.
  • 51. the server side of the triple: Java and the Semantic Web The world's largest maker of solar inverters announced Monday that it will locate its first North American manufacturing plant in Denver. "We see a huge market coming in the U.S.," said Pierre-Pascal Urbon, the company's chief financial officer. The company, based in Kassel, north of Frankfurt, Germany, boasts growing sales of about $1.2 billion a year. http://dbpedia.org/resource/Frankfurt http://dbpedia.org/resource/Denver http://dbpedia.org/resource/Kassel
  • 52. the server side of the triple: Java and the Semantic Web exposed as HTTP Web services, they provide responses in XML, RDF/XML, RDFa or JSON; Apache UIMA comes with two annotators for AlchemyAPI and OpenCalais[9]
  • 53. the server side of the triple: Java and the Semantic Web indexing RDF data SIREn[10]: Efficient semi-structured Information Retrieval for Lucene - a plugin for Lucene - extends the Lucene query model - semi-structured search - structure aware full-text search - ranked semi-structured search: most relevant results returned first - sub-linear average response time - flexible semi-structured indexing
  • 54. the server side of the triple: Java and the Semantic Web storing RDF data: commonly known as “triple-stores”[11]: “let me insert triples and run SPARQL queries over them” - OpenLink Virtuoso - 4Store - Redland - Jena or Sesame over an RDBMS
  • 55. the server side of the triple: Java and the Semantic Web JDBC and Virtuoso boolean more = stmt.execute("sparql select * from <gr> where { ?x ?y ?z }"); ResultSetMetaData data = stmt.getResultSet().getMetaData(); while (more) { ResultSet rs = stmt.getResultSet(); while (rs.next()) { ... } more = stmt.getMoreResults(); }
  • 56. the server side of the triple: Java and the Semantic Web Empire[12]: JPA for RDF - Object Triples Mapper - 4Store, Sesame and Jena support - small annotation framework for tying Java beans to RDF - automatically generates Java interfaces for classes described in an OWL ontology, based on domain and range constraints and cardinality restrictions - runtime implementation generation - IoC with Google Guice
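The general idea of tying Java beans to RDF with annotations can be sketched with plain reflection. This is not Empire's API: the `@RdfProperty` annotation, the `Person` bean and the `toStatements` method are hypothetical names invented for this example, all values are written as unescaped plain literals, and the subject URI is a placeholder.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class BeanToTriplesSketch {
    // Hypothetical annotation linking a bean field to a property URI.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RdfProperty {
        String value();
    }

    // Hypothetical bean; only annotated fields are mapped to statements.
    static class Person {
        @RdfProperty("http://xmlns.com/foaf/0.1/name")
        final String name;
        @RdfProperty("http://xmlns.com/foaf/0.1/nick")
        final String nick;
        final String internalNote = "not mapped"; // no annotation, so it is skipped

        Person(String name, String nick) {
            this.name = name;
            this.nick = nick;
        }
    }

    // Turn every annotated, non-null field into one line-based statement.
    static List<String> toStatements(String subjectUri, Object bean) {
        List<String> statements = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            RdfProperty mapping = field.getAnnotation(RdfProperty.class);
            if (mapping == null) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(bean);
                if (value != null) {
                    statements.add("<" + subjectUri + "> <" + mapping.value()
                        + "> \"" + value + "\" .");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        Person p = new Person("Davide Palmisano", "dpalmisano");
        // Placeholder subject URI, chosen only for this example.
        toStatements("http://example.org/people/davide", p).forEach(System.out::println);
    }
}
```

A full mapper would also go the other way, populating beans from query results, and would distinguish literal values from links to other resources.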
  • 57. the server side of the triple: Java and the Semantic Web crawl the Web; extract RDF from RDFa and Microformats with Any23; index the data with SIREn; store the data on HBase; in one word: Sindice.com
  • 58. successes, failures and hopes Linked Data and RDFa seem to be the right ways to trigger the “network effect” in the adoption of Semantic Web technologies data.gov.uk
  • 59. successes, failures and hopes Twine.com was the first mainstream consumer application of the Semantic Web; raised nearly $24M of venture capital over 2 rounds; gained users rapidly - faster than Twitter did in its early years; Twine.com is going to be acquired by Evri.com
  • 60. successes, failures and hopes Twine.com “I can truly say they present significant challenges both to developers and to end-users. These challenges all stem from one underlying problem: Data storage.” - Nova Spivack CEO
  • 61. successes, failures and hopes GoodRelations: e-commerce on the Web of Data; huge impact on traditional search engine ranking; enabling cross-site product and offering retrieval; Google rich snippets
  • 62. successes, failures and hopes GoodRelations: e-commerce on the Web of Data GoodRelations and RDFa could heavily affect traditional SEO techniques; this may provide powerful traction for widespread use of RDFa and semi-structured data on the Web
  • 63. /me Technologist @ Fondazione Bruno Kessler Web of Data research Unit twitter.com/dpalmisano davidepalmisano.wordpress.com wed.fbk.eu
  • 64. a bunch of references [1] http://www.cs.umd.edu/projects/plus/SHOE/ [2] http://www.jfsowa.com/cg/ [3] http://jena.sourceforge.net/ [4] http://www.openrdf.org/ [5] http://jrdf.sourceforge.net/ [6] http://developers.any23.org/ [7] http://alchemyapi.com [8] http://opencalais.com [9] http://incubator.apache.org/uima/ [10] http://siren.sindice.com/ [11] http://en.wikipedia.org/wiki/Triplestore/ [12] http://clarkparsia.com/weblog/2010/02/03/empire-0-6/