Introduction to the Semantic Web

Introduction to the semantic web, solutions for Linux, and Apache tools presented by Stefane Fermigier and Olivier Grisel.

Published in: Education, Technology
  • 32 new customers in 9 months; 40 new customers in total
  • Introduction to the Semantic Web

    1. Introduction to the Semantic Web
        Stefane Fermigier, Olivier Grisel - Nuxeo
        Solutions Linux - Paris - May 2011
    2. Agenda
        • A pragmatic introduction to the Semantic Web
        • Experience report and demos from Nuxeo
        • Apache tools for Big Linked Data
    3. Part 1: Introduction to the Semantic Web
    4. Prelude
    5. Source: Mills Davis, “Semantic Social Computing”, Sept. 2007
    6. History
    7. Invented the web in 1989 (yeah!)
    8. Invented the web in 1989 (yeah!)
        Invented the semantic web in 1994 (duh?)
    9. Historical perspective
        • From web 1.0: web of sites and pages, aka the World Wide Web
        • To web 2.0: web of people and of participation, aka the Social Web (blogs, RSS, tags, Facebook, Wikipedia, etc.)
        • To web 3.0: web of data, of meaning and connected knowledge, aka the Semantic Web
    10. Semantics & Ontologies
    11. Some examples
        • FOAF: relationships between people (social network)
        • SIOC: relationships between websites, articles, blogs, comments
        • Rich Snippets: RDFa content syndicated for SEO by Google, Yahoo
        • GoodRelations: e-commerce (eBay...)
        • rNews: metadata for news agencies (AFP, Reuters...)
    12. How is it related to the Web?
    13. The traditional Web
        • A principle: hypertext
        • A protocol: HTTP
        • An identification scheme: URNs/URIs
        • A language: HTML
    14. “To a computer, then, the web is a flat, boring world devoid of meaning”
        Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
    15. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them”
        Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
    16. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”
        Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
    17. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
        Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
    18. The traditional Web
        • A principle: hypertext
        • A protocol: HTTP
        • An identification scheme: URNs/URIs
        • A language: HTML
    19. The semantic Web
        • A principle: hypertext
        • A protocol: HTTP
        • An identification scheme: URNs/URIs
        • A language: HTML RDF
    20. The W3C “Layer Cake”
    21. The W3C “Layer Cake”: already standardized
    22. URIs and the Web of Things
        • URIs (Uniform Resource Identifiers) are used to identify things (also called entities) in the real world
        • For instance: people, places, events, companies, products, movies, etc.
    23. The RDF model
        RDF is used to describe relationships between objects, identified by their URIs
        (diagram: Subject, Predicate, Object)
    24. Example
        Source: http://www.slideshare.net/AntidotNet/web-smantique-web-de-donnes-web-30-linked-data-quelques-repres-pour-sy-retrouver
    25. RDF serialization
        • As XML
        • Others, e.g. N3
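
The serialization examples on this slide were images and did not survive in the transcript. As a stand-in, here is a minimal sketch of the subject / predicate / object model in plain Python, written out in the line-based N-Triples syntax (a simpler relative of N3). The `example.org` URIs and the class names `URI`, `Literal` and `Triple` are assumptions made for illustration; only the FOAF namespace is a real vocabulary.

```python
from typing import NamedTuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


FOAF = "http://xmlns.com/foaf/0.1/"  # real FOAF namespace


def term_to_nt(term):
    """Render one term: URIs in angle brackets, literals in escaped quotes."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    escaped = (
        term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize triples as N-Triples: one 'subject predicate object .' per line."""
    return "\n".join(
        f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
        for t in triples
    )


# Hypothetical people, identified by made-up example.org URIs.
alice = URI("http://example.org/people/alice")
bob = URI("http://example.org/people/bob")

graph = [
    Triple(alice, URI(FOAF + "name"), Literal("Alice")),
    Triple(alice, URI(FOAF + "knows"), bob),
]

print(to_ntriples(graph))
```

In practice a library such as Apache Jena (Java) would handle parsing and serialization; the sketch only shows that an RDF graph is a set of three-part statements whose parts are URIs or literals.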
    26. SPARQL
        • Query language for RDF databases
        • Several implementations
            • OSS: Apache Jena, Sesame, 4Store, Virtuoso, Mulgara, Redland, Open Anzo...
            • Proprietary: 5Store, AllegroGraph RDFStore, Stardog, Dydra, OWLIM...
        • More expressive than SQL; scalability is still an open question
    27. SPARQL Sample
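
The sample query shown on this slide was an image and is not in the transcript. The sketch below is therefore not the authors' query: it is an illustrative SPARQL query embedded in Python, together with the HTTP GET request that the SPARQL protocol expects (the query goes in the `query` parameter). The choice of the public DBpedia endpoint and of the `dbo:City` / `dbo:country` terms is an assumption made for the example. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint; any SPARQL endpoint URL would do.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: ten cities in France with their English labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?city ?label WHERE {
  ?city a dbo:City ;
        dbo:country <http://dbpedia.org/resource/France> ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard JSON results format via content negotiation.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

To run the query for real, pass `request` to `urllib.request.urlopen` and decode the JSON body; the triple patterns inside `WHERE` are matched against the subject / predicate / object statements stored in the endpoint.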
    28. Where and how to find these data?
    29. Solution 1: “Lift”
        • One can use HTML scraping and natural language processing (NLP) techniques to extract semantic information from existing content / sites
        • Generic solutions: OpenCalais, Zemanta, Apache Stanbol
        • Pro: no need to change existing content
        • Con: error prone, needs human checks
    30. Example: DBpedia
    31. Solution 2: export
        • RDFa and microformats are used to embed semantic information (expressed using the RDF model) into regular web pages
        • RDFa does it using existing (rel) and additional (about, property, typeof) attributes
        • Microformats only use usual HTML attributes (class)
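
To make the RDFa attributes above concrete, here is a deliberately simplified sketch that reads `about`, `typeof`, `property` and `rel` from a small HTML fragment using only the Python standard library. It is not a conforming RDFa processor (it ignores prefix declarations, nesting rules and the `content` attribute), and the fragment, the `example.org` URIs and the `RDFaSketch` class name are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with RDFa attributes.
PAGE = """
<div about="http://example.org/people/alice" typeof="foaf:Person">
  <span property="foaf:name">Alice</span>
  <a rel="foaf:homepage" href="http://example.org/alice/">home page</a>
</div>
"""


class RDFaSketch(HTMLParser):
    """Collect (subject, predicate, object) tuples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None           # set by the most recent 'about'
        self.pending_property = None  # 'property' waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "typeof" in attrs:
            self.triples.append((self.subject, "rdf:type", attrs["typeof"]))
        if "rel" in attrs and "href" in attrs:
            # 'rel' links the current subject to another resource.
            self.triples.append((self.subject, attrs["rel"], attrs["href"]))
        self.pending_property = attrs.get("property")

    def handle_data(self, data):
        # The text content of a 'property' element becomes a literal value.
        if self.pending_property and data.strip():
            self.triples.append(
                (self.subject, self.pending_property, data.strip())
            )
            self.pending_property = None


def extract(html):
    parser = RDFaSketch()
    parser.feed(html)
    return parser.triples


for triple in extract(PAGE):
    print(triple)
```

The same page stays readable in a browser, while a crawler can recover the three statements about Alice from the attributes alone.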
    32. Solution 3: reuse
        • Linked Online Data: (usually large) data repositories available on the web (for free or not), expressed using the RDF model
        • Interoperability between these repositories (their ontologies) must be defined
    33. Linked Open Data in 2007
        “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
    34. 2008
        “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
    35. 2009
        “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
    36. 2010
        “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
    37. Good for Enterprise apps too!
        Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
    38. Why now?
    39. Key Enablers
        • Open Data and Linked Online Data
        • Advances in automatic content analysis (linguistics, image processing) and machine learning
        • Classical logic and classical AI
        • Computing power (Moore’s law + MapReduce)
    40. The technologies and data are available. Let’s put them to use!
    41. Part 2: Nuxeo & Semantic ECM
    42. Nuxeo: an open source ECM vendor
        • Our focus is Enterprise Content Management: ECM as a platform for content applications
        • Open source as an efficient development model
        • Modern architecture for 21st-century business: “Lean, mobile, social, interoperable”
        • A social marketplace in action: innovation driven by a community of customers, partners, and our core developers
    43. Nuxeo ECM - From Platform to Products (layered diagram)
        • Business Solutions: Construction, Media, Government, Life Sciences
        • Horizontal Packages, including Correspondence Management, Contracts Management, Records Management, Invoice Processing, Case Management Framework, Document Management and Digital Asset Management
        • Platform: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM
        • Content Infrastructure: Nuxeo Core, a lightweight, scalable, embeddable content repository
    44. Major Customers
    45. Goals for Semantic ECM
        • Repurpose existing content better
        • Improve search and collaboration
        • Make information more contextual
        • Extract and use information from content
        • Leverage Open and Linked Data, contribute
        • Make ECM users’ content smarter!
        • => Gain efficiency, effectiveness and strategic positioning on the ECM market
    46. Demo
    47. IKS project
        • European project under FP7, with 13 partners (6 SMEs) and an 8.5 MEUR budget
        • Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products
        • Started in Jan. 2009, will last until Dec. 2012
        • First tangible result: Apache Stanbol, already integrated in a Nuxeo plugin
    48. The Semantic Engine
        • From unstructured content to knowledge
        • Language guessing
        • Topic classification (Business, Sports, Media, ...)
        • Named entity extraction and linking
        • Relationship and property extraction
    49. (no text on slide)
    50. (no text on slide)
    51. (no text on slide)
    52. RESTful is Beautiful
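
The transcript does not record what the RESTful interface looked like, so the following is only a sketch of the general idea: plain text is POSTed to a Stanbol enhancer endpoint and the extracted annotations come back as RDF. The URL `http://localhost:8080/enhancer` is an assumption about a locally running instance (the endpoint path has varied between Stanbol versions), and the helper name `build_enhance_request` is hypothetical. The request is built but not sent.

```python
import urllib.request

# Assumption: a Stanbol instance running locally; adjust host, port and path
# to match your deployment.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhance_request(text, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain text to analyze."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # a request body makes this a POST
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Ask for the annotations as RDF/XML via content negotiation.
            "Accept": "application/rdf+xml",
        },
    )


request = build_enhance_request(
    "Tim Berners-Lee invented the web at CERN in 1989."
)
print(request.get_method(), request.full_url)
```

Sending it with `urllib.request.urlopen(request)` against a live server would return an RDF document describing the detected language, entities and links; a client such as a Nuxeo addon can then store those statements alongside the document.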
    53. = Semantic Engines (Apache OpenNLP) + Fast Linked Data local index (Apache Solr) + Semantic Rule Engine (Apache Jena)
    54. Architecture diagram: Nuxeo DM addon, Apache Stanbol (Engine 1, Engine 2, Engine 3), DBpedia, Freebase, Geonames, LDAP; local IT infrastructure (LAN)
    55. Part 3: Apache tools for processing Big and/or Linked Data
    56. Training statistical models for NER with Wikipedia and DBpedia
        • Extract sentences with link positions in Wikipedia articles
        • DBpedia to find the type of the target entity (Person, Location, Organization)
        • Apache Pig scripts to compute the join + format the result as training files for OpenNLP
        • Apache OpenNLP to build and evaluate the models
        • Apache Hadoop for distributed processing
        • Apache Whirr for deployment and management on Amazon EC2 cluster
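
The join-and-format step above can be sketched on a single sentence. The talk did this with Apache Pig scripts over a Hadoop cluster; the Python below is a small in-memory stand-in, not the authors' code. It takes a tokenized sentence, the token spans of its wiki links, and a link-target-to-type mapping of the kind DBpedia provides, and emits one line in the `<START:type> ... <END>` format that the OpenNLP name finder trainer reads. The function name and the sample data are assumptions.

```python
def to_opennlp_line(tokens, links, entity_types):
    """Format one sentence as an OpenNLP name finder training line.

    tokens: list of token strings
    links: list of (start, end, target) token spans, end exclusive
    entity_types: mapping from link target to entity type
    """
    starts = {}
    ends = set()
    for start, end, target in links:
        entity_type = entity_types.get(target)
        if entity_type is None:
            continue  # link target has no known type: leave the text plain
        starts[start] = entity_type
        ends.add(end)

    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:  # an entity that runs to the end of the sentence
        out.append("<END>")
    return " ".join(out)


# Hypothetical sample: a sentence with two wiki links and their types.
tokens = ["Tim", "Berners-Lee", "was", "born", "in", "London", "."]
links = [(0, 2, "Tim_Berners-Lee"), (5, 6, "London")]
entity_types = {"Tim_Berners-Lee": "person", "London": "location"}

print(to_opennlp_line(tokens, links, entity_types))
```

Run over every linked sentence in Wikipedia, this produces a large annotated corpus without manual labeling, which is why the join is worth distributing across a cluster.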
    57. (no text on slide)
    58. (no text on slide)
    59. (no text on slide)
    60. (no text on slide)
    61. Training statistical models for topic classification from Wikipedia and DBpedia
        • Filter category tree from DBpedia SKOS entries (~500k)
        • Pig scripts to compute the joins with article abstracts for all the articles categorized in Wikipedia
        • Export as a 2.8 GB TSV file to be indexed in Apache Solr
        • Use Solr MoreLikeThisHandler to find the top 5 most related Wikipedia categories for any kind of text
        • Apache Whirr & Hadoop for deployment and management on Amazon EC2 cluster
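
The lookup in the fourth bullet can be sketched as a request to Solr's MoreLikeThisHandler, which accepts free text through the `stream.body` parameter and compares it against the fields named in `mlt.fl`. The handler path `/solr/mlt`, the field name `text` and the helper name `build_mlt_url` are assumptions about how an index might be configured, not details given in the talk. The URL is built but not fetched.

```python
import urllib.parse

# Assumption: a local Solr with a MoreLikeThisHandler registered at /mlt.
SOLR_MLT_URL = "http://localhost:8983/solr/mlt"


def build_mlt_url(text, base_url=SOLR_MLT_URL, field="text", top_n=5):
    """Build a MoreLikeThis query URL for a piece of free text."""
    params = {
        "stream.body": text,  # the text to classify, sent inline
        "mlt.fl": field,      # hypothetical indexed field to compare against
        "rows": top_n,        # keep only the top matches
        "wt": "json",         # ask for a JSON response
    }
    return base_url + "?" + urllib.parse.urlencode(params)


url = build_mlt_url("The central bank raised interest rates again.")
print(url)
```

Each returned document would be one Wikipedia category whose aggregated abstracts resemble the input, so the top five hits act as topic suggestions. For long inputs a POST body is a better fit than a query string.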
    62. What’s next?
        • Integrate the R&D results into Stanbol / Nuxeo
        • Work on user interface / high-level JavaScript toolkits for Linked Data editing
            • http://github.com/bergie/VIE based on backbone.js
        • Experiment / Integrate / Refine
    63. Resources
        • http://iks-project.eu
        • http://stanbol.demo.nuxeo.com
        • http://incubator.apache.org/stanbol
        • http://blogs.nuxeo.com/dev
        • http://hadoop.apache.org/
        • http://incubator.apache.org/opennlp/
        • http://github.com/ogrisel/pignlproc
