Introduction to the Semantic Web
 


Introduction to the semantic web, solutions for Linux, and Apache tools presented by Stefane Fermigier and Olivier Grisel.

  • 32 new customers in 9 months; 40 new customers in total

Presentation Transcript

  • Introduction to the Semantic Web. Stefane Fermigier, Olivier Grisel (Nuxeo). Solutions Linux, Paris, May 2011
  • Agenda
    • A pragmatic introduction to the Semantic Web
    • Experience report and demos from Nuxeo
    • Apache tools for Big Linked Data
  • 1. Introduction to the Semantic Web
  • Prelude
  • Source: Mills Davis, “Semantic Social Computing”, sept. 2007
  • History
  • Invented the web in 1989 (yeah!). Invented the semantic web in 1994 (duh?)
  • Historical perspective
    • From web 1.0: web of sites and pages, aka the World Wide Web
    • To web 2.0: web of people and of participation, aka the Social Web (blogs, RSS, tags, Facebook, Wikipedia, etc.)
    • To web 3.0: web of data, of meaning and connected knowledge, aka the Semantic Web
  • Semantics & Ontologies
  • Some examples
    • FOAF: relationships between people (social network)
    • SIOC: relationships between websites, articles, blogs, comments
    • Rich Snippets: RDFa content syndicated for SEO by Google, Yahoo
      • GoodRelations: e-commerce (eBay...)
      • rNews: metadata for news agencies (AFP, Reuters...)
  • How is it related to the Web?
  • The traditional Web
    • A principle: hypertext
    • A protocol: HTTP
    • An identification scheme: URNs/URIs
    • A language: HTML
  • “To a computer, then, the web is a flat, boring world devoid of meaning.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • The semantic Web
    • A principle: hypertext
    • A protocol: HTTP
    • An identification scheme: URNs/URIs
    • A language: RDF (replacing HTML)
  • The W3C “Layer Cake”
  • The W3C “Layer Cake” (diagram annotation: “Already standardized”)
  • URIs and the Web of Things
    • URIs (Uniform Resource Identifiers) are used to identify things (also called entities) in the real world
    • For instance: people, places, events, companies, products, movies, etc.
  • The RDF model: RDF is used to describe relationships between objects, identified by their URIs, as triples of the form Subject, Predicate, Object
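The triple model can be sketched in a few lines of plain Python. This is only an illustration, not something shown on the slides: the `example.org` URIs are hypothetical, and the only real vocabulary used is the FOAF namespace.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs, obj is a URI or a literal."""
    subject: str
    predicate: str
    obj: str


FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/people/"  # hypothetical namespace, for illustration only

# A graph is simply a set of triples.
graph = {
    Triple(EX + "alice", FOAF + "name", "Alice"),
    Triple(EX + "alice", FOAF + "knows", EX + "bob"),
    Triple(EX + "bob", FOAF + "name", "Bob"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`, sorted."""
    return sorted(t.obj for t in graph
                  if t.subject == subject and t.predicate == predicate)


print(objects(graph, EX + "alice", FOAF + "knows"))
```

Because every node is a URI, two graphs published by different parties can be merged by taking the union of their triple sets.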
  • Example. Source: http://www.slideshare.net/AntidotNet/web-smantique-web-de-donnes-web-30-linked-data-quelques-repres-pour-sy-retrouver
  • RDF serialization: as XML (RDF/XML); other syntaxes exist, e.g. N3
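Since the serialization examples from the slide did not survive in the transcript, here is a hedged sketch of what a serializer does. It writes triples as N-Triples, the simplest line-based syntax in the N3/Turtle family, using only the standard library. The rule "anything starting with http:// or https:// is a URI, everything else is a plain literal" is a simplifying assumption, and the `example.org` URI is hypothetical.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) string tuples as N-Triples lines."""

    def term(value):
        # Simplifying assumption: absolute HTTP(S) URIs are resources,
        # any other string is a plain literal.
        if value.startswith(("http://", "https://")):
            return f"<{value}>"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


example = [
    ("http://example.org/people/alice",   # hypothetical subject URI
     "http://xmlns.com/foaf/0.1/name",
     "Alice"),
]
print(to_ntriples(example))
```

The same triple could equally be written as RDF/XML; the model stays the same and only the surface syntax changes.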
  • SPARQL
    • Query language for RDF databases
    • Several implementations
      • OSS: Apache Jena, Sesame, 4Store, Virtuoso, Mulgara, Redland, Open Anzo...
      • Proprietary: 5Store, AllegroGraph RDFStore, Stardog, Dydra, OWLIM...
    • More expressive than SQL; scalability is still an open question
  • SPARQL Sample
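The sample query from the slide is not in the transcript. As a stand-in, the sketch below pairs an illustrative SPARQL query (held in a string, not executed) with a toy Python matcher that evaluates the same two triple patterns over an in-memory list. The data, the `ex:` prefix and the matcher are all assumptions made for illustration. A real SPARQL engine works very differently (indexes, query planning), so this only shows what "matching a graph pattern" means.

```python
# Illustrative query: names of people known by someone. Not taken from the slides.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:knows ?friend .
  ?friend foaf:name  ?name .
}
"""


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Hypothetical data using prefixed names for brevity.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:name", "Alice"),
]
patterns = [("?person", "foaf:knows", "?friend"), ("?friend", "foaf:name", "?name")]
names = [b["?name"] for b in match(patterns, triples)]
print(names)
```

The shared variable `?friend` is what joins the two patterns, which is the graph equivalent of a SQL join.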
  • Where and how to find these data?
  • Solution 1: “Lift”
    • One can use HTML scraping and natural language processing (NLP) techniques to extract semantic information from existing content / sites
    • Generic solutions: OpenCalais, Zemanta, Apache Stanbol
    • Pro: no need to change existing content
    • Con: error prone, needs human checks
  • Example: DBpedia
  • Solution 2: export
    • RDFa and microformats are used to embed semantic information (expressed using the RDF model) into regular web pages
    • RDFa does it using existing (rel) and additional (about, property, typeof) attributes
    • Microformats only use usual HTML attributes (class)
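To make the RDFa attributes concrete, the sketch below pulls (subject, property, text) statements out of a small HTML fragment with the standard-library `HTMLParser`. It is far from a conforming RDFa parser: it ignores `rel`, `typeof`, prefixes and datatypes, and it assumes well-formed markup in which every opened element is closed. The HTML fragment and its `example.org` URI are hypothetical.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (subject, property, text) from `about` / `property` attributes."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # one (subject, property, text_parts) frame per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # `about` sets the subject; otherwise inherit it from the parent element.
        inherited = self._stack[-1][0] if self._stack else None
        subject = attrs.get("about", inherited)
        self._stack.append((subject, attrs.get("property"), []))

    def handle_data(self, data):
        # Text belongs to every open element that declares a property.
        for _, prop, parts in self._stack:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        subject, prop, parts = self._stack.pop()
        if subject and prop:
            self.triples.append((subject, prop, "".join(parts).strip()))


HTML = (
    '<div about="http://example.org/people/alice" typeof="foaf:Person">'
    '<span property="foaf:name">Alice</span>, also known as '
    '<span property="foaf:nick">ally</span></div>'
)
parser = RDFaSketch()
parser.feed(HTML)
print(parser.triples)
```

The page still renders as ordinary HTML for human readers; the extra attributes only matter to software that looks for them.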
  • Solution 3: reuse
    • Linked Online Data: (usually large) data repositories available on the web (for free or not), expressed using the RDF model
    • Interoperability between these repositories (their ontologies) must be defined
  • Linked Open Data in 2007. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
  • Linked Open Data in 2008 (same diagram series and source)
  • Linked Open Data in 2009 (same diagram series and source)
  • Linked Open Data in 2010 (same diagram series and source)
  • Good for Enterprise apps too! Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
  • Why now?
  • Key Enablers
    • Open Data and Linked Online Data
    • Advances in automatic content analysis (linguistics, image processing) and machine learning
    • Classical logic and classical AI
    • Computing power (Moore’s law + MapReduce)
  • The technologies and data are available. Let’s put them to use!
  • 2. Nuxeo & Semantic ECM
  • Nuxeo: an open source ECM vendor
    • Our focus is Enterprise Content Management
    • ECM as a platform for content applications
    • Open source as an efficient development model
    • Modern architecture for 21st century business: “Lean, mobile, social, interoperable”
    • A social marketplace in action: innovation driven by a community of customers, partners, and our core developers
  • Nuxeo ECM - From Platform to Products (layer diagram, top to bottom)
    • Business Solutions: Construction, Media, Government, Life Sciences
    • Horizontal Packages, including Correspondence Management, Contracts Management, Records Management, Invoice Processing, Case Management Framework, Document Management, Digital Asset Management
    • Platform: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM
    • Content Infrastructure: Nuxeo Core, a lightweight, scalable, embeddable content repository
  • Major Customers
  • Goals for Semantic ECM
    • Repurpose existing content better
    • Improve search and collaboration
    • Make information more contextual
    • Extract and use information from content
    • Leverage Open and Linked Data, and contribute to it
    • Make ECM users’ content smarter!
    • Result: gain efficiency, effectiveness and strategic positioning on the ECM market
  • Demo
  • IKS project
    • European project under FP7, with 13 partners (6 SMEs) and an 8.5 million EUR budget
    • Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products
    • Started in Jan. 2009, will last until Dec. 2012
    • First tangible result: Apache Stanbol, already integrated in a Nuxeo plugin
  • The Semantic Engine
    • From unstructured content to knowledge
    • Language guessing
    • Topic classification (Business, Sports, Media, ...)
    • Named entity extraction and linking
    • Relationship and property extraction
  • RESTful is Beautiful
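The slide gives no detail on the REST interface, so the following is only a sketch of what calling a RESTful enhancement service could look like: plain text goes in via HTTP POST, RDF comes back. The endpoint URL and the function name are hypothetical placeholders, not taken from the slides or from any product documentation. The request is built but deliberately not sent.

```python
from urllib.request import Request

# Hypothetical endpoint: replace with the address of an actual service.
ENHANCER_URL = "http://localhost:8080/engines"


def build_enhance_request(text, url=ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain text to analyze."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",  # what we send
            "Accept": "application/rdf+xml",              # what we would like back
        },
        method="POST",
    )


request = build_enhance_request("Tim Berners-Lee invented the web in 1989.")
print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen` would send it. Content negotiation through the `Accept` header is what lets one URL serve several RDF serializations.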
  • Apache Stanbol = Semantic Engines (Apache OpenNLP) + Fast Linked Data local index (Apache Solr) + Semantic Rule Engine (Apache Jena)
  • Architecture diagram: a Nuxeo DM addon, Apache Stanbol (Engine 1, Engine 2, Engine 3) and LDAP inside the local IT infrastructure (LAN); DBpedia, Freebase and Geonames as linked data sources
  • 3. Apache tools for processing Big and/or Linked Data
  • Training statistical models for NER with Wikipedia and DBpedia
    • Extract sentences with link positions in Wikipedia articles
    • Use DBpedia to find the type of the target entity (Person, Location, Organization)
    • Apache Pig scripts to compute the join and format the result as training files for OpenNLP
    • Apache OpenNLP to build and evaluate the models
    • Apache Hadoop for distributed processing
    • Apache Whirr for deployment and management on an Amazon EC2 cluster
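The join-and-format step described above can be sketched on a single sentence in plain Python. This is a stand-in for illustration, not the Pig scripts used in the talk: the function name, the token-span representation of links and the sample data are assumptions. The output uses the `<START:type> ... <END>` markup that OpenNLP's name finder expects in its training files.

```python
def to_opennlp_line(tokens, links, entity_types):
    """Annotate one tokenized sentence for OpenNLP name finder training.

    tokens:       list of token strings
    links:        (start, end, target_uri) token spans, end exclusive,
                  assumed not to overlap
    entity_types: dict mapping a target URI to a type such as "person"
    """
    starts, ends = {}, set()
    for start, end, target in links:
        etype = entity_types.get(target)  # the "join" with the type table
        if etype:                         # links to untyped targets are dropped
            starts[start] = etype
            ends.add(end)

    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:  # a span that runs to the end of the sentence
        out.append("<END>")
    return " ".join(out)


tokens = ["Tim", "Berners-Lee", "was", "born", "in", "London", "."]
links = [
    (0, 2, "http://dbpedia.org/resource/Tim_Berners-Lee"),
    (5, 6, "http://dbpedia.org/resource/London"),
]
types = {
    "http://dbpedia.org/resource/Tim_Berners-Lee": "person",
    "http://dbpedia.org/resource/London": "location",
}
print(to_opennlp_line(tokens, links, types))
```

At Wikipedia scale the same join runs over millions of sentences, which is where a distributed tool such as Pig on Hadoop comes in.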
  • Training statistical models for topic classification from Wikipedia and DBpedia
    • Filter the category tree from DBpedia SKOS entries (~500k)
    • Pig scripts to compute the joins with article abstracts for all the articles categorized in Wikipedia
    • Export as a 2.8GB TSV file to be indexed in Apache Solr
    • Use Solr MoreLikeThisHandler to find the top 5 most related Wikipedia categories for any kind of text
    • Apache Whirr & Hadoop for deployment and management on an Amazon EC2 cluster
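The idea behind the last classification step, ranking categories by textual similarity to the input, can be illustrated without Solr. The sketch below swaps in a much simpler technique, bag-of-words cosine similarity, for Solr's MoreLikeThis handler. The category texts and function names are made up for illustration, and the scoring differs from what Solr actually computes.

```python
import math
import re
from collections import Counter


def bag(text):
    """Lowercased bag of words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_categories(text, category_abstracts, k=5):
    """Return up to k category names ranked by similarity to `text`."""
    query = bag(text)
    scored = [(cosine(query, bag(abstracts)), name)
              for name, abstracts in category_abstracts.items()]
    return [name for score, name in sorted(scored, reverse=True)[:k] if score > 0]


# Hypothetical stand-ins for the concatenated abstracts of each category.
categories = {
    "Sports": "football match team player goal league",
    "Business": "company market revenue shares investor",
    "Media": "newspaper television broadcast journalist",
}
print(top_categories("The team scored a late goal in the league match", categories, k=2))
```

A search index does the same ranking without scanning every category, which is what makes the approach practical for a category tree with hundreds of thousands of entries.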
  • What’s next?
    • Integrate the R&D results into Stanbol / Nuxeo
    • Work on user interface / high-level JavaScript toolkits for Linked Data editing
      • http://github.com/bergie/VIE based on backbone.js
    • Experiment / Integrate / Refine
  • Resources
    • http://iks-project.eu
    • http://stanbol.demo.nuxeo.com
    • http://incubator.apache.org/stanbol
    • http://blogs.nuxeo.com/dev
    • http://hadoop.apache.org/
    • http://incubator.apache.org/opennlp/
    • http://github.com/ogrisel/pignlproc