Apache Stanbol and the Web of Data - ApacheCon 2011

Presentation on Apache Stanbol (incubating) and related projects given by Olivier Grisel during ApacheCon 2011.

More information:
- http://incubator.apache.org/stanbol/
- http://www.iks-project.eu

Presentation Transcript

  • Apache Stanbol (Incubating) and the Web of Data Olivier Grisel, Nuxeo ogrisel@apache.org, 2011-11-11 11/7/11
  • My Background
      • Olivier Grisel, R&D Engineer at Nuxeo (open source ECM)
      • European project: IKS
      • Stuff I do:
        • Machine Learning
        • Natural Language Processing
        • All things data
  • Agenda
      • The Web of Data: what, why, how?
      • CMS integration demo
      • Semantic components in Stanbol
      • Building models for Stanbol
  • The Web of Data: What, Why, How?
  • “To a computer, then, the web is a flat, boring world devoid of meaning.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001
  • “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • The Web of Data – What?
      • Shared description of the real world
        • Structured with vocabularies
        • Decentralized
        • Scoped by namespaces
        • Linked
  • The Web of Data – Why?
      • Strings are ambiguous
        • New York / The Big Apple / NYC
        • Washington (Person, State, City, Sports Team...)
      • Structured context helps humans 
        • Who is this guy?
        • Where is this city?
      • Conceptual frame helps machines
        • Explicit user intent decoding
        • Smarter indexing / search?
  • Decoding User Intents
      • Next-generation user interfaces
        • Siri: conversational interface
        • IBM DeepQA: Watson for Health Care
      • Tell Google about your stuff
        • Publish structured descriptions of your products
        • "3 bedrooms flat near Montmartre"
      • Useful for non-public data as well
        • Intranet query: "ApacheCon slides"
        • Intranet query: "Xerox invoices"
        • Intranet query: "Xerox salesperson email"
  • The Web of Data – How?
      • RDF / triple stores / SPARQL
        • Graph stores with dynamic schemas
        • Strong interoperability
      • JSON-LD
        • Upgrade your JSON with scoped vocabularies
        • Web / mobile / JS developer friendly
      • RDFa + schema.org & rNews
        • Publish annotations in structured markup
        • Vocabulary understood by search engines
  • HTML example
      <p>
        My name is Manu Sporny and you can give me a ring via
        1-800-555-0155.
        <img src="http://manu.sporny.org/images/manu.png" />
        I have a <a href="http://manu.sporny.org/">blog</a>.
      </p>
  • RDFa example
      <p vocab="http://schema.org/"
         prefix="foaf: http://xmlns.com/foaf/0.1/"
         about="#manu" typeof="Person">
        My name is <span property="name">Manu Sporny</span>
        and you can give me a ring via
        <span property="telephone">1-800-555-0155</span>.
        <img rel="image"
             src="http://manu.sporny.org/images/manu.png" />
        I have a <a rel="foaf:weblog"
                    href="http://manu.sporny.org/">blog</a>.
      </p>
  • JSON-LD example
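For comparison with the RDFa markup, here is a minimal sketch of the same description as a JSON-LD document, built as a plain Python dict and serialized with the standard `json` module. The context layout (`@vocab` for schema.org plus a `foaf` prefix) mirrors the RDFa `vocab` and `prefix` attributes; it is one reasonable encoding, not necessarily the one shown on the original slide.

```python
import json

# JSON-LD version of the RDFa example: same subject, same properties.
# "@vocab" maps bare terms such as "name" to http://schema.org/,
# and the "foaf" prefix scopes the weblog property to FOAF.
doc = {
    "@context": {
        "@vocab": "http://schema.org/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "#manu",
    "@type": "Person",
    "name": "Manu Sporny",
    "telephone": "1-800-555-0155",
    # {"@id": ...} marks a value as a link (IRI) rather than a plain string
    "image": {"@id": "http://manu.sporny.org/images/manu.png"},
    "foaf:weblog": {"@id": "http://manu.sporny.org/"},
}

print(json.dumps(doc, indent=2))
```

Any JSON parser can read this document as-is; a JSON-LD processor additionally expands the terms into the same schema.org and FOAF triples that the RDFa version produces.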
  • [Figure: 2007, 2008, 2009, 2010, 2011]
  • Bridging the Web of Data and my CMS
  • Apache Stanbol
      • Enhancer
        • Text analysis with Apache OpenNLP / Tika
      • EntityHub / ContentHub
        • Linked Data indexing with Apache Solr
        • Graph storage with Apache Clerezza / Jena
      • Reasoner / Rules
        • Inference with Apache Jena & OWL API
      • Components / HTTP services
        • OSGi with Apache Felix / JAX-RS with Jersey
  • RESTful is Beautiful
  • Minimalist HTTP Client
      curl -X POST \
           -H "Accept: text/turtle" \
           -H "Content-type: text/plain" \
           --data "John Smith was born in London." \
           http://stanbol.demo.nuxeo.com/engines
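The curl call translates directly to any HTTP library. Below is a minimal sketch using only Python's standard library (`urllib.request`), assuming the endpoint still accepts plain-text POSTs as it did in 2011; the function names `build_enhancement_request` and `enhance` are illustrative and not part of Stanbol.

```python
from urllib.request import Request, urlopen

# Enhancer endpoint from the curl example (a 2011 demo host, which may no
# longer be online: point this at your own Stanbol instance).
ENHANCER_URL = "http://stanbol.demo.nuxeo.com/engines"


def build_enhancement_request(text: str, url: str = ENHANCER_URL) -> Request:
    """Build the same POST request as the curl one-liner, without sending it."""
    return Request(
        url,
        data=text.encode("utf-8"),  # request body, like curl --data
        headers={
            "Accept": "text/turtle",  # ask for the enhancements as Turtle RDF
            "Content-Type": "text/plain",  # we send raw text
        },
        method="POST",
    )


def enhance(text: str, url: str = ENHANCER_URL) -> str:
    """Send the text to the Enhancer and return the Turtle response body."""
    with urlopen(build_enhancement_request(text, url), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

With a reachable Stanbol instance, `enhance("John Smith was born in London.")` should return the enhancement graph serialized as Turtle. Building the request separately from sending it keeps the HTTP details easy to inspect.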
  • [Diagram: Nuxeo DM addon and Apache Stanbol in the local IT infrastructure (LAN); Stanbol runs Engine 1, Engine 2 and Engine 3, drawing on DBpedia, Freebase, Geonames and LDAP]
  • Stanbol Enhancer
      • Chain of enhancement engines
        • Language detection (Tika)
        • Named entity detection (OpenNLP)
        • Linked Data dereferencing (Solr)
        • Refactoring / translation (Jena)
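The chain idea can be sketched in a few lines: each engine reads the content item and appends enhancements that later engines can build on. This is a hypothetical Python illustration, not Stanbol's Java/OSGi engine API; the toy language detector, gazetteer and local index below merely stand in for Tika, OpenNLP and Solr.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ContentItem:
    """Text plus the enhancements that engines attach to it."""
    text: str
    enhancements: List[Dict[str, str]] = field(default_factory=list)


Engine = Callable[[ContentItem], None]

# Hypothetical toy resources standing in for real models and indices.
ENGLISH_STOPWORDS = {"the", "was", "in", "of", "and", "is"}
GAZETTEER = {"John Smith": "Person", "London": "Place"}
LOCAL_INDEX = {"London": "http://dbpedia.org/resource/London"}


def detect_language(ci: ContentItem) -> None:
    # Toy stand-in for Tika language identification: count English stopwords.
    hits = sum(w in ENGLISH_STOPWORDS for w in ci.text.lower().split())
    ci.enhancements.append({"type": "language", "value": "en" if hits >= 2 else "unknown"})


def detect_entities(ci: ContentItem) -> None:
    # Toy stand-in for an OpenNLP name finder: exact gazetteer matches.
    for name, kind in GAZETTEER.items():
        start = ci.text.find(name)
        if start >= 0:
            ci.enhancements.append(
                {"type": "entity", "value": name, "kind": kind, "start": str(start)}
            )


def link_entities(ci: ContentItem) -> None:
    # Toy stand-in for a Solr-backed lookup: map entity labels to URIs.
    for enh in list(ci.enhancements):  # iterate over a copy: we append below
        if enh["type"] == "entity" and enh["value"] in LOCAL_INDEX:
            ci.enhancements.append({"type": "link", "value": LOCAL_INDEX[enh["value"]]})


def run_chain(text: str, chain: List[Engine]) -> ContentItem:
    ci = ContentItem(text)
    for engine in chain:  # order matters: later engines read earlier results
        engine(ci)
    return ci


if __name__ == "__main__":
    item = run_chain("John Smith was born in London.",
                     [detect_language, detect_entities, link_entities])
    for enhancement in item.enhancements:
        print(enhancement)
```

Because the linking engine only looks at entities already detected, swapping the order of `detect_entities` and `link_entities` would produce no links, which is why a chain is an ordered list.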
  • Stanbol EntityHub
      • Referenced sites
        • DBpedia, Geonames
        • (NY Times, MusicBrainz, ProductDB, UniProt...)
      • Fast local offline indices (Solr)
        • Batch indexing utilities for RDF dumps
        • Multilingual full-text search in labels & descriptions
      • Vocabulary mapping / merging
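Multilingual label search boils down to an inverted index over labels tagged with their language. The in-memory sketch below is only a stand-in for the Solr indices the EntityHub actually uses; the entity data, `build_index` and `search` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sample: entity URI -> {language code: label}
ENTITIES: Dict[str, Dict[str, str]] = {
    "http://dbpedia.org/resource/New_York_City": {"en": "New York City", "fr": "New York"},
    "http://dbpedia.org/resource/London": {"en": "London", "fr": "Londres"},
}

Index = Dict[Tuple[str, str], Set[str]]


def build_index(entities: Dict[str, Dict[str, str]]) -> Index:
    """Inverted index keyed by (language, lowercased label token)."""
    index: Index = defaultdict(set)
    for uri, labels in entities.items():
        for lang, label in labels.items():
            for token in label.lower().split():
                index[(lang, token)].add(uri)
    return index


def search(index: Index, query: str, lang: Optional[str] = None) -> List[str]:
    """URIs whose label in `lang` (or in any language) contains every query token."""
    tokens = query.lower().split()
    langs = [lang] if lang else sorted({code for code, _ in index})
    hits: Set[str] = set()
    for code in langs:
        if tokens:
            # .get() avoids inserting empty entries into the defaultdict
            hits |= set.intersection(*(index.get((code, t), set()) for t in tokens))
    return sorted(hits)
```

For example, searching for "Londres" finds the London entity through its French label, while restricting the same query to `lang="en"` returns nothing.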
  • Stanbol Reasoner
      • RDFS / OWL Lite / OWL 2
      • Consistency checks
        • Cardinality checks: each person has 1 birth date
        • Range constraints: birth dates are valid dates
      • Materializing types / properties
        • Types from subclass: Musician > Artist > Person
        • Symmetric property: A worked with B
        • Transitive property: A is located in B
      • Query-time expansion / inference?
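Materialization and consistency checks can be illustrated with a naive fixpoint loop over (subject, predicate, object) triples. This is a hypothetical toy, not how the Stanbol Reasoner works internally (it relies on Jena and OWL API reasoners); the short property names and the single-parent `subclass_of` map are simplifying assumptions.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def materialize(triples: Set[Triple], subclass_of: Dict[str, str],
                symmetric: Set[str], transitive: Set[str]) -> Set[Triple]:
    """Naive forward chaining: apply the rules until no new triple appears."""
    kb = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in kb:
            if p == "type" and o in subclass_of:
                new.add((s, "type", subclass_of[o]))  # Musician > Artist > Person
            if p in symmetric:
                new.add((o, p, s))  # A worked with B => B worked with A
            if p in transitive:
                for s2, p2, o2 in kb:  # A in B, B in C => A in C
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= kb:  # fixpoint reached: nothing left to infer
            return kb
        kb |= new


def cardinality_violations(kb: Set[Triple], prop: str, expected: int = 1) -> Set[str]:
    """Persons that do not have exactly `expected` values for `prop`."""
    persons = {s for s, p, o in kb if p == "type" and o == "Person"}
    counts = {x: 0 for x in persons}
    for s, p, _ in kb:
        if p == prop and s in counts:
            counts[s] += 1
    return {x for x, n in counts.items() if n != expected}
```

Running the cardinality check after materialization matters: a resource typed only as `Musician` becomes a `Person` through the subclass chain, so a missing birth date is caught for it too.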
  • Stanbol Rules
      • Simple Prolog-like language
          uncleRule[
            has(<http://example.org/family.owl#hasParent>, ?x, ?z) .
            has(<http://example.org/family.owl#hasSibling>, ?z, ?y)
            -> has(<http://example.org/family.owl#hasUncle>, ?x, ?y)
          ]
      • SPARQL CONSTRUCT or SWRL
          PREFIX family: <http://example.org/family.owl#>
          CONSTRUCT { ?x family:hasUncle ?y }
          WHERE { ?x family:hasParent ?z . ?z family:hasSibling ?y }
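Both forms express the same join on the shared variable `?z`. To make the semantics concrete, here is a hypothetical Python equivalent that evaluates the rule over a set of triples; it illustrates what the rule computes and is not the Stanbol Rules engine.

```python
from typing import Set, Tuple

FAMILY = "http://example.org/family.owl#"
Triple = Tuple[str, str, str]


def apply_uncle_rule(triples: Set[Triple]) -> Set[Triple]:
    """hasParent(?x, ?z) and hasSibling(?z, ?y) -> hasUncle(?x, ?y)."""
    parents = [(x, z) for x, p, z in triples if p == FAMILY + "hasParent"]
    siblings = [(z, y) for z, p, y in triples if p == FAMILY + "hasSibling"]
    return {
        (x, FAMILY + "hasUncle", y)
        for x, z in parents
        for z2, y in siblings
        if z == z2  # join on the shared variable ?z
    }
```

Like the original rule, this makes no gender check, so a parent's sister also ends up as `hasUncle`; a stricter rule would add a condition on the sibling's gender.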
  • Online Demos
      • Simple analyzer with small index: https://stanbol.demo.nuxeo.com
      • All services deployed: http://dev.iks-project.eu:8081
  • Building Stanbol Enhancer models from Wikipedia with the Apache data tools
  • Universal Topic Classification: use Apache Lucene / Solr MoreLikeThis to perform a truncated nearest neighbors query in the TF-IDF vector space of Wikipedia
  • Universal Topic Classification
      • Index the text of all articles, grouped by topic
      • Solr MoreLikeThis query on the new document
      • DBpedia dumps provide:
        • Text summaries for each article
        • “subject” relationships between articles and topics
        • “broader” / “narrower” SKOS hierarchy between topics
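The idea behind this approach can be sketched without Solr: represent each topic by the TF-IDF vector of its articles' text, then return the topics whose vectors are closest to the new document. Lucene's MoreLikeThis actually selects a few high-scoring terms from the document and runs them as a query; this hypothetical sketch uses full cosine similarity instead, which is easier to follow but would not scale to all of Wikipedia.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

PUNCT = ".,;:!?\"'()"
Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def weigh(tokens: List[str], idf: Dict[str, float]) -> Vector:
    # Terms unknown to the index are dropped: they cannot match any topic.
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def build_index(topic_texts: Dict[str, str]) -> Tuple[Dict[str, Vector], Dict[str, float]]:
    """One TF-IDF vector per topic, built from the concatenated text of its articles."""
    tokenized = {topic: tokenize(text) for topic, text in topic_texts.items()}
    n = len(tokenized)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    return {topic: weigh(toks, idf) for topic, toks in tokenized.items()}, idf


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def suggest_topics(text: str, vectors: Dict[str, Vector],
                   idf: Dict[str, float], k: int = 3) -> List[str]:
    """Truncated nearest-neighbour query: keep the k most similar topics."""
    query = weigh(tokenize(text), idf)
    scored = sorted(((cosine(query, v), topic) for topic, v in vectors.items()), reverse=True)
    return [topic for score, topic in scored[:k] if score > 0]
```

Grouping articles by topic before indexing turns "which articles look like this document" into "which topics look like this document", so the nearest neighbours are directly the suggested topics.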
  • About the Data
      • 500k purely technical categories
        • “People_with_missing_birth_place”, “Rivers_in_Romania”
      • 70k “semantically grounded” categories
      • Paths to roots require both “technical” and “grounded” categories
      • Scale:
        • 1.2M topic / topic links
        • 30M topic / article links
  • Some results (Wikinews)
      • Article: "US children who celebrate Independence Day more likely to become Republicans, says Harvard study"
      • Topics: Fireworks, Voting theory, Republican Party (United States), Statistics, Electoral systems
  • Some results (Wikinews)
      • Article: "U.S. space agency NASA sues ex-astronaut"
      • Topics: American astronauts, Aviation halls of fame, Edwards Air Force Base, Apollo program, Exploration of the Moon
  • Some results (Wikinews)
      • Article: "Hundreds of thousands of British public sector workers strike over planned pension changes"
      • Topics: Retirement in the United Kingdom, United Kingdom pensions and benefits, Pensions in the United Kingdom, Labor disputes by country, Labor disputes
  • Some results (PLoS One)
      • Article: "Metabolic Programming during Lactation Stimulates Renal Na+ Transport in the Adult Offspring Due to an Early Impact on Local Angiotensin II Pathways"
      • Topics: Renal physiology, Kidney, Nephrology, Hypertension, Membrane biology
  • Wrap Up
      • The Web of Data brings a structured context frame to decode user intention
      • NLP + entity & topic indices to automate content enrichment and to provide disambiguation
  • Resources
      • Documentation, svn, mailing list: http://incubator.apache.org/stanbol
      • IKS project blog: http://blog.iks-project.eu
      • Blog posts about Semantic ECM: http://blogs.nuxeo.com/dev/semantic/
  • Thank you for your attention! Olivier Grisel [email_address] https://twitter.com/ogrisel
  • Training models for NER from Wikipedia
      • Extract sentences with link positions from Wikipedia articles
      • Use DBpedia to find the type of the target entity (Person, Location, Organization)
      • Apache Pig scripts to compute the join + format the result as training files for OpenNLP
      • Apache OpenNLP to build and evaluate the models
      • Apache Hadoop / Apache Whirr for distributed processing
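The join-and-format step can be sketched for a single sentence: given the character offsets of the wiki links and a mapping from link targets to entity types (the result of the DBpedia join), wrap each typed span in the `<START:type> ... <END>` markup used by OpenNLP name-finder training files. This is a hypothetical single-machine illustration of what the Pig scripts compute at scale; `to_opennlp` and the lowercase type labels are assumptions, and it only splits tokens around entity spans, whereas real training data should be fully tokenized.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int, str]  # (start, end, target) character offsets of a wiki link


def to_opennlp(sentence: str, links: List[Link], types: Dict[str, str]) -> str:
    """Wrap typed link spans in OpenNLP name-finder markup."""
    parts: List[str] = []
    pos = 0
    for start, end, target in sorted(links):
        kind = types.get(target)
        if kind is None:  # target has no known DBpedia type: leave it as plain text
            continue
        parts.append(sentence[pos:start])
        parts.append(f" <START:{kind}> {sentence[start:end]} <END> ")
        pos = end
    parts.append(sentence[pos:])
    return " ".join("".join(parts).split())  # normalize whitespace
```

Links whose target has no known type are kept as plain text rather than dropped, so the sentence stays intact; in a real pipeline such sentences might instead be filtered out to avoid teaching the model that a real entity is not a name.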