NLP Interchange Format (NIF) - Version 1.0 http://nlp2rdf.org <ul>Creating Knowledge out of Interlinked Data </ul><ul>LOD2...
NLP2RDF + NIF <ul><ul><li>NLP Interchange Format (NIF)  is an RDF/OWL-based format that allows to combine and chain severa...
NLP2RDF  is a LOD2 project providing: </li><ul><ul><li>documentation
reference implementations of NIF
collaboration platform
tutorials / example source code
mailing list for questions and support
possible to join the mailing list on  http://nlp2rdf.org </li></ul></ul></ul></ul>
<ul><ul><li>NLP2RDF + NIF </li><ul><li>Motivation and comparison of other NLP frameworks
URI design
NLP domain vocabularies
Applications </li></ul></ul></ul>NLP2RDF + NIF
NLP2RDF - NIF Use Cases <ul><ul><li>Problem: NLP software is organized in pipelines (UIMA, Gate) </li><ul><li>Integration ...
For each tool and each framework an adapter has to be created (n*m)
No ad-hoc integration
Difficult to aggregate output
Difficult to exchange single components
Not robust: if step 6 of 20 steps fails no output is produced  </li></ul></ul></ul>
NLP2RDF – NIF Use Cases
NLP2RDF – NIF Use Cases Included in  RDF/OWL  as - rdf:type  - rdfs:subClassOf  - links and mappings
NLP2RDF – NIF Use Cases Intra -changeable, but not  inter -changeable: Gate Plugin can not be used in UIMA easily
NIF – Integration Architecture
NIF – URI Design URI Design How to address Strings with URIs
NIF – How to address Strings with URIs?
NIF – How to address Strings with URIs?
Comparison of URI Recipes http://www.w3.org/DesignIssues/LinkedData.html#offset_14406_14418_Semantic+Web http://www.w3.org...
Comparison of URI Recipes
NIF – Combining RDF
NIF – Snowball Stemmer
NIF – Stanford Core
NIF – Combined RDF
NLP2RDF – NIF – 1.0 <ul><ul><li>NIF-1.0 provides:
Structural Interoperability:  </li><ul><li>URI recipes to anchor annotation in documents
Upcoming SlideShare
Loading in …5
×

NIF - Version 1.0 - 2011/10/23

2,120 views

Published on

This presentation was given at the MSW2 Workshop

It gives an overview over the current state of NIF in its version 1.0

Published in: Technology, Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,120
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
21
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

NIF - Version 1.0 - 2011/10/23

  1. 1. NLP Interchange Format (NIF) - Version 1.0 http://nlp2rdf.org <ul>Creating Knowledge out of Interlinked Data </ul><ul>LOD2 Presentation . 02.09.2010 . Page </ul><ul>http://lod2.eu </ul><ul>AKSW, Universität Leipzig </ul><ul>Sebastian Hellmann </ul>
  2. 2. NLP2RDF + NIF <ul><ul><li>NLP Interchange Format (NIF) is an RDF/OWL-based format that allows to combine and chain several Natural Language Processing (NLP) tools in a flexible, light-weight way.
  3. 3. NLP2RDF is a LOD2 project providing: </li><ul><ul><li>documentation
  4. 4. reference implementations of NIF
  5. 5. collaboration platform
  6. 6. tutorials / example source code
  7. 7. mailing list for questions and support
  8. 8. possible to join the mailing list on http://nlp2rdf.org </li></ul></ul></ul></ul>
  9. 9. <ul><ul><li>NLP2RDF + NIF </li><ul><li>Motivation and comparison of other NLP frameworks
  10. 10. URI design
  11. 11. NLP domain vocabularies
  12. 12. Applications </li></ul></ul></ul>NLP2RDF + NIF
  13. 13. NLP2RDF - NIF Use Cases <ul><ul><li>Problem: NLP software is organized in pipelines (UIMA, Gate) </li><ul><li>Integration is done „hard-wired“ (Software has to be developed)
  14. 14. For each tool and each framework an adapter has to be created (n*m)
  15. 15. No ad-hoc integration
  16. 16. Difficult to aggregate output
  17. 17. Difficult to exchange single components
  18. 18. Not robust: if step 6 of 20 steps fails no output is produced </li></ul></ul></ul>
  19. 19. NLP2RDF – NIF Use Cases
  20. 20. NLP2RDF – NIF Use Cases Included in RDF/OWL as - rdf:type - rdfs:subClassOf - links and mappings
  21. 21. NLP2RDF – NIF Use Cases Intra -changeable, but not inter -changeable: Gate Plugin can not be used in UIMA easily
  22. 22. NIF – Integration Architecture
  23. 23. NIF – URI Design URI Design How to address Strings with URIs
  24. 24. NIF – How to address Strings with URIs?
  25. 25. NIF – How to address Strings with URIs?
  26. 26. Comparison of URI Recipes http://www.w3.org/DesignIssues/LinkedData.html#offset_14406_14418_Semantic+Web http://www.w3.org/DesignIssues/LinkedData.html#hash_4_12_79edde636fac847c006605f82d4c5c4d_Semantic+Web
  27. 27. Comparison of URI Recipes
  28. 28. NIF – Combining RDF
  29. 29. NIF – Snowball Stemmer
  30. 30. NIF – Stanford Core
  31. 31. NIF – Combined RDF
  32. 32. NLP2RDF – NIF – 1.0 <ul><ul><li>NIF-1.0 provides:
  33. 33. Structural Interoperability: </li><ul><li>URI recipes to anchor annotation in documents
  34. 34. Ontologies to describe the relations between these URIs: </li><ul><li>e.g. subString, String, Word, Sentence, Document
  35. 35. http://nlp2rdf.lod2.eu/schema/string/
  36. 36. http://nlp2rdf.lod2.eu/schema/sso/ </li></ul></ul><li>Conceptual Interoperability: </li><ul><li>Vocabularies for certain NLP tasks and domains </li></ul><li>Access Interoperability: </li><ul><li>Common Web service parameters </li></ul></ul></ul>
  37. 37. Conceptual Interoperability <ul><ul><li>Ideal approach:
  38. 38. => Reuse of a Reference Vocabulary/Ontology for each NLP layer
  39. 39. Simple Datatype properties for basic annotations: </li><ul><li>Stems, Lemma, statistics (word count) </li></ul><li>Morpho-Syntactical Analysis: OLiA [Chiarcos 2008, 2010] http://nachhalt.sfb632.uni-potsdam.de/owl/
  40. 40. Candidates for inclusion: </li><ul><li>Lemon
  41. 41. XLIFF
  42. 42. NER-Type Ontology: SCMS ( http://scms.eu ), MUC types, DBpedia Ontology </li></ul></ul></ul>
  43. 43. Conceptual Interoperability <ul><ul><li>The OLiA architecture is a set of modular OWL/DL ontologies with: </li><ul><li>ontological models of annotation schemes ( Annotation Models )
  44. 44. An ontology of reference terms ( Reference Model )
  45. 45. ontologies that implement subClassOf relationships between them ( Linking Models ). </li></ul></ul></ul>OLiA [Chiarcos 2008, 2010] http://nachhalt.sfb632.uni-potsdam.de/owl/
  46. 46. OLIA
  47. 47. OLIA Currently 32 Annotation Models for 69 languoids available at: http://nachhalt.sfb632.uni-potsdam.de/owl/ The ontologies can be instrumentalized to achieve parser, tagset, language and framework independence .
  48. 48. OliA - NIF Integration
  49. 49. NIF Tutorial Challenges for Adoption http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-multilingual-part-of-speech-tagger
  50. 50. NIF RoadMap <ul><ul><li>RoadMap: </li><ul><li>NIF 1.0 is published and implementation has started </li><ul><li>Existing: Stanford, Snowball Stemmer
  51. 51. Wrappers currently in development: DBpedia Spotlight, UIMA, Gate ANNIE, LingPipe, MontyLingua, OpenNLP
  52. 52. In LOD2: Zemanta, NooJ, Korean POS taggers
  53. 53. Estimated 300 lines of code per wrapper, with low complexity (parser/serializer). </li></ul><li>http://nlp2rdf.org/category/implementations allows to browse the implementations </li></ul></ul></ul>
  54. 54. NIF RoadMap <ul><ul><li>RoadMap: </li><ul><li>Benchmarking of String URI properties (stability)
  55. 55. Interactive Tutorial challenges online
  56. 56. NIF 2.0-draft will be refined based on the experience gained during the implementation of NIF 1.0
  57. 57. Find experts to model more NLP Layers </li></ul></ul></ul>
  58. 58. <ul><ul><li>Applications of NIF </li></ul></ul>
  59. 59. <ul><ul><li>Related Work </li><ul><li>RDFa / Schema.org </li><ul><li>Annotations are provided by host (no 3 rd party annotations) </li></ul><li>LiveURL project http://liveurls.mozdev.org/index.html </li><ul><li>Use Case is highlighting in browser, Uniqueness of URIs is not a strict requirement </li></ul><li>Asanka Wasala LRC/CNGL - A Micro Crowdsourcing Architecture to Localize Web Content for Less-Resourced Languages http://www.w3.org/International/multilingualweb/limerick/slides/wasala.pdf </li><ul><li>Allows to translate web sites in a browser </li></ul><li>EARMARK http://palindrom.es/phd/research/earmark/ </li><ul><li>3 triples per range vs. 1 URI </li></ul></ul></ul></ul>Representation of Web Annotations
  60. 60. <ul><ul><li>Advantages of RDF/OWL
  61. 61. RDF makes data integration easy: URIs, LinkedData
  62. 62. OWL is based on Description Logics (Guarded Fragment)
  63. 63. Availability of open data sets (access and licence)
  64. 64. Reusability of Vocabularies and Ontologies
  65. 65. Diverse serializations for annotations: XML, Turtle, RDFa+XHTML
  66. 66. Scalable tool support (Databases, Reasoning)
  67. 67. Data is flexible and can produce indexes </li></ul></ul>Meaning Representation Language
  68. 68. Meaning Representation Language
  69. 69. <ul><ul><li>A traditional approach :
  70. 70. POS tag / Dependency parser (e.g. Stanford)
  71. 71. create a rule/pattern language to extract facts and knowledge
  72. 72. Lot's of home-made solutions and problems! </li></ul></ul>Knowledge Extraction with SPARQL
  73. 73. <ul><ul><li>Johanna Völker – Based on Acquisition of OWL DL Axioms from Lexical Resources
  74. 74. # Example:
  75. 75. # A fish is any aquatic vertebrate animal that is covered with scales, and equipped with two sets of paired fins and several unpaired fins.
  76. 76. # [fish] subClassOf [any aquatic vertebrate animal that is covered …]
  77. 77. Cave: The query is out of date
  78. 78. Construct {?sub rdfs:subClassOf ?super} {
  79. 79. ?is a penn:BePresentTense .
  80. 80. ?is nlp:superToken ?is_any_aquatic_.
  81. 81. ?is_any_aquatic_ a olia:VerbPhrase .
  82. 82. ?is_any_aquatic_ nlp:syntacticSubToken [ nlp:normUri ?super] .
  83. 83. ?animal nlp:cop ?is .
  84. 84. ?animal nlp:nsubj ?fish .?fish nlp:superToken [ nlp:normUri ?sub] .
  85. 85. } </li></ul></ul>Knowledge Extraction with SPARQL
  86. 86. Project: http://lod2.eu Organisation: http://uni-leipzig.de , http://aksw.org Presenter: http://bis.informatik.uni-leipzig.de/SebastianHellmann NLP2RDF page: http://nlp2rdf.org

×