Integrating NLP using Linked Data

Published in: Education, Technology
Transcript

  • 1. Integrating NLP using Linked Data
      Sebastian Hellmann, Jens Lehmann, Sören Auer and Martin Brümmer
      AKSW, Universität Leipzig
      ISWC – 2013/10/23
      LOD2: Creating Knowledge out of Interlinked Data, http://lod2.eu
      http://slideshare.net/kurzum, http://nlp2rdf.org
  • 2. Introduction
  • 3. Introduction
      Core problems in integrating NLP:
      1. Too much heterogeneity
      2. Almost no open standards available
      3. Lack of open collaboration
      4. Difficult and large domain
  • 4. Problem analysis
      Hardly any reusability in NLP:
      • Free software (as in free beer), but no open licenses
      • Few standards and few mappings
      • Integration is hard-wired (you have to write software) for each tool, for each framework
      Main benefits of using RDF, OWL and Linked Data:
      • Lower entry barrier (as a client / user)
      • Easy data integration (linking, mapping)
      • Reusability of tools and conceptualisations (ontologies)
      • Off-the-shelf solutions for common tasks
  • 5. The Semantic Gap
  • 6. [figure]
  • 7. NLP2RDF project
      NLP2RDF (http://nlp2rdf.org):
      • Community project bootstrapped by LOD2
      • Develops the NLP Interchange Format (NIF)
      • Umbrella project to combine (and consolidate) existing work
  • 8. NIF Overview
      The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
      → The goal is to create an ecosystem of interoperable web services.
  • 9. NIF Overview
      • Reuse of existing standards such as RDF, OWL2, the PROV Ontology, LAF (ISO 24612), Unicode and RFC 5147
      • Standardize access parameters, annotations (e.g. tokenization), validation and log messages
      • Reuse of existing ontologies:
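
To make the combination of RDF and RFC 5147 on this slide more concrete, the following is a minimal sketch of how a substring of a document could be addressed by a `#char=begin,end` fragment and described with a few triples. The helper names (`char_uri`, `annotate`), the example document URI and the tuple representation of triples are assumptions made for illustration. The `nif:` terms are quoted from memory of the NIF 2.0 core ontology and should be checked against the published ontology before use.

```python
# Sketch only: builds NIF-style triples as plain Python tuples (no RDF library).
# Assumed from memory of the NIF 2.0 core ontology: nif:Context, nif:String,
# nif:isString, nif:anchorOf, nif:beginIndex, nif:endIndex, nif:referenceContext.

def char_uri(doc_uri, begin, end):
    """Return a URI with an RFC 5147-style character range fragment."""
    return f"{doc_uri}#char={begin},{end}"

def annotate(doc_uri, text, phrase):
    """Describe the first occurrence of `phrase` in `text` as a list of triples."""
    begin = text.index(phrase)        # raises ValueError if phrase is absent
    end = begin + len(phrase)         # Python 3 strings count code points
    context = char_uri(doc_uri, 0, len(text))
    span = char_uri(doc_uri, begin, end)
    return [
        (context, "rdf:type", "nif:Context"),
        (context, "nif:isString", text),
        (span, "rdf:type", "nif:String"),
        (span, "nif:referenceContext", context),
        (span, "nif:anchorOf", phrase),
        (span, "nif:beginIndex", begin),
        (span, "nif:endIndex", end),
    ]

if __name__ == "__main__":
    # http://example.org/doc is a hypothetical document URI.
    for triple in annotate("http://example.org/doc", "NIF is open", "open"):
        print(triple)
```

Because the offsets are part of the URI, two tools that annotate the same range of the same document produce the same subject URI, so their output can be merged without a tool-specific mapping.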
  • 10. Example NIF Workflow
      A NIF workflow, however, cannot provide better performance (F-measure, speed) than a properly configured UIMA or GATE pipeline with the same components.
  • 11. Use Cases
      • Internationalization Tag Set 2.0
      • Part of Speech Tagging
      • Wikifier API access via RDFaCE (Entity Linking)
  • 12. UC1 - Internationalization Tag Set 2.0
      • NIF will be the recommended RDF conversion of the W3C Internationalization Tag Set 2.0 (ITS 2.0) - http://www.w3.org/TR/its20/
      • NIF turns out to have a unique selling proposition regarding NLP and RDF
      • No suitable alternative RDF vocabulary was available for this conversion.
  • 13. ITS 2.0
      Source: http://www.w3.org/TR/its20/#EX-HTML-whitespace-normalization
      RDFa parsers lose all provenance information:
      <http://examples.com/books/wikinomics> dc:title "Wikinomics" .
      Source: https://en.wikipedia.org/wiki/RDFa
  • 14. UC1 - Internationalization Tag Set 2.0
  • 15. UC1 - Internationalisation Tagset 2.0
      String offset based on:
      - Unicode NFC, code points
      - ISO 24612
      - RFC 5147
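
The reason offsets are tied to Unicode NFC and code points can be seen in a small experiment: the same visible text can have different lengths depending on its normalization form, so offsets only agree between tools if everyone normalizes first. The sketch below uses Python's standard `unicodedata` module; the function name `nfc_offsets` and the sample strings are made up for illustration.

```python
import unicodedata

def nfc_offsets(text, phrase):
    """Return (begin, end) code point offsets of `phrase` in the NFC form of `text`."""
    norm_text = unicodedata.normalize("NFC", text)
    norm_phrase = unicodedata.normalize("NFC", phrase)
    begin = norm_text.index(norm_phrase)   # Python 3 str indices are code points
    return begin, begin + len(norm_phrase)

# The same visible text in two encodings of "é":
decomposed = "Cafe\u0301 Leipzig"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = "Caf\u00e9 Leipzig"      # precomposed U+00E9

if __name__ == "__main__":
    print(len(decomposed), len(composed))      # the raw lengths differ by one
    print(decomposed.index("Leipzig"))         # raw offset in the decomposed form
    print(nfc_offsets(decomposed, "Leipzig"))  # offset after NFC normalization
    print(nfc_offsets(composed, "Leipzig"))    # same result for the composed form
```

Without normalization the two inputs would yield different `#char=` fragments for the same word, and annotations from different tools would no longer line up.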
  • 16. UC2 – Part of Speech Tagging
      Please see the paper: http://purl.org/olia
  • 17. UC3 – Wikifier API access via RDFaCE
      https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
  • 18. UC3 - Wikifier API access via RDFaCE
      http://rdface.aksw.org/
  • 19. UC3 - Wikifier API access via RDFaCE
      http://rdface.aksw.org/
  • 20. Evaluation
      Please see the paper!
      1) Quantitative Analysis with Google Wikilinks Corpus as NIF RDF
          • Crawl of 3 million web sites, 40 million Wikipedia links
          • ~477 million triples in NIF
      2) Questionnaire and Developers Study for NIF 1.0
          • NIF 1.0 was released in September 2009
          • Over 30 known implementations (22 not from authors)
          • 14 developers participated in the study
          • Minimal NIF implementation requires less than 500 LoC
      3) Qualitative Comparison with other Frameworks and Formats
  • 21. State of NIF 2.0
      Corpora as Linked Data:
      • Wikilinks corpus - http://wiki-link.nlp2rdf.org
      • KORE 50 - http://www.yovisto.com/labs/ner-benchmarks/
      • DBpedia Spotlight dataset
      Tools:
      • entityclassifier.eu - http://entityclassifier.eu
      • Spotlight - http://spotlight.dbpedia.org
      • Open NLP
      • Stanford CoreNLP - https://github.com/NLP2RDF/software
      • Validator - https://github.com/NLP2RDF/software
  • 22. State of NIF 2.0
      • Rollout is in progress
      • Distributed implementation at different speed and quality
      • Software lifecycle:
          • Implementation
          • Testing/Validation
          • Integration in the main software
          • Deployment as a web service
          • Hosted web services often not up to date while code base is
  • 23. How to join - http://nlp2rdf.org
  • 24. For ontology creators
      NLP2RDF provides infrastructure for your NLP ontologies:
      • Redundant, persistent hosting
      • Maven packages
      • Code and documentation generation
      • Continuous Integration (planned)
      • Indexing
      • Validation of instance data
      Please write to me or the mailing list nlp2rdf@lists.informatik.uni-leipzig.de
  • 25. Take home message
      • Early industrial uptake
          • OpenLink, Vistatech.ie, Zemanta, Tenforce, Unister
          • ITS 2.0 W3C standard was driven by localization industry
      • NIF is open and free (CC0 planned)
      • NIF is designed to be a cost-saver
          Not primarily aimed at increasing features or performance (F-Measure)
  • 26. Thanks for your attention
      Open Community – All feedback is welcome!
      http://slideshare.net/kurzum
      Websites: http://nlp2rdf.org http://lod2.eu
  • 27. Annotations
  • 28. NIF
  • 29. Scalability - Salzburg Research KMT
      https://bitbucket.org/srfgkmt/stanbol-nlp
  • 30. Unicode Normal Form C
      • Recommendation for RDF Literals
      • http://unicode.org/reports/tr15/#Norm_Forms
  • 31. Tokenization
      Christian Chiarcos, Julia Ritz, Manfred Stede: By all these lovely tokens... Merging conflicting tokenizations. Language Resources and Evaluation 46(1): 53-74 (2012)
  • 32. Validation over specification
      • SPARQL queries produce (find) errors
      • http://persistence.uni-leipzig.org/nlp2rdf/ontologies/testcase/lib/nif-2.0-suite.t
      • RLOG – An RDF Logging Ontology
      • ./validate.jar -i nif-erroneous-model.ttl -t file
      • Demo → character count
      • Demo → all errors
      ALL DEMOS ARE AVAILABLE AT: http://nlp2rdf.org/leipzig-24-9-2013
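
The validator on this slide finds errors with SPARQL queries. As a rough illustration of the kind of constraint behind the "character count" demo, the sketch below expresses a comparable consistency check in plain Python rather than SPARQL. It is not the project's test suite: the function name `check_annotation`, its parameters and the error messages are hypothetical.

```python
def check_annotation(context_text, begin, end, anchor):
    """Return a list of error messages for one string annotation (empty if consistent).

    context_text: the full text of the document (the context string)
    begin, end:   code point offsets of the annotated span, end exclusive
    anchor:       the substring the annotation claims to cover
    """
    errors = []
    if not (0 <= begin <= end <= len(context_text)):
        # No point in comparing substrings when the range itself is invalid.
        errors.append("offsets out of range")
        return errors
    if end - begin != len(anchor):
        errors.append("character count mismatch")
    if context_text[begin:end] != anchor:
        errors.append("anchor does not match context substring")
    return errors

if __name__ == "__main__":
    text = "NIF is open"
    print(check_annotation(text, 7, 11, "open"))   # consistent annotation
    print(check_annotation(text, 7, 10, "open"))   # end offset one too small
    print(check_annotation(text, 7, 40, "open"))   # range exceeds the text
```

Expressing such checks as declarative queries, as the slide describes, has the advantage that the same test suite can be run against the output of any implementation, regardless of the language it is written in.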
  • 33. NIF Demo: http://nlp2rdf.lod2.eu/demo.php
  • 34. OLiA
      http://purl.org/olia
  • 35. NIF
  • 36. NIF