NIF - NLP Interchange Format

Sebastian Hellmann
Jul. 6, 2011

  1. Creating Knowledge out of Interlinked Data NIF – NLP Interchange Format Sebastian Hellmann AKSW, Universität Leipzig LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
  2. Problem: • NLP software is currently organized in pipelines • Integration is "hard-wired": an adapter has to be created for each combination of tool and framework (n*m adapters) • Single components are difficult to exchange. Open Linguistics@OKCon 30.6.2011 http://lod2.eu
  3. Overview: • NLP tools can be integrated via a common output format (a common pattern in Enterprise Application Integration) • Each tool needs a wrapper that reads NIF and produces NIF • Tools can be combined ad hoc, i.e. no pipeline needs to be configured • Multi-layer and overlapping annotations are possible • Ontologies provide interfaces for each layer and for applications
  4. First challenge: representing strings in RDF • How can a part of a document or text be given an identifier (URI)? • Which properties can such URIs have?
  6. Example URIs for annotating "Semantic Web"
  7. First challenge: representing strings in RDF • How can a part of a document or text be given an identifier (URI)? • Which properties can such URIs have?
  8. URIs are used to integrate output: RDF merges naturally if the URIs are the same (or can be converted into one another by a fixed recipe)
  9. Second challenge: the output of each layer is required to be stable • Components and layers can be interchanged • OLiA provides an ontological interface
  13. Workplan • EU deliverable almost finished • Integration of Snowball stemming and the Stanford Parser • Next step: integration of knowledge extraction tools (Zemanta, DBpedia Spotlight, Alchemy, OpenCalais) • Web service that reads NIF and outputs NIF • Google Code project: http://code.google.com/p/nlp2rdf/
  14. Future • NIF allows NLP output to be represented using knowledge representation formalisms (RDF/OWL) • NLP output can be mixed with other knowledge (e.g. Wikipedia/DBpedia) • Good foundation for optimizing machine learning: • Choose the best algorithms • Choose the best data
  15. Reasons for Open Data • Horváth et al. (ILP 2009): "A Logic-Based Approach to Relation Extraction from Texts" • POS tags and dependency trees in first-order logic • ILP machine learning approach • TIDES Extraction (ACE) 2003 Multilingual Training Data • Closed licence • About 3000 US $ • Barrier to the reproduction of results • The authors could send me a (p)(r)e-print, but not a copy of the benchmark™
  16. Thank you for your attention!
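
The idea from slides 4 to 8, giving a substring its own URI so that RDF output from different tools merges on it, can be sketched roughly as follows. This is a minimal illustration and not the NIF specification: the offset-based URI recipe, the `offset_uri` helper, the `ex:` property names, and the `http://example.org/doc.txt` document URI are all assumptions made for this example, and the example URIs shown on the original slide are not preserved in this transcript.

```python
from urllib.parse import quote


def offset_uri(doc_uri: str, text: str, begin: int, end: int) -> str:
    """Mint a URI for text[begin:end] using a hypothetical offset-based recipe."""
    anchor = quote(text[begin:end])  # percent-encode the annotated substring
    return f"{doc_uri}#offset_{begin}_{end}_{anchor}"


text = "The Semantic Web is a web of data."
doc = "http://example.org/doc.txt"  # assumed document URI

# Locate the substring to annotate and mint its URI.
begin = text.index("Semantic Web")
end = begin + len("Semantic Web")
uri = offset_uri(doc, text, begin, end)

# Two independent tools emit (subject, predicate, object) triples about
# the same substring. The ex: properties are illustrative placeholders.
tagger_triples = {(uri, "ex:posTag", "ex:ProperNoun")}
linker_triples = {(uri, "ex:means", "http://dbpedia.org/resource/Semantic_Web")}

# Merging RDF graphs is a set union: both statements end up on one subject
# because both tools derived the same URI.
merged = tagger_triples | linker_triples
subjects = {s for s, _, _ in merged}

print(uri)
print(len(merged), "triples about", len(subjects), "subject")
```

Because both tools derive the same URI from the same document and offsets, the union of their output describes a single resource, and no tool-specific adapter is needed to line the annotations up.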