Creating Knowledge out of Interlinked Data
NIF – NLP Interchange Format
Sebastian Hellmann
AKSW, Universität Leipzig
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
Problem:
• Currently, NLP software is organized in pipelines
• Integration is „hard-wired“
– For each tool and each framework, a separate adapter has to be created
(n*m adapters for n tools and m frameworks)
• Single components are difficult to exchange
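To make the n*m estimate concrete, the following minimal sketch counts the adapters of a hard-wired setup and compares them with one wrapper per tool around a shared format. The tool and framework names are placeholders invented for this illustration; they are not taken from the slides.

```python
from itertools import product

# Placeholder names, chosen only for illustration (assumptions).
tools = ["stemmer", "parser", "entity_linker"]
frameworks = ["framework_a", "framework_b"]

# Hard-wired integration: one adapter per (tool, framework) pair.
adapters = list(product(tools, frameworks))

# Shared interchange format: one wrapper per tool.
wrappers = list(tools)

print(len(adapters), "adapters vs.", len(wrappers), "wrappers")
```

With n tools and m frameworks the first number grows as n*m, while the second grows only with the number of tools.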
Open Linguistics@OKCon 30.6.2011 2 http://lod2.eu
Overview:
• NLP tools can be integrated via a common output format (a common
pattern in Enterprise Application Integration)
• For each tool, a wrapper has to be created that reads NIF and
produces NIF
• Tools can be combined ad hoc, i.e. no pipeline needs to be
configured
• Multi-layer and overlapping annotations are possible
• Ontologies provide interfaces for each layer and for applications
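The wrapper idea can be sketched as follows, assuming that each wrapper is a function that takes a set of triples and returns it with its own statements added. The namespace, the URIs and the `ex:` property names are hypothetical; this is a toy model, not the NIF vocabulary or the nlp2rdf implementation.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Wrapper = Callable[[Graph], Graph]

DOC = "http://example.org/doc1#"  # hypothetical document namespace


def tokenizer(graph: Graph) -> Graph:
    """Adds a (hypothetical) type statement for one string URI."""
    return graph | {(DOC + "offset_0_8", "rdf:type", "ex:Word")}


def tagger(graph: Graph) -> Graph:
    """Adds a (hypothetical) part-of-speech statement for the same URI."""
    return graph | {(DOC + "offset_0_8", "ex:posTag", "ex:Adjective")}


def run(wrappers: Iterable[Wrapper], graph: Graph = frozenset()) -> Graph:
    # In this toy model each wrapper only adds statements,
    # so no fixed pipeline order has to be configured.
    for wrapper in wrappers:
        graph = wrapper(graph)
    return graph


a = run([tokenizer, tagger])
b = run([tagger, tokenizer])
print(a == b)
```

Real tools may depend on each other's output, so order independence holds only for wrappers that add statements without reading those of other tools.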
• First Challenge: Representing Strings in RDF
• How to give a part of a document or text an identifier (URI)?
• What properties can such URIs have?
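One possible answer to both questions, as a minimal sketch: the URI is built from the document URI plus character offsets, and the properties record the string itself and its position. The fragment syntax and the property names used here are assumptions made for illustration, not the exact NIF definitions.

```python
def offset_uri(doc_uri: str, text: str, begin: int, end: int) -> str:
    """Mint a fragment URI from character offsets (illustrative recipe)."""
    if not 0 <= begin < end <= len(text):
        raise ValueError("offsets outside of text")
    return f"{doc_uri}#offset_{begin}_{end}"


def describe(doc_uri: str, text: str, begin: int, end: int) -> dict:
    """Return properties such a URI could carry (hypothetical names)."""
    return {
        "uri": offset_uri(doc_uri, text, begin, end),
        "anchorOf": text[begin:end],  # the annotated string itself
        "beginIndex": begin,
        "endIndex": end,
    }


text = "The Semantic Web is a web of data."
begin = text.index("Semantic Web")
info = describe("http://example.org/doc1", text, begin, begin + len("Semantic Web"))
print(info)
```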
LOD2 Event . 06.09.2010 . Page 5 http://lod2.eu
Example URIs for annotating „Semantic Web“
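As a hedged illustration of what such a URI could look like, the sketch below derives the identifier from a digest of the string and a few characters of surrounding context, so that the URI survives edits elsewhere in the document. The fragment layout, the context length of 10 and the example sentence are assumptions; the exact syntax used by NIF may differ.

```python
import hashlib


def context_hash_uri(doc_uri: str, text: str, begin: int, end: int,
                     context: int = 10) -> str:
    """Mint a URI from a digest of the string and its neighbouring characters."""
    left = text[max(0, begin - context):begin]
    right = text[end:end + context]
    payload = f"{left}({text[begin:end]}){right}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{doc_uri}#hash_{context}_{end - begin}_{digest}"


def uri_for(text: str, phrase: str) -> str:
    begin = text.index(phrase)
    return context_hash_uri("http://example.org/doc1", text, begin,
                            begin + len(phrase))


original = "Many tools annotate the phrase Semantic Web in this sentence."
edited = "Note: " + original  # text inserted before the phrase shifts all offsets

uri_a = uri_for(original, "Semantic Web")
uri_b = uri_for(edited, "Semantic Web")
print(uri_a)
print(uri_a == uri_b)
```

The offsets of „Semantic Web“ differ between the two texts, but the digest-based URI stays the same as long as the phrase and its immediate context are unchanged.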
• URIs are used to integrate output. RDF merges naturally if the URIs
are the same (or can be converted into each other using a fixed recipe)
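The merge can be sketched with plain sets of triples: once both tools use the same URI for the same string, the union of their output already is the integrated result. The URIs, the `ex:` properties and the second fragment style (`range_4-16`) are hypothetical and only serve to show the conversion step.

```python
import re
from collections import defaultdict


def normalize(uri: str) -> str:
    """Convert a hypothetical '#range_4-16' fragment into the '#offset_4_16' form."""
    return re.sub(r"#range_(\d+)-(\d+)$", r"#offset_\1_\2", uri)


# Output of two independent tools as (subject, predicate, object) triples.
tagger_output = {
    ("http://example.org/doc1#offset_4_16", "ex:category", "ex:NounPhrase"),
}
linker_output = {
    ("http://example.org/doc1#range_4-16", "ex:means",
     "http://dbpedia.org/resource/Semantic_Web"),
}

# Merging is a set union after bringing the subjects into one URI form.
merged = {(normalize(s), p, o) for s, p, o in tagger_output | linker_output}

by_subject = defaultdict(dict)
for s, p, o in merged:
    by_subject[s][p] = o

print(dict(by_subject))
```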
• Second challenge: The output of each layer is required to be stable
• Components and layers can then be interchanged
• OLiA (Ontologies of Linguistic Annotation) provides an ontological
interface
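How an ontological interface keeps a layer stable can be sketched as follows: two taggers with different tagsets are mapped to shared classes, and the application only queries the shared classes. The two mapping tables are heavily simplified and hypothetical; the `olia:` names are meant to resemble OLiA classes and are not a verified excerpt of the ontology.

```python
# Hypothetical, simplified mappings from tool-specific tags to shared classes.
TAGSET_A = {"NN": "olia:CommonNoun", "NNP": "olia:ProperNoun", "JJ": "olia:Adjective"}
TAGSET_B = {"N": "olia:CommonNoun", "NP": "olia:ProperNoun", "ADJ": "olia:Adjective"}


def to_shared(tagged, mapping):
    """Replace tool-specific tags by shared ontology classes."""
    return [(token, mapping[tag]) for token, tag in tagged]


def adjectives(annotated):
    """Application code: depends only on the shared classes, not on a tagset."""
    return [token for token, cls in annotated if cls == "olia:Adjective"]


out_a = to_shared([("Semantic", "JJ"), ("Web", "NN")], TAGSET_A)
out_b = to_shared([("Semantic", "ADJ"), ("Web", "N")], TAGSET_B)

print(adjectives(out_a) == adjectives(out_b))
```

Swapping one tagger for the other changes only the mapping table, not the application code.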
Workplan
• EU Deliverable almost finished
• Integration of SnowballStemming and the Stanford Parser
• Next step: Integration of Knowledge Extraction tools (Zemanta,
DBpedia Spotlight, Alchemy, OpenCalais)
• Web service that reads NIF and outputs NIF
• Google Code Project: http://code.google.com/p/nlp2rdf/
Future
• NIF allows NLP output to be represented using knowledge representation
formalisms (RDF/OWL)
• NLP output can be mixed with other knowledge (e.g. Wikipedia/DBpedia)
• Good foundation for optimizing machine learning:
– Choose the best algorithms
– Choose the best data
Reasons for Open Data
• Horváth et al. (ILP 2009): „A Logic-Based Approach to Relation
Extraction from Texts“
– POS tags and dependency trees in first-order logic
– ILP machine learning approach
• TIDES Extraction (ACE) 2003 Multilingual Training Data
– closed licence
– about 3000 US$
– Barrier to reproducing the results
• The authors could send me a (p)(r)e-print, but not a copy of the
benchmark™
Thank you for your attention!