1.
Creating Knowledge out of Interlinked Data
NIF – NLP Interchange Format
Sebastian Hellmann
AKSW, Universität Leipzig
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
2.
Problem:
• Currently, NLP software is organized in pipelines
• Integration is "hard-wired"
– An adapter has to be created for each combination of tool and framework (n × m adapters for n tools and m frameworks)
• It is difficult to exchange single components
Open Linguistics@OKCon 30.6.2011 2 http://lod2.eu
3.
Overview:
• NLP tools can be integrated via a common output format (a common pattern in Enterprise Application Integration)
• For each tool, a wrapper needs to be created that reads NIF and produces NIF
• Tools can be combined ad hoc, i.e., there is no pipeline that needs to be configured
• Multi-layer and overlapping annotations are possible
• Ontologies provide interfaces for each layer and for applications
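The difference in integration effort between hard-wired adapters and a common format can be made concrete with a little arithmetic. The Python sketch below only counts components: n × m adapters for hard-wired integration versus one NIF-reading, NIF-writing wrapper per tool. The function names and the ecosystem sizes are illustrative assumptions, not part of NIF.

```python
def pairwise_adapters(tools: int, frameworks: int) -> int:
    """Hard-wired integration: one adapter per (tool, framework) pair."""
    return tools * frameworks


def nif_wrappers(tools: int) -> int:
    """Common format: one wrapper per tool that reads and writes NIF."""
    return tools


if __name__ == "__main__":
    # Hypothetical sizes of an NLP ecosystem.
    for n, m in [(5, 3), (10, 4), (20, 6)]:
        print(f"{n} tools, {m} frameworks: "
              f"{pairwise_adapters(n, m)} adapters vs. {nif_wrappers(n)} wrappers")
```

Under the assumption that frameworks consume NIF directly, adding a new framework costs n new adapters in the hard-wired case and no new wrappers in the common-format case.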
4.
• First challenge: representing strings in RDF
• How can a part of a document or text be given an identifier (URI)?
• What properties can such URIs have?
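One possible answer to both questions is sketched below in Python: a substring is identified by the document URI plus its character offsets, and the resulting URI carries the substring itself and its begin/end offsets as properties. The `#char=begin,end` fragment syntax is borrowed from RFC 5147 as an assumption; the field names `anchor_of`, `begin_index` and `end_index` are hypothetical and not taken from the NIF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringAnnotation:
    uri: str          # identifier of the substring
    anchor_of: str    # the annotated substring itself (hypothetical property)
    begin_index: int  # inclusive character offset into the document
    end_index: int    # exclusive character offset into the document


def mint_offset_uri(doc_uri: str, begin: int, end: int) -> str:
    # Assumed RFC 5147-style fragment identifier: #char=begin,end
    return f"{doc_uri}#char={begin},{end}"


def annotate(doc_uri: str, text: str, begin: int, end: int) -> StringAnnotation:
    """Give the substring text[begin:end] a URI and attach its properties."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError("offsets out of range")
    return StringAnnotation(mint_offset_uri(doc_uri, begin, end),
                            text[begin:end], begin, end)


if __name__ == "__main__":
    text = "We work on the Semantic Web."
    begin = text.find("Semantic Web")
    ann = annotate("http://example.org/doc1", text, begin, begin + len("Semantic Web"))
    print(ann.uri)  # http://example.org/doc1#char=15,27
```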
5.
LOD2 Event . 06.09.2010 . Page 5 http://lod2.eu
6.
Example URIs for annotating "Semantic Web"
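The example URIs themselves are not preserved in this transcript. The Python sketch below illustrates two hypothetical recipes for minting a URI for "Semantic Web": one based on character offsets and one based on a hash of the string plus its surrounding context. The `offset_`/`hash_` fragment formats, the 5-character context window and the use of MD5 are illustrative assumptions, not the recipes shown on the original slide.

```python
import hashlib


def offset_uri(doc_uri: str, begin: int, end: int) -> str:
    """Position-based recipe: the URI depends on where the string occurs."""
    return f"{doc_uri}#offset_{begin}_{end}"


def context_hash_uri(doc_uri: str, text: str, begin: int, end: int,
                     window: int = 5) -> str:
    """Content-based recipe: hash of the string plus a few characters of context."""
    left = text[max(0, begin - window):begin]
    right = text[end:end + window]
    key = f"{left}({text[begin:end]}){right}"
    return f"{doc_uri}#hash_{hashlib.md5(key.encode('utf-8')).hexdigest()}"


if __name__ == "__main__":
    doc = "http://example.org/doc1"
    # The second version inserts text before the annotated string.
    for text in ["We work on the Semantic Web.",
                 "Today, we work on the Semantic Web."]:
        b = text.find("Semantic Web")
        e = b + len("Semantic Web")
        print(offset_uri(doc, b, e), context_hash_uri(doc, text, b, e))
```

In this sketch the offset URI changes when text is inserted before "Semantic Web", while the context-hash URI stays the same because the surrounding five characters are unchanged; conversely, the hash recipe yields identical URIs if the same string occurs twice with identical context.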
7.
8.
• URIs are used to integrate output: RDF graphs merge naturally if the URIs are the same (or can be converted into each other using a fixed recipe)
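A minimal Python sketch of this idea, treating an RDF graph as a plain set of (subject, predicate, object) triples: two tools annotate the same span, one of them under a different URI prefix, and a prefix-rewriting recipe makes the URIs identical before the merge. The `ex:` predicates, the `urn:tool-b:` prefix and both tool outputs are hypothetical; a real implementation would use an RDF library.

```python
Triple = tuple[str, str, str]


def convert_prefix(graph: set[Triple], old: str, new: str) -> set[Triple]:
    """Recipe for making URIs comparable: rewrite one URI prefix into another."""
    def fix(term: str) -> str:
        return new + term[len(old):] if term.startswith(old) else term
    return {(fix(s), p, fix(o)) for s, p, o in graph}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merging graphs that share URIs is the set union of their triples
    (ignoring blank nodes, which this sketch does not model)."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


if __name__ == "__main__":
    span = "http://example.org/doc1#char=15,27"
    chunker = {(span, "ex:chunkType", "NP")}
    linker = {("urn:tool-b:doc1#char=15,27", "ex:linksTo",
               "http://dbpedia.org/resource/Semantic_Web")}
    linker = convert_prefix(linker, "urn:tool-b:doc1", "http://example.org/doc1")
    for triple in sorted(merge(chunker, linker)):
        print(triple)  # both triples now share the same subject URI
```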
9.
• Second challenge: the output of each layer must be stable
• Components and layers can be interchanged
• OLiA (Ontologies of Linguistic Annotation) provides an ontological interface
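The role of such an ontological interface can be sketched in Python: tags from different tagsets (here Penn Treebank and STTS) are mapped onto shared reference concepts, so an application depends only on the concept, not on the tagger that produced the annotation. The mapping table is a hand-written, hypothetical excerpt with illustrative concept names; the actual OLiA ontologies express these links in OWL and are far richer.

```python
from typing import Optional

# Hypothetical excerpt of tag-to-reference-concept links (illustrative only).
TAG_TO_CONCEPT = {
    ("penn", "NN"): "olia:CommonNoun",
    ("penn", "NNS"): "olia:CommonNoun",
    ("penn", "NNP"): "olia:ProperNoun",
    ("stts", "NN"): "olia:CommonNoun",
    ("stts", "NE"): "olia:ProperNoun",
}


def reference_concept(tagset: str, tag: str) -> Optional[str]:
    """Map a tool-specific tag to a tagset-independent concept (None if unknown)."""
    return TAG_TO_CONCEPT.get((tagset, tag))


def proper_nouns(tokens: list[tuple[str, str, str]]) -> list[str]:
    """An application query that works regardless of which tagger was used."""
    return [word for word, tagset, tag in tokens
            if reference_concept(tagset, tag) == "olia:ProperNoun"]


if __name__ == "__main__":
    tokens = [
        ("Sebastian", "penn", "NNP"),   # output of an English tagger
        ("Leipzig", "stts", "NE"),      # output of a German tagger
        ("Universität", "stts", "NN"),
    ]
    print(proper_nouns(tokens))  # ['Sebastian', 'Leipzig']
```

Swapping the German tagger for another one then only requires a new mapping table, not a change to `proper_nouns`.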
10.
11.
12.
13.
Workplan
• EU deliverable almost finished
• Integration of Snowball stemming and the Stanford Parser
• Next step: integration of knowledge extraction tools (Zemanta, DBpedia Spotlight, Alchemy, OpenCalais)
• Web services that read NIF and output NIF
• Google Code Project: http://code.google.com/p/nlp2rdf/
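The "read NIF, output NIF" contract can be illustrated with two toy wrappers in Python. Each wrapper is a function from a set of triples to a larger set of triples, so wrappers can be combined ad hoc instead of being configured as a fixed pipeline. The `ex:` predicates, the `#char=` URIs and the naive suffix-stripping stemmer are hypothetical stand-ins, not the Snowball or Stanford Parser integrations listed above, and the web-service layer is omitted.

```python
import re
from typing import Callable

Triple = tuple[str, str, str]
Graph = set[Triple]
Wrapper = Callable[[Graph], Graph]


def tokenize(graph: Graph) -> Graph:
    """Toy tokenizer wrapper: adds one URI per word of every context string."""
    out = set(graph)
    for doc, pred, text in graph:
        if pred == "ex:isString":
            for m in re.finditer(r"\w+", text):
                out.add((f"{doc}#char={m.start()},{m.end()}", "ex:anchorOf", m.group()))
    return out


def stem(graph: Graph) -> Graph:
    """Toy stemmer wrapper: strips a final 's' (NOT the Snowball algorithm)."""
    out = set(graph)
    for word_uri, pred, word in graph:
        if pred == "ex:anchorOf":
            out.add((word_uri, "ex:stem",
                     word[:-1] if len(word) > 3 and word.endswith("s") else word))
    return out


def run(graph: Graph, wrappers: list[Wrapper]) -> Graph:
    """Ad hoc combination: each wrapper reads triples and writes triples back."""
    for wrapper in wrappers:
        graph = wrapper(graph)
    return graph


if __name__ == "__main__":
    doc = "http://example.org/doc1"
    result = run({(doc, "ex:isString", "Parsers read texts")}, [tokenize, stem])
    for triple in sorted(result):
        print(triple)
```

Because every wrapper consumes and produces the same kind of graph, the order is not hard-wired: running `stem` before `tokenize` simply adds no stems instead of failing.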
14.
Future
• NIF makes it possible to represent NLP output using knowledge representation formalisms (RDF/OWL)
• It can be mixed with other knowledge (e.g., Wikipedia/DBpedia)
• A good foundation for optimizing machine learning:
• Choose the best algorithms
• Choose the best data
15.
Reasons for Open Data
• Horváth et al. (ILP 2009): "A Logic-Based Approach to Relation Extraction from Texts"
• POS tags and dependency trees in first-order logic
• ILP machine learning approach
• TIDES Extraction (ACE) 2003 Multilingual Training Data
• Closed licence
• About US$ 3,000
• Barrier to reproducing the results
• The authors could send me a (p)(r)e-print, but not a copy of the benchmark™
16.
Thank you for your attention!