Creating Knowledge out of Interlinked Data
          Pisa – 2012/10/05 – Page 1                                         http://lod2.eu




                            NIF 2.0 Draft

                         http://slideshare.net/kurzum




                                                        http://nlp2rdf.org
                                                          http://lod2.eu
                                               Sebastian Hellmann
                                                        AKSW, Universität Leipzig




          Introduction

The NLP Interchange Format (NIF) is an RDF/OWL-based format
that aims to achieve interoperability between Natural Language
Processing (NLP) tools, language resources and annotations.


 • Version 1.0 published in November 2011
 • Version 2.0 is scheduled for completion within 2012




 NIF Introduction




            Addressing primary data

● file://path/on/my/local/drive/log.txt
● http://www.w3.org/DesignIssues/LinkedData.html
● “We the People of the United States, in Order to form a more perfect Union, ...”

NIF: use a document URI and add “#offset_717_729” to address a
substring of the text from index 717 to 729

- file://path/on/my/local/drive/log.txt#offset_717_729
- http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
- urn:content-item-sha1-1bf4330e5a4a707f381513b2e7#offset_717_729
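
The offset scheme above can be sketched in a few lines of Python. The helper names `offset_uri` and `resolve_offset` and the example.org document URI are illustrative and not part of NIF. The sketch assumes zero-based indices with an exclusive end index, which the slide does not spell out.

```python
import re

# Matches a trailing "#offset_<begin>_<end>" fragment.
_OFFSET = re.compile(r"#offset_(\d+)_(\d+)$")


def offset_uri(document_uri: str, begin: int, end: int) -> str:
    """Append an offset fragment to a document URI (hypothetical helper)."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    return f"{document_uri}#offset_{begin}_{end}"


def resolve_offset(uri: str, text: str) -> str:
    """Return the substring of `text` that an offset URI addresses.

    Assumption of this sketch: begin is zero-based and end is exclusive.
    """
    match = _OFFSET.search(uri)
    if match is None:
        raise ValueError(f"no offset fragment in {uri!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if end < begin or end > len(text):
        raise ValueError("offsets lie outside the text")
    return text[begin:end]


text = "We the People of the United States"
uri = offset_uri("http://example.org/constitution.txt", 7, 13)
print(uri)                        # the document URI plus "#offset_7_13"
print(resolve_offset(uri, text))  # the six characters starting at index 7
```

Because the fragment only carries two integers, the same function works for file, http and urn document URIs alike.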




 Example




           Normalizing Text

NIF provides URIs for strings of Unicode characters. Text is normalized
to Unicode Normalization Form C (NFC), and offsets are counted in code units.

For all NIF URIs, the universe of discourse is then the set of words
over the alphabet of Unicode characters (sometimes written Σ*).
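
Why normalization and the counting unit matter can be shown with Python's standard `unicodedata` module. This is only a sketch: the slide says "code units" without naming an encoding form, so UTF-16 is assumed here for illustration, and `utf16_code_units` is a hypothetical helper.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units (assumed encoding form, see lead-in)."""
    return len(text.encode("utf-16-le")) // 2


# The same visible word in decomposed form: 'e' + COMBINING ACUTE ACCENT.
decomposed = "Cafe\u0301"
composed = nfc(decomposed)

# Without normalization, offsets computed by two tools would disagree.
print(len(decomposed), len(composed))

# A character outside the Basic Multilingual Plane: one code point,
# but more than one UTF-16 code unit.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
print(len(clef), utf16_code_units(clef))
```

Two tools agree on an offset only if they normalize the text the same way and count in the same unit, which is what the rule above fixes.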

These URIs can become subjects in RDF triples:
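
The original slide shows the triples as an image. One way such triples could look is sketched below in Python, using only the standard library to emit N-Triples lines. The namespace and the property names `anchorOf`, `beginIndex` and `endIndex` are assumptions taken from the later published NIF core ontology, not from this slide deck, and `string_triples` is a hypothetical helper.

```python
# Assumed namespace of the NIF core ontology (not stated on the slides).
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def _escape(literal: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (
        literal.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def string_triples(document_uri: str, text: str, begin: int, end: int) -> list:
    """Describe one substring with three triples (illustrative only)."""
    subject = f"<{document_uri}#offset_{begin}_{end}>"
    anchor = _escape(text[begin:end])
    return [
        f'{subject} <{NIF}anchorOf> "{anchor}" .',
        f'{subject} <{NIF}beginIndex> "{begin}"^^<{XSD}nonNegativeInteger> .',
        f'{subject} <{NIF}endIndex> "{end}"^^<{XSD}nonNegativeInteger> .',
    ]


triples = string_triples("http://example.org/doc.txt", "We the People", 7, 13)
for line in triples:
    print(line)
```

Any further annotation, such as a part-of-speech tag, would reuse the same subject URI and add one more triple.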




Structural interoperability:
- based on RDF
- defines how text is treated and counted
- compatible with RFC 5147 “#char=717,729”
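
The RFC 5147 compatibility in the last item can be illustrated as a plain rewrite of the fragment. This is a sketch under assumptions: `to_rfc5147` is a hypothetical helper, and it takes for granted that both schemes count positions the same way, which the compatibility claim suggests but this code does not verify. RFC 5147 fragments are defined for text/plain resources.

```python
import re

# A document URI followed by a NIF offset fragment.
_NIF_OFFSET = re.compile(r"^(?P<doc>[^#]*)#offset_(?P<begin>\d+)_(?P<end>\d+)$")


def to_rfc5147(nif_uri: str) -> str:
    """Rewrite '#offset_B_E' as the RFC 5147 range form '#char=B,E'."""
    match = _NIF_OFFSET.match(nif_uri)
    if match is None:
        raise ValueError(f"not an offset-based URI: {nif_uri!r}")
    return f"{match['doc']}#char={match['begin']},{match['end']}"


converted = to_rfc5147(
    "http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729"
)
print(converted)
```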




 As a Web Service




 NIF Combinator



http://nlp2rdf.lod2.eu/demo.php




 NIF Combinator




            Conceptual Interoperability

NIF can be extended by Vocabulary Modules

OLiA (Ontologies of Linguistic Annotation)




http://purl.org/olia




           Conceptual Interoperability

NIF can be extended by Vocabulary Modules

Apache Stanbol




http://stanbol.apache.org/




           Scalability




https://bitbucket.org/srfgkmt/stanbol-nlp




              Scalability

-   Less problematic if NIF is used only as an exchange format
-   RDF is flexible and well suited to data integration, but it is not fast
-   NIF is very compact (1-3 triples per annotation)
-   Inference possible, but optional
-   Other formats add overhead as well (e.g. SOAP-XML)
-   NIF Web services are RESTful
-   JSON-LD might be the best option for serialization




 Scalability




         NIF 2.0 - plans

• NIF 2.0 aims to be compatible with the following (as Vocabulary Modules):
    • FISE used in Apache Stanbol (IKS-EU Project)
    • LAF/GrAF XML – ISO standard, recently published
    • Fragment Identifiers by IETF and W3C
    • Lemon ontology from Monnet EU Project
    • NERD ontology from EURECOM and LinkedTV EU Project
    • XPointer/XPath URI scheme




           Impact NIF

Impact:
 • Around 600 feedback items or events (email requests,
    presentation Q&A, personal questions, 70 people on the
    mailing list)
 • Five known third-party implementations (one for GATE JAPE)
 • Over 1 million requests per month on the demo web services
 • Projects that have announced interest / are working on a NIF
    wrapper: LODifier, Apache Stanbol, LAPPS (NSF project),
    Tipalo/Fred, DKPro (UIMA instantiation), ITS 2.0 test suites




          Impact NIF




NIF will likely be the recommended RDF conversion for the
Internationalization Tag Set 2.0 W3C standard (ITS 2.0):
http://www.w3.org/TR/its20/




            Thanks for your attention

Open Community – All feedback is welcome!
http://slideshare.net/kurzum
Direct email:
http://bis.informatik.uni-leipzig.de/SebastianHellmann
Public Mailing List:
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
Wiki (collection of use cases and issues):
http://wiki.nlp2rdf.org/wiki/Use_cases_and_requirements#Use_cases
http://wiki.nlp2rdf.org/wiki/Issues
Website:
http://nlp2rdf.org
