Linked Data for Abbreviations and Segmentation

Usage Rights

© All Rights Reserved

Presentation Transcript

  • ULI meeting – 2013/05/28 – Page 1 – http://lod2.eu
      Linked Data for Abbreviations and Segmentation
      Sebastian Hellmann, AKSW, Universität Leipzig
      Creating Knowledge out of Interlinked Data
      http://nlp2rdf.org
      http://lod2.eu
  • Page 2: Introduction
      Sebastian Hellmann – researcher working on the LOD2 EU Project
      AKSW – Agile Knowledge Engineering and Semantic Web research group in Leipzig - http://aksw.org
      InfAI – Institute for Applied Informatics - http://infai.org
      Contents:
      • Introduction to Linked Data
      • Linked data close-up: DBpedia data set
      • Exploitation of free and open data for CLDR
      • Collaboration points
  • Page 4: Linked Open Data (http://lod-cloud.net)
      - All datasets provide open access to individual records via HTTP
      - Many are free (no payment required, as in royalty-free)
      - Some are openly licensed, e.g. CC-0 or CC-BY-SA
      => Open access also applies to published HTML on the WWW, but here the data itself is published unrendered via RDF
  • Page 6
      • DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.
      • allows sophisticated queries against Wikipedia content
      • allows links from the different data sets on the Web to Wikipedia data
      • data is extracted continuously
      • WikiData will be integrated within the next four months via a Google Summer of Code project
  • Page 7: paragraph in more than 20 languages
  • Page 8: from Wikipedia infoboxes
  • Page 10: labels
  • Page 11: Trend 1: I18n
  • Page 12: Trend 2: DBpedia 4 NLP
      • The DBpedia Extraction Framework can easily be extended to extract any data from Wikipedia:
      • We are using it to extract corpora for NLP
      • e.g. URI, surrounding text, surface form (sf)
      • Probabilities:
      • P(URI|sf): probability that the surface form “apple” refers to wikipedia:Apple_Inc.
      • P(sf|URI): probability that wikipedia:Apple_Inc. appears as “apple” in text
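
The two conditional probabilities on this slide can be estimated from link counts. The following is a minimal sketch, not the DBpedia Extraction Framework itself: the `links` list is invented sample data standing in for (surface form, URI) pairs harvested from Wikipedia links, and the function names are hypothetical.

```python
from collections import Counter

# Invented sample data: (surface form, URI) pairs as they might be
# harvested from Wikipedia link anchors. Not real DBpedia statistics.
links = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(links)
sf_counts = Counter(sf for sf, _ in links)
uri_counts = Counter(uri for _, uri in links)


def p_uri_given_sf(uri, sf):
    """Share of occurrences of surface form `sf` that link to `uri`."""
    total = sf_counts[sf]
    return pair_counts[(sf, uri)] / total if total else 0.0


def p_sf_given_uri(sf, uri):
    """Share of links to `uri` that use the surface form `sf`."""
    total = uri_counts[uri]
    return pair_counts[(sf, uri)] / total if total else 0.0


print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))
print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))
```

In this toy sample, "apple" occurs three times and links to Apple_Inc. twice, while Apple_Inc. is linked four times in total, so the two estimates differ even though they share the same numerator.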
  • Page 13: Trend 2: DBpedia 4 NLP
      • DBpedia is a data dissemination project:
      • as download for reuse
      • as Linked Data for interlinking
      • Corpora will be published via the NLP Interchange RDF Format (NIF) - http://nlp2rdf.org
  • Page 14: DBpedia Live Abbreviation Example
      Up-to-date gazetteer
      - AFD party was founded earlier this year.
      - lexical information and statistics could be included
  • Page 15: Linguistic LOD Cloud
  • Page 16
      • DBpedia
      • Main version and I18n chapters
      • Wiktionary 2 RDF
      • Wortschatz from Uni Leipzig (planned as Linked Data)
      • JRC Names
      • JRC-Names is a highly multilingual named entity resource for person and organisation names
      • provides URIs for ISO 639-3
      • data sets from LLOD
  • Page 17: Linguistic LOD
      => CLDR will make an excellent addition to LLOD
  • Page 18: Collaboration points I
      • CLDR as Linked Data
      • empowers third parties to link to your authoritative data
      • links are reusable
      • LIDER EU project (presumably starting in October) will provide some support for linked data adopters
      • ULI members can join the industry and advisory board
      • Workshop “DBpedia & NLP” in Oct, 2013
      • Creation of free and open benchmarks in RDF
      • We could promote CLDR and collect contributions
  • Page 19: Collaboration points II
      • Personally, I can:
      • Join ULI mailing list
      • Look out for appropriate data
      • Look for opportunities (e.g. synergies with other projects)
      • Provide some counseling (e.g. pointers, technology Q&A)
      => this will be done as preparation for the LIDER EU project, CLDR
      • Academic collaboration:
      • Excellent PhD student topic: Create corpora, interlink and fuse data and benchmark effectiveness for segmentation
      • Provide knowledge transfer (e.g. tutorials, visits)
  • Page 20: Open Community – All feedback is welcome! Thank you for your attention.
  • Page 21: Wiktionary Example
  • Page 22: NLP Interchange Format 2.0
      The LOD2 EU Project produces the LOD2 Stack.
      Three requirements to unlock Natural Language Processing (NLP) for the project:
      1. NLP tool output is required to be in RDF
      2. Scalability (fewer triples, focus on usefulness)
      3. Common vocabulary to integrate and use NLP tools
      The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
      • Version 1.0 published in November 2011
      • Version 2.0 is scheduled for completion within 2013
  • Page 23: NIF Architecture
  • Page 24: Addressing Primary Data
  • Page 25: Addressing Primary Data
      • NIF 1.0: [...]
      • NIF 2.0 uses RFC 5147: [...],729
      • User extensions possible (you have to link to documentation on how it was created)
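
RFC 5147 defines fragment identifiers for plain text, including a `char=begin,end` scheme that addresses a character range. As a rough illustration of how a substring of the primary data could be addressed this way, here is a minimal sketch. The base URI `http://example.org/doc.txt`, the sample sentence, and both function names are assumptions made for this example and are not taken from the slides or from the NIF specification.

```python
def char_fragment(base_uri, text, substring):
    """Build an RFC 5147 style URI for the first occurrence of `substring`.

    Positions count characters from 0, so they line up with Python slicing.
    Raises ValueError if `substring` does not occur in `text`.
    """
    begin = text.index(substring)
    end = begin + len(substring)
    return f"{base_uri}#char={begin},{end}"


def resolve_char_fragment(uri, text):
    """Return the part of `text` that a `#char=begin,end` URI points at."""
    fragment = uri.split("#", 1)[1]
    scheme, _, positions = fragment.partition("=")
    if scheme != "char":
        raise ValueError(f"unsupported fragment scheme: {scheme!r}")
    begin, end = (int(p) for p in positions.split(","))
    return text[begin:end]


# Hypothetical document and base URI, for illustration only.
text = "My favourite actress is Natalie Portman."
uri = char_fragment("http://example.org/doc.txt", text, "Natalie Portman")
print(uri)
print(resolve_char_fragment(uri, text))
```

Because the fragment only encodes offsets, the URI stays valid only as long as the underlying text does not change.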
  • Page 26: As a Web Service
      curl --data-urlencode prefix="" --data-urlencode input="[...]" (--data-urlencode source="")
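
The curl call on this slide sends form-encoded parameters named `prefix`, `input` and, optionally, `source`. A comparable request could be prepared in Python roughly as follows. This is only a sketch: the endpoint URL is missing from the transcript, so `http://example.org/nif-service` is a placeholder, and the function name, sample text and prefix value are likewise made up. The request is built but deliberately not sent.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_nif_request(endpoint, text, prefix, source=None):
    """Prepare a form-encoded POST with the parameters named on the slide."""
    params = {"prefix": prefix, "input": text}
    if source is not None:
        params["source"] = source  # optional, as on the slide
    body = urlencode(params).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


# Placeholder endpoint and values; the real service URL is not in the transcript.
request = build_nif_request(
    "http://example.org/nif-service",
    "My favourite actress is Natalie Portman.",
    "http://example.org/doc#",
)
decoded = parse_qs(request.data.decode("utf-8"))
print(request.get_method(), request.full_url)
print(decoded)
```

Passing the prepared object to `urllib.request.urlopen` would actually send it; that step is left out here because the endpoint is hypothetical.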
  • Page 27: Vocabulary Module: OLiA
      • Tibeto-Burman languages
      • Russian TreeTagger
      • German STTS
      • English Penn
      → all map to [...]
      The Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free and open, CC-BY)
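
The idea of mapping tags from different tagsets to one shared category can be pictured with a small lookup table. This is a strongly simplified sketch and not how OLiA itself is implemented (OLiA expresses the mappings as ontologies). The namespace, the class names `CommonNoun` and `ProperNoun`, the four table entries and the function names are assumptions chosen for illustration.

```python
# Assumed namespace and class names, for illustration only.
OLIA = "http://purl.org/olia/olia.owl#"

# Hand-written sample entries: (tagset, tag) -> shared category URI.
TAG_TO_CATEGORY = {
    ("penn", "NN"): OLIA + "CommonNoun",
    ("stts", "NN"): OLIA + "CommonNoun",
    ("penn", "NNP"): OLIA + "ProperNoun",
    ("stts", "NE"): OLIA + "ProperNoun",
}


def to_category(tagset, tag):
    """Look up the shared category for a tagset-specific tag, or None."""
    return TAG_TO_CATEGORY.get((tagset.lower(), tag))


def same_category(left, right):
    """True if two (tagset, tag) pairs map to the same known category."""
    a, b = to_category(*left), to_category(*right)
    return a is not None and a == b


print(to_category("Penn", "NNP"))
print(same_category(("penn", "NNP"), ("stts", "NE")))
```

With such a table, output from an English tagger and a German tagger can be compared on the shared category instead of on the tagset-specific labels.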
  • Page 28: NIF 2.0 - plans
      • NIF 2.0 tries to be compatible with (Vocabulary Module):
      • ITS 2.0
      • FISE used in Apache Stanbol (IKS-EU Project)
      • LAF/GrAF XML – ISO standard, recently published
      • Fragment Identifiers by IETF and W3C
      • Lemon ontology from Monnet EU Project
      • NERD ontology from EURECOM and LinkedTV EU Project
      • XPointer/XPath URI scheme
      • Open Annotation
  • Page 29: NIF 2.0: you can contribute
      • NIF is free and open (CC-0 or CC-BY)
      • All ontologies will be hosted persistently by Universität Leipzig
      • Sign up on the mailing list at [...]
      • Provide Use Cases, Requirements, Implementations at: [...]
  • Page 30: LOD2 Stack: you can contribute
      • The project is currently at half-time
      • Most of the tools are free and open source
      • Commercial rollout planned
      • Many webinars available
      • You can integrate your tool via a Debian package
      http://lod2.eu