Linked Data for Abbreviations and Segmentation

Published in: Technology, Education

  1. Linked Data for Abbreviations and Segmentation
     ULI meeting – 2013/05/28 – http://lod2.eu
     LOD2: Creating Knowledge out of Interlinked Data
     Sebastian Hellmann, AKSW, Universität Leipzig
     http://nlp2rdf.org
     http://lod2.eu
     http://slideshare.net/kurzum
  2. Introduction
     Sebastian Hellmann: researcher working on the LOD2 EU project
     AKSW: Agile Knowledge Engineering and Semantic Web research group in Leipzig, http://aksw.org
     InfAI: Institute for Applied Informatics, http://infai.org
     Contents:
     • Introduction to Linked Data
     • Linked Data close-up: the DBpedia data set
     • Exploitation of free and open data for CLDR
     • Collaboration points
  3. http://lod-cloud.net
  4. Linked Open Data (http://lod-cloud.net)
     - All datasets provide open access to individual records via HTTP
     - Many are free (no payment required, as in royalty-free)
     - Some are openly licensed, e.g. CC-0 or CC-BY-SA
     => Open access also applies to HTML published on the WWW, but here the data itself is published unrendered, as RDF
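The point about records being served "unrendered via RDF" can be illustrated with HTTP content negotiation. The sketch below only prepares a request and does not send it; the choice of `text/turtle` as the media type and the helper name `build_rdf_request` are assumptions made for illustration, and the resource URI is the Berlin example used later in the deck.

```python
from urllib.request import Request


def build_rdf_request(resource_uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) a GET request that asks for RDF instead of HTML."""
    return Request(resource_uri, headers={"Accept": media_type}, method="GET")


# Hypothetical usage with a resource URI that appears on a later slide.
request = build_rdf_request("http://dbpedia.org/resource/Berlin")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the `Accept` header depends on that server.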
  5. http://dbpedia.org
  6. http://dbpedia.org
     • DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.
     • Allows sophisticated queries against Wikipedia content
     • Allows links from the different data sets on the Web to Wikipedia data
     • Data is extracted continuously: http://live.dbpedia.org
     • WikiData will be integrated within the next four months via a Google Summer of Code project
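As a rough illustration of "queries against Wikipedia content", the sketch below builds a SPARQL query for the labels of one resource and encodes it as a GET URL following the SPARQL protocol's `query` parameter. The endpoint address `http://dbpedia.org/sparql` is not named on the slide and is an assumption here, as are the helper names; nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public DBpedia SPARQL endpoint (not stated on the slide).
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"


def build_label_query(resource_uri: str) -> str:
    """Return a SPARQL query selecting every rdfs:label of the given resource."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        "}"
    )


def build_query_url(query: str) -> str:
    """Encode the query as a GET URL using the standard 'query' parameter."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"


query = build_label_query("http://dbpedia.org/resource/Berlin")
url = build_query_url(query)
print(url)
```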
  7. http://dbpedia.org/resource/Berlin
     First paragraph in more than 20 languages
  8. http://dbpedia.org/resource/Berlin
     Facts from Wikipedia infoboxes
  9. http://dbpedia.org/resource/Berlin
     Several hierarchical classifications
  10. http://dbpedia.org/resource/Berlin
      Links
      Multilingual labels
  11. Trend 1: I18n
  12. Trend 2: DBpedia 4 NLP
      • The DBpedia Extraction Framework can easily be extended to extract any data from Wikipedia: https://github.com/dbpedia/extraction-framework
      • We are using it to extract corpora for NLP
        • e.g. URI, surrounding text, surface form
      • Probabilities:
        • P(URI|sf): probability that “apple” refers to wikipedia:Apple_Inc.
        • P(sf|URI): probability that wikipedia:Apple_Inc. appears as “apple” in text
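Read as ordinary conditional probabilities, P(URI|sf) is the share of occurrences of a surface form that link to a given URI, and P(sf|URI) is the share of links to a URI that use a given surface form. The sketch below estimates both from raw counts. The observation list is made-up toy data, and the function names are hypothetical; the slide does not say how the estimates are actually computed.

```python
from collections import Counter

# Hypothetical (surface form, URI) pairs, as might be harvested from wiki links.
observations = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(observations)
sf_counts = Counter(sf for sf, _ in observations)
uri_counts = Counter(uri for _, uri in observations)


def p_uri_given_sf(uri: str, sf: str) -> float:
    """Relative frequency with which surface form `sf` links to `uri`."""
    total = sf_counts[sf]
    return pair_counts[(sf, uri)] / total if total else 0.0


def p_sf_given_uri(sf: str, uri: str) -> float:
    """Relative frequency with which `uri` is written as surface form `sf`."""
    total = uri_counts[uri]
    return pair_counts[(sf, uri)] / total if total else 0.0


print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))
print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))
```

These are plain maximum-likelihood estimates with no smoothing, which is the simplest possible choice rather than a recommendation.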
  13. Trend 2: DBpedia 4 NLP
      • DBpedia is a data dissemination project:
        • as a download for reuse
        • as Linked Data for interlinking
      • Corpora will be published via the NLP Interchange Format (NIF), an RDF-based format: http://nlp2rdf.org
  14. DBpedia Live Abbreviation Example
      Up-to-date gazetteer
      - The AFD party was founded earlier this year.
      - Lexical information and statistics could be included
  15. Linguistic LOD Cloud
  16. Example data sets from LLOD
      • DBpedia
        • Main version and I18n chapters
        • http://dbpedia.org/Datasets/NLP
      • Wiktionary 2 RDF: http://dbpedia.org/Wiktionary
      • Wortschatz from Uni Leipzig (planned as Linked Data)
        • http://corpora.informatik.uni-leipzig.de/download.html
      • JRC Names: http://langtech.jrc.it/JRC-Names.html
        • JRC-Names is a highly multilingual named entity resource for person and organisation names
      • Lexvo.org:
        • provides URIs for ISO 639-3 language codes
        • http://lexvo.org/id/iso639-3/spa
  17. Linguistic LOD
      http://linguistics.okfn.org/resources/llod/
      => CLDR will make an excellent addition to LLOD
  18. Collaboration points I
      • CLDR as Linked Data
        • empowers third parties to link to your authoritative data
        • links are reusable
      • LIDER EU project (presumably starting in October) will provide some support for linked data adopters
        • ULI members can join the industry and advisory board
      • Workshop “DBpedia & NLP” in October 2013
        • http://nlp-dbpedia2013.blogs.aksw.org/
        • Creation of free and open benchmarks in RDF
        • We could promote CLDR and collect contributions
  19. Collaboration points II
      • Personally, I can:
        • Join the ULI mailing list
        • Look out for appropriate data
        • Look for opportunities (e.g. synergies with other projects)
        • Provide some counseling (e.g. pointers, technology Q&A)
        => this will be done as preparation for the LIDER EU project, CLDR
      • Academic collaboration:
        • Excellent PhD student topic: create corpora, interlink and fuse data, and benchmark effectiveness for segmentation
        • Provide knowledge transfer (e.g. tutorials, visits)
  20. Thanks for your attention
      Open Community – All feedback is welcome!
      http://slideshare.net/kurzum
      Websites:
      http://dbpedia.org
      http://nlp2rdf.org
      http://lod2.eu
  21. Wiktionary Example
  22. NLP Interchange Format 2.0
      The LOD2 EU project produces the LOD2 Stack.
      Three requirements to unlock Natural Language Processing (NLP) for the project:
      1. NLP tool output is required to be in RDF
      2. Scalability (fewer triples, focus on usefulness)
      3. Common vocabulary to integrate and use NLP tools
      The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between NLP tools, language resources and annotations.
      • Version 1.0 published in November 2011
      • Version 2.0 is scheduled for completion within 2013
  23. NIF Architecture
  24. Addressing Primary Data
  25. Addressing Primary Data
      NIF 1.0:
      http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
      NIF 2.0 uses RFC 5147:
      http://www.w3.org/DesignIssues/LinkedData.html#char=717,729
      User extensions possible:
      http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme
      (but you have to link to documentation on how it was created)
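The two addressing schemes differ only in how the character range is written into the URI fragment. The sketch below builds both forms from a document URI and a begin/end offset, and parses the RFC 5147 `char=` form back into offsets. All function names are hypothetical, and the sketch assumes zero-based offsets that count positions between characters, which matches Python slicing; it covers only the simple `char=begin,end` form and none of the other RFC 5147 options.

```python
import re

DOC = "http://www.w3.org/DesignIssues/LinkedData.html"


def nif1_offset_uri(document_uri: str, begin: int, end: int) -> str:
    """Fragment in the NIF 1.0 offset style shown on the slide."""
    return f"{document_uri}#offset_{begin}_{end}"


def rfc5147_char_uri(document_uri: str, begin: int, end: int) -> str:
    """Fragment in the RFC 5147 character-range style used by NIF 2.0."""
    return f"{document_uri}#char={begin},{end}"


_CHAR_RANGE = re.compile(r"^char=(\d+),(\d+)$")


def parse_char_fragment(uri: str) -> tuple[str, int, int]:
    """Split a '#char=begin,end' URI into (document URI, begin, end)."""
    document_uri, _, fragment = uri.partition("#")
    match = _CHAR_RANGE.match(fragment)
    if match is None:
        raise ValueError(f"not a char=begin,end fragment: {fragment!r}")
    return document_uri, int(match.group(1)), int(match.group(2))


def anchored_text(text: str, begin: int, end: int) -> str:
    """Return the substring that the character range points at."""
    return text[begin:end]


print(nif1_offset_uri(DOC, 717, 729))
print(rfc5147_char_uri(DOC, 717, 729))
print(parse_char_fragment(rfc5147_char_uri(DOC, 717, 729)))
```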
  26. As a Web Service
      curl \
        --data-urlencode prefix="http://prefix.given.by/theClient#" \
        --data-urlencode input="[...]" \
        --data-urlencode source="http://www.w3.org/DesignIssues/LinkedData.html" \
        http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
      (the source parameter is optional)
  27. Vocabulary Module: OLiA
      • Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst
      • Russian TreeTagger: http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform
      • German STTS: http://purl.org/olia/stts.owl#VAPP
      • English Penn: http://purl.org/olia/penn.owl#VBG
      → all map to http://purl.org/olia/olia.owl#NonFiniteVerb
      Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free and open, CC-BY)
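The practical effect of the mapping is that tags from unrelated tagsets become comparable through a shared reference category. The sketch below flattens the four mappings from the slide into a lookup table; this is a simplification made for illustration, since the real mappings live in the OLiA ontologies rather than in a dictionary, and the helper name `same_category` is hypothetical.

```python
OLIA_NON_FINITE_VERB = "http://purl.org/olia/olia.owl#NonFiniteVerb"

# The four tagset-specific URIs from the slide, flattened into a lookup table.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/stts.owl#VAPP": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/penn.owl#VBG": OLIA_NON_FINITE_VERB,
}


def same_category(tag_a: str, tag_b: str) -> bool:
    """True if both tags are known and map to the same reference category."""
    category = TAG_TO_OLIA.get(tag_a)
    return category is not None and category == TAG_TO_OLIA.get(tag_b)


print(same_category(
    "http://purl.org/olia/stts.owl#VAPP",
    "http://purl.org/olia/penn.owl#VBG",
))
```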
  28. NIF 2.0 - plans
      • NIF 2.0 tries to be compatible with (Vocabulary Module):
        • ITS 2.0
        • FISE used in Apache Stanbol (IKS EU Project)
        • LAF/GrAF XML (ISO standard, recently published)
        • Fragment Identifiers by IETF and W3C
        • Lemon ontology from Monnet EU Project
        • NERD ontology from EURECOM and LinkedTV EU Project
        • XPointer/XPath URI scheme
        • Open Annotation
  29. How you can contribute: NIF 2.0
      • NIF is free and open (CC-0 or CC-BY)
      • All ontologies will be hosted persistently by the University of Leipzig
      • Sign up on the mailing list at http://nlp2rdf.org
      • Provide use cases, requirements and implementations at:
        • http://wiki.nlp2rdf.org/wiki/Use_cases#Use_cases
        • http://wiki.nlp2rdf.org/wiki/Requirements#Requirements
  30. How you can contribute: LOD2 Stack
      • The project is currently at its halfway point
      • Most of the tools are free and open source
      • Commercial rollout planned
      • Many webinars available
      • You can integrate your tool via a Debian package
      http://lod2.eu
      http://stack.lod2.eu/
