NJVR: The NanJing Vocabulary Repository


Presentation given by Yuzhong Qu at CSWS2012.

Published in: Technology

  1. NJVR: The NanJing Vocabulary Repository. Gong Cheng, Min Liu, Yuzhong Qu (Nanjing University)
  2. Motivation: Ontology-related research topics (summarization, ranking, matching) call for a large and representative collection of real-world vocabularies
  3. State of the art
     • Top-down efforts. Size: small (hundreds). Access: directly (via browsing)
     • Bottom-up efforts. Size: large (thousands). Access: indirectly (via searching)
     • Our goal
  4. Contribution
     • NJVR: A large and freely accessible vocabulary repository
       – Source: An index of 4.1 B RDF triples distributed in 15.9 M RDF documents crawled from 5.8K pay-level domains (PLDs)
       – Constitution:
         • RDF descriptions of 2,996 dereferenceable vocabularies crawled from 261 PLDs
         • Document-level statistical data on their instantiations (e.g. term frequency)
       – Accessibility: Publicly downloadable
  5. Construction of NJVR
     1. Crawling
     2. Vocabulary identification
     3. Vocabulary instantiation
  6. Crawling (2007 to May 2011)
     1. Initialization (of the URI pool)
        – Other freely-accessible repositories, e.g.
        – LOD cloud
        – Search results, e.g. Swoogle, Google
     2. URI dereference and document parsing
        – package
        – Jena
     3. Pool expansion
        – URIs in parsed documents
        – Submissions from the users of Falcons
  7. Vocabulary identification
     • Bottom-up strategy
       1. Term: a URI that identifies a class/property in its dereference document
       2. Vocabulary: terms in a common namespace are grouped
  8. Results
     • 455,718 terms
       – 396,023 classes, 59,868 properties (many are in YAGO NS)
     • 2,996 vocabularies
       – From 261 PLDs (many are from
     • Instantiation found for
       – 115,707 classes (29.2%), e.g. foaf:Person
       – 25,963 properties (43.4%), e.g. dc:creator
       – 1,874 vocabularies (62.6%)
  9. Applications of NJVR
     • Vocabulary ranking
     • Vocabulary matching
     • …
  10. NJVR for vocabulary ranking
     • Using NJVR as a test case for vocabulary ranking
  11. Future work
     • Removal of low-quality vocabularies from NJVR
     • Comparative analysis of NJVR and other repositories
     • …
  12. Just use it!
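
The bottom-up strategy on the vocabulary identification slide (terms sharing a common namespace are grouped into a vocabulary) can be illustrated with a minimal Python sketch. The slides do not say how a term's namespace is determined; cutting the URI after its last `#`, or otherwise after its last `/`, is an assumption made here for illustration, and the function names `namespace_of` and `group_terms` are hypothetical rather than part of NJVR.

```python
from collections import defaultdict


def namespace_of(term_uri):
    """Return the namespace part of a term URI.

    Assumption (not stated in the slides): the namespace ends at the
    last '#', or at the last '/' when the URI contains no '#'.
    """
    cut = term_uri.rfind("#")
    if cut == -1:
        cut = term_uri.rfind("/")
    # Fall back to the whole URI when neither separator is present.
    return term_uri[: cut + 1] if cut != -1 else term_uri


def group_terms(term_uris):
    """Group term URIs into vocabularies keyed by their namespace."""
    vocabularies = defaultdict(set)
    for uri in term_uris:
        vocabularies[namespace_of(uri)].add(uri)
    return dict(vocabularies)


if __name__ == "__main__":
    terms = [
        "http://xmlns.com/foaf/0.1/Person",          # foaf:Person
        "http://xmlns.com/foaf/0.1/name",            # foaf:name
        "http://purl.org/dc/elements/1.1/creator",   # dc:creator
        "http://www.w3.org/2002/07/owl#Class",       # owl:Class
    ]
    for namespace, members in sorted(group_terms(terms).items()):
        print(namespace, len(members))
```

Here the two FOAF terms fall into one vocabulary, while dc:creator and owl:Class each form their own. A real pipeline would also need the first step from the slide, checking that each URI identifies a class or property in its dereference document, which this sketch leaves out.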