Your SlideShare is downloading. ×
0
NJVR: The NanJing Vocabulary Repository     Gong Cheng, Min Liu, Yuzhong Qu            Nanjing University
Motivation             summarization   ranking              matchingOntology-related research topics         A large and r...
State of the art      Top-down efforts                      Bottom-up effortsSize: small (hundreds)                Size: l...
Contribution• NJVR: A large and freely-accessible vocabulary repository   – Source: An index of 4.1 B RDF triples distribu...
Construction of NJVR1. Crawling2. Vocabulary identification3. Vocabulary instantiation
Crawling (2007—May 2011)1. Initialization (of the URI pool)   –   Other freely-accessible repositories, e.g. pingthesemant...
Vocabulary identification• Bottom-up strategy   1.   Term: URI that identifies a class/property in its dereference        ...
Results• 455,718 terms   – 396,023 classes, 59,868 properties, (many are in YAGO NS)• 2,996 vocabularies   – From 261 PLDs...
Applications of NJVR•   Vocabulary ranking•   Vocabulary matching•   …
NJVR for vocabulary ranking• Using NJVR as a test case for vocabulary ranking
Future work• Removal of low-quality vocabularies from NJVR• Comparative analysis of NJVR and other repositories• …
Just use it! ws.nju.edu.cn/njvrws.nju.edu.cn/falcons
Upcoming SlideShare
Loading in...5
×

NJVR: The NanJing Vocabulary Repository

187

Published on

Presentation given by Yuzhong Qu at CSWS2012.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
187
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Evaluations in these areas require not only systematically generated benchmarks but also, more importantly, a large and representative collection of real-world vocabularies.
  • search engines have indexed thousands of vocabularies, but unfortunately they only provide search services which are inconvenient for researchers to fully exploit the entire indexes.
  • experiment to compare different centrality measures by using NJVR
  • Transcript of "NJVR: The NanJing Vocabulary Repository"

    1. 1. NJVR: The NanJing Vocabulary Repository Gong Cheng, Min Liu, Yuzhong Qu Nanjing University
    2. 2. Motivation summarization ranking matchingOntology-related research topics A large and representative collection of real-world vocabularies
    3. 3. State of the art Top-down efforts Bottom-up effortsSize: small (hundreds) Size: large (thousands)Access: directly (via browsing) Access: indirectly (via searching) Our goal
    4. 4. Contribution• NJVR: A large and freely-accessible vocabulary repository – Source: An index of 4.1 B RDF triples distributed in 15.9 M RDF documents crawled from 5.8K pay-level domains (PLDs) – Constitution: • RDF descriptions of 2,996 dereferenceable vocabularies crawled from 261 PLDs • Document-level statistical data on their instantiations (e.g. term frequency) – Accessibility: Publicly downloadable
    5. 5. Construction of NJVR1. Crawling2. Vocabulary identification3. Vocabulary instantiation
    6. 6. Crawling (2007—May 2011)1. Initialization (of the URI pool) – Other freely-accessible repositories, e.g. pingthesemanticweb.com – LOD cloud – Search results, e.g. Swoogle, Google1. URI Dereference and document parsing – java.net package – Jena1. Pool expansion – URIs in parsed documents – Submissions from the users of Falcons
    7. 7. Vocabulary identification• Bottom-up strategy 1. Term: URI that identifies a class/property in its dereference document 2. Vocabulary: Terms in a common namespace are grouped
    8. 8. Results• 455,718 terms – 396,023 classes, 59,868 properties, (many are in YAGO NS)• 2,996 vocabularies – From 261 PLDs , (many are from w3.org)• Instantiation found for – 115,707 classes (29.2%), e.g. foaf:Person – 25,963 properties (43.4%), e.g. dc:creator – 1,874 vocabularies (62.6%)
    9. 9. Applications of NJVR• Vocabulary ranking• Vocabulary matching• …
    10. 10. NJVR for vocabulary ranking• Using NJVR as a test case for vocabulary ranking
    11. 11. Future work• Removal of low-quality vocabularies from NJVR• Comparative analysis of NJVR and other repositories• …
    12. 12. Just use it! ws.nju.edu.cn/njvrws.nju.edu.cn/falcons
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×