NJVR: The NanJing Vocabulary Repository

•Download as PPT, PDF•

1 like•503 views

Gong Cheng

Presentation given by Yuzhong Qu at CSWS2012.

Technology

NJVR: The NanJing Vocabulary Repository

Gong Cheng, Min Liu, Yuzhong Qu

Nanjing University

Motivation

summarization
ranking
matching

Ontology-related research topics A large and representative
collection of real-world vocabularies

State of the art

Top-down efforts Bottom-up efforts

Size: small (hundreds) Size: large (thousands)
Access: directly (via browsing) Access: indirectly (via searching)

Our goal

Contribution
• NJVR: A large and freely-accessible vocabulary repository
– Source: An index of 4.1 B RDF triples distributed in 15.9 M RDF
documents crawled from 5.8K pay-level domains (PLDs)
– Constitution:
• RDF descriptions of 2,996 dereferenceable vocabularies crawled from 261 PLDs
• Document-level statistical data on their instantiations (e.g. term frequency)
– Accessibility: Publicly downloadable

Construction of NJVR
1. Crawling
2. Vocabulary identification
3. Vocabulary instantiation

Crawling (2007—May 2011)
1. Initialization (of the URI pool)
– Other freely-accessible repositories, e.g. pingthesemanticweb.com
– LOD cloud
– Search results, e.g. Swoogle, Google
1. URI Dereference and document parsing
– java.net package
– Jena
1. Pool expansion
– URIs in parsed documents
– Submissions from the users of Falcons

Vocabulary identification
• Bottom-up strategy
1. Term: URI that identifies a class/property in its dereference
document
2. Vocabulary: Terms in a common namespace are grouped

Results
• 455,718 terms
– 396,023 classes, 59,868 properties, (many are in YAGO NS)
• 2,996 vocabularies
– From 261 PLDs , (many are from w3.org)
• Instantiation found for
– 115,707 classes (29.2%), e.g. foaf:Person
– 25,963 properties (43.4%), e.g. dc:creator
– 1,874 vocabularies (62.6%)

Applications of NJVR
• Vocabulary ranking
• Vocabulary matching
• …

NJVR for vocabulary ranking
• Using NJVR as a test case for vocabulary ranking

Future work
• Removal of low-quality vocabularies from NJVR
• Comparative analysis of NJVR and other repositories
• …

Just use it!
ws.nju.edu.cn/njvr

ws.nju.edu.cn/falcons

Viewers also liked

Web的图结构分析Gong Cheng

s1140177ChiemiHanyu_Thesischiemihanyu

知识的摘要Gong Cheng

Summarizing Semantic DataGong Cheng

Taking up the Gaokao Challenge: An Information Retrieval ApproachGong Cheng

Term Dependence on the Semantic WebGong Cheng

An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...Gong Cheng

Towards Exploratory Relationship Search: A Clustering-based ApproachGong Cheng

Surviving (and Thriving in) the Online Identity WarsJohn McCrea

What an "RP" WantsJohn McCrea

Falcon-AO: Results for OAEI 2007Gong Cheng

Searching Semantic Web Objects Based on Class HierarchiesGong Cheng

Aflpintensivo1

Viewers also liked (13)

Web的图结构分析

s1140177ChiemiHanyu_Thesis

知识的摘要

Summarizing Semantic Data

Taking up the Gaokao Challenge: An Information Retrieval Approach

Term Dependence on the Semantic Web

An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...

Towards Exploratory Relationship Search: A Clustering-based Approach

Surviving (and Thriving in) the Online Identity Wars

What an "RP" Wants

Falcon-AO: Results for OAEI 2007

Searching Semantic Web Objects Based on Class Hierarchies

Aflp

Similar to NJVR: The NanJing Vocabulary Repository

The Neuroscience Information Framework: Establishing a practical semantic fra...Neuroscience Information Framework

Roeder rocky 2011_46Chris Roeder

Knowledge Organization System (KOS) for biodiversity information resources, G...Dag Endresen

Development of Semantic Web based Disaster Management SystemNIT Durgapur

Knowledge Organization System (KOS) for biodiversity information resources, G...Dag Endresen

LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...locloud

semantic web & natural languageNurfadhlina Mohd Sharef

Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Khirulnizam Abd Rahman

Innovative methods for data integration: Linked Data and NLPariadnenetwork

NIFSTD and NeuroLex: A Comprehensive Ontology Development Based on Multiple B...Neuroscience Information Framework

The Neuroscience Information Framework: Making Resources Discoverable for the...Neuroscience Information Framework

Towards embedded Markup of Learning Resources on the WebStefan Dietze

Next Generation Catalogs: Extensible Catalog, David Lindahlyouthelectronix

Collection directions - towards collective collectionslisld

Cwn aat talkAAT Taiwan

Reuse of Structured Data: Semantics, Linkage, and Realizationandrea huang

COAR Resource TypesJochen Schirrwagen

Interpretation, Context, and Metadata: Examples from Open ContextEric Kansa

Dante al tempo del web semanticoLaboratorio di Cultura Digitale, Università di Pisa

Presentation - First International Library Staff Exchange Week, ZagrebIva Vrkic

Similar to NJVR: The NanJing Vocabulary Repository (20)

The Neuroscience Information Framework: Establishing a practical semantic fra...

Roeder rocky 2011_46

Knowledge Organization System (KOS) for biodiversity information resources, G...

Development of Semantic Web based Disaster Management System

Knowledge Organization System (KOS) for biodiversity information resources, G...

LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...

semantic web & natural language

Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...

Innovative methods for data integration: Linked Data and NLP

NIFSTD and NeuroLex: A Comprehensive Ontology Development Based on Multiple B...

The Neuroscience Information Framework: Making Resources Discoverable for the...

Towards embedded Markup of Learning Resources on the Web

Next Generation Catalogs: Extensible Catalog, David Lindahl

Collection directions - towards collective collections

Cwn aat talk

Reuse of Structured Data: Semantics, Linkage, and Realization

COAR Resource Types

Interpretation, Context, and Metadata: Examples from Open Context

Dante al tempo del web semantico

Presentation - First International Library Staff Exchange Week, Zagreb

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdfAddepto

Powerpoint exploring the locations used in television show Time Clashcharlottematthew16

Artificial intelligence in cctv survelliance.pptxhariprasad279825

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays

SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero

"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski

DMCC Future of Trade Web3 - Special EditionDubai Multi Commodity Centre

Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge

My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar

Training state-of-the-art general text embeddingZilliz

WordPress Websites for Engineers: Elevate Your Brandgvaughan

"ML in Production",Oleksandr BaganFwdays

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz

Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz

Commit 2024 - Secret Management made easyAlfredo García Lavilla

Story boards and shot lists for my a level piececharlottematthew16

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf

Powerpoint exploring the locations used in television show Time Clash

Artificial intelligence in cctv survelliance.pptx

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...

SIP trunking in Janus @ Kamailio World 2024

"Debugging python applications inside k8s environment", Andrii Soldatenko

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

SAP Build Work Zone - Overview L2-L3.pptx

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...

DMCC Future of Trade Web3 - Special Edition

Designing IA for AI - Information Architecture Conference 2024

My Hashitalk Indonesia April 2024 Presentation

Training state-of-the-art general text embedding

WordPress Websites for Engineers: Elevate Your Brand

"ML in Production",Oleksandr Bagan

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost

Vector Databases 101 - An introduction to the world of Vector Databases

Commit 2024 - Secret Management made easy

Story boards and shot lists for my a level piece

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

NJVR: The NanJing Vocabulary Repository

1. NJVR: The NanJing Vocabulary Repository Gong Cheng, Min Liu, Yuzhong Qu Nanjing University

2. Motivation summarization ranking matching Ontology-related research topics A large and representative collection of real-world vocabularies

3. State of the art Top-down efforts Bottom-up efforts Size: small (hundreds) Size: large (thousands) Access: directly (via browsing) Access: indirectly (via searching) Our goal

4. Contribution • NJVR: A large and freely-accessible vocabulary repository – Source: An index of 4.1 B RDF triples distributed in 15.9 M RDF documents crawled from 5.8K pay-level domains (PLDs) – Constitution: • RDF descriptions of 2,996 dereferenceable vocabularies crawled from 261 PLDs • Document-level statistical data on their instantiations (e.g. term frequency) – Accessibility: Publicly downloadable

5. Construction of NJVR 1. Crawling 2. Vocabulary identification 3. Vocabulary instantiation

6. Crawling (2007—May 2011) 1. Initialization (of the URI pool) – Other freely-accessible repositories, e.g. pingthesemanticweb.com – LOD cloud – Search results, e.g. Swoogle, Google 1. URI Dereference and document parsing – java.net package – Jena 1. Pool expansion – URIs in parsed documents – Submissions from the users of Falcons

7. Vocabulary identification • Bottom-up strategy 1. Term: URI that identifies a class/property in its dereference document 2. Vocabulary: Terms in a common namespace are grouped

8. Results • 455,718 terms – 396,023 classes, 59,868 properties, (many are in YAGO NS) • 2,996 vocabularies – From 261 PLDs , (many are from w3.org) • Instantiation found for – 115,707 classes (29.2%), e.g. foaf:Person – 25,963 properties (43.4%), e.g. dc:creator – 1,874 vocabularies (62.6%)

9. Applications of NJVR • Vocabulary ranking • Vocabulary matching • …

10. NJVR for vocabulary ranking • Using NJVR as a test case for vocabulary ranking

11. Future work • Removal of low-quality vocabularies from NJVR • Comparative analysis of NJVR and other repositories • …

12. Just use it! ws.nju.edu.cn/njvr ws.nju.edu.cn/falcons

Editor's Notes

Evaluations in these areas require not only systematically generated benchmarks but also, more importantly, a large and representative collection of real-world vocabularies.
search engines have indexed thousands of vocabularies, but unfortunately they only provide search services which are inconvenient for researchers to fully exploit the entire indexes.
experiment to compare different centrality measures by using NJVR

NJVR: The NanJing Vocabulary Repository

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (13)

Similar to NJVR: The NanJing Vocabulary Repository

Similar to NJVR: The NanJing Vocabulary Repository (20)

More from Gong Cheng

More from Gong Cheng (19)

Recently uploaded

Recently uploaded (20)

NJVR: The NanJing Vocabulary Repository

Editor's Notes