Towards Content-Based Dataset Search:
Test Collections and Beyond
Gong Cheng (Nanjing University)
Presented at NTCIR-16 Task Session (Data Search 2), June 17, 2022
2022/6/17 1
Metadata-Based Dataset Search (MBDS)
https://datasetsearch.research.google.com/
Metadata for the COLINDA Dataset
Query: conference location Lyon
Lyon?
MBDS is easy to implement, but …
☹ Metadata contains limited information.
☹ Metadata suffers from quality issues.
Content-Based Dataset Search (CBDS)
Content-Based Snippet for the COLINDA Dataset
Query: conference location Lyon
CBDS is in demand because …
☺ Higher Relevance
☺ Better Explainability
☺ Lower Redundancy
Research Tasks in CBDS
An architecture of content-based ad hoc dataset retrieval.
We aimed to build a test collection for ad hoc content-based dataset retrieval, but …
- It is non-trivial to
  - create content-oriented dataset queries
  - make content-based relevance judgments
- because data is
  ☹ big
  ☹ complex
Example: COLINDA, 149,020 triples [figure panels: Metadata | Data]
Softic et al., COLINDA: Modeling, Representing and Using Scientific Events in the Web of Data (DeRiVE 2015)
So we developed a dashboard for browsing RDF datasets.
Lin et al., ACORDAR: A Test Collection for Ad Hoc Content-Based (RDF) Dataset Retrieval (SIGIR 2022)
ACORDAR: RDF Datasets
ACORDAR: Queries
- Synthetic Queries
  ① Browse a dataset
  ② Write a content summary
  ③ Extract keywords as a query
- TREC Queries
  - Ad hoc topics in the English Test Collections of TREC 1–8
ACORDAR: Pooling and Qrels
- Retrieval Models
  - TF-IDF based cosine similarity
  - BM25F
  - Fielded Sequential Dependence Model (FSDM)
  - Language Model using Dirichlet priors for smoothing (LMD)
- Index Fields
  - Metadata Fields: title, description, author, tags
  - Data Fields: literals, classes, properties, entities
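The slide names the retrieval models used for pooling but not the pooling procedure itself. A minimal sketch, assuming conventional depth-k pooling (the union of each run's top-k results is sent to annotators), might look like the following. The function name `build_pool`, the depth value, and the dataset IDs are hypothetical and not taken from ACORDAR.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Return the union of the top-`depth` dataset IDs from every run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


# Hypothetical rankings for a single query, one per retrieval model.
runs = {
    "tfidf": ["d3", "d1", "d7", "d2"],
    "bm25f": ["d1", "d3", "d5", "d9"],
    "fsdm": ["d1", "d5", "d3", "d8"],
    "lmd": ["d4", "d1", "d3", "d6"],
}

# Only the pooled datasets would be judged for relevance (the qrels).
pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Pooling keeps the judging effort bounded: annotators only inspect datasets that at least one model ranked highly.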
Inter-Annotator Agreement:
Krippendorff’s α = 0.59
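For reference, Krippendorff's α compares the disagreement observed among annotators with the disagreement expected by chance. The symbols D_o and D_e below are the usual textbook notation, not taken from the slides.

```latex
\alpha = 1 - \frac{D_o}{D_e}
```

Here D_o is the observed disagreement and D_e is the expected disagreement; α = 1 indicates perfect agreement and α = 0 indicates agreement no better than chance.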
ACORDAR: Evaluation Results
- Model Configurations
  - Default: using all fields
  - [m]: using only metadata fields
  - [d]: using only data fields
Conclusion 1: Metadata and data are both useful.
Conclusion 2: TREC queries are more difficult.
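The three configurations can be illustrated with a small sketch. This is not the ACORDAR implementation: it assumes whitespace tokenization, a smoothed IDF, and simple concatenation of the selected fields (BM25F and FSDM would weight fields separately). The two toy datasets and the names `CONFIGS` and `rank` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

METADATA_FIELDS = ("title", "description", "author", "tags")
DATA_FIELDS = ("literals", "classes", "properties", "entities")
CONFIGS = {
    "default": METADATA_FIELDS + DATA_FIELDS,  # all fields
    "[m]": METADATA_FIELDS,                    # metadata only
    "[d]": DATA_FIELDS,                        # data only
}


def tokens(text: str) -> List[str]:
    return text.lower().split()


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: str, datasets: Dict[str, Dict[str, str]],
         fields: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank datasets by TF-IDF cosine similarity over the chosen fields."""
    docs = {name: Counter(t for f in fields for t in tokens(ds.get(f, "")))
            for name, ds in datasets.items()}
    n = len(docs)
    df = Counter(t for tf in docs.values() for t in tf)  # document frequency
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}

    def vec(tf: Counter) -> Dict[str, float]:
        # Terms unseen in the selected fields get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in tf.items()}

    q = vec(Counter(tokens(query)))
    scores = {name: cosine(q, vec(tf)) for name, tf in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy datasets (made up): "lyon" appears only in the data of "events".
datasets = {
    "events": {"title": "scientific events",
               "description": "conference series and venues",
               "literals": "lyon france", "properties": "location"},
    "weather": {"title": "weather records",
                "description": "daily temperature",
                "literals": "rain sunny", "properties": "temperature"},
}

for name, fields in CONFIGS.items():
    print(name, rank("conference location lyon", datasets, fields))
```

In this toy setup a query for "lyon" alone scores zero under [m] but matches under [d], mirroring the earlier COLINDA example in which the metadata does not mention Lyon.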
ACORDAR vs NTCIR-15/16
CBDS is probably a trend.
- Research Challenges
  - Scalability
  - Tractability
  - Heterogeneity
  - Projectability
  - Universality
- Impacts
  - to Users
  - to Researchers and Developers
  - to Data Providers
Contributors to ACORDAR
- Tengteng Lin (NJU)
- Qiaosheng Chen (NJU)
- Ahmet Soylu (OsloMet & NTNU)
- Basil Ell (U Bielefeld & UiO)
- Ruoqi Zhao (NJU)
- Qing Shi (NJU)
- Xiaxia Wang (NJU)
- Yu Gu (OSU)
- Evgeny Kharlamov (Bosch & UiO)
- …
Thanks for your attention.
