SlideShare a Scribd company logo
1 of 1
Download to read offline
at the NTCIR-15 WWW-3 Task
Kohei Shinden, Atsuki Maruta and Makoto P. Kato (University of Tsukuba)
• NTCIR WWW-3 Task
‒ Ad-hoc search tasks for Web documents
• Proposed search model using BERT (Birch)
‒ Yilmaz et al: Cross-Domain Modeling of Sentence-level Evidence for
Document Retrieval, EMNLP 2019
‒ BERT has been successfully applied to a broad range of NLP tasks
including document ranking tasks
Shinjuku
Pet Pig
Halloween Pictures
Search system
ClueWeb12
Web documents
730 million
Input Output
THE 10 BEST Cheap
Hotels in Tokyo...
Keeping and Caring
for Potbellied Pigs ...
Halloween Picture
Collection 30...
Evaluation
nDCG
0.6
⁝
0.5
80 Topics
Input
Ranked documents
• Applying the relevance between sentences learned in QA and
Microblog search datasets to ad hoc searches
‒ Estimate the relevance of the sentences in the query and document
BERT
Model
MB
BERT
Halloween Pictures
Microblog
Data
Trick or Treat...
0.7
Children get candy...
0.3
カボチャを使...
0.1
0.4
BERT + BM25 Score
0.6
Sentence-Level
Relevance
Judgments Model
BM25
Score
BERT
Score
Sentences Document
⁝
Using Query and
Sentence Relevance
BERT cannot handle Text
at the Document Level
Less Query and Sentence
Relevance Data in
Ad-hoc Search Tasks
Using Models Learned in
Other Tasks
Use Test Collection
• MS MARCO: QA Task
• TREC CAR : A is a complex QA task
• TREC MB : Tweet search task
• A typical web document exceeds
BERT's maximum input
sequence length of 512 tokens.
• Linear sum of the BM25 and the score of the highest
BERT-scoring sentence in the document
‒ Assuming that the most relevant sen-tences in a document are
good indicators of the document-levelrelevance
• BERTi : The BERT score of the sentence with the i-th highest BERT score in the document
• wi : Hyperparameter
[1] Yilmaz et al: Cross-Domain Modeling of Sentence-level
Evidence for Document Retrieval, EMNLP 2019
!"#$% = '(25 + , -!
!
"#$
. '/01!
KASYS for NEW Runs
Birch (Yilmaz et al, 2019)
Key Ideas in Birch
Details of Birch
Experimental Results
Achieved the best performances in terms of nDCG, Q, iRBU
• KASYS-E-CO-NEW-1 (Fine-tuning with MS MARCO & TREC MB)
• KASYS-E-CO-NEW-4 (Fine-tuning with MS MARCO & TREC MB)
• KASYS-E-CO-NEW-5 (Fine-tuning with TREC CAR & TREC MB)
KASYS for REP Runs
Replicating and reproducing the THUIR runs
at the NTCIT-14 WWW-2 Task
Whether the results across models are consistent in each experiment
THUIR KASYS(ours)
BM25 BM25
LambdaMART
(learning-to-rank model)
LambdaMART
(learning-to-rank-model)
<
<
❓
disney
switch
Canon
‥‥
Clueweb
Collection
Ranked by
BM25
algorithm
input output
Disney shop
Tokyo Disney
resort
Disney
official
‥‥
Ranked web documents
1st
2nd
3rd
input
Feature
extracting
program
Extracted eight features
・Clueweb : A dataset with about 730 million pages of English web pages
Using tf, idf, docement
length, BM25, LMIR as
features
Up to BM25
LamdbaMART from here
WWW-2 and WWW-3 topics
honda
Pokemon
ice age
‥‥
Extraction
feature
program
qid:001 1:0.2 ‥
qid:001 1:0.5 ‥
qid:001 1:0.1 ‥
qid:001 1:0.9 ‥
output
‥‥
Query and features from document
LambdaMART
input
MQ Track WWW-1 test
collection
train validate
Disney
official
Disney shop
Tokyo Disney
resort
Re-ranked web document
1st
2nd
3rd
‥‥
output
Evaluate
Replication Procedure
Implementation Details
Experimental Results
• Features for learning to rank
• TF, IDF, TF-IDF, document length, BM25 score, and language-
model-based IR scores
• The differences from the original run (THUIR)
• Although THUIR extracted the features from four fields (whole
document, anchor text, title, URL),
we extracted from only the whole document
• Normalized by maximum and minimum values because the
normalization of features was not described in THUIR paper
0.43
0.53
AdaRank LamdbaMART Coordinate Ascent BM25
0.3
0.35
0.4
0.28
0.33
nDCG@10 Q@10 nERR@10
Ours Original
Ours Original Ours Original
Successfully replicated with the original WWW-2 qrels
Original WWW-2 qrels
WWW-3 official results
Seems that KASYS
failed to replicate
and reproduce the
THUIR run

More Related Content

What's hot

Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReady
Ashish Garg
 
Magnitude comparator
Magnitude comparatorMagnitude comparator
Magnitude comparator
Jessan Khan
 

What's hot (14)

Cluster Forest
Cluster ForestCluster Forest
Cluster Forest
 
[Icml2019] mix hop higher-order graph convolutional architectures via spars...
[Icml2019]  mix hop  higher-order graph convolutional architectures via spars...[Icml2019]  mix hop  higher-order graph convolutional architectures via spars...
[Icml2019] mix hop higher-order graph convolutional architectures via spars...
 
Image Inpainting
Image InpaintingImage Inpainting
Image Inpainting
 
Currency recognition on mobile phones
Currency recognition on mobile phonesCurrency recognition on mobile phones
Currency recognition on mobile phones
 
Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReady
 
Scaling algebraic multigrid to over 287K processors
Scaling algebraic multigrid to over 287K processorsScaling algebraic multigrid to over 287K processors
Scaling algebraic multigrid to over 287K processors
 
Advanced Techniques for Mobile Robotics
Advanced Techniques for Mobile RoboticsAdvanced Techniques for Mobile Robotics
Advanced Techniques for Mobile Robotics
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection
 
A Novel Low Complexity Histogram Algorithm for High Performance Image Process...
A Novel Low Complexity Histogram Algorithm for High Performance Image Process...A Novel Low Complexity Histogram Algorithm for High Performance Image Process...
A Novel Low Complexity Histogram Algorithm for High Performance Image Process...
 
Moving object detection in complex scene
Moving object detection in complex sceneMoving object detection in complex scene
Moving object detection in complex scene
 
Recommender System Challenge
Recommender System ChallengeRecommender System Challenge
Recommender System Challenge
 
Optforml
OptformlOptforml
Optforml
 
Finding Maximum Edge Biclique in Bipartite Networks by Integer Programming
Finding Maximum Edge Biclique in Bipartite Networks by Integer ProgrammingFinding Maximum Edge Biclique in Bipartite Networks by Integer Programming
Finding Maximum Edge Biclique in Bipartite Networks by Integer Programming
 
Magnitude comparator
Magnitude comparatorMagnitude comparator
Magnitude comparator
 

Similar to NTCIR-15 www-3 kasys poster

Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) FrameworkMultimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
theijes
 

Similar to NTCIR-15 www-3 kasys poster (20)

Relational knowledge distillation
Relational knowledge distillationRelational knowledge distillation
Relational knowledge distillation
 
Weaviate Air #3 - New in AI segment.pdf
Weaviate Air #3 - New in AI segment.pdfWeaviate Air #3 - New in AI segment.pdf
Weaviate Air #3 - New in AI segment.pdf
 
Dataset Culling: Towards Efficient Training of Distillation based Domain Spec...
Dataset Culling: Towards Efficient Training of Distillation based Domain Spec...Dataset Culling: Towards Efficient Training of Distillation based Domain Spec...
Dataset Culling: Towards Efficient Training of Distillation based Domain Spec...
 
Object Detection & Machine Learning Paper
 Object Detection & Machine Learning Paper Object Detection & Machine Learning Paper
Object Detection & Machine Learning Paper
 
Deep feature synthesis
Deep feature synthesisDeep feature synthesis
Deep feature synthesis
 
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) FrameworkMultimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
 
Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering Algorithm
 
RDF Join Query Processing with Dual Simulation Pruning
RDF Join Query Processing with Dual Simulation PruningRDF Join Query Processing with Dual Simulation Pruning
RDF Join Query Processing with Dual Simulation Pruning
 
Region-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object RetrievalRegion-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object Retrieval
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
A Study on New York City Taxi Rides
A Study on New York City Taxi RidesA Study on New York City Taxi Rides
A Study on New York City Taxi Rides
 
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
 
PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni   PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni
 
Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)
 
Blinkdb
BlinkdbBlinkdb
Blinkdb
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learning
 

Recently uploaded

Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
FIDO Alliance
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 

NTCIR-15 www-3 kasys poster

  • 1. at the NTCIR-15 WWW-3 Task Kohei Shinden, Atsuki Maruta and Makoto P. Kato (University of Tsukuba) • NTCIR WWW-3 Task ‒ Ad-hoc search tasks for Web documents • Proposed search model using BERT (Birch) ‒ Yilmaz et al: Cross-Domain Modeling of Sentence-level Evidence for Document Retrieval, EMNLP 2019 ‒ BERT has been successfully applied to a broad range of NLP tasks including document ranking tasks Shinjuku Pet Pig Halloween Pictures Search system ClueWeb12 Web documents 730 million Input Output THE 10 BEST Cheap Hotels in Tokyo... Keeping and Caring for Potbellied Pigs ... Halloween Picture Collection 30... Evaluation nDCG 0.6 ⁝ 0.5 80 Topics Input Ranked documents • Applying the relevance between sentences learned in QA and Microblog search datasets to ad hoc searches ‒ Estimate the relevance of the sentences in the query and document BERT Model MB BERT Halloween Pictures Microblog Data Trick or Treat... 0.7 Children get candy... 0.3 カボチャを使... 0.1 0.4 BERT + BM25 Score 0.6 Sentence-Level Relevance Judgments Model BM25 Score BERT Score Sentences Document ⁝ Using Query and Sentence Relevance BERT cannot handle Text at the Document Level Less Query and Sentence Relevance Data in Ad-hoc Search Tasks Using Models Learned in Other Tasks Use Test Collection • MS MARCO: QA Task • TREC CAR : A is a complex QA task • TREC MB : Tweet search task • A typical web document exceeds BERT's maximum input sequence length of 512 tokens. • Linear sum of the BM25 and the score of the highest BERT-scoring sentence in the document ‒ Assuming that the most relevant sen-tences in a document are good indicators of the document-levelrelevance • BERTi : The BERT score of the sentence with the i-th highest BERT score in the document • wi : Hyperparameter [1] Yilmaz et al: Cross-Domain Modeling of Sentence-level Evidence for Document Retrieval, EMNLP 2019 !"#$% = '(25 + , -! ! "#$ . '/01! KASYS for NEW Runs Birch (Yilmaz et al, 2019) Key Ideas in Birch Details of Birch Experimental Results Achieved the best performances in terms of nDCG, Q, iRBU • KASYS-E-CO-NEW-1 (Fine-tuning with MS MARCO & TREC MB) • KASYS-E-CO-NEW-4 (Fine-tuning with MS MARCO & TREC MB) • KASYS-E-CO-NEW-5 (Fine-tuning with TREC CAR & TREC MB) KASYS for REP Runs Replicating and reproducing the THUIR runs at the NTCIT-14 WWW-2 Task Whether the results across models are consistent in each experiment THUIR KASYS(ours) BM25 BM25 LambdaMART (learning-to-rank model) LambdaMART (learning-to-rank-model) < < ❓ disney switch Canon ‥‥ Clueweb Collection Ranked by BM25 algorithm input output Disney shop Tokyo Disney resort Disney official ‥‥ Ranked web documents 1st 2nd 3rd input Feature extracting program Extracted eight features ・Clueweb : A dataset with about 730 million pages of English web pages Using tf, idf, docement length, BM25, LMIR as features Up to BM25 LamdbaMART from here WWW-2 and WWW-3 topics honda Pokemon ice age ‥‥ Extraction feature program qid:001 1:0.2 ‥ qid:001 1:0.5 ‥ qid:001 1:0.1 ‥ qid:001 1:0.9 ‥ output ‥‥ Query and features from document LambdaMART input MQ Track WWW-1 test collection train validate Disney official Disney shop Tokyo Disney resort Re-ranked web document 1st 2nd 3rd ‥‥ output Evaluate Replication Procedure Implementation Details Experimental Results • Features for learning to rank • TF, IDF, TF-IDF, document length, BM25 score, and language- model-based IR scores • The differences from the original run (THUIR) • Although THUIR extracted the features from four fields (whole document, anchor text, title, URL), we extracted from only the whole document • Normalized by maximum and minimum values because the normalization of features was not described in THUIR paper 0.43 0.53 AdaRank LamdbaMART Coordinate Ascent BM25 0.3 0.35 0.4 0.28 0.33 nDCG@10 Q@10 nERR@10 Ours Original Ours Original Ours Original Successfully replicated with the original WWW-2 qrels Original WWW-2 qrels WWW-3 official results Seems that KASYS failed to replicate and reproduce the THUIR run