Neural Information Retrieval
In search of meaningful progress
Bhaskar Mitra
Principal Applied Scientist
Microsoft
@UnderdogGeek bmitra@microsoft.com
Shout out to all my mentors/collaborators/co-authors over the years!
…and many others
Early days of neural IR
First wave of deep document ranking models
Trained on 200K English queries from Bing.com (proprietary dataset)
Trained on 95K Chinese queries from Sogou.com (public dataset)
Trained using BM25-based weak labels
But are we making
real progress?
¯_(ツ)_/¯
Passage Ranking Leaderboard
MS MARCO passage ranking benchmark
launches with 0.5M+ English training queries
The myth of “no neural IR model worked before BERT”: first generation deep
ranking models, e.g., Duet and KNRM, and their variants, outperform most
traditional IR methods by a reasonable margin on the MS MARCO benchmark
2018
Did neural IR really have a “weak baselines” problem?
I will argue NO: (i) pre-MS MARCO, most neural IR papers benchmarking on Robust04 were NOT trained on
large labeled datasets and represent a biased sample of neural IR papers, and (ii) even in those cases there is
little evidence that these papers employed any weaker baselines than non-neural IR papers
Why is this important?
1. Can’t expect every paper to beat SOTA. Focus is often on hypothesis testing. Check for appropriate
baselines, not SOTA baselines. Improvements over emerging methods (not yet SOTA) should be encouraged.
2. Early generation deep ranking models provided many useful
insights and created the demand for large training datasets. 👏🏽
But we had a BIGGER benchmarking problem
The lack of public IR benchmarks with large scale training
data led to:
Comparisons under low-data regime
  e.g., older TREC collections with a few hundred queries
Comparisons on (semi-)synthetic benchmarks
  e.g., TREC CAR
Comparisons under weak supervision training
Comparisons on corpora in a language other than the one
the models were designed for
Performance of deep models typically
improves with more training data
(image source: The Duet paper)
Non-standardized benchmarks also required reimplementation of baselines (especially
neural baselines), which meant that many of them were under-tuned, in turn
contributing to the “weak baselines” problem!
The year of BERT
Less than 3 months after the BERT paper hits arXiv, the first BERT-based reranking model
achieves an MRR of 0.359 on MS MARCO, compared with the previous state of the art of 0.281
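For reference, MRR is the mean over queries of the reciprocal rank of the first relevant passage (the MS MARCO passage leaderboard reports MRR@10). A minimal sketch follows; the function names and the toy data are hypothetical and are not taken from the benchmark or its evaluation script.

```python
def reciprocal_rank(ranked_ids, relevant_ids, cutoff=10):
    """Return 1/rank of the first relevant item within the cutoff, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs, qrels, cutoff=10):
    """Average the reciprocal rank over all queries that have judgments."""
    scores = [reciprocal_rank(runs.get(qid, []), rel, cutoff)
              for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (illustrative only, not from MS MARCO).
toy_runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"], "q3": ["d5"]}
toy_qrels = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d8"}}
toy_mrr = mean_reciprocal_rank(toy_runs, toy_qrels)  # (1/2 + 1 + 0) / 3
```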
2019
TREC Deep Learning Track (2019)
2020
Document Ranking Leaderboard
+
TREC 2020 Deep Learning Track
Also, this guy
😒👇
Are we making progress?
Deep learning models have gone from
novelty to commodity in communities
like SIGIR, with parallels to how
learning-to-rank models “took over” IR
Deep models have demonstrated large
gains over previous state-of-the-art,
and the gap continues to grow
But we must be careful of how we
interpret “progress”, and interrogate the
evidence when it is largely based on a
single benchmark
Are we making meaningful progress?
Internal validity
Overfitting via multiple testing
validity of leaderboard ranking
External validity
Overfitting to single task or data
distribution
Statistical validity
IR metrics and interval-scale
Externalities
No data ≠ no progress
Social harms and ecological costs
Internal validity
Best practice for avoiding multiple testing:
participate at TREC (single-shot
submission + pooled judgments)
MS MARCO Leaderboard allows multiple
submissions, but we discourage frequent
submissions and metadata updates
Least robust: Reuse TREC test set from
previous year for evaluation—but useful if
we follow strict experiment protocols
Stability of MS MARCO public
leaderboard
Under bootstrap analysis we find the
leaderboard rankings fairly stable!
Very unlikely that a lower-ranked run would
overtake a top-ranked run under
bootstrapping 😊👍
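One way to run this kind of analysis is to resample the evaluation queries with replacement and count how often a lower-ranked run overtakes a higher-ranked one. The sketch below assumes per-query metric values for two runs over the same queries. The function name, the number of resamples and the toy scores are illustrative assumptions, not the procedure or data behind the leaderboard study.

```python
import random


def overtake_probability(scores_top, scores_low, n_resamples=2000, seed=0):
    """Estimate how often the lower-ranked run beats the top run
    when the query set is resampled with replacement."""
    assert len(scores_top) == len(scores_low)
    rng = random.Random(seed)
    n = len(scores_top)
    overtakes = 0
    for _ in range(n_resamples):
        # Draw n query indices with replacement (one bootstrap sample).
        sample = [rng.randrange(n) for _ in range(n)]
        mean_top = sum(scores_top[i] for i in sample) / n
        mean_low = sum(scores_low[i] for i in sample) / n
        if mean_low > mean_top:
            overtakes += 1
    return overtakes / n_resamples


# Toy per-query reciprocal ranks (made up for illustration).
top_run = [1.0, 0.5, 1.0, 0.33, 1.0, 0.5, 1.0, 1.0, 0.25, 1.0]
low_run = [0.5, 0.5, 1.0, 0.25, 0.5, 0.33, 1.0, 0.5, 0.2, 1.0]
p_overtake = overtake_probability(top_run, low_run)
```

A value near zero for every pair of adjacent runs would indicate a stable ranking; values near 0.5 would indicate that the order of the two runs is not supported by the query sample.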
Private leaderboard
We included 45 TREC 2020 queries in
the document ranking eval set
The top leaderboard run has a more
“spread out” rank on the TREC queries
and is overtaken by the best TREC
2020 run
This may be due to distribution
difference between the two test sets or
the smaller size of the TREC set
External validity
If MS MARCO’s training data were to be only useful for achieving good
results on MS MARCO’s test set, then it’s less useful for the IR community
Important: transfer learning from MS MARCO to other benchmarks
• TREC DL is transfer learning (MS MARCO sparse binary labels → NIST’s
5-point labels)
• Promising results: MS MARCO → Robust04, TREC-COVID, TREC-CAsT
• Med-MARCO (medical subset of MS MARCO)
BERT-scale deep ranking models in
production search systems
Industry impact
Statistical validity
Recent debate by Ferrante et al. on whether IR metrics like RR and NDCG are
interval-scale
Their argument based on representational theory of measurement: we must
satisfy the solvability condition over the empirical set of all possible SERP states
In this example involving the domain set of
all SERPs of length 3 and binary notion of
relevance, this requires the existence of
some SERP corresponding to RR of 0.17
and 0.83
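The example can be checked by enumeration. With three ranks and binary relevance there are only eight SERPs, and RR takes just four distinct values, with unequal gaps between them. The sketch below lists those values and confirms that 1/6 (about 0.17) and 5/6 (about 0.83) are not produced by any SERP. It illustrates the example only and does not formalize the solvability condition itself.

```python
from fractions import Fraction
from itertools import product


def reciprocal_rank(serp):
    """RR of a SERP given as a tuple of 0/1 relevance labels, top rank first."""
    for rank, rel in enumerate(serp, start=1):
        if rel:
            return Fraction(1, rank)
    return Fraction(0)


# All 2**3 SERPs of length 3 with binary relevance.
serps = list(product([0, 1], repeat=3))
realizable = sorted({reciprocal_rank(s) for s in serps})
gaps = [b - a for a, b in zip(realizable, realizable[1:])]

print([str(v) for v in realizable])  # ['0', '1/3', '1/2', '1']
print([str(g) for g in gaps])        # ['1/3', '1/6', '1/2']
# Neither 1/6 nor 5/6 is realized by any SERP in this domain.
print(Fraction(1, 6) in realizable, Fraction(5, 6) in realizable)  # False False
```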
Our position on the interval-scale debate
IR metrics are fundamentally not measurements over SERP states, but over all
possible user-perceived relevance/utility states
It may not be important that we cannot realize an RR value of 0.17 if we believe
there exists some user-perceived relevance state that corresponds to that value
of the metric
Of course, there’s no reason to believe these metrics are interval-scale even with
respect to user-perceived relevance/utility states
How to correctly calibrate these metrics is an interesting area for future research
Externalities
When we create benchmarks, we
implicitly tell the community where to
focus their research
Scenarios without data (e.g., non-English
IR) can suffer as a consequence
We must also consider the social and
ecological costs of the models that we
are encouraging to be developed
The “IR” in
Neural IR
Are we making meaningful connections
between decades of research on traditional
IR models and recent deep models?
Are we incorporating insights from
traditional IR into deep model design?
Are deep ranking models teaching us
something fundamental about IR?
Compared to the first wave of deep models,
recent BERT-style models are:
1. Harder to interpret
2. Not obvious what they encode
3. Not obvious what we learn about IR
(revisiting probability of relevance and
retrieval for different document lengths)
(IR axioms to guide and
diagnose deep neural models)
(Incorporating properties of traditional
IR approaches into deep models)
Revisiting old debates: verbosity vs. scope hypotheses
A typical recipe for using BERT-style models for document ranking is to compare
the query independently with individual body chunks and then aggregate signals
These neural architectures are
more in line with the scope
hypothesis—what does their
efficacy say about how we should
think about long documents?
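The recipe can be sketched as follows. The chunk scorer is a toy term-overlap function standing in for a BERT-style model, and the chunk size, the max aggregation and all names are assumptions made for illustration; other aggregations (sum, mean, a learned layer) are equally possible.

```python
def split_into_chunks(tokens, chunk_size=4):
    """Split a token list into consecutive, non-overlapping chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_chunk_score(query_tokens, chunk_tokens):
    """Stand-in for a neural scorer: fraction of query terms found in the chunk."""
    query_terms = set(query_tokens)
    return len(query_terms & set(chunk_tokens)) / len(query_terms)


def score_document(query, document, chunk_size=4, aggregate=max):
    """Score each chunk against the query independently, then aggregate."""
    query_tokens = query.lower().split()
    chunks = split_into_chunks(document.lower().split(), chunk_size)
    chunk_scores = [toy_chunk_score(query_tokens, chunk) for chunk in chunks]
    return aggregate(chunk_scores) if chunk_scores else 0.0


doc = ("neural ranking models score passages while long documents "
       "cover many unrelated topics")
doc_score = score_document("neural ranking", doc)  # best chunk decides
```

With max aggregation, a single matching chunk determines the document score however long the rest of the document is, which is the sense in which this design leans toward the scope hypothesis.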
New opportunities: Optimizing for new IR measures
Gradient-based optimization may allow
deep models to be trained directly for
new IR tasks and metrics
E.g., stochastic ranking and
optimizing for exposure-based
metrics
May be important in the
context of fairness, diversity,
and monetization
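To make the idea concrete, the sketch below samples rankings from a stochastic policy and estimates how much exposure each item receives on average. The slide does not specify a model: Plackett-Luce sampling, the logarithmic position discount and the made-up scores are all assumptions chosen for illustration. The sketch only estimates exposure and does not perform any gradient-based optimization.

```python
import math
import random


def sample_ranking(scores, rng):
    """Sample a permutation from a Plackett-Luce model over item scores."""
    remaining = list(scores)
    ranking = []
    while remaining:
        # Pick the next item with probability proportional to exp(score).
        weights = [math.exp(scores[item]) for item in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking


def expected_exposure(scores, n_samples=5000, seed=0):
    """Monte Carlo estimate of each item's expected exposure,
    assuming exposure at rank r is 1 / log2(r + 1)."""
    rng = random.Random(seed)
    totals = {item: 0.0 for item in scores}
    for _ in range(n_samples):
        for rank, item in enumerate(sample_ranking(scores, rng), start=1):
            totals[item] += 1.0 / math.log2(rank + 1)
    return {item: total / n_samples for item, total in totals.items()}


# Two equally scored items and one weaker item (made-up scores).
exposure = expected_exposure({"a": 1.0, "b": 1.0, "c": 0.0})
```

Under a deterministic ranking, one of the two equally scored items would always sit above the other; under the stochastic policy their estimated exposures come out roughly equal, which is the property exposure-based fairness metrics are designed to measure.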
New opportunities: data
structure aware ML models
Thinking of neural IR more holistically
Rapid exploration of ML models that can achieve large improvements on
standard IR tasks. But it’s not all about leaderboard chasing.
Thoughtful exploration of how IR and deep learning interact. What can deep
learning teach IR? What can IR teach deep learning?
Critical conversations about the impact of the technology we build. Centering on
social and ecological impact. Being intentional about where the field is going.
Careful curation of benchmarks and other artifacts to support the research
community. Making it easy to build on each other’s work. Bridging the
industry-academia divide.
Reusable research
artifacts: Data
https://microsoft.github.io/msmarco/ORCAS
http://msmarco.org/
https://microsoft.github.io/msmarco/TREC-Deep-Learning
Reusable research
artifacts: Code
Relatively cheap to reproduce neural baseline that
outperformed all trad + nn runs and two-thirds of all
nnlm runs at TREC 2020 Deep Learning Track
https://github.com/bmitra-msft/TREC-Deep-Learning-Quick-Start
Learning resources
(slides, video)
http://bit.ly/fntir-neural
(slides)
(website)
Thank you!
@UnderdogGeek bmitra@microsoft.com

Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Neural Information Retrieval: In search of meaningful progress

  • 1. Neural Information Retrieval In search of meaningful progress Bhaskar Mitra Principal Applied Scientist Microsoft @UnderdogGeek bmitra@microsoft.com
  • 2. Shout out to all my mentors/collaborators/co-authors over the years! …and many others
  • 3. Early days of neural IR
  • 4. First wave of deep document ranking models Trained on 200K English queries from Bing.com (proprietary dataset) Trained on 95K Chinese queries from Sogou.com (public dataset) Trained using BM25-based weak labels
  • 5. But are we making real progress? ¯_(ツ)_/¯ Passage Ranking Leaderboard MS MARCO passage ranking benchmark launches with 0.5M+ English training queries The myth of “no neural IR model worked before BERT”: first generation deep ranking models, e.g., Duet and KNRM, and their variants, outperform most traditional IR methods by reasonable margin on the MS MARCO benchmark 2018
  • 6. Did neural IR really have a “weak baselines” problem? I will argue NO: (i) pre-MS MARCO, most neural IR papers benchmarking on Robust04 were NOT trained on large labeled datasets and represent a biased sample of neural IR papers, and (ii) even in those cases there is little evidence that these papers employed any weaker baselines than non-neural IR papers. Why is this important? 1. We can’t expect every paper to beat SOTA. The focus is often on hypothesis testing. Check for appropriate baselines, not SOTA baselines. Improvements over emerging methods (not yet SOTA) should be encouraged. 2. Early-generation deep ranking models provided many useful insights and created the demand for large training datasets. 👏🏽
  • 7. But we had a BIGGER benchmarking problem. The lack of public IR benchmarks with large-scale training data led to: comparisons under low-data regimes (e.g., older TREC collections with a few hundred queries); comparisons on (semi-)synthetic benchmarks (e.g., TREC CAR); comparisons under weak supervision training; and comparisons on corpora in a language different from the one the models were designed for. Performance of deep models typically improves with more training data (image source: the Duet paper). Non-standardized benchmarks also required reimplementation of baselines (especially neural baselines), which meant that many of them were under-tuned, in turn contributing to the “weak baselines” problem!
  • 8. The year of BERT. Less than 3 months after the BERT paper hits arXiv, the first BERT-based reranking model achieves 0.359 MRR on MS MARCO, compared to the previous state of the art of 0.281. 2019
  • 9. TREC Deep Learning Track (2019)
  • 10. 2020 Document Ranking Leaderboard + TREC 2020 Deep Learning Track Also, this guy 😒👇
  • 11. Are we making progress? Deep learning models have gone from novelty to commodity in communities like SIGIR, with parallels to how learning-to-rank models “took over” IR. Deep models have demonstrated large gains over the previous state of the art, and the gap continues to grow. But we must be careful about how we interpret “progress”, and interrogate the evidence when it is largely based on a single benchmark.
  • 12. Are we making meaningful progress? Internal validity: overfitting via multiple testing; validity of leaderboard ranking. External validity: overfitting to a single task or data distribution. Statistical validity: IR metrics and interval scale. Externalities: no data ≠ no progress; social harms and ecological costs.
  • 13. Internal validity. Best practice for avoiding multiple testing: participate at TREC (single-shot submission + pooled judgments). The MS MARCO leaderboard allows multiple submissions, but we discourage frequent submissions and metadata updates. Least robust: reuse the TREC test set from a previous year for evaluation, though this is useful if we follow strict experiment protocols.
  • 14. Stability of the MS MARCO public leaderboard. Under bootstrap analysis, we find the leaderboard rankings fairly stable! It is very unlikely that a lower-ranked run would overtake a top-ranked run under bootstrapping 😊👍
  • 15. Private leaderboard. We included 45 TREC 2020 queries in the document ranking eval set. The top leaderboard run has a more “spread out” rank on the TREC queries and is overtaken by the best TREC 2020 run. This may be due to the distribution difference between the two test sets or the smaller size of the TREC set.
  • 16. External validity. If MS MARCO’s training data were useful only for achieving good results on MS MARCO’s test set, it would be less useful for the IR community. Important: transfer learning from MS MARCO to other benchmarks. • TREC DL is transfer learning (MS MARCO sparse binary labels → NIST’s 5-point labels) • Promising results: MS MARCO → Robust04, TREC-COVID, TREC-CAsT • Med-MARCO (medical subset of MS MARCO)
  • 17. BERT-scale deep ranking models in production search systems Industry impact
  • 18. Statistical validity. Recent debate by Ferrante et al. on whether IR metrics like RR and NDCG are interval-scale. Their argument is based on the representational theory of measurement: we must satisfy the solvability condition over the empirical set of all possible SERP states. In this example involving the domain set of all SERPs of length 3 and a binary notion of relevance, this requires the existence of SERPs corresponding to RR values of 0.17 and 0.83.
  • 19. Our position on the interval-scale debate. IR metrics are fundamentally not measurements over SERP states, but over all possible user-perceived relevance/utility states. It may not be important that we cannot realize an RR value of 0.17 if we believe there exists some user-perceived relevance state that corresponds to that value of the metric. Of course, there’s no reason to believe these metrics are interval-scale even with respect to user-perceived relevance/utility states. How to correctly calibrate these metrics is an interesting area for future research.
  • 20. Externalities. When we create benchmarks, we implicitly tell the community where to focus their research. Scenarios without data (e.g., non-English IR) can suffer as a consequence. We must also consider the social and ecological costs of the models that we are encouraging to be developed.
  • 21. The “IR” in Neural IR. Are we making meaningful connections between decades of research on traditional IR models and recent deep models? Are we incorporating insights from traditional IR into deep model design? Are deep ranking models teaching us something fundamental about IR? Compared to the first wave of deep models, with recent BERT-style models: 1. They are harder to interpret 2. It is not obvious what they encode 3. It is not obvious what we learn about IR. (Revisiting probability of relevance and retrieval for different document lengths) (IR axioms to guide and diagnose deep neural models) (Incorporating properties of traditional IR approaches into deep models)
  • 22. Revisiting old debates: verbosity vs. scope hypotheses. A typical recipe for using BERT-style models for document ranking is to compare the query independently with individual body chunks and then aggregate the signals. These neural architectures are more in line with the scope hypothesis, so what does their efficacy say about how we should think about long documents?
  • 23. New opportunities: Optimizing for new IR measures. Gradient-based optimization may allow deep models to be trained for new IR tasks and metrics, e.g., stochastic ranking and optimizing for exposure-based metrics. This may be important in the context of fairness, diversity, and monetization.
  • 25. Thinking of neural IR more holistically. Rapid exploration of ML models that can achieve large improvements on standard IR tasks. But it’s not all about leaderboard chasing. Thoughtful exploration of how IR and deep learning interact. What can deep learning teach IR? What can IR teach deep learning? Critical conversations about the impact of the technology we build. Centering on social and ecological impact. Being intentional about where the field is going. Careful curation of benchmarks and other artifacts to support the research community. Making it easy to build on each other’s work. Bridging the industry-academia divide.
  • 27. Reusable research artifacts: Code. A relatively cheap-to-reproduce neural baseline that outperformed all trad + nn runs and two-thirds of all nnlm runs at the TREC 2020 Deep Learning Track: https://github.com/bmitra-msft/TREC-Deep-Learning-Quick-Start
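
The bootstrap stability check mentioned on slide 14 can be sketched roughly as follows. This is a minimal illustration, not the procedure used for the MS MARCO leaderboard: the slides do not say how the bootstrap was run, so resampling queries with replacement is an assumption, and the function names (reciprocal_rank, bootstrap_win_rate) and the cutoff of 10 are hypothetical choices made for this example.

```python
import random


def reciprocal_rank(ranked_ids, relevant_ids, cutoff=10):
    """RR for one query: 1/rank of the first relevant result within the cutoff, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values):
    return sum(values) / len(values)


def bootstrap_win_rate(rr_a, rr_b, n_resamples=1000, seed=0):
    """Fraction of bootstrap resamples in which run A has a higher MRR than run B.

    rr_a and rr_b hold per-query reciprocal ranks for the same queries, in the
    same order. Queries are resampled with replacement (an assumption).
    """
    if len(rr_a) != len(rr_b):
        raise ValueError("both runs must be scored on the same queries")
    rng = random.Random(seed)
    n = len(rr_a)
    wins = 0
    for _ in range(n_resamples):
        sample = [rng.randrange(n) for _ in range(n)]
        if mean([rr_a[i] for i in sample]) > mean([rr_b[i] for i in sample]):
            wins += 1
    return wins / n_resamples
```

A win rate close to 1.0 for the higher-ranked run would suggest that the ordering of the two runs is stable under resampling, while a value near 0.5 would suggest the ordering could easily flip on a different query sample.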