Benchmarking search relevance in industry vs academia

An update of my WSDM 2017 Practice and Experience talk (also on SlideShare) on lessons from industry about using offline metrics in information retrieval. Because a key requirement is having more training and test sets, this talk also describes our more recent data releases.

Benchmarking search relevance
• Search task: Retrieve documents in response to a query
• Benchmark data: Queries, Corpus, Judgments (a test collection)
• Application-specific benchmarks -> Lots of room for optimization + ML, e.g. incorporating temporal factors in a news search product
• Core IR benchmarks (flat Q, flat D) -> Not always making progress?*
• Core IR task is important
• Unsolved. Fundamental. Building block
• Need benchmarks to encourage progress
* Armstrong, Moffat, Webber, Zobel. Improvements That Don’t Add Up: Ad-Hoc Retrieval Results Since 1998. CIKM 2009

What does progress look like?
Chris Buckley, Mandar Mitra, Janet A. Walz, and Claire Cardie. "SMART high precision: TREC 7." NIST Special Publication 500-242 TREC-7 (1999)
[Figure: average precision (y-axis, 0 to 0.6) on each TREC task from TREC-1 through TREC-7 (TREC-6 split into Task TD and Task D), comparing the TREC-1 system (1992) with the TREC-7 system (1998); annotated "Progress".]
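
The chart above reports average precision computed over a test collection (queries, corpus, judgments). As a minimal sketch of that metric, assuming binary relevance judgments and using made-up query and document IDs, the per-query and mean values can be computed as follows. The function names are illustrative and are not taken from any TREC tool.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against binary judgments."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Divide by all judged-relevant documents, so unretrieved ones count against the run.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-query average precision over the judged queries."""
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy test collection: hypothetical query and document IDs.
qrels = {"q1": {"d1", "d3"}, "q2": {"d7"}}
runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d7"]}
print(mean_average_precision(runs, qrels))
```

Dividing by the number of judged-relevant documents, rather than by the number retrieved, is what penalizes a run for relevant documents it never returns.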

Yang, Wei, Kuang Lu, Peilin Yang, and Jimmy Lin. Critically Examining the “Neural Hype”: Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models. SIGIR 2019.
Three comments on this:
A. Test data is reused too much
B. Baseline is unclear
C. Not enough training data

A. Avoiding test data reuse
• Using multiple querysets in industry
• Make many decisions using queryset 1, few on 2, none on 3
• Refresh querysets often
• Academia: 1) Multiple test collections, 2) Leaderboards can reduce iteration, 3) Most convincing is one-time submission (e.g. TREC)
• Thought experiment:
  Queryset 1: Find an improvement
  Queryset 2: Choose a release candidate
  Queryset 3: Post-release measurement
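
The reason for holding back a fresh queryset can be illustrated with a small simulation. This is only a sketch under stated assumptions: every candidate variant has zero true effect, per-query metric deltas are standard normal noise, and the function name and parameter values are invented for illustration.

```python
import random


def simulate_selection(num_variants=20, num_queries=50, trials=100, seed=0):
    """Average gain of the 'winning' variant on the queryset used to pick it
    versus on a fresh queryset, when no variant has any real effect."""
    rng = random.Random(seed)
    picked_gains, fresh_gains = [], []
    for _ in range(trials):
        # Mean per-query delta vs. the baseline on queryset 1: pure noise.
        qs1_means = [
            sum(rng.gauss(0.0, 1.0) for _ in range(num_queries)) / num_queries
            for _ in range(num_variants)
        ]
        # Many decisions on queryset 1: keep whichever variant looks best.
        picked_gains.append(max(qs1_means))
        # No decisions on queryset 3: measure the chosen variant once.
        fresh = sum(rng.gauss(0.0, 1.0) for _ in range(num_queries)) / num_queries
        fresh_gains.append(fresh)
    return sum(picked_gains) / trials, sum(fresh_gains) / trials


picked, fresh = simulate_selection()
print(f"gain on queryset 1 (used for selection): {picked:.3f}")
print(f"gain on queryset 3 (never used):         {fresh:.3f}")
```

Under these assumptions the selected variant looks like an improvement on the queryset that chose it, while the untouched queryset reports a gain close to zero, which is the overfitting that queryset rotation is meant to expose.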

B. Production baseline
• Evaluate production ranker changes, which we want to deploy
• Pro: Avoid the weak baseline problem
• Con: Repeated incremental improvements increase complexity
• Pro: Improvements can add up
• Academic options:
• Not sure!
• Winners at TREC/leaderboard may be lucky. Strongest baseline is also lucky
• I would trust a high-ish baseline with statistically significant (SS) gains, e.g. two runs from one group
Ben Carterette. 2015. The Best Published Result is Random: Sequential Testing and Its Effect on Reported Effectiveness. In SIGIR ’15.
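
One way to check whether a gain over a baseline is statistically significant is a paired randomization (sign-flip) test on per-query scores. The sketch below is one common choice, not necessarily the test the speaker has in mind; the function name, the number of resamples, and the toy scores are assumptions.

```python
import random


def paired_randomization_test(baseline, treatment, num_resamples=2000, seed=0):
    """Two-sided sign-flip randomization test on per-query score differences.

    Returns an estimated p-value for the null hypothesis that the two
    systems are exchangeable on each query.
    """
    diffs = [t - b for b, t in zip(baseline, treatment)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(num_resamples):
        # Under the null, each query's difference is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (num_resamples + 1)


# Hypothetical per-query average precision for two runs from one group.
baseline_ap = [0.30] * 20
treatment_ap = [0.40] * 20
print(paired_randomization_test(baseline_ap, treatment_ap))
```

The test pairs scores by query, so both runs must be evaluated on the same queries in the same order.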

C. Get more data
• In industry: 200K queries, human-labeled, proprietary
• Academic data release (MS MARCO and TREC DL): 300+K queries, human-labeled, open
[Figure: axes labeled "More data" and "Better search results". Source: Mitra, Diaz and Craswell. Learning to match using local and distributed representations of text for web search. WWW 2017]

DNN vs 1990s IR
Artist’s impression of total victory
[Figure: average precision (y-axis, 0 to 0.6) on TREC-1 through TREC-7 tasks (TREC-6 split into Task TD and Task D) plus a blind test, with one series per system: TREC-1 SMART through TREC-7 SMART, and TREC-26+ DNN.]
Nick Craswell. Neural Models for Full Text Search: Could the improvements add up? WSDM 2017 Practice and Experience Talk

• We decided to release data: Labels, clicks, etc
• Public leaderboard and TREC track (and code)
• Part of a larger open effort “AI at Scale”

Our external ranking benchmarks
TREC Deep Learning Track
https://msmarco.org
[Figure labels: BM25, BERT, Leader]
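
For readers unfamiliar with the BM25 reference point, here is a minimal sketch of Okapi BM25 scoring over pre-tokenized text. The parameter values (k1 = 1.2, b = 0.75), the non-negative IDF variant, and the toy documents are assumptions chosen for illustration; the slide does not say which BM25 configuration the benchmark uses.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: List[str], docs: Dict[str, List[str]],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every tokenized document against a tokenized query with Okapi BM25."""
    num_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / num_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        # Longer-than-average documents are penalized through this term.
        length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # A common non-negative IDF variant (an assumption, see lead-in).
            idf = math.log(1 + (num_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[doc_id] = score
    return scores


# Hypothetical three-document corpus.
docs = {
    "d1": ["neural", "ranking", "models"],
    "d2": ["news", "search", "product"],
    "d3": ["ranking", "with", "bm25", "and", "neural", "ranking"],
}
print(bm25_scores(["neural", "ranking"], docs))
```

The sketch assumes a non-empty corpus and leaves tokenization, stemming, and stopword handling to the caller.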

Conclusion: Industry perspective on academia
• Reusing test collections heavily is not something we’d advise
• Are you sure you made no decisions based on robust04?
• What if you had another robust04? Would your conclusions stand up?
• Submit to TREC: this is the most reliable way of avoiding overfitting
• With large training data we can significantly beat 1990s methods on core IR tasks, e.g. with BERT-style DNN rankers
• Not sure how to handle baselines in academia
• Would trust an experiment where the baseline is not too low and there’s a gain

1. Benchmarking search relevance in industry vs academia. Nick Craswell, Principal Group Science Manager, Microsoft WebXT
