Anserini: enabling the use of Lucene for
information retrieval research
Peilin Yang and Hui Fang
Electrical and Computer Engineering Department,
University of Delaware
This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and
the U.S. National Science Foundation under IIS-1423002 and CNS-1405688.
Motivation
• Information retrieval research relies on open-source toolkits to evaluate retrieval models.
• Efforts are generally directed toward better ranking, and less attention is usually given to scalability and other operational considerations.
• Lucene has become the de facto platform in industry for building search applications, but it lacks systematic support for evaluation over standard test collections.
http://anserini.io/
Dispelling the Myth of Lucene
• Lucene is slow (because of Java). [FALSE]
    • See the evaluation results below.
• Lucene doesn't offer modern ranking functions. [FALSE]
    • BM25, Dirichlet LM, axiomatic models, divergence from randomness models
• Lucene is difficult to use. [PARTIALLY TRUE]
    • Poor documentation
    • Unintuitive APIs
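To make the ranking-function point concrete, here is a minimal sketch of BM25 scoring in plain Python. It is not Lucene's or Anserini's code. The function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration, and the IDF form shown is one common variant.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection, already tokenized.
docs = [
    "lucene is a search library".split(),
    "indri is a retrieval toolkit".split(),
    "anserini builds on lucene for retrieval research".split(),
]
scores = bm25_scores("lucene retrieval".split(), docs)
```

In this toy run the third document matches both query terms, so it scores highest even though it is the longest of the three.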
Main Components
• Multi-threaded indexing
• Streamlined evaluation for the TREC Adhoc/Web tracks
• Relevance feedback
• Many more to come…
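The idea behind multi-threaded indexing can be sketched as partition, index in parallel, then merge. The code below is an illustration in plain Python and does not use Lucene or Anserini APIs. The names `index_shard` and `build_index` and the toy documents are hypothetical. In CPython the threads here mainly show the structure rather than a real speedup.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

def index_shard(shard):
    """Build a partial inverted index {term: {doc_id: count}} for one shard."""
    partial = defaultdict(dict)
    for doc_id, text in shard:
        for term, count in Counter(text.lower().split()).items():
            partial[term][doc_id] = count
    return partial

def build_index(docs, num_threads=4):
    """Index shards in parallel, then merge the partial indexes."""
    # Round-robin partitioning: each document lands in exactly one shard.
    shards = [docs[i::num_threads] for i in range(num_threads)]
    index = defaultdict(dict)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for partial in pool.map(index_shard, shards):
            for term, postings in partial.items():
                index[term].update(postings)
    return index

# Hypothetical toy collection of (doc_id, text) pairs.
docs = [(0, "Lucene search library"), (1, "Anserini uses Lucene"), (2, "Indri toolkit")]
index = build_index(docs, num_threads=2)
```

Because each document belongs to a single shard, the merge step never has to reconcile two counts for the same term and document.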
Modules
Evaluation

Retrieval Efficiency
Terabyte 06 efficiency queries
(The lower the better)
System     Latency (ms/query)   Throughput (query/sec)
Indri      2403                 0.42
Anserini   382                  2.61

Retrieval Effectiveness
(Mean Average Precision)
Models    Disk12   ROBUST04   WT2G     WT10G    GOV2
BM25(I)   0.2040   0.2478     0.3152   0.1955   0.2970
BM25(A)   0.2267   0.2500     0.3015   0.1981   0.3030
LM(I)     0.2269   0.2516     0.3116   0.1915   0.2995
LM(A)     0.2232   0.2465     0.2922   0.2015   0.2951
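For readers unfamiliar with the effectiveness metric, the following is a minimal sketch of how Mean Average Precision is computed from ranked lists and relevance judgments. The poster's numbers come from standard tooling, not from this code. The function names, the query IDs, and the document IDs are hypothetical.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision of one ranked list given the set of relevant documents."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at the rank where a relevant document is found.
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)

def mean_average_precision(runs, qrels):
    """Mean of per-query average precision over all queries in `qrels`."""
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)

# Hypothetical toy run and judgments.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
map_score = mean_average_precision(runs, qrels)
```

Here q1 contributes (1/1 + 2/3) / 2 and q2 contributes 1/2, so the mean is 2/3.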
[Figure: Anserini module diagram. Labels: Lucene Core; Word Embedding; Relevance Feedback; Learning 2 Rank; Indexing (Doc Parser, Tokenizer, Stemmer, with options ClueWeb, TREC, Space, lower, Porter; Index Options: Counts, Positions); Searching (Query Reader: Webxml, TREC; Ranking Model: BM25, LM); Evaluation (trec_eval, AIR). Legend: Lucene Module, Static Components, Pluggable Module, Anserini Extensions.]
Jimmy Lin
David R. Cheriton School of Computer Science,
University of Waterloo
Indexing Time
[Figure: indexing time. One group of bars covers Disk12, Disk45, AQUAINT, and WT2g with series Indri, A(doc), A(pos), and A(cnt); another covers GOV2, CW09b, CW09, CW12b13, and CW12 with series A(pos) and A(cnt).]
Dual AMD Opteron 6128 processors (2.0GHz, 8 cores) with 40GB RAM running CentOS 6.8
Dual Intel Xeon E5-2699 v4 processors (2.2GHz, 22 cores) and 1 TB RAM running Ubuntu 16.04
Index Size
Collection   A(cnt)   A(pos)
CW09b        28GB     75GB
CW09         254GB    649GB
CW12b13      29GB     76GB
CW12         376GB    1.1TB
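The size gap between A(cnt) and A(pos) in the table above is easier to see with a toy example. The sketch below assumes that A(cnt) stores only per-document term counts and A(pos) also stores term positions, which matches the "Counts" and "Positions" index options named on the poster but is not confirmed in detail by it. The function `build_postings` and the sample text are hypothetical.

```python
from collections import defaultdict

def build_postings(docs, store_positions):
    """Return {term: {doc_id: payload}}; payload is a count or a list of positions."""
    index = defaultdict(dict)
    for doc_id, text in docs:
        for pos, term in enumerate(text.lower().split()):
            if store_positions:
                # One integer per occurrence of the term.
                index[term].setdefault(doc_id, []).append(pos)
            else:
                # One integer per (term, document) pair.
                index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

# Hypothetical single-document collection.
docs = [(0, "to be or not to be")]
counts_index = build_postings(docs, store_positions=False)
positions_index = build_postings(docs, store_positions=True)
```

A counts-only posting holds one number per term and document, while a positional posting holds one number per occurrence, so positional indexes grow with document length and enable phrase and proximity matching in exchange.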
6 Times Faster
Similar and reliable performance
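The "6 Times Faster" callout appears consistent with the latency figures reported on the poster, as the short check below suggests. The assumption that queries run one at a time, so that throughput is roughly 1000 divided by latency in milliseconds, is ours and is not stated on the poster.

```python
# Figures copied from the poster's retrieval efficiency table.
latency_ms = {"Indri": 2403, "Anserini": 382}      # ms/query
reported_qps = {"Indri": 0.42, "Anserini": 2.61}   # query/sec

# Ratio of per-query latencies.
speedup = latency_ms["Indri"] / latency_ms["Anserini"]

# Assumption: single-stream execution, so throughput = 1000 / latency_ms.
implied_qps = {name: 1000 / ms for name, ms in latency_ms.items()}

print(f"speedup: {speedup:.2f}x")
for name in latency_ms:
    print(f"{name}: implied {implied_qps[name]:.2f} q/s, reported {reported_qps[name]} q/s")
```

Under that assumption the implied throughputs agree with the reported ones to within 0.01 query/sec, and the latency ratio is a little over 6.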

Anserini SIGIR 2017 Poster


Editor's Notes

  1. Title wrong; metric and conclusion of retrieval effectiveness; keep Motivation consistent with the paper