OCTOBER 11-14, 2016 • BOSTON, MA
Solr Highlighting at Full Speed
Timothy M. Rodriguez
Verticals Search Team Lead, Bloomberg
David Smiley
Search Developer/Consultant
Agenda
§  Legal Search
§  Business Requirements
§  Highlighters Overview
§  Improving the Standard Highlighter
§  Unified Highlighter
§  Questions
Bloomberg Law
§  Suite of legal business tools
for lawyers and legal
professionals
§  Business development
§  Drafting
§  Analytics
§  Search
Legal Search
§  Recall Matters
§  Large Documents
§  Citizens United is 130 pages
long
§  Some in the 100s of MB
§  Researchers rely on
highlighting to help them
decide if they should read a
document
Requirements
Accuracy
§  Legal users issue detailed searches
§  “cafeteria plan” AND tax
§  Custom Span Queries
§  “insurance fraud” /s conviction
“Just Right” Digest Sizing
Full Document Highlighting
Zone Highlighting
Speed
Solr Highlighters Overview
Highlighter                  | Offset Source                       | Accuracy | Speed
Default/Standard Highlighter | Analysis                            | Better   | Slowest
Default/Standard Highlighter | Term Vectors                        | Better   | Slow
Fast Vector Highlighter      | Term Vectors                        | Good     | Medium
Postings Highlighter         | Postings (+ Analysis for wildcards) | Okay     | Fast*
* But poor wildcard performance
Offset Source & Index Size
§  Analysis requires no extra data on-disk
§  But analyzing text on the fly is expensive
§  Term vectors are heavy
§  Adding offsets to postings is much lighter than TVs
[Chart: index size in multiples of the stored value (axis 0 to 3) for Stored Value, Terms, Positions, Offsets, TV Terms, TV Positions, TV Offsets]
Initial Attempt
§  Chose the default highlighter and added in customizations as needed
§  Added payload support to the MemoryIndex - LUCENE-6155 v5.0
§  We investigated using the PostingsHighlighter and
FastVectorHighlighter but the accuracy trade-offs were not acceptable
for our users
§  Ran into performance problems as highlighting was taking the bulk of
our execution time
Make it Faster
§  Improvements to the Standard Highlighter
§  Fast uninverting of term vectors to a token stream – LUCENE-6031
v5.0 (remove expensive sort)
§  Rely on Term Vectors for phrase highlighting instead of analyzing
into the MemoryIndex – LUCENE-6034 v5.0
Still not fast enough…
Multithreaded Highlighting
§  Highlighting each returned doc is easily parallelizable
§  Greatly improved performance
§  Greatly increased memory consumption
Still a sizeable fraction of our query times…
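A minimal sketch of per-document parallel highlighting, using only the Java standard library. The class name `ParallelHighlightSketch`, the method `highlightAll`, and the `highlightOne` function are hypothetical stand-ins and not part of Lucene or Solr; the talk does not show how the actual implementation was wired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Illustrative only: highlights each returned document on a bounded thread pool. */
final class ParallelHighlightSketch {

    /**
     * Applies highlightOne (a hypothetical per-document highlighter) to every
     * document and returns the snippets in the same order as the input.
     */
    static List<String> highlightAll(List<String> docs,
                                     Function<String, String> highlightOne,
                                     int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one independent task per document.
            List<Future<String>> pending = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> highlightOne.apply(doc);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order so snippets line up with docs.
            List<String> snippets = new ArrayList<>();
            for (Future<String> f : pending) {
                snippets.add(f.get());
            }
            return snippets;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while highlighting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("highlighting failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a sketch like this, every in-flight task holds its own document text and working state at the same time, which is one plausible reason memory use grows with the thread count.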
Re-evaluating
§  Didn’t look like we could make the Standard Highlighter much faster
§  Perhaps we could federate to one of the highlighters based on the
query?
§  Our customizations would have to be ported to each of the
highlighters
§  Work would need to be repeated 3x
§  Increased disk utilization from adding postings to the main index
Enhance the Postings Highlighter?
§  Fastest of the bunch
§  Add accuracy at least as good as the standard highlighter
§  Add support for the other offset sources too
§  (supports our full-doc-highlighting use-case)
§  But it’s a big job with major internal highlighter surgery…
Let’s do it!
Offsets Overview
§  Getting character offsets is key to highlighting. 3 ways:
§  Analysis:
§  Analyzer → TokenStream → OffsetAttribute,
oa.startOffset()
§  Term vectors:
§  IndexReader.getTermVector(docId,field) → Terms →
TermsEnum, te.postings(…, PostingsEnum.OFFSETS) →
PostingsEnum → pe.startOffset()
§  Postings:
§  IndexReader → LeafReader → Terms → … (see above)
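To make "character offsets" concrete, here is a toy illustration of the analysis route in plain Java. It is not Lucene's Analyzer or TokenStream API; `ToyOffsetAnalyzer` and its `Token` class are hypothetical, and the tokenization rule (split on anything that is not a letter or digit, then lowercase) is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy stand-in for analysis-based offsets; not Lucene's TokenStream API. */
final class ToyOffsetAnalyzer {

    /** One term plus the character span it covers in the original text. */
    static final class Token {
        final String term;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive
        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Re-tokenizes the text and records where each term starts and ends. */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip separator characters.
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one run of letters and digits.
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(term, start, i));
            }
        }
        return tokens;
    }
}
```

The whole text has to be scanned again at highlight time, which is why this route needs no extra index data but costs the most CPU. Term vectors and postings store the same start/end pairs at index time instead.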
PostingsHighlighter Algorithm
1.  Fetches all stored-value content needed up-front
2.  Highlight in field sorted order, then doc sorted order loop:
1.  Get PostingsEnum from a Terms for each query term
2.  MTQs: Fake PostingsEnum around filtered TokenStream
3.  Process PostingsEnum[ ] into Passage[ ]
java.text.BreakIterator: for passage delineation
PassageScorer: for passage scoring (BM25 default)
4.  PassageFormatter: for formatting/mark-up
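Step 3 of the loop above can be sketched with the JDK's java.text.BreakIterator. This is a simplified illustration, not the PostingsHighlighter source: `PassageSketch` and its `Passage` class are hypothetical, the match offsets are passed in as a plain array rather than read from a PostingsEnum, and a simple match count stands in for the BM25-based PassageScorer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch of passage delineation: one passage per sentence that holds a match. */
final class PassageSketch {

    /** Hypothetical passage holder: a character span and the matches inside it. */
    static final class Passage {
        final int start; // inclusive
        final int end;   // exclusive
        int matchCount;  // stand-in for a real passage score
        Passage(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * matchStarts must be sorted ascending and each value must be a valid
     * index into text, as offsets read in document order would be.
     */
    static List<Passage> passages(String text, int[] matchStarts) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<Passage> out = new ArrayList<>();
        Passage current = null;
        for (int s : matchStarts) {
            if (current == null || s >= current.end) {
                // Last sentence boundary at or before the match.
                int start = Math.max(bi.preceding(s + 1), 0);
                // First sentence boundary after the match.
                int end = bi.following(s);
                if (end == BreakIterator.DONE) {
                    end = text.length();
                }
                current = new Passage(start, end);
                out.add(current);
            }
            current.matchCount++;
        }
        return out;
    }
}
```

Because the offsets arrive in ascending order, a single forward pass is enough: a new passage opens only when a match falls beyond the end of the current one.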
UnifiedHighlighter
§  Forked PH; given new name agnostic of offset source
§  Mostly same PH API; internals re-arranged and expanded
§  Solr adapter is nearly identical too
§  Untouched: Passage, PassageScorer, PassageFormatter
§  Re-uses some standard-Highlighter code too:
§  WeightedSpanTermExtractor (for phrase accuracy)
§  TokenStreamFromTermVector (for wildcards/MTQs)
UH: Accurate Phrases
(including any SpanQuery)
§  Convert position-sensitive Queries to SpanQueries
§  Re-use WeightedSpanTermExtractor (WSTE) for this
§  Wrap PostingsEnum for position-sensitive words with one that filters
by position-span extracted from span queries
§  Custom: WSTE is not used for this, although it’s similar
§  Note: not 100% accurate with query but very good
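The position filtering described above can be illustrated with plain collections. This is a sketch of the idea only: `SpanPositionFilterSketch` and `Span` are hypothetical names, and the real highlighter wraps a PostingsEnum rather than filtering an int array.

```java
import java.util.ArrayList;
import java.util.List;

/** Keeps only the occurrences of a term that lie inside a matching span. */
final class SpanPositionFilterSketch {

    /** Token-position range covered by one span match (both ends inclusive). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * termPositions: every position at which the term occurs in the document.
     * spans: position ranges extracted from the position-sensitive query.
     */
    static List<Integer> filter(int[] termPositions, List<Span> spans) {
        List<Integer> kept = new ArrayList<>();
        for (int pos : termPositions) {
            for (Span span : spans) {
                if (pos >= span.start && pos <= span.end) {
                    kept.add(pos);
                    break; // one enclosing span is enough
                }
            }
        }
        return kept;
    }
}
```

For example, if "insurance" occurs at positions 0 and 7 but the phrase "insurance fraud" only matches at positions 0 to 1, only the occurrence at position 0 would be highlighted.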
UH: Analysis Offset Source
§  The most difficult offset source…
§  Honor positionIncrementGap for multi-valued data
§  Populates a MemoryIndex when query has phrases
§  But smartly filters irrelevant terms! (new trick)
§  Wildcards/MTQs too? Uninvert MemoryIndex with re-used
TokenStreamFromTermVector
§  If just terms, treat them like wildcards to avoid MemoryIndex usage
UH: Postings Plus Light TVs
§  Postings offset source is great, but not for MTQs (wildcards)
§  MTQs need to see all terms in just the document
§  A plain term vector (no offsets or postings) has that!
§  Trick:
§  Wrap the main index with a term vector’s TermsEnum
§  Then TokenStreamFromTermVector for MTQ
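A rough sketch of why the light term vector helps with wildcards, using plain Java collections in place of Lucene's Terms and TermsEnum. All names here (`LightTermVectorSketch`, `prefixOffsets`, `docTerms`, `postingsOffsets`) are hypothetical, and a simple prefix match stands in for a general multi-term query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;

/** Expands a prefix against one document's terms, then reads offsets per term. */
final class LightTermVectorSketch {

    /**
     * docTerms: the terms present in this document only (stand-in for a term
     *           vector stored without positions or offsets).
     * postingsOffsets: start offsets per term for this document (stand-in for
     *           offsets read from the main index postings).
     */
    static List<Integer> prefixOffsets(String prefix,
                                       SortedSet<String> docTerms,
                                       Map<String, int[]> postingsOffsets) {
        List<Integer> out = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, so start at
        // the prefix and stop at the first term that no longer matches.
        for (String term : docTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            int[] offsets = postingsOffsets.get(term);
            if (offsets != null) {
                for (int o : offsets) {
                    out.add(o);
                }
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

The point of the sketch is that only the document's own terms are scanned to expand the wildcard, while the offsets themselves still come from the postings.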
Benchmark Results
§  Unified Highlighter performed
similarly or better than peers
§  Best performance: Postings
with “light” Term Vectors
§  No use case for full term
vectors anymore?
§  Caveats
§  Substantial variability in test
runs (YMMV)
§  Depends on the specifics of
your use case
§  Benchmark code available
Highlighter             | Offset Source               | Terms | Phrases | Wildcards
(search)                | N/A                         | 1.0x  | 1.0x    | 1.0x
Standard Highlighter    | Analysis                    | 4.6x  | 4.7x    | 7.4x
Unified Highlighter     | Analysis                    | 2.8x  | 2.4x    | 3.7x
Standard Highlighter    | Term Vectors                | 2.7x  | 2.3x    | 3.7x
Fast Vector Highlighter | Term Vectors                | 1.8x  | 2.1x    | 2.6x
Unified Highlighter     | Term Vectors                | 1.7x  | 1.8x    | 2.3x
Postings Highlighter    | Postings                    | 1.8x  | 1.5x    | 3.8x
Unified Highlighter     | Postings                    | 1.6x  | 1.3x    | 3.8x
Unified Highlighter     | Postings with Term Vectors* | 1.5x  | 1.3x    | 2.2x
Times shown in multiples of the original search time (top row).
Future Potential Improvements
§  Accuracy
§  Switch from WSTE approach to SpanCollector API
§  Honor conjunctions “(X AND Y) OR Z”
§  Relevancy
§  Consider term diversity across top-X passages
§  Incorporate query boosts in passage scores
§  Support “requireFieldMatch=false”
Summary
§  Importance of highlighting in Legal Search
§  Overview of the existing Highlighters
§  Improvements to the Standard Highlighter
§  UnifiedHighlighter
§  Contributed to Lucene/Solr! LUCENE-7438
§  Your new favorite highlighter?
Questions?
Timothy M. Rodriguez
Verticals Search Team Lead, Bloomberg
@Timothy055
David Smiley
Search Developer/Consultant
@DavidWSmiley

More Related Content

What's hot

Vespa, A Tour
Vespa, A TourVespa, A Tour
Vespa, A Tour
MatthewOverstreet2
 
Doing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerDoing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters Kluwer
Lucidworks
 
Advanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache LuceneAdvanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache Lucene
Alessandro Benedetti
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/Solr
Trey Grainger
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query Overview
Kevin Watters
 
Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Engine
lucenerevolution
 
SharePoint 2013 search improvements
SharePoint 2013 search improvementsSharePoint 2013 search improvements
SharePoint 2013 search improvements
Kunaal Kapoor
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
Trey Grainger
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Trey Grainger
 
Webinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceWebinar: Fusion for Business Intelligence
Webinar: Fusion for Business Intelligence
Lucidworks
 
BoTLRet: A Template-based Linked Data Information Retrieval
 BoTLRet: A Template-based Linked Data Information Retrieval BoTLRet: A Template-based Linked Data Information Retrieval
BoTLRet: A Template-based Linked Data Information Retrieval
National Inistitute of Informatics (NII), Tokyo, Japann
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Trey Grainger
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
Trey Grainger
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Sease
 
Entity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph EmbeddingsEntity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph Embeddings
Sease
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Lucidworks
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Trey Grainger
 
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Lucidworks
 

What's hot (18)

Vespa, A Tour
Vespa, A TourVespa, A Tour
Vespa, A Tour
 
Doing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerDoing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters Kluwer
 
Advanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache LuceneAdvanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache Lucene
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/Solr
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query Overview
 
Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Engine
 
SharePoint 2013 search improvements
SharePoint 2013 search improvementsSharePoint 2013 search improvements
SharePoint 2013 search improvements
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Webinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceWebinar: Fusion for Business Intelligence
Webinar: Fusion for Business Intelligence
 
BoTLRet: A Template-based Linked Data Information Retrieval
 BoTLRet: A Template-based Linked Data Information Retrieval BoTLRet: A Template-based Linked Data Information Retrieval
BoTLRet: A Template-based Linked Data Information Retrieval
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Entity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph EmbeddingsEntity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph Embeddings
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
 

Viewers also liked

Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Lucidworks
 
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Lucidworks
 
Webinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphWebinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and Graph
Lucidworks
 
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Lucidworks
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Lucidworks
 
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
Lucidworks
 
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Lucidworks
 
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBMUnderstanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Lucidworks
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and Recommenders
Lucidworks
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Lucidworks
 
Apache Solr 5.0 and beyond
Apache Solr 5.0 and beyondApache Solr 5.0 and beyond
Apache Solr 5.0 and beyond
Anshum Gupta
 
Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015
Anshum Gupta
 
What's New in Apache Solr 4.10
What's New in Apache Solr 4.10What's New in Apache Solr 4.10
What's New in Apache Solr 4.10
Anshum Gupta
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
lucenerevolution
 
Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark
Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & SparkWebinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark
Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark
Lucidworks
 
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Lucidworks
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
Anshum Gupta
 
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingSolr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Lucidworks
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Lucidworks
 
Scaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsScaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of Collections
Anshum Gupta
 

Viewers also liked (20)

Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
 
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
 
Webinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphWebinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and Graph
 
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
 
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
 
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
 
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBMUnderstanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and Recommenders
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
 
Apache Solr 5.0 and beyond
Apache Solr 5.0 and beyondApache Solr 5.0 and beyond
Apache Solr 5.0 and beyond
 
Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015
 
What's New in Apache Solr 4.10
What's New in Apache Solr 4.10What's New in Apache Solr 4.10
What's New in Apache Solr 4.10
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark
Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & SparkWebinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark
Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark
 
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingSolr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
 
Scaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsScaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of Collections
 

Similar to Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & David Smiley, D W Smiley, LLC

MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB
 
Five Enterprise Best Practices That EVERY Salesforce Org Can Use (DF15 Session)
Five Enterprise Best Practices That EVERY Salesforce Org Can Use (DF15 Session)Five Enterprise Best Practices That EVERY Salesforce Org Can Use (DF15 Session)
Five Enterprise Best Practices That EVERY Salesforce Org Can Use (DF15 Session)
Vivek Chawla
 
Machine Translation Quality - Are We There Yet? - Olga Beregovaya (Welocalize)
Machine Translation Quality - Are We There Yet? - Olga Beregovaya (Welocalize)Machine Translation Quality - Are We There Yet? - Olga Beregovaya (Welocalize)
Machine Translation Quality - Are We There Yet? - Olga Beregovaya (Welocalize)
TAUS - The Language Data Network
 
Real world RESTful service development problems and solutions
Real world RESTful service development problems and solutionsReal world RESTful service development problems and solutions
Real world RESTful service development problems and solutions
Masoud Kalali
 
Five Enterprise Development Best Practices That EVERY Salesforce Org Can Use
Five Enterprise Development Best Practices That EVERY Salesforce Org Can UseFive Enterprise Development Best Practices That EVERY Salesforce Org Can Use
Five Enterprise Development Best Practices That EVERY Salesforce Org Can Use
Salesforce Developers
 
Publishing Data to REST APIs with Lightning Process Builder
Publishing Data to REST APIs with Lightning Process BuilderPublishing Data to REST APIs with Lightning Process Builder
Publishing Data to REST APIs with Lightning Process Builder
Scott Coleman
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
rcmuir
 
How To Use Selenium Successfully
How To Use Selenium SuccessfullyHow To Use Selenium Successfully
How To Use Selenium Successfully
Dave Haeffner
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
Betclic Everest Group Tech Team
 
ElasticSearch
ElasticSearchElasticSearch
ElasticSearch
Volodymyr Kraietskyi
 
Oleksandr Khotemskyi - Serverless architecture and how to apply it in Automa...
Oleksandr Khotemskyi  - Serverless architecture and how to apply it in Automa...Oleksandr Khotemskyi  - Serverless architecture and how to apply it in Automa...
Oleksandr Khotemskyi - Serverless architecture and how to apply it in Automa...
Web Tech Fun
 
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
Dakiry
 
50 Shades of Fail KScope16
50 Shades of Fail KScope1650 Shades of Fail KScope16
50 Shades of Fail KScope16
Christian Berg
 
MDM: Integrating Oracle PIM & iStore
MDM: Integrating Oracle PIM & iStoreMDM: Integrating Oracle PIM & iStore
MDM: Integrating Oracle PIM & iStore
AXIA Consulting Inc.
 
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Lucidworks
 
Advantages of DITA for the Life Sciences
Advantages of DITA for the Life SciencesAdvantages of DITA for the Life Sciences
Advantages of DITA for the Life Sciences
dclsocialmedia
 
Agile Mumbai 2020 Conference | How to get the best ROI on Your Test Automati...
Agile Mumbai 2020 Conference |  How to get the best ROI on Your Test Automati...Agile Mumbai 2020 Conference |  How to get the best ROI on Your Test Automati...
Agile Mumbai 2020 Conference | How to get the best ROI on Your Test Automati...
AgileNetwork
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018
Rohan Rasane
 
Query processing
Query processingQuery processing
Query processing
Ravinder Kamboj
 
Gherkin model1
Gherkin model1Gherkin model1

Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & David Smiley, D W Smiley, LLC

  • 1. October 11-14, 2016 • Boston, MA
  • 2. Solr Highlighting at Full Speed
      Timothy M. Rodriguez, Verticals Search Team Lead, Bloomberg
      David Smiley, Search Developer/Consultant
  • 3. Agenda
      §  Legal Search
      §  Business Requirements
      §  Highlighters Overview
      §  Improving the Standard Highlighter
      §  Unified Highlighter
      §  Questions
  • 4. Bloomberg Law
      §  Suite of legal business tools for lawyers and legal professionals
      §  Business development
      §  Drafting
      §  Analytics
      §  Search
  • 5. Legal Search
      §  Recall Matters
      §  Large Documents
      §  Citizens United is 130 pages long
      §  Some in the 100s of MB
      §  Researchers rely on highlighting to help them decide if they should read a document
  • 7. Accuracy
      §  Legal users issue detailed searches
      §  “cafeteria plan” AND tax
      §  Custom Span Queries
      §  “insurance fraud” /s conviction
  • 12. Solr Highlighters Overview
      Highlighter | Offset Source | Accuracy | Speed
      Default/Standard Highlighter | Analysis | Better | Slowest
      Default/Standard Highlighter | Term Vectors | Better | Slow
      Fast Vector Highlighter | Term Vectors | Good | Medium
      Postings Highlighter | Postings (+ Analysis for wildcards) | Okay | Fast*
      * But poor wildcard performance
  • 13. Offset Source & Index Size
      §  Analysis requires no extra data on-disk
      §  But analyzing text on the fly is expensive
      §  Term vectors are heavy
      §  Adding offsets to postings is much lighter than TVs
      [Chart: index size in multiples of the stored value (axis 0 to 3) for Stored Value, Terms, Positions, Offsets, TV Terms, TV Positions, TV Offsets]
  • 14. Initial Attempt
      §  Chose the default highlighter and added in customizations as needed
      §  Added payload support to the MemoryIndex - LUCENE-6155 v5.0
      §  We investigated using the PostingsHighlighter and FastVectorHighlighter but the accuracy trade-offs were not acceptable for our users
      §  Ran into performance problems as highlighting was taking the bulk of our execution time
  • 15. Make it Faster
      §  Improvements to the Standard Highlighter
      §  Fast uninverting of term vectors to a token stream - LUCENE-6031 v5.0 (remove expensive sort)
      §  Rely on Term Vectors for phrase highlighting instead of analyzing into the MemoryIndex - LUCENE-6034 v5.0
      Still not fast enough…
  • 16. Multithreaded Highlighting
      §  Highlighting each returned doc is easily parallelizable
      §  Greatly improved performance
      §  Greatly increased memory consumption
      Still a sizeable fraction of our query times…
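The per-document parallelism on slide 16 can be sketched with a plain thread pool. This is only an illustration under stated assumptions: `ParallelHighlightSketch`, `highlightOne`, and `highlightAll` are hypothetical names, and the string replace stands in for a real highlighter call. It is not the speakers' implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelHighlightSketch {

    // Hypothetical stand-in for a real highlighter call on one document.
    static String highlightOne(String docText, String term) {
        return docText.replace(term, "<em>" + term + "</em>");
    }

    // Highlights each returned document as its own task; output order matches input order.
    static List<String> highlightAll(List<String> docs, String term, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (String doc : docs) {
                Callable<String> task = () -> highlightOne(doc, term);
                futures.add(pool.submit(task));
            }
            List<String> out = new ArrayList<String>();
            for (Future<String> f : futures) {
                out.add(f.get()); // waits for that document's task to finish
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Every in-flight task holds its own document text and working state at the same time, which is one plausible reading of the memory cost the slide mentions.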
  • 17. Re-evaluating
      §  Didn’t look like we could make the Standard Highlighter much faster
      §  Perhaps we could federate to one of the highlighters based on the query?
      §  Our customizations would have to be ported to each of the highlighters
      §  Work would need to be repeated 3x
      §  Increased disk utilization from adding postings to the main index
  • 18. Enhance the Postings Highlighter?
      §  Fastest of the bunch
      §  Add accuracy at least as good as the standard highlighter
      §  Add support for the other offset sources too
      §  (supports our full-doc-highlighting use-case)
      §  But it’s a big job with major internal highlighter surgery…
      Let’s do it!
  • 19. Offsets Overview
      §  Getting character offsets is key to highlighting. 3 ways:
      §  Analysis:
      §  Analyzer → TokenStream → OffsetAttribute, oa.startOffset()
      §  Term vectors:
      §  IndexReader.getTermVector(docId,field) → Terms → TermsEnum, te.postings(…, PostingsEnum.OFFSETS) → PostingsEnum → pe.startOffset()
      §  Postings:
      §  IndexReader → LeafReader → Terms → … (see above)
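To make "character offsets" concrete, here is a minimal stdlib-only sketch that does not use Lucene at all. The class `OffsetSourcesSketch` and its simple letter-or-digit tokenizer are assumptions made for illustration. Running it at query time corresponds loosely to the analysis source; storing its result at index time corresponds loosely to term vectors or postings with offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class OffsetSourcesSketch {

    // Tokenizes on non-alphanumeric characters, lowercases each token, and
    // records its half-open [start, end) character offsets under the term.
    static Map<String, List<int[]>> offsetsByTerm(String text) {
        Map<String, List<int[]>> byTerm = new TreeMap<String, List<int[]>>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip separators
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++; // consume one token
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                List<int[]> offsets = byTerm.get(term);
                if (offsets == null) {
                    offsets = new ArrayList<int[]>();
                    byTerm.put(term, offsets);
                }
                offsets.add(new int[] {start, i});
            }
        }
        return byTerm;
    }
}
```

Whichever source supplies them, the highlighter ends up consuming the same kind of data: a term and the character ranges where it occurs in the stored text.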
  • 20. PostingsHighlighter Algorithm
      1.  Fetches all stored-value content needed up-front
      2.  Highlight in field sorted order, then doc sorted order loop:
          1.  Get PostingsEnum from a Terms for each query term
          2.  MTQs: Fake PostingsEnum around filtered TokenStream
          3.  Process PostingsEnum[ ] into Passage[ ]
              java.text.BreakIterator: for passage delineation
              PassageScorer: for passage scoring (BM25 default)
          4.  PassageFormatter: for formatting/mark-up
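The passage steps on slide 20 (delineate, score, format) can be sketched with `java.text.BreakIterator` alone. This is a simplified illustration, not the PostingsHighlighter code: `PassageSketch` is a hypothetical name, the score is a plain match count rather than BM25, and the match offsets are assumed to arrive already sorted and non-overlapping.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class PassageSketch {

    // Returns the sentence containing the most matches, with each match wrapped
    // in <b>...</b>, or null when no sentence contains a match.
    // matchOffsets holds half-open [start, end) pairs sorted by start offset.
    static String bestPassage(String text, int[][] matchOffsets) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);

        int bestStart = -1;
        int bestEnd = -1;
        int bestScore = 0;
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            int score = 0; // simplistic score: number of matches inside this sentence
            for (int[] m : matchOffsets) {
                if (m[0] >= start && m[1] <= end) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                bestStart = start;
                bestEnd = end;
            }
        }
        if (bestScore == 0) {
            return null;
        }

        // Formatting step: copy the passage, inserting mark-up around each match.
        StringBuilder sb = new StringBuilder();
        int pos = bestStart;
        for (int[] m : matchOffsets) {
            if (m[0] >= bestStart && m[1] <= bestEnd) {
                sb.append(text, pos, m[0]).append("<b>").append(text, m[0], m[1]).append("</b>");
                pos = m[1];
            }
        }
        sb.append(text, pos, bestEnd);
        return sb.toString().trim();
    }
}
```

Swapping the sentence iterator for a different `BreakIterator` changes the passage size without touching the scoring or formatting code, which is the separation of concerns the slide's three components suggest.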
  • 21. UnifiedHighlighter
      §  Forked PH; given new name agnostic of offset source
      §  Mostly same PH API; internals re-arranged and expanded
      §  Solr adapter is nearly identical too
      §  Untouched: Passage, PassageScorer, PassageFormatter
      §  Re-uses some standard-Highlighter code too:
      §  WeightedSpanTermExtractor (for phrase accuracy)
      §  TokenStreamFromTermVector (for wildcards/MTQs)
  • 22. UH: Accurate Phrases (including any SpanQuery)
      §  Convert position-sensitive Queries to SpanQueries
      §  Re-use WeightedSpanTermExtractor (WSTE) for this
      §  Wrap PostingsEnum for position-sensitive words with one that filters by position-span extracted from span queries
      §  Custom: WSTE is not used for this, although it’s similar
      §  Note: not 100% accurate with query but very good
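The position-span filtering on slide 22 can be illustrated without Lucene. In this sketch, `SpanPositionFilterSketch` and both methods are hypothetical names, positions are plain token positions, and only an exact two-word phrase is handled. The real highlighter works on PostingsEnum and arbitrary span queries.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanPositionFilterSketch {

    // Finds half-open position spans [p, p + 2) where a position of the first
    // word is immediately followed by a position of the second word.
    static List<int[]> phraseSpans(int[] firstPositions, int[] secondPositions) {
        List<int[]> spans = new ArrayList<int[]>();
        for (int p : firstPositions) {
            for (int q : secondPositions) {
                if (q == p + 1) {
                    spans.add(new int[] {p, q + 1});
                }
            }
        }
        return spans;
    }

    // Keeps only the positions of a term that fall inside some matching span,
    // so occurrences outside the phrase are not highlighted.
    static List<Integer> filterBySpans(int[] positions, List<int[]> spans) {
        List<Integer> kept = new ArrayList<Integer>();
        for (int pos : positions) {
            for (int[] span : spans) {
                if (pos >= span[0] && pos < span[1]) {
                    kept.add(pos);
                    break;
                }
            }
        }
        return kept;
    }
}
```

For a phrase such as “insurance fraud”, with "insurance" at positions 2 and 10 and "fraud" at positions 3 and 20, only positions 2 and 3 survive the filter; the isolated occurrences at 10 and 20 are dropped.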
  • 23. UH: Analysis Offset Source
      §  The most difficult offset source…
      §  Honor positionIncrementGap for multi-valued data
      §  Populates a MemoryIndex when query has phrases
      §  But smartly filters irrelevant terms! (new trick)
      §  Wildcards/MTQs too? Uninvert MemoryIndex with re-used TokenStreamFromTermVector
      §  If just terms, treat them like wildcards to avoid MemoryIndex usage
  • 24. UH: Postings Plus Light TVs
      §  Postings offset source is great, but not for MTQs (wildcards)
      §  MTQs need to see all terms in just the document
      §  A plain term vector (no offsets or postings) has that!
      §  Trick:
      §  Wrap the main index with a term vector’s TermsEnum
      §  Then TokenStreamFromTermVector for MTQ
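The idea on slide 24 is that the document's own term list comes from the light term vector while the offsets come from the main index. A rough stdlib-only sketch follows; `LightTermVectorSketch`, the `Set` standing in for the term vector, and the `Map` standing in for per-document postings offsets are all assumptions, and a prefix match stands in for a general multi-term query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LightTermVectorSketch {

    // docTerms: the terms present in this one document (the "light" term vector).
    // postingsOffsets: this document's [start, end) offsets per term, as they
    // would be read from postings that store offsets.
    static List<int[]> prefixOffsets(Set<String> docTerms,
                                     Map<String, List<int[]>> postingsOffsets,
                                     String prefix) {
        List<int[]> out = new ArrayList<int[]>();
        for (String term : docTerms) {
            if (!term.startsWith(prefix)) {
                continue; // the wildcard only expands to terms this document contains
            }
            List<int[]> offsets = postingsOffsets.get(term);
            if (offsets != null) {
                out.addAll(offsets);
            }
        }
        // Merge the per-term lists into document order.
        Collections.sort(out, new Comparator<int[]>() {
            @Override
            public int compare(int[] a, int[] b) {
                return Integer.compare(a[0], b[0]);
            }
        });
        return out;
    }
}
```

The point of the design is that the wildcard is expanded against one document's terms rather than the whole index, and no offsets need to be duplicated into the term vector.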
  • 25. Benchmark Results
      §  Unified Highlighter performed similarly or better than peers
      §  Best performance: Postings with “light” Term Vectors
      §  No use case for full term vectors anymore?
      §  Caveats
      §  Substantial variability in test runs (YMMV)
      §  Depends on the specifics of your use case
      §  Benchmark code available
      Highlighter | Offset Source | Terms | Phrases | Wildcards
      (search) | N/A | 1.0x | 1.0x | 1.0x
      Standard Highlighter | Analysis | 4.6x | 4.7x | 7.4x
      Unified Highlighter | Analysis | 2.8x | 2.4x | 3.7x
      Standard Highlighter | Term Vectors | 2.7x | 2.3x | 3.7x
      Fast Vector Highlighter | Term Vectors | 1.8x | 2.1x | 2.6x
      Unified Highlighter | Term Vectors | 1.7x | 1.8x | 2.3x
      Postings Highlighter | Postings | 1.8x | 1.5x | 3.8x
      Unified Highlighter | Postings | 1.6x | 1.3x | 3.8x
      Unified Highlighter | Postings with Term Vectors* | 1.5x | 1.3x | 2.2x
      Times shown in multiples of the original search time (top row).
  • 26. Future Potential Improvements
      §  Accuracy
      §  Switch from WSTE approach to SpanCollector API
      §  Honor conjunctions “(X AND Y) OR Z”
      §  Relevancy
      §  Consider term diversity across top-X passages
      §  Incorporate query boosts in passage scores
      §  Support “requireFieldMatch=false”
  • 27. Summary
      §  Importance of highlighting in Legal Search
      §  Overview of the existing Highlighters
      §  Improvements to the Standard Highlighter
      §  UnifiedHighlighter
      §  Contributed to Lucene/Solr! LUCENE-7438
      §  Your new favorite highlighter?
  • 28. Questions?
      Timothy M. Rodriguez, Verticals Search Team Lead, Bloomberg, @Timothy055
      David Smiley, Search Developer/Consultant, @DavidWSmiley