Faceted Search with Lucene
Shai Erera
Researcher, IBM
Who Am I
• Working at IBM, Information Retrieval Research
• Lucene/Solr committer and PMC member
• http://shaierera.blogspot.com
• shaie@apache.org
Lucene Facets 101
Faceted Search
• Technique for accessing documents that were classified into a taxonomy of categories
  – Flat: Author/John Doe, Tags/Lucene, Popularity/High
  – Hierarchical: Computers/Software/Information Retrieval/Fulltext/Apache Lucene (ODP)
• Quick overview of the breakdown of the search results
  – How many documents are in category Committed Paths/lucene/core vs. Committed Paths/lucene/facet
• Simplifies interaction with the search application
  – Drill down to issues that were updated in Past 2 days by clicking a link
  – No knowledge required about search syntax and index schema
http://jirasearch.mikemccandless.com
Lucene Facets
• Contributed by IBM in 2011, released in 3.4.0
• Major changes since 4.1.0
  – NRT support
  – Nearly 400% search speedups
  – Complete API revamp
  – New features (SortedSet, range faceting, drill-sideways)
• Two main indexing-time modes
  – Taxonomy-based: hierarchical facets, managed by a sidecar index, low NRT reopen cost
  – SortedSetDocValues: flat facets only, no sidecar index, higher NRT reopen cost
• Runtime modes
  – Range facets (on NumericDocValues fields)
• Other implementations: Solr, ElasticSearch, Bobo Browse
Lucene Facet Components
• TaxonomyWriter/Reader
  – Manage the taxonomy information
• FacetFields
  – Add facets information to documents (DocValues fields, drilldown terms)
• FacetRequest
  – Defines which facets to aggregate and the FacetsAggregator (aggregation function)
• FacetsCollector
  – Collects matching documents and computes the top-K categories for each facet request (invokes FacetsAccumulator)
• DrillDownQuery / DrillSideways
  – Execute drilldown and drill-sideways requests
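The "top-K categories" step mentioned for FacetsCollector can be pictured with a small, library-free sketch. It assumes the counts have already been aggregated into an int[] indexed by ordinal; the names TopKSketch and topK are hypothetical and are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: not a Lucene class.
final class TopKSketch {
    /** Returns the ordinals with the k highest non-zero counts, highest count first. */
    static List<Integer> topK(final int[] counts, int k) {
        // Min-heap: the weakest of the current top-k sits at the head.
        // On equal counts the higher ordinal is considered weaker.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
            (a, b) -> counts[a] != counts[b]
                ? Integer.compare(counts[a], counts[b])
                : Integer.compare(b, a));
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] == 0) continue;   // categories with no matches are skipped
            heap.add(ord);
            if (heap.size() > k) heap.poll(); // evict the weakest candidate
        }
        List<Integer> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) result.add(heap.poll());
        Collections.reverse(result);          // strongest first
        return result;
    }

    public static void main(String[] args) {
        int[] counts = {0, 5, 2, 9, 5};
        System.out.println(topK(counts, 3));
    }
}
```

Keeping only k candidates in the heap means the cost grows with the number of categories times log k, rather than requiring a full sort of all counts.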
Sample Code – Indexing
// Builds the taxonomy as documents are indexed, multi-threaded, single instance
TaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
// Adds facets information to a document, can be initialized once per thread
FacetFields facetFields = new FacetFields(taxoWriter);
// List of categories to add to the document
List<CategoryPath> cats = new ArrayList<CategoryPath>();
cats.add(new CategoryPath("Author", "Erik Hatcher"));
cats.add(new CategoryPath("Author/Otis Gospodnetić", '/'));
cats.add(new CategoryPath("Pub Date", "2004", "December", "1"));
Document bookDoc = new Document();
bookDoc.add(new TextField("title", "lucene in action", Store.YES));
// add categories fields (DocValues, Postings)
facetFields.addFields(bookDoc, cats);
// index the document
indexWriter.addDocument(bookDoc);
Sample Code – Search
// Open an NRT TaxonomyReader
TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
// Define the facets to aggregate (top-10 categories for each)
FacetSearchParams fsp = new FacetSearchParams();
fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Author"), 10));
fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Pub Date"), 10));
// Collect both top-K facets and top-N matching documents
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true);
FacetsCollector fc = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);
Query q = new TermQuery(new Term("title", "lucene"));
searcher.search(q, MultiCollector.wrap(tdc, fc));
// Traverse the top facets
for (FacetResult fres : fc.getFacetResults()) {
  FacetResultNode root = fres.getFacetResultNode();
  System.out.println(root.label + " (" + root.value + ")");
  for (FacetResultNode cat : root.getSubResults()) {
    System.out.println("  " + cat.label.components[0] + " (" + cat.value + ")");
  }
}
Drilldown and Drill-Sideways
• Drilldown adds a filter to the search
  – Multiple categories can be OR'd
// Drilldown: filter results to "Component/core/index".
// All other "Component/*" and "Component/core/*" get count 0
Query base = new MatchAllDocsQuery();
DrillDownQuery ddq = new DrillDownQuery(facetIndexingParams, base);
ddq.add(new CategoryPath("Component/core/index", '/'));
• Drill-sideways allows drilldown, yet still aggregates "sideways" categories
// Drill-Sideways: drilldown on "Component/core/index".
// Other "Component/*" and "Component/core/*" are counted too
DrillSideways ds = new DrillSideways(searcher, taxoReader);
DrillSidewaysResult sidewaysRes = ds.search(null, ddq, 10, fsp);
http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
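The difference between the two modes can be sketched without Lucene, under the simplifying assumption that each document carries one flat value per dimension. The class DrillSidewaysSketch and its methods are hypothetical; a category that is absent from the returned map corresponds to a count of 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration: not the Lucene DrillSideways class.
final class DrillSidewaysSketch {
    /** True if the doc matches every selected dimension, optionally ignoring one. */
    static boolean matches(Map<String, String> doc, Map<String, String> selected, String ignoreDim) {
        for (Map.Entry<String, String> e : selected.entrySet()) {
            if (e.getKey().equals(ignoreDim)) continue;
            if (!e.getValue().equals(doc.get(e.getKey()))) return false;
        }
        return true;
    }

    /**
     * Counts the values of one dimension. With sideways=false the drilldown filter
     * on that dimension applies; with sideways=true it is ignored for this dimension
     * only, so sibling values keep their counts.
     */
    static Map<String, Integer> counts(List<Map<String, String>> docs, Map<String, String> selected,
                                       String dim, boolean sideways) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            if (!matches(doc, selected, sideways ? dim : null)) continue;
            String value = doc.get(dim);
            if (value != null) counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    /** Builds a document from alternating dimension/value arguments. */
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> doc = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) doc.put(keyValues[i], keyValues[i + 1]);
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
            doc("Component", "index", "Status", "Open"),
            doc("Component", "search", "Status", "Open"));
        Map<String, String> selected = doc("Component", "index");
        System.out.println("drilldown: " + counts(docs, selected, "Component", false));
        System.out.println("sideways:  " + counts(docs, selected, "Component", true));
    }
}
```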
Dynamic Facets
• Range facets on NumericDocValues fields
  – Define the buckets of interest at query time
  – Supports any arbitrary ValueSource (Lucene 4.6.0)
// Aggregate matching documents into buckets
RangeAccumulator a = new RangeAccumulator(new RangeFacetRequest<LongRange>("field",
    new LongRange("1-5", 1L, true, 5L, true),
    new LongRange("6-20", 6L, true, 20L, true),
    new LongRange("21-100", 21L, true, 100L, true),
    new LongRange("over 100", 100L, false, Long.MAX_VALUE, true)));
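How inclusive and exclusive bounds decide which bucket a value lands in can be shown with a small stand-alone sketch. RangeFacetSketch and its nested Range class are hypothetical; they only mirror the (label, min, minInclusive, max, maxInclusive) argument order used above and are not Lucene's implementation.

```java
// Hypothetical illustration of range bucketing: not Lucene's RangeAccumulator.
final class RangeFacetSketch {
    static final class Range {
        final String label;
        final long min;
        final boolean minInclusive;
        final long max;
        final boolean maxInclusive;

        Range(String label, long min, boolean minInclusive, long max, boolean maxInclusive) {
            this.label = label;
            this.min = min;
            this.minInclusive = minInclusive;
            this.max = max;
            this.maxInclusive = maxInclusive;
        }

        /** True if the value falls inside this range, honoring both bound flags. */
        boolean accept(long value) {
            boolean aboveMin = minInclusive ? value >= min : value > min;
            boolean belowMax = maxInclusive ? value <= max : value < max;
            return aboveMin && belowMax;
        }
    }

    /** Counts, for each range, how many values it accepts (ranges may overlap). */
    static int[] count(long[] values, Range... ranges) {
        int[] counts = new int[ranges.length];
        for (long value : values) {
            for (int i = 0; i < ranges.length; i++) {
                if (ranges[i].accept(value)) counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Range[] ranges = {
            new Range("1-5", 1L, true, 5L, true),
            new Range("over 100", 100L, false, Long.MAX_VALUE, true)
        };
        int[] counts = count(new long[] {3, 100, 250}, ranges);
        for (int i = 0; i < ranges.length; i++) {
            System.out.println(ranges[i].label + " (" + counts[i] + ")");
        }
    }
}
```

A value that no range accepts is simply not counted, so gaps between adjacent buckets silently drop documents.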
Facet Associations
• Not all facets are created equal
  – Categories added by an automatic categorization system, e.g. Category/Apache Lucene (0.74) (confidence level is 0.74)
  – Important metadata about the facet, e.g. Contracts/US ($5M) (total $$$ generated from contracts)
  – Complex structures, e.g. Users/Shai Erera (lastAccess=YYYY/MM/DD, numUpdates=8…)
• Categories can have values associated with them per document
  – They are later aggregated by these values
  – NOTE: ≠ NumericDocValuesFields!
• Facet associations are completely customizable: encoded as a byte[] per document
http://shaierera.blogspot.com/2013/01/facet-associations.html
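As a rough picture of "a byte[] per document, later aggregated by value", the sketch below packs (ordinal, float) pairs into a byte array and sums them per ordinal. The 8-bytes-per-pair layout and the names AssociationSketch, encode and aggregate are assumptions made for illustration; they are not the encoding Lucene uses.

```java
import java.nio.ByteBuffer;

// Hypothetical layout for illustration: 4-byte ordinal followed by a 4-byte float value.
final class AssociationSketch {
    /** Packs one document's (ordinal, value) associations into a single byte[]. */
    static byte[] encode(int[] ordinals, float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(ordinals.length * 8);
        for (int i = 0; i < ordinals.length; i++) {
            buf.putInt(ordinals[i]).putFloat(values[i]);
        }
        return buf.array();
    }

    /** Adds one document's association values into sums[ordinal]. */
    static void aggregate(byte[] payload, float[] sums) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        while (buf.remaining() >= 8) {
            int ordinal = buf.getInt();
            sums[ordinal] += buf.getFloat();
        }
    }

    public static void main(String[] args) {
        float[] sums = new float[3];
        aggregate(encode(new int[] {1, 2}, new float[] {0.5f, 0.25f}), sums);
        aggregate(encode(new int[] {1}, new float[] {0.25f}), sums);
        System.out.println("ordinal 1 -> " + sums[1] + ", ordinal 2 -> " + sums[2]);
    }
}
```

Summing is only one possible aggregation; the same decode loop could take a maximum or an average instead.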
More Features
• Complements
  – Holds the count of each category in-memory, per IndexReader
  – When the number of search results is >50% of the index, count the "complement set"
  – Useful for "overview" queries, e.g. MatchAllDocsQuery
• Sampling
  – Aggregate a sampled set of the search results
  – Optionally re-count top-K facets for accurate values
• Partitions
  – Partition the taxonomy space to control memory usage during faceted search
  – Useful for very big taxonomies (10s of millions of categories)
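The complements idea can be sketched as follows, assuming the per-category totals for the whole index are already cached and each document's ordinals are available as an int[]. ComplementSketch and its methods are hypothetical names, not Lucene classes.

```java
import java.util.Arrays;

// Hypothetical illustration of complement counting: not Lucene's implementation.
final class ComplementSketch {
    /** Counts ordinals over the documents whose match flag equals 'wanted'. */
    static int[] countSet(int[][] docOrdinals, boolean[] matches, boolean wanted, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc = 0; doc < docOrdinals.length; doc++) {
            if (matches[doc] != wanted) continue;
            for (int ord : docOrdinals[doc]) counts[ord]++;
        }
        return counts;
    }

    /**
     * If more than half of the index matches, visit the (smaller) set of
     * non-matching documents and subtract their counts from the cached totals.
     */
    static int[] count(int[][] docOrdinals, boolean[] matches, int[] totalCounts) {
        int numMatches = 0;
        for (boolean m : matches) if (m) numMatches++;
        if (numMatches * 2 <= matches.length) {
            return countSet(docOrdinals, matches, true, totalCounts.length);
        }
        int[] counts = countSet(docOrdinals, matches, false, totalCounts.length);
        for (int ord = 0; ord < counts.length; ord++) {
            counts[ord] = totalCounts[ord] - counts[ord];
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docOrdinals = {{0, 1}, {0}, {1}, {0, 2}};
        int[] totals = {3, 2, 1};
        boolean[] matches = {true, true, true, false};
        System.out.println(Arrays.toString(count(docOrdinals, matches, totals)));
    }
}
```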
Lucene Facets Under the Hood
The Taxonomy Index
• The taxonomy maps categories to integer codes (referred to as ordinals)
  – Kind of like a Map<CategoryPath,Integer>, with hierarchy support
  – Provides taxonomy browsing services
  – DirectoryTaxonomyWriter is managed as a sidecar Lucene index
• Categories are broken down to their path components, e.g. Date/2012/March/20 becomes:
  – Date, with ordinal=1
  – Date/2012, with ordinal=2
  – Date/2012/March, with ordinal=3
  – Date/2012/March/20, with ordinal=4
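The "Map<CategoryPath,Integer> with hierarchy support" description can be sketched in a few lines. TaxonomySketch is a hypothetical in-memory stand-in, not DirectoryTaxonomyWriter; it assumes ordinal 0 is reserved for a root node and uses -1 for unknown categories, and it keys the map by a '/'-joined path string for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory taxonomy: every path prefix gets its own ordinal.
final class TaxonomySketch {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<Integer> parents = new ArrayList<>();

    TaxonomySketch() {
        ordinals.put("", 0);   // assumed root node, ordinal 0
        parents.add(-1);       // the root has no parent
    }

    /** Adds the category and all of its ancestors; returns the leaf's ordinal. */
    int addCategory(String... components) {
        int parent = 0;
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) path.append('/');
            path.append(component);
            String key = path.toString();
            Integer ordinal = ordinals.get(key);
            if (ordinal == null) {          // first time this prefix is seen
                ordinal = parents.size();
                ordinals.put(key, ordinal);
                parents.add(parent);
            }
            parent = ordinal;
        }
        return parent;
    }

    /** Returns the ordinal of a '/'-joined path, or -1 if it is unknown. */
    int getOrdinal(String path) {
        Integer ordinal = ordinals.get(path);
        return ordinal == null ? -1 : ordinal;
    }

    /** Returns the parent ordinal, which is what enables hierarchy browsing. */
    int getParent(int ordinal) {
        return parents.get(ordinal);
    }

    public static void main(String[] args) {
        TaxonomySketch taxo = new TaxonomySketch();
        int leaf = taxo.addCategory("Date", "2012", "March", "20");
        System.out.println("Date/2012/March/20 -> " + leaf);
        System.out.println("parent of " + leaf + " -> " + taxo.getParent(leaf));
    }
}
```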
The Search Index
• Categories are added as drilldown terms, e.g. for Date/2012/March/20:
  – $facets:Date
  – $facets:Date/2012
  – …
• All category ordinals associated with the document are added as a BinaryDocValuesField
  – The ordinals of all path components are added, not just the leaves'
  – Encoded as VInt + gap for efficient compression and speed
    • Other compression methods were attempted, but were slower to decode (LUCENE-4609)
  – Used during faceted search to read all the associated ordinals and aggregate accordingly (e.g. count)
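Both halves of this slide can be sketched without Lucene: one drilldown term per path prefix, and the document's ordinals stored as gaps between sorted values, each gap written as a variable-length integer. SearchIndexSketch is a hypothetical class; it uses '/' as the delimiter for readability and a generic low-bits-first varint, which is not claimed to be byte-compatible with Lucene's encoder.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of drilldown terms and gap + VInt ordinal encoding.
final class SearchIndexSketch {
    /** One drilldown term per path prefix of the category. */
    static List<String> drilldownTerms(String... components) {
        List<String> terms = new ArrayList<>();
        StringBuilder path = new StringBuilder("$facets:");
        for (int i = 0; i < components.length; i++) {
            if (i > 0) path.append('/');
            path.append(components[i]);
            terms.add(path.toString());
        }
        return terms;
    }

    /** Sorts the (non-negative) ordinals and writes each gap as a 7-bits-per-byte varint. */
    static byte[] encode(int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int ordinal : sorted) {
            int gap = ordinal - previous;
            previous = ordinal;
            while ((gap & ~0x7F) != 0) {           // more than 7 bits left
                out.write((gap & 0x7F) | 0x80);    // high bit marks "more bytes follow"
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Reverses encode(): reads varint gaps and accumulates them back into ordinals. */
    static int[] decode(byte[] bytes) {
        int[] result = new int[bytes.length];      // at most one ordinal per byte
        int count = 0;
        int previous = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            result[count++] = previous;
        }
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        System.out.println(drilldownTerms("Date", "2012", "March", "20"));
        byte[] encoded = encode(new int[] {1, 2, 3, 4});
        System.out.println(encoded.length + " bytes -> " + Arrays.toString(decode(encoded)));
    }
}
```

Because parent and child ordinals tend to be close together, the gaps are small and most of them fit in a single byte.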
SortedSet Facets
• SortedSetFacetFields add SortedSetDocValuesFields and drilldown terms to documents
• Local-segment SortedSet ordinals are mapped to global ones through SortedSetDocValuesReaderState
• Use SortedSetDocValuesAccumulator to accumulate SortedSet facets
• Advantages:
  – Taxonomy representation requires less RAM (flat taxonomy)
  – No sidecar index
  – Tie-breaks by label-sort order
• Disadvantages:
  – Not full taxonomy
  – Overall uses more RAM (local-to-global ordinal mapping)
  – Adds NRT reopen cost
  – Slower than taxonomy-based facets
Global Ordinals
• Per-segment integer codes (as used by the SortedSet approach) are less efficient
  – Different ordinals for the same categories across segments
  – Hold an in-memory codes map (e.g. local-to-global): more RAM and less scalable
  – Resolve top-K on the String representation of categories: more CPU
• Global ordinals allow efficient per-segment faceting and aggregation
  – No translation maps required (no extra RAM, highly scalable)
  – Aggregation, top-K computation done on integer codes
• But, they do not play well with IndexWriter.addIndexes(Directory…)
  – Must use IndexWriter.addIndexes(IndexReader…), so that the ordinals in the input search index are mapped to the destination's
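The cost of per-segment codes comes from the translation step sketched below: every segment needs an int[] that maps its local ordinals to global ones before counts can be merged. OrdinalMapSketch is a hypothetical illustration that assumes each segment exposes its labels in local-ordinal order; it is not how SortedSetDocValuesReaderState is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of local-to-global ordinal mapping across segments.
final class OrdinalMapSketch {
    /** The global dictionary: all labels of all segments, sorted and de-duplicated. */
    static List<String> globalLabels(String[][] segments) {
        TreeSet<String> all = new TreeSet<>();
        for (String[] segment : segments) all.addAll(Arrays.asList(segment));
        return new ArrayList<>(all);
    }

    /** map[localOrdinal] = globalOrdinal; this array is the extra RAM per segment. */
    static int[] localToGlobal(String[] segmentLabels, List<String> global) {
        int[] map = new int[segmentLabels.length];
        for (int local = 0; local < segmentLabels.length; local++) {
            map[local] = Collections.binarySearch(global, segmentLabels[local]);
        }
        return map;
    }

    /** Merges per-segment counts (indexed by local ordinal) into global counts. */
    static int[] mergeCounts(int[][] localCounts, int[][] maps, int globalSize) {
        int[] merged = new int[globalSize];
        for (int seg = 0; seg < localCounts.length; seg++) {
            for (int local = 0; local < localCounts[seg].length; local++) {
                merged[maps[seg][local]] += localCounts[seg][local];
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String[][] segments = {{"Lucene", "Solr"}, {"Bobo", "Solr"}};
        List<String> global = globalLabels(segments);
        System.out.println(global + " " + Arrays.toString(localToGlobal(segments[1], global)));
    }
}
```

With global ordinals assigned at indexing time, none of these maps are needed and the counts arrays of all segments can be added together directly.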
Two-Phase Aggregation
• FacetsCollector works in two steps:
  – Collects matching documents (and optionally their scores)
  – Invokes FacetsAccumulator to accumulate the top-K facets
• Performance tests show that this improves faceted search (LUCENE-4600)
  – Locality of reference?
• Useful for Sampling and Complements
  – Hard to do otherwise
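The two steps can be sketched as two separate passes: first record which documents matched, then walk that set and aggregate. TwoPhaseSketch is a hypothetical illustration in which an IntPredicate stands in for the query and an int[][] for the per-document ordinals.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical illustration of collect-then-aggregate: not Lucene's FacetsCollector.
final class TwoPhaseSketch {
    /** Phase 1: only remember which documents match; no facet work yet. */
    static BitSet collect(int maxDoc, IntPredicate query) {
        BitSet matches = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (query.test(doc)) matches.set(doc);
        }
        return matches;
    }

    /** Phase 2: visit the matches in doc order and count their ordinals. */
    static int[] aggregate(BitSet matches, int[][] docOrdinals, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            for (int ord : docOrdinals[doc]) counts[ord]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docOrdinals = {{1, 2}, {1}, {2, 3}, {1, 3}};
        BitSet matches = collect(docOrdinals.length, doc -> doc % 2 == 0);
        System.out.println(Arrays.toString(aggregate(matches, docOrdinals, 4)));
    }
}
```

Because the full match set is known before aggregation starts, the second phase can decide to sample it or to count its complement instead, which is what makes those features practical.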
FacetIndexingParams
• Determine how facets are encoded
  – Partition size
  – Facet delimiter character (for drilldown terms, default \u001F)
  – CategoryListParams
• CategoryListParams holds parameters for a category list
  – Encoder/Decoder (default DGapVInt)
  – OrdinalPolicy (how path components are encoded): ALL_PARENTS, NO_PARENTS and ALL_BUT_DIMENSION (default)
• CategoryListParams can be used to group facets together
  – Default: all facets are put in the same "category list" (i.e. one BinaryDocValues field)
  – Expert: separate categories by dimension into different category lists
    • Useful when sets of categories are always aggregated together, but not with other categories
• FacetIndexingParams are currently not recorded per-segment, so be careful when changing them on an existing index!
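What the three OrdinalPolicy values mean for a single category can be sketched as a function from the path's ordinals (dimension first, leaf last) to the ordinals that get written for the document. OrdinalPolicySketch is a hypothetical illustration; in particular, keeping the category's own ordinal when the path consists of the dimension alone is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical illustration of which path ordinals each policy would encode.
final class OrdinalPolicySketch {
    enum Policy { ALL_PARENTS, NO_PARENTS, ALL_BUT_DIMENSION }

    /** pathOrdinals holds one ordinal per path component, dimension first. */
    static int[] ordinalsToEncode(int[] pathOrdinals, Policy policy) {
        int length = pathOrdinals.length;
        switch (policy) {
            case NO_PARENTS:
                // Only the category itself; parents must be rolled up at search time.
                return new int[] {pathOrdinals[length - 1]};
            case ALL_BUT_DIMENSION:
                // Everything except the first component (assumption: a lone dimension is kept).
                return length == 1 ? pathOrdinals.clone() : Arrays.copyOfRange(pathOrdinals, 1, length);
            default:
                // ALL_PARENTS: every path component is encoded.
                return pathOrdinals.clone();
        }
    }

    public static void main(String[] args) {
        int[] path = {1, 2, 3, 4};   // e.g. Date, Date/2012, Date/2012/March, Date/2012/March/20
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + Arrays.toString(ordinalsToEncode(path, policy)));
        }
    }
}
```

Fewer encoded ordinals mean a smaller per-document field, at the price of extra work at search time to derive the counts of the parents that were left out.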
Questions?

More Related Content

What's hot

DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
lucenerevolution
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Edureka!
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Shagun Rathore
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Databricks
 
Entity Linking @ Scale Using Elasticsearch
Entity Linking @ Scale Using ElasticsearchEntity Linking @ Scale Using Elasticsearch
Entity Linking @ Scale Using Elasticsearch
Atif Khan
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Divij Sehgal
 
Taxonomies for Users
Taxonomies for UsersTaxonomies for Users
Taxonomies for Users
Heather Hedden
 
Caching and Buffering in HDF5
Caching and Buffering in HDF5Caching and Buffering in HDF5
Caching and Buffering in HDF5
The HDF-EOS Tools and Information Center
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Sease
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
lucenerevolution
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
hypto
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLMorgan Tocker
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
Knoldus Inc.
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and Solr
Vadim Kirilchuk
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務
Kedy Chang
 
Indexes in postgres
Indexes in postgresIndexes in postgres
Indexes in postgres
Louise Grandjonc
 
Elasticsearch - under the hood
Elasticsearch - under the hoodElasticsearch - under the hood
Elasticsearch - under the hood
SmartCat
 
RonDB, a NewSQL Feature Store for AI applications.pdf
RonDB, a NewSQL Feature Store for AI applications.pdfRonDB, a NewSQL Feature Store for AI applications.pdf
RonDB, a NewSQL Feature Store for AI applications.pdf
mikael329498
 
Scaling search with SolrCloud
Scaling search with SolrCloudScaling search with SolrCloud
Scaling search with SolrCloud
Saumitra Srivastav
 

What's hot (20)

DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Entity Linking @ Scale Using Elasticsearch
Entity Linking @ Scale Using ElasticsearchEntity Linking @ Scale Using Elasticsearch
Entity Linking @ Scale Using Elasticsearch
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Taxonomies for Users
Taxonomies for UsersTaxonomies for Users
Taxonomies for Users
 
Caching and Buffering in HDF5
Caching and Buffering in HDF5Caching and Buffering in HDF5
Caching and Buffering in HDF5
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and Solr
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務
 
Indexes in postgres
Indexes in postgresIndexes in postgres
Indexes in postgres
 
Elasticsearch - under the hood
Elasticsearch - under the hoodElasticsearch - under the hood
Elasticsearch - under the hood
 
RonDB, a NewSQL Feature Store for AI applications.pdf
RonDB, a NewSQL Feature Store for AI applications.pdfRonDB, a NewSQL Feature Store for AI applications.pdf
RonDB, a NewSQL Feature Store for AI applications.pdf
 
Scaling search with SolrCloud
Scaling search with SolrCloudScaling search with SolrCloud
Scaling search with SolrCloud
 

Viewers also liked

The Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik SeeleyThe Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik Seeley
lucenerevolution
 
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Lucidworks
 
How to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real WorldHow to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real World
Brian McKeiver
 
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid DynamicsFaceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Lucidworks
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
Earley Information Science
 
Are users really ready for faceted search?
Are users really ready for faceted search?Are users really ready for faceted search?
Are users really ready for faceted search?
epek
 
Faceted Navigation
Faceted NavigationFaceted Navigation
Faceted Navigation
Ruslan Zavacky
 
Extending facet search to the general web
Extending facet search to the general webExtending facet search to the general web
Extending facet search to the general web
祺傑 林
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
lucenerevolution
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
Edureka!
 
An Introduction to Solr
An Introduction to SolrAn Introduction to Solr
An Introduction to Solr
tomhill
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in Libraries
Laura Loveday Maury
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
lucenerevolution
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
searchbox-com
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Lucidworks
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
Grokking VN
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
Saumitra Srivastav
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemes
Nadeem Nazir
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Provectus
 

Viewers also liked (20)

The Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik SeeleyThe Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik Seeley
 
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
 
How to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real WorldHow to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real World
 
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid DynamicsFaceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
 
Are users really ready for faceted search?
Are users really ready for faceted search?Are users really ready for faceted search?
Are users really ready for faceted search?
 
Faceted Navigation
Faceted NavigationFaceted Navigation
Faceted Navigation
 
Extending facet search to the general web
Extending facet search to the general webExtending facet search to the general web
Extending facet search to the general web
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
An Introduction to Solr
An Introduction to SolrAn Introduction to Solr
An Introduction to Solr
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in Libraries
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemes
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 

Similar to Faceted Search with Lucene

Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
21 domino mohan-1
21 domino mohan-121 domino mohan-1
21 domino mohan-1
ashish61_scs
 
Lucene and MySQL
Lucene and MySQLLucene and MySQL
Lucene and MySQL
farhan "Frank"​ mashraqi
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
BIOVIA
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
Chartio
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)maclean liu
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Manish kumar
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Michael Rys
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
lucenerevolution
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
Julien Nioche
 
Search Me: Using Lucene.Net
Search Me: Using Lucene.NetSearch Me: Using Lucene.Net
Search Me: Using Lucene.Net
gramana
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
Robert Viseur
 
DSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDuraSpace
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon
 
Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Aaron Shilo
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
lucenerevolution
 
Oracle Database Overview
Oracle Database OverviewOracle Database Overview
Oracle Database Overviewhonglee71
 

Similar to Faceted Search with Lucene (20)

Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
21 domino mohan-1
21 domino mohan-121 domino mohan-1
21 domino mohan-1
 
Lucene and MySQL
Lucene and MySQLLucene and MySQL
Lucene and MySQL
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
 
Lucene basics
Lucene basicsLucene basics
Lucene basics
 
Oracle by Muhammad Iqbal
Oracle by Muhammad IqbalOracle by Muhammad Iqbal
Oracle by Muhammad Iqbal
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Search Me: Using Lucene.Net
Search Me: Using Lucene.NetSearch Me: Using Lucene.Net
Search Me: Using Lucene.Net
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
DSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/Export
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Oracle Database Overview
Oracle Database OverviewOracle Database Overview
Oracle Database Overview
 

Faceted Search with Lucene

  • 2. Faceted Search with Lucene Shai Erera Researcher, IBM
  • 3. Who Am I
      • Working at IBM – Information Retrieval Research
      • Lucene/Solr committer and PMC member
      • http://shaierera.blogspot.com
      • shaie@apache.org
  • 5. Faceted Search
      • Technique for accessing documents that were classified into a taxonomy of categories
          – Flat: Author/John Doe, Tags/Lucene, Popularity/High
          – Hierarchical: Computers/Software/Information Retrieval/Fulltext/Apache Lucene (ODP)
      • Quick overview of the breakdown of the search results
          – How many documents are in category Committed Paths/lucene/core vs. Committed Paths/lucene/facet
      • Simplifies interaction with the search application
          – Drill down to issues that were updated in the Past 2 days by clicking a link
          – No knowledge required about search syntax and index schema
      http://jirasearch.mikemccandless.com
  • 6. Lucene Facets
      • Contributed by IBM in 2011, released in 3.4.0
      • Major changes since 4.1.0+
          – NRT support
          – Nearly 400% search speedups
          – Complete API revamp
          – New features (SortedSet, range faceting, drill-sideways)
      • Two main indexing-time modes
          – Taxonomy-based: hierarchical facets, managed by a sidecar index, low NRT reopen cost
          – SortedSetDocValues: flat facets only, no sidecar index, higher NRT reopen cost
      • Runtime modes
          – Range facets (on NumericDocValues fields)
      • Other implementations: Solr, ElasticSearch, Bobo Browse
  • 7. Lucene Facet Components
      • TaxonomyWriter/Reader
          – Manage the taxonomy information
      • FacetFields
          – Add facets information to documents (DocValues fields, drilldown terms)
      • FacetRequest
          – Defines which facets to aggregate and the FacetsAggregator (aggregation function)
      • FacetsCollector
          – Collects matching documents and computes the top-K categories for each facet request (invokes FacetsAccumulator)
      • DrillDownQuery / DrillSideways
          – Execute drilldown and drill-sideways requests
  • 8. Sample Code – Indexing
      // Builds the taxonomy as documents are indexed, multi-threaded, single instance
      TaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);

      // Adds facets information to a document, can be initialized once per thread
      FacetFields facetFields = new FacetFields(taxoWriter);

      // List of categories to add to the document
      List<CategoryPath> cats = new ArrayList<CategoryPath>();
      cats.add(new CategoryPath("Author", "Erik Hatcher"));
      cats.add(new CategoryPath("Author/Otis Gospodnetić", '/'));
      cats.add(new CategoryPath("Pub Date", "2004", "December", "1"));

      Document bookDoc = new Document();
      bookDoc.add(new TextField("title", "lucene in action", Store.YES));

      // add categories fields (DocValues, Postings)
      facetFields.addFields(bookDoc, cats);

      // index the document
      indexWriter.addDocument(bookDoc);
  • 9. Sample Code – Search
      // Open an NRT TaxonomyReader
      TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);

      // Define the facets to aggregate (top-10 categories for each)
      FacetSearchParams fsp = new FacetSearchParams();
      fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Author"), 10));
      fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Pub Date"), 10));

      // Collect both top-K facets and top-N matching documents
      TopDocsCollector tdc = TopScoreDocCollector.create(10, true);
      FacetsCollector fc = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);
      Query q = new TermQuery(new Term("title", "lucene"));
      searcher.search(q, MultiCollector.wrap(tdc, fc));

      // Traverse the top facets
      for (FacetResult fres : fc.getFacetResults()) {
        FacetResultNode root = fres.getFacetResultNode();
        System.out.println(String.format("%s (%d)", root.label, root.value));
        for (FacetResultNode cat : root.getSubResults()) {
          System.out.println("  " + cat.label.components[0] + " (" + cat.value + ")");
        }
      }
  • 10. Drilldown and Drill-Sideways
      • Drilldown adds a filter to the search
          – Multiple categories can be OR'd
      // Drilldown – filter results to "Component/core/index";
      // All other "Component/*" and "Component/core/*" get count 0
      Query base = new MatchAllDocsQuery();
      DrillDownQuery ddq = new DrillDownQuery(facetIndexingParams, base);
      ddq.add(new CategoryPath("Component/core/index", '/'));

      • Drill-sideways allows drilldown, yet still aggregates "sideways" categories
      // Drill-Sideways – drilldown on "Component/core/index";
      // Other "Component/*" and "Component/core/*" are counted too
      DrillSideways ds = new DrillSideways(searcher, taxoReader);
      DrillSidewaysResult sidewaysRes = ds.search(null, ddq, 10, fsp);

      http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
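The difference between drilldown counts and drill-sideways counts can be sketched in plain Java, without Lucene. Everything here is hypothetical and for illustration only: documents are modelled as dimension-to-value maps, and `DrillSidewaysSketch`, `doc` and `counts` are made-up names, not part of the Lucene API. The only rule shown is that the sideways count for a dimension ignores the drilldown filter on that same dimension.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: hypothetical helper, not Lucene's DrillSideways. */
final class DrillSidewaysSketch {

    /** Builds a document from alternating dimension/value strings. */
    static Map<String, String> doc(String... dimAndValue) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < dimAndValue.length; i += 2) {
            d.put(dimAndValue[i], dimAndValue[i + 1]);
        }
        return d;
    }

    /**
     * Counts the values of {@code dim} over the documents that pass the drilldown filters.
     * With {@code sideways == true} the filter on {@code dim} itself is skipped, so sibling
     * values of the drilled-down category keep their counts.
     */
    static Map<String, Integer> counts(List<Map<String, String>> docs,
                                       Map<String, String> drillDowns,
                                       String dim,
                                       boolean sideways) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, String> d : docs) {
            boolean matches = true;
            for (Map.Entry<String, String> filter : drillDowns.entrySet()) {
                if (sideways && filter.getKey().equals(dim)) {
                    continue; // ignore the filter on the dimension being counted
                }
                if (!filter.getValue().equals(d.get(filter.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches && d.containsKey(dim)) {
                result.merge(d.get(dim), 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = java.util.Arrays.asList(
            doc("Component", "core", "Status", "open"),
            doc("Component", "facet", "Status", "open"),
            doc("Component", "core", "Status", "closed"));
        Map<String, String> drillDowns = doc("Component", "core");
        System.out.println("drilldown: " + counts(docs, drillDowns, "Component", false));
        System.out.println("sideways:  " + counts(docs, drillDowns, "Component", true));
    }
}
```

With a plain drilldown on Component=core, the facet value disappears from the Component counts; with the sideways variant it is still counted, which is what lets a UI keep showing the alternatives.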
  • 11. Dynamic Facets
      • Range facets on NumericDocValues fields
          – Define the buckets of interest at query time
          – Supports any arbitrary ValueSource (Lucene 4.6.0)
      // Aggregate matching documents into buckets
      // (bounds of "21-100" made inclusive so that 21 and 100 fall into a bucket)
      RangeAccumulator a = new RangeAccumulator(new RangeFacetRequest<LongRange>("field",
          new LongRange("1-5", 1L, true, 5L, true),
          new LongRange("6-20", 6L, true, 20L, true),
          new LongRange("21-100", 21L, true, 100L, true),
          new LongRange("over 100", 100L, false, Long.MAX_VALUE, true)));
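How the inclusive/exclusive flags decide which bucket a value lands in can be sketched without Lucene. The `RangeBucketSketch` class and its `Range` type below are hypothetical stand-ins, not the RangeAccumulator or LongRange classes; the bounds are chosen here (an assumption) so that every value from 1 upward matches exactly one bucket.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical range bucketing, not Lucene's RangeAccumulator. */
final class RangeBucketSketch {

    /** A labelled numeric range whose bounds are independently inclusive or exclusive. */
    static final class Range {
        final String label;
        final long min;
        final boolean minInclusive;
        final long max;
        final boolean maxInclusive;

        Range(String label, long min, boolean minInclusive, long max, boolean maxInclusive) {
            this.label = label;
            this.min = min;
            this.minInclusive = minInclusive;
            this.max = max;
            this.maxInclusive = maxInclusive;
        }

        boolean accept(long value) {
            boolean aboveMin = minInclusive ? value >= min : value > min;
            boolean belowMax = maxInclusive ? value <= max : value < max;
            return aboveMin && belowMax;
        }
    }

    /** The four buckets used in this sketch. */
    static List<Range> exampleRanges() {
        return Arrays.asList(
            new Range("1-5", 1L, true, 5L, true),
            new Range("6-20", 6L, true, 20L, true),
            new Range("21-100", 21L, true, 100L, true),
            new Range("over 100", 100L, false, Long.MAX_VALUE, true));
    }

    /** Counts the values accepted by each range; a value may match zero or several ranges. */
    static Map<String, Integer> count(List<Range> ranges, long[] values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range r : ranges) {
            counts.put(r.label, 0);
        }
        for (long v : values) {
            for (Range r : ranges) {
                if (r.accept(v)) {
                    counts.merge(r.label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(exampleRanges(), new long[] {3, 5, 6, 21, 100, 101, 5000}));
    }
}
```

A boundary value such as 100 is the interesting case: it is counted under "21-100" because that upper bound is inclusive, and not under "over 100" because that lower bound is exclusive.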
  • 12. Facet Associations
      • Not all facets are created equal
          – Categories added by an automatic categorization system, e.g. Category/Apache Lucene (0.74) (confidence level is 0.74)
          – Important metadata about the facet, e.g. Contracts/US ($5M) (total $$$ generated from contracts)
          – Complex structures, e.g. Users/Shai Erera (lastAccess=YYYY/MM/DD, numUpdates=8…)
      • Categories can have values associated with them per document
          – They are later aggregated by these values
          – NOTE: ≠ NumericDocValuesFields!
      • Facet associations are completely customizable – encoded as a byte[] per document
      http://shaierera.blogspot.com/2013/01/facet-associations.html
  • 13. More Features
      • Complements
          – Holds the count of each category in-memory, per IndexReader
          – When the number of search results is >50% of the index, count the "complement set"
          – Useful for "overview" queries, e.g. MatchAllDocsQuery
      • Sampling
          – Aggregate a sampled set of the search results
          – Optionally re-count top-K facets for accurate values
      • Partitions
          – Partition the taxonomy space to control memory usage during faceted search
          – Useful for very big taxonomies (10s of millions of categories)
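The complements idea can be sketched in a few lines of plain Java. This is an illustration under simplifying assumptions, not Lucene's implementation: documents are modelled as arrays of category ordinals, the per-index totals are assumed to be precomputed and cached, and `ComplementCountSketch` and its methods are hypothetical names.

```java
/** Illustrative only: hypothetical complement counting, not Lucene's implementation. */
final class ComplementCountSketch {

    /** Total count of each category over the whole index (computed once and cached). */
    static int[] totalCounts(int[][] docCategories, int numCategories) {
        int[] totals = new int[numCategories];
        for (int[] categories : docCategories) {
            for (int ordinal : categories) {
                totals[ordinal]++;
            }
        }
        return totals;
    }

    /**
     * Counts categories over the matching documents. When more than half of the index
     * matches, it visits the non-matching documents instead and subtracts their counts
     * from the cached totals, which touches fewer documents.
     */
    static int[] countMatching(int[][] docCategories, boolean[] matches, int[] totals) {
        int numMatches = 0;
        for (boolean m : matches) {
            if (m) {
                numMatches++;
            }
        }
        boolean useComplement = numMatches * 2 > docCategories.length;

        int[] counts = new int[totals.length];
        for (int doc = 0; doc < docCategories.length; doc++) {
            // Visit matching docs normally, or non-matching docs in complement mode.
            if (matches[doc] != useComplement) {
                for (int ordinal : docCategories[doc]) {
                    counts[ordinal]++;
                }
            }
        }
        if (useComplement) {
            for (int i = 0; i < counts.length; i++) {
                counts[i] = totals[i] - counts[i];
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docs = {{0, 1}, {0}, {1, 2}, {0, 2}};
        boolean[] matches = {true, true, true, false};
        int[] counts = countMatching(docs, matches, totalCounts(docs, 3));
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Both paths return the same counts; the complement path simply does less per-document work when most of the index matches, as with MatchAllDocsQuery.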
  • 15. The Taxonomy Index
      • The taxonomy maps categories to integer codes (referred to as ordinals)
          – Kind of like a Map<CategoryPath,Integer>, with hierarchy support
          – Provides taxonomy browsing services
          – DirectoryTaxonomyWriter is managed as a sidecar Lucene index
      • Categories are broken down to their path components, e.g. Date/2012/March/20 becomes:
          – Date, with ordinal=1
          – Date/2012, with ordinal=2
          – Date/2012/March, with ordinal=3
          – Date/2012/March/20, with ordinal=4
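The "Map<CategoryPath,Integer> with hierarchy support" analogy can be made concrete with a small in-memory sketch. It is not DirectoryTaxonomyWriter: `TaxonomySketch` is a hypothetical class, paths are joined with '/' (so components are assumed not to contain that character), and ordinal 0 is reserved for the root so that numbering starts at 1 as on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: an in-memory toy taxonomy, not Lucene's DirectoryTaxonomyWriter. */
final class TaxonomySketch {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<Integer> parents = new ArrayList<>();

    TaxonomySketch() {
        ordinals.put("", 0); // ordinal 0 is the root
        parents.add(-1);     // the root has no parent
    }

    /** Adds a category and all of its ancestors; returns the ordinal of the full path. */
    int addCategory(String... components) {
        int parent = 0;
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            String key = path.toString();
            Integer ordinal = ordinals.get(key);
            if (ordinal == null) {
                ordinal = parents.size(); // next free ordinal
                ordinals.put(key, ordinal);
                parents.add(parent);
            }
            parent = ordinal;
        }
        return parent;
    }

    /** Returns the parent ordinal, or -1 for the root. */
    int getParent(int ordinal) {
        return parents.get(ordinal);
    }

    /** Number of known categories, including the root. */
    int size() {
        return parents.size();
    }

    public static void main(String[] args) {
        TaxonomySketch taxo = new TaxonomySketch();
        System.out.println(taxo.addCategory("Date", "2012", "March", "20"));
        System.out.println(taxo.addCategory("Date", "2012"));
    }
}
```

Adding the same path twice returns the existing ordinal, and the parent array is what provides the browsing ("hierarchy support") part of the analogy.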
  • 16. The Search Index
      • Categories are added as drilldown terms, e.g. for Date/2012/March/20:
          – $facets:Date
          – $facets:Date/2012
          – …
      • All category ordinals associated with the document are added as a BinaryDocValuesField
          – The ordinals of all path components are added, not just the leaves'
          – Encoded as VInt + gap for efficient compression and speed
              • Other compression methods were attempted, but were slower to decode (LUCENE-4609)
          – Used during faceted search to read all the associated ordinals and aggregate accordingly (e.g. count)
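The "VInt + gap" encoding can be sketched as follows: sort the ordinals, store the difference between consecutive values, and write each difference as a variable-length integer (7 data bits per byte). This is a minimal sketch, not Lucene's encoder: `DGapVIntSketch` is a hypothetical class, ordinals are assumed non-negative, and the low-order-bytes-first layout is an assumption that may differ from the byte order Lucene actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Illustrative only: a d-gap + VInt codec sketch, not Lucene's facet encoder. */
final class DGapVIntSketch {

    /** Sorts the ordinals, then writes each gap to the previous value as a VInt. */
    static byte[] encode(int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int ordinal : sorted) {
            int gap = ordinal - previous;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // 7 data bits, high bit = "more bytes follow"
                gap >>>= 7;
            }
            out.write(gap);
            previous = ordinal;
        }
        return out.toByteArray();
    }

    /** Reads VInt gaps and accumulates them back into ordinals (in sorted order). */
    static int[] decode(byte[] bytes) {
        int[] result = new int[bytes.length]; // every value takes at least one byte
        int count = 0;
        int previous = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            result[count++] = previous;
        }
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        byte[] encoded = encode(new int[] {4, 1, 2, 3});
        System.out.println(encoded.length + " bytes -> " + Arrays.toString(decode(encoded)));
    }
}
```

Because a document carries the ordinals of all its path components, consecutive ordinals tend to be close together, so most gaps fit in a single byte.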
  • 17. SortedSet Facets
      • SortedSetFacetFields add SortedSetDocValuesFields and drilldown terms to documents
      • Local-segment SortedSet ordinals are mapped to global ones through SortedSetDocValuesReaderState
      • Use SortedSetDocValuesAccumulator to accumulate SortedSet facets
      • Advantages:
          – Taxonomy representation requires less RAM (flat taxonomy)
          – No sidecar index
          – Tie-breaks by label-sort order
      • Disadvantages:
          – Not a full taxonomy
          – Overall uses more RAM (local-to-global ordinal mapping)
          – Adds NRT reopen cost
          – Slower than taxonomy-based facets
  • 18. Global Ordinals
      • Per-segment integer codes (as used by the SortedSet approach) are less efficient
          – Different ordinals for the same categories across segments
          – Hold an in-memory codes map (e.g. local-to-global) – more RAM and less scalable
          – Resolve top-K on the String representation of categories – more CPU
      • Global ordinals allow efficient per-segment faceting and aggregation
          – No translation maps required (no extra RAM, highly scalable)
          – Aggregation, top-K computation done on integer codes
      • But, they do not play well with IndexWriter.addIndexes(Directory…)
          – Must use IndexWriter.addIndexes(IndexReader…), so that the ordinals in the input index are mapped to the destination's
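The cost of per-segment codes can be illustrated with a toy local-to-global mapping. This is a simplified model, not SortedSetDocValuesReaderState: each segment is assumed to hold a sorted array of labels whose index is the local ordinal, and `OrdinalMapSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Illustrative only: a toy local-to-global ordinal map, not Lucene's reader state. */
final class OrdinalMapSketch {

    /** Global ordinal = position of the label in the sorted union of all segment labels. */
    static String[] globalLabels(String[][] segmentLabels) {
        TreeSet<String> all = new TreeSet<>();
        for (String[] labels : segmentLabels) {
            all.addAll(Arrays.asList(labels));
        }
        return all.toArray(new String[0]);
    }

    /** Builds, per segment, an array translating each local ordinal to its global one. */
    static int[][] localToGlobal(String[][] segmentLabels, String[] global) {
        int[][] map = new int[segmentLabels.length][];
        for (int seg = 0; seg < segmentLabels.length; seg++) {
            map[seg] = new int[segmentLabels[seg].length];
            for (int local = 0; local < segmentLabels[seg].length; local++) {
                map[seg][local] = Arrays.binarySearch(global, segmentLabels[seg][local]);
            }
        }
        return map;
    }

    /** Sums per-segment counts into one global count array using the translation maps. */
    static int[] mergeCounts(int[][] perSegmentCounts, int[][] map, int numGlobal) {
        int[] merged = new int[numGlobal];
        for (int seg = 0; seg < perSegmentCounts.length; seg++) {
            for (int local = 0; local < perSegmentCounts[seg].length; local++) {
                merged[map[seg][local]] += perSegmentCounts[seg][local];
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String[][] segments = {{"Bob", "Lisa"}, {"Alice", "Lisa"}};
        String[] global = globalLabels(segments);
        int[][] map = localToGlobal(segments, global);
        int[] merged = mergeCounts(new int[][] {{3, 1}, {2, 4}}, map, global.length);
        System.out.println(Arrays.toString(global) + " " + Arrays.toString(merged));
    }
}
```

Note that "Lisa" has local ordinal 1 in both segments only by coincidence; the translation arrays are the extra RAM the slide refers to, and they have to be rebuilt when segments change. With global ordinals the per-segment counts could be summed directly.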
  • 19. Two-Phase Aggregation
      • FacetsCollector works in two steps:
          – Collects matching documents (and optionally their scores)
          – Invokes FacetsAccumulator to accumulate the top-K facets
      • Performance tests show that this improves faceted search (LUCENE-4600)
          – Locality of reference?
      • Useful for Sampling and Complements
          – Hard to do otherwise
  • 20. FacetIndexingParams
      • Determine how facets are encoded
          – Partition size
          – Facet delimiter character (for drilldown terms, default \u001F)
          – CategoryListParams
      • CategoryListParams holds parameters for a category list
          – Encoder/Decoder (default DGapVInt)
          – OrdinalPolicy (how path components are encoded): ALL_PARENTS, NO_PARENTS and ALL_BUT_DIMENSION (default)
      • CategoryListParams can be used to group facets together
          – Default: all facets are put in the same "category list" (i.e. one BinaryDocValues field)
          – Expert: separate categories by dimension into different category lists
              • Useful when sets of categories are always aggregated together, but not with other categories
      • FacetIndexingParams are currently not recorded per-segment, so be careful when changing them on an existing index!