The Many Facets of Apache Solr
          Yonik Seeley, Lucid Imagination
     yonik@lucidimagination.com, Oct 20 2011
What I Will Cover

§    What is Faceted Search
§    Solr’s Faceted Search
§    Tips & Tricks
§    Performance & Algorithms




My Background
§  Creator of Solr
§  Co-founder of Lucid Imagination
§  Expertise: Distributed Search systems and
    performance
§  Lucene/Solr committer, a member of the
    Lucene PMC,
    member of the Apache Software
    Foundation
§  Work: CNET Networks, BEA, Telcordia,
    among others
§  M.S. in Computer Science, Stanford
What is Faceted Search




Faceted Search Example

§  Manufacturer is a facet, a way of categorizing the results
§  Canon, Sony, and Nikon are constraints, or facet values
§  The facet count or constraint count shows how many results match
    each value
§  The breadcrumb trail shows what constraints have already been
    applied and allows for their removal
Key Elements of Faceted Search
§  No hierarchy of options is enforced
   •  Users can apply facet constraints in any order
   •  Users can remove facet constraints in any order
§  No surprises
   •  The user is only given facets and constraints
      that make sense in the context of the items they
      are looking at
   •  The user always knows what to expect before
      they apply a constraint
§  Also known as guided navigation, faceted
    navigation, faceted browsing, parametric search

Solr’s Faceted Search




Field Faceting
§  Specifies a Field to be used as a Facet
§  Uses each term indexed in that Field as a
    Constraint
§  Field must be indexed
§  Can be used multiple times for multiple fields

  q           = iPhone
  fq          = inStock:true
  facet       = true
  facet.field = color
  facet.field = category
Field Faceting Response
http://localhost:8983/solr/select?q=iPhone&fq=inStock:true
&facet=true&facet.field=color&facet.field=category

<lst name="facet_counts">
 <lst name="facet_fields">
  <lst name="color">
    <int name="red">17</int>
    <int name="green">6</int>
    <int name="blue">2</int>
    <int name="yellow">2</int>
  </lst>
  <lst name="category">
    <int name="accessories">16</int>
    <int name="electronics">11</int>
  </lst>
 </lst>
</lst>
Or if you prefer JSON…
http://localhost:8983/solr/select?q=iPhone&fq=inStock:true
&facet=true&facet.field=color&facet.field=category&wt=json

"facet_counts":{
  "facet_fields":{
    "color":[
      "red",17,
      "green",6,
      "blue",2,
      "yellow",2],
    "category":[
      "accessories",16,
      "electronics",11]}}
Applying Constraints
  Assume the user clicked on "red". Simply add another filter query
  to apply that constraint:

http://localhost:8983/solr/select?
q=iPhone&fq=inStock:true&fq=color:red&facet=true&
facet.field=color&facet.field=category

   facet.field=color is now redundant and can be removed
   (assuming a single-valued field)
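As a rough illustration of applying a constraint, the sketch below builds the request URL before and after the click. The helper name build_facet_url is hypothetical, the base URL is the one used in the examples above, and dropping the color facet assumes color is single-valued.

```python
from urllib.parse import urlencode

# Base URL as used in the slide examples (assumed local Solr instance).
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_facet_url(query, filters, facet_fields):
    """Build a select URL with repeated fq and facet.field parameters."""
    params = [("q", query)]
    params += [("fq", f) for f in filters]
    params.append(("facet", "true"))
    params += [("facet.field", f) for f in facet_fields]
    # urlencode percent-encodes reserved characters such as ":".
    return SOLR_SELECT + "?" + urlencode(params)


# Before the click: facet on both fields.
before = build_facet_url("iPhone", ["inStock:true"], ["color", "category"])

# After the user clicks "red": one more fq. The color facet is dropped
# on the assumption that color is a single-valued field.
after = build_facet_url("iPhone", ["inStock:true", "color:red"], ["category"])

print(before)
print(after)
```

Passing the parameters as a list of pairs, rather than a dict, is what allows fq and facet.field to repeat.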
facet.field Options
§  facet.prefix - Restricts the possible constraints to only
    indexed values with a specified prefix.
§  facet.mincount=0 - Restricts the constraints returned to
    those containing a minimum number of documents in the
    result set.
§  facet.sort=count - The ordering of constraints: count or
    index
§  facet.offset=0 - Indicates how many constraints in the
    specified sort ordering should be skipped in the response.
§  facet.limit=100 - The number of constraints to return
§  facet.missing=false – Return the number of docs with no
    value in the field
facet.query
§  Specifies a query string to be used as a Facet
    Constraint
§  Typically used multiple times to get multiple
    (discrete) sets
§  Any type of query supported

  facet.query = rank:[* TO 20]
  facet.query = rank:[21 TO *]
facet.query Results

<result numFound="27" ... />
...
<lst name="facet_counts">
 <lst name="facet_queries">
  <int name="rank:[* TO 20]">2</int>
  <int name="rank:[21 TO *]">15</int>
 </lst>
 ...
Spatial faceting

  q=*:*&facet=true
  pt=45.15,-93.85               (the lat,lon center point to search from)
  sfield=store                  (name of the field containing lat+lon data)
  facet.query={!geofilt d=5}    (geofilt: geospatial query type)
  facet.query={!geofilt d=10}

"facet_counts":{
   "facet_queries":{
     "{!geofilt d=5}":3,
     "{!geofilt d=10}":6},
Range Faceting
§  Simpler than a sequence of facet.query params

http://...&facet=true
&facet.range=price
&facet.range.start=0
&facet.range.end=500
&facet.range.gap=50

"facet_counts":{
  "facet_ranges":{
    "price":{
      "counts":[
        "0.0",5,
        "50.0",2,
        "100.0",0,
        "150.0",2,
        "200.0",0,
        "250.0",1,
        "300.0",2,
        "350.0",2,
        "400.0",0,
        "450.0",1],
      "gap":50.0,
      "start":0.0,
      "end":500.0}}}}
Date Faceting
l  facet.date  is deprecated, use facet.range on a date field now
l  Creates Constraints based on evenly sized date ranges using
    the Gregorian Calendar
l  Ranges are specified using "Date Math" so they DWIM in spite
    of variable length months and leap years


  facet.range                    =   pubdate!
  facet.range.start              =   NOW/YEAR-1YEAR!
  facet.range.end                =   NOW/MONTH+1MONTH!
  facet.range.gap                =   +1MONTH!
Date Faceting Results
 "facet_counts":{
   "facet_ranges":{
      "pubdate":{
        "counts":[
          "2010-01-01T00:00:00Z",4,
          "2010-02-01T00:00:00Z",6,
          "2010-03-01T00:00:00Z",0,
          "2010-04-01T00:00:00Z",13,
          […]
          "2011-09-01T00:00:00Z",5,
          "2011-10-01T00:00:00Z",2],
        "gap":"+1MONTH",
        "start":"2010-01-01T00:00:00Z",
        "end":"2011-11-01T00:00:00Z"}}}
Range Faceting Options
l  facet.range.hardend=false     - Determines what effective
    end value is used when the specified "start" and "end"
    don't divide into even "gap" sized buckets; false means
    the last Constraint range may be shorter then the others
l  facet.range.other=none - Allows you to specify what
    other Constraints you are interested in besides the
    generated ranges: before, after, between, none, all
l  facet.range.include=lower – Specifies what bounds are
    inclusive vs exclusive: lower, upper, edge, outer, all
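
To make the start/end/gap arithmetic concrete, here is a minimal sketch of how the bucket boundaries could be derived for numeric ranges. The function range_buckets is hypothetical (it is not Solr code); it assumes that hardend=false keeps every bucket exactly one gap wide, so the last bucket may run past end, and that hardend=true clips the last bucket at end.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return (lower, upper) bounds for each generated range constraint."""
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            # hardend=true: clip the last bucket at the requested end.
            high = end
        buckets.append((low, high))
        low = high
    return buckets


# Evenly divisible case from the price example: ten buckets of width 50.
print(range_buckets(0, 500, 50))

# Uneven case: 120 is not a multiple of 50.
print(range_buckets(0, 120, 50))                 # last bucket is (100, 150)
print(range_buckets(0, 120, 50, hardend=True))   # last bucket is (100, 120)
```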
Pivot Faceting (trunk)
l  Computes   a Matrix of Constraint Counts across multiple
    Facet Fields
l  Syntax: facet.pivot=field1,field2,field3,…


facet.pivot=cat,inStock
                      #docs #docs w/         #docs w/
                            inStock:true     instock:false
cat:electronics       14     10              4
cat:memory            3      3               0
cat:connector         2      0               2
cat:graphics card     2      0               2
cat:hard drive        2      2               0
Pivot Faceting
   http://...&facet=true&facet.pivot=cat,popularity

"facet_counts":{
  "facet_pivot":{
    "cat,popularity":[{
      "field":"cat",
      "value":"electronics",
      "count":14,              (14 docs w/ cat==electronics)
      "pivot":[{
        "field":"popularity",
        "value":6,
        "count":5},            (5 docs w/ cat==electronics && popularity==6)
       {
        "field":"popularity",
        "value":7,
        "count":4},
       {
        "field":"popularity",
        "value":1,
        "count":2}]},
     {
      "field":"cat",
      "value":"memory",
      "count":3,
      "pivot":[]},
     […]
Tips & Tricks




term QParser
l  Default Query Parser does special things with whitespace
    and punctuation
l  Problematic when "filtering" on Facet Field Constraints
    that contain whitespace, punctuation, or other reserved
    characters.
l  Use the term parser to filter on an exact Term


 fq = {!term f=category}Books & Magazines
                         OR
 fq = {!term f=category v=$t}
  t = Books & Magazines
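
A client still has to URL-encode such a filter, because "&" and the braces are reserved in a query string. The sketch below shows one way to do that. The helper term_filter is a hypothetical name, and only the {!term f=...} syntax comes from the slide.

```python
from urllib.parse import parse_qs, urlencode


def term_filter(field, value):
    """Return an fq value that matches one exact indexed term."""
    return "{!term f=%s}%s" % (field, value)


fq = term_filter("category", "Books & Magazines")

# urlencode percent-encodes "&", "{", "}" and "!" so the constraint
# survives the trip through the query string intact.
query_string = urlencode([("q", "*:*"), ("fq", fq)])
print(query_string)

# Decoding the query string gives back the original filter text.
round_trip = parse_qs(query_string)["fq"][0]
print(round_trip)
```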
Taxonomy Facets
l  What   If Your Documents Are Organized in a Taxonomy?
Taxonomy Facets: Data
l  Flattened   Data
  !Doc#1: NonFic > Law!
  !Doc#2: NonFic > Sci!
  !Doc#3: NonFic > Hist!
          NonFic > Sci > Phys!
l  Indexed   Terms (prepend number of nodes in path segment)

Doc#1: 1/NonFic, 2/NonFic/Law!
Doc#2: 1/NonFic, 2/NonFic/Sci!
Doc#3: 1/NonFic, 2/NonFic/Hist, !
       2/NonFic/Sci, 3/NonFic/Sci/Phys!
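
The mapping from flattened paths to indexed terms can be sketched at index time roughly as follows. The function names are hypothetical, and the " > " separator is simply the notation used on the slide.

```python
def taxonomy_terms(path, separator=">"):
    """Expand one taxonomy path into depth-prefixed terms."""
    nodes = [node.strip() for node in path.split(separator)]
    return ["%d/%s" % (depth, "/".join(nodes[:depth]))
            for depth in range(1, len(nodes) + 1)]


def document_terms(paths):
    """Collect the terms for all paths of one document, without duplicates."""
    terms = []
    for path in paths:
        for term in taxonomy_terms(path):
            if term not in terms:
                terms.append(term)
    return terms


print(taxonomy_terms("NonFic > Sci > Phys"))
print(document_terms(["NonFic > Hist", "NonFic > Sci > Phys"]))  # Doc#3
```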
Taxonomy Facets: Initial Query

facet.field    = category
facet.prefix   = 2/NonFic
facet.mincount = 1

<result numFound="164" ...
<lst name="facet_fields">
 <lst name="category">
   <int name="2/NonFic/Sci">2</int>
   <int name="2/NonFic/Hist">1</int>
   <int name="2/NonFic/Law">1</int>
Taxonomy Facets: Drill Down

fq             = {!term f=category}2/NonFic/Sci
facet.field    = category
facet.prefix   = 3/NonFic/Sci
facet.mincount = 1

<result numFound="2" ...
<lst name="facet_fields">
 <lst name="category">
   <int name="3/NonFic/Sci/Phys">1</int>
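
The drill-down step is mechanical: filter on the clicked term and bump the depth number in the prefix. A minimal sketch, assuming the depth-prefixed term format shown earlier (drill_down_params is a hypothetical helper, not part of Solr):

```python
def drill_down_params(selected_term, field="category"):
    """Request params for drilling into a depth-prefixed taxonomy term."""
    depth, _, path = selected_term.partition("/")
    return [
        # Exact-term filter on the constraint the user clicked.
        ("fq", "{!term f=%s}%s" % (field, selected_term)),
        ("facet.field", field),
        # Children live one level deeper under the same path.
        ("facet.prefix", "%d/%s" % (int(depth) + 1, path)),
        ("facet.mincount", "1"),
    ]


for name, value in drill_down_params("2/NonFic/Sci"):
    print(name, "=", value)
```

A prefix such as 3/NonFic/Sci would also match a sibling whose name merely starts with "Sci", so a client may want to append a trailing "/" to the prefix. That is a design choice, not something the slide prescribes.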
Multi-Select Faceting
http://search.lucidimagination.com
§  Very generic support
   •  Reuses localParams syntax {!name=val}
   •  Ability to tag filters
   •  Ability to exclude certain filters when faceting, by tag

  q=index replication
  facet=true
  fq={!tag=pr}project:(lucene OR solr)
  facet.field={!ex=pr}project
  facet.field={!ex=src}source
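
The effect of tagging and excluding can be seen in a toy in-memory model. This is only an illustration of the counting semantics, not how Solr implements it. The documents, field values, and the facet_counts helper are all made up for the example.

```python
from collections import Counter

# Made-up documents for illustration.
DOCS = [
    {"project": "lucene", "source": "wiki"},
    {"project": "lucene", "source": "email"},
    {"project": "solr", "source": "wiki"},
    {"project": "nutch", "source": "wiki"},
    {"project": "mahout", "source": "email"},
]

# Tagged filters: tag -> predicate, mirroring fq={!tag=pr}project:(lucene OR solr).
FILTERS = {
    "pr": lambda doc: doc["project"] in ("lucene", "solr"),
}


def facet_counts(docs, filters, field, exclude=()):
    """Count field values over docs matching every filter not excluded by tag."""
    active = [f for tag, f in filters.items() if tag not in exclude]
    matching = [doc for doc in docs if all(f(doc) for f in active)]
    return Counter(doc[field] for doc in matching)


# facet.field={!ex=pr}project: the project filter is ignored, so the
# unselected projects still show up with counts.
project_counts = facet_counts(DOCS, FILTERS, "project", exclude=("pr",))

# facet.field={!ex=src}source: no filter carries the "src" tag here,
# so the project filter still applies.
source_counts = facet_counts(DOCS, FILTERS, "source", exclude=("src",))

print(project_counts)
print(source_counts)
```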
  
Same Facet, Different Exclusions
  §  A key can be specified for a facet to change the
      name used to identify it in the response.

          q   =   Hot Rod
         fq   =   {!df=colors tag=cx}purple green
facet.field   =   {!key=all_colors ex=cx}colors
facet.field   =   {!key=overlap_colors}colors

"facet_counts":{
  "facet_fields":{
    "all_colors":[
      "red",19,
      "green",6,
      "blue",2],
    "overlap_colors":[
      "red",7,
      "green",6,
      "blue",1]
  }
}
“Pretty” facet.field Terms
      §  Field Faceting uses Indexed Terms
      §  Leverage copyField and TokenFilters that will
          give you good looking Constraints

<tokenizer class="solr.PatternTokenizerFactory"
           pattern="(,|;)\s*" />
<filter class="solr.PatternReplaceFilterFactory"
        pattern="\s+" replacement=" " />
<filter class="solr.PatternReplaceFilterFactory"
        pattern=" and " replacement=" &amp; " />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.CapitalizationFilterFactory"
        onlyFirstWord="false" />
“Pretty” facet.field Results
{"id" : "MyExampleDoc",
 "category" : " books
 and magazines;
computers, "
}

copyField in schema:  <copyField source="category" dest="category_pretty"/>

"facet_counts":{
  "facet_fields":{
    "category_pretty":[
      "Books & Magazines",1,
      "Computers",1]
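
To see why the messy input yields those two constraints, the analysis chain can be imitated in a few lines. This is a rough approximation rather than the Lucene implementation: it assumes the tokenizer pattern is (,|;)\s* and the whitespace pattern is \s+, drops empty tokens, and uses a simple per-word capitalization in place of CapitalizationFilterFactory.

```python
import re


def pretty_terms(raw):
    """Approximate the tokenizer and filter chain for a 'pretty' facet field."""
    terms = []
    # Tokenizer: split on "," or ";" followed by optional whitespace.
    for token in re.split(r"(?:,|;)\s*", raw):
        token = re.sub(r"\s+", " ", token)        # collapse whitespace runs
        token = token.replace(" and ", " & ")     # " and " -> " & "
        token = token.strip()                     # trim
        # Capitalize every word (onlyFirstWord="false").
        token = " ".join(word.capitalize() for word in token.split(" "))
        if token:                                 # skip empty tokens
            terms.append(token)
    return terms


raw_category = " books \n and magazines; \ncomputers, "
print(pretty_terms(raw_category))
```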
facet.field Labels
  §  facet.query params are echoed verbatim when
      returning the constraint counts
  §  Optionally, one can declare a facet.query in
      solrconfig.xml and include a "label" that the
      presentation layer can parse out for display
      ("label" has no meaning to Solr)

facet.query = {!label='Hot!' }
              +pop:[1 TO *]
              +pub_date:[NOW/DAY-1DAY TO *]

"facet_queries" : {
  "{!label='Hot!' } pop:[1 TO *] ..." : 15
}
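
On the presentation side, pulling the label back out of the echoed key could look like the sketch below. The regular expression and the display_label helper are assumptions for illustration. They only handle the case where label is the sole local param and is wrapped in single quotes.

```python
import re

# Matches a leading {!label='...'} local-params block (assumed format).
LABEL = re.compile(r"^\{!label='([^']*)'\s*\}")


def display_label(facet_query_key):
    """Return the label embedded in an echoed facet.query, if any."""
    match = LABEL.match(facet_query_key)
    # Fall back to the raw query text when no label is present.
    return match.group(1) if match else facet_query_key


facet_queries = {"{!label='Hot!' } pop:[1 TO *] ...": 15}
labelled = {display_label(key): count for key, count in facet_queries.items()}
print(labelled)
```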
Performance




facet.method
name               facet.method     description                memory                  CPU

enum               enum             Iterates over terms,       filter-per-term in      ~O(nTerms)
                                    calculating set            the filterCache
                                    intersections

field cache        fc (single-      Iterates over documents,   Lucene FieldCache       O(nDocs)
                   valued field)    counting terms             Entry: int[maxDoc]
                                                               + terms

UnInvertedField    fc (multi-       Iterates over documents,   O(maxDoc * num          ~O(nDocs)
                   valued field)    counting terms             terms per doc)

Per-segment        fcs              Like field cache, just     Lucene FieldCache       O(nDocs)
field cache                         better for NRT since the   Entries: int[maxDoc]    + O(nTerms)
                                    FieldCache entry is at     + terms
                                    segment level
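facet.method=fc (single-valued field)
§  CPU = O(nDocs in base set)
§  Mem = int[maxDoc] + unique_values
§  Example request: q=Juggernaut&facet=true&facet.field=hero
§  Lucene FieldCache Entry (StringIndex) for the hero field
   •  order: for each doc, an index into the lookup array
   •  lookup: the string values
§  For each document matching the base query, look up its entry in
    order and increment the matching accumulator slot; a priority
    queue then collects the top constraints (e.g. flash, 5; Batman, 3)

  docs matching the base query:  0, 2, 7
  order (docs 0..7):             5, 3, 5, 1, 4, 5, 2, 1
  lookup (ords 0..5):            (null), batman, flash, spiderman,
                                 superman, wolverine
  accumulator (ords 0..5):       0, 1, 0, 0, 0, 2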
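
The counting loop for the single-valued case can be sketched in a few lines. The order and lookup arrays below are the values read off the diagram, and the base set is docs 0, 2 and 7. The function facet_fc is a hypothetical name, and heapq stands in for the priority queue.

```python
import heapq

# FieldCache-style arrays for the hero field (values from the slide diagram).
LOOKUP = [None, "batman", "flash", "spiderman", "superman", "wolverine"]
ORDER = [5, 3, 5, 1, 4, 5, 2, 1]   # ORDER[doc] is an index into LOOKUP


def facet_fc(base_docs, order, lookup, limit=10):
    """Count terms by iterating over the documents in the base set."""
    accumulator = [0] * len(lookup)
    for doc in base_docs:                 # O(nDocs in base set)
        accumulator[order[doc]] += 1
    # Ord 0 is the "no value" slot, so it is skipped.
    candidates = ((count, lookup[ord_])
                  for ord_, count in enumerate(accumulator)
                  if ord_ != 0 and count > 0)
    top = heapq.nlargest(limit, candidates)
    return [(term, count) for count, term in top]


print(facet_fc([0, 2, 7], ORDER, LOOKUP))
```

Only integer ords are touched inside the loop. The string values are looked up once per reported constraint, which is what keeps the per-document cost low.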
facet.method=fcs (trunk)
(per-segment single-valued)

[Diagram: the base DocSet (docs 0, 2, 7) is looked up against a separate
FieldCache entry for each segment (Segment1 to Segment4). Each segment has
its own accumulator, incremented by its own thread (thread1 to thread4).
A FieldCache + accumulator merger (priority queue) combines the per-segment
counts into the final priority queue (flash, 5; Batman, 3).]
facet.method=fcs
§  Controllable multi-threading
     facet.method=fcs
     facet.field={!threads=4}myfield
§  Disadvantages
   •  Larger memory use (FieldCaches + accumulators)
   •  Slower (extra FieldCache merge step needed) - O(nTerms)
§  Advantages
   •  Rebuilds FieldCache entries only for new segments (NRT friendly)
   •  Multi-threaded
Per-segment faceting performance comparison
Test index: 10M documents, 18 segments, single valued field

A   Base DocSet=100 docs, facet.field on a field with 100,000 unique terms
    Time for request*           facet.method=fc        facet.method=fcs
    static index                3 ms                   244 ms
    quickly changing index      1388 ms                267 ms

B   Base DocSet=1,000,000 docs, facet.field on a field with 100 unique terms
    Time for request*           facet.method=fc        facet.method=fcs
    static index                26 ms                  34 ms
    quickly changing index      741 ms                 94 ms

    *complete request time, measured externally
facet.method=fc (multi-valued field)
§  UnInvertedField - like single-valued FieldCache algorithm, but
    with multi-valued FieldCache
§  Good for many unique terms, relatively few values per doc
   •  Best case: 50x faster, 5x smaller than “enum” (100K unique values,
      1-5 per doc)
   •  O(n_docs), but optimization to count the inverse when n>maxDoc/2
§  Memory efficient
   •  Terms in a document are delta coded variable width ords (vints)
   •  Ord list for document packed in an int or in a shared byte[]
   •  Hybrid approach: “big terms” that match >5% of index use
      filterCache instead
   •  Only 1/128th of string values in memory
facet.method=fc
              fieldValueCache
§  Implicit cache with UnInvertedField entries
   •  Not autowarmed – use static warming request
   •  http://localhost:8983/solr/admin/stats.jsp (mem size,
      time to create, etc)


item_cat:
{field=cat,memSize=5376,tindexSize=52,time=2,phase1=2,
nTerms=16,bigTerms=10,termInstances=6,uses=44}
Multi-valued faceting: facet.method=enum
§  For each term in field:
   •  Retrieve filter
   •  Calculate intersection size

  Lucene Inverted Index (on disk):
      batman      1 3 5 8
      flash       0 1 5
      spiderman   2 4 7
      superman    0 6 9
      wolverine   1 2 7 8

  Docs matching base query:       0 1 5 9
  Solr filterCache (in memory):   hero:batman = 1 3 5 8
                                  hero:flash  = 0 1 5
  Intersection count goes into the priority queue: batman=2
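
The enum method amounts to one set intersection per term. A minimal sketch using the posting lists and base set from the diagram, with Python sets standing in for filterCache entries (facet_enum is a hypothetical name):

```python
# Posting lists from the slide's inverted index, as doc-id sets.
INVERTED_INDEX = {
    "batman": {1, 3, 5, 8},
    "flash": {0, 1, 5},
    "spiderman": {2, 4, 7},
    "superman": {0, 6, 9},
    "wolverine": {1, 2, 7, 8},
}


def facet_enum(base_docs, index, mincount=1):
    """Count each term by intersecting its filter with the base set."""
    counts = {}
    for term, filter_docs in index.items():      # O(nTerms)
        size = len(filter_docs & base_docs)      # intersection size
        if size >= mincount:
            counts[term] = size
    # Highest count first, ties broken by term for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(facet_enum({0, 1, 5, 9}, INVERTED_INDEX))
```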
facet.method=enum

§    O(n_terms_in_field)
§    Short circuits based on term.df
§    filterCache entries int[ndocs] or BitSet(maxDoc)
§    Size filterCache appropriately
      •  Either autowarm filterCache, or use static warming
         queries (via newSearcher event) in solrconfig.xml
§  facet.enum.cache.minDf - prevent filterCache use
    for small terms
      •  Also useful for huge index w/ no time constraints
Q&A





More Related Content

What's hot

HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceCloudera, Inc.
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systemsDave Gardner
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Databricks
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryJean-Paul Azar
 
Redis Streams for Event-Driven Microservices
Redis Streams for Event-Driven MicroservicesRedis Streams for Event-Driven Microservices
Redis Streams for Event-Driven MicroservicesRedis Labs
 
NY Meetup: Scaling MariaDB with Maxscale
NY Meetup: Scaling MariaDB with MaxscaleNY Meetup: Scaling MariaDB with Maxscale
NY Meetup: Scaling MariaDB with MaxscaleWagner Bianchi
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...confluent
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsJavier GonzĂĄlez
 
Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...HostedbyConfluent
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsHostedbyConfluent
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
 
Alfresco DevCon 2019 Performance Tools of the Trade
Alfresco DevCon 2019   Performance Tools of the TradeAlfresco DevCon 2019   Performance Tools of the Trade
Alfresco DevCon 2019 Performance Tools of the TradeLuis Colorado
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteWhy is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteDatabricks
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...confluent
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Jvm & Garbage collection tuning for low latencies application
Jvm & Garbage collection tuning for low latencies applicationJvm & Garbage collection tuning for low latencies application
Jvm & Garbage collection tuning for low latencies applicationQuentin Ambard
 

What's hot (20)

HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
 
Redis Streams for Event-Driven Microservices
Redis Streams for Event-Driven MicroservicesRedis Streams for Event-Driven Microservices
Redis Streams for Event-Driven Microservices
 
NY Meetup: Scaling MariaDB with Maxscale
NY Meetup: Scaling MariaDB with MaxscaleNY Meetup: Scaling MariaDB with Maxscale
NY Meetup: Scaling MariaDB with Maxscale
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDs
 
Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
 
Alfresco DevCon 2019 Performance Tools of the Trade
Alfresco DevCon 2019   Performance Tools of the TradeAlfresco DevCon 2019   Performance Tools of the Trade
Alfresco DevCon 2019 Performance Tools of the Trade
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteWhy is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier Leaute
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Jvm & Garbage collection tuning for low latencies application
Jvm & Garbage collection tuning for low latencies applicationJvm & Garbage collection tuning for low latencies application
Jvm & Garbage collection tuning for low latencies application
 


The Many Facets of Apache Solr - Yonik Seeley

  • 1. The Many Facets of Apache Solr Yonik Seeley, Lucid Imagination yonik@lucidimagination.com, Oct 20 2011
  • 2. What I Will Cover
      §  What is Faceted Search
      §  Solr’s Faceted Search
      §  Tips & Tricks
      §  Performance & Algorithms
  • 3. My Background
      §  Creator of Solr
      §  Co-founder of Lucid Imagination
      §  Expertise: Distributed Search systems and performance
      §  Lucene/Solr committer, a member of the Lucene PMC, member of the Apache Software Foundation
      §  Work: CNET Networks, BEA, Telcordia, among others
      §  M.S. in Computer Science, Stanford
  • 4. What is Faceted Search
  • 5. Faceted Search Example
      §  Manufacturer is a facet, a way of categorizing the results
      §  Canon, Sony, and Nikon are constraints, or facet values
      §  The facet count or constraint count shows how many results match each value
      §  The breadcrumb trail shows what constraints have already been applied and allows for their removal
  • 6. Key Elements of Faceted Search
      §  No hierarchy of options is enforced
         •  Users can apply facet constraints in any order
         •  Users can remove facet constraints in any order
      §  No surprises
         •  The user is only given facets and constraints that make sense in the context of the items they are looking at
         •  The user always knows what to expect before they apply a constraint
      §  Also known as guided navigation, faceted navigation, faceted browsing, parametric search
  • 8. Field Faceting
      §  Specifies a Field to be used as a Facet
      §  Uses each term indexed in that Field as a Constraint
      §  Field must be indexed
      §  Can be used multiple times for multiple fields
        q = iPhone
        fq = inStock:true
        facet = true
        facet.field = color
        facet.field = category
  • 9. Field Faceting Response
        http://localhost:8983/solr/select?q=iPhone&fq=inStock:true&facet=true&facet.field=color&facet.field=category

        <lst name="facet_counts">
          <lst name="facet_fields">
            <lst name="color">
              <int name="red">17</int>
              <int name="green">6</int>
              <int name="blue">2</int>
              <int name="yellow">2</int>
            </lst>
            <lst name="category">
              <int name="accessories">16</int>
              <int name="electronics">11</int>
            </lst>
          </lst>
        </lst>
  • 10. Or if you prefer JSON…
        http://localhost:8983/solr/select?q=iPhone&fq=inStock:true&facet=true&facet.field=color&facet.field=category&wt=json

        "facet_counts":{
          "facet_fields":{
            "color":[
              "red",17,
              "green",6,
              "blue",2,
              "yellow",2],
            "category":[
              "accessories",16,
              "electronics",11]}}
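The JSON facet response is a flat list that alternates constraint values and counts. A minimal sketch of pairing them up on the client side, assuming a response shaped like the slide's excerpt (the `raw` string and the `facet_pairs` helper are illustrative, not part of Solr):

```python
import json

# Abbreviated response in the shape shown on the slide (wt=json).
raw = """
{"facet_counts": {"facet_fields": {
    "color": ["red", 17, "green", 6, "blue", 2, "yellow", 2],
    "category": ["accessories", 16, "electronics", 11]}}}
"""


def facet_pairs(flat):
    """Turn ["red", 17, "green", 6, ...] into [("red", 17), ("green", 6), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


fields = json.loads(raw)["facet_counts"]["facet_fields"]
facets = {name: facet_pairs(flat) for name, flat in fields.items()}

for name, pairs in facets.items():
    print(name, pairs)
```

Keeping the pairs as a list rather than a dict preserves the order in which Solr returned the constraints.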
  • 11. Applying Constraints
      Assume the user clicked on “red”… Simply add another filter query to apply that constraint:
        http://localhost:8983/solr/select?q=iPhone&fq=inStock:true&fq=color:red&facet=true&facet.field=color&facet.field=category
      The facet.field=color parameter is now redundant and can be removed (assuming a single-valued field).
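Applying a constraint amounts to appending one more `fq` parameter to the request. A small sketch of building such a URL with the Python standard library, assuming the host and parameters from the slides (`build_url` is a hypothetical helper, and the redundant `facet.field=color` is dropped here):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

BASE = "http://localhost:8983/solr/select"  # URL used on the slides


def build_url(query, filters, facet_fields):
    """Build a select URL with repeated fq and facet.field parameters."""
    params = [("q", query)]
    params += [("fq", f) for f in filters]
    params.append(("facet", "true"))
    params += [("facet.field", f) for f in facet_fields]
    # urlencode percent-encodes reserved characters such as ":".
    return BASE + "?" + urlencode(params)


# The user clicked "red": add fq=color:red alongside the existing filter.
url = build_url("iPhone", ["inStock:true", "color:red"], ["category"])
print(url)
print(parse_qsl(urlsplit(url).query))  # decoded parameters, in order
```

Removing a constraint from the breadcrumb trail is the reverse operation: rebuild the URL without that entry in `filters`.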
  • 12. facet.field Options
      §  facet.prefix - Restricts the possible constraints to only indexed values with a specified prefix.
      §  facet.mincount=0 - Restricts the constraints returned to those containing a minimum number of documents in the result set.
      §  facet.sort=count - The ordering of constraints: count or index
      §  facet.offset=0 - Indicates how many constraints in the specified sort ordering should be skipped in the response.
      §  facet.limit=100 - The number of constraints to return
      §  facet.missing=false - Return the number of docs with no value in the field
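To make the interaction of these options concrete, here is a rough client-side emulation in Python. It is not Solr's implementation: `apply_facet_options` is a hypothetical helper, the counts are made up for illustration, and breaking count ties by term order is an assumption.

```python
def apply_facet_options(counts, prefix="", mincount=0, sort="count",
                        offset=0, limit=100):
    """Emulate facet.prefix, facet.mincount, facet.sort, facet.offset, facet.limit."""
    items = [(term, n) for term, n in counts.items()
             if term.startswith(prefix) and n >= mincount]
    if sort == "count":
        # Highest count first; ties fall back to term order (assumption).
        items.sort(key=lambda item: (-item[1], item[0]))
    else:
        # "index" ordering: sort by the term itself.
        items.sort(key=lambda item: item[0])
    return items[offset:offset + limit]


# Illustrative per-term counts for a "color" field.
color_counts = {"red": 17, "green": 6, "blue": 2, "yellow": 2, "black": 0}

top3 = apply_facet_options(color_counts, mincount=1, limit=3)
by_index = apply_facet_options(color_counts, mincount=1, sort="index")
b_terms = apply_facet_options(color_counts, prefix="b")
print(top3)
print(by_index)
print(b_terms)
```

Paging through a long constraint list corresponds to holding `limit` fixed and increasing `offset`.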
  • 13. facet.query
      §  Specifies a query string to be used as a Facet Constraint
      §  Typically used multiple times to get multiple (discrete) sets
      §  Any type of query supported
        facet.query = rank:[* TO 20]
        facet.query = rank:[21 TO *]
  • 14. facet.query Results
        <result numFound="27" ... />
        ...
        <lst name="facet_counts">
          <lst name="facet_queries">
            <int name="rank:[* TO 20]">2</int>
            <int name="rank:[21 TO *]">15</int>
          </lst>
        ...
  • 15. Spatial faceting
        q=*:*&facet=true
        pt=45.15,-93.85                 (the lat,lon center point to search from)
        sfield=store                    (name of the field containing lat+lon data)
        facet.query={!geofilt d=5}      (geofilt is the geospatial query type)
        facet.query={!geofilt d=10}

        "facet_counts":{
          "facet_queries":{
            "{!geofilt d=5}":3,
            "{!geofilt d=10}":6},
  • 16. Range Faceting
    §  Simpler than a sequence of facet.query params
        http://...&facet=true
          &facet.range=price
          &facet.range.start=0
          &facet.range.end=500
          &facet.range.gap=50

        "facet_counts":{
          "facet_ranges":{
            "price":{
              "counts":[
                "0.0",5,
                "50.0",2,
                "100.0",0,
                "150.0",2,
                "200.0",0,
                "250.0",1,
                "300.0",2,
                "350.0",2,
                "400.0",0,
                "450.0",1],
              "gap":50.0,
              "start":0.0,
              "end":500.0}}}}
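Each entry in "counts" names only the lower bound of its bucket, so a client has to add the gap to recover the upper bound. A small Python sketch of that step, with `range_buckets` as an illustrative helper name:

```python
def range_buckets(counts, gap):
    """Expand a flat [start, count, ...] list into (lower, upper, count) tuples."""
    buckets = []
    for start, count in zip(counts[0::2], counts[1::2]):
        lower = float(start)  # range starts arrive as strings in the JSON response
        buckets.append((lower, lower + gap, count))
    return buckets


# First four buckets copied from the slide's price example.
price_counts = ["0.0", 5, "50.0", 2, "100.0", 0, "150.0", 2]
price_buckets = range_buckets(price_counts, 50.0)
non_empty = [b for b in price_buckets if b[2] > 0]
```

Whether each bound is inclusive or exclusive depends on facet.range.include, which this sketch does not model.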
  • 17. Date Faceting
    §  facet.date is deprecated, use facet.range on a date field now
    §  Creates Constraints based on evenly sized date ranges using the Gregorian Calendar
    §  Ranges are specified using "Date Math" so they DWIM in spite of variable length months and leap years
        facet.range = pubdate
        facet.range.start = NOW/YEAR-1YEAR
        facet.range.end = NOW/MONTH+1MONTH
        facet.range.gap = +1MONTH
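To see why calendar-aware gaps matter, the sketch below generates month-aligned bucket starts in Python. It is not Solr's Date Math implementation; `month_starts` and `solr_date` are illustrative helpers, and the start and end dates are what the parameters above would resolve to for a request made in October 2011.

```python
from datetime import datetime, timezone


def month_starts(start, end):
    """List the first instant of every month from start up to (not including) end.

    Assumes both arguments already sit on a month boundary.
    """
    starts = []
    year, month = start.year, start.month
    while (year, month) < (end.year, end.month):
        starts.append(datetime(year, month, 1, tzinfo=timezone.utc))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return starts


def solr_date(moment):
    """Format a UTC datetime the way the response on the slides prints it."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


range_start = datetime(2010, 1, 1, tzinfo=timezone.utc)   # NOW/YEAR-1YEAR
range_end = datetime(2011, 11, 1, tzinfo=timezone.utc)    # NOW/MONTH+1MONTH
buckets = month_starts(range_start, range_end)
labels = [solr_date(b) for b in buckets]
```

A fixed-length gap in days could not produce these boundaries, since the buckets vary between 28 and 31 days.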
  • 18. Date Faceting Results
        "facet_counts":{
          "facet_ranges":{
            "pubdate":{
              "counts":[
                "2010-01-01T00:00:00Z",4,
                "2010-02-01T00:00:00Z",6,
                "2010-03-01T00:00:00Z",0,
                "2010-04-01T00:00:00Z",13,
                […]
                "2011-09-01T00:00:00Z",5,
                "2011-10-01T00:00:00Z",2],
              "gap":"+1MONTH",
              "start":"2010-01-01T00:00:00Z",
              "end":"2011-11-01T00:00:00Z"}}}
  • 19. Range Faceting Options
    §  facet.range.hardend=false - Determines what effective end value is used when the specified "start" and "end" don't divide into even "gap" sized buckets; false extends the last Constraint range past "end" so that it is a full "gap" wide, true cuts it off at "end" so it may be shorter than the others
    §  facet.range.other=none - Allows you to specify what other Constraints you are interested in besides the generated ranges: before, after, between, none, all
    §  facet.range.include=lower - Specifies what bounds are inclusive vs exclusive: lower, upper, edge, outer, all
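The effect of facet.range.hardend is easiest to see with a start, end and gap that do not divide evenly. The Python sketch below is a simplified model of the bucket boundaries only (numeric ranges, no facet.range.include handling); `range_bounds` is an illustrative name, not a Solr function.

```python
def range_bounds(start, end, gap, hardend=False):
    """Return the (lower, upper) bounds of each generated range."""
    bounds = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end  # hardend=true: the last range stops exactly at "end"
        bounds.append((lower, upper))
        lower = upper
    return bounds


soft = range_bounds(0, 120, 50, hardend=False)
hard = range_bounds(0, 120, 50, hardend=True)
```

With hardend=False the last range runs from 100 to 150, past the requested end; with hardend=True it runs from 100 to 120.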
  • 20. Pivot Faceting (trunk)
    §  Computes a Matrix of Constraint Counts across multiple Facet Fields
    §  Syntax: facet.pivot=field1,field2,field3,…
        facet.pivot=cat,inStock

                               #docs   #docs w/ inStock:true   #docs w/ inStock:false
        cat:electronics          14             10                       4
        cat:memory                3              3                       0
        cat:connector             2              0                       2
        cat:graphics card         2              0                       2
        cat:hard drive            2              2                       0
  • 21. Pivot Faceting
        http://...&facet=true&facet.pivot=cat,popularity

        "facet_counts":{
          "facet_pivot":{
            "cat,popularity":[{
                "field":"cat",
                "value":"electronics",
                "count":14,                  (14 docs w/ cat==electronics)
                "pivot":[{
                    "field":"popularity",
                    "value":6,
                    "count":5},              (5 docs w/ cat==electronics && popularity==6)
                  {
                    "field":"popularity",
                    "value":7,
                    "count":4},
                  […]
                  {
                    "field":"popularity",
                    "value":1,
                    "count":2}]},
              {
                "field":"cat",
                "value":"memory",
                "count":3,
                "pivot":[]},
              […]
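Conceptually, a two-field pivot is a nested count: group by the first field, then count values of the second field inside each group. A rough in-memory Python sketch, assuming single-valued fields; the documents and the `pivot_counts` helper are made up for illustration and say nothing about how Solr computes pivots internally.

```python
from collections import Counter, defaultdict


def pivot_counts(docs, outer, inner):
    """Count values of `inner` within each value of `outer`."""
    table = defaultdict(Counter)
    for doc in docs:
        table[doc[outer]][doc[inner]] += 1
    return {value: dict(counter) for value, counter in table.items()}


# Hypothetical documents, loosely modelled on the cat/inStock matrix.
docs = [
    {"cat": "memory", "inStock": True},
    {"cat": "memory", "inStock": True},
    {"cat": "memory", "inStock": True},
    {"cat": "connector", "inStock": False},
    {"cat": "connector", "inStock": False},
]

matrix = pivot_counts(docs, "cat", "inStock")
```

Combinations with no documents are simply absent from the result rather than reported as zero.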
  • 23. term QParser
    §  Default Query Parser does special things with whitespace and punctuation
    §  Problematic when "filtering" on Facet Field Constraints that contain whitespace, punctuation, or other reserved characters.
    §  Use the term parser to filter on an exact Term
        fq = {!term f=category}Books & Magazines
      OR
        fq = {!term f=category v=$t}
        t = Books & Magazines
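The constraint value still has to be URL-encoded when it is placed in a request, since characters such as "&" would otherwise end the parameter early. A Python sketch of building such a filter; `term_filter` is a hypothetical helper and the round trip through `parse_qs` is only there to show that the value survives encoding.

```python
from urllib.parse import parse_qs, urlencode


def term_filter(field, value):
    """Build an fq param that matches one exact indexed term via the term parser."""
    return ("fq", "{!term f=" + field + "}" + value)


params = [("q", "*:*"), term_filter("category", "Books & Magazines")]
query_string = urlencode(params)

# Decoding the query string gives back the raw filter unchanged.
decoded = parse_qs(query_string)
```

The constraint text itself needs no query-syntax escaping because the term parser does not interpret it.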
  • 24. Taxonomy Facets
    §  What If Your Documents Are Organized in a Taxonomy?
  • 25. Taxonomy Facets: Data
    §  Flattened Data
        Doc#1: NonFic > Law
        Doc#2: NonFic > Sci
        Doc#3: NonFic > Hist
               NonFic > Sci > Phys
    §  Indexed Terms (prepend number of nodes in path segment)
        Doc#1: 1/NonFic, 2/NonFic/Law
        Doc#2: 1/NonFic, 2/NonFic/Sci
        Doc#3: 1/NonFic, 2/NonFic/Hist,
               2/NonFic/Sci, 3/NonFic/Sci/Phys
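The indexed terms can be generated from the flattened paths at index time. A minimal Python sketch, assuming " > " separated paths as written on the slide; `taxonomy_terms` and `doc_terms` are illustrative names.

```python
def taxonomy_terms(path, sep=">"):
    """Expand one path into depth-prefixed terms, one per ancestor level."""
    nodes = [node.strip() for node in path.split(sep)]
    terms = []
    for depth in range(1, len(nodes) + 1):
        terms.append(str(depth) + "/" + "/".join(nodes[:depth]))
    return terms


def doc_terms(paths):
    """Collect the distinct terms for a document that sits under several paths."""
    seen = []
    for path in paths:
        for term in taxonomy_terms(path):
            if term not in seen:
                seen.append(term)
    return seen


doc3 = doc_terms(["NonFic > Hist", "NonFic > Sci > Phys"])
```

Shared ancestors such as 1/NonFic are emitted once per document, which matches the Doc#3 example.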
  • 26. Taxonomy Facets: Initial Query
        facet.field = category
        facet.prefix = 2/NonFic
        facet.mincount = 1

        <result numFound="164" ...
        <lst name="facet_fields">
          <lst name="category">
            <int name="2/NonFic/Sci">2</int>
            <int name="2/NonFic/Hist">1</int>
            <int name="2/NonFic/Law">1</int>
  • 27. Taxonomy Facets: Drill Down
        fq = {!term f=category}2/NonFic/Sci
        facet.field = category
        facet.prefix = 3/NonFic/Sci
        facet.mincount = 1

        <result numFound="2" ...
        <lst name="facet_fields">
          <lst name="category">
            <int name="3/NonFic/Sci/Phys">1</int>
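The drill-down request can be derived mechanically from the term the user clicked: filter on that term, then facet on the next depth with the same path as prefix. A Python sketch; `drill_down_params` is a hypothetical helper and the field name defaults to the slide's `category`.

```python
def drill_down_params(clicked_term, field="category"):
    """Build the params for the next level below a depth-prefixed taxonomy term."""
    depth, path = clicked_term.split("/", 1)
    next_prefix = str(int(depth) + 1) + "/" + path
    return [
        ("fq", "{!term f=" + field + "}" + clicked_term),
        ("facet.field", field),
        ("facet.prefix", next_prefix),
        ("facet.mincount", "1"),
    ]


params = drill_down_params("2/NonFic/Sci")
```

One design point to consider: a prefix without a trailing "/" would also match a sibling whose name merely starts with "Sci", so appending "/" may be safer in real data.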
  • 28. Multi-Select Faceting
        http://search.lucidimagination.com
    §  Very generic support
       •  Reuses localParams syntax {!name=val}
       •  Ability to tag filters
       •  Ability to exclude certain filters when faceting, by tag
        q=index replication
        facet=true
        fq={!tag=pr}project:(lucene OR solr)
        facet.field={!ex=pr}project
        facet.field={!ex=src}source
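The tag and exclude semantics can be modelled in a few lines of Python: every filter still restricts the result list, but a facet ignores the filters whose tags it excludes. This is an in-memory illustration with made-up documents, not Solr code; `facet_with_exclusion` is a hypothetical helper.

```python
from collections import Counter


def facet_with_exclusion(docs, filters, field, exclude=()):
    """Count `field` values over docs matching all filters except excluded tags."""
    active = [pred for tag, pred in filters.items() if tag not in exclude]
    matching = [doc for doc in docs if all(pred(doc) for pred in active)]
    return Counter(doc[field] for doc in matching)


docs = [
    {"project": "lucene", "source": "wiki"},
    {"project": "solr", "source": "wiki"},
    {"project": "solr", "source": "mail"},
    {"project": "nutch", "source": "mail"},
]

# Equivalent of fq={!tag=pr}project:(lucene OR solr)
filters = {"pr": lambda doc: doc["project"] in ("lucene", "solr")}

project_counts = facet_with_exclusion(docs, filters, "project", exclude=("pr",))
source_counts = facet_with_exclusion(docs, filters, "source")
```

Because the project facet excludes its own filter, unselected projects keep their counts and remain available as checkboxes.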
  • 29. Same Facet, Different Exclusions
    §  A key can be specified for a facet to change the name used to identify it in the response.
        q = Hot Rod
        fq = {!df=colors tag=cx}purple green
        facet.field = {!key=all_colors ex=cx}colors
        facet.field = {!key=overlap_colors}colors

        "facet_counts":{
          "facet_fields":{
            "all_colors":[
              "red",19,
              "green",6,
              "blue",2],
            "overlap_colors":[
              "red",7,
              "green",6,
              "blue",1]
          }
        }
  • 30. "Pretty" facet.field Terms
    §  Field Faceting uses Indexed Terms
    §  Leverage copyField and TokenFilters that will give you good looking Constraints
        <tokenizer class="solr.PatternTokenizerFactory"
                   pattern="(,|;)\s*" />
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="\s+" replacement=" " />
        <filter class="solr.PatternReplaceFilterFactory"
                pattern=" and " replacement=" &amp; " />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.CapitalizationFilterFactory"
                onlyFirstWord="false" />
  • 31. "Pretty" facet.field Results
        {"id" : "MyExampleDoc",
         "category" : " books   and magazines;   computers, "
        }

    copyField in schema:
        <copyField source="category" dest="category_pretty"/>

        "facet_counts":{
          "facet_fields":{
            "category_pretty":[
              "Books & Magazines",1,
              "Computers",1]
  • 32. facet.field Labels
    §  facet.query params are echoed verbatim when returning the constraint counts
    §  Optionally, one can declare a facet.query in solrconfig.xml and include a "label" that the presentation layer can parse out for display; "label" has no meaning to Solr
        facet.query = {!label='Hot!' }
                      +pop:[1 TO *]
                      +pub_date:[NOW/DAY-1DAY TO *]

        "facet_queries" : {
          " {!label='Hot!' } pop:[1 TO *] ... " : 15
        }
  • 34. facet.method

    name                     facet.method              description                                  memory                                          CPU
    enum                     enum                      Iterates over terms, calculating set         filter-per-term in the filterCache              ~O(nTerms)
                                                       intersections
    field cache              fc (single-valued field)  Iterates over documents, counting terms      Lucene FieldCache Entry… int[maxDoc]+terms      O(nDocs)
    UnInvertedField          fc (multi-valued field)   Iterates over documents, counting terms      O(maxDoc * num terms per doc)                   ~O(nDocs)
    Per-segment field cache  fcs                       Like field-cache, just better for NRT since  Lucene FieldCache Entries… int[maxDoc]+terms    O(nDocs) + O(nTerms)
                                                       FieldCacheEntry is at segment level.
  • 35. facet.method=fc (single-valued field)
        Mem = int[maxDoc] + unique_values
        CPU = O(nDocs in base set)
        q=Juggernaut&facet=true&facet.field=hero
    [Diagram: for each document matching the base query (q=Juggernaut), the Lucene FieldCache Entry (StringIndex) for the hero field is consulted and the matching accumulator slot is incremented; a priority queue then collects the top counts (flash, 5; Batman, 3)]
    §  order: for each doc, an index into the lookup array
    §  lookup: the string values: (null), batman, flash, spiderman, superman, wolverine
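The counting loop in the diagram can be sketched in Python as follows. This is a simplified illustration, not Solr's implementation: the `order` array and base set are invented so that the totals match the diagram's priority queue, and `facet_fc` is an illustrative name.

```python
import heapq


def facet_fc(order, lookup, base_docs, limit=2):
    """Count terms for a single-valued field using per-document ordinals.

    order[doc] is an index into lookup; ordinal 0 means the doc has no value.
    """
    accumulator = [0] * len(lookup)
    for doc in base_docs:
        accumulator[order[doc]] += 1  # one array read and one increment per doc
    candidates = [
        (accumulator[ordinal], lookup[ordinal])
        for ordinal in range(1, len(lookup))
        if accumulator[ordinal] > 0
    ]
    # Stand-in for the priority queue that keeps the top constraints.
    return heapq.nlargest(limit, candidates)


lookup = [None, "batman", "flash", "spiderman", "superman", "wolverine"]
order = [2, 1, 2, 2, 1, 2, 4, 1, 2, 0]   # hypothetical ordinals for 10 docs
base_docs = [0, 1, 2, 3, 4, 5, 7, 8]     # hypothetical docs matching the query

top = facet_fc(order, lookup, base_docs)
```

The work is proportional to the size of the base set, which is why this method is cheap for small result sets regardless of how many unique terms the field has.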
  • 36. facet.method=fcs (trunk) (per-segment single-valued)
    [Diagram: the Base DocSet is spread over Segment1 to Segment4; each segment has its own FieldCache Entry and its own accumulator (accumulator1 to accumulator4), filled by thread1 to thread4 through "lookup" and "inc" steps; a FieldCache + accumulator merger (Priority queue) combines the per-segment results into the final Priority queue (flash, 5; Batman, 3)]
  • 37. facet.method=fcs
    §  Controllable multi-threading
        facet.method=fcs
        facet.field={!threads=4}myfield
    §  Disadvantages
       •  Larger memory use (FieldCaches + accumulators)
       •  Slower (extra FieldCache merge step needed) - O(nTerms)
    §  Advantages
       •  Rebuilds FieldCache entries only for new segments (NRT friendly)
       •  Multi-threaded
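The extra merge step exists because each segment numbers its terms independently, so per-segment counts can only be combined by term value. The Python sketch below shows that idea with a plain Counter standing in for the priority-queue merger, and without threads; the segment data and the name `facet_fcs` are assumptions for illustration.

```python
from collections import Counter


def facet_fcs(segments, limit=10):
    """Count per segment by ordinal, then merge the counts by term string.

    Each segment is (order, lookup, base_docs) with segment-local doc ids;
    ordinal 0 means the doc has no value.
    """
    merged = Counter()
    for order, lookup, base_docs in segments:
        accumulator = [0] * len(lookup)
        for doc in base_docs:
            accumulator[order[doc]] += 1
        # Merge step: touches every term of every segment, hence the O(nTerms) cost.
        for ordinal in range(1, len(lookup)):
            if accumulator[ordinal]:
                merged[lookup[ordinal]] += accumulator[ordinal]
    return merged.most_common(limit)


# Two hypothetical segments; "flash" has a different ordinal in each.
segment1 = ([1, 2, 2, 0], [None, "batman", "flash"], [0, 1, 2, 3])
segment2 = ([1, 1, 1, 2], [None, "flash", "superman"], [0, 1, 2])

top = facet_fcs([segment1, segment2])
```

Only the segments that changed need their arrays rebuilt after a commit, which is the NRT advantage listed above.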
  • 38. Per-segment faceting performance comparison
    Test index: 10M documents, 18 segments, single valued field

    A. Base DocSet=100 docs, facet.field on a field with 100,000 unique terms
        Time for request*          facet.method=fc    facet.method=fcs
        static index                      3 ms             244 ms
        quickly changing index         1388 ms             267 ms

    B. Base DocSet=1,000,000 docs, facet.field on a field with 100 unique terms
        Time for request*          facet.method=fc    facet.method=fcs
        static index                     26 ms              34 ms
        quickly changing index          741 ms              94 ms

    *complete request time, measured externally
  • 39. facet.method=fc (multi-valued field)
    §  UnInvertedField - like single-valued FieldCache algorithm, but with multi-valued FieldCache
    §  Good for many unique terms, relatively few values per doc
       •  Best case: 50x faster, 5x smaller than "enum" (100K unique values, 1-5 per doc)
       •  O(n_docs), but optimization to count the inverse when n>maxDoc/2
    §  Memory efficient
       •  Terms in a document are delta coded variable width ords (vints)
       •  Ord list for document packed in an int or in a shared byte[]
       •  Hybrid approach: "big terms" that match >5% of index use filterCache instead
       •  Only 1/128th of string values in memory
  • 41. Faceting: fieldValueCache
    §  Implicit cache with UnInvertedField entries
       •  Not autowarmed - use static warming request
       •  http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc)
        item_cat: {field=cat,memSize=5376,tindexSize=52,time=2,phase1=2,
                   nTerms=16,bigTerms=10,termInstances=6,uses=44}
  • 42. Multi-valued faceting: facet.method=enum
    §  facet.method=enum
    §  For each term in field:
       •  Retrieve filter
       •  Calculate intersection size
    [Diagram: for each term in the Lucene Inverted Index (on disk), the matching filter (hero:batman, hero:flash, ...) is taken from the Solr filterCache (in memory) and intersected with the docs matching the base query; the intersection count goes into a Priority queue (batman=2)]
        Lucene Inverted Index (on disk)
        batman      1 3 5 8
        flash       0 1 5
        spiderman   2 4 7 9
        superman    0 6 9
        wolverine   1 2 7 8
  • 43. facet.method=enum
    §  O(n_terms_in_field)
    §  Short circuits based on term.df
    §  filterCache entries int[ndocs] or BitSet(maxDoc)
    §  Size filterCache appropriately
       •  Either autowarm filterCache, or use static warming queries (via newSearcher event) in solrconfig.xml
    §  facet.enum.cache.minDf - prevent filterCache use for small terms
       •  Also useful for huge index w/ no time constraints
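The term-by-term loop, including the short circuit on document frequency, can be sketched in Python with sets standing in for filterCache entries. The postings are the ones from the inverted-index diagram; the base set {0, 1, 5} is an assumption read from the garbled diagram, and `facet_enum` is an illustrative name rather than Solr code.

```python
def facet_enum(postings, base_docs, min_count=1):
    """Count each term by intersecting its doc set with the base doc set."""
    base = set(base_docs)
    counts = {}
    for term, docs in postings.items():
        if len(docs) < min_count:
            continue  # short circuit: term.df is too low to ever reach min_count
        size = len(base & set(docs))
        if size >= min_count:
            counts[term] = size
    # Highest count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


postings = {
    "batman": [1, 3, 5, 8],
    "flash": [0, 1, 5],
    "spiderman": [2, 4, 7, 9],
    "superman": [0, 6, 9],
    "wolverine": [1, 2, 7, 8],
}

result = facet_enum(postings, base_docs=[0, 1, 5])
```

The loop visits every term in the field no matter how small the base set is, which is the O(n_terms_in_field) cost noted above and the reason this method suits fields with few unique values.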