Native Code, Off-Heap Data & 
JSON Facet API for Solr 
Yonik Seeley 
Apachecon EU 2014 
Budapest, Hungary
My Background 
• Creator of Solr 
• Heliosearch Founder 
• LucidWorks Co-Founder 
• Lucene/Solr committer, PMC member 
• Apache Software Foundation member 
• M.S. in Computer Science, Stanford
Heliosearch Project 
• The Next Evolution of Solr 
• Forked from Solr, Developing at github 
– Started Jan 2014 
– Well aligned community 
– Open Source, Apache licensed 
• Bring back to Apache in the future? 
• Currently drop-in replacement for Solr at the HTTP-API level 
– A super-set… we continually merge in upstream changes 
– Latest version of Heliosearch includes latest Solr 
• Current Features: Off-heap filters, off-heap fieldcache, facet-by-function, sub-facets, native code performance enhancements
Garbage Collection
Garbage Collection Basics
[Diagram labels: Eden Space, Survivor Space 1, Survivor Space 2, Tenured Space, Permanent Space, Thread]
• New objects allocated in Eden
• Find live objects by tracing from GC “roots” (threads, stack locals, etc.)
• Make a copy of live objects, leaving “garbage” behind
• Eden + Survivor space copied together to the other Survivor space
• Tenured from Survivor when old enough
• “Stop-the-world” needed when GC can’t keep up
• Out of memory when too much time spent in GC
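The tracing and copying steps above can be sketched as a toy model. This is an illustration only and not how HotSpot is implemented: the `Obj` class, the `copy_live_objects` function, and the single "to" space are assumptions made for the example.

```python
class Obj:
    """A toy heap object that may reference other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def copy_live_objects(roots):
    """Trace from the GC roots and copy every reachable object.

    Returns the copies (the new "to" space). Anything that was not
    reachable from a root is simply never copied: that is the garbage.
    """
    copies = {}       # id(original) -> copy in the "to" space
    originals = []    # live originals, in discovery order
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in copies:
            continue  # already copied (handles cycles)
        copies[id(obj)] = Obj(obj.name)
        originals.append(obj)
        stack.extend(obj.refs)
    # Second pass: make the copies point at each other, not at the originals.
    for obj in originals:
        copies[id(obj)].refs = [copies[id(r)] for r in obj.refs]
    return [copies[id(obj)] for obj in originals]


a, b, c, garbage = Obj("a"), Obj("b"), Obj("c"), Obj("garbage")
a.refs.append(b)
b.refs.append(a)        # a cycle between two live objects
garbage.refs.append(c)  # c is only reachable from garbage

live = copy_live_objects([a])
print(sorted(o.name for o in live))
```

Note that the cost of this pass depends on how much is live, not on how much garbage exists, which is why large long-lived on-heap structures are expensive for a copying collector.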
Java Memory Waste 
- Need to size for worst case scenario 
- OS needs free memory to cache index files 
- JVMs aren’t good at “sharing” with rest of the system 
- mmap allocations managed by OS, can be immediately reused on free 
[Figure: OS real memory divided among two JVMs (each with a max heap split into heap in use and unused heap), two C processes (C heap in use, unused heap, mmap-allocated memory), and “free” memory. “Free” memory includes the buffer cache, which is important for caching index files.]
GC Impact
• GC reduces throughput
– Time spent copying all that memory around could be better spent!
• Stop-the-world pauses
– Seconds to minutes long
– Pause time proportional to heap size
– Still exist in all HotSpot GCs… CMS, G1GC, etc.
– Break application SLAs (request timeouts, etc.)
– Can cause SolrCloud ZooKeeper session timeouts
• Reducing max pause time normally means reduced throughput
• Non-graceful degradation
– If you don't size your heap big enough… BOOM!
GC Tuning 
UseSerialGC 
UseParallelGC 
UseParallelOldGC 
UseParallelOldGCCompacting 
UseParallelDensePrefixUpdate 
HeapMaximumCompactionInterval 
HeapFirstMaximumCompactionCount 
UseMaximumCompactionOnSystemGC 
ParallelOldDeadWoodLimiterMean 
ParallelOldDeadWoodLimiterStdDev 
UseParallelOldGCDensePrefix 
ParallelGCThreads 
ParallelCMSThreads 
YoungPLABSize 
OldPLABSize 
GCTaskTimeStampEntries 
AlwaysTenure 
NeverTenure 
ScavengeBeforeFullGC 
UseConcMarkSweepGC 
ExplicitGCInvokesConcurrent 
UseCMSBestFit 
UseCMSCollectionPassing 
UseParNewGC 
ParallelGCVerbose 
ParallelGCBufferWastePct 
ParallelGCRetainPLAB 
TargetPLABWastePct 
PLABWeight 
ResizePLAB 
PrintPLAB 
ParGCArrayScanChunk 
ParGCDesiredObjsFromOverflowList 
CMSParPromoteBlocksToClaim 
AlwaysPreTouch 
CMSUseOldDefaults 
CMSYoungGenPerWorker 
CMSIncrementalMode 
CMSIncrementalDutyCycle 
CMSIncrementalPacing 
CMSIncrementalDutyCycleMin 
CMSIncrementalSafetyFactor 
CMSIncrementalOffset 
CMSExpAvgFactor 
CMS_FLSWeight 
CMS_FLSPadding 
FLSCoalescePolicy 
CMS_SweepWeight 
CMS_SweepPadding 
CMS_SweepTimerThresholdMillis 
CMSClassUnloadingEnabled 
CMSCompactWhenClearAllSoftRefs 
UseCMSCompactAtFullCollection 
CMSFullGCsBeforeCompaction 
CMSIndexedFreeListReplenish 
CMSLoopWarn 
CMSMarkStackSize 
CMSMarkStackSizeMax 
CMSMaxAbortablePrecleanLoops 
CMSMaxAbortablePrecleanTime 
CMSAbortablePrecleanMinWorkPerIteration 
CMSAbortablePrecleanWaitMillis 
CMSRescanMultiple 
CMSConcMarkMultiple 
CMSRevisitStackSize 
CMSAbortSemantics 
CMSParallelRemarkEnabled 
CMSParallelSurvivorRemarkEnabled 
CMSPLABRecordAlways 
CMSConcurrentMTEnabled 
CMSPermGenPrecleaningEnabled 
CMSPermGenSweepingEnabled 
CMSPrecleaningEnabled 
CMSPrecleanIter 
CMSPrecleanNumerator 
CMSPrecleanDenominator 
CMSPrecleanRefLists1 
CMSPrecleanRefLists2 
CMSPrecleanSurvivors1 
CMSPrecleanSurvivors2 
CMSPrecleanThreshold 
CMSCleanOnEnter 
CMSRemarkVerifyVariant 
CMSScheduleRemarkEdenSizeThreshold 
CMSScheduleRemarkEdenPenetration 
CMSScheduleRemarkSamplingRatio 
CMSSamplingGrain 
CMSScavengeBeforeRemark 
CMSWorkQueueDrainThreshold 
CMSWaitDuration 
CMSYield 
CMSBitMapYieldQuantum 
UseGCLogFileRotation 
NumberOfGCLogFiles 
GCLogFileSize 
LargePageSizeInBytes 
LargePageHeapSizeThreshold 
PrintGCApplicationConcurrentTime 
PrintGCApplicationStoppedTime 
OnOutOfMemoryError 
ClassUnloading 
BlockOffsetArrayUseUnallocatedBlock 
RefDiscoveryPolicy 
ParallelRefProcEnabled 
CMSTriggerRatio 
CMSBootstrapOccupancy 
CMSInitiatingOccupancyFraction 
UseCMSInitiatingOccupancyOnly 
HandlePromotionFailure 
PreserveMarkStackSize 
ZeroTLAB 
PrintTLAB 
TLABStats 
AlwaysActAsServerClassMachine 
DefaultMaxRAM 
DefaultMaxRAMFraction 
DefaultInitialRAMFraction 
UseAutoGCSelectPolicy 
AutoGCSelectPauseMillis 
UseAdaptiveSizePolicy 
UsePSAdaptiveSurvivorSizePolicy 
UseAdaptiveGenerationSizePolicyAtMinorCollection 
UseAdaptiveGenerationSizePolicyAtMajorCollection 
UseAdaptiveSizePolicyWithSystemGC 
UseAdaptiveGCBoundary 
AdaptiveSizeThroughPutPolicy 
AdaptiveSizePausePolicy 
AdaptiveSizePolicyInitializingSteps 
AdaptiveSizePolicyOutputInterval 
UseAdaptiveSizePolicyFootprintGoal 
AdaptiveSizePolicyWeight 
AdaptiveTimeWeight 
PausePadding 
PromotedPadding 
SurvivorPadding 
AdaptivePermSizeWeight 
PermGenPadding 
ThresholdTolerance 
AdaptiveSizePolicyCollectionCostMargin 
YoungGenerationSizeIncrement 
YoungGenerationSizeSupplement 
YoungGenerationSizeSupplementDecay 
TenuredGenerationSizeIncrement 
TenuredGenerationSizeSupplement 
TenuredGenerationSizeSupplementDecay 
MaxGCPauseMillis 
MaxGCMinorPauseMillis 
GCTimeRatio 
AdaptiveSizeDecrementScaleFactor 
UseAdaptiveSizeDecayMajorGCCost 
AdaptiveSizeMajorGCDecayTimeScale 
MinSurvivorRatio 
InitialSurvivorRatio 
BaseFootPrintEstimate 
UseGCOverheadLimit 
GCTimeLimit 
GCHeapFreeLimit 
PrintAdaptiveSizePolicy 
DisableExplicitGC 
CollectGen0First 
BindGCTaskThreadsToCPUs 
UseGCTaskAffinity 
ProcessDistributionStride 
CMSCoordinatorYieldSleepCount 
CMSYieldSleepCount 
PrintGCTaskTimeStamps 
TraceClassLoadingPreorder 
TraceGen0Time 
TraceGen1Time 
PrintTenuringDistribution 
PrintHeapAtSIGBREAK 
TraceParallelOldGCTasks 
PrintParallelOldGCPhaseTimes 
MaxHeapSize 
MaxNewSize 
PretenureSizeThreshold 
MinTLABSize 
TLABAllocationWeight 
TLABWasteTargetPercent 
TLABRefillWasteFraction 
TLABWasteIncrement 
MaxLiveObjectEvacuationRatio 
OldSize 
MinHeapFreeRatio 
MaxHeapFreeRatio 
SoftRefLRUPolicyMSPerMB 
MinHeapDeltaBytes 
MinPermHeapExpansion 
MaxPermHeapExpansion 
QueuedAllocationWarningCount 
MaxTenuringThreshold 
InitialTenuringThreshold 
TargetSurvivorRatio 
MarkSweepDeadRatio 
PermMarkSweepDeadRatio 
MarkSweepAlwaysCompactCount 
PrintCMSStatistics 
PrintCMSInitiationStatistics 
PrintFLSStatistics 
PrintFLSCensus 
DeferThrSuspendLoopCount 
DeferPollingPageLoopCount 
SafepointSpinBeforeYield 
UseDepthFirstScavengeOrder 
GCDrainStackTargetSize 
ThreadSafetyMargin 
CodeCacheMinimumFreeSpace 
MaxDirectMemorySize 
PerfDataMemorySize 
AggressiveHeap 
UseCompressedStrings 
UseStringCache 
HeapDumpOnOutOfMemoryError 
HeapDumpPath 
PrintGC 
PrintGCDetails 
PrintGCTimeStamps 
G1HeapRegionSize
G1ReservePercent 
G1ConfidencePercent 
PrintPromotionFailure 
PrintGCDateStamps 
-XX:InitiatingHeapOccupancyPercent=n 
-XX:MaxGCPauseMillis=n 
-XX:ConcGCThreads=n 
-XX:MaxHeapFreeRatio=70 
-XX:MaxTenuringThreshold=n 
-XX:+ScavengeBeforeFullGC
GC Reduction
• Reuse objects – cause less garbage
• Move certain things off-heap (invisible to GC)
• Option 1: Direct ByteBuffers
– Limited to “int” (2GB)
– No way to directly “free” – still relies on GC
• Option 2: sun.misc.Unsafe
– malloc() + free() + direct memory access
– Supported on all major JVMs
– Widely used: Java (nio, concurrent), JSR166, Google Guava, Objenesis (which is used in Kryo, which is used in Twitter Storm), Apache DirectMemory, Lightning, Hazelcast, snappy, gson, …
– Being considered for Java 9
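The malloc() + free() + direct memory access pattern can be illustrated by analogy outside the JVM. The sketch below is not Heliosearch code and does not use sun.misc.Unsafe; it uses Python's standard `mmap` module to hold an integer array in an anonymous OS mapping that is released explicitly. The `OffHeapIntArray` class and its method names are hypothetical.

```python
import mmap
import struct


class OffHeapIntArray:
    """Fixed-size array of 32-bit ints stored in an anonymous memory mapping.

    The values live in one flat OS-managed buffer rather than as individual
    managed objects, and the buffer is released explicitly with free().
    """

    _INT = struct.Struct("<i")  # little-endian signed 32-bit integer

    def __init__(self, length):
        if length <= 0:
            raise ValueError("length must be positive")
        self.length = length
        # -1 requests an anonymous mapping (no backing file), zero-filled.
        self._buf = mmap.mmap(-1, length * self._INT.size)

    def _check(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)

    def set(self, index, value):
        self._check(index)
        self._INT.pack_into(self._buf, index * self._INT.size, value)

    def get(self, index):
        self._check(index)
        return self._INT.unpack_from(self._buf, index * self._INT.size)[0]

    def free(self):
        """Return the memory to the OS immediately, without waiting for GC."""
        self._buf.close()


arr = OffHeapIntArray(4)
arr.set(2, 42)
value_at_2 = arr.get(2)
value_at_0 = arr.get(0)
arr.free()
```

The trade-off is the one the slide implies: the caller becomes responsible for calling `free()`, and any access after that is an error.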
Off-Heap Filters 
50M docs 
(3.8 GB index) 
8GB RAM 
20K requests 
8 req threads 
500 filters 
JVM Options: 
-Xmx4G (solr)
Off-Heap Filters Test
Observed max process sizes 
Solr : 3.8GB – 4.3GB 
Heliosearch: 3.6GB – 3.7GB
Off-Heap FieldCache
Normal (on-heap) FieldCache
• Typically the largest data structures kept on the heap
• Used for sorting, function query values, single-valued faceting, grouping
• Uses weak references
Heliosearch nCache (n is for “native”)
• Allocated off-heap
• First-class managed Solr cache
• Configure size, warming policies
• View statistics
• Per-segment (NRT friendly)
• No weak references
nCache admin stats 
item_id:{ "field":"id", "uses":8, "class":"StrTopValues", 
"refcount":2, "numSegments":7, "carriedOver":6, "size":612} 
item_popularity:{ "field":"popularity", "uses":5, 
"class":"IntTopValues", "refcount":2, "numSegments":7, 
"carriedOver":6, "size":106} 
item_price:{ 
"field":"price",
"uses":0, -- the number of top-level uses for searcher 
"class":"FloatTopValues", 
"refcount":2, 
"numSegments":5, -- number of segments populated 
"carriedOver":5, -- number of segments carried over from last searcher 
"size":272 -- size in bytes for all populated segments 
}
Off-Heap Integer Field
• 50M document index
• Sorting on 6 different integer fields (10, 100, 1000, 10000, 1M unique values)
• 4 request threads
Results
• 42% faster sorting
• 73% faster functions
String Field Sorting
• 10M document index
• 10 different string fields, each field 80% populated
• Median latency
String Field Sorting Throughput
• Concurrent throughput sorting on random fields in random order (asc/desc)
• ~50% performance gain
Native Code
Native Code
• The idea: create native accelerators for CPU hotspots
– Faceting, anyone?
• But… JNI sucks! (and it’s GC’s fault again)
jint *buf = (*env)->GetIntArrayElements(env, arr, 0);
for (i = 0; i < len; i++) {
  sum += buf[i];
}
/* read-only access: release the buffer without copying it back */
(*env)->ReleaseIntArrayElements(env, arr, buf, JNI_ABORT);
• GetArrayElements() – makes a *copy* of the array!
• GetPrimitiveArrayCritical() – blocks garbage collection!
– Tons of other restrictions… it’s a “critical section”
• Defeats the purpose of going to native code in the first place
• But… our data is already off-heap, we’re good!
Native Single Valued String Faceting
• Top-level off-heap String cache
– Improves sorting and faceting speed
– Eliminates FieldCache “insanity”
• Native code
– Written in C++, compiled with GCC 4.7, 4.8
– Currently supports 64-bit Windows, OS X, Linux (x86)
– Static compilation avoids JVM HotSpot warmup period, mis-compilation bugs, and variations between runs
Native Faceting Performance
Terms Query Optimization
New Facet Module
Facet Module Goals
• Replace the aging “SimpleFacets”
• First-class JSON support
• Easier programmatic construction of complex nested facet commands
• Canonical response format that is easier for clients to parse
• First-class analytics support
• Cleaner distributed search support
• Fully pluggable
• Better base for integration of other search features
Heliosearch is a Solr super-set, so you can still choose to use the old faceting or mix-n-match.
API Comparison
Old Style:
&facet=true
&facet.range={!key=age_ranges}age
&f.age_ranges.facet.range.start=0
&f.age_ranges.facet.range.end=100
&f.age_ranges.facet.range.gap=10
&facet.range={!key=price_ranges}price
&f.price_ranges.facet.range.start=0
&f.price_ranges.facet.range.end=1000
&f.price_ranges.facet.range.gap=50
New JSON API:
{
  age_ranges: { // facet name
    range: { // facet type
      field : age, // facet params
      start : 0,
      end : 100,
      gap : 10
    }
  },
  price_ranges: {
    range: {
      field : price,
      start : 0,
      end : 1000,
      gap : 50
    }
  }
}
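Because the new API is plain JSON, the same two range facets can be assembled programmatically from ordinary data structures. A minimal sketch, assuming a client that builds the `json.facet` value itself; the helper name `range_facet_command` is hypothetical, and the output is strict JSON (quoted keys) rather than the relaxed syntax shown on the slide.

```python
import json


def range_facet_command(field, start, end, gap):
    """Build one range facet in the nested {type: {params}} form."""
    return {"range": {"field": field, "start": start, "end": end, "gap": gap}}


facets = {
    "age_ranges": range_facet_command("age", 0, 100, 10),
    "price_ranges": range_facet_command("price", 0, 1000, 50),
}

# This string is what would be sent as the json.facet request parameter.
json_facet_param = json.dumps(facets, indent=2)
print(json_facet_param)
```

Adding a third facet is one more dictionary entry, whereas the old style needs a new `facet.range` parameter plus three `f.<key>.facet.range.*` parameters.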
Facet Functions
• Sort/report by things other than “count”
Aggregation functions / stats:
count
sum(function)
avg(function)
sumsq(function)
min(function)
max(function)
unique(string_field)
The function argument can be any “function query” that yields a numeric value!
Example:
sum(mul(num_units, unit_price))
• Stats are calculated “per bucket”
• Buckets created by Query, Range, or Terms (field) facets
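What "per bucket" means for an expression such as sum(mul(num_units, unit_price)) can be shown with a small in-memory model. This is a sketch of the semantics only, not of how the facet module computes them; the sample documents, the `brand` field, and the `bucket_stat` helper are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical documents; only the field names num_units and unit_price
# come from the slide's example expression.
docs = [
    {"brand": "acme", "num_units": 2, "unit_price": 10.0},
    {"brand": "acme", "num_units": 1, "unit_price": 5.0},
    {"brand": "zeta", "num_units": 3, "unit_price": 4.0},
]

AGGREGATIONS = {
    "sum": sum,
    "avg": lambda vs: sum(vs) / len(vs),
    "sumsq": lambda vs: sum(v * v for v in vs),
    "min": min,
    "max": max,
}


def bucket_stat(docs, bucket_field, agg, value_fn):
    """Group docs by bucket_field, then aggregate value_fn(doc) per bucket."""
    values_by_bucket = defaultdict(list)
    for doc in docs:
        values_by_bucket[doc[bucket_field]].append(value_fn(doc))
    return {
        val: {"count": len(vs), "stat": AGGREGATIONS[agg](vs)}
        for val, vs in values_by_bucket.items()
    }


# Equivalent in spirit to sum(mul(num_units, unit_price)) under a terms facet.
revenue = bucket_stat(docs, "brand", "sum",
                      lambda d: d["num_units"] * d["unit_price"])
print(revenue)
```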
Simple Request + Response
$ curl http://localhost:8983/solr/query -d 'q=widgets&
json.facet=
{ // Comments can help with clarity
  /* traditional C-style comments are also supported */
  x : "avg(price)" , // Simple strings can occur unquoted
  y : 'unique(brand)' // Strings can also use single quotes
}
'
[…]
"facets" : {
  "count" : 314,
  "x" : 102.5,
  "y" : 28
}
Note: "count" is the number of documents in the facet bucket.
Terms Facet Example
Request:
json.facet={
  shoes:{
    terms:{
      field: shoe_style,
      sort: {x : desc},
      facet:{
        x : "avg(price)",
        y : "unique(brand)"
      }
    }
  }
}
Response:
"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      {
        "val" : "Hiking",
        "count" : 34,
        "x" : 135.25,
        "y" : 17
      },
      {
        "val" : "Running",
        "count" : 45,
        "x" : 110.75,
        "y" : 24
      },
      […]
Note: the functions in the nested “facet” block (x and y) are executed per bucket.
Sub-Facets 
Any facet that produces buckets can have sub-facets 
(terms/field, range, query) 
Sub-facets can have facet functions (stats) or their 
own sub-facets (no limit to nesting). 
A subfacet can be any type (field, range, query) 
Multiple subfacets can be added to any given facet 
Subfacets are first-class facets - can be configured 
independently like any other facet. 
Different offsets, limits, stats, sorts, etc
Sub-Facet Example
Request:
json.facet={
  shoes:{
    terms:{
      field: shoe_style,
      sort: {x : desc},
      facet:{
        x : "avg(price)",
        y : "unique(brand)",
        colors :{terms:color}
      }
    }
  }
}
Response:
"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      {
        "val" : "Hiking",
        "count" : 34,
        "x" : 135.25,
        "y" : 17,
        "colors" : {
          "buckets" : [
            { "val" : "brown", "count" : 12 },
            { "val" : "black", "count" : 10 },
            […]
          ]
        } // end of colors sub-facet
      }, // end of Hiking bucket
      {
        "val" : "Running",
        "count" : 45,
        "x" : 110.75,
        "y" : 24,
        "colors" : {
          "buckets" : […]
Note: colors:{terms:color} is the short form of a terms facet. It simply specifies the field and sorts buckets by count descending.
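The nesting rule (every bucket is itself a domain on which further facets and stats run) is naturally recursive. The toy evaluator below is a sketch of that idea over in-memory documents, not the Heliosearch implementation: the `run_facets` and `run_stat` names and the sample data are hypothetical, only terms facets and three stats are handled, and buckets are always sorted by count descending (the `sort` parameter is ignored).

```python
def run_stat(docs, expr):
    """Evaluate a tiny subset of facet functions, e.g. "avg(price)"."""
    func, _, rest = expr.partition("(")
    field = rest.rstrip(")")
    values = [d[field] for d in docs if field in d]
    if func == "unique":
        return len(set(values))
    if func == "sum":
        return sum(values)
    if func == "avg":
        return sum(values) / len(values) if values else None
    raise ValueError("unsupported function: " + func)


def run_facets(docs, facets):
    """Compute count, stats, and terms sub-facets for one bucket of docs."""
    out = {"count": len(docs)}
    for name, spec in facets.items():
        if isinstance(spec, str):           # a facet function (stat)
            out[name] = run_stat(docs, spec)
            continue
        terms = spec["terms"]
        if isinstance(terms, str):          # short form: {terms: field}
            terms = {"field": terms}
        groups = {}
        for doc in docs:
            if terms["field"] in doc:
                groups.setdefault(doc[terms["field"]], []).append(doc)
        buckets = []
        for val, members in groups.items():
            bucket = {"val": val}
            # Recurse: the bucket's docs are the domain for its sub-facets.
            bucket.update(run_facets(members, terms.get("facet", {})))
            buckets.append(bucket)
        buckets.sort(key=lambda b: (-b["count"], b["val"]))
        out[name] = {"buckets": buckets}
    return out


docs = [
    {"shoe_style": "Hiking", "price": 100.0, "brand": "A", "color": "brown"},
    {"shoe_style": "Hiking", "price": 150.0, "brand": "B", "color": "brown"},
    {"shoe_style": "Running", "price": 80.0, "brand": "A", "color": "black"},
]
request = {"shoes": {"terms": {
    "field": "shoe_style",
    "facet": {"x": "avg(price)", "y": "unique(brand)",
              "colors": {"terms": "color"}},
}}}
result = run_facets(docs, request)
print(result)
```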
Terms Facet 
Terms facet creates buckets of docs with the same value in a field 
- field – The field name to facet over. 
- offset – Used for paging, this skips the first N buckets. Defaults to 0. 
- limit – Limits the number of buckets returned. Defaults to 10. 
- mincount – Only return buckets with a count of at least this number. Defaults to 1. 
- sort – Specifies how to sort the buckets produced. “count” specifies document count, 
“index” sorts by the index (natural) order of the bucket value. One can also sort by any 
facet function / statistic that occurs in the bucket. The default is “count desc”. This 
parameter may also be specified in JSON like sort:{count:desc}. The sort order may 
either be “asc” or “desc” 
- missing – A boolean that specifies if a special “missing” bucket should be returned that is 
defined by documents without a value in the field. Defaults to false. 
- numBuckets – A boolean. If true, adds “numBuckets” to the response, an integer 
representing the number of buckets for the facet (as opposed to the number of buckets 
returned). Defaults to false. 
- allBuckets – A boolean. If true, adds an “allBuckets” bucket to the response, representing 
the union of all of the buckets. For multi-valued fields, this is different than a bucket for all 
of the documents in the domain since a single document can belong to multiple buckets. 
Defaults to false. 
- prefix – Only produce buckets for terms starting with the specified prefix.
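How several of these parameters interact can be sketched over a list of in-memory documents. This models the described behavior only and is not the facet module's code: the `terms_facet` function is hypothetical, sorting by a facet function and the allBuckets option are left out, and ties under a count sort are broken by index order as an assumption.

```python
from collections import Counter


def terms_facet(docs, field, offset=0, limit=10, mincount=1,
                sort="count desc", missing=False, num_buckets=False,
                prefix=None):
    """Bucket docs by the values of `field`, honoring the listed parameters."""
    counts = Counter()
    missing_count = 0
    for doc in docs:
        values = doc.get(field)
        if values is None or values == []:
            missing_count += 1
            continue
        if not isinstance(values, list):
            values = [values]
        # A multi-valued doc lands in one bucket per distinct value.
        for value in set(values):
            if prefix is None or str(value).startswith(prefix):
                counts[value] += 1

    buckets = [(v, c) for v, c in counts.items() if c >= mincount]
    key, direction = sort.split()
    reverse = direction == "desc"
    buckets.sort(key=lambda b: b[0])  # index (natural) order
    if key == "count":
        # Stable sort: equal counts keep their index order.
        buckets.sort(key=lambda b: b[1], reverse=reverse)
    elif reverse:
        buckets.reverse()

    page = buckets[offset:offset + limit]
    result = {"buckets": [{"val": v, "count": c} for v, c in page]}
    if num_buckets:
        result["numBuckets"] = len(buckets)  # all buckets, not just this page
    if missing:
        result["missing"] = {"count": missing_count}
    return result


docs = [
    {"color": "red"},
    {"color": "blue"},
    {"color": "red"},
    {},                              # no value: counted as missing
    {"color": ["blue", "green"]},    # multi-valued
]
top2 = terms_facet(docs, "color", limit=2, missing=True, num_buckets=True)
print(top2)
```

The example shows the distinction the numBuckets description draws: two buckets are returned because of `limit`, while `numBuckets` reports three.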
Query Facet
Query facet creates a single bucket of documents matching the query.
{ // simple example
  highpop:{ query:{ q:"inStock:true AND popularity:[8 TO 10]" } }
}
{ // example with multiple sub-facets
  highpop:{ query:{
    q : "inStock:true AND popularity:[8 TO 10]",
    facet : {
      average_price : "avg(price)",
      available_colors : { terms : color },
      price_ranges : { range : {
        field:price, start:0, end:200, gap:10
      }}
    }
  }}
}
Range Facet
Creates buckets over ranges on a numeric or date field
Parameter names/values are "in sync" with Solr range parameters:
field – The numeric field or date field to produce range buckets from
start – Lower bound of the ranges
end – Upper bound of the ranges
gap – Size of each range bucket produced
hardend – A boolean, which if true means that the last bucket will end at “end” even if it is less than “gap” wide. If false, the last bucket will be “gap” wide, which may extend past “end”.
other – This param indicates that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for…
– "before" all records with field values lower than the lower bound of the first range
– "after" all records with field values greater than the upper bound of the last range
– "between" all records with field values between the start and end bounds of all ranges
– "none" compute none of this information
– "all" shortcut for before, between, and after
include – By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The “before” range is exclusive and the “after” range is inclusive. This default, equivalent to "lower" below, will not result in double counting at the boundaries. This behavior can be modified by the facet.range.include param, which can be any combination of the following options…
– "lower" all gap based ranges include their lower bound
– "upper" all gap based ranges include their upper bound
– "edge" the first and last gap ranges include their edge bounds (i.e. lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified
– "outer" the “before” and “after” ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.
– "all" shorthand for lower, upper, edge, outer
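The effect of `hardend` and of the default `include` behavior ("lower": lower bound inclusive, upper bound exclusive) can be worked through on a few numbers. The sketch below is a model of the described semantics under stated assumptions, not Solr's code: the function names are hypothetical, only the default `include` is handled, and "after" is taken to start at the actual upper bound of the last bucket.

```python
def range_bounds(start, end, gap, hardend=False):
    """Return the (lower, upper) bounds of each gap-sized bucket."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    bounds = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end          # clip the last bucket at "end"
        bounds.append((lower, upper))
        lower = upper
    return bounds


def range_facet_counts(values, start, end, gap, hardend=False, other=()):
    """Count values per bucket; lower bound inclusive, upper exclusive."""
    bounds = range_bounds(start, end, gap, hardend)
    result = {"buckets": [
        {"val": lo, "count": sum(1 for v in values if lo <= v < hi)}
        for lo, hi in bounds
    ]}
    last_upper = bounds[-1][1] if bounds else start
    if "before" in other:   # exclusive of the first lower bound
        result["before"] = {"count": sum(1 for v in values if v < start)}
    if "after" in other:    # inclusive of the last upper bound
        result["after"] = {"count": sum(1 for v in values if v >= last_upper)}
    if "between" in other:
        result["between"] = {
            "count": sum(1 for v in values if start <= v < last_upper)}
    return result


values = [-1, 5, 10, 95, 100, 120]
soft = range_facet_counts(values, 0, 100, 30,
                          other=("before", "after", "between"))
hard = range_facet_counts(values, 0, 100, 30, hardend=True, other=("after",))
print(soft)
print(hard)
```

With a gap of 30 the last bucket is 90 to 120 when `hardend` is false and 90 to 100 when it is true, so the value 100 moves from the last bucket into "after".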
Sub-Facets + Facet-Functions 
= 
Business Intelligence / Analytics
Fantasy ($1045) 
Top Authors 
$423 George R.R. Martin 
$347 Brandon Sanderson 
$155 JK Rowling 
Top Books 
$252 A Game of Thrones 
$113 Emperor of Thorns 
$101 Nine Princes in Amber 
$82 Steel Heart 
Sci-Fi ($898) 
Top Authors 
$321 Iain M Banks 
$218 Neal Asher 
$155 Neal Stephenson 
Top Books 
$113 Gridlinked 
$101 Use of Weapons 
$93 Snow Crash 
$82 The Skinner 
Mystery ($645) 
Top Authors 
$191 James Patterson 
$145 Patricia Cornwell 
$126 John Grisham 
Top Books 
$85 One for the Money 
$77 Angels & Daemons 
$64 Shutter Island 
$35 The Firm 
Filter By 
State 
$852 NJ (14 stores) 
$658 NY (11 stores) 
$421 CT (8 stores) 
Chain 
$984 Amazoon (14 stores) 
$734 Houses&Royalty (9 stores) 
$387 Books-r-us (7 stores) 
Store 
$108 Amazoon Branchburg 
$93 Books-r-us Bridgewater 
$87 H&R NYC 
Number of Books 
Chain 
201K Houses&Royalty 
183K Amazoon 
98K Books-r-us 
Store 
193K H&R NYC 
77K Books-r-us Bridgewater 
68K Amazoon Branchburg
Implementation
date_breakout : { range: {
  field: sale_date,
  start : ...,
  end : ...,
  gap : "+1MONTH",
  facet : {
    top_genre : { terms : {
      field : genre,
      sort : "revenue desc",
      limit : 4,
      facet : {
        revenue : "sum(sales)"
      }
    }},
    by_chain: { terms : {
      field : chain,
      facet : {
        revenue : "sum(sales)"
      }
    }}
[…]
Notes:
• date_breakout creates a series of facet buckets based on date
• For each date bucket, facet by genre, taking the top 4 by revenue
• For each genre bucket, report revenue
Request for the genre panels above (Fantasy, Sci-Fi, Mystery: top authors and top books by revenue):
top_genres:{ terms:{ 
field: genre, 
facet : { 
rev : "sum(sales)", 
top_authors:{ terms:{ 
field : author, 
sort :"rev desc", 
limit : 3, 
facet : { 
rev : "sum(sales)" 
} 
}}, 
top_books:{ terms:{ 
field : title, 
sort : "rev desc", 
limit : 4, 
facet : { 
rev : "sum(sales)" 
} 
}} 
[…]
Request for the “Filter By” panel above (State, Chain, Store by revenue):
state_breakout:{ terms:{
  field: state,
  sort: "rev desc",
  facet : {
    rev : "sum(sales)",
    num_stores : "unique(store)"
  }
}},
chain_breakout:{ terms:{
  field: chain,
  sort: "rev desc",
  facet : {
    rev : "sum(sales)",
    num_stores : "unique(store)"
  }
}},
store_breakout:{ terms:{
  field: store,
  sort: "rev desc",
  facet : {
    rev : "sum(sales)"
  }
}}
Misc Features
Parameter Substitution
• Parameters / macros substituted across whole request
• Happens before any parsing, so usable in any context
q=price:[ ${low} TO ${high} ]
&low=100
&high=200
• Default values
q=price:[ ${low:0} TO ${high:100} ]
• Nested
q=${price_query}
&price_query=${price_field}:[ ${low} TO ${high} ] AND inStock:true
&price_field=specialPrice
&low=50
&high=100
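The three cases above (plain, default value, nested) can be reproduced with a small text-substitution routine. This is a sketch of the described behavior, not the Heliosearch implementation: the `expand` function, the regular expression, the depth limit, and the choice to leave an unknown macro with no default untouched are all assumptions.

```python
import re

# Matches ${name} or ${name:default}; names and defaults may not contain
# "$", "{" or "}", so nested macros are resolved by repeated passes instead.
_MACRO = re.compile(r"\$\{([^${}:]+)(?::([^${}]*))?\}")


def expand(text, params, max_depth=10):
    """Expand ${name} and ${name:default} macros using request params."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        if default is not None:
            return default
        return match.group(0)  # assumption: unknown macros are left as-is

    for _ in range(max_depth):
        expanded = _MACRO.sub(replace, text)
        if expanded == text:
            return expanded
        text = expanded        # another pass resolves nested macros
    raise ValueError("macro expansion did not terminate")


params = {
    "price_query": "${price_field}:[ ${low} TO ${high} ] AND inStock:true",
    "price_field": "specialPrice",
    "low": "50",
    "high": "100",
}
nested = expand("${price_query}", params)
defaults = expand("price:[ ${low:0} TO ${high:100} ]", {})
print(nested)
print(defaults)
```

Because the expansion works on raw text before any query parsing, the same macro can appear in `q`, in a filter, or inside a `json.facet` body.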
New Query Parser Features
• Filters in queries – just like “fq” parameters, but may appear anywhere in a query
q=(text:elephant -(filter(*:* -price:[ 0 TO 100 ]) OR filter(date:[0 TO 2013])))
• Constant score queries
q=color:(blue OR green)^=1 text:shoes
• Comments in queries (can nest)
q=+text:elephant /* the main query */ /* boosting part – WIP {!func}mul(pop,rank)^10 */
Thank You
Help Develop the Next Generation of Solr!
Resources:
• http://heliosearch.org
• https://github.com/Heliosearch/heliosearch
• https://groups.google.com/forum/#!forum/heliosearch
• https://groups.google.com/forum/#!forum/heliosearch-dev
• twitter.com/lucene_solr

Introduction to Java performance tuningIntroduction to Java performance tuning
Introduction to Java performance tuningMarouane Gazanayi
 
Intro to SnappyData Webinar
Intro to SnappyData WebinarIntro to SnappyData Webinar
Intro to SnappyData WebinarSnappyData
 
Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)Maarten Balliauw
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
Java programing considering performance
Java programing considering performanceJava programing considering performance
Java programing considering performanceRoger Xia
 
Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)Jens Hadlich
 
CodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory laneCodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory laneMaarten Balliauw
 

Similar to Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch) (20)

Native Code & Off-Heap Data Structures for Solr: Presented by Yonik Seeley, H...
Native Code & Off-Heap Data Structures for Solr: Presented by Yonik Seeley, H...Native Code & Off-Heap Data Structures for Solr: Presented by Yonik Seeley, H...
Native Code & Off-Heap Data Structures for Solr: Presented by Yonik Seeley, H...
 
Java performance tuning
Java performance tuningJava performance tuning
Java performance tuning
 
Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)
 
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management....NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
 
Gopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracowGopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracow
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design
 
DotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NETDotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NET
 
Performance tuning jvm
Performance tuning jvmPerformance tuning jvm
Performance tuning jvm
 
Distributed caching-computing v3.8
Distributed caching-computing v3.8Distributed caching-computing v3.8
Distributed caching-computing v3.8
 
Introduction to Java performance tuning
Introduction to Java performance tuningIntroduction to Java performance tuning
Introduction to Java performance tuning
 
Intro to SnappyData Webinar
Intro to SnappyData WebinarIntro to SnappyData Webinar
Intro to SnappyData Webinar
 
Apache Geode Offheap Storage
Apache Geode Offheap StorageApache Geode Offheap Storage
Apache Geode Offheap Storage
 
Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
Java programing considering performance
Java programing considering performanceJava programing considering performance
Java programing considering performance
 
jvm goes to big data
jvm goes to big datajvm goes to big data
jvm goes to big data
 
Taming The JVM
Taming The JVMTaming The JVM
Taming The JVM
 
Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)
 
CodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory laneCodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory lane
 

Recently uploaded

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

  • 1. Native Code, Off-Heap Data & JSON Facet API for Solr Yonik Seeley Apachecon EU 2014 Budapest, Hungary
  • 2. My Background • Creator of Solr • Heliosearch Founder • LucidWorks Co-Founder • Lucene/Solr committer, PMC member • Apache Software Foundation member • M.S. in Computer Science, Stanford
  • 3. Heliosearch Project • The Next Evolution of Solr • Forked from Solr, Developing at github – Started Jan 2014 – Well aligned community – Open Source, Apache licensed • Bring back to Apache in the future? • Currently drop-in replacement for Solr at the HTTP-API level – A super-set… we continually merge in upstream changes – Latest version of Heliosearch includes latest Solr • Current Features: Off-heap filters, Off-heap fieldcache, facet-by- function, sub-facets, native code performance enhancements
  • 5. Garbage Collection Basics [Diagram labels: Thread, Eden Space, Survivor Space 1, Survivor Space 2, Tenured Space, Permanent Space] • New objects allocated in Eden • Find live objects by tracing from GC “roots” (threads, stack locals, etc) • Make a copy of live objects, leaving “garbage” behind • Eden + Survivor Space copied together to other Survivor space • Tenured from Survivor when old enough • “stop-the-world” needed when GC can’t keep up • Out of memory when too much time spent in GC
  • 6. Java Memory Waste - Need to size for worst case scenario - OS needs free memory to cache index files - JVMs aren’t good at “sharing” with rest of the system - mmap allocations managed by OS, can be immediately reused on free OS Real Memory max heap Unused Heap Heap in use JVM max heap Unused Heap Heap in use JVM mmap alloced mmap alloced Unused Heap C Heap in use C Process Unused Heap C Heap in use C Process “Free” Memory includes buffer cache, important to cache index files
  • 7. GC Impact • GC Reduces Throughput – Time to copy all that memory around could be spent better! • Stop-the-world pauses – Seconds to Minutes long – Pause time proportional to heap size – Still exists in all Hotspot GCs… CMS, G1GC, etc – Breaks Application SLAs (request timeouts, etc) – Can cause SolrCloud Zookeeper session timeouts • Reducing max pause size normally means reduced throughput • Non-graceful degradation if you don't size your heap big enough… BOOM!
  • 8. GC Tuning UseSerialGC UseParallelGC UseParallelOldGC UseParallelOldGCCompacting UseParallelDensePrefixUpdate HeapMaximumCompactionInterval HeapFirstMaximumCompactionCount UseMaximumCompactionOnSystemGC ParallelOldDeadWoodLimiterMean ParallelOldDeadWoodLimiterStdDev UseParallelOldGCDensePrefix ParallelGCThreads ParallelCMSThreads YoungPLABSize OldPLABSize GCTaskTimeStampEntries AlwaysTenure NeverTenure ScavengeBeforeFullGC UseConcMarkSweepGC ExplicitGCInvokesConcurrent UseCMSBestFit UseCMSCollectionPassing UseParNewGC ParallelGCVerbose ParallelGCBufferWastePct ParallelGCRetainPLAB TargetPLABWastePct PLABWeight ResizePLAB PrintPLAB ParGCArrayScanChunk ParGCDesiredObjsFromOverflowList CMSParPromoteBlocksToClaim AlwaysPreTouch CMSUseOldDefaults CMSYoungGenPerWorker CMSIncrementalMode CMSIncrementalDutyCycle CMSIncrementalPacing CMSIncrementalDutyCycleMin CMSIncrementalSafetyFactor CMSIncrementalOffset CMSExpAvgFactor CMS_FLSWeight CMS_FLSPadding FLSCoalescePolicy CMS_SweepWeight CMS_SweepPadding CMS_SweepTimerThresholdMillis CMSClassUnloadingEnabled CMSCompactWhenClearAllSoftRefs UseCMSCompactAtFullCollection CMSFullGCsBeforeCompaction CMSIndexedFreeListReplenish CMSLoopWarn CMSMarkStackSize CMSMarkStackSizeMax CMSMaxAbortablePrecleanLoops CMSMaxAbortablePrecleanTime CMSAbortablePrecleanMinWorkPerIteration CMSAbortablePrecleanWaitMillis CMSRescanMultiple CMSConcMarkMultiple CMSRevisitStackSize CMSAbortSemantics CMSParallelRemarkEnabled CMSParallelSurvivorRemarkEnabled CMSPLABRecordAlways CMSConcurrentMTEnabled CMSPermGenPrecleaningEnabled CMSPermGenSweepingEnabled CMSPrecleaningEnabled CMSPrecleanIter CMSPrecleanNumerator CMSPrecleanDenominator CMSPrecleanRefLists1 CMSPrecleanRefLists2 CMSPrecleanSurvivors1 CMSPrecleanSurvivors2 CMSPrecleanThreshold CMSCleanOnEnter CMSRemarkVerifyVariant CMSScheduleRemarkEdenSizeThreshold CMSScheduleRemarkEdenPenetration CMSScheduleRemarkSamplingRatio CMSSamplingGrain CMSScavengeBeforeRemark CMSWorkQueueDrainThreshold 
CMSWaitDuration CMSYield CMSBitMapYieldQuantum UseGCLogFileRotation NumberOfGCLogFiles GCLogFileSize LargePageSizeInBytes LargePageHeapSizeThreshold PrintGCApplicationConcurrentTime PrintGCApplicationStoppedTime OnOutOfMemoryError ClassUnloading BlockOffsetArrayUseUnallocatedBlock RefDiscoveryPolicy ParallelRefProcEnabled CMSTriggerRatio CMSBootstrapOccupancy CMSInitiatingOccupancyFraction UseCMSInitiatingOccupancyOnly HandlePromotionFailure PreserveMarkStackSize ZeroTLAB PrintTLAB TLABStats AlwaysActAsServerClassMachine DefaultMaxRAM DefaultMaxRAMFraction DefaultInitialRAMFraction UseAutoGCSelectPolicy AutoGCSelectPauseMillis UseAdaptiveSizePolicy UsePSAdaptiveSurvivorSizePolicy UseAdaptiveGenerationSizePolicyAtMinorCollection UseAdaptiveGenerationSizePolicyAtMajorCollection UseAdaptiveSizePolicyWithSystemGC UseAdaptiveGCBoundary AdaptiveSizeThroughPutPolicy AdaptiveSizePausePolicy AdaptiveSizePolicyInitializingSteps AdaptiveSizePolicyOutputInterval UseAdaptiveSizePolicyFootprintGoal AdaptiveSizePolicyWeight AdaptiveTimeWeight PausePadding PromotedPadding SurvivorPadding AdaptivePermSizeWeight PermGenPadding ThresholdTolerance AdaptiveSizePolicyCollectionCostMargin YoungGenerationSizeIncrement YoungGenerationSizeSupplement YoungGenerationSizeSupplementDecay TenuredGenerationSizeIncrement TenuredGenerationSizeSupplement TenuredGenerationSizeSupplementDecay MaxGCPauseMillis MaxGCMinorPauseMillis GCTimeRatio AdaptiveSizeDecrementScaleFactor UseAdaptiveSizeDecayMajorGCCost AdaptiveSizeMajorGCDecayTimeScale MinSurvivorRatio InitialSurvivorRatio BaseFootPrintEstimate UseGCOverheadLimit GCTimeLimit GCHeapFreeLimit PrintAdaptiveSizePolicy DisableExplicitGC CollectGen0First BindGCTaskThreadsToCPUs UseGCTaskAffinity ProcessDistributionStride CMSCoordinatorYieldSleepCount CMSYieldSleepCount PrintGCTaskTimeStamps TraceClassLoadingPreorder TraceGen0Time TraceGen1Time PrintTenuringDistribution PrintHeapAtSIGBREAK TraceParallelOldGCTasks PrintParallelOldGCPhaseTimes MaxHeapSize 
MaxNewSize PretenureSizeThreshold MinTLABSize TLABAllocationWeight TLABWasteTargetPercent TLABRefillWasteFraction TLABWasteIncrement MaxLiveObjectEvacuationRatio OldSize MinHeapFreeRatio MaxHeapFreeRatio SoftRefLRUPolicyMSPerMB MinHeapDeltaBytes MinPermHeapExpansion MaxPermHeapExpansion QueuedAllocationWarningCount MaxTenuringThreshold InitialTenuringThreshold TargetSurvivorRatio MarkSweepDeadRatio PermMarkSweepDeadRatio MarkSweepAlwaysCompactCount PrintCMSStatistics PrintCMSInitiationStatistics PrintFLSStatistics PrintFLSCensus DeferThrSuspendLoopCount DeferPollingPageLoopCount SafepointSpinBeforeYield UseDepthFirstScavengeOrder GCDrainStackTargetSize ThreadSafetyMargin CodeCacheMinimumFreeSpace MaxDirectMemorySize PerfDataMemorySize AggressiveHeap UseCompressedStrings UseStringCache HeapDumpOnOutOfMemoryError HeapDumpPath PrintGC PrintGCDetails PrintGCTimeStamps PG1HeapRegionSize G1ReservePercent G1ConfidencePercent PrintPromotionFailure PrintGCDateStamps -XX:InitiatingHeapOccupancyPercent=n -XX:MaxGCPauseMillis=n -XX:ConcGCThreads=n -XX:MaxHeapFreeRatio=70 -XX:MaxTenuringThreshold=n -XX:+ScavengeBeforeFullGC
  • 9. GC Reduction • Reuse objects – cause less garbage • Move certain things off-heap (invisible to GC) • Option 1: Direct ByteBuffers – Limited to “int” (2GB) – No way to directly “free” – still relies on GC • Option 2: sun.misc.Unsafe – malloc() + free() + direct memory access – Supported on all major JVMs – Widely used: Java (nio, concurrent), JSR166, Google Guava, objenesis (which is used in Kryo, which is used in Twitter Storm), Apache DirectMemory, Lightning, Hazelcast, snappy, gson, … – Being considered for Java 9
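The allocate / use / free lifecycle described under Option 2 can be sketched outside Java. The following is an analogy only, written in Python with the standard `mmap` module rather than `sun.misc.Unsafe`; the class name `OffHeapBitSet` and its methods are hypothetical and are not Heliosearch's filter implementation. It shows a document filter stored as raw bytes that the runtime's collector never has to trace, and that is released explicitly.

```python
import mmap


class OffHeapBitSet:
    """Fixed-size bitset backed by an anonymous memory mapping (hypothetical sketch)."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        # mmap.mmap(-1, n) asks the OS for n bytes of anonymous, zero-filled memory.
        self._buf = mmap.mmap(-1, max(1, (num_bits + 7) // 8))

    def set(self, doc_id):
        # Set the bit for doc_id: byte index is doc_id // 8, bit index is doc_id % 8.
        self._buf[doc_id >> 3] |= 1 << (doc_id & 7)

    def get(self, doc_id):
        return bool(self._buf[doc_id >> 3] & (1 << (doc_id & 7)))

    def cardinality(self):
        # Count set bits across the whole mapping.
        return sum(bin(byte).count("1") for byte in self._buf[:])

    def free(self):
        # Explicit release, the counterpart of free(): the memory goes back
        # to the OS now instead of waiting for a collector to notice.
        self._buf.close()


bits = OffHeapBitSet(1000)
for doc_id in (3, 64, 999):
    bits.set(doc_id)
print(bits.get(64), bits.cardinality())
bits.free()
```

The trade-off is the same one the slide implies: explicit freeing removes collector work but makes the caller responsible for the lifetime of the buffer.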
  • 10. Off-Heap Filters 50M docs (3.8 GB index) 8GB RAM 20K requests 8 req threads 500 filters JVM Options: -Xmx4G (solr)
  • 11. Off-Heap title Filters Test Observed max process sizes Solr : 3.8GB – 4.3GB Heliosearch: 3.6GB – 3.7GB
  • 12. Off-Heap FieldCache Normal (on-heap) FieldCache: • Typically the largest data structures kept on the heap • Used for sorting, function query values, single-valued faceting, grouping • Uses weak references Heliosearch nCache (n is for “native”): • Allocated off-heap • First-class managed Solr cache – Configure size, warming policies – View statistics • Per-segment (NRT friendly) • No weak references
  • 13.
  • 14. nCache admin stats item_id:{ "field":"id", "uses":8, "class":"StrTopValues", "refcount":2, "numSegments":7, "carriedOver":6, "size":612} item_popularity:{ "field":"popularity", "uses":5, "class":"IntTopValues", "refcount":2, "numSegments":7, "carriedOver":6, "size":106} item_price:{ "field":"price", "uses":0, -- the number of top-level uses for searcher "class":"FloatTopValues", "refcount":2, "numSegments":5, -- number of segments populated "carriedOver":5, -- number of segments carried over from last searcher "size":272 -- size in bytes for all populated segments }
  • 15. Off-Heap Integer Field • 50M document index • Sorting on 6 different integer fields (10,100,1000,10000,1M unique values) • 4 request threads Results: • 42% faster sorting • 73% faster functions
  • 16. String Field Sorting • 10M document index • 10 different string fields, each field 80% populated • Median latency
  • 17. String Field Sorting Throughput • Concurrent throughput sorting on random fields in random order (asc/desc) • ~50% performance gain
  • 19. Native Code • The Idea: create native accelerators for CPU hotspots – Faceting anyone? • But… JNI Sucks! (and it’s GC’s fault again) jint *buf = (*env)->GetIntArrayElements(env, arr, 0); for (i=0; i<len; i++) { sum += buf[i]; } • Get<Type>ArrayElements() – makes a *copy* of the array! • GetPrimitiveArrayCritical() – blocks garbage collection! – Tons of other restrictions… it’s a “critical section” • Defeats the purpose of going to native code in the first place • But… our data is already off-heap, we’re good!
  • 20. Native Single Valued String Faceting • Top-Level off-heap String cache – Improves Sorting and Faceting speed – Eliminates FieldCache “insanity” • Native Code – Written in C++, compiled with GCC 4.7, 4.8 – Currently supports 64 bit Windows, OS-X, Linux (x86) – Static compilation avoids JVM hotspot warmup period, mis-compilation bugs, and variations between runs
  • 23.
  • 25. Facet Module Goals • Replace the aging “SimpleFacets” • First class JSON support – Easier programmatic construction of complex nested facet commands – Canonical response format that is easier for clients to parse • First class analytics support • Cleaner distributed search support • Fully pluggable • Better base for integration of other search features Heliosearch is a Solr super-set, so you can still choose to use the old faceting or mix-n-match.
  • 26. API Comparison Old Style New JSON API &facet=true &facet.range={!key=age_ranges}age &f.age_ranges.facet.range.start=0 &f.age_ranges.facet.range.end=100 &f.age_ranges.facet.range.gap=10 &facet.range={!key=price_ranges}price &f.price_ranges.facet.range.start=0 &f.price_ranges.facet.range.end=1000 &f.price_ranges.facet.range.gap=50 { age_ranges: { // facet name range: { // facet type field : age, // facet params start : 0, end : 100, gap : 10 } }, price_ranges: { range: { field : price, start : 0, end : 1000, gap : 50 } } }
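To make the correspondence between the two styles concrete, here is a minimal Python sketch that flattens a range-facet spec written in the new nested form into the old-style parameter list shown on the slide. The function name `to_legacy_params` is hypothetical, and the output simply mirrors the parameter naming used on the slide (keyed by facet name); it is not part of either API.

```python
def to_legacy_params(json_facet):
    """Flatten {name: {"range": {...}}} facets into old-style (key, value) pairs."""
    params = [("facet", "true")]
    for name, spec in json_facet.items():
        range_spec = spec["range"]
        # {!key=name}field labels the facet, as in the slide's old-style column.
        params.append(("facet.range", "{!key=%s}%s" % (name, range_spec["field"])))
        for option in ("start", "end", "gap"):
            params.append(
                ("f.%s.facet.range.%s" % (name, option), str(range_spec[option]))
            )
    return params


facets = {
    "age_ranges": {"range": {"field": "age", "start": 0, "end": 100, "gap": 10}},
    "price_ranges": {"range": {"field": "price", "start": 0, "end": 1000, "gap": 50}},
}
for key, value in to_legacy_params(facets):
    print("&%s=%s" % (key, value))
```

The nested form keeps each facet's settings together in one object, which is what makes programmatic construction easier than assembling prefixed parameter names.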
  • 27. Facet Functions • Sort/Report by things other than “count” • Aggregation Functions / Stats: count, sum(function), avg(function), sumsq(function), min(function), max(function), unique(string_field) – any “function query” that yields a numeric value! Example: sum(mul(num_units, unit_price)) • Stats are calculated “per bucket” • Buckets created by Query, Range, or Terms (field) facets
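The idea of stats calculated “per bucket” can be illustrated with a small in-memory sketch in Python. The documents, the field names other than `num_units` and `unit_price`, and the helper names are invented for illustration; this mimics the semantics of `sum(mul(num_units, unit_price))` over terms buckets and says nothing about how Heliosearch computes it.

```python
from collections import defaultdict

# Toy documents (hypothetical data).
DOCS = [
    {"brand": "acme", "num_units": 2, "unit_price": 10.0},
    {"brand": "acme", "num_units": 1, "unit_price": 5.0},
    {"brand": "zeta", "num_units": 3, "unit_price": 4.0},
]


def terms_buckets(docs, field):
    """Group documents into buckets by the value of one field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return buckets


def bucket_stats(docs, func):
    """Evaluate a numeric function per document, then aggregate over the bucket."""
    values = [func(doc) for doc in docs]
    return {
        "count": len(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


def revenue(doc):
    # Plays the role of mul(num_units, unit_price).
    return doc["num_units"] * doc["unit_price"]


stats = {
    value: bucket_stats(docs, revenue)
    for value, docs in terms_buckets(DOCS, "brand").items()
}
print(stats["acme"]["sum"], stats["zeta"]["sum"])
```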
  • 28. Simple Request + Response $ curl http://localhost:8983/solr/query -d 'q=widgets& json.facet= { // Comments can help with clarity /* traditional C-style comments are also supported */ x : "avg(price)" , // Simple strings can occur unquoted y : 'unique(brand)' // Strings can also use single quotes } ' […] "facets" : { "count" : 314, "x" : 102.5, "y" : 28 } Number of documents in the facet bucket
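The same request can be assembled programmatically. This Python sketch only builds the POST request and does not send it; the URL and parameter names are taken from the slide, while the function name `build_facet_request` is hypothetical. It serializes the facet spec as strict JSON, on the assumption that the relaxed syntax shown on the slide (comments, unquoted strings) is a superset of it.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_facet_request(base_url, query, facets):
    """Return a POST request carrying q and json.facet as form parameters."""
    body = urlencode({"q": query, "json.facet": json.dumps(facets)})
    # Supplying data makes urllib treat the request as a POST.
    return Request(base_url, data=body.encode("utf-8"))


request = build_facet_request(
    "http://localhost:8983/solr/query",
    "widgets",
    {"x": "avg(price)", "y": "unique(brand)"},
)
# Decode the body again to show what a server would receive.
print(parse_qs(request.data.decode("utf-8")))
```

Passing the request to `urllib.request.urlopen` would send it, which requires a running server at that address.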
  • 29. Terms Facet Example json.facet={ shoes:{ terms:{ field: shoe_style, sort: {x : desc}, facet:{ x : "avg(price)", y : "unique(brand)" } } } } "facets": { "count" : 472, "shoes": { "buckets" : [ { "val" : "Hiking", "count" : 34, "x" : 135.25, "y" : 17 }, { "val" : "Running", "count" : 45, "x" : 110.75, "y" : 24 }, Executed per-bucket
  • 30. Sub-Facets Any facet that produces buckets can have sub-facets (terms/field, range, query) Sub-facets can have facet functions (stats) or their own sub-facets (no limit to nesting). A subfacet can be any type (field, range, query) Multiple subfacets can be added to any given facet Subfacets are first-class facets - can be configured independently like any other facet. Different offsets, limits, stats, sorts, etc
  • 31. Sub-Facet Example json.facet={ shoes:{ terms:{ field: shoe_style, sort: {x : desc}, facet:{ x : "avg(price)", y : "unique(brand)", colors :{terms:color} } } } } "facets": { "count" : 472, "shoes": { "buckets" : [ { "val" : "Hiking", "count" : 34, "x" : 135.25, "y" : 17, "colors" : { "buckets" : [ { "val" : "brown", "count" : 12 }, { "val" : "black", "count" : 10 }, […] ] } // end of colors sub-facet }, // end of Hiking bucket { "val" : "Running", "count" : 45, "x" : 110.75, "y" : 24, "colors" : { "buckets" : […] Short-form for terms facet simply specifies the field. Sorts buckets by count descending.
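Because every bucket-producing facet uses the same `buckets` shape, a client can walk a nested response with one recursive function. The Python sketch below uses only the part of the response that is visible on the slide (the truncated portions are left out); the function name `walk` is hypothetical.

```python
# The visible portion of the slide's response; elided parts are omitted.
RESPONSE = {
    "facets": {
        "count": 472,
        "shoes": {
            "buckets": [
                {
                    "val": "Hiking", "count": 34, "x": 135.25, "y": 17,
                    "colors": {
                        "buckets": [
                            {"val": "brown", "count": 12},
                            {"val": "black", "count": 10},
                        ]
                    },
                },
            ]
        },
    }
}


def walk(facet, path=()):
    """Yield (path, count) for every bucket, descending into sub-facets."""
    for name, value in facet.items():
        # A sub-facet is any dict value that carries a "buckets" list;
        # plain stats such as "x" and "y" are skipped.
        if isinstance(value, dict) and "buckets" in value:
            for bucket in value["buckets"]:
                here = path + ((name, bucket["val"]),)
                yield here, bucket["count"]
                yield from walk(bucket, here)


for path, count in walk(RESPONSE["facets"]):
    print(" > ".join("%s=%s" % step for step in path), count)
```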
  • 32. Terms Facet Terms facet creates buckets of docs with the same value in a field - field – The field name to facet over. - offset – Used for paging, this skips the first N buckets. Defaults to 0. - limit – Limits the number of buckets returned. Defaults to 10. - mincount – Only return buckets with a count of at least this number. Defaults to 1. - sort – Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc” - missing – A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to false. - numBuckets – A boolean. If true, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false. - allBuckets – A boolean. If true, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false. - prefix – Only produce buckets for terms starting with the specified prefix.
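How several of these parameters interact can be shown with a toy Python model over a list of field values. The function `terms_facet` is hypothetical and covers only `offset`, `limit`, `mincount`, `prefix`, and the `count` / `index` sorts for a single-valued field; sorting by a facet function, `missing`, `numBuckets`, and `allBuckets` are left out. Ties in count order are broken by term here, which is an assumption of the sketch.

```python
from collections import Counter


def terms_facet(values, offset=0, limit=10, mincount=1,
                sort="count desc", prefix=None):
    """Bucket values by term, then filter, sort, and page the buckets."""
    counts = Counter(
        v for v in values if prefix is None or v.startswith(prefix)
    )
    key, direction = sort.split()
    items = sorted(counts.items())  # index (term) order, ascending
    if key == "count":
        # Stable sort: terms with equal counts stay in term order.
        items.sort(key=lambda kv: kv[1], reverse=(direction == "desc"))
    elif key == "index":
        if direction == "desc":
            items.reverse()
    else:
        raise ValueError("unsupported sort key in this sketch: %s" % key)
    items = [kv for kv in items if kv[1] >= mincount]
    return [{"val": v, "count": c} for v, c in items[offset:offset + limit]]


colors = ["red", "blue", "red", "green", "blue", "red"]
print(terms_facet(colors))
print(terms_facet(colors, mincount=2, offset=1, limit=1))
```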
  • 33. Query Facet Query facet creates a single bucket of documents matching the query. { // simple example highpop:{ query:{ q:"inStock:true AND popularity:[8 TO 10]" } } } { // example with multiple sub-facets highpop:{ query:{ q : "inStock:true AND popularity:[8 TO 10]", facet : { average_price : "avg(price)", available_colors : { terms : color }, price_ranges : { range : { field:price, start:0, end:200, gap:10 }} }} } }
  • 34. Range Facet Creates buckets over ranges on a numeric or date field Parameter names/values "in sync" with Solr range parameters: field – The numeric field or date field to produce range buckets from start – Lower bound of the ranges end – Upper bound of the ranges gap – Size of each range bucket produced hardend – A boolean, which if true means that the last bucket will end at “end” even if it is less than “gap” wide. If false, the last bucket will be “gap” wide, which may extend past “end”. other – This param indicates that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for… – "before" all records with field values lower than the lower bound of the first range – "after" all records with field values greater than the upper bound of the last range – "between" all records with field values between the start and end bounds of all ranges – "none" compute none of this information – "all" shortcut for before, between, and after include – By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The “before” range is exclusive and the “after” range is inclusive. This default, equivalent to lower below, will not result in double counting at the boundaries. This behavior can be modified by the facet.range.include param, which can be any combination of the following options… – "lower" all gap based ranges include their lower bound – "upper" all gap based ranges include their upper bound – "edge" the first and last gap ranges include their edge bounds (ie: lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified – "outer" the “before” and “after” ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries. – "all" shorthand for lower, upper, edge, outer
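The effect of `hardend` and of the default `include` setting is easiest to see with numbers. The following is a minimal sketch of the bucket arithmetic described above, not the Solr implementation; the function names `range_buckets` and `bucket_for` are assumptions, and only numeric values are handled (no date math).

```python
def range_buckets(start, end, gap, hardend=False):
    """Return (low, high) bounds for each gap-wide bucket from start up to end."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            high = end          # last bucket is cut off at "end"
        buckets.append((low, high))
        low = high
    return buckets

def bucket_for(value, buckets):
    """Default 'lower' behaviour: lower bound inclusive, upper bound exclusive."""
    for index, (low, high) in enumerate(buckets):
        if low <= value < high:
            return index
    return None                 # would fall under "before" or "after"

soft = range_buckets(0, 25, 10)                 # last bucket extends past end
hard = range_buckets(0, 25, 10, hardend=True)   # last bucket stops at end
```

With `start=0`, `end=25`, `gap=10`, the last bucket is `(20, 30)` without `hardend` and `(20, 25)` with it. A value of exactly 10 lands in the second bucket only, which is why the default does not double count at boundaries.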
  • 35. Sub-Facets + Facet-Functions = Business Intelligence / Analytics
  • 36. Fantasy ($1045) Top Authors $423 George R.R. Martin $347 Brandon Sanderson $155 JK Rowling Top Books $252 A Game of Thrones $113 Emperor of Thorns $101 Nine Princes in Amber $82 Steel Heart Sci-Fi ($898) Top Authors $321 Iain M Banks $218 Neal Asher $155 Neal Stephenson Top Books $113 Gridlinked $101 Use of Weapons $93 Snow Crash $82 The Skinner Mystery ($645) Top Authors $191 James Patterson $145 Patricia Cornwell $126 John Grisham Top Books $85 One for the Money $77 Angels & Daemons $64 Shutter Island $35 The Firm Filter By State $852 NJ (14 stores) $658 NY (11 stores) $421 CT (8 stores) Chain $984 Amazoon (14 stores) $734 Houses&Royalty (9 stores) $387 Books-r-us (7 stores) Store $108 Amazoon Branchburg $93 Books-r-us Bridgewater $87 H&R NYC Number of Books Chain 201K Houses&Royalty 183K Amazoon 98K Books-r-us Store 193K H&R NYC 77K Books-r-us Bridgewater 68K Amazoon Branchburg
  • 37. date_breakout : { range: { field: sale_date, start : ..., end : ..., gap : "+1MONTH", facet : { top_genre : { terms : { field : genre, sort : "revenue desc", limit : 4, facet : { revenue : "sum(sales)" } }}, by_chain: { terms : { field : chain, facet : { revenue : "sum(sales)" } }} […] Implementation Creates series of facet buckets based on date For each date bucket, facet by genre, taking the top 4 by revenue For each genre bucket, report revenue
  • 38. Fantasy ($1045) Top Authors $423 George R.R. Martin $347 Brandon Sanderson $155 JK Rowling Top Books $252 A Game of Thrones $113 Emperor of Thorns $101 Nine Princes in Amber $82 Steel Heart Sci-Fi ($898) Top Authors $321 Iain M Banks $218 Neal Asher $155 Neal Stephenson Top Books $113 Gridlinked $101 Use of Weapons $93 Snow Crash $82 The Skinner Mystery ($645) Top Authors $191 James Patterson $145 Patricia Cornwell $126 John Grisham Top Books $85 One for the Money $77 Angels & Daemons $64 Shutter Island $35 The Firm top_genres:{ terms:{ field: genre, facet : { rev : "sum(sales)", top_authors:{ terms:{ field : author, sort :"rev desc", limit : 3, facet : { rev : "sum(sales)" } }}, top_books:{ terms:{ field : title, sort : "rev desc", limit : 4, facet : { rev : "sum(sales)" } }} […]
  • 39. Filter By State $852 NJ (14 stores) $658 NY (11 stores) $421 CT (8 stores) Chain $984 Amazoon (14 stores) $734 Houses&Royalty (9 stores) $387 Books-r-us (7 stores) Store $108 Amazoon Branchburg $93 Books-r-us Bridgewater $87 H&R NYC state_breakout:{ terms:{ field: state, sort: "rev desc", facet : { rev : "sum(sales)", num_stores : "unique(store)" }}}, chain_breakout:{ terms:{ field: chain, sort: "rev desc", facet : { rev : "sum(sales)", num_stores : "unique(store)" }}}, store_breakout:{ terms:{ field: store, sort: "rev desc", facet : { rev : "sum(sales)" }}}
  • 41. Parameter Substitution  Parameters / macros substituted across whole request  Happens before any parsing, so usable in any context q=price:[ ${low} TO ${high} ] &low=100 &high=200  Default values q=price:[ ${low:0} TO ${high:100} ]  Nested q=${price_query} &price_query=${price_field}:[ ${low} TO ${high} ] AND inStock:true &price_field=specialPrice &low=50 &high=100
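The three cases on this slide (plain substitution, default values, nesting) can be imitated with a small text-level expander. This is a sketch of the idea only, not Solr's or Heliosearch's actual code: the function `expand`, the regular expression, the depth limit, and the choice to leave unknown macros without a default untouched are all assumptions.

```python
import re

# Matches ${name} or ${name:default}.
_MACRO = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")

def expand(text, params, max_depth=10):
    """Repeatedly replace ${name} / ${name:default} until nothing changes."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        if default is not None:
            return default
        return match.group(0)   # assumption: unknown macros are left as-is

    for _ in range(max_depth):
        expanded = _MACRO.sub(replace, text)
        if expanded == text:
            return expanded
        text = expanded         # another pass resolves nested macros
    raise ValueError("macros nested too deeply or cyclic")

params = {
    "price_query": "${price_field}:[ ${low} TO ${high} ] AND inStock:true",
    "price_field": "specialPrice",
    "low": "50",
    "high": "100",
}
nested = expand("${price_query}", params)
defaults = expand("price:[ ${low:0} TO ${high:100} ]", {})
```

Because substitution works on raw text before any parsing, the expanded string can be a query, a filter, or part of a `json.facet` body.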
  • 42. New Query Parser Features  Filters in queries - just like “fq” parameters, but may appear anywhere in a query q=(text:elephant -(filter(*:* -price:[ 0 TO 100 ]) OR filter(date:[0 TO 2013])))  Constant Score Queries q=color:(blue OR green)^=1 text:shoes  Comments in Queries (can nest) q=+text:elephant /* the main query */ /* boosting part – WIP {!func}mul(pop,rank)^10 */
  • 43. Thank You Help Develop the Next Generation of Solr! Resources:  http://heliosearch.org  https://github.com/Heliosearch/heliosearch  https://groups.google.com/forum/#!forum/heliosearch  https://groups.google.com/forum/#!forum/heliosearch-dev twitter.com/lucene_solr