Solr at zvents 6 years later & still going strong

Solr @ Zvents: 6 years later
Amit Nithianandan
Lead Engineer – Search and Analytics

My “Street Cred”
• Joined Zvents in Aug 2008 as member of search
engineering team.
– Knew nothing about Solr/Lucene (Lucene... isn’t that a
misspelling of the Safeway brand milk?)
• Worked on small features early on
– New ranking configuration for “hot tickets” module on site.
• Worked on larger initiatives
– Multiple re-writes of the federated search component
– Recent upgrade to Solr 4.0
• Contribute to community
– Authored a few articles/blog posts, most notable regarding
running Solr in Eclipse.
– Wrote Chrome extension for easily editing long (Solr) API URLs

Overview
• About Zvents
• Why Solr?
• Search @ Zvents Details
• Federated Search discussion
• Integration with external data stores
• Development/Deployment
• Operations/Performance Details

About Zvents
• Helping people find fun things to do since 2005!
• Content sourced from a variety of places:
– Normal end users
– Internal content editors
– External content editors @ local newspapers
– Feeds
• Powers the events guide section of hundreds
of local newspaper sites around the nation.

Technologies used (including but not
limited to).

Why Solr?
• Flexible, Powerful, Customizable
• RESTful query API
• Scales reasonably well without hassle.
• Fast and easy to get started given the samples.
• Strong and active community
– The mailing list is amazing. Conferences and meetups
help too.

Zvents Search at a quick glance…
• 2 Masters/10 Slaves, not sharded.
• Solr 4.x running on Jetty
• Six cores – Five host actual data, sixth used for
federated search
• Federated search among eight different
document types (i.e. venue, restaurants,
movies…)
• Total number of documents: ~5M

Search Challenges
• We allow blank text (“what”) searches so people can
look for stuff based on date and location
• How to surface the most relevant things to do?

Document Design Notes
• Venues, Artists are as you would expect
• For movies, index each showtime:
pk = {theater_id, movie_id, time} triple.
– When searching, filter by location, collapse on
movie_id, sort by time asc
• For events, index each occurrence (time).
– When searching, collapse on sequence_id, sort by
time asc to show the next upcoming occurrence.
• Avoid showing visual “duplicates”

Zvents Search Service API
• Essentially Solr API with a few changes.
• ServletFilter and custom QueryComponent
used to translate URL parameters to proper
Solr parameter “syntax”
– E.g. latitude/longitude/radius converted to
geospatial query and distance in km.
• Federated search executed using ThreadPool
– Parallel searches, results blended together.

Sample Query
http://localhost:8983/map_prod/select?qt=zvents&trim=1&start=0&start_spn=0&rows_spn=6&rows=10&zsort=0&rcity=San+Francisco&latitude=37.7752&longitude=-122.419&radius=75.0&category=event,event_spn,venue&sd=201212190000&fq:event:=has_city:true&fq:event_spn:=has_city:true&wt=ruby&q=the%20fillmore&fl=id,name,score&facet=true&indent=on
Slide callouts: category-specific fq parameter; lat/long params; collapse results (grouping)

Sample Response (abbr)
{
  'organic'=>{
    'response'=>{'numFound'=>379,'start'=>0,'maxScore'=>75.485054,'docs'=>[…]},
    'facet_counts'=>{
      'facet_fields'=>{
        'category'=>{
          'event'=>67,
          'venue'=>312}}}},
  'sponsored'=>{
    'response'=>{'numFound'=>0,'start'=>0,'maxScore'=>0.0,'docs'=>[…]},
    'facet_counts'=>{
      'facet_fields'=>{
        'category'=>{
          'event_spn'=>0}}}}
}

Federated Search
[Screenshot: federated search results (notice movies + events mixed)]

Federated Search (cont’d)
• Zvents federator component executes multiple concurrent searches
and blends the results.
• Raw scores are not comparable across products, so they must be
normalized before results can be blended.
• Dividing by the max score to yield a 0-1 scale throws out the
differences in score distribution.
• We chose to use the Z score: (score – avg)/stddev.
• Getting stats like average and standard deviation on the results is
not trivial.
• Initially thought about hacking the handler to plug in my own
collector/scorer
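
To make the difference between the two normalizations concrete, here is a small sketch with made-up score lists. It assumes the population standard deviation; the function names are illustrative only.

```python
from statistics import mean, pstdev

def max_normalize(scores):
    """Divide by the top score to get a 0-1 scale."""
    top = max(scores)
    return [s / top for s in scores]

def z_scores(scores):
    """(score - avg) / stddev, using the population standard deviation."""
    avg, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]  # all scores identical
    return [(s - avg) / sd for s in scores]

# Hypothetical raw scores from two products.
events = [10.0, 2.0, 1.5, 1.0, 0.5]   # one standout result
venues = [10.0, 9.5, 9.0, 8.5, 8.0]   # tightly clustered results

events_max, venues_max = max_normalize(events), max_normalize(venues)
events_z, venues_z = z_scores(events), z_scores(venues)
```

Division by max gives both top results a score of 1.0, while the Z score ranks the events standout (about 1.98) above the venues leader (about 1.41), because it stands out more from its own peers.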

PostFilter to the rescue!
• PostFilters allow you to (as the name suggests) execute filtering
logic *after* the main query and all other filters have executed.
• Lucene filters + the main query execute in parallel in a leap-frog
manner. Some filters (e.g. filter by distance to user) are
expensive to generate up front for all documents.
• You can create a delegate Collector to optionally call
“super.collect()” if some condition is true.
• Since I am now effectively at the lowest level of Lucene
(Collector/Scorer), I can store distribution information about the
scores as they pass through the collector and custom scorer!
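
The real implementation would be a Java collector inside Solr; the sketch below only mirrors the bookkeeping in Python. A delegating collector forwards a document when a condition holds and accumulates score statistics along the way. All class and method names here are hypothetical, and counting only the forwarded documents in the stats is an assumption.

```python
import math

class ListCollector:
    """Stand-in for the downstream collector that gathers the final hits."""
    def __init__(self):
        self.hits = []

    def collect(self, doc_id, score):
        self.hits.append((doc_id, score))

class StatsCollector:
    """Forwards docs that pass `keep` and records score statistics."""
    def __init__(self, delegate, keep=lambda doc_id, score: True):
        self.delegate = delegate
        self.keep = keep
        self.num_docs = 0
        self.sum_scores = 0.0
        self.sum_squared_scores = 0.0
        self.min = math.inf
        self.max = -math.inf

    def collect(self, doc_id, score):
        if not self.keep(doc_id, score):
            return  # filtered out: never reaches the delegate
        self.num_docs += 1
        self.sum_scores += score
        self.sum_squared_scores += score * score
        self.min = min(self.min, score)
        self.max = max(self.max, score)
        self.delegate.collect(doc_id, score)  # the "super.collect()" step

sink = ListCollector()
collector = StatsCollector(sink, keep=lambda doc_id, score: doc_id != 2)
for doc_id, score in [(1, 2.0), (2, 4.0), (3, 6.0)]:
    collector.collect(doc_id, score)
```

Keeping only running sums means the statistics cost constant memory regardless of how many documents match.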

Example Result Snippet
<lst name="score_stats">
<float name="min">1.3786081E-6</float>
<float name="max">10.416486</float>
<float name="avg">1.8479956</float>
<float name="stdDev">1.544854</float>
<long name="numDocs">561</long>
<float name="sumSquaredScores">3254.7324</float>
<float name="sumScores">1036.7256</float>
</lst>
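
The avg and stdDev values in the snippet can be reproduced from numDocs, sumScores and sumSquaredScores. The sketch below assumes the population standard deviation, which agrees with the numbers shown to within rounding; the `derive` helper is illustrative.

```python
import math

def derive(num_docs, sum_scores, sum_squared_scores):
    """Return (avg, std_dev) from running sums, using E[x^2] - E[x]^2."""
    avg = sum_scores / num_docs
    variance = sum_squared_scores / num_docs - avg * avg
    # Guard against tiny negative values caused by float rounding.
    return avg, math.sqrt(max(variance, 0.0))

# Values copied from the result snippet.
avg, std_dev = derive(561, 1036.7256, 3254.7324)
```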

Federated Search – Victory!
• Now the federator, when executing the product specific searches, can extract
this information to produce a “normal” score.
• Results from different products can be blended based on how good individual
results are relative to their peers.
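
A minimal sketch of the blending step, assuming each product search returns its documents plus the score stats. The canned data, `search` and `federate` are hypothetical stand-ins, not the Zvents federator itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned per-product responses: raw scores plus score stats.
FAKE_INDEX = {
    "event": {"docs": [("e1", 9.0), ("e2", 3.0)], "avg": 2.0, "stdDev": 2.0},
    "movie": {"docs": [("m1", 0.9), ("m2", 0.5)], "avg": 0.4, "stdDev": 0.1},
}

def search(product):
    """Stand-in for one product-specific Solr search."""
    return product, FAKE_INDEX[product]

def federate(products, rows=10):
    # Run the product searches concurrently.
    with ThreadPoolExecutor(max_workers=len(products)) as pool:
        responses = list(pool.map(search, products))
    blended = []
    for product, resp in responses:
        for doc_id, score in resp["docs"]:
            sd = resp["stdDev"]
            z = (score - resp["avg"]) / sd if sd else 0.0
            blended.append((z, product, doc_id))
    # Highest Z score first, regardless of which product it came from.
    blended.sort(key=lambda t: t[0], reverse=True)
    return [(doc_id, product, round(z, 2)) for z, product, doc_id in blended[:rows]]

ranking = federate(["event", "movie"])
```

Here the movie with raw score 0.9 outranks the event with raw score 9.0, because it sits further above its own product's average.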

Ranking/Filtering using (highly) volatile
data…
• Store data in field, re-index document
constantly with updated field value
• Atomic updates? Solr 4.0 feature
– I claim ignorance here; I don’t know the performance
impact or how they are used.
• Use functions/FunctionQuery + pseudo-fields
– Instead of indexed click field, use clk() function.
• Use PostFilter to support filtering of
documents based on this volatile data

Solr + External Data Store == Sweet!
[Diagram labels: Log Processing; Jetty Container]
• Solr functions pull volatile data from EhCache
– Example: log(clk(EVENT,sequence_id))
• Separate thread updates EhCache from Hypertable
Filtering events based on ticket
availability
Example: &fq={!ticket_filter idField=id}
• Ticket availability publisher publishes ticket
information via AMQP
• EhCache (in Jetty) stores:
{event_id=>ticket_count}
• Solr PostFilter:
1) Fetch ticket information.
2) Filter out document if ticket_count == 0
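
The two filter steps can be sketched as follows. The cache contents are made up, `ticket_filter` is a hypothetical Python stand-in for the Java PostFilter, and keeping documents that have no cache entry is an assumption the slides do not address.

```python
# Hypothetical cache contents: {event_id => ticket_count}.
ticket_cache = {1245: 0, 5678: 12}

def ticket_filter(docs, cache, id_field="id"):
    """Drop documents whose cached ticket_count is 0."""
    kept = []
    for doc in docs:
        count = cache.get(doc[id_field])  # 1) fetch ticket information
        if count == 0:                    # 2) filter out sold-out events
            continue
        kept.append(doc)                  # unknown ids are kept (assumption)
    return kept

matches = [{"id": 1245}, {"id": 5678}, {"id": 9999}]
available = ticket_filter(matches, ticket_cache)
```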

Production Environment
• Java 1.7
• Quad Core 2.8 GHz
• 10 GB RAM
– 8GB dedicated to JVM heap.
• All provisioned as VMs on VMware ESX servers.
– Significantly simplifies cluster growth. Simply add
servers and go!
• 10 Slaves, 2 Masters
– From a configuration standpoint, masters == slaves, except
masters have a 4GB JVM heap instead of 8GB.

Solr Project Configuration
• Maven based
– Treat Solr as dependency *not* as application.
• Other dependencies specified in POM,
bundled into war during assembly phase.
• Build tarball that is pushed to Nexus
– Tarball contains configuration scripts + Jetty jar
etc.
• Bundle Jetty with the app for all in one
deployment.

Advantages of using Maven
• Solr version upgrades as simple as increasing
dependency version in pom.xml.
– Of course run tests before deploy!
• All dependencies managed by pom.xml and
bundled into deployment artifact
– No management of classpath via solrconfig.xml
• Take advantage of standard release
management practices. Everything self
contained.

Deployment via Capistrano
• Capistrano: framework/utility for executing
commands in parallel via SSH on multiple
servers
(https://github.com/capistrano/capistrano)
• Capistrano-Nexus gem: Zvents-built gem to
deploy a tarball hosted on a Nexus server out
to staging/production.

Examples
• Staging/Development Deploy:
– mvn deploy
– RELEASE="2.10-SNAPSHOT" cap staging deploy
• Production Deploy:
– mvn release:prepare
– mvn release:perform
– RELEASE="2.10" cap production deploy

Monitoring- NewRelic (cont’d)

CONTACT
Amit Nithianandan
Anithian-at-gmail.com
