SlideShare a Scribd company logo
1 of 33
Download to read offline
Solr @ Zvents: 6 years later
Amit Nithianandan
Lead Engineer – Search and Analytics
About Me
My “Street Cred”
• Joined Zvents in Aug 2008 as member of search
engineering team.
– Knew nothing about Solr/Lucene (Lucene.. Isn’t that a
misspelling of the Safeway brand milk?)
• Worked on small features early on
– New ranking configuration for “hot tickets” module on site.
• Worked on larger initiatives
– Multiple re-writes of the federated search component
– Recent upgrade to Solr 4.0
• Contribute to community
– Authored a few articles/blog posts, most notable regarding
running Solr in Eclipse.
– Wrote Chrome extension for easily editing long(Solr) API URLs
Overview
• About Zvents
• Why Solr?
• Search @ Zvents Details
• Federated Search discussion
• Integration with external data stores
• Development/Deployment
• Operations/Performance Details
About Zvents
• Helps people find fun things to do since 2005!
• Content sourced from a variety of places:
– Normal end users
– Internal content editors
– External content editors @ local newspapers
– Feeds
• Powers the events guide section of hundreds
of local newspaper sites around the nation.
Technologies used (including but not
limited to).
Why Solr?
• Flexible, Powerful, Customizable
• RESTful query API
• Scales reasonably well without hassle.
• Fast and easy to get started given the samples.
• Strong and active community
– Mailing list amazing. Conferences and meetups
help too 
Zvents Search at a quick glance…
• 2 Masters/10 Slaves  Not sharded.
• Solr 4.x running on Jetty
• Six cores – Five host actual data, sixth used for
federated search
• Federated search among eight different
document types (i.e. venue, restaurants,
movies…)
• Total number of documents ~5M documents
• We allow blank text (“what”) searches so people can
look for stuff based on date and location
• How to surface the most relevant things to do?
11
Search Challenges
Document Design Notes
• Venues, Artists are as you would expect
• For movies, index each showtime
pk = {theater_id, movie id, time} triple.
– When searching, filter by location, collapse on the
movie_id sort by time asc
• For events, index each occurrence (time).
– When searching, collapse on a sequence_id sort by
time asc to show the most recent upcoming event.
• Avoid showing visual “duplicates”
Request Flow
Zvents Search Service API
• Essentially Solr API with a few changes.
• ServletFilter and custom QueryComponent
used to translate URL parameters to proper
Solr parameter “syntax”
– E.g. latitude/longitude/radius converted to
geospatial query and distance in km.
• Federated search executed using ThreadPool
– Parallel searches, results blended together.
Sample Query
http://localhost:8983/map_prod/select?qt=zven
ts&trim=1&
start=0&start_spn=0&rows_spn=6&rows=10&zs
ort=0&rcity=San+Francisco&latitude=37.7752&l
ongitude=-
122.419&radius=75.0&category=event,event_sp
n,venue&sd=201212190000&fq:event:=has_city
:true&fq:event_spn:=has_city:true&wt=ruby&q
=the%20fillmore&fl=id,name,score&facet=true&
indent=on Category specific fq parameter
Lat/Long Params
Collapse results (grouping)
Sample Response (abbr)
{
'organic'=>{
'response'=>{'numFound'=>379,'start'=>0,'maxScore'=>75.485054,'doc
s'=>[…]
},
'facet_counts'=>{
'facet_fields'=>{
'category'=>{
'event'=>67,
'venue'=>312}}},
'sponsored'=>{
'response'=>{'numFound'=>0,'start'=>0,'maxScore'=>0.0,'docs'=>[ …
]
},
'facet_counts'=>{
'facet_fields'=>{
'category'=>{
'event_spn'=>0}}}
}
Federated Search
Federated Search (notice movies +
events mixed)
Federated Search (cont’d)
• Zvents federator component executes multiple concurrent searches
and blends the results.
• Raw score meaningless across products so scores must be
normalized so that across products they mean something.
• Division by max to yield 0-1 scale throws out the score
distribution differences
• We chose to use the Z score (score – avg)/stddev.
• Getting stats like average and standard deviation on the results not
trivial.
• Initially thought to hack the handler to put my own
collector/scorer
PostFilter to the rescue!
• PostFilters allow you to (as the name suggests) execute filtering
logic *after* the main query and all other filters have executed.
• Lucene filters + main query execute in parallel in a leap-frog
manner. Some filters (i.e. filter by distance to user) are
expensive to generate up front for all documents.
• You can create a delegate Collector to optionally call
“super.collect()” if some condition is true.
• Since now I am at the lowest level of Lucene effectively
(Collector/Scorer), I can store distribution information about the
scores as they pass through the collector and custom scorer!
Example Result Snippet
<lst name="score_stats">
<float name="min">1.3786081E-6</float>
<float name="max">10.416486</float>
<float name="avg">1.8479956</float>
<float name="stdDev">1.544854</float>
<long name="numDocs">561</long>
<float name="sumSquaredScores">3254.7324</float>
<float name="sumScores">1036.7256</float>
</lst>
Federated Search – Victory!
• Now the federator, when executing the product specific searches, can extract
this information to produce a “normal” score.
• Results from different products can be blended based on how good individual
results are relative to their peers.
Ranking/Filtering using (highly) volatile
data…
• Store data in field, re-index document
constantly with updated field value
• Atomic updates?  Solr 4.0 feature
– Claim ignorance here. Don’t know performance
impacts nor usage.
• Use functions/FunctionQuery + pseudo-fields
– Instead of indexed click field, use clk() function.
• Use PostFilter to support filtering of
documents based on this volatile data
Solr + External Data Store == Sweet!
Log
Processing
Jetty
Container
Solr Functions
pull volatile data
from EhCache
Example:
log(clk(EVENT,sequence_id))
Separate thread updates
EhCache from
Hypertable
Filtering events based on ticket
availability
Example: &fq={!ticket_filter idField=id}
Ticket availability
publisher
EHCache
Publishes ticket
information via
AMQP
Jetty
Cache stores:
{Event_id=>ticket_count}
1) Fetch ticket
information.
2) Filter out
document if
ticket_count ==0
id
0
1
2
4
3
1245
Solr PostFilter
5678
Development and Deployment
Production Environment
• Java 1.7
• Quad Core 2.8 GHz
• 10 GB RAM
– 8GB dedicated to JVM heap.
• All provisioned as VMs on VMWare ESX servers.
– Significantly simplifies cluster growth. Simply add
servers and go!
• 10 Slaves, 2 Masters
– From configuration standpoint, masters == slave except
masters have 4GB JVM heap instead of 8GB.
Solr Project Configuration
• Maven based
– Treat Solr as dependency *not* as application.
• Other dependencies specified in POM,
bundled into war during assembly phase.
• Build tarball that is pushed to Nexus
– Tarball contains configuration scripts + Jetty jar
etc.
• Bundle Jetty with the app for all in one
deployment.
Advantages of using Maven
• Solr version upgrades as simple as increasing
dependency version in pom.xml.
– Of course run tests before deploy!
• All dependencies managed by pom.xml and
bundled into deployment artifact
– No management of classpath via solrconfig.xml
• Take advantage of standard release
management practices. Everything self
contained.
Deployment via Capistrano
• Capistrano- Framework/Utility for executing
commands in parallel via SSH on multiple
servers
(https://github.com/capistrano/capistrano)
• Capistrano-Nexus Gem- Zvents built gem to
deploy a tarball hosted on a Nexus server out
to staging/production.
Examples
• Staging/Development Deploy:
– mvn deploy
– RELEASE=“2.10-SNAPSHOT” cap staging deploy
• Production Deploy:
– mvn release:prepare
– mvn relesae:perform
– RELEASE=“2.10” cap production deploy
Monitoring- NewRelic
Monitoring- NewRelic (cont’d)
CONTACT
Amit Nithianandan
Anithian-at-gmail.com

More Related Content

What's hot

Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionDavid Pilato
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrAndy Jackson
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaReal Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaLucidworks
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1YI-CHING WU
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptLucidworks
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2Rafał Kuć
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solrKnoldus Inc.
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with ElasticsearchSamantha Quiñones
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From SolrRamzi Alqrainy
 

What's hot (20)

Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English version
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaReal Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
 
Elasticsearch speed is key
Elasticsearch speed is keyElasticsearch speed is key
Elasticsearch speed is key
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Solr 4
Solr 4Solr 4
Solr 4
 
it's just search
it's just searchit's just search
it's just search
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
 
Solr vs ElasticSearch
Solr vs ElasticSearchSolr vs ElasticSearch
Solr vs ElasticSearch
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From Solr
 

Viewers also liked

Implementing search with solr at 7digital
Implementing search with solr at 7digitalImplementing search with solr at 7digital
Implementing search with solr at 7digitallucenerevolution
 
Brahe mass scale flexible indexing
Brahe   mass scale flexible indexingBrahe   mass scale flexible indexing
Brahe mass scale flexible indexinglucenerevolution
 
Indexing Big Data on Amazon AWS
Indexing Big Data on Amazon AWSIndexing Big Data on Amazon AWS
Indexing Big Data on Amazon AWSlucenerevolution
 
Hacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and ProfitHacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and Profitlucenerevolution
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platformlucenerevolution
 
Lucandra
LucandraLucandra
Lucandraotisg
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Trickslucenerevolution
 
Slash n near real time indexing
Slash n   near real time indexingSlash n   near real time indexing
Slash n near real time indexingUmesh Prasad
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Enginelucenerevolution
 
Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & howlucenerevolution
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLucidworks
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Lucidworks
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Etu Solution
 

Viewers also liked (20)

Implementing search with solr at 7digital
Implementing search with solr at 7digitalImplementing search with solr at 7digital
Implementing search with solr at 7digital
 
Brahe mass scale flexible indexing
Brahe   mass scale flexible indexingBrahe   mass scale flexible indexing
Brahe mass scale flexible indexing
 
Indexing Big Data on Amazon AWS
Indexing Big Data on Amazon AWSIndexing Big Data on Amazon AWS
Indexing Big Data on Amazon AWS
 
Hacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and ProfitHacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and Profit
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 
Lucandra
LucandraLucandra
Lucandra
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Slash n near real time indexing
Slash n   near real time indexingSlash n   near real time indexing
Slash n near real time indexing
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Engine
 
The Typed Index
The Typed IndexThe Typed Index
The Typed Index
 
Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & how
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
 

Similar to Solr at zvents 6 years later & still going strong

Solr and ElasticSearch demo and speaker feb 2014
Solr  and ElasticSearch demo and speaker feb 2014Solr  and ElasticSearch demo and speaker feb 2014
Solr and ElasticSearch demo and speaker feb 2014nkabra
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Vinay Kumar
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr WorkshopJSGB
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Getting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchGetting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchPeter Steenbergen
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunktdthomassld
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol ValidationBIOVIA
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0Anshum Gupta
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction abenyeung1
 

Similar to Solr at zvents 6 years later & still going strong (20)

Solr and ElasticSearch demo and speaker feb 2014
Solr  and ElasticSearch demo and speaker feb 2014Solr  and ElasticSearch demo and speaker feb 2014
Solr and ElasticSearch demo and speaker feb 2014
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Getting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchGetting started with Laravel & Elasticsearch
Getting started with Laravel & Elasticsearch
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunk
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 

Recently uploaded

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Solr at zvents 6 years later & still going strong

  • 1. Solr @ Zvents: 6 years later Amit Nithianandan Lead Engineer – Search and Analytics
  • 2.
  • 4. My “Street Cred” • Joined Zvents in Aug 2008 as member of search engineering team. – Knew nothing about Solr/Lucene (Lucene.. Isn’t that a misspelling of the Safeway brand milk?) • Worked on small features early on – New ranking configuration for “hot tickets” module on site. • Worked on larger initiatives – Multiple re-writes of the federated search component – Recent upgrade to Solr 4.0 • Contribute to community – Authored a few articles/blog posts, most notable regarding running Solr in Eclipse. – Wrote Chrome extension for easily editing long(Solr) API URLs
  • 5. Overview • About Zvents • Why Solr? • Search @ Zvents Details • Federated Search discussion • Integration with external data stores • Development/Deployment • Operations/Performance Details
  • 6.
  • 7. About Zvents • Helps people find fun things to do since 2005! • Content sourced from a variety of places: – Normal end users – Internal content editors – External content editors @ local newspapers – Feeds • Powers the events guide section of hundreds of local newspaper sites around the nation.
  • 8. Technologies used (including but not limited to).
  • 9. Why Solr? • Flexible, Powerful, Customizable • RESTful query API • Scales reasonably well without hassle. • Fast and easy to get started given the samples. • Strong and active community – Mailing list amazing. Conferences and meetups help too 
  • 10. Zvents Search at a quick glance… • 2 Masters/10 Slaves  Not sharded. • Solr 4.x running on Jetty • Six cores – Five host actual data, sixth used for federated search • Federated search among eight different document types (i.e. venue, restaurants, movies…) • Total number of documents ~5M documents
  • 11. • We allow blank text (“what”) searches so people can look for stuff based on date and location • How to surface the most relevant things to do? 11 Search Challenges
  • 12. Document Design Notes • Venues, Artists are as you would expect • For movies, index each showtime pk = {theater_id, movie id, time} triple. – When searching, filter by location, collapse on the movie_id sort by time asc • For events, index each occurrence (time). – When searching, collapse on a sequence_id sort by time asc to show the most recent upcoming event. • Avoid showing visual “duplicates”
  • 14. Zvents Search Service API • Essentially Solr API with a few changes. • ServletFilter and custom QueryComponent used to translate URL parameters to proper Solr parameter “syntax” – E.g. latitude/longitude/radius converted to geospatial query and distance in km. • Federated search executed using ThreadPool – Parallel searches, results blended together.
  • 17. Federated Search Federated Search (notice movies + events mixed)
  • 18. Federated Search (cont’d) • Zvents federator component executes multiple concurrent searches and blends the results. • Raw score meaningless across products so scores must be normalized so that across products they mean something. • Division by max to yield 0-1 scale throws out the score distribution differences • We chose to use the Z score (score – avg)/stddev. • Getting stats like average and standard deviation on the results not trivial. • Initially thought to hack the handler to put my own collector/scorer
  • 19. PostFilter to the rescue! • PostFilters allow you to (as the name suggests) execute filtering logic *after* the main query and all other filters have executed. • Lucene filters + main query execute in parallel in a leap-frog manner. Some filters (i.e. filter by distance to user) are expensive to generate up front for all documents. • You can create a delegate Collector to optionally call “super.collect()” if some condition is true. • Since now I am at the lowest level of Lucene effectively (Collector/Scorer), I can store distribution information about the scores as they pass through the collector and custom scorer!
  • 20. Example Result Snippet <lst name="score_stats"> <float name="min">1.3786081E-6</float> <float name="max">10.416486</float> <float name="avg">1.8479956</float> <float name="stdDev">1.544854</float> <long name="numDocs">561</long> <float name="sumSquaredScores">3254.7324</float> <float name="sumScores">1036.7256</float> </lst>
  • 21. Federated Search – Victory! • Now the federator, when executing the product specific searches, can extract this information to produce a “normal” score. • Results from different products can be blended based on how good individual results are relative to their peers.
  • 22. Ranking/Filtering using (highly) volatile data… • Store data in field, re-index document constantly with updated field value • Atomic updates?  Solr 4.0 feature – Claim ignorance here. Don’t know performance impacts nor usage. • Use functions/FunctionQuery + pseudo-fields – Instead of indexed click field, use clk() function. • Use PostFilter to support filtering of documents based on this volatile data
  • 23. Solr + External Data Store == Sweet! Log Processing Jetty Container Solr Functions pull volatile data from EhCache Example: log(clk(EVENT,sequence_id)) Separate thread updates EhCache from Hypertable
  • 24. Filtering events based on ticket availability Example: &fq={!ticket_filter idField=id} Ticket availability publisher EHCache Publishes ticket information via AMQP Jetty Cache stores: {Event_id=>ticket_count} 1) Fetch ticket information. 2) Filter out document if ticket_count ==0 id 0 1 2 4 3 1245 Solr PostFilter 5678
  • 26. Production Environment • Java 1.7 • Quad Core 2.8 GHz • 10 GB RAM – 8GB dedicated to JVM heap. • All provisioned as VMs on VMWare ESX servers. – Significantly simplifies cluster growth. Simply add servers and go! • 10 Slaves, 2 Masters – From configuration standpoint, masters == slave except masters have 4GB JVM heap instead of 8GB.
  • 27. Solr Project Configuration • Maven based – Treat Solr as dependency *not* as application. • Other dependencies specified in POM, bundled into war during assembly phase. • Build tarball that is pushed to Nexus – Tarball contains configuration scripts + Jetty jar etc. • Bundle Jetty with the app for all in one deployment.
  • 28. Advantages of using Maven • Solr version upgrades as simple as increasing dependency version in pom.xml. – Of course run tests before deploy! • All dependencies managed by pom.xml and bundled into deployment artifact – No management of classpath via solrconfig.xml • Take advantage of standard release management practices. Everything self contained.
  • 29. Deployment via Capistrano • Capistrano- Framework/Utility for executing commands in parallel via SSH on multiple servers (https://github.com/capistrano/capistrano) • Capistrano-Nexus Gem- Zvents built gem to deploy a tarball hosted on a Nexus server out to staging/production.
  • 30. Examples • Staging/Development Deploy: – mvn deploy – RELEASE=“2.10-SNAPSHOT” cap staging deploy • Production Deploy: – mvn release:prepare – mvn relesae:perform – RELEASE=“2.10” cap production deploy