Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics

Scaling Solr 4 to Power Big Search in Social Media
Analytics
Timothy Potter
Architect, Big Data Analytics, Dachis Group / Co-author Solr In Action

® 2011 Dachis Group.
dachisgroup.com
• Anyone running SolrCloud in
production today?
• Who is running pre-Solr 4 version in
production?
• Who has fired up Solr 4.x in SolrCloud
mode?
• Personal interest – who was
purchased Solr in Action in MEAP?
Audience poll

dachisgroup.com
• Gain insights into the key design decisions you need
to make when using Solr cloud
Wish I knew back then ...
• Solr 4 feature overview in context
• Zookeeper
• Distributed indexing
• Distributed search
• Real-time GET
• Atomic updates
• A day in the life ...
• Day-to-day operations
• What happens if you lose a node?
Goals of this talk

dachisgroup.com
Our business intelligence platform analyzes relationships, behaviors, and
conversations between 30,000 brands and 100M social accounts every 15 minutes.
About Dachis Group

dachisgroup.com

dachisgroup.com
• In production on 4.2.0
• 18 shards ~ 33M docs / shard, 25GB on disk per shard
• Multiple collections
• ~620 Million docs in main collection (still growing)
• ~100 Million docs in 30-day collection
• Inherent Parent / Child relationships (tweet and re-tweets)
• ~5M atomic updates to existing docs per day
• Batch-oriented updates
• Docs come in bursts from Hadoop; 8,000 docs/sec
• 3-4M new documents per day (deletes too)
• Business Intelligence UI, low(ish) query volume
Solution Highlights

dachisgroup.com
• Scalability
Scale-out: sharding and replication
A little scale-up too: Fast disks (SSD), lots of RAM!
• High-availability
Redundancy: multiple replicas per shard
Automated fail-over: automated leader election
• Consistency
Distributed queries must return consistent results
Accepted writes must be on durable storage
• Simplicity - wip
Self-healing, easy to setup and maintain,
able to troubleshoot
• Elasticity - wip
Add more replicas per shard at any time
Split large shards into two smaller ones
Pillars of my ideal search solution

dachisgroup.com
Nuts and Bolts
Nice tag cloud wordle.net!

dachisgroup.com
1. Zookeeper needs at least 3 nodes to establish quorum with fault
tolerance. Embedded is only for evaluation purposes, you need to
deploy a stand-alone ensemble for production
2. Every Solr core creates ephemeral “znodes” in Zookeeper which
automatically disappear if the Solr process crashes
3. Zookeeper pushes notifications to all registered “watchers” when a
znode changes; Solr caches cluster state
1. Zookeeper provides “recipes” for solving common problems faced
when building distributed systems, e.g. leader election
2. Zookeeper provides centralized configuration distribution, leader
election, and cluster state notifications
Zookeeper in a nutshell

dachisgroup.com
• Number and size of indexed fields
• Number of documents
• Update frequency
• Query complexity
• Expected growth
• Budget
Number of shards?
Yay for shard splitting in 4.3 (SOLR-3755)!

dachisgroup.com
We use Uwe Schindler’s advice on 64-bit Linux:
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
See: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
java -Xmx4g ...
(hint: rest of our RAM goes to the OS to load index in memory mapped I/O)
Small cache sizes with aggressive eviction – spread GC penalty out over time vs. all at once every time
you open a new searcher
<filterCache class="solr.LFUCache" size="50"
initialSize="50" autowarmCount="25"/>
Index Memory Management

dachisgroup.com
• Not a master
• Leader is a replica (handles queries)
• Accepts update requests for the shard
• Increments the _version_ on the new or
updated doc
• Sends updates (in parallel) to all
replicas
Leader = Replica + Addl’ Work

dachisgroup.com
Don’t let your tlog’s get too big – use “hard” commits with openSearcher=“false”
Distributed Indexing
View of cluster state from Zk
Shard 1
Leader
Node 1 Node 2
Shard 2
Leader
Shard 2
Replica
Shard 1
Replica
Zookeeper
CloudSolrServer
“smart client”
Hash on docID
1
2
3
Set the _version_
tlogtlog
Get URLs of current leaders?
4
5
2 shards with 1 replica each
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>60000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
8,000 docs / sec
to 18 shards

dachisgroup.com
Send query request to any node
Two-stage process
1. Query controller sends query to all
shards and merges results
One host per shard must be online
or queries fail
2. Query controller sends 2nd query to
all shards with documents in the
merged result set to get requested
fields
Solr client applications built for 3.x do
not need to change (our query code still
uses SolrJ 3.6)
Limitations
JOINs / Grouping need custom hashing
Distributed search
View of cluster state from Zk
Shard 1
Leader
Node 1 Node 2
Shard 2
Leader
Shard 2
Replica
Shard 1
Replica
Zookeeper
CloudSolrServer
1
3
q=*:*
Get URLs of all live nodes
4
2
Query controller
Or just a load balancer works too
get fields

dachisgroup.com
Search by daily activity volume
Drive analysis
that measures
the impact of
a social message
over time ...
Company posts
a tweet on Monday,
how much activity
around that message
on Thursday?

dachisgroup.com
Problem: Find all documents that had activity on a specific day
• tweets that had retweets or YouTube videos that had comments
• Use Solr join support to find parent documents by matching on child criteria
fq=_val_:"{!join from=echo_grouping_id_s to=id}day_tdt:[2013-05-01T00:00:00Z
TO 2013-05-02T00:00:00Z}" ...
... But, joins don’t work in distributed queries and is probably too slow anyway
Solution: Index daily activity into multi-valued fields. Use real-time GET to lookup
document by ID to get the current daily volume fields
fq:daily_volume_tdtm('2013-05-02’)
sort=daily_vol(daily_volume_s,'2013-04-01','2013-05-01')+desc
daily_volume_tdtm: [2013-05-01, 2013-05-02] <= doc has child signals on May 1 and 2
daily_volume_ssm: 2013-05-01|99, 2013-05-02|88 <= stored only field, doc had 99 child signals on May 1, 88 on May 2
daily_volume_s: 13050288|13050199 <= flattened multi-valued field for sorting using a custom ValueSource
Atomic updates and real-time get

dachisgroup.com
Will it work? Definitely!
Search can be addicting to your organization, queries we
tested for 6 months ago vs. what we have today are vastly
different
Buy RAM – OOMs and aggressive garbage collection
cause many issues
Give RAM from ^ to the OS – MMapDirectory
Need a disaster recovery process in addition to Solr cloud
replication; helps with migrating to new hardware too
Use Jetty ;-)
Store all fields! Atomic updates are a life saver
Lessons learned

dachisgroup.com
Schema will evolve – we thought we understood our data model but have since
added at least 10 new fields and deprecated some too
Partition if you can! e.g. 30-day collection
We don't optimize – segment merging works great
Size your staging environment so that shards have about as many docs and same
resources as prod. I have many more nodes in prod but my staging servers have
roughly the same number of docs per shard, just fewer shards.
Don’t be afraid to customize Solr! It’s designed to be customized with plug-ins
• ValueSource is very powerful
• Check out PostFilters:
{!frange l=1 u=1 cost=200 cache=false}imca(53313,employee)
Lessons learned cont.

dachisgroup.com
• Backups
.../replication?command=backup&location=/mnt/backups
• Monitoring
Replicas serving queries?
All replicas report same number of docs?
Zookeeper health
New search warm-up time
• Configuration update process
Our solrconfig.xml changes frequently – see Solr’s zkCli.sh
• Upgrade Solr process (it’s moving fast right now)
• Recover failed replica process
• Add new replica
• Kill the JVM on OOM (from Mark Miller)
-XX:OnOutOfMemoryError=/home/solr/on_oom.sh
-XX:+HeapDumpOnOutOfMemoryError
Minimum DevOps Reqts

dachisgroup.com
Nodes will crash! (ephemeral znodes)
Or, sometimes you just need to restart a
JVM (rolling restarts to upgrade)
Peer sync via update log (tlog)
100 updates else ...
Good ol’ Solr replication from leader to
replica
Node recovery

dachisgroup.com
• Moving to a near real-time streaming model using Storm
• Buying more RAM per node
• Looking forward to shard splitting as it has
become difficult to re-index 600M docs
• Re-building the index with DocValues
• We've had shards get out of sync after major failure –
resolved it by going back to raw data and doing a key by key
comparison of what we expected to be in the index and re-indexing
any missing docs.
• Custom hashing to put all docs for a specific brand in the same
shard
Roadmap / Futures

dachisgroup.com
If you find yourself in this
situation, buy more RAM!
Obligatory lolcats slide

CONTACT
Timothy Potter
thelabdude@gmail.com
twitter: @thelabdude

Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics

Similar to Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics (20)

Recently uploaded

Recently uploaded (20)

Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics

Editor's Notes