Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics



My presentation focuses on how we implemented Solr 4 as the cornerstone of our social marketing analytics platform. Our platform analyzes relationships, behaviors, and conversations between 30,000 brands and 100M social accounts every 15 minutes. Combined with our Hadoop cluster, we have achieved throughput rates greater than 8,000 documents per second. Our index currently contains more than 620M documents and is growing by 3 to 4 million documents per day. My presentation will include details about: 1) designing a SolrCloud cluster for scalability and high availability using sharding and replication with Zookeeper, 2) operational concerns like monitoring and how to handle a failed node, 3) how we index big data from Pig/Hadoop, as an example of using the CloudSolrServer in SolrJ and managing searchers for high indexing throughput, and 4) example uses of key features like real-time gets, atomic updates, custom hashing, and distributed facets. Attendees will come away from this presentation with a real-world use case proving that Solr 4 is scalable, stable, and production ready.

Speaker notes
  • Image: Survey or poll theme
  • Discuss SolrCloud concepts in the context of a real-world application. Feel free to contact me with additional questions (or better yet, post to the Solr user mailing list)
  • Solr is a fundamental part of our infrastructure
  • Not petabyte scale, but we do deep analytics on brand-related data harvested from social networks. Screen from our Advocate reporting interface, which uses Solr to compute analytics and find signals from brand advocates
  • We use atomic updates, real-time GET, custom ValueSources, PostFilter, and CloudSolrServer for high-throughput indexing from Hadoop. We upgraded from Solr 4.1 to 4.2 in production without any downtime: we did a rolling restart, made sure at least one host per shard was online at all times, and did this in between index updates. Technical details: we use MMapDirectory, which keeps our JVM heaps small(ish), and a small filterCache
  • Solr requires both scaling out and scaling up – you must have fast disks, fast CPUs, and lots of RAM. Not quite there yet on simplicity and elasticity, but SolrCloud has come a very long way in short order
  • There’s some new terminology, and some old concepts, like master/slave, don’t apply anymore
  • Zookeeper gives the impression that SolrCloud is somehow complex. In practice it is a low-overhead technology once it is set up – we’ve had 100% uptime on Zookeeper (knock on wood)
  • We tended to overshard to allow for growth, but that can be expensive. Currently you must set the number of shards for a collection when bootstrapping the cluster; this will be less of a problem with Solr 4.3 and beyond, which adds shard splitting. Even if you have small documents with only a few fields, if you have 100's of millions of these documents, you can benefit from sharding. Think about this in terms of sorting: imagine a query that matches 10M documents with a custom sort criterion to return documents sorted by date in descending order. Solr has to sort all 10M matching hits just to produce a page size of 10, and the sort alone requires roughly 80 megabytes of memory. However, if your 100M docs are split across 10 shards, then each shard sorts roughly 1M docs in parallel. There is a little overhead in that Solr needs to re-sort the top 10 hits per shard (100 total), but it should be obvious that sorting 1M documents on 10 nodes in parallel will always be faster than sorting 10M documents on a single node.
  • Remember – all objects in your caches must be invalidated when you open a new searcher; using overly large caches and a huge max heap can lead to high GC overhead
  • The smart client knows the current leaders by asking Zookeeper, but doesn’t yet know which leader to assign the doc to (that is planned, though). The node accepting the new document computes a hash on the document ID to determine which shard to assign the doc to. The new document is sent to the appropriate shard leader, which sets the _version_ and indexes it. The leader saves the update request to durable storage in the update log (tlog), then sends the update to all active replicas in parallel. Commits are sent around the cluster. An interesting question is how batches of updates from the client are handled
  • In our environment, the query controller selects 18 of 36 possible nodes to query. Warming queries should not be distributed; distrib=false gets set automatically
  • Quick case study about real time get and atomic updates.
  • We update millions of signals every day using atomic updates; all fields must be stored. So if on May 2 we want to update the daily count value, we get the document by ID, merge the updated value into the existing list, and then re-index just the updated fields plus the _version_ field. Optimistic locking allows our update code to run whenever, with less coordination – we update other fields on docs from different workflows at different times. Solr uses the special _version_ field to support optimistic locking. We merge the updated daily value into an existing multi-valued field; we can’t just append, because we compute the daily volume multiple times per day. The document doesn’t have to be committed. The _version_ semantics: if the _version_ field is set to a value > 1, the versions must match or the update fails; if set to 1, the document must exist; if set to a value < 0, the document must not exist; if set to 0, no concurrency control is desired and the existing value is overwritten
  • Work closely with the UI team to understand query needs and design queries together; this helps avoid creating inefficient queries and helps identify criteria that should be included in warming queries. There's a lot of power – and potential to cause problems at scale – in queries
  • Hmmm ... anyone know a sys admin that dresses like that?
  • We upgraded from 4.1 to 4.2 without any downtime or manual intervention – all automated
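The _version_ rules described in the notes above can be sketched as a small stand-alone simulation. This is plain Java, not SolrJ; the class and method names are ours, purely for illustration of the four cases:

```java
// Illustrative simulation (not SolrJ) of Solr's optimistic-locking rules
// for the _version_ value sent with an update:
//   _version_ >  1 : versions must match or the update fails
//   _version_ == 1 : the document must exist
//   _version_ <  0 : the document must not exist
//   _version_ == 0 : no concurrency control; existing value is overwritten
public class VersionCheck {
    // existingVersion is null when the document is not in the index
    static boolean updateAllowed(Long existingVersion, long requestedVersion) {
        if (requestedVersion > 1) {
            return existingVersion != null && existingVersion == requestedVersion;
        }
        if (requestedVersion == 1) return existingVersion != null;
        if (requestedVersion < 0)  return existingVersion == null;
        return true; // requestedVersion == 0: last write wins
    }

    public static void main(String[] args) {
        Long stored = 1434060000000L; // doc's current _version_ in the index
        System.out.println(updateAllowed(stored, stored)); // versions match -> allowed
        System.out.println(updateAllowed(stored, 42L));    // stale version -> rejected
        System.out.println(updateAllowed(null, -1L));      // insert-only, doc absent -> allowed
        System.out.println(updateAllowed(stored, 0L));     // overwrite -> allowed
    }
}
```

In the real system the version check happens inside Solr on the shard leader; a rejected update surfaces to the client as a version-conflict error, which the update code can handle by re-fetching the document and retrying.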

    1. Scaling Solr 4 to Power Big Search in Social Media Analytics
       Timothy Potter, Architect, Big Data Analytics, Dachis Group / Co-author, Solr in Action
    2. Audience poll (® 2011 Dachis Group, dachisgroup.com)
       • Anyone running SolrCloud in production today?
       • Who is running a pre-Solr 4 version in production?
       • Who has fired up Solr 4.x in SolrCloud mode?
       • Personal interest – who has purchased Solr in Action in MEAP?
    3. Goals of this talk
       • Gain insights into the key design decisions you need to make when using SolrCloud – wish I knew back then ...
       • Solr 4 feature overview in context: Zookeeper, distributed indexing, distributed search, real-time GET, atomic updates
       • A day in the life ...: day-to-day operations; what happens if you lose a node?
    4. About Dachis Group
       Our business intelligence platform analyzes relationships, behaviors, and conversations between 30,000 brands and 100M social accounts every 15 minutes.
    5. (image slide: screenshot of the Advocate reporting interface)
    6. Solution Highlights
       • In production on 4.2.0
       • 18 shards: ~33M docs / shard, 25 GB on disk per shard
       • Multiple collections: ~620 million docs in the main collection (still growing); ~100 million docs in the 30-day collection
       • Inherent parent/child relationships (tweets and re-tweets)
       • ~5M atomic updates to existing docs per day
       • Batch-oriented updates: docs come in bursts from Hadoop at 8,000 docs/sec; 3-4M new documents per day (deletes too)
       • Business intelligence UI, low(ish) query volume
    7. Pillars of my ideal search solution
       • Scalability: scale-out via sharding and replication; a little scale-up too – fast disks (SSD) and lots of RAM!
       • High availability: redundancy via multiple replicas per shard; automated fail-over via automated leader election
       • Consistency: distributed queries must return consistent results; accepted writes must be on durable storage
       • Simplicity (wip): self-healing; easy to set up and maintain; able to troubleshoot
       • Elasticity (wip): add more replicas per shard at any time; split large shards into two smaller ones
    8. Nuts and Bolts – nice tag cloud!
    9. Zookeeper in a nutshell
       1. Zookeeper needs at least 3 nodes to establish quorum with fault tolerance. Embedded Zookeeper is only for evaluation purposes; you need to deploy a stand-alone ensemble for production
       2. Every Solr core creates ephemeral “znodes” in Zookeeper which automatically disappear if the Solr process crashes
       3. Zookeeper pushes notifications to all registered “watchers” when a znode changes; Solr caches cluster state
       4. Zookeeper provides “recipes” for solving common problems faced when building distributed systems, e.g. leader election
       5. Zookeeper provides centralized configuration distribution, leader election, and cluster state notifications
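A stand-alone three-node ensemble of the kind the slide calls for is configured with a zoo.cfg along these lines (a minimal sketch; hostnames and paths are placeholders, not from the talk):

```
# zoo.cfg — identical on each Zookeeper node
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# one server.N line per ensemble member
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each node additionally needs a `myid` file in its dataDir containing its own N, and Solr is pointed at the ensemble with a connection string listing all members, e.g. `-DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181`.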
    10. Number of shards?
       • Number and size of indexed fields
       • Number of documents
       • Update frequency
       • Query complexity
       • Expected growth
       • Budget
       Yay for shard splitting in 4.3 (SOLR-3755)!
    11. Index Memory Management
       We use Uwe Schindler’s advice on 64-bit Linux:
       <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
       Keep the JVM heap small(ish): -Xmx4g ... (hint: the rest of our RAM goes to the OS to load the index via memory-mapped I/O)
       Small cache sizes with aggressive eviction – spread the GC penalty out over time vs. all at once every time you open a new searcher:
       <filterCache class="solr.LFUCache" size="50" initialSize="50" autowarmCount="25"/>
    12. Leader = Replica + Add’l Work
       • Not a master
       • The leader is a replica (handles queries)
       • Accepts update requests for the shard
       • Increments the _version_ on the new or updated doc
       • Sends updates (in parallel) to all replicas
    13. Distributed Indexing
       • Don’t let your tlogs get too big – use “hard” commits with openSearcher=false:
         <autoCommit><maxDocs>10000</maxDocs><maxTime>60000</maxTime><openSearcher>false</openSearcher></autoCommit>
       • (Diagram: view of cluster state from Zk – 2 shards with 1 replica each across Node 1 and Node 2.) The CloudSolrServer “smart client” gets the URLs of the current leaders from Zookeeper, a hash on the docID picks the shard, the shard leader sets the _version_ and writes to its tlog, then forwards the update to the replicas
       • 8,000 docs / sec to 18 shards
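The “hash on docID” routing step can be illustrated with a simplified stand-alone sketch. Solr 4 actually uses a murmur hash over the document ID and assigns each shard a contiguous range of the hash space; here we substitute Java's built-in `String.hashCode` purely to show the range-partitioning idea (the class and method are ours, not Solr code):

```java
// Simplified sketch of routing a document ID to a shard by hash range.
// NOT Solr's actual hash (Solr 4 uses murmur3); same partitioning idea.
import java.util.List;

public class ShardRouter {
    static int shardFor(String docId, int numShards) {
        // Treat the 32-bit hash as unsigned and split the full hash space
        // into numShards contiguous ranges, one per shard.
        long hash = docId.hashCode() & 0xffffffffL;
        long rangeSize = (1L << 32) / numShards;
        return (int) Math.min(hash / rangeSize, numShards - 1);
    }

    public static void main(String[] args) {
        int numShards = 18;
        for (String id : List.of("tweet-1001", "tweet-1002", "tweet-1003")) {
            // Deterministic: the same ID always routes to the same shard,
            // which is what makes real-time GET by ID possible.
            System.out.println(id + " -> shard" + shardFor(id, numShards));
        }
    }
}
```

The determinism is the point: any node can compute the owning shard from the ID alone, with no central lookup per document.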
    14. Distributed search
       • Send the query request to any node
       • Two-stage process:
         1. The query controller sends the query to all shards and merges the results (one host per shard must be online or queries fail)
         2. The query controller sends a 2nd query to all shards that have documents in the merged result set, to get the requested fields
       • Solr client applications built for 3.x do not need to change (our query code still uses SolrJ 3.6)
       • Limitations: JOINs / grouping need custom hashing
       • (Diagram: CloudSolrServer gets the URLs of all live nodes from Zookeeper and sends q=*:* to a query controller – or just a load balancer works too)
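Stage 1's merge of per-shard results can be sketched as a standard k-way merge with a priority queue. This is a toy stand-in with hard-coded, already-sorted shard results, not Solr's actual merge code, but it shows why each shard only needs to return its own top N (the parallel-sorting argument from the notes above):

```java
// Sketch of stage 1 of a distributed query: each shard returns its own
// top-N list (already sorted descending), and the query controller merges
// them into a single global top-N with a max-heap over the list heads.
import java.util.*;

public class TopKMerge {
    static List<Integer> mergeTopK(List<List<Integer>> shardResults, int k) {
        // Heap entries: {value, shardIndex, positionInShardList}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(b[0], a[0]));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty())
                heap.add(new int[]{shardResults.get(s).get(0), s, 0});
        }
        List<Integer> merged = new ArrayList<>();
        while (merged.size() < k && !heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(top[0]);
            List<Integer> shard = shardResults.get(top[1]);
            if (top[2] + 1 < shard.size())
                heap.add(new int[]{shard.get(top[2] + 1), top[1], top[2] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in scores from three shards, each sorted descending:
        List<List<Integer>> shards = List.of(
            List.of(99, 70, 12), List.of(95, 80, 5), List.of(90, 60, 50));
        System.out.println(mergeTopK(shards, 5)); // [99, 95, 90, 80, 70]
    }
}
```

Each shard sorts its own 1M docs in parallel; the controller only re-sorts k entries per shard, which is the small overhead the speaker notes mention.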
    15. Search by daily activity volume
       Drive analysis that measures the impact of a social message over time ... A company posts a tweet on Monday; how much activity is there around that message on Thursday?
    16. Atomic updates and real-time get
       Problem: find all documents that had activity on a specific day
       • tweets that had retweets, or YouTube videos that had comments
       • Use Solr join support to find parent documents by matching on child criteria:
         fq=_val_:"{!join from=echo_grouping_id_s to=id}day_tdt:[2013-05-01T00:00:00Z TO 2013-05-02T00:00:00Z}" ...
       ... but joins don’t work in distributed queries and are probably too slow anyway.
       Solution: index daily activity into multi-valued fields. Use real-time GET to look up the document by ID and get the current daily volume fields:
         fq=daily_volume_tdtm:(2013-05-02)
         sort=daily_vol(daily_volume_s,2013-04-01,2013-05-01)+desc
       daily_volume_tdtm: [2013-05-01, 2013-05-02] <= doc has child signals on May 1 and 2
       daily_volume_ssm: 2013-05-01|99, 2013-05-02|88 <= stored-only field; doc had 99 child signals on May 1, 88 on May 2
       daily_volume_s: 13050288|13050199 <= flattened multi-valued field for sorting via a custom ValueSource
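The flattened daily_volume_s value shown on this slide can be reproduced with a small sketch. The exact encoding is our inference from the slide's example, not spelled out in the talk: a yyMMdd date prefix with that day's count appended, entries joined by "|", most recent day first:

```java
// Sketch (our inference from the slide's example) of flattening per-day
// counts into the single sortable daily_volume_s string "13050288|13050199".
import java.util.*;

public class DailyVolumeFlatten {
    // countsByDay maps "yyyy-MM-dd" -> count, sorted ascending by date
    static String flatten(SortedMap<String, Integer> countsByDay) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countsByDay.entrySet()) {
            // "2013-05-01" -> "130501", then append the day's count
            String yymmdd = e.getKey().substring(2).replace("-", "");
            parts.add(yymmdd + e.getValue());
        }
        Collections.reverse(parts); // most recent day first
        return String.join("|", parts);
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        counts.put("2013-05-01", 99); // 99 child signals on May 1
        counts.put("2013-05-02", 88); // 88 child signals on May 2
        System.out.println(flatten(counts)); // 13050288|13050199
    }
}
```

On an atomic update, the real workflow would real-time GET the doc, merge the new day's count into this structure, and re-index just the changed fields plus _version_, as the speaker notes describe.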
    17. Lessons learned
       • Will it work? Definitely!
       • Search can be addictive for your organization – the queries we tested 6 months ago vs. what we have today are vastly different
       • Buy RAM – OOMs and aggressive garbage collection cause many issues
       • Give the RAM from ^ to the OS – MMapDirectory
       • You need a disaster recovery process in addition to SolrCloud replication; it helps with migrating to new hardware too
       • Use Jetty ;-)
       • Store all fields! Atomic updates are a life saver
    18. Lessons learned cont.
       • Schema will evolve – we thought we understood our data model but have since added at least 10 new fields and deprecated some too
       • Partition if you can! e.g. our 30-day collection
       • We don’t optimize – segment merging works great
       • Size your staging environment so that shards have about as many docs and the same resources as prod. I have many more nodes in prod, but my staging servers have roughly the same number of docs per shard, just fewer shards.
       • Don’t be afraid to customize Solr! It’s designed to be customized with plug-ins: ValueSource is very powerful; check out PostFilters:
         {!frange l=1 u=1 cost=200 cache=false}imca(53313,employee)
    19. DevOps Reqts
       • Backups: .../replication?command=backup&location=/mnt/backups
       • Monitoring: are replicas serving queries? do all replicas report the same number of docs? Zookeeper health; new searcher warm-up time
       • Configuration update process: our solrconfig.xml changes frequently – see Solr’s ...
       • Upgrade Solr process (it’s moving fast right now)
       • Recover failed replica process
       • Add new replica
       • Kill the JVM on OOM (from Mark Miller): -XX:OnOutOfMemoryError=/home/solr/...
    20. Node recovery
       • Nodes will crash! (ephemeral znodes)
       • Or sometimes you just need to restart a JVM (rolling restarts to upgrade)
       • Peer sync via the update log (tlog) if within the last 100 updates; else ...
       • Good ol’ Solr replication from leader to replica
    21. Roadmap / Futures
       • Moving to a near real-time streaming model using Storm
       • Buying more RAM per node
       • Looking forward to shard splitting, as it has become difficult to re-index 600M docs
       • Re-building the index with DocValues
       • We’ve had shards get out of sync after a major failure – we resolved it by going back to the raw data, doing a key-by-key comparison against what we expected to be in the index, and re-indexing any missing docs
       • Custom hashing to put all docs for a specific brand in the same shard
    22. Obligatory lolcats slide
       If you find yourself in this situation, buy more RAM!
    23. Contact
       Timothy Potter
       thelabdude@gmail.com
       twitter: @thelabdude