SOLR Power FTW: short version


Published on

Overview of SOLR deployment at Bazaarvoice by @to

1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

SOLR Power FTW: short version

  1. 1. Solr Power FTW Robby Morgan
  2. 2. What Will I Cover?● Who I am● Why/how we use SOLR● Can SOLR handle 20K queries per second?● Lessons learned: large scale multi data center deployment● Conclusion
  3. 3. Robby Morgan● Software Engineering Lead● 3 yrs experience w/ Solr @ Bazaarvoice● Jack of all trades, master of some :)
  4. 4. Bazaarvoice● Bazaarvoice is a software as a service company powering User Generated Content such as ratings and reviews on thousands of web sites● 5 billion page views per month● 230 billion impressions● 75 million UGC
  5. 5. SOLR Case Study
  6. 6. SOLR Case Study
  7. 7. Life Before SOLR● Indexes for sorting and filtering● Aggregate tables for stats● Nightly jobs● Bugs...
  8. 8. Enter SOLR● Index content and product catalog● De-normalization● Filtering, faceting/stats and sorting● Index every 15 minutes (20 seconds NRT)
  9. 9. SOLR usage @ Bazaarvoice Documents 250 MM Index size 200 GB QPS, avg 2,350 QPS, max 10,200 Response time, avg 12 ms Servers 6+20
  10. 10. SOLR Cloud @ Bazaarvoice● Multiple cores (100+ per server)● Re-balance indexes across cores and servers ○ Automatic ○ Manual● Deployment map stored in MySQL ○ Host - Core - Partition ○ Statistics● Partition lifecycle
  11. 11. Scaling reads - Replication
  12. 12. Replication - Multiple Data Centers
  13. 13. Lessons: SOLR Performance● SOLR loves RAM! SOLR● Simulate and measure ○ Same config, same hardware● Get the most out of one instance
  14. 14. Lessons: Cross-DC ReplicationChatty if using multiple coresRelay ● Core auto-warming disabled ● Connection wait and read timeouts increased ● Replication poll interval increased (15 min) ● Compression enabled...<str name="httpConnTimeout">20000</str><str name="httpReadTimeout">65000</str><str name="pollInterval">00:15:00</str><str name="compression">internal</str>...
  15. 15. Performance Tuning ● Heap size ● Cache sizing ● Auto-warming ● Stored fields ● Merge factor ● Commit frequency ● Optimize frequencyProcess: Simulate and measure ● Replay logs ● Analyze metrics ● Monitor GC
  16. 16. Performance Tuning - GC# Java memory usage settings# Force the NewSize to be larger than the JVM typically allocates.# In practice, the JVM has been allocating an extremely small Young generationwhich objects to be prematurely promoted to the Tenured generationJAVA_MEM_OPTS="-Xms27g -Xmx27g -XX:NewRatio=8"# -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps --> Turn on GC Logging# -XX:+UseConcMarkSweepGC --> Use the concurrent collector# -XX:+CMSIncrementalMode --> Incremental mode for the concurrent collector# -XX:+CMSIncrementalPacing --> Let the JVM adjust the amount of incrementalcollectionJAVA_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=55 -XX:ParallelGCThreads=8 -XX:SurvivorRatio=4"
  17. 17. Lessons: Schema Changes● Re-indexing is time consuming for large indexes● Process: 1. Full re-index off-line prior to the release 2. Incremental indexing after the release● Bottleneck: reading from MySQL● Goal: Transparent on-line re-indexing
  18. 18. Conclusion - SOLR Strengths● Not only full-text search● Lightning fast given enough RAM● Good scale out support including multi-data center● Great community, wide range of use cases
  19. 19. Conclusion - SOLRs Gaps● Not fully elastic● Real time takes work● Secondary data store = sync overhead, inconsistencies● Schema changes
  20. 20. Questions robby.morgan@