SF ElasticSearch Meetup 2012.10.03

Some thoughts on scaling ElasticSearch, especially related to index building and optimizing for query performance.

  • Collect information on over 1B users internationally: text copied from over 600K publisher sites, images, searches, pages visited. Different slices of data, now!

    1. Scaling ElasticSearch
       SF Meetup, 2012.10.03
       Sushant Shankar, sushant.shankar@33across.com
    2. Agenda
       • Why we need a search engine
       • Monitoring
       • Index Building
       • Query Performance
    3. Who is asdfas
       • >600,000 publishers
       • Machine Learning and Graph algorithms to:
         - Build advertising segments
         - Extract insights out of social and interest data
         - Target via high-performance distributed systems that integrate with our advertising partners
       Website | Facebook | Twitter
    4. Why we really need a search engine
       • Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.)
       • … …
    5. INDEX BUILDING: 1 WEEK → 3 HOURS
    6. Mappers to build index
       • 6 nodes, 24GB RAM (16GB for the ES service), 4 cores, 3x 1.5TB drives
       • >1TB per index
       • Build the index (replicated) using an MR (MapReduce) job and the Bulk API
       • ~300M documents, ~5KB per document
       • ~3 hours
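       The slide's figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses only numbers quoted in the deck (~300M documents, ~5KB each, ~3 hours, and 10,000 documents per bulk request from the Index Building learnings), treats them as exact even though they are approximate, and assumes decimal units (1KB = 1,000 bytes). It describes raw source data, not on-disk index size.

```python
"""Back-of-the-envelope numbers for the index build on the slide.

All inputs are the approximate figures quoted in the deck, treated as exact.
Decimal units are assumed (1KB = 1,000 bytes, 1TB = 10**12 bytes).
"""

DOCS = 300_000_000          # ~300M documents
DOC_SIZE_BYTES = 5_000      # ~5KB per document
BUILD_SECONDS = 3 * 3600    # ~3 hours
DOCS_PER_BULK = 10_000      # documents per bulk request (learnings slide)


def build_stats(docs: int, doc_size: int, seconds: int, docs_per_bulk: int) -> dict:
    """Return derived totals and rates for a bulk index build."""
    total_bytes = docs * doc_size
    requests = -(-docs // docs_per_bulk)  # ceiling division
    return {
        "total_tb": total_bytes / 1e12,
        "docs_per_sec": docs / seconds,
        "mb_per_sec": total_bytes / seconds / 1e6,
        "bulk_requests": requests,
        "requests_per_sec": requests / seconds,
    }


if __name__ == "__main__":
    stats = build_stats(DOCS, DOC_SIZE_BYTES, BUILD_SECONDS, DOCS_PER_BULK)
    for key, value in stats.items():
        print(f"{key:>16}: {value:,.2f}")
```

       With these inputs the sketch gives about 1.5TB of source data (in line with the ">1TB per index" note), roughly 27,800 documents/s or about 139MB/s, and 30,000 bulk requests, i.e. fewer than 3 requests per second across the whole cluster.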
    7. Monitoring: Zabbix
    8. Monitoring: SPM
    9. Parameter Optimization
       Amount bulk indexed | Time taken | CPU util. | Mem util. | Disk I/O | Network | # Shards
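       The table pairs the knobs being tuned (amount bulk indexed, number of shards) with the resources watched while tuning them. One way to organise such experiments is a small grid search: run one trial per combination, record time and resource use, and keep the fastest run that stays under a resource limit. This is a sketch under assumptions, not the author's procedure: `run_trial` is a hypothetical callback you would implement against your own cluster and monitoring (e.g. Zabbix or SPM), the 90% CPU cap is arbitrary, and `fake_trial` is a made-up cost model for demonstration only.

```python
"""Grid search over bulk size and shard count (hypothetical harness)."""
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Trial:
    bulk_docs: int     # documents per bulk request
    shards: int        # number of primary shards
    seconds: float     # wall-clock time for the trial build
    cpu_util: float    # peak CPU utilisation, 0.0-1.0
    mem_util: float    # peak memory utilisation, 0.0-1.0


def grid_search(
    bulk_sizes: Iterable[int],
    shard_counts: Iterable[int],
    run_trial: Callable[[int, int], Trial],
    max_cpu: float = 0.9,
) -> tuple[list[Trial], Optional[Trial]]:
    """Run one trial per (bulk size, shard count) pair.

    Returns every trial plus the fastest one whose peak CPU stayed at or
    below max_cpu (None if no trial qualifies).
    """
    trials = [run_trial(b, s) for b, s in product(bulk_sizes, shard_counts)]
    eligible = [t for t in trials if t.cpu_util <= max_cpu]
    best = min(eligible, key=lambda t: t.seconds, default=None)
    return trials, best


def fake_trial(bulk_docs: int, shards: int) -> Trial:
    """Made-up cost model standing in for a real trial build; demo only."""
    seconds = 1000.0 / shards + 50_000.0 / bulk_docs
    cpu = min(1.0, 0.1 * shards)
    return Trial(bulk_docs, shards, seconds, cpu, mem_util=0.5)


if __name__ == "__main__":
    trials, best = grid_search([1_000, 10_000], [4, 8, 12], fake_trial)
    for t in trials:
        print(t)
    print("best:", best)
```

       With this toy cost model the 12-shard runs are fastest but exceed the CPU cap, so the search settles on 10,000 documents per request with 8 shards. Real numbers come only from measuring your own cluster.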
    10. Index Building: Learnings
       • Bulk API
       • No replicas
       • 2 shards / CPU
       • 10,000 documents (users) per indexing request
       • Refresh off (index.refresh_interval = -1)
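       These learnings map onto index settings and a request format. The sketch below prepares, as plain Python data, (a) index-creation settings with zero replicas, refresh disabled, and two primary shards per CPU core, and (b) newline-delimited bulk bodies of 10,000 documents each. It only builds JSON; sending it to the `_bulk` endpoint with an HTTP client is left out. Reading "2 shards / CPU" as per core across the whole cluster is an assumption, as are the index name `users` and the hypothetical `id` field used for `_id`. Elasticsearch releases of the talk's era also expected a `_type` in each action line, so the helper accepts one optionally.

```python
"""Build-mode index settings and bulk request bodies (sketch)."""
import json
from typing import Any, Iterable, Iterator, Optional

DOCS_PER_REQUEST = 10_000   # batch size from the slide


def build_settings(total_cores: int, shards_per_core: int = 2) -> dict[str, Any]:
    """Index-creation body for a bulk build: no replicas, refresh disabled."""
    return {
        "settings": {
            "index": {
                "number_of_shards": total_cores * shards_per_core,
                "number_of_replicas": 0,     # add replicas after the build
                "refresh_interval": "-1",    # -1 disables periodic refresh
            }
        }
    }


def bulk_bodies(
    docs: Iterable[dict[str, Any]],
    index: str,
    doc_type: Optional[str] = None,
    batch_size: int = DOCS_PER_REQUEST,
) -> Iterator[str]:
    """Yield newline-delimited bulk bodies holding batch_size docs each.

    Each document contributes an action line and a source line, and every
    body ends with a newline. Assumes each doc has an "id" field (hypothetical).
    """
    lines: list[str] = []
    count = 0
    for doc in docs:
        meta: dict[str, Any] = {"_index": index, "_id": doc["id"]}
        if doc_type is not None:
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:  # final, partially filled batch
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 6 nodes x 4 cores, as on the cluster slide
    print(json.dumps(build_settings(total_cores=6 * 4), indent=2))
    sample = [{"id": i, "name": f"user{i}"} for i in range(3)]
    print(next(bulk_bodies(sample, "users")), end="")
```

       Streaming batches from a generator keeps memory flat in each mapper: only one 10,000-document body is held at a time.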
    11. QUERY PERFORMANCE: 5 MINUTES → 10 SECONDS
    12. Query Performance: Learnings
       • 1-2 replicas (also for reliability)
       • Turn refresh on again (5s default)
       • Warm-up effect (Index Warmup API, 0.20+)
       • Optimize API
       • Simulate multiple users
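       Switching a freshly built index from build mode to query mode can be written down as a short list of REST calls. The sketch below returns them as `(method, path, body)` tuples without sending anything. It assumes the 0.x-era `_optimize` endpoint (later renamed `_forcemerge`) and its `max_num_segments` parameter; the index name, the segment target, and the order (merge first, then add replicas, on the assumption that new replicas then copy already-merged segments) are illustrative choices the slide does not specify.

```python
"""REST calls to move an index from build mode to query mode (sketch)."""
from typing import Any, Optional

Call = tuple[str, str, Optional[dict[str, Any]]]


def query_mode_plan(
    index: str,
    replicas: int = 1,
    refresh: str = "5s",
    max_num_segments: Optional[int] = None,
) -> list[Call]:
    """Return (method, path, body) calls for a bulk-built index."""
    # Merge segments first; 0.x-era endpoint, later renamed _forcemerge.
    path = f"/{index}/_optimize"
    if max_num_segments is not None:
        path += f"?max_num_segments={max_num_segments}"
    plan: list[Call] = [("POST", path, None)]
    # Re-enable replicas (1-2 per the slide) and periodic refresh.
    plan.append((
        "PUT",
        f"/{index}/_settings",
        {"index": {"number_of_replicas": replicas, "refresh_interval": refresh}},
    ))
    return plan


if __name__ == "__main__":
    for method, path, body in query_mode_plan("users", max_num_segments=5):
        print(method, path, body or "")
```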
    13. Warm Up: load into memory and cache
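       The warm-up effect and the advice to simulate multiple users can be combined in a small client-side load test: issue each query once so caches fill, then replay the workload from several threads and look at latency percentiles rather than a single timing. Elasticsearch 0.20's index warmers run registered searches on the server side; this loop only approximates that from the client. `run_query` is a hypothetical callable standing in for whatever client call you use, and the user count and nearest-rank p95 are illustrative choices.

```python
"""Warm up, then replay queries from several simulated users (sketch)."""
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def _timed(run_query: Callable[[str], object], query: str) -> float:
    """Latency of one call to run_query, in seconds."""
    start = time.perf_counter()
    run_query(query)
    return time.perf_counter() - start


def simulate_users(
    run_query: Callable[[str], object],
    queries: Sequence[str],
    users: int = 8,
    warmup_rounds: int = 1,
) -> dict[str, float]:
    """Issue every query warmup_rounds times (results ignored), then run each
    query once per simulated user on `users` threads and summarise latency."""
    if not queries or users < 1:
        raise ValueError("need at least one query and one user")
    for _ in range(warmup_rounds):
        for q in queries:
            run_query(q)  # warm-up only: fills caches, timing not recorded
    workload = [q for q in queries for _ in range(users)]
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = sorted(pool.map(lambda q: _timed(run_query, q), workload))
    rank = math.ceil(0.95 * len(latencies))  # nearest-rank 95th percentile
    return {
        "count": float(len(latencies)),
        "median": statistics.median(latencies),
        "p95": latencies[rank - 1],
        "max": latencies[-1],
    }


if __name__ == "__main__":
    cache: set[str] = set()

    def fake_query(q: str) -> str:
        """Stand-in for a search call: slow the first time, fast once cached."""
        if q not in cache:
            time.sleep(0.05)
            cache.add(q)
        return q

    print(simulate_users(fake_query, ["hiking", "cooking"], users=4))
```

       Comparing runs with `warmup_rounds=0` and `warmup_rounds=1` against a real cluster is one way to see how much of the first-query latency is cold caches.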
    14. Other cool features
       • Custom scoring functions
       • Scripts: MVEL, Python
       • Facets
       • Exploring:
         - Real-time indexing
         - Indexing images, files, etc.
         - Parent-child relationships
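       To illustrate two of these features together, the sketch below builds a search body that combines a `custom_score` query, whose script multiplies the relevance score by a numeric field, with a `terms` facet. The script is MVEL, the default script language in the 0.x releases. The field names `popularity` and `interest` are hypothetical. Both constructs belong to the Elasticsearch versions current at the time of the talk; later releases replaced `custom_score` with `function_score` and facets with aggregations.

```python
"""Search body with custom scoring and a terms facet (0.x-era syntax)."""
import json
from typing import Any


def scored_faceted_search(
    text: str, boost_field: str, facet_field: str, size: int = 10
) -> dict[str, Any]:
    """Full-text query rescored by a numeric field, plus a terms facet.

    Field names are interpolated into the script, so they must be trusted.
    """
    return {
        "query": {
            "custom_score": {
                "query": {"query_string": {"query": text}},
                # MVEL: multiply the relevance score by the field's value
                "script": f"_score * doc['{boost_field}'].value",
            }
        },
        "facets": {
            facet_field: {"terms": {"field": facet_field, "size": size}}
        },
    }


if __name__ == "__main__":
    body = scored_faceted_search("hiking", "popularity", "interest")
    print(json.dumps(body, indent=2))
```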
    15. QUERIES?
    16. Index Building over time
