Configuring Elasticsearch for Performance and Scale

The contents are based on experience shared by engineers from The Guardian, Datadog, and Captora, and by the Elasticsearch team itself.

  1. Configuring Elasticsearch for Performance and Scale
     Based on the knowledge gained from attending the Elasticsearch webinar on 30th September 2014.
     Prepared by: Bharvi Narayan Dixit, Software Engineer, Orkash Services Pvt. Ltd.
  2. Contents
     • The Elasticsearch Open Source Model
     • The Popularity of Elasticsearch
     • Insights across The Guardian
     • Ophan: the real-time analytics tool
     • Datadog’s Elasticsearch Story
     • What Datadog’s event dashboards look like
     • Elasticsearch use @ Captora
     • Captora’s dashboard and its architecture
     • Webinar poll on the types of infrastructure used for Elasticsearch
  3. The Elasticsearch Open Source Model
  4. The Popularity of Elasticsearch
     10M downloads in 2 years and counting.
  5. Insights across The Guardian
     • A large portion of The Guardian’s business relies on Elasticsearch to understand how its content is being consumed.
     • Before Ophan, The Guardian used a traditional analytics package that had a four-hour lag and many restrictions.
     • ~40M documents are processed per day, and 360M documents can easily be queried.
     • Real-time traffic analysis of each piece of content lets the organization see audience engagement.
     • The cluster is easy to scale (by adding capacity) whenever a new feature puts stress on Elasticsearch.
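The daily figures above translate into a per-second indexing load, which is the number that matters when sizing a cluster. A back-of-envelope sketch, assuming ingest is spread evenly over the day (real traffic peaks well above the average). The 360M / 40M ratio is plain arithmetic, not a statement about The Guardian's actual retention policy.

```python
# Back-of-envelope arithmetic on the Guardian figures (~40M docs/day, 360M queryable).
# Assumption: ingest is spread evenly over 24 hours; real traffic is bursty.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_ingest_rate(docs_per_day: float) -> float:
    """Average documents indexed per second for a given daily volume."""
    return docs_per_day / SECONDS_PER_DAY


def days_of_ingest(queryable_docs: float, docs_per_day: float) -> float:
    """How many days of ingest, at the given daily rate, the queryable set equals."""
    return queryable_docs / docs_per_day


if __name__ == "__main__":
    rate = average_ingest_rate(40_000_000)
    print(f"~{rate:.0f} docs/sec on average")  # ~463 docs/sec
    print(f"{days_of_ingest(360_000_000, 40_000_000):.0f} days of ingest")  # 9 days
```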
  6. Ophan: the real-time analytics tool created by The Guardian, based on Elasticsearch
  7. Datadog’s Elasticsearch Story
     • Elasticsearch is Datadog’s primary data store for events/logs.
     • Before Elasticsearch, Postgres was used.
     • Event data is always structured, with the flexibility to add or remove fields as needed.
     • Hundreds of millions of full-text events across 12+ indices.
     • ~10M documents/day, with the volume doubling every 4-5 months.
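With ~10M documents/day doubling every 4-5 months, capacity planning is mostly a compounding calculation. A small projection sketch, assuming growth stays cleanly exponential (the slide gives no guarantee that it will):

```python
def projected_daily_volume(current_per_day: float, months_ahead: float,
                           doubling_months: float) -> float:
    """Project daily document volume under steady exponential growth.

    volume(t) = current * 2 ** (t / doubling_period)
    Assumption: the doubling period stays constant over the horizon.
    """
    return current_per_day * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Bracket the slide's "every 4-5 months" with both ends of the range.
    for doubling in (4, 5):
        one_year = projected_daily_volume(10_000_000, 12, doubling)
        print(f"doubling every {doubling} months: ~{one_year / 1e6:.0f}M docs/day in a year")
```

At a 4-month doubling period, a year out means three doublings, i.e. roughly 80M documents/day, which is the kind of jump that makes the "plan for migration" lesson below concrete.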
  8. First version of the Elasticsearch cluster at Datadog
     • One node per AZ (availability zone), handling both HTTP and data.
     • One large index storing all events from all time.
     • Writes sent to a pool of all nodes in the cluster.
     • Worked well for 1-1.5 years.
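"Writing to a pool of all nodes" usually means the client spreads requests across every node it knows about. The slide does not say how Datadog implemented this; below is a minimal round-robin sketch, with hypothetical host names. The same idea carries over to the later design, where the pool would contain only the head nodes.

```python
import itertools
from typing import Iterable, Iterator


class RoundRobinPool:
    """Cycle requests across a fixed list of node addresses.

    Sketch only: a production client would also detect failed nodes and
    retry on another one, which this class does not do.
    """

    def __init__(self, nodes: Iterable[str]) -> None:
        node_list = list(nodes)
        if not node_list:
            raise ValueError("need at least one node")
        self._cycle: Iterator[str] = itertools.cycle(node_list)

    def next_node(self) -> str:
        """Return the node that should receive the next request."""
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical host names, one node per availability zone.
    pool = RoundRobinPool(["es-az1:9200", "es-az2:9200", "es-az3:9200"])
    print([pool.next_node() for _ in range(4)])
    # ['es-az1:9200', 'es-az2:9200', 'es-az3:9200', 'es-az1:9200']
```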
  9. Faster and more scalable cluster
     • Split the cluster into head nodes and data nodes.
     • Head nodes act as load balancers, accepting the HTTP requests.
     • Data nodes interact only with head nodes and other data nodes.
     • Use rolling indices, each holding one month of event data.
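A rolling index means each event is written to the index for its time bucket (here, one month), and a time-range query searches only the indices that overlap the range. The naming scheme `events-YYYY.MM` below is a hypothetical convention; the slide does not give Datadog's index names.

```python
from datetime import date
from typing import List

INDEX_PREFIX = "events"  # hypothetical prefix; not taken from the slide


def monthly_index(ts: date, prefix: str = INDEX_PREFIX) -> str:
    """Name of the monthly index that an event with timestamp `ts` goes to.

    Accepts a date or a datetime (datetime is a subclass of date).
    """
    return f"{prefix}-{ts.year:04d}.{ts.month:02d}"


def indices_for_range(start: date, end: date,
                      prefix: str = INDEX_PREFIX) -> List[str]:
    """All monthly indices that a query over [start, end] has to search."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return names


if __name__ == "__main__":
    print(monthly_index(date(2014, 9, 30)))  # events-2014.09
    print(",".join(indices_for_range(date(2014, 11, 15), date(2015, 2, 1))))
```

Elasticsearch accepts a comma-separated list of index names in a search URL, so the joined string can scope a query to just those months. Expiring old data then becomes deleting whole monthly indices, which is much cheaper than deleting individual documents from one large index.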
  10. What Datadog’s engineers learned
      • Spend some planning time on sizing before settling on a data format.
        – With a bit of planning, they could have avoided migrating to a rolling index later on.
        – But you can’t plan for everything, so architect deployments with migration in mind.
      • Monitor your Elasticsearch cluster from the beginning.
      • Tooling for backup and restore should be built almost from your first deployment.
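For the "monitor from the beginning" lesson, Elasticsearch exposes cluster state through `GET /_cluster/health`, which returns fields such as `status`, `number_of_nodes`, and `unassigned_shards`. A minimal standard-library sketch follows. The base URL, the expected node count, and the alert wording are assumptions, and this is not Datadog's own tooling.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_cluster_health(base_url: str) -> Dict[str, Any]:
    """GET <base_url>/_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def health_alerts(health: Dict[str, Any], expected_nodes: int) -> List[str]:
    """Turn a _cluster/health response into human-readable alerts.

    The checks here are illustrative assumptions, not recommendations.
    """
    alerts = []
    status = health.get("status")
    if status == "red":
        alerts.append("status red: at least one primary shard is unassigned")
    elif status == "yellow":
        alerts.append("status yellow: some replica shards are unassigned")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        alerts.append(f"only {nodes} of {expected_nodes} nodes in the cluster")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        alerts.append(f"{unassigned} unassigned shards")
    return alerts


if __name__ == "__main__":
    # Offline example; against a live cluster use e.g.
    # health_alerts(fetch_cluster_health("http://localhost:9200"), expected_nodes=3)
    sample = {"status": "yellow", "number_of_nodes": 3, "unassigned_shards": 2}
    for alert in health_alerts(sample, expected_nodes=3):
        print(alert)
```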
  11. What Datadog’s event dashboards look like
  12. What Datadog’s event dashboards look like
  13. What Datadog’s event dashboards look like
      Provides the ability to comment on events and mention peers.
  14. What Datadog’s event dashboards look like
  15. Elasticsearch use @ Captora
      • Captora describes itself as the first marketing cloud solution to automatically expand and optimize marketing campaigns to engage and convert thousands of new buyers.
      • It offers an Adaptive Marketing approach: discovering markets, then engaging and converting new buyers by intelligently and automatically scaling content-driven campaigns across multiple channels (search, advertising, and social).
      • Read more at http://www.captora.com/technology/
  16. Elasticsearch use @ Captora
      At Captora, Elasticsearch is primarily used to:
      • Index all textual data (e.g. crawled multi-channel content streams and user-generated documents).
      • Power the text search, ranking, and relevance calculations of the content recommendation engine.
      • Power the user-portal search over the content stream.
      Elasticsearch stats at Captora:
      • Mostly semi-structured data (e.g. web pages, white papers, YouTube video metadata, LinkedIn updates, blogs, and tweets).
      • ~200M documents, ~300GB of data.
      • Partitioned across ~1200 indices and 2300 shards, with a replication factor of 4.
      • 6 EC2 nodes (c3.2xlarge, provisioned SSD) across two AWS availability zones, load-balanced by ELB.
      • Index rate: 10 to 500 requests/sec.
      • Query rate: 100 to 2000 requests/sec.
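The Captora numbers allow a rough estimate of how many shard copies each node carries. Two assumptions are needed because the slide is ambiguous: that 2300 counts primary shards, and what "replication factor of 4" means. It could mean 4 copies in total (`number_of_replicas` = 3) or 4 replicas per primary (`number_of_replicas` = 4). The sketch computes both readings.

```python
def shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies: each primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)


def copies_per_node(primaries: int, replicas_per_primary: int, nodes: int) -> float:
    """Average shard copies per node, assuming an even spread."""
    return shard_copies(primaries, replicas_per_primary) / nodes


def max_allocatable_replicas(nodes: int) -> int:
    """Replicas per primary that can actually be allocated.

    Elasticsearch never puts two copies of the same shard on one node.
    """
    return nodes - 1


if __name__ == "__main__":
    for replicas in (3, 4):  # the two readings of "replication factor of 4"
        per_node = copies_per_node(2300, replicas, 6)
        print(f"number_of_replicas={replicas}: ~{per_node:.0f} shard copies per node")
    print(f"with 6 nodes, at most {max_allocatable_replicas(6)} replicas can be allocated")
```

Under either reading, each node hosts well over a thousand shard copies. Every shard is a separate Lucene index with its own memory and file-handle overhead, so shards per node is a figure worth tracking alongside raw data size.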
  17. Captora’s Dashboard
  18. Captora’s Architecture
  19. Poll Time (based on votes by webinar attendees)
  20. Thank You
