Configuring Elasticsearch For 
Performance and Scale 
Based on the knowledge gained from
attending the Elasticsearch webinar on
30th September 2014
Prepared By: 
Bharvi Narayan Dixit 
Software Engineer, 
Orkash Services Pvt. Ltd.
Contents
• The Elasticsearch Open Source Model
• The Popularity of Elasticsearch
• Insights across The Guardian
• Ophan - The real-time analytics tool
• Datadog’s Elasticsearch Story
• What Datadog’s event dashboards look like
• Elasticsearch use @ Captora
• Captora’s dashboard and its architecture
• Webinar poll: types of infrastructure used for Elasticsearch
The Elasticsearch Open Source Model
The Popularity of Elasticsearch 
10M downloads in 2 years, and counting.
Insights across the Guardian 
• A large portion of The Guardian’s business relies on
Elasticsearch to understand how their content is being
consumed.
• Before Ophan, the Guardian used a traditional analytics package
that had a four-hour lag and many restrictions.
• ~40M documents are processed per day, and 360M documents
can be easily queried.
• Real-time traffic analysis of each piece of content lets the
organization see audience engagement.
• The cluster is easy to scale (adding more capacity) whenever a
new feature puts stress on Elasticsearch.
Ophan - the real-time analytics tool created by the
Guardian, based on Elasticsearch
Datadog’s Elasticsearch Story 
• Elasticsearch is used as Datadog’s primary data store for
events/logs.
• Before Elasticsearch, Postgres was used.
• Event data is always structured, with the flexibility to
add or remove fields as needed.
• Hundreds of millions of full-text events across 12+ indices.
• ~10M documents/day, with the volume doubling every 4-5 months.
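The slide gives only the current rate and the doubling period. As a rough illustration of why sizing needs planning, the sketch below projects daily volume under the assumption that growth stays a steady doubling every 4 to 5 months; real growth is unlikely to be that smooth, and the function name is hypothetical.

```python
def projected_daily_docs(current_per_day: float, months_ahead: float, doubling_months: float) -> float:
    """Daily volume after `months_ahead` months, assuming steady exponential growth."""
    return current_per_day * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    base = 10_000_000  # ~10M documents/day, as stated on the slide
    for months in (0, 6, 12, 24):
        slow = projected_daily_docs(base, months, 5)  # doubling every 5 months
        fast = projected_daily_docs(base, months, 4)  # doubling every 4 months
        print(f"+{months:2d} months: {slow / 1e6:6.1f}M to {fast / 1e6:6.1f}M docs/day")
```

Under this assumption, a year later the cluster would ingest roughly 5x to 8x today's volume (about 53M to 80M documents/day), which is the kind of figure worth having before fixing an index layout.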
First version of the Elasticsearch cluster at Datadog
• One node per AZ (availability zone) handling HTTP and data. 
• One large index storing all events from all time. 
• Writing to a pool of all nodes in the cluster. 
• Worked well for 1-1.5 years.
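The slides do not say how writes were distributed across the pool. A minimal sketch, assuming simple client-side round-robin over one node per availability zone (the three hostnames are hypothetical; 9200 is Elasticsearch's default HTTP port):

```python
from itertools import cycle
from typing import Iterable, Iterator


def round_robin(nodes: Iterable[str]) -> Iterator[str]:
    """Return an endless iterator that hands out node addresses in turn."""
    pool = list(nodes)
    if not pool:
        raise ValueError("node pool must not be empty")
    return cycle(pool)


if __name__ == "__main__":
    # Hypothetical one-node-per-AZ pool; each node serves both HTTP and data.
    writer = round_robin([
        "http://es-az-a:9200",
        "http://es-az-b:9200",
        "http://es-az-c:9200",
    ])
    for doc_id in range(5):
        print(doc_id, next(writer))  # each write goes to the next node in turn
```

Because every node holds data and also accepts HTTP traffic, any node can take a write; the node that receives it forwards the document to the shard that owns it.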
Faster and more scalable cluster 
• Split cluster into head and data nodes. 
• Head nodes act as a load balancer, accepting the HTTP requests. 
• Data nodes only communicate with head nodes and other data nodes.
• Use rolling indices, each holding one month of event data.
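One way to picture a monthly rolling index is a naming function plus a helper that lists the indices a time-range query must touch. This is a sketch: the `events-YYYY.MM` pattern and the function names are assumptions, since the slides do not give Datadog's naming scheme.

```python
from datetime import date

INDEX_PREFIX = "events"  # hypothetical prefix; not taken from the slides


def monthly_index(day: date, prefix: str = INDEX_PREFIX) -> str:
    """Name of the rolling index holding events from `day`'s month, e.g. events-2014.09."""
    return f"{prefix}-{day.year:04d}.{day.month:02d}"


def indices_for_range(start: date, end: date, prefix: str = INDEX_PREFIX) -> list[str]:
    """All monthly indices a query over [start, end] must search, oldest first."""
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        month += 1
        if month == 13:  # roll over into January of the next year
            year, month = year + 1, 1
    return names


if __name__ == "__main__":
    names = indices_for_range(date(2014, 11, 15), date(2015, 2, 1))
    print(",".join(names))  # comma-separated, usable as a multi-index search path
```

Elasticsearch accepts comma-separated index names in a search URL, so a query only touches the months it needs, and expiring old data becomes deleting a whole index rather than deleting individual documents.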
What Datadog’s engineers learned
• Spend some time planning for sizing before settling on a data format.
– With a bit of planning, they could have avoided migrating to a rolling index
later on.
– But you can’t plan for everything, so architect deployments with
migration in mind.
• Monitor your Elasticsearch cluster from the beginning.
• Tooling around backup and restore should almost be part of
your first deployment.
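A small starting point for both lessons, sketched under assumptions: the health check reads the `status` and `unassigned_shards` fields of a parsed `GET /_cluster/health` response, and the backup helper builds (but does not send) a `PUT /_snapshot/<repository>/<snapshot>` request. The URL, the repository name `nightly_backups`, and the helper names are hypothetical, and the repository must already be registered.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port
REPOSITORY = "nightly_backups"    # hypothetical, already-registered snapshot repository


def health_problems(health: dict) -> list[str]:
    """Turn a parsed /_cluster/health response into human-readable warnings."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        problems.append(f"{unassigned} unassigned shard(s)")
    return problems


def snapshot_request(snapshot: str, indices: list[str]) -> Request:
    """Build (without sending) a request that snapshots only the given indices."""
    body = json.dumps({"indices": ",".join(indices), "include_global_state": False})
    return Request(
        f"{ES_URL}/_snapshot/{REPOSITORY}/{snapshot}?wait_for_completion=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    print(health_problems({"status": "yellow", "unassigned_shards": 3}))
    req = snapshot_request("snap-2014.09", ["events-2014.09"])
    print(req.get_method(), req.full_url)
```

A `yellow` status means all primary shards are allocated but some replicas are not, which is worth alerting on early; pairing snapshots with rolling indices lets each finished month be backed up once.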
What Datadog’s event dashboards look like
• Users can write comments on events and mention peers in them.
Elasticsearch use @ Captora 
• Captora describes itself as the first marketing cloud solution that automatically
expands and optimizes marketing campaigns to engage and
convert thousands of new buyers.
• Its Adaptive Marketing approach covers market discovery, engagement, and
conversion of new buyers by intelligently and automatically scaling
content-driven campaigns across multiple channels (search, advertising,
and social).
• Read more at http://www.captora.com/technology/
Elasticsearch use @ Captora 
At Captora, Elasticsearch is primarily used to:
• Index all textual data (e.g. crawled multi-channel content streams, user-generated
documents).
• Power the textual search, ranking, and relevance calculation of the content
recommendation engine.
• Power the user-portal search over the content stream.
Elasticsearch stats @ Captora
• Mostly semi-structured data (e.g. web pages, white papers, video metadata
from YouTube, LinkedIn updates, blogs, Tweets).
• ~200M documents, ~300GB of data.
• Partitioned across ~1200 indices, 2300 shards, with a replication factor of 4.
• 6 EC2 nodes (c3.2xlarge, provisioned SSD) in two AWS availability zones,
load-balanced with ELB.
• Index rate: 10 to 500 requests/sec.
• Query rate: 100 to 2000 requests/sec.
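In the spirit of the sizing lesson above, the Captora totals can be turned into naive averages. This is a sketch only: it assumes decimal units (1 GB = 1000 MB), spreads everything evenly, and treats "2300 shards" and "300GB" exactly as stated, since the slide does not say whether those figures include replica copies.

```python
def sizing_averages(docs: int, data_gb: float, indices: int, shards: int, nodes: int) -> dict[str, float]:
    """Naive per-index, per-shard and per-node averages from cluster-wide totals."""
    return {
        "docs_per_index": docs / indices,
        "mb_per_index": data_gb * 1000 / indices,  # decimal units: 1 GB = 1000 MB
        "mb_per_shard": data_gb * 1000 / shards,
        "shards_per_node": shards / nodes,
    }


if __name__ == "__main__":
    # Totals from the Captora slide: ~200M docs, ~300GB, ~1200 indices, 2300 shards, 6 nodes.
    stats = sizing_averages(200_000_000, 300, 1200, 2300, 6)
    for name, value in stats.items():
        print(f"{name:>15}: {value:,.1f}")
```

On these assumptions the averages come to roughly 167k documents and 250 MB per index, about 130 MB per shard, and about 383 shards per node. Since each shard is a separate Lucene index with its own overhead, many small shards per node is a figure worth watching, though whether it matters depends on heap size and workload.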
Captora’s Dashboard
Captora’s Architecture
Poll Time 
(Based on the votes by webinar attendees)
Thank You
