SlideShare a Scribd company logo
Scaling ElasticSearch

      SF Meetup
      2012.10.03


                       Sushant Shankar
                   sushant.shankar@33across.com
Agenda
•   Why we need a search engine
•   Monitoring
•   Index Building
•   Query Performance
Who is asdfas
>600,000 Publishers
Machine Learning and Graph algorithms to:
- Build advertising segments
- Extract insights out of social and interest data
- Target via high-performance distributed systems that
  integrate with our advertising partners




Website | Facebook | Twitter
Why we really need a search engine
         Batch! Good for complicated tasks
         (Machine Learning, Graph Algorithms, etc.)




                          …                           …
INDEX BUILDING
       1 WEEK → 3 HOURS
Mappers to build index

                        6 nodes, 24GB RAM
                        16GB for ES service
                        4 cores
                        3x 1.5TB drive

                        >1TB/index
         Build index
                        (replicated)
         using MR job
                        ~300M documents
         and Bulk API
                        ~5KB / document
                        ~3 hours
Monitoring: Zabbix
Monitoring: SPM
Parameter Optimization
Amount bulk indexed




                      Time taken
                       CPU util.
                       Mem util.
                        Disk I/O
                       Network



                                   # Shards
Index Building: Learnings
• Bulk API
• No replicas
• 2 shards / CPU
• 10,000 documents (users) per indexing
  request
• Refresh off (index.refresh_interval = -1)
QUERY PERFORMANCE
     5 MINUTES  10 SECONDS
Query Performance: Learnings
•   1-2 Replicas (and for reliability)
•   Turn refresh on again (5s default)
•   Warm up effect (Index Warm up API 0.20+)
•   Optimize API
•   Simulate multiple users
Warm Up: load into memory and cache
Other cool features
• Custom Scoring functions
• Scripts – MVEL, Python
• Facets

•   Exploring:
•   Real-time indexing
•   Indexing images, files, etc.
•   Parent-child relationships
QUERIES?
Index Building over time

More Related Content

What's hot

OSOM - Operations in the Cloud
OSOM - Operations in the CloudOSOM - Operations in the Cloud
OSOM - Operations in the Cloud
Marcela Oniga
 
Libcloud presentation
Libcloud presentationLibcloud presentation
Libcloud presentation
Christophe Marchal
 
Amazon Web Services lection 5
Amazon Web Services lection 5  Amazon Web Services lection 5
Amazon Web Services lection 5
Binary Studio
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with Airflow
Erin Shellman
 
Online statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsOnline statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithms
Simon Belak
 
Transducing for fun and profit
Transducing for fun and profitTransducing for fun and profit
Transducing for fun and profit
Simon Belak
 
Diminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesDiminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations Slides
Sebastian Werner
 
Scaling out Rails with MySQL
Scaling out Rails with MySQLScaling out Rails with MySQL
Scaling out Rails with MySQL
freels
 
Amazon web services
Amazon web servicesAmazon web services
Amazon web servicestsaiscorpio
 
Serverless by examples and case studies
Serverless by examples and case studiesServerless by examples and case studies
Serverless by examples and case studies
CodeOps Technologies LLP
 
Amazon Web Services lection 6
Amazon Web Services lection 6  Amazon Web Services lection 6
Amazon Web Services lection 6
Binary Studio
 
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
Valery Tkachenko
 
Become Thanos of the LambdaLand: Wield all the Infinity Stones
Become Thanos of the LambdaLand: Wield all the Infinity StonesBecome Thanos of the LambdaLand: Wield all the Infinity Stones
Become Thanos of the LambdaLand: Wield all the Infinity Stones
Srushith Repakula
 
Graphite
GraphiteGraphite
Graphite
David Lutz
 
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Shubham Tagra
 
Zentral QueryCon 2018
Zentral QueryCon 2018Zentral QueryCon 2018
Zentral QueryCon 2018
Henry Stamerjohann
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Sumit Maheshwari
 
Hyperloglog Lightning Talk
Hyperloglog Lightning TalkHyperloglog Lightning Talk
Hyperloglog Lightning Talk
Simon Prickett
 
Surviving Hadoop on AWS
Surviving Hadoop on AWSSurviving Hadoop on AWS
Surviving Hadoop on AWSSoren Macbeth
 
Scaling drupal on amazon web services dr
Scaling drupal on amazon web services drScaling drupal on amazon web services dr
Scaling drupal on amazon web services dr
Tristan Roddis
 

What's hot (20)

OSOM - Operations in the Cloud
OSOM - Operations in the CloudOSOM - Operations in the Cloud
OSOM - Operations in the Cloud
 
Libcloud presentation
Libcloud presentationLibcloud presentation
Libcloud presentation
 
Amazon Web Services lection 5
Amazon Web Services lection 5  Amazon Web Services lection 5
Amazon Web Services lection 5
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with Airflow
 
Online statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsOnline statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithms
 
Transducing for fun and profit
Transducing for fun and profitTransducing for fun and profit
Transducing for fun and profit
 
Diminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesDiminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations Slides
 
Scaling out Rails with MySQL
Scaling out Rails with MySQLScaling out Rails with MySQL
Scaling out Rails with MySQL
 
Amazon web services
Amazon web servicesAmazon web services
Amazon web services
 
Serverless by examples and case studies
Serverless by examples and case studiesServerless by examples and case studies
Serverless by examples and case studies
 
Amazon Web Services lection 6
Amazon Web Services lection 6  Amazon Web Services lection 6
Amazon Web Services lection 6
 
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
 
Become Thanos of the LambdaLand: Wield all the Infinity Stones
Become Thanos of the LambdaLand: Wield all the Infinity StonesBecome Thanos of the LambdaLand: Wield all the Infinity Stones
Become Thanos of the LambdaLand: Wield all the Infinity Stones
 
Graphite
GraphiteGraphite
Graphite
 
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
 
Zentral QueryCon 2018
Zentral QueryCon 2018Zentral QueryCon 2018
Zentral QueryCon 2018
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Hyperloglog Lightning Talk
Hyperloglog Lightning TalkHyperloglog Lightning Talk
Hyperloglog Lightning Talk
 
Surviving Hadoop on AWS
Surviving Hadoop on AWSSurviving Hadoop on AWS
Surviving Hadoop on AWS
 
Scaling drupal on amazon web services dr
Scaling drupal on amazon web services drScaling drupal on amazon web services dr
Scaling drupal on amazon web services dr
 

Similar to SF ElasticSearch Meetup 2012.10.03

SF ElasticSearch Meetup 2013.04.06 - Monitoring
SF ElasticSearch Meetup 2013.04.06 - MonitoringSF ElasticSearch Meetup 2013.04.06 - Monitoring
SF ElasticSearch Meetup 2013.04.06 - Monitoring
Sushant Shankar
 
Collecting 600M events/day
Collecting 600M events/dayCollecting 600M events/day
Collecting 600M events/day
Lars Marius Garshol
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Jie Li
 
Real-time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-time Data Exploration and Analytics with Amazon Elasticsearch Service
Amazon Web Services
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Amazon Web Services
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
MongoDB
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
Amazon Web Services
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
Apache MXNet
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Databricks
 
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
Amazon Web Services
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
ObjectRocket
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
Amazon Web Services
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
MongoDB
 
Deep Dive on Log Analytics with Elasticsearch Service
Deep Dive on Log Analytics with Elasticsearch ServiceDeep Dive on Log Analytics with Elasticsearch Service
Deep Dive on Log Analytics with Elasticsearch Service
Amazon Web Services
 
Performance Monitoring for the Cloud - Java2Days 2017
Performance Monitoring for the Cloud - Java2Days 2017Performance Monitoring for the Cloud - Java2Days 2017
Performance Monitoring for the Cloud - Java2Days 2017
Werner Keil
 
Lots of facets, fast
Lots of facets, fastLots of facets, fast
Lots of facets, fast
BeyondTrees
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & Kibana
Amazon Web Services
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
Travis Redman
 

Similar to SF ElasticSearch Meetup 2012.10.03 (20)

SF ElasticSearch Meetup 2013.04.06 - Monitoring
SF ElasticSearch Meetup 2013.04.06 - MonitoringSF ElasticSearch Meetup 2013.04.06 - Monitoring
SF ElasticSearch Meetup 2013.04.06 - Monitoring
 
Collecting 600M events/day
Collecting 600M events/dayCollecting 600M events/day
Collecting 600M events/day
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Real-time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-time Data Exploration and Analytics with Amazon Elasticsearch Service
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
 
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
 
Deep Dive on Log Analytics with Elasticsearch Service
Deep Dive on Log Analytics with Elasticsearch ServiceDeep Dive on Log Analytics with Elasticsearch Service
Deep Dive on Log Analytics with Elasticsearch Service
 
Performance Monitoring for the Cloud - Java2Days 2017
Performance Monitoring for the Cloud - Java2Days 2017Performance Monitoring for the Cloud - Java2Days 2017
Performance Monitoring for the Cloud - Java2Days 2017
 
Lots of facets, fast
Lots of facets, fastLots of facets, fast
Lots of facets, fast
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & Kibana
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 

Recently uploaded

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 

Recently uploaded (20)

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 

SF ElasticSearch Meetup 2012.10.03

  • 1. Scaling ElasticSearch SF Meetup 2012.10.03 Sushant Shankar sushant.shankar@33across.com
  • 2. Agenda • Why we need a search engine • Monitoring • Index Building • Query Performance
  • 3. Who is asdfas >600,000 Publishers Machine Learning and Graph algorithms to: - Build advertising segments - Extract insights out of social and interest data - Target via high-performance distributed systems that integrate with our advertising partners Website | Facebook | Twitter
  • 4. Why we really need a search engine Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.) … …
  • 5. INDEX BUILDING 1 WEEK → 3 HOURS
  • 6. Mappers to build index 6 nodes, 24GB RAM 16GB for ES service 4 cores 3x 1.5TB drive >1TB/index Build index (replicated) using MR job ~300M documents and Bulk API ~5KB / document ~3 hours
  • 9. Parameter Optimization Amount bulk indexed Time taken CPU util. Mem util. Disk I/O Network # Shards
  • 10. Index Building: Learnings • Bulk API • No replicas • 2 shards / CPU • 10,000 documents (users) per indexing request • Refresh off (index.refresh_interval = -1)
  • 11. QUERY PERFORMANCE 5 MINUTES  10 SECONDS
  • 12. Query Performance: Learnings • 1-2 Replicas (and for reliability) • Turn refresh on again (5s default) • Warm up effect (Index Warm up API 0.20+) • Optimize API • Simulate multiple users
  • 13. Warm Up: load into memory and cache
  • 14. Other cool features • Custom Scoring functions • Scripts – MVEL, Python • Facets • Exploring: • Real-time indexing • Indexing images, files, etc. • Parent-child relationships

Editor's Notes

  1. Collect information over 1B users internationally – text copied from over 600K publisher sites, images, searches, pages visitedDifferent slices of data – now!