SlideShare a Scribd company logo
1 of 27
0
Joe Alex, Senior Big Data Engineer, Verizon
10/06/2015
1
Managing Security @1M Events/Sec
Introduction
• Senior Big Data Engineer, Tech. Lead @ Verizon
 Managed Security Services
• Using Elasticsearch since ver 0.19
• Aspiring Data Scientist - Who is not ?
• Loves to work with data at scale
2
What we do - Manage Security for our Customers
• Collect Security Logs
• Correlate
• Store
• Index
• Analyze
• Monitor
• Escalate
3
Before Elasticsearch
4
Before Elasticsearch
• Traditional RDBMS won’t scale for
the billions of logs
 filtered logs > events > incidents >
tickets
• All raw Logs were on disks
• Requests from customers took
days, weeks
• No way to search through billions
of Logs
• Advanced analytics not possible
5
http://www.liftoffit.com
After Elasticsearch
• Customers
 have access to all their logs near real-time 
 can search and download their logs through the Portal
 visualize/analyze using Kibana
• Operations
 No more grep through disks 
• Opens up the data for all kind of Analytics and Monitoring
 Anomaly detection
 Real-time alerting
 Advanced monitoring
6
How we do it
7
What we use and some numbers
• Multiple Elasticsearch Clusters
 Search, Data Visualization, Analytics, Forensics
• Largest cluster has 128 Nodes
 Current load about 20 billion docs per day
 Has around 800 billion docs
• Index heavy use case (vs. search heavy)
• Hadoop for long term storage and analytics
• Spark for real-time analytics and monitoring
• Kafka for Queue
• Flume for collectors
8
How we progressed
• Earlier
 Co-located with 28 Hadoop Data nodes
 12 Core, 128GB RAM, 12 X 3TB Disks
 Elasticsearch 0.19
• Later
 Ran 2 Elasticsearch Nodes co-located with Hadoop data nodes
 Effectively 56 Elasticsearch Nodes
• Now
 128 dedicated bare metal boxes for Elasticsearch
 8 core, 64GB RAM, 6 X 1TB Disks
 Elasticsearch 1.5.2 (soon to ver 1.7)
9
Know your environment and data
• ENV
 CPU
 Memory
 I/O
 Network
• Elasticsearch typically runs in to Memory issues before CPU
 Get the CPU – RAM – Disk ratios correct for your env.
 Too much disk storage – ES may not utilize
• For data nodes prefer physical boxes
• For disks – SSD, RAID0, JBOD
10
Know your environment and data
• Data
 Data ingestion rates
 Type of data
o Our docs were mostly 1.5k – 2k, rarely 5k
o 10% of the customers produced 80% of data
o Variety of data
 Volume
11
Storage requirements
• Depends on
 volume
 retention period
 replication factor
 _all
 _source
 analyzed
 doc_values
 _timestamp
12
Things you should change
• change default location of data and logs
• change cluster.name
• avoid multicast use unicast
• discover timeouts adjust per your network
• use mapping/templates
 plan your field types number, date, ipv4
• adjust gateway, discovery, threadpool, recovery settings
• adjust throttling settings
• evaluate breakers
• to analyze or not to
13
Things you should change
14
• JVM Heap set to 50% of available memory
 Leave 50% for OS, page caching
 Elasticsearch/Java tends to have issues after 31GB heap
• Disable _all, _timestamp, _source if you don't need it
• No swap - mlockall: true, vm.swappiness = 0 or 1
• Tune kernel parameters
 file, network, user, process
 vm.max_map_count = 262144
 /sys/kernel/mm/transparent_hugepage/defrag = never
 10G network tweaks
Dedicated Master, Client, Data Nodes
• Master
 Only cluster management (don’t send search or indexing requests)
 3 masters minimum
 Avoid split-brain
• Client
 Coordinators, Aggregation (send all search requests here, will co-ordinate)
 Load balance behind Apache, Nginx, F5 …
• Data nodes
 Indexing, Searches (send all indexing requests direct to data nodes)
• Use Tribe node to search across multiple clusters
15
Effects of shards, replication, indexes on Cluster
• Replication factor
 More replicas – searches faster, but more memory pressure
 We had factor 2 initially, later changed to 1
• Shards
 More shards - better indexing rates, but more memory pressure
 We had 2 per index initially, later as per customer 2 – 35 shards
• Index/Shard sizes
• Number of indexes (one big one, monthly, weekly, daily, hourly …)
• Index naming – performance, access control, data retention, shard size
• Know your data and plan shards and replicas
16
Field data cache
17
• When you do - sorting, facets/aggregation with high cardinality fields
 All unique values are loaded to memory and held on to
 never goes away
• Risks running out of memory
 indices.breaker.fielddata.limit
 indices.fielddata.cache.size
• Use doc_values - writes to a columnar store side of the inverted index
 lives on disk instead of in heap memory (storage, indexing small effect)
 for not_analyzed fields
 default in Elasticsearch 2.0
Indexing
• Use Bulk Indexing
 We use mapreduce, about 60 - 100 reducers do the indexing
 flush size, find your sweet spot (ours is 5000)
 index.refresh_interval: -1
 Transport client - tcp vs http client, tcp slightly faster
 Increase thread pool for bulk and adjust merge speed
• More shards better indexing, but watch cluster
• Watch out for Bulk Rejections and Hotspots
• Index direct to data nodes
• Now es-hadoop available
18
Key items for extremely large clusters
19
• Manage shard sizes and counts (including replicas)
• Hotspots - adjust shards per node
• Some Nodes/disks getting full
 adjust disk.watermark low/high settings
• Disk failures (especially when you have multiple disks, striping)
 remove disk from config and restart Node
• Set replication to 0 and adjust throttling for initial Bulk inserts
• Disable allocation for faster restarts
• Adjust throttling settings for recovery and indexing
• Elasticsearch shard is a Lucene index, max docs 2.1 billion
Watch out for
20
• Use Aliases from Day 1
• _type
 use generic - minimize dynamic updating of mappings
• Template dir., all files will be picked up
• Scripting and Updates a bit slow, use carefully
• Node failures
• Disk failures
• Bulk Rejections
• Network timeouts
• ttl performance issues
Monitor and Stats
21
• Cluster and Node health/stats
• Heap
• Stats: clear view on what is going on in your cluster
 intake volumes, when received at edge, when indexed, index rate
• Lots of APIs available for cluster/node health, stats
• Watch for hotspots – nodes, disks
• Watch for safety trips (from ES 1.4 onwards)
• Nagios, Zabbix, custom
• Housekeeping - Use curator or custom
• Use Marvel, Watcher
Get ready for production
• Difficult to recreate production volumes in Dev/QA
• Plan a buffering or queuing mechanism
• Be ready to Re-index
 We had data in HDFS for a year and in ES for 6 months
• Monitor and Alert
 With hundreds of machines/disks, something is bound to fail
• Stats
 Find bottle necks, Project storage/processing needs
• Sharing a single config for same Node type helps
• Use automation as much as possible – Puppet, Ansible
22
Security & Access control
• Plan index per customer
• Use Aliases
• Control access via APIs
• Use a reverse proxy Apache, Nginx
 Authentication/Authorization
 Client nodes behind proxy
• Now Shield available
23
Tips on Searches
24
• Use Filters, they are cached
• Use match query instead of query_string
• term is not analyzed, match is analyzed
• For large search results – Use Scan search type and Scroll API
Thank You
Questions / Comments
@joealex
25
www.elastic.co
26

More Related Content

What's hot

Introducing SciaaS @ Sanger
Introducing SciaaS @ SangerIntroducing SciaaS @ Sanger
Introducing SciaaS @ SangerPeter Clapham
 
Oracle to Cassandra Core Concepts Guid Part 1: A new hope
Oracle to Cassandra Core Concepts Guid Part 1: A new hopeOracle to Cassandra Core Concepts Guid Part 1: A new hope
Oracle to Cassandra Core Concepts Guid Part 1: A new hopeDataStax
 
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016DataStax
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyondMatija Gobec
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayCassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayDataStax Academy
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japanHiromitsu Komatsu
 
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...DataStax
 
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarC* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarDataStax Academy
 
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...DataStax
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityHiromitsu Komatsu
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandraScyllaDB
 

What's hot (20)

Introducing SciaaS @ Sanger
Introducing SciaaS @ SangerIntroducing SciaaS @ Sanger
Introducing SciaaS @ Sanger
 
Oracle to Cassandra Core Concepts Guid Part 1: A new hope
Oracle to Cassandra Core Concepts Guid Part 1: A new hopeOracle to Cassandra Core Concepts Guid Part 1: A new hope
Oracle to Cassandra Core Concepts Guid Part 1: A new hope
 
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyond
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayCassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japan
 
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
 
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarC* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
 
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 

Viewers also liked

IKANOW System Architecture Guide
IKANOW System Architecture GuideIKANOW System Architecture Guide
IKANOW System Architecture GuideSholeh Gregory
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampAlexei Gorobets
 
November 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with HadoopNovember 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with HadoopYahoo Developer Network
 
Managing Your Security Logs with Elasticsearch
Managing Your Security Logs with ElasticsearchManaging Your Security Logs with Elasticsearch
Managing Your Security Logs with ElasticsearchVic Hargrave
 
Leverage Big Data for Security Intelligence
Leverage Big Data for Security Intelligence Leverage Big Data for Security Intelligence
Leverage Big Data for Security Intelligence Stefaan Van daele
 
Big Data Analytics for Cyber Security: A Quick Overview
Big Data Analytics for Cyber Security: A Quick OverviewBig Data Analytics for Cyber Security: A Quick Overview
Big Data Analytics for Cyber Security: A Quick OverviewFemi Ashaye
 
Cyber security government ppt By Vishwadeep Badgujar
Cyber security government  ppt By Vishwadeep BadgujarCyber security government  ppt By Vishwadeep Badgujar
Cyber security government ppt By Vishwadeep BadgujarVishwadeep Badgujar
 
For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...
For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...
For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...Accenture Technology
 

Viewers also liked (8)

IKANOW System Architecture Guide
IKANOW System Architecture GuideIKANOW System Architecture Guide
IKANOW System Architecture Guide
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
 
November 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with HadoopNovember 2013 HUG: Cyber Security with Hadoop
November 2013 HUG: Cyber Security with Hadoop
 
Managing Your Security Logs with Elasticsearch
Managing Your Security Logs with ElasticsearchManaging Your Security Logs with Elasticsearch
Managing Your Security Logs with Elasticsearch
 
Leverage Big Data for Security Intelligence
Leverage Big Data for Security Intelligence Leverage Big Data for Security Intelligence
Leverage Big Data for Security Intelligence
 
Big Data Analytics for Cyber Security: A Quick Overview
Big Data Analytics for Cyber Security: A Quick OverviewBig Data Analytics for Cyber Security: A Quick Overview
Big Data Analytics for Cyber Security: A Quick Overview
 
Cyber security government ppt By Vishwadeep Badgujar
Cyber security government  ppt By Vishwadeep BadgujarCyber security government  ppt By Vishwadeep Badgujar
Cyber security government ppt By Vishwadeep Badgujar
 
For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...
For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...
For the CISO: Continuous Cyber Attacks - Achieving Operational Excellence for...
 

Similar to Managing Security At 1M Events a Second using Elasticsearch

M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Swift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StorySwift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StoryBrian Cline
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionSplunk
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation Yahoo Developer Network
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyCeph Community
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxManish Maheshwari
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Jon Haddad
 
Flashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesFlashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesPratik Bhat
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?DoiT International
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionSplunk
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
Running & Scaling Large Elasticsearch Clusters
Running & Scaling Large Elasticsearch ClustersRunning & Scaling Large Elasticsearch Clusters
Running & Scaling Large Elasticsearch ClustersFred de Villamil
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaManish Maheshwari
 

Similar to Managing Security At 1M Events a Second using Elasticsearch (20)

M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Swift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StorySwift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer Story
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptx
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
 
Flashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesFlashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drives
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Running & Scaling Large Elasticsearch Clusters
Running & Scaling Large Elasticsearch ClustersRunning & Scaling Large Elasticsearch Clusters
Running & Scaling Large Elasticsearch Clusters
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 

Recently uploaded

DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts servicesonalikaur4
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Deliverybabeytanya
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of indiaimessage0108
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Delhi Call girls
 
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneVIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneCall girls in Ahmedabad High profile
 
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneCall girls in Ahmedabad High profile
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGAPNIC
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 

Recently uploaded (20)

DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneVIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 

Managing Security At 1M Events a Second using Elasticsearch

  • 1. 0
  • 2. Joe Alex, Senior Big Data Engineer, Verizon 10/06/2015 1 Managing Security @1M Events/Sec
  • 3. Introduction • Senior Big Data Engineer, Tech. Lead @ Verizon  Managed Security Services • Using Elasticsearch since ver 0.19 • Aspiring Data Scientist - Who is not ? • Loves to work with data at scale 2
  • 4. What we do - Manage Security for our Customers • Collect Security Logs • Correlate • Store • Index • Analyze • Monitor • Escalate 3
  • 6. Before Elasticsearch • Traditional RDBMS won’t scale for the billions of logs  filtered logs > events > incidents > tickets • All raw Logs were on disks • Requests from customers took days, weeks • No way to search through billions of Logs • Advanced analytics not possible 5 http://www.liftoffit.com
  • 7. After Elasticsearch • Customers  have access to all their logs near real-time   can search and download their logs through the Portal  visualize/analyze using Kibana • Operations  No more grep through disks  • Opens up the data for all kind of Analytics and Monitoring  Anomaly detection  Real-time alerting  Advanced monitoring 6
  • 8. How we do it 7
  • 9. What we use and some numbers • Multiple Elasticsearch Clusters  Search, Data Visualization, Analytics, Forensics • Largest cluster has 128 Nodes  Current load about 20 billion docs per day  Has around 800 billion docs • Index heavy use case (vs. search heavy) • Hadoop for long term storage and analytics • Spark for real-time analytics and monitoring • Kafka for Queue • Flume for collectors 8
  • 10. How we progressed • Earlier  Co-located with 28 Hadoop Data nodes  12 Core, 128GB RAM, 12 X 3TB Disks  Elasticsearch 0.19 • Later  Ran 2 Elasticsearch Nodes co-located with Hadoop data nodes  Effectively 56 Elasticsearch Nodes • Now  128 dedicated bare metal boxes for Elasticsearch  8 core, 64GB RAM, 6 X 1TB Disks  Elasticsearch 1.5.2 (soon to ver 1.7) 9
  • 11. Know your environment and data • ENV  CPU  Memory  I/O  Network • Elasticsearch typically runs in to Memory issues before CPU  Get the CPU – RAM – Disk ratios correct for your env.  Too much disk storage – ES may not utilize • For data nodes prefer physical boxes • For disks – SSD, RAID0, JBOD 10
  • 12. Know your environment and data • Data  Data ingestion rates  Type of data o Our docs were mostly 1.5k – 2k, rarely 5k o 10% of the customers produced 80% of data o Variety of data  Volume 11
  • 13. Storage requirements • Depends on  volume  retention period  replication factor  _all  _source  analyzed  doc_values  _timestamp 12
  • 14. Things you should change • change default location of data and logs • change cluster.name • avoid multicast use unicast • discover timeouts adjust per your network • use mapping/templates  plan your field types number, date, ipv4 • adjust gateway, discovery, threadpool, recovery settings • adjust throttling settings • evaluate breakers • to analyze or not to 13
  • 15. Things you should change 14 • JVM Heap set to 50% of available memory  Leave 50% for OS, page caching  Elasticsearch/Java tends to have issues after 31GB heap • Disable _all, _timestamp, _source if you don't need it • No swap - mlockall: true, vm.swappiness = 0 or 1 • Tune kernel parameters  file, network, user, process  vm.max_map_count = 262144  /sys/kernel/mm/transparent_hugepage/defrag = never  10G network tweaks
  • 16. Dedicated Master, Client, Data Nodes • Master  Only cluster management (don’t send search or indexing requests)  3 masters minimum  Avoid split-brain • Client  Coordinators, Aggregation (send all search requests here, will co-ordinate)  Load balance behind Apache, Nginx, F5 … • Data nodes  Indexing, Searches (send all indexing requests direct to data nodes) • Use Tribe node to search across multiple clusters 15
  • 17. Effects of shards, replication, indexes on Cluster • Replication factor  More replicas – searches faster, but more memory pressure  We had factor 2 initially, later changed to 1 • Shards  More shards - better indexing rates, but more memory pressure  We had 2 per index initially, later as per customer 2 – 35 shards • Index/Shard sizes • Number of indexes (one big one, monthly, weekly, daily, hourly …) • Index naming – performance, access control, data retention, shard size • Know your data and plan shards and replicas 16
  • 18. Field data cache 17 • When you do - sorting, facets/aggregation with high cardinality fields  All unique values are loaded to memory and held on to  never goes away • Risks running out of memory  indices.breaker.fielddata.limit  indices.fielddata.cache.size • Use doc_values - writes to a columnar store side of the inverted index  lives on disk instead of in heap memory (storage, indexing small effect)  for not_analyzed fields  default in Elasticsearch 2.0
  • 19. Indexing • Use Bulk Indexing  We use mapreduce, about 60 - 100 reducers do the indexing  flush size, find your sweet spot (ours is 5000)  index.refresh_interval: -1  Transport client - tcp vs http client, tcp slightly faster  Increase thread pool for bulk and adjust merge speed • More shards better indexing, but watch cluster • Watch out for Bulk Rejections and Hotspots • Index direct to data nodes • Now es-hadoop available 18
  • 20. Key items for extremely large clusters 19 • Manage shard sizes and counts (including replicas) • Hotspots - adjust shards per node • Some Nodes/disks getting full  adjust disk.watermark low/high settings • Disk failures (especially when you have multiple disks, striping)  remove disk from config and restart Node • Set replication to 0 and adjust throttling for initial Bulk inserts • Disable allocation for faster restarts • Adjust throttling settings for recovery and indexing • Elasticsearch shard is a Lucene index, max docs 2.1 billion
  • 21. Watch out for 20 • Use Aliases from Day 1 • _type  use generic - minimize dynamic updating of mappings • Template dir., all files will be picked up • Scripting and Updates a bit slow, use carefully • Node failures • Disk failures • Bulk Rejections • Network timeouts • ttl performance issues
  • 22. Monitor and Stats 21 • Cluster and Node health/stats • Heap • Stats: clear view on what is going on in your cluster  intake volumes, when received at edge, when indexed, index rate • Lots of APIs available for cluster/node health, stats • Watch for hotspots – nodes, disks • Watch for safety trips (from ES 1.4 onwards) • Nagios, Zabbix, custom • Housekeeping - Use curator or custom • Use Marvel, Watcher
  • 23. Get ready for production • Difficult to recreate production volumes in Dev/QA • Plan a buffering or queuing mechanism • Be ready to Re-index  We had data in HDFS for a year and in ES for 6 months • Monitor and Alert  With hundreds of machines/disks, something is bound to fail • Stats  Find bottle necks, Project storage/processing needs • Sharing a single config for same Node type helps • Use automation as much as possible – Puppet, Ansible 22
  • 24. Security & Access control • Plan index per customer • Use Aliases • Control access via APIs • Use a reverse proxy Apache, Nginx  Authentication/Authorization  Client nodes behind proxy • Now Shield available 23
  • 25. Tips on Searches 24 • Use Filters, they are cached • Use match query instead of query_string • term is not analyzed, match is analyzed • For large search results – Use Scan search type and Scroll API
  • 26. Thank You Questions / Comments @joealex 25