SlideShare a Scribd company logo
‹#›
Pablo Musa - Elastic - @pablitomusa
Amsterdam | April 2-3, 2019
Managing your Black Friday Logs
Elastic Stack
© Elasticsearch BV 2015-2019. All rights reserved.
Elastic Stack
!3
Beats Elasticsearch
Logstash
Kibana
© Elasticsearch BV 2015-2019. All rights reserved.
Beats
!4
Beats Elasticsearch
Logstash
KibanaLog
Files
Metrics
Wire
Data
your{beat}
‣ Data Shipper
‣ Light Weight
‣ Application Side Component
© Elasticsearch BV 2015-2019. All rights reserved.
Logstash
!5
Beats Elasticsearch
Logstash
Kibana
‣ Data Collector/Processor
‣ Server Side Component
Nodes (X)
© Elasticsearch BV 2015-2019. All rights reserved.
Elasticsearch
!6
Beats Elasticsearch
Logstash
Kibana
‣ Data Platform
‣ Really Fast
‣ HTTP + JSON
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
© Elasticsearch BV 2015-2019. All rights reserved.
Kibana
!7
Beats Elasticsearch
Logstash
Kibana
‣ Data Visualization
‣ Stack Configuration
‣ Elastic Stack UI
Instances (X)
Monitoring
Architectures
© Elasticsearch BV 2015-2019. All rights reserved.
The Elastic Journey of an Event
!9
Beats
Log
Files
Metrics
Wire
Data
your{beat}
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
Kibana
Instances (X)
© Elasticsearch BV 2015-2019. All rights reserved.
The Elastic Journey of an Event
!10
Beats
Log
Files
Metrics
Wire
Data
your{beat}
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
Logstash
Nodes (X)
Kibana
Instances (X)
© Elasticsearch BV 2015-2019. All rights reserved.
The Elastic Journey of an Event
!11
Beats
Log
Files
Metrics
Wire
Data
your{beat}
Data
Store
Web
APIs
Social Sensors
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
Logstash
Nodes (X)
Kibana
Instances (X)
© Elasticsearch BV 2015-2019. All rights reserved.
The Elastic Journey of an Event
!12
Beats
Log
Files
Metrics
Wire
Data
your{beat}
Data
Store
Web
APIs
Social Sensors
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
Logstash
Nodes (X)
Kibana
Instances (X)
NotificationQueues Storage Metrics
© Elasticsearch BV 2015-2019. All rights reserved.
The Elastic Journey of an Event
!13
Beats
Log
Files
Metrics
Wire
Data
your{beat}
Data
Store
Web
APIs
Social Sensors
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
Logstash
Nodes (X)
Kafka
Redis
Messaging
Queue
Kibana
Instances (X)
NotificationQueues Storage Metrics
Elasticsearch

Cluster Sizing
© Elasticsearch BV 2015-2019. All rights reserved.
Cluster my_cluster
Server 1
Terminology
!15
Node A
d1
d2
d3
d4
d5
d6
d7
d8d9
d10
d11
d12
Index twitter
d6d3
d2
d5
d1
d4
Index logs
© Elasticsearch BV 2015-2019. All rights reserved.
Cluster my_cluster
Server 1
Partition
!16
Node A
d1
d2
d3
d4
d5
d6
d7
d8d9
d10
d11
d12
Index twitter
d6d3
d2
d5
d1
d4
Index logs
Shards
0
1
4
2
3
0
1
© Elasticsearch BV 2015-2019. All rights reserved.
Cluster my_cluster
Server 1
Node A
Distribution
!17
Server 2
Node Btwitter
shard P4
d1
d2
d6
d5
d10
d12
twitter
shard P2
twitter
shard P1
logs
shard P0
d2
d5
d4
logs
shard P1
d3
d4
d9
d7
d8
d11
twitter
shard P3
twitter
shard P0
d6d3
d1
© Elasticsearch BV 2015-2019. All rights reserved.
Cluster my_cluster
Server 1
Node A
Replication
!18
Server 2
Node Btwitter
shard P4
d1
d2
d6
d5
d10
d12
twitter
shard P2
twitter
shard P1
logs
shard P0
d2
d5
d4
logs
shard P1
d3
d4
d9
d7
d8
d11
twitter
shard P3
twitter
shard P0
twitter
shard R4
d1
d2
d6
d12
twitter
shard R2
d5
d10
twitter
shard R1
d6d3
d1
d6d3
d1
logs
shard R0
d2
d5
d4
logs
shard R1
d3
d4
d9
d7
d8
d11
twitter
shard R3
twitter
shard R0
• Replicas• Primaries
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!19
Data
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!20
Data
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!21
Data
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!22
Big Data
... ...
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!23
Big Data
... ...
• In Elasticsearch, shards are the working unit
‒ More data -> More shards
But how many shards?
© Elasticsearch BV 2015-2019. All rights reserved.
How much data?
• ~1000 events per second
• 60s * 60m * 24h * 1000 events => ~87M events per day
• 1kb per event => ~82GB per day
• 3 months => ~7TB
!24
© Elasticsearch BV 2015-2019. All rights reserved.
Shard Size
• It depends on many different factors
‒ document size, mapping, use case, kinds of queries being
executed, desired response time, peak indexing rate, budget, ...
• After the shard sizing*, each shard should handle 45GB
• Up to 10 shards per machine
!25
* https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
© Elasticsearch BV 2015-2019. All rights reserved.
Cluster my_cluster
How many shards?
• Data size: ~7TB
• Shard Size: ~45GB*
• Total Shards: ~160
!26
3 months of logs
...
• Shards per machine: 10*
• Total Servers: 16
* https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
© Elasticsearch BV 2015-2019. All rights reserved.
But...
• How many indices?
• What do you do if the daily data grows?
• What do you do if you want to delete old data?
!27
© Elasticsearch BV 2015-2019. All rights reserved.
Time-Based Data
• Logs, social media streams, time-based events
• Timestamp + Data
• Do not change
• Typically search for recent events
• Older documents become less important
• Hard to predict the data size
• Time-based indices is the best option
‒ create a new index each day, week, month, year, ...
‒ search the indices you need in the same request
!28
© Elasticsearch BV 2015-2019. All rights reserved.
Daily Indices
!29
Cluster my_cluster
d6d3
d2
d5
d1
d4
logs-2016-10-19
© Elasticsearch BV 2015-2019. All rights reserved.
Daily Indices
!30
Cluster my_cluster
d6d3
d2
d5
d1
d4
logs-2016-10-19
d6d3
d2
d5
d1
d4
logs-2016-10-20
© Elasticsearch BV 2015-2019. All rights reserved.
Daily Indices
!31
Cluster my_cluster
d6d3
d2
d5
d1
d4
logs-2016-10-19
d6d3
d2
d5
d1
d4
logs-2016-10-21
d6d3
d2
d5
d1
d4
logs-2016-10-20
© Elasticsearch BV 2015-2019. All rights reserved.
Templates
• Every new created index starting with 'logs-' will have
‒ 2 shards
‒ 1 replica (for each primary shard)
‒ 60 seconds refresh interval
!32
PUT _template/logs
{
"template": "logs-*",
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"refresh_interval": "60s"
}
}
More on that later
© Elasticsearch BV 2015-2019. All rights reserved.
Read and Write
!33
Cluster my_cluster
d6d3
d2
d5
d1
d4
logs-2016-10-19
users
Application
logs-write
logs-read
© Elasticsearch BV 2015-2019. All rights reserved.
Read and Write
!34
Cluster my_cluster
d6d3
d2
d5
d1
d4
logs-2016-10-19
users
Application
logs-write
logs-read
d6d3
d2
d5
d1
d4
logs-2016-10-20
© Elasticsearch BV 2015-2019. All rights reserved.
Read and Write
!35
Cluster my_cluster
d6d3
d2
d5
d1
d4
logs-2016-10-19
users
Application
logs-write
logs-read
d6d3
d2
d5
d1
d4
logs-2016-10-20
d6d3
d2
d5
d1
d4
logs-2016-10-21
© Elasticsearch BV 2015-2019. All rights reserved.
Read and Write
• Index Wildcard
‒ logs-*
‒ logs-*-10*
• Alias
‒ default_write (6.3+)
• Date Math
‒ logs<now/d>
‒ logs<now/d>, logs<now/d-1d>
!36
Cluster my_cluster
d6d3
d2
d5
d1
d4
logs-2016-10-19
d6d3
d2
d5
d1
d4
logs-2016-10-20
d6d3
d2
d5
d1
d4
logs-2016-10-21
© Elasticsearch BV 2015-2019. All rights reserved.
Index Lifecycle Management
• Rollover
• ILM UI
!37
d6d3
d2
d5
d1
d4
logs-2016-10-20
d6d3
d2
d5
d1
d4
logs-2016-10-21
POST /logs_write/_rollover
{
"conditions": {
"max_age": "7d",
"max_docs": 1000,
"max_size": "5gb"
}
}
© Elasticsearch BV 2015-2019. All rights reserved.
Cluster my_cluster
Do not Overshard
• 3 different logs
• 1 index per day each
• 1GB each
• 5 shards (default)
• 6 months retention
• ~180GB
• ~900 shards
!38
access-...
d6d3
d2
d5
d1
d4
application-...
d6d5
d9
d5
d1
d7
mysql-...
d10d59
d3
d5
d0
d4
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!39
Big Data
... ...1M users
But what happens if we have 2M users?
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!40
Big Data
... ...1M users
... ...1M users
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!41
Big Data
... ...1M users
... ...1M users
... ...1M users
© Elasticsearch BV 2015-2019. All rights reserved.
Scaling
!42
Big Data
... ...
... ...
... ...
U
s
e
r
s
© Elasticsearch BV 2015-2019. All rights reserved.
Shards are the working unit
• Primaries
‒ More data -> More shards
‒ write throughput (More writes -> More shards)
• Replicas
‒ high availability (1 replica is the default)
‒ read throughput (More reads -> More replicas)
!43
Optimal Bulk Size
© Elasticsearch BV 2015-2019. All rights reserved.
What is Bulk?
!45
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
1000

log events
Beats
Logstash
Application
1000 index requests
with 1 document
1 bulk request with
1000 documents
© Elasticsearch BV 2015-2019. All rights reserved.
What is the optimal bulk size?
!46
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
1000

log events
Beats
Logstash
Application
4 * 250?
1 * 1000?
2 * 500?
© Elasticsearch BV 2015-2019. All rights reserved.
It depends...
• on your application (language, libraries, ...)
• document size (100b, 1kb, 100kb, 1mb, ...)
• number of nodes
• node size
• number of shards
• shards distribution
!47
© Elasticsearch BV 2015-2019. All rights reserved.
Test it ;)
!48
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
1000000

log events
Beats
Logstash
Application
4000 * 250-> 160s
1000 * 1000-> 155s
2000 * 500-> 164s
© Elasticsearch BV 2015-2019. All rights reserved.
Test it ;)
!49
DATE=`date +%Y.%m.%d`
LOG=logs/logs.txt
exec_test () {
curl -s -XDELETE "http://USER:PASS@HOST:9200/logstash-$DATE"
sleep 10
export SIZE=$1
time cat /home/ubuntu/dataset.txt | ./bin/logstash -f logstash.conf
}
for SIZE in 100 500 1000 3000 5000 10000; do
for i in {1..20}; do
exec_test $SIZE
done; done;
input { stdin{} }
filter {}
output {
elasticsearch {
hosts => ["10.12.145.189"]
flush_size => "${SIZE}"
} }
In Beats set "bulk_max_size"
in the output.elasticsearch
© Elasticsearch BV 2015-2019. All rights reserved.
• 2 node cluster (m3.large)
‒ 2 vCPU, 7.5GB Memory, 1x32GB SSD
• 1 index server (m3.large)
‒ logstash
‒ kibana
Test it ;)
!50
# docs 100 500 1000 3000 5000 10000
time(s) 191.7 161.9 163.5 160.7 160.7 161.5
Distribute the Load
© Elasticsearch BV 2015-2019. All rights reserved.
Avoid Bottlenecks
!52
Elasticsearch
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
1000000

log events
Beats
Logstash
Application
single node
Node 1
Node 2
round robin
© Elasticsearch BV 2015-2019. All rights reserved.
Distributing the Load
• Client
• Load Balancer
• Coordinating-only Nodes
!53
© Elasticsearch BV 2015-2019. All rights reserved.
Clients
• Most clients implement round robin
‒ you specify a seed list
‒ the client sniffs the cluster
‒ the client implement different selectors
• Logstash allows an array (no sniffing)
• Beats allows an array (no sniffing)
• Kibana only connects to one single node
!54
© Elasticsearch BV 2015-2019. All rights reserved.
Load Balancer
!55
Elasticsearch
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
1000000

log events
Beats
Logstash
Application
LB
Node 2
Node 1
© Elasticsearch BV 2015-2019. All rights reserved.
Coordinating-only Node
!56
Elasticsearch
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
1000000

log events
Beats
Logstash
Application
Node 3

co-node
Node 2
Node 1
© Elasticsearch BV 2015-2019. All rights reserved.
• 2 node cluster (m3.large)
‒ 2 vCPU, 7.5GB Memory, 1x32GB SSD
• 1 index server (m3.large)
‒ logstash (round robin configured)
‒ hosts => ["10.12.145.189", "10.121.140.167"]
‒ kibana
Test it ;)
!57
#docs
time(s)
100 500 1000
NO Round Robin 191.7 161.9 163.5
Round Robin 189.7 159.7 159.0
Optimizing Disk IO
© Elasticsearch BV 2015-2019. All rights reserved.
Durability
!59
index a doc
time
lucene flush
buffer
index a doc
buffer
index a doc
buffer
buffer
segment
© Elasticsearch BV 2015-2019. All rights reserved.
Durability
!60
index a doc
time
lucene flush
buffer
segment
trans_log
buffer
trans_log
buffer
trans_log
elasticsearch flush
doc
op
lucene commit
segment
segment
© Elasticsearch BV 2015-2019. All rights reserved.
refresh_interval
• Increase to get better write throughput to an index
• New documents will take more time to be available for search
• Dynamic per-index setting
!61
PUT logstash-2017.05.16/_settings
{
"refresh_interval": "60s"
}
#docs
time(s)
100 500 1000
1s refresh 189.7 159.7 159.0
60s refresh 185.8 152.1 152.6
© Elasticsearch BV 2015-2019. All rights reserved.
Translog fsync every 5s (1.X)
!62
index a doc
buffer
trans_log
doc
op
index a doc
buffer
trans_log
doc
op
Primary
Replica
redundancy doesn’t help if all nodes lose power
© Elasticsearch BV 2015-2019. All rights reserved.
Translog fsync on every request
• For low volume indexing, fsync matters less
• For high volume indexing, we can amortize the costs and
fsync on every bulk
• Concurrent requests can share an fsync
!63
bulk 1
bulk 2
single fsync
© Elasticsearch BV 2015-2019. All rights reserved.
Async Transaction Log
• index.translog.durability
‒ request (default)
‒ async
• index.translog.sync_interval (only if async is set)
• Dynamic per-index settings
• Be careful, you are relaxing the safety guarantees
!64
#docs
time(s)
100 500 1000
Request
fsync
185.8 152.1 152.6
5s sync 154.8 143.2 143.1
Final Remarks
© Elasticsearch BV 2015-2019. All rights reserved.
Final Remarks
!66
Beats
Log
Files
Metrics
Wire
Data
your{beat}
Data
Store
Web
APIs
Social Sensors
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Notes
Warm (X)
Logstash
Nodes (X)
Kafka
Redis
Messaging
Queue
Kibana
Instances (X)
NotificationQueues Storage Metrics
© Elasticsearch BV 2015-2019. All rights reserved.
Final Remarks
• Primaries
‒ More data -> More shards
‒ Do not overshard!
• Replicas
‒ high availability (1 replica is the default)
‒ read throughput (More reads -> More replicas)
!67
Big Data
... ...
... ...
... ...
U
s
e
r
s
© Elasticsearch BV 2015-2019. All rights reserved.
Final Remarks
• Bulk and Test
• Distribute the Load
• Refresh Interval
• Async Trans Log (careful)
!68
#docs
time(s)
100 500 1000
Default 191.7 161.9 163.5
RR+60s+Async5s 154.8 143.2 143.1
‹#›
@pablitomusa
Amsterdam | April 2-3, 2019
Thank You!

More Related Content

What's hot

Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
Modern Data Stack France
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
kbajda
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
Jason Plurad
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Demi Ben-Ari
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Things
elephantscale
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
Altinity Ltd
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
Xiang Fu
 
JanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching ForwardJanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching Forward
Jason Plurad
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache Giraph
DataWorks Summit
 
A real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonNathan Bijnens
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017
Zhenxiao Luo
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the Clouds
Igor Sfiligoi
 
Druid at SF Big Analytics 2015-12-01
Druid at SF Big Analytics 2015-12-01Druid at SF Big Analytics 2015-12-01
Druid at SF Big Analytics 2015-12-01
gianmerlino
 
Kiwi.com Reaches Cruising Altitude with Scylla
Kiwi.com Reaches Cruising Altitude with ScyllaKiwi.com Reaches Cruising Altitude with Scylla
Kiwi.com Reaches Cruising Altitude with Scylla
ScyllaDB
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
Zhenxiao Luo
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...Nathan Bijnens
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark Summit
 
Community-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphCommunity-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraph
Jason Plurad
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Nathan Bijnens
 

What's hot (19)

Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Things
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
 
JanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching ForwardJanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching Forward
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache Giraph
 
A real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX London
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the Clouds
 
Druid at SF Big Analytics 2015-12-01
Druid at SF Big Analytics 2015-12-01Druid at SF Big Analytics 2015-12-01
Druid at SF Big Analytics 2015-12-01
 
Kiwi.com Reaches Cruising Altitude with Scylla
Kiwi.com Reaches Cruising Altitude with ScyllaKiwi.com Reaches Cruising Altitude with Scylla
Kiwi.com Reaches Cruising Altitude with Scylla
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 
Community-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphCommunity-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraph
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
 

Similar to Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019

Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
J On The Beach
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
David Pilato
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code Europe
David Pilato
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
David Pilato
 
Meetup ilm virtual emea
Meetup ilm virtual emeaMeetup ilm virtual emea
Meetup ilm virtual emea
Daliya Spasova
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
DataStax Academy
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
C4Media
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
VoltDB
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoT
Bill Liu
 
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Databricks
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
Kimmo Kantojärvi
 
MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...
MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...
MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...
MongoDB
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Clustrix
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience Sharing
Chen-en Lu
 
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF LoftData Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Amazon Web Services
 
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
DataStax Academy
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
Amazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
Amazon Web Services
 
Data Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFData Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SF
Amazon Web Services
 

Similar to Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019 (20)

Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code Europe
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
 
Meetup ilm virtual emea
Meetup ilm virtual emeaMeetup ilm virtual emea
Meetup ilm virtual emea
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoT
 
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...
MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...
MongoDB World 2018: Managing a Mission Critical eCommerce Application on Mong...
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience Sharing
 
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF LoftData Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
 
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Data Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFData Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SF
 

More from Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
Codemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
Codemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
Codemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Codemotion
 

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019

  • 1. ‹#› Pablo Musa - Elastic - @pablitomusa Amsterdam | April 2-3, 2019 Managing your Black Friday Logs
  • 3. © Elasticsearch BV 2015-2019. All rights reserved. Elastic Stack !3 Beats Elasticsearch Logstash Kibana
  • 4. © Elasticsearch BV 2015-2019. All rights reserved. Beats !4 Beats Elasticsearch Logstash KibanaLog Files Metrics Wire Data your{beat} ‣ Data Shipper ‣ Light Weight ‣ Application Side Component
  • 5. © Elasticsearch BV 2015-2019. All rights reserved. Logstash !5 Beats Elasticsearch Logstash Kibana ‣ Data Collector/Processor ‣ Server Side Component Nodes (X)
  • 6. © Elasticsearch BV 2015-2019. All rights reserved. Elasticsearch !6 Beats Elasticsearch Logstash Kibana ‣ Data Platform ‣ Really Fast ‣ HTTP + JSON Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X)
  • 7. © Elasticsearch BV 2015-2019. All rights reserved. Kibana !7 Beats Elasticsearch Logstash Kibana ‣ Data Visualization ‣ Stack Configuration ‣ Elastic Stack UI Instances (X)
  • 9. © Elasticsearch BV 2015-2019. All rights reserved. The Elastic Journey of an Event !9 Beats Log Files Metrics Wire Data your{beat} Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) Kibana Instances (X)
  • 10. © Elasticsearch BV 2015-2019. All rights reserved. The Elastic Journey of an Event !10 Beats Log Files Metrics Wire Data your{beat} Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) Logstash Nodes (X) Kibana Instances (X)
  • 11. © Elasticsearch BV 2015-2019. All rights reserved. The Elastic Journey of an Event !11 Beats Log Files Metrics Wire Data your{beat} Data Store Web APIs Social Sensors Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) Logstash Nodes (X) Kibana Instances (X)
  • 12. © Elasticsearch BV 2015-2019. All rights reserved. The Elastic Journey of an Event !12 Beats Log Files Metrics Wire Data your{beat} Data Store Web APIs Social Sensors Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) Logstash Nodes (X) Kibana Instances (X) NotificationQueues Storage Metrics
  • 13. © Elasticsearch BV 2015-2019. All rights reserved. The Elastic Journey of an Event !13 Beats Log Files Metrics Wire Data your{beat} Data Store Web APIs Social Sensors Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) Logstash Nodes (X) Kafka Redis Messaging Queue Kibana Instances (X) NotificationQueues Storage Metrics
  • 15. © Elasticsearch BV 2015-2019. All rights reserved. Cluster my_cluster Server 1 Terminology !15 Node A d1 d2 d3 d4 d5 d6 d7 d8d9 d10 d11 d12 Index twitter d6d3 d2 d5 d1 d4 Index logs
  • 16. © Elasticsearch BV 2015-2019. All rights reserved. Cluster my_cluster Server 1 Partition !16 Node A d1 d2 d3 d4 d5 d6 d7 d8d9 d10 d11 d12 Index twitter d6d3 d2 d5 d1 d4 Index logs Shards 0 1 4 2 3 0 1
  • 17. © Elasticsearch BV 2015-2019. All rights reserved. Cluster my_cluster Server 1 Node A Distribution !17 Server 2 Node Btwitter shard P4 d1 d2 d6 d5 d10 d12 twitter shard P2 twitter shard P1 logs shard P0 d2 d5 d4 logs shard P1 d3 d4 d9 d7 d8 d11 twitter shard P3 twitter shard P0 d6d3 d1
  • 18. © Elasticsearch BV 2015-2019. All rights reserved. Cluster my_cluster Server 1 Node A Replication !18 Server 2 Node Btwitter shard P4 d1 d2 d6 d5 d10 d12 twitter shard P2 twitter shard P1 logs shard P0 d2 d5 d4 logs shard P1 d3 d4 d9 d7 d8 d11 twitter shard P3 twitter shard P0 twitter shard R4 d1 d2 d6 d12 twitter shard R2 d5 d10 twitter shard R1 d6d3 d1 d6d3 d1 logs shard R0 d2 d5 d4 logs shard R1 d3 d4 d9 d7 d8 d11 twitter shard R3 twitter shard R0 • Replicas• Primaries
  • 19. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !19 Data
  • 20. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !20 Data
  • 21. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !21 Data
  • 22. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !22 Big Data ... ...
  • 23. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !23 Big Data ... ... • In Elasticsearch, shards are the working unit ‒ More data -> More shards But how many shards?
  • 24. © Elasticsearch BV 2015-2019. All rights reserved. How much data? • ~1000 events per second • 60s * 60m * 24h * 1000 events => ~87M events per day • 1kb per event => ~82GB per day • 3 months => ~7TB !24
  • 25. © Elasticsearch BV 2015-2019. All rights reserved. Shard Size • It depends on many different factors ‒ document size, mapping, use case, kinds of queries being executed, desired response time, peak indexing rate, budget, ... • After the shard sizing*, each shard should handle 45GB • Up to 10 shards per machine !25 * https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
  • 26. © Elasticsearch BV 2015-2019. All rights reserved. Cluster my_cluster How many shards? • Data size: ~7TB • Shard Size: ~45GB* • Total Shards: ~160 !26 3 months of logs ... • Shards per machine: 10* • Total Servers: 16 * https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
  • 27. © Elasticsearch BV 2015-2019. All rights reserved. But... • How many indices? • What do you do if the daily data grows? • What do you do if you want to delete old data? !27
  • 28. © Elasticsearch BV 2015-2019. All rights reserved. Time-Based Data • Logs, social media streams, time-based events • Timestamp + Data • Do not change • Typically search for recent events • Older documents become less important • Hard to predict the data size • Time-based indices is the best option ‒ create a new index each day, week, month, year, ... ‒ search the indices you need in the same request !28
  • 29. © Elasticsearch BV 2015-2019. All rights reserved. Daily Indices !29 Cluster my_cluster d6d3 d2 d5 d1 d4 logs-2016-10-19
  • 30. © Elasticsearch BV 2015-2019. All rights reserved. Daily Indices !30 Cluster my_cluster d6d3 d2 d5 d1 d4 logs-2016-10-19 d6d3 d2 d5 d1 d4 logs-2016-10-20
  • 31. © Elasticsearch BV 2015-2019. All rights reserved. Daily Indices !31 Cluster my_cluster d6d3 d2 d5 d1 d4 logs-2016-10-19 d6d3 d2 d5 d1 d4 logs-2016-10-21 d6d3 d2 d5 d1 d4 logs-2016-10-20
  • 32. © Elasticsearch BV 2015-2019. All rights reserved. Templates • Every new created index starting with 'logs-' will have ‒ 2 shards ‒ 1 replica (for each primary shard) ‒ 60 seconds refresh interval !32 PUT _template/logs { "template": "logs-*", "settings": { "number_of_shards": 2, "number_of_replicas": 1, "refresh_interval": "60s" } } More on that later
  • 33. © Elasticsearch BV 2015-2019. All rights reserved. Read and Write !33 Cluster my_cluster d6d3 d2 d5 d1 d4 logs-2016-10-19 users Application logs-write logs-read
  • 34. © Elasticsearch BV 2015-2019. All rights reserved. Read and Write !34 Cluster my_cluster d6d3 d2 d5 d1 d4 logs-2016-10-19 users Application logs-write logs-read d6d3 d2 d5 d1 d4 logs-2016-10-20
  • 35. © Elasticsearch BV 2015-2019. All rights reserved. Read and Write !35 Cluster my_cluster d6d3 d2 d5 d1 d4 logs-2016-10-19 users Application logs-write logs-read d6d3 d2 d5 d1 d4 logs-2016-10-20 d6d3 d2 d5 d1 d4 logs-2016-10-21
  • 36. © Elasticsearch BV 2015-2019. All rights reserved. Read and Write • Index Wildcard ‒ logs-* ‒ logs-*-10* • Alias ‒ default_write (6.3+) • Date Math ‒ logs<now/d> ‒ logs<now/d>, logs<now/d-1d> !36 Cluster my_cluster d6d3 d2 d5 d1 d4 logs-2016-10-19 d6d3 d2 d5 d1 d4 logs-2016-10-20 d6d3 d2 d5 d1 d4 logs-2016-10-21
  • 37. © Elasticsearch BV 2015-2019. All rights reserved. Index Lifecycle Management • Rollover • ILM UI !37 d6d3 d2 d5 d1 d4 logs-2016-10-20 d6d3 d2 d5 d1 d4 logs-2016-10-21 POST /logs_write/_rollover { "conditions": { "max_age": "7d", "max_docs": 1000, "max_size": "5gb" } }
  • 38. © Elasticsearch BV 2015-2019. All rights reserved. Cluster my_cluster Do not Overshard • 3 different logs • 1 index per day each • 1GB each • 5 shards (default) • 6 months retention • ~180GB • ~900 shards !38 access-... d6d3 d2 d5 d1 d4 application-... d6d5 d9 d5 d1 d7 mysql-... d10d59 d3 d5 d0 d4
  • 39. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !39 Big Data ... ...1M users But what happens if we have 2M users?
  • 40. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !40 Big Data ... ...1M users ... ...1M users
  • 41. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !41 Big Data ... ...1M users ... ...1M users ... ...1M users
  • 42. © Elasticsearch BV 2015-2019. All rights reserved. Scaling !42 Big Data ... ... ... ... ... ... U s e r s
  • 43. © Elasticsearch BV 2015-2019. All rights reserved. Shards are the working unit • Primaries ‒ More data -> More shards ‒ write throughput (More writes -> More shards) • Replicas ‒ high availability (1 replica is the default) ‒ read throughput (More reads -> More replicas) !43
  • 45. © Elasticsearch BV 2015-2019. All rights reserved. What is Bulk? !45 Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ 1000
 log events Beats Logstash Application 1000 index requests with 1 document 1 bulk request with 1000 documents
  • 46. © Elasticsearch BV 2015-2019. All rights reserved. What is the optimal bulk size? !46 Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ 1000
 log events Beats Logstash Application 4 * 250? 1 * 1000? 2 * 500?
  • 47. © Elasticsearch BV 2015-2019. All rights reserved. It depends... • on your application (language, libraries, ...) • document size (100b, 1kb, 100kb, 1mb, ...) • number of nodes • node size • number of shards • shards distribution !47
  • 48. © Elasticsearch BV 2015-2019. All rights reserved. Test it ;) !48 Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ 1000000
 log events Beats Logstash Application 4000 * 250-> 160s 1000 * 1000-> 155s 2000 * 500-> 164s
  • 49. © Elasticsearch BV 2015-2019. All rights reserved. Test it ;) !49 DATE=`date +%Y.%m.%d` LOG=logs/logs.txt exec_test () { curl -s -XDELETE "http://USER:PASS@HOST:9200/logstash-$DATE" sleep 10 export SIZE=$1 time cat /home/ubuntu/dataset.txt | ./bin/logstash -f logstash.conf } for SIZE in 100 500 1000 3000 5000 10000; do for i in {1..20}; do exec_test $SIZE done; done; input { stdin{} } filter {} output { elasticsearch { hosts => ["10.12.145.189"] flush_size => "${SIZE}" } } In Beats set "bulk_max_size" in the output.elasticsearch
  • 50. © Elasticsearch BV 2015-2019. All rights reserved. • 2 node cluster (m3.large) ‒ 2 vCPU, 7.5GB Memory, 1x32GB SSD • 1 index server (m3.large) ‒ logstash ‒ kibana Test it ;) !50 # docs 100 500 1000 3000 5000 10000 time(s) 191.7 161.9 163.5 160.7 160.7 161.5
  • 52. © Elasticsearch BV 2015-2019. All rights reserved. Avoid Bottlenecks !52 Elasticsearch _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ 1000000
 log events Beats Logstash Application single node Node 1 Node 2 round robin
  • 53. © Elasticsearch BV 2015-2019. All rights reserved. Distributing the Load • Client • Load Balancer • Coordinating-only Nodes !53
  • 54. © Elasticsearch BV 2015-2019. All rights reserved. Clients • Most clients implement round robin ‒ you specify a seed list ‒ the client sniffs the cluster ‒ the client implement different selectors • Logstash allows an array (no sniffing) • Beats allows an array (no sniffing) • Kibana only connects to one single node !54
  • 55. © Elasticsearch BV 2015-2019. All rights reserved. Load Balancer !55 Elasticsearch _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ 1000000
 log events Beats Logstash Application LB Node 2 Node 1
  • 56. © Elasticsearch BV 2015-2019. All rights reserved. Coordinating-only Node !56 Elasticsearch _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ _________ 1000000
 log events Beats Logstash Application Node 3
 co-node Node 2 Node 1
  • 57. © Elasticsearch BV 2015-2019. All rights reserved. • 2 node cluster (m3.large) ‒ 2 vCPU, 7.5GB Memory, 1x32GB SSD • 1 index server (m3.large) ‒ logstash (round robin configured) ‒ hosts => ["10.12.145.189", "10.121.140.167"] ‒ kibana Test it ;) !57 #docs time(s) 100 500 1000 NO Round Robin 191.7 161.9 163.5 Round Robin 189.7 159.7 159.0
  • 59. © Elasticsearch BV 2015-2019. All rights reserved. Durability !59 index a doc time lucene flush buffer index a doc buffer index a doc buffer buffer segment
  • 60. © Elasticsearch BV 2015-2019. All rights reserved. Durability !60 index a doc time lucene flush buffer segment trans_log buffer trans_log buffer trans_log elasticsearch flush doc op lucene commit segment segment
  • 61. © Elasticsearch BV 2015-2019. All rights reserved. refresh_interval • Increase to get better write throughput to an index • New documents will take more time to be available for search • Dynamic per-index setting !61 PUT logstash-2017.05.16/_settings { "refresh_interval": "60s" } #docs time(s) 100 500 1000 1s refresh 189.7 159.7 159.0 60s refresh 185.8 152.1 152.6
  • 62. © Elasticsearch BV 2015-2019. All rights reserved. Translog fsync every 5s (1.X) !62 index a doc buffer trans_log doc op index a doc buffer trans_log doc op Primary Replica redundancy doesn’t help if all nodes lose power
  • 63. © Elasticsearch BV 2015-2019. All rights reserved. Translog fsync on every request • For low volume indexing, fsync matters less • For high volume indexing, we can amortize the costs and fsync on every bulk • Concurrent requests can share an fsync !63 bulk 1 bulk 2 single fsync
  • 64. © Elasticsearch BV 2015-2019. All rights reserved. Async Transaction Log • index.translog.durability ‒ request (default) ‒ async • index.translog.sync_interval (only if async is set) • Dynamic per-index settings • Be careful, you are relaxing the safety guarantees !64 #docs time(s) 100 500 1000 Request fsync 185.8 152.1 152.6 5s sync 154.8 143.2 143.1
  • 66. © Elasticsearch BV 2015-2019. All rights reserved. Final Remarks !66 Beats Log Files Metrics Wire Data your{beat} Data Store Web APIs Social Sensors Elasticsearch Master Nodes (3) Ingest Nodes (X) Data Nodes Hot (X) Data Notes Warm (X) Logstash Nodes (X) Kafka Redis Messaging Queue Kibana Instances (X) NotificationQueues Storage Metrics
  • 67. © Elasticsearch BV 2015-2019. All rights reserved. Final Remarks • Primaries ‒ More data -> More shards ‒ Do not overshard! • Replicas ‒ high availability (1 replica is the default) ‒ read throughput (More reads -> More replicas) !67 Big Data ... ... ... ... ... ... U s e r s
  • 68. © Elasticsearch BV 2015-2019. All rights reserved. Final Remarks • Bulk and Test • Distribute the Load • Refresh Interval • Async Trans Log (careful) !68 #docs time(s) 100 500 1000 Default 191.7 161.9 163.5 RR+60s+Async5s 154.8 143.2 143.1