Logs @ OVHcloud
Babacar Diassé
28 November 2019
Observability @ OVH
whoami
Babacar Diassé
Software Engineer @ OVH
@ghostdiasse
github.com/jehuty0shift
>400k servers
>18M Web apps
>1.5M customers
more and more managed products
Prescience
Our product family: Platform
Observability
(6 persons)
IO Team
(4 persons)
Managed Kubernetes
(8 persons)
Agenda
1. The Mission
Why and how we built the platform.
2. Deep Dives
How we managed to scale.
3. Extra Bits
What’s more.
The Mission
“Provide a platform allowing OVH to collect, retrieve and analyze logs from any
infrastructure or application” (end 2014)
● Available as a Service
● All OVH personas, multi-tenant
● Centralized, queryable, analytics capabilities
● Servers, network, devices
● Software from OVH and others
The Mission
● 2 people at start.
● First PoC leveraging the Big Data ecosystem:
The Mission : POC challenges
● Complexity
● Multi tenancy
● Orchestration
The Mission
Too much work, so little time.
A wonderful person (@jedisct1) showed us:
The Mission: Graylog
✔ Elasticsearch As Backend
✔ Features : Search, Data Viz, Alerting, Extensible
✔ Built-in Multi tenancy
✔ Scalable By Design
✔ Standard formats (Syslog, GELF)
✔ API Available
Elasticsearch Basics
The Mission: Alpha
The Mission: Graylog
● Alpha (early 2015):
○ CDN logs: 70k logs/sec (~1 KB/log)
○ 3 Graylog servers
■ Xeon E5-2620 v2 (12 cores, 2.1 GHz) / 48 GB
■ Graylog 1.1
○ 3 Elasticsearch nodes
■ Xeon E5-2650 (16 cores, 2.6 GHz) / 64 GB (30 GB for JVM) / HDD RAID 0 (7 TB)
■ Elasticsearch 1.7.2
■ 3 shards / 1 replica
The Mission: Alpha
● Alpha (early 2015):
○ 1 VM for the Graylog web interface
○ 3 VMs for MongoDB
○ 1 HAProxy
The Mission : Alpha
● The Good :
😁 Performance
😁 Practicality
😁 Stability
● The Bad:
☹️ Not Self Service
☹️ Shared (mutualized) indexes
● The Ugly:
🤮 1 socket = 1 Graylog Server
The Mission: Beta
● Beta 1 (mid 2015):
○ Target: 300k-500k logs/sec (~1 KB/log)
○ 16 Graylog bare-metal (BM) nodes
■ 1 × Xeon E5-2650 v2 (16 cores, 2.6 GHz) / 128 GB
■ Graylog 1.3
○ 80 Elasticsearch BM nodes
■ 1 × Xeon E5-2650 v2 (16 cores, 2.6 GHz) / 128 GB (30 GB for JVM) / HDD RAID 0 (7 TB)
■ Elasticsearch 1.7.5
■ 80 shards / 1 replica
○ 3 MongoDB VMs
● Beta 1 (mid 2015):
○ 3 Graylog web VMs
○ 16 Kafka nodes (0.8)
○ Flowgger (0.1)
○ Dedicated Logstash and Flowgger on SailAbove (Container as a Service)
○ 3 infrastructure nodes:
■ Zookeeper/Flowgger/ES masters/Engine/Admin Tools
○ Syslog RFC 5424/LTSV/GELF/Cap'n Proto standards
The Mission: Beta
At first:
The Mission: Beta
● The Good :
😁 Kafka/ZK/Flowgger/Graylog
😁 Users and use cases
● The Bad:
☹️ Retention is low
☹️ Logstash performance
● The Ugly:
🤮 Elasticsearch
The Mission: Beta
● Too many shards (250 indexes × 160 shards = 40,000 shards):
○ Initialization and Rebalancing issues.
○ Memory consumption in data structures.
○ Big Cluster State Update (slow recovery/slow pending tasks).
● CMS GC:
○ Long STW GC Pauses => nodes out of the cluster.
○ G1GC was not deemed prod ready for Lucene (LUCENE-5168/LUCENE-6098).
● Resource usage:
○ Big Queries => I/O Wait => Lag
○ Indexing burst => No search performance
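The shard count quoted above is simple multiplication, and it can be reproduced from the Beta 1 figures (80 primary shards and 1 replica per index, hence 160 shard copies). The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of any Elasticsearch API.

```python
def shard_copies_per_index(primaries: int, replicas: int) -> int:
    """Each primary shard exists once, plus once more per replica."""
    return primaries * (1 + replicas)


def total_shards(index_count: int, primaries: int, replicas: int) -> int:
    """Total number of shard copies the cluster state has to track."""
    return index_count * shard_copies_per_index(primaries, replicas)


# Figures from the slides: 250 indexes, 80 primaries, 1 replica.
before = total_shards(250, 80, 1)
# Effect of halving the primaries per index (40 instead of 80).
after = total_shards(250, 40, 1)
print(before, after)
```

Halving the primaries per index halves the total, which is the effect of the "divide the number of shards by 2" improvement listed later in the deck.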
The Mission: Beta
Improvements:
● Hot-Warm architecture:
○ Nodes dedicated to indexing and “recent” data searching
○ Nodes dedicated to search only
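One common way to express a hot-warm split is shard allocation filtering on a node attribute. The sketch below assumes the `box_type` node attribute that appears later in the deck and the standard `index.routing.allocation.require.*` index setting; the two-day threshold and the function names are hypothetical, not the team's actual policy.

```python
from datetime import datetime, timedelta


def allocation_settings(tier: str) -> dict:
    """Index settings that pin an index to nodes tagged with the given box_type."""
    if tier not in ("hot", "warm"):
        raise ValueError(f"unknown tier: {tier}")
    return {"index.routing.allocation.require.box_type": tier}


def tier_for(index_created: datetime, now: datetime, hot_days: int = 2) -> str:
    """Recent indexes stay on hot nodes; older ones move to warm nodes.

    hot_days is an assumed threshold chosen for illustration only.
    """
    return "hot" if now - index_created < timedelta(days=hot_days) else "warm"


now = datetime(2019, 11, 28)
for created in (datetime(2019, 11, 27), datetime(2019, 11, 20)):
    print(created.date(), allocation_settings(tier_for(created, now)))
```

Applying the returned settings to an existing index would trigger relocation of its shards to the matching nodes, so in practice such a change is usually throttled or scheduled outside indexing peaks.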
The Mission: Beta
Improvements:
● G1GC:
○ Fewer STW collections
○ Better suited for medium-sized heaps
The Mission: Beta
Improvements:
● Elasticsearch:
○ Upgrade to 2.X: better, faster, stronger.
○ Divide the number of shards by 2.
○ Configuration changes: breakers, threadpool, index settings, mapping...
The Mission: Gamma
● From Beta to Gamma (2015-2017):
○ SSD on Hot-Nodes
○ Streams and Dashboards Sharing
○ Better performance on ES
○ Graylog upgrade and plugins
○ SailAbove to Mesos
○ Additional features: Cold Storage, Index as a Service
The Mission: Gamma
● But, big outages on the way:
○ Unexplained issues:
■ “ghost” indexes
■ hot spot
■ memory leaks
○ Explained issues:
■ OS, JVM, ES Settings
■ MongoDB
■ Bugs
The Mission: Gamma
● Problems:
○ Failure domain
○ Different user needs
■ Low latency
■ High indexing throughput
■ High retention
○ Inefficiency
○ Scalability
● Solution:
Multi-Cluster
The Mission: LDP
✔ Global multi-tenancy
✔ Independent scaling
✔ All features
✔ Customization
✔ OVH API
The Mission: LDP
Current Status:
● 36 clusters
● 1.5-1.8 million docs/sec (140 billion/day)
● 4+ trillion docs indexed
● 500+ searches/sec
● Graylog 2.5
● Elasticsearch 6.8
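The per-second and per-day figures above can be cross-checked against each other. This is a small arithmetic sketch using only the numbers on the slide; the helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_day(docs_per_sec: float) -> float:
    """Daily volume if the given rate were sustained for 24 hours."""
    return docs_per_sec * SECONDS_PER_DAY


low = docs_per_day(1.5e6)    # lower bound of the quoted rate
high = docs_per_day(1.8e6)   # upper bound of the quoted rate
avg_rate = 140e9 / SECONDS_PER_DAY  # average rate implied by 140 billion/day

print(f"{low:.3e} to {high:.3e} docs/day, average {avg_rate:,.0f} docs/sec")
```

The quoted 140 billion docs/day falls between the two bounds, which corresponds to an average of roughly 1.6 million docs/sec.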
Deep Dives
Disclaimer
● “It works!™” for OUR use case: logging with shared (mutualized) indexes.
● “It works!™” until our next upgrade or our next rendezvous.
● “It works!™” within our budget:
○ Budget == infrastructure cost + SRE time.
Elasticsearch @ Scale
Know your infrastructure
Know your stack
Deep Dives
● Kafka and Zookeeper
● MongoDB
● Graylog
● Elasticsearch
Deep Dives: Zookeeper
● Use dedicated nodes for Zookeeper
● Use decent I/O storage
Deep Dives: Kafka
● I/O scheduler: prefer deadline/mq-deadline
● Rack awareness
● Compress on the producer side and on topics (ZSTD available in 2.1).
● Keep the number of partitions as low as possible
● Set up I/O threads and network threads
● Monitor partition assignment
● Use modern consumers
Deep Dives: MongoDB
● Primary only for R/W
● Indexes
● Journaled writes
● Write Concern
Deep Dives: Graylog
● Message Processing metrics
● Use Custom message processor
● Tune processbuffer+outputbuffer_processors, ring_sizes, batch_sizes
● Enable REST gzip
● Tune web+rest_selector_runners_count
● Tune rest_worker+proxied_request_threadpool_size
● Rotation strategy: prefer size
● Number of shards -> number of indexing nodes / 2
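The last rule of thumb above can be written as a one-line helper. This is a sketch only: the function name is illustrative, the floor of one shard is an added safeguard, and treating the hot nodes of a hot-warm cluster as the "indexing nodes" is an assumption based on the hot-warm slide.

```python
def shards_per_index(indexing_nodes: int) -> int:
    """Rule of thumb from the slide: shards = indexing nodes / 2, at least 1."""
    if indexing_nodes < 1:
        raise ValueError("a cluster needs at least one indexing node")
    return max(1, indexing_nodes // 2)


# Example: the DNS cluster described later in the deck has 54 hot nodes.
print(shards_per_index(54))
```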
Deep Dives: Elasticsearch
● Indexing is CPU Heavy
● Raid 0 or SSD
● SSD: use the deadline I/O scheduler
● No swap
● Tune net.ipv4.tcp_tw_reuse, fs.file-max, fs.nr_open, fs.aio-max-nr, vm.max_map_count
Deep Dives: Elasticsearch
● JDK 13
● Xms == Xmx, -XX:+AlwaysPreTouch, -XX:-OmitStackTraceInFastThrow
● -Xss1m
● Heap < 30 GB (compressed oops)
● Heap < ½ host RAM
● Use G1GC
○ -XX:ConcGCThreads=n/4
○ -XX:ParallelGCThreads=n<8?8:8+(n-8)*0.625
○ -XX:+ParallelRefProcEnabled
○ -XX:MaxGCPauseMillis=250
○ -XX:InitiatingHeapOccupancyPercent=<70-80>
○ GC logging
● Dual-socket: -XX:+UseNUMA
● -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
● -Djdk.nio.maxCachedBufferSize -XX:MaxDirectMemorySize
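The two GC thread formulas above can be evaluated for a given machine. The sketch below assumes that `n` stands for the number of logical CPUs (the slide does not define it) and picks 75 as an arbitrary value inside the 70-80 range given for the occupancy threshold; the function names are illustrative.

```python
def conc_gc_threads(n: int) -> int:
    """-XX:ConcGCThreads=n/4, with a floor of one thread."""
    return max(1, n // 4)


def parallel_gc_threads(n: int) -> int:
    """-XX:ParallelGCThreads=n<8?8:8+(n-8)*0.625, as written on the slide."""
    return 8 if n < 8 else int(8 + (n - 8) * 0.625)


def g1_flags(n: int, pause_ms: int = 250, ihop: int = 75) -> list:
    """Assemble the G1 flags from the slide for a host with n logical CPUs."""
    return [
        "-XX:+UseG1GC",
        f"-XX:ConcGCThreads={conc_gc_threads(n)}",
        f"-XX:ParallelGCThreads={parallel_gc_threads(n)}",
        "-XX:+ParallelRefProcEnabled",
        f"-XX:MaxGCPauseMillis={pause_ms}",
        f"-XX:InitiatingHeapOccupancyPercent={ihop}",
    ]


print(" ".join(g1_flags(16)))
```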
Deep Dives: Elasticsearch
● -Des.networkaddress.cache.negative.ttl=10
● -Des.networkaddress.cache.ttl=60
● -Dio.netty.noUnsafe=true
● -Dio.netty.noKeySetOptimization=true
● -Dio.netty.recycler.maxCapacityPerThread=0
Deep Dives: Elasticsearch
Node Settings:
● node.attr.box_type (hot-warm)
● cluster.routing.allocation.awareness.attributes
● transport/http.netty.worker_count
● http.* settings
● threadpool.bulk/index/search/force_merge
● indices.breaker.request/fielddata/total.limit
● indices.recovery.concurrent_streams/translog_ops
● indices.queries.cache.size
● cache.recycler.page.limit.heap
Deep Dives: Elasticsearch
Cluster Settings:
● cluster.routing.allocation:
○ node_concurrent_recoveries
○ node_initial_primaries_recoveries
○ cluster_concurrent_rebalance
● cluster.routing.allocation.balance:
○ Raise *.balance.threshold
○ *.balance.index >> *.balance.shard
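For illustration, the settings named above could be assembled into a body for the cluster update settings API. The setting keys are the standard Elasticsearch ones; every numeric value below is a hypothetical placeholder, not a value taken from the talk, and the function name is made up for this sketch.

```python
import json


def cluster_settings_body(
    node_concurrent_recoveries: int = 4,         # hypothetical value
    node_initial_primaries_recoveries: int = 8,  # hypothetical value
    cluster_concurrent_rebalance: int = 4,       # hypothetical value
    balance_threshold: float = 2.0,              # "raise" relative to the default
    balance_index: float = 0.9,                  # hypothetical value
    balance_shard: float = 0.1,                  # hypothetical value
) -> dict:
    """Build a settings body following the slide's advice (index weight >> shard weight)."""
    if balance_index <= balance_shard:
        raise ValueError("the slide recommends balance.index >> balance.shard")
    prefix = "cluster.routing.allocation"
    return {
        "persistent": {
            f"{prefix}.node_concurrent_recoveries": node_concurrent_recoveries,
            f"{prefix}.node_initial_primaries_recoveries": node_initial_primaries_recoveries,
            f"{prefix}.cluster_concurrent_rebalance": cluster_concurrent_rebalance,
            f"{prefix}.balance.threshold": balance_threshold,
            f"{prefix}.balance.index": balance_index,
            f"{prefix}.balance.shard": balance_shard,
        }
    }


print(json.dumps(cluster_settings_body(), indent=2))
```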
Deep Dives: Elasticsearch
Indices Settings:
● index.mapping.total_fields.limit
● index.requests.cache.enable
● index.codec
● index.translog.flush_threshold_size
● index.translog.durability
● index.merge.scheduler.max_thread_count
● index.merge.scheduler.max_merge_count
● index.unassigned.node_left.delayed_timeout
Deep Dives: Elasticsearch
Indices Mapping:
● Use Templates:
● Deactivate Norms and index
● Conventions:
{ "double_suffix": {
    "mapping" : {
      "type" : "double"
    },
    "match" : "*_double"
  }
},
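The suffix convention above lends itself to being generated rather than written by hand. The sketch below builds dynamic template entries with the same shape as the `double_suffix` example; only the `_double` suffix comes from the slide, the other suffixes in the table are assumptions added for illustration.

```python
import json

# Only "double" is taken from the slide; the other entries are hypothetical.
SUFFIX_TYPES = {
    "double": "double",
    "long": "long",
    "bool": "boolean",
    "date": "date",
}


def dynamic_template(suffix: str, es_type: str) -> dict:
    """One dynamic template: fields named *_<suffix> are mapped to es_type."""
    return {
        f"{suffix}_suffix": {
            "mapping": {"type": es_type},
            "match": f"*_{suffix}",
        }
    }


def dynamic_templates(suffix_types: dict) -> list:
    """Ordered list suitable for the dynamic_templates section of a mapping."""
    return [dynamic_template(s, t) for s, t in suffix_types.items()]


print(json.dumps(dynamic_templates(SUFFIX_TYPES), indent=2))
```

With such a convention, a field's type is decided by its name, so two tenants sending the same field name cannot create conflicting mappings in a shared index.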
Deep Dives: Improve
● Observability
○ System metrics
○ JVM GC Logging
○ Jstack, jmap are your friends
○ Software KPI
Deep Dives: Improve
● Try new settings
○ Breaking a node must be easy
○ Breaking a cluster should be possible
○ Try/Fail/Try again
○ Try with real workload
Extra Bits
Extra Features:
● ES API to search streams
● Cold Storage on PCA
● Index as a Service
● Kibana as a Service
● Real time tail over WebSocket
Extra Bits: Under the Hood
● Engine: 100k LOC
● Monitoring: Ganglia, Shinken, Opsgenie
● Metrics Data Platform for business metrics
Extra Bits
● Low Latency Cluster for SOC
○ 100-200 logs/sec => Small cluster (4 data nodes)
○ Must answer in < 200 ms on queries spanning millions of documents
○ One user login at OVH == One query
○ SSD + high cache sizes
○ Tune queries to use the most efficient aggregations.
Extra Bits
● High Writing Cluster for DNS
○ 800k logs/sec (burst > 1.2 M)
○ Hot-Warm cluster (54 hot/14 warm)
○ Hot CPU => 2 × Xeon E5-2640 v3 (16 cores, 40-60% CPU usage)
○ 737 billion DNS records
○ 150 TB of data on primaries
Extra Bits
● High Writing Cluster for Mail
○ 112k logs/sec (burst > 200k)
○ Hot-Warm cluster (30 hot/22 warm)
○ Hot CPU => 2 × Xeon E5-2640 v3 (16 cores, 30-50% CPU usage)
○ 152 billion logs
○ 135 TB of data on primaries
○ ~2 KB per message
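The document counts and primary storage sizes quoted for the DNS and Mail clusters give a rough average on-disk footprint per document. The sketch below assumes decimal terabytes (1 TB = 10^12 bytes) and makes no claim about how that footprint relates to the raw message size; the function name is illustrative.

```python
TB = 10 ** 12  # assumption: decimal terabytes


def avg_bytes_per_doc(primary_bytes: float, doc_count: float) -> float:
    """Average primary-shard storage per indexed document."""
    return primary_bytes / doc_count


dns = avg_bytes_per_doc(150 * TB, 737e9)   # DNS cluster figures
mail = avg_bytes_per_doc(135 * TB, 152e9)  # Mail cluster figures
print(f"DNS: ~{dns:.0f} B/doc, Mail: ~{mail:.0f} B/doc")
```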
Closing
● Know your users
○ Write Workload vs Low Latency vs Read Workload
○ Expectations (retention, performance)
○ Gather Feedback
○ Teach/Document good user practices
Closing
● Know your stack
○ Read documentation, read blogs
○ Read Code
○ Observe software metrics and logs
○ Try, fail, try, fail, try, fail...until success
○ Upgrade your software to latest versions
Closing
● Know your infrastructure
○ Prefer Bare Metal for predictability
○ Prepare for failure
○ Scale only when everything else fails
○ Observe system metrics
Thank you.

Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
 
Industrialize Machine Learning
Industrialize Machine Learning Industrialize Machine Learning
Industrialize Machine Learning
 
OVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud DatabasesOVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud Databases
 
OVHcloud Hosted Private Cloud Platform Network use cases with VMware NSX
OVHcloud Hosted Private Cloud Platform Network use cases with VMware NSXOVHcloud Hosted Private Cloud Platform Network use cases with VMware NSX
OVHcloud Hosted Private Cloud Platform Network use cases with VMware NSX
 
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
 

Recently uploaded

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 

Recently uploaded (20)

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 

Logs @ OVHcloud

  • 1. Logs @ OVHcloud Babacar Diassé 28 November 2019 Observability @ OVH
  • 2. whoami Babacar Diassé Software Engineer @ OVH @ghostdiasse github.com/jehuty0shift
  • 3. >400k servers >18M Web apps >1.5M customers
  • 4. more and more managed products Presience
  • 5. Our product family: Platform Observability (6 persons) IO Team (4 persons) Managed Kubernetes (8 persons)
  • 6. Agenda 1. The Mission: Why and how we built the platform. 2. Deep Dives: How we managed to scale. 3. Extra Bits: What’s more.
  • 8. The Mission “Provide a platform allowing OVH to collect, retrieve and analyze logs from any infrastructure or application” (end 2014)
  • 9. The Mission “Provide a platform allowing OVH to collect, retrieve and analyze logs from any infrastructure or application”. ● Available as a Service ● All OVH personas, multi-tenant ● Centralized, queryable, analytics capabilities ● Servers, network, devices ● Software from OVH and others
  • 10. The Mission ● 2 people at start. ● First P.O.C leveraging Big Data ecosystem :
  • 11. The Mission : POC challenges ● Complexity ● Multi tenancy ● Orchestration
  • 12. The Mission Too much work, so little time: A wonderful person (@jedisct1) showed us :
  • 14. The Mission : Graylog ✔ Elasticsearch As Backend ✔ Features : Search, Data Viz, Alerting, Extensible ✔ Built-in Multi tenancy ✔ Scalable By Design ✔ Standards formats (Syslog, Gelf) ✔ API Available
  • 17. The Mission: Graylog ● Alpha (early 2015): ○ CDN logs: 70k logs/sec (~1 KB/log) ○ 3 Graylog servers ■ Xeon E5-2620 v2 (12 cores, 2.1 GHz) / 48 GB ■ Graylog 1.1 ○ 3 Elasticsearch nodes ■ Xeon E5-2650 (16 cores, 2.6 GHz) / 64 GB (30 GB for JVM) / HDD RAID 0 (7 TB) ■ Elasticsearch 1.7.2 ■ 3 shards / 1 replica
  • 18. The Mission: Alpha ● Alpha (early 2015): ○ 1 VM for Graylog web interface ○ 3 VM for MongoDB ○ 1 HA Proxy
  • 19. The Mission : Alpha ● The Good : 😁 Performance 😁 Practicality 😁 Stability ● The Bad: ☹️ Not Self Service ☹️ Mutualized Indexes ● The Ugly: 🤮 1 socket = 1 Graylog Server
  • 21. The Mission: Beta ● Beta 1 (mid 2015): ○ Target: 300k-500k logs/sec (~1 KB/log) ○ 16 Graylog bare-metal (BM) nodes ■ 1*Xeon E5-2650 v2 (16 cores, 2.6 GHz) / 128 GB ■ Graylog 1.3 ○ 80 Elasticsearch BM nodes ■ 1*Xeon E5-2650 v2 (16 cores, 2.6 GHz) / 128 GB (30 GB for JVM) / HDD RAID 0 (7 TB) ■ Elasticsearch 1.7.5 ■ 80 shards / 1 replica ○ 3 MongoDB VMs.
  • 22. The Mission: Beta ● Beta 1 (mid 2015): ○ 3 VM Graylog web ○ 16 Kafka Nodes (0.8) ○ Flowgger (0.1) ○ Dedicated Logstash and Flowgger on SailAbove (Container As A Service) ○ 3 infrastructure nodes: ■ Zookeeper/Flowgger/ES masters/Engine/Admin Tools ○ Syslog RFC 5424/LTSV/GELF/Cap’n’Proto standards
  • 25. The Mission: Beta ● The Good : 😁 Kafka/ZK/Flowgger/Graylog 😁 Users and use cases ● The Bad: ☹️ Retention is low ☹️ Logstash performance ● The Ugly: 🤮 Elasticsearch
  • 26. The Mission: Beta ● Too many shards (250 indexes × 160 shards = 40,000 shards): ○ Initialization and rebalancing issues. ○ Memory consumption in data structures. ○ Big cluster state updates (slow recovery/slow pending tasks). ● CMS GC: ○ Long STW GC pauses => nodes dropped out of the cluster. ○ G1GC was not deemed prod ready for Lucene (LUCENE-5168/LUCENE-6098). ● Resource usage: ○ Big queries => I/O wait => lag ○ Indexing burst => no search performance
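The shard count on slide 26 can be reproduced with a small calculation. This is only a sketch: it assumes the 160 shards per index are the 80 primaries plus 1 replica each listed on the Beta 1 hardware slide, and the halved figure simply applies the "divide the number of shards by 2" improvement from slide 29. The function name `total_shards` is made up for illustration.

```python
def total_shards(indexes: int, primaries_per_index: int, replicas: int) -> int:
    """Number of shard copies (primaries + replicas) the cluster has to track."""
    copies_per_primary = 1 + replicas
    return indexes * primaries_per_index * copies_per_primary


# Beta 1 figures from the slides: 250 indexes, 80 shards, 1 replica.
beta_total = total_shards(indexes=250, primaries_per_index=80, replicas=1)

# Same cluster after halving the primaries per index (assumed to mean 80 -> 40).
halved_total = total_shards(indexes=250, primaries_per_index=40, replicas=1)

print(beta_total, halved_total)
```

Every shard copy adds to the cluster state and to recovery work, which is why the count matters more than the raw data volume here.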
  • 27. The Mission: Beta Improvements: ● Hot-Warm architecture: ○ Nodes dedicated to indexing and “recent” data searching ○ Nodes dedicated to search only
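A hot-warm split is commonly expressed in Elasticsearch through shard allocation filtering on a custom node attribute. The sketch below only builds the settings bodies; it assumes nodes are tagged with `node.attr.box_type` (the attribute named on slide 48) and that indices are pinned with `index.routing.allocation.require.box_type`. How and when OVH actually moved indices from hot to warm is not stated in the slides, and `allocation_settings` is a hypothetical helper.

```python
import json


def allocation_settings(tier: str) -> dict:
    """Index settings that restrict shard allocation to nodes of one tier."""
    if tier not in ("hot", "warm"):
        raise ValueError(f"unknown tier: {tier!r}")
    return {"index.routing.allocation.require.box_type": tier}


# Body for creating a new index on the hot (indexing) nodes.
create_body = {"settings": allocation_settings("hot")}

# Body for a later settings update that relocates the index to warm nodes.
move_body = allocation_settings("warm")

print(json.dumps(create_body, indent=2))
print(json.dumps(move_body, indent=2))
```

Changing the setting on an existing index makes Elasticsearch relocate its shards on its own, so the move is a settings update rather than a reindex.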
  • 28. The Mission: Beta Improvements: ● G1GC: ○ Fewer STW collections ○ Better suited for medium sized heaps
  • 29. The Mission: Beta Improvements: ● Elasticsearch: ○ Upgrade to 2.X: better, faster, stronger. ○ Divide the number of shards by 2. ○ Configuration changes: breakers, threadpool, index settings, mapping...
  • 30. The Mission: Gamma ● From Beta to Gamma (2015-2017): ○ SSD on Hot-Nodes ○ Streams and Dashboards Sharing ○ Better performance on ES ○ Graylog upgrade and plugins ○ SailAbove to Mesos ○ Additional Features: Cold Storage, Index As a service
  • 31. The Mission: Gamma ● But, big outages on the way: ○ Unexplained issues: ■ “ghost” indexes ■ hot spot ■ memory leaks ○ Explained issues: ■ OS, JVM, ES Settings ■ MongoDB ■ Bugs
  • 32. The Mission: Gamma ● Problems: ○ Domain of failure ○ Different user needs ■ Low latency ■ High indexing write ■ High retention ○ Inefficiency ○ Scalability
  • 35. The Mission: LDP ✔ Global multi-tenancy ✔ Independent scaling ✔ All features ✔ Customization ✔ OVH API
  • 36. The Mission: LDP Current Status: ● 36 clusters ● 1.5-1.8 million docs/sec (140 billion/day) ● 4+ trillion docs indexed. ● 500+ searches/sec ● Graylog 2.5 ● Elasticsearch 6.8
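The throughput figures on slide 36 can be cross-checked with simple arithmetic. The sketch below assumes the daily figure counts documents (140 billion per day), not bytes, and that the per-second range describes sustained rates; all names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_day(docs_per_sec: int) -> int:
    """Documents indexed in one day at a constant per-second rate."""
    return docs_per_sec * SECONDS_PER_DAY


low = docs_per_day(1_500_000)   # daily volume at the low end of the range
high = docs_per_day(1_800_000)  # daily volume at the high end of the range

# Average rate implied by 140 billion documents per day.
average_rate = 140_000_000_000 / SECONDS_PER_DAY

print(low, high, round(average_rate))
```

The daily figure falls between the two bounds, so the two numbers on the slide are consistent with each other.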
  • 38. Disclaimer ● “It works !™” for OUR use case : Logging with mutualized indexes. ● “It works !™” until our next upgrade or our next rendezvous. ● “It works !™” within our budget: ○ Budget == infrastructure cost + SREs time.
  • 39. Elasticsearch @ Scale Know your infrastructure Know your stack
  • 40. Deep Dives ● Kafka and Zookeeper ● MongoDB ● Graylog ● Elasticsearch
  • 41. Deep Dives: Zookeeper ● Use dedicated nodes for Zookeeper ● Use decent I/O storage
  • 42. Deep Dives: Kafka ● IO scheduler: prefer deadline/mq-deadline ● Rack awareness ● Compress on producer side and on topics (ZSTD available in 2.1). ● Keep the number of partitions as low as possible ● Set up I/O threads and network threads ● Monitor partition assignment ● Use modern consumers
  • 43. Deep Dives: MongoDB ● Primary only for R/W ● Indexes ● Journaled writes ● Write Concern
  • 44. Deep Dives: Graylog ● Message Processing metrics ● Use Custom message processor ● Tune processbuffer+outputbuffer_processors, ring_sizes, batch_sizes ● Enable rest gzip ● tune web+rest_selector_runners_count ● tune rest_worker+proxied_request_threadpool_size ● Rotation Strategy: prefer size ● Number of shards -> number of indexing nodes/2
  • 45. Deep Dives: Elasticsearch ● Indexing is CPU heavy ● RAID 0 or SSD ● SSD: use deadline ● No swap ● Tune net.ipv4.tcp_tw_reuse, fs.file-max, fs.nr_open, fs.aio-max-nr, vm.max_map_count
  • 46. Deep Dives: Elasticsearch ● JDK 13 ● Xms == Xmx, -XX:+AlwaysPreTouch, -XX:-OmitStackTraceInFastThrow ● -Xss1m ● Heap < 30 GB (compressed oops) ● Heap < ½ Host RAM. ● Use G1GC ○ -XX:ConcGCThreads=n/4 ○ -XX:ParallelGCThreads=n<8?8:8+(n-8)*0.625 ○ -XX:+ParallelRefProcEnabled ○ -XX:MaxGCPauseMillis=250 ○ -XX:InitiatingHeapOccupancyPercent=<70-80> ○ GC Logging ● bi-socket: -XX:+UseNUMA ● -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError ● -Djdk.nio.maxCachedBufferSize -XX:MaxDirectMemorySize
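The two G1GC thread formulas on slide 46 can be turned into concrete flag values. This is a sketch under stated assumptions: `n` is read as the number of logical CPUs on the host, results are rounded down, and the floor of one concurrent thread is an addition made here for small machines. The function name `g1_thread_flags` is hypothetical.

```python
def g1_thread_flags(n: int) -> list[str]:
    """JVM flags derived from the slide's formulas for a host with n logical CPUs."""
    # Slide: ParallelGCThreads = n < 8 ? 8 : 8 + (n - 8) * 0.625
    parallel = 8 if n < 8 else int(8 + (n - 8) * 0.625)
    # Slide: ConcGCThreads = n / 4 (kept at 1 or more; that floor is an assumption).
    concurrent = max(1, n // 4)
    return [
        f"-XX:ParallelGCThreads={parallel}",
        f"-XX:ConcGCThreads={concurrent}",
    ]


for cpus in (4, 16, 32):
    print(cpus, " ".join(g1_thread_flags(cpus)))
```

For the 16-core machines mentioned in the hardware slides this gives 13 parallel and 4 concurrent GC threads.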
  • 47. Deep Dives: Elasticsearch ● -Des.networkaddress.cache.negative.ttl=10 ● -Des.networkaddress.cache.ttl=60 ● -Dio.netty.noUnsafe=true ● -Dio.netty.noKeySetOptimization=true ● -Dio.netty.recycler.maxCapacityPerThread=0
  • 48. Deep Dives: Elasticsearch Node Settings: ● node.attr.box_type (hot-warm) ● cluster.routing.allocation.awareness.attributes ● transport/http.netty.worker_count ● http.* settings ● threadpool.bulk/index/search/force_merge ● indices.breaker.request/fielddata/total.limit ● indices.recovery.concurrent_streams/translog_ops ● indices.queries.cache.size ● cache.recycler.page.limit.heap
  • 49. Deep Dives: Elasticsearch Cluster Settings: ● cluster.routing.allocation: ○ node_concurrent_recoveries ○ node_initial_primaries_recoveries ○ cluster_concurrent_rebalance ● cluster.routing.allocation.balance: ○ Raise *.balance.threshold ○ *.balance.index >> *.balance.shard
  • 50. Deep Dives: Elasticsearch Indices Settings: ● index.mapping.total_fields.limit ● index.requests.cache.enable ● index.codec ● index.translog.flush_threshold_size ● index.translog.durability ● index.merge.scheduler.max_thread_count ● index.merge.scheduler.max_merge_count ● index.unassigned.node_left.delayed_timeout
  • 51. Deep Dives: Elasticsearch Indices Mapping: ● Use Templates ● Deactivate norms and index ● Conventions: { "double_suffix": { "mapping": { "type": "double" }, "match": "*_double" } }
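The suffix convention on slide 51 maps naturally onto Elasticsearch dynamic templates, where each entry matches field names by pattern and forces a type. The sketch below generates such entries; only the `*_double` rule comes from the slide, while the `long` and `bool` suffixes and the helper name `suffix_templates` are assumptions added for illustration.

```python
import json


def suffix_templates(suffix_types: dict[str, str]) -> list[dict]:
    """Build one dynamic template per field-name suffix."""
    return [
        {f"{suffix}_suffix": {"match": f"*_{suffix}", "mapping": {"type": es_type}}}
        for suffix, es_type in suffix_types.items()
    ]


templates = suffix_templates(
    {
        "double": "double",  # from the slide
        "long": "long",      # hypothetical extra convention
        "bool": "boolean",   # hypothetical extra convention
    }
)

# The list goes under "dynamic_templates" in an index template's mappings.
print(json.dumps({"dynamic_templates": templates}, indent=2))
```

With a convention like this, users choose the field type by naming the field, which avoids mapping conflicts on mutualized indexes.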
  • 52. Deep Dives: Improve ● Observability ○ System metrics ○ JVM GC Logging ○ Jstack, jmap are your friends ○ Software KPI
  • 53. Deep Dives: Improve ● Try new settings ○ Breaking a node must be easy ○ Breaking a cluster should be possible ○ Try/Fail/Try again ○ Try with real workload
  • 56. Extra Bits Extra Features: ● ES API to search streams ● Cold Storage on PCA ● Index as a Service ● Kibana as a Service ● Real time tail over WebSocket
  • 57. Extra Bits: Under the Hood ● Engine: 100k LOC ● Monitoring: Ganglia, Shinken, Opsgenie ● Metrics Data Platform for business metrics
  • 58. Extra Bits ● Low Latency Cluster for SOC ○ 100-200 logs/sec => Small cluster (4 data nodes) ○ Must answer in < 200 ms on queries spanning millions of documents ○ One user login at OVH == One query ○ SSD + high cache sizes ○ Tweak queries to use the most efficient aggregations.
  • 59. Extra Bits ● High Writing Cluster for DNS ○ 800k logs/sec (burst > 1.2 M) ○ Hot-Warm cluster (54 hot/14 warm) ○ Hot CPU => 2X Xeon E5-2640v3 (16c 40-60 % CPU usage) ○ 737 billion DNS records ○ 150 TB of data for primaries
  • 60. Extra Bits ● High Writing Cluster for Mail ○ 112k logs/sec (burst > 200k) ○ Hot-Warm cluster (30 hot/22 warm) ○ Hot CPU => 2X Xeon E5-2640v3 (16c 30-50 % CPU usage) ○ 152 billion logs ○ 135 TB of data for primaries ○ ~2 KB per message
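The DNS and Mail cluster figures allow a rough estimate of on-disk size per document. This is a back-of-the-envelope sketch: it assumes decimal terabytes and that the document counts and primary sizes on slides 59 and 60 describe the same data set, neither of which the slides state explicitly.

```python
def bytes_per_doc(primary_tb: float, docs_billions: float) -> float:
    """Average primary-shard bytes per document (decimal TB, 10**12 bytes)."""
    return primary_tb * 10**12 / (docs_billions * 10**9)


dns = bytes_per_doc(primary_tb=150, docs_billions=737)
mail = bytes_per_doc(primary_tb=135, docs_billions=152)

print(f"DNS:  {dns:.0f} bytes/doc")
print(f"Mail: {mail:.0f} bytes/doc")
```

Under these assumptions a DNS record takes roughly 200 bytes on primaries and a mail log a little under 900 bytes, which is below the ~2 KB message size quoted for mail.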
  • 61. Closing ● Know your users ○ Write Workload vs Low Latency vs Read Workload ○ Expectations (retention, performance) ○ Gather Feedback ○ Teach/Document good user practices
  • 62. Closing ● Know your stack ○ Read documentation, read blogs ○ Read Code ○ Observe software metrics and logs ○ Try, fail, try, fail, try, fail...until success ○ Upgrade your software to latest versions
  • 63. Closing ● Know your infrastructure ○ Prefer Bare Metal for predictability ○ Prepare for failure ○ Scale only when everything else fails ○ Observe system metrics

Editor's Notes

  1. Complexity of the stack: two devs (and no Ops) for too many new products to operate
  2. EAB (Elasticsearch As Backend) => near real time. Basic but useful. Built-in multi-tenancy. More Graylog, more power. Can build an API which pilots this API. Cold storage.