How we scaled Rudder to 10k nodes
And the road to 50k nodes
Nicolas CHARLES
Co-founder and COO
@nico_charles
2
Scalability?
Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth
https://en.wikipedia.org/wiki/Scalability
3
Scalability – why is it an issue in Rudder?
What does Rudder do?
● Users define policies
● Apply them on groups of nodes
● Rudder computes the policies for each node
● Agents apply them, and send back information
● Rudder computes the compliance
4
Scalability – why is it an issue in Rudder?
Each of these steps needs to be fast
● Process node inventories quickly
● Have a fast UI
● Generate policies in a reasonable time
● Have fast agents, and don’t overload the network
● Have the compliance of the actual state available
5
Rudder Architecture
6
Rudder Architecture
[Figure: Rudder architecture diagram]
Rudder Server Root
– Interfaces: CLI, Web UI, API
– Uses Applications: Compliance, Configuration, Inventory, Plugins
– Rudder Engine, Techniques
Nodes
– Rudder Agent
– Rudder relay
– Rudder Agent
7
The origin of Rudder
● At first, Rudder was designed for hundred(s) of nodes
● No real goal for scalability
● In retrospect, it was an MVP
8
The origin of Rudder
● Scalability improved, driven by
● Users and usages
– Frustration over slowdowns
– More managed servers
● Features
– Some features needed much better performance
– Some needed massive architectural changes
9
First bottlenecks to tackle
● Reporting in Rudder
● Display compliance of nodes
– Change the data model, as everything was Rule Centric in Rudder 2.3
● Slow display of reports and compliance
– Remember, we were supporting PostgreSQL 8.x
– Adding relevant indexes
● Agent side
● Agent was already used in critical systems, but impacted performance of nodes
– Rewrite some policies
– Add tooling around agent to prevent clogging
● Rudder 2.5 was not more scalable, but more consistent
10
Scalability – Step by Step
[Figure: Rudder architecture diagram (Rudder Server Root, Rudder relay, nodes running the Rudder Agent)]
Bandwidth & Network
- Flag files to detect new policies
- Relay servers
11
Scalability – Step by Step
[Figure: Rudder architecture diagram (Rudder Server Root, Rudder relay, nodes running the Rudder Agent)]
Scale the uses
- Validation workflow
- Synchronisation of Rudder servers
- API
- More Techniques
12
Scalability – Step by Step
[Figure: Rudder architecture diagram (Rudder Server Root, Rudder relay, nodes running the Rudder Agent)]
Improve performance
- Save only changes of Inventories (several orders of magnitude faster)
- Change data model for Compliance (30% faster compliance)
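The "save only changes" idea for inventories can be sketched as follows. This is a minimal illustration under stated assumptions, not Rudder's actual code: the flat dictionary shape of an inventory, the in-memory store, and the names `diff_inventory` and `save_inventory` are all hypothetical.

```python
from typing import Any, Dict

Inventory = Dict[str, Any]  # hypothetical flat inventory: attribute name -> value


def diff_inventory(stored: Inventory, incoming: Inventory) -> Dict[str, Dict[str, Any]]:
    """Return only the keys that were added, modified or removed."""
    changes: Dict[str, Dict[str, Any]] = {"set": {}, "removed": {}}
    for key, value in incoming.items():
        if key not in stored or stored[key] != value:
            changes["set"][key] = value
    for key in stored:
        if key not in incoming:
            changes["removed"][key] = stored[key]
    return changes


def save_inventory(store: Dict[str, Inventory], node_id: str, incoming: Inventory) -> int:
    """Apply only the changed keys to the store; return the number of writes."""
    stored = store.setdefault(node_id, {})
    changes = diff_inventory(stored, incoming)
    for key, value in changes["set"].items():
        stored[key] = value
    for key in changes["removed"]:
        del stored[key]
    return len(changes["set"]) + len(changes["removed"])
```

Resending an unchanged inventory then costs zero writes, which is where the gain comes from when most inventories are identical from one run to the next.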
13
Scalability – 2.9 & 2.10
● Improving performance is one of the focuses
● Refactoring and code improvements to improve policy generation time
– Use of hashes and caches
● Fighting with the ORM to have lighter queries
– Far fewer commits
● Make impact on network and node adjustable
● Configurable agent run frequency: it can be set based on the performance of nodes and available bandwidth
14
Scalability – 2.9 & 2.10
● First industrialized performance tests, with Tsung
● Generated inventories automatically, and sent them to the endpoint
● Tests with thousands of inventories
● Thank you @cscmeu !
http://tsung.erlang-projects.org/
15
Scalability – 2.11
● Goal: manage thousand nodes
● Distributed setup
– Make Rudder scale by adding more servers for components
● UI more responsive to user requests
– Async
– LDAP optimizations
● No more indexes (everything fits in RAM)
● Much faster policy generation
– Changed variable lookup, more caching
– Used a bit of parallelism when it was easy
● More performance tests
– A big thank-you to users pushing the limits
16
Scale the uses – Rudder 2.11
● Technique Editor: everyone can create techniques
● Uses ncf
● Graphical User Interface to make Techniques easier to write
17
Rudder 3
[Figure: Rudder architecture diagram (Rudder Server Root, Rudder relay, nodes running the Rudder Agent)]
Complete change of UI
- Design and layout
Compliance is everywhere
- Everything is async
- Everything is cached
18
Rudder 3
[Figure: Rudder architecture diagram (Rudder Server Root, Rudder relay, nodes running the Rudder Agent)]
New data model: Node Centric
- Compliance is per node
- Cached
- And lazily computed
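A per-node compliance value that is cached and lazily computed could look like the sketch below. It is only an illustration: the `ComplianceCache` class, the string report statuses and the `percent_success` metric are hypothetical and do not come from Rudder's code base.

```python
from typing import Callable, Dict, List


class ComplianceCache:
    """Per-node compliance, computed on first read and cached until invalidated."""

    def __init__(self, compute: Callable[[List[str]], float]) -> None:
        self._compute = compute
        self._reports: Dict[str, List[str]] = {}
        self._cache: Dict[str, float] = {}
        self.computations = 0  # counter kept only to make the laziness visible

    def add_report(self, node_id: str, status: str) -> None:
        # Storing a report is cheap: it only invalidates that node's cached value.
        self._reports.setdefault(node_id, []).append(status)
        self._cache.pop(node_id, None)

    def compliance(self, node_id: str) -> float:
        # Compute only when someone asks, and only if the cache entry is missing.
        if node_id not in self._cache:
            self.computations += 1
            self._cache[node_id] = self._compute(self._reports.get(node_id, []))
        return self._cache[node_id]


def percent_success(statuses: List[str]) -> float:
    """Hypothetical metric: share of reports whose status is 'success'."""
    if not statuses:
        return 0.0
    return 100.0 * statuses.count("success") / len(statuses)
```

With this shape, nodes that nobody looks at never cost any compliance computation, and repeated reads of the same node hit the cache.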
19
Rudder 3
[Figure: Rudder architecture diagram (Rudder Server Root, Rudder relay, nodes running the Rudder Agent)]
Lightweight reports
- Change only reporting
- Send reports only for changes
And much less disk usage
20
Rudder 3
● For this release, devs had between 1000 and 2000 nodes on their dev systems
● A lot of timing info embedded in Rudder
● This made it possible to identify low-hanging fruit
● As a result, everything was much faster
● 500 ms compute time with 2000 nodes was considered slow, and reported as a bug
21
Rudder 3.1 – 5000 nodes
● Rudder 3.1 – reaching the 5000 nodes limit (well – 7500 at the end of its life)
● This is the land of micro-optimization, pushing the limits of the model
– Lazy variables to prevent computation of unwanted values
● Micro-tuning of techniques to make policy generation faster
– But we are still talking about 45 minutes for 5000 nodes with policy validation
● Massive performance upgrade of the agent
– Changed the complexity of managing large policies
22
Rudder 3.1 – 5000 nodes
● Tooling to generate compliance reports from nodes
● Load servers, detect issues in compliance computing
● Extensive use of PgBadger to analyze PostgreSQL logs
– From both test benches and production systems
– Finding the slow queries and the limits
● Thank you @matya_j !!
https://github.com/dalibo/pgbadger
23
Rudder 4: going beyond
24
Rudder 4.0: massive changes
● Policies
● Each policy is identified by an id
● Change database model
– Use Doobie, an excellent functional JDBC layer that lets you write proper SQL
– Configuration is stored in JSON rather than JOINs
● No “leaking” of policy changes from one node to another
– Regenerate only for the nodes that have been changed
● Policy generation is much faster
– About 30 times faster (without policy validation)
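One way to regenerate policies only for the nodes that changed is to keep a hash of each node's resolved configuration and compare it on every generation. The sketch below assumes that approach; the slides do not say how Rudder detects changes, and `config_hash`, `nodes_to_regenerate` and the dictionary-based configuration are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def config_hash(config: Dict[str, Any]) -> str:
    """Stable hash of a node configuration (keys sorted so ordering is irrelevant)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def nodes_to_regenerate(
    configs: Dict[str, Dict[str, Any]], known_hashes: Dict[str, str]
) -> List[str]:
    """Return the node ids whose configuration changed, and record the new hashes."""
    changed = []
    for node_id, config in sorted(configs.items()):
        digest = config_hash(config)
        if known_hashes.get(node_id) != digest:
            known_hashes[node_id] = digest
            changed.append(node_id)
    return changed
```

A change to one node's configuration then triggers a single regeneration instead of one per managed node.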
25
Rudder 4.0: massive changes
● Compliance
● Compliance is computed when reports are received on the server side, then cached
– Compliance display twice as fast with 1000 nodes, an order of magnitude faster with 5000 nodes
● Audit mode
● New LDAP backend (LMDB-based)
26
Rudder 4.1: the road to 10k
● UI is much faster
● All resources are cached
● Compress everything (big impact on bad networks with large installs and distant servers)
● Policy generation is pretty fast (if we don’t validate the policies)
● About 3 minutes for 7000 nodes
● External data sources
● We can trigger changes from a remote tool
● Hooks on events
● Allow fine-tuning the behaviour of node acceptance/deletion/policy generation
● Thank you @FlorianHeigl1 !
27
Rudder 4.3: 10k
● Policy engine has been rewritten
● Pluggable, less mutable, a bit faster
● We can manage 10k nodes on one Rudder server
● Recommended configuration is 11 GB for the Web Interface for 10k nodes
● Adding more RAM/CPU/IO is enough to go to 15k nodes
● Still not perfect
● Policy generation is long with 10k nodes and policy validation activated
● UI will be sluggish – because of DOM computations
– Might be ok with Firefox 59
● API will be ok
28
What’s next?
● Improve tooling suite
● Working with Florian Heigl to automate a super large test platform
– Automatically create nodes, rules, reports
– At high rate
– Check application response rate and load
● Find new bottlenecks using sysdig
29
What’s next?
● Improve tooling suite
● Improve usability and documentation of load tools
– So that more users/contributors can use them
● Automate UI tests and measure the response time at each commit
30
The road to 50k nodes
● Several types of bottleneck
● Policy validation
– We can’t realistically validate 50,000 policies on the server
– Policy validation on the client side via two-step policy updates
● GUI
– Paginate results on the server side
● Ease client side burden
● Improve response rate (especially over slow networks)
– Switch from Angular to Elm
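Server-side pagination, as listed above for the GUI, means the server returns one slice of the result plus enough metadata for the client to draw a pager. A minimal sketch, assuming an in-memory sequence; the `paginate` function and its field names are hypothetical, and a real backend would more likely push the slice down to the database.

```python
from typing import Any, Dict, Sequence


def paginate(rows: Sequence[Any], page: int, page_size: int) -> Dict[str, Any]:
    """Return one page of rows plus the metadata a UI needs to render a pager."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total = len(rows)
    total_pages = (total + page_size - 1) // page_size  # ceiling division
    start = (page - 1) * page_size
    return {
        "items": list(rows[start:start + page_size]),
        "page": page,
        "total": total,
        "total_pages": total_pages,
    }
```

The browser then only has to render `page_size` rows at a time, whatever the total number of nodes.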
31
The road to 50k nodes
● Several types of bottleneck
● Network
– Current protocol is not fit to update hundreds of thousands of files
– Reports are sent back from nodes to Rudder server via syslog
● Missing compression
● rsyslog-pgsql does one insert/commit in database per received log :(
● Policy generation
– Upgrade or replace StringTemplate to lessen IO
– More static files
● Database
– Use PostgreSQL 10 partitioning to speed up compliance and archiving
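The cost of one insert/commit per received log can be contrasted with batching, where many reports share a single transaction. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for PostgreSQL; the `reports` table, the report tuple shape and the `insert_batched` helper are hypothetical and are not how Rudder or rsyslog work.

```python
import sqlite3
from typing import Iterable, List, Tuple

Report = Tuple[str, str, str]  # (node_id, component, status): hypothetical shape


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a minimal reports table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (node_id TEXT, component TEXT, status TEXT)")
    return conn


def insert_batched(conn: sqlite3.Connection, reports: Iterable[Report],
                   batch_size: int = 500) -> int:
    """Insert reports with one commit per batch; return the number of commits."""
    commits = 0
    batch: List[Report] = []
    for report in reports:
        batch.append(report)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", batch)
            conn.commit()
            commits += 1
            batch = []
    if batch:  # flush the last, partial batch
        conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", batch)
        conn.commit()
        commits += 1
    return commits
```

With a batch size of 500, 1200 reports need 3 commits instead of 1200, at the price of a short delay before a report becomes visible.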
32
The road to 50k nodes
● Missing features
● We can’t expect every user of a given installation to need to manage the whole 50k nodes
– Fine-grained authorization (OrBAC)
– Multi-tenancy
– Federation/Synchronisation of different Rudder servers
● A lot of thought needs to go into this
● Improve collaboration
– Notifications everywhere!
– Warn if another user is modifying the current object
● Change management
– Canary testing
– Ramp-up deployment
33
Final words
● We are very lucky to have great users pushing the limits
● Special thanks to all of you
Dennis, Olivier, Florian, Christophe, Janos, Pierre, Stéphane, Marc, Alexander,
David, Fabrice, Daniel, Dmitry, Ferenc, François, Vincent, Jean, Lionel, Maxime,
Michael, Enrico, Ilan, Jean Marie, Jeremy, …
(and I’m terribly sorry to all those I did not mention)
● Tools, software and resources evolved during Rudder’s life
● They helped improve the scalability as well
How we scaled Rudder to 10k nodes
Questions?
Nicolas CHARLES
Co-founder and COO
@nico_charles

More Related Content

What's hot

Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
nvirters
 
Kafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backboneKafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backbone
Ayyappadas Ravindran (Appu)
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Allen (Xiaozhong) Wang
 
SDN Project PPT
SDN Project PPTSDN Project PPT
SDN Project PPT
Matthew Chang
 
DEVNET-1175 OpenDaylight Service Function Chaining
DEVNET-1175	OpenDaylight Service Function ChainingDEVNET-1175	OpenDaylight Service Function Chaining
DEVNET-1175 OpenDaylight Service Function Chaining
Cisco DevNet
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Cloud Native Day Tel Aviv
 
Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)
Deepak Mane
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Steven Wu
 
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
シスコシステムズ合同会社
 
OPNFV Service Function Chaining
OPNFV Service Function ChainingOPNFV Service Function Chaining
OPNFV Service Function Chaining
OPNFV
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Peter Bakas
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
confluent
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitch
mestery
 
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Asher Feldman
 
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
Cloud Native Day Tel Aviv
 
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...Daniel Gheorghita
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Monal Daxini
 

What's hot (20)

Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
 
Kafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backboneKafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backbone
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
SDN Project PPT
SDN Project PPTSDN Project PPT
SDN Project PPT
 
DEVNET-1175 OpenDaylight Service Function Chaining
DEVNET-1175	OpenDaylight Service Function ChainingDEVNET-1175	OpenDaylight Service Function Chaining
DEVNET-1175 OpenDaylight Service Function Chaining
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
 
Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)
 
Chapter9ccna
Chapter9ccnaChapter9ccna
Chapter9ccna
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
 
OPNFV Service Function Chaining
OPNFV Service Function ChainingOPNFV Service Function Chaining
OPNFV Service Function Chaining
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitch
 
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
 
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
 
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 

Similar to How we scaled Rudder to 10k, and the road to 50k

RedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge DevicesRedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge Devices
Redis Labs
 
Introduction to SDN
Introduction to SDNIntroduction to SDN
Introduction to SDN
NetCraftsmen
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
Nicolas Brousse
 
Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)
London Microservices
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net services
xKinAnx
 
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudRow #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
APNIC
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
Alexander Penev
 
Software Defined Networking
Software Defined NetworkingSoftware Defined Networking
Software Defined Networking
Abhijeet Singh Panwar
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
Rick Hightower
 
Software Architecture for Cloud Infrastructure
Software Architecture for Cloud InfrastructureSoftware Architecture for Cloud Infrastructure
Software Architecture for Cloud Infrastructure
Tapio Rautonen
 
IBM Programmable Network Controller
IBM Programmable Network ControllerIBM Programmable Network Controller
IBM Programmable Network Controller
IBM India Smarter Computing
 
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PROIDEA
 
OpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets OpenflowOpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets Openflow
APNIC
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
Redis Labs
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentOpenStack Foundation
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021
Ieva Navickaite
 
Kinesis @ lyft
Kinesis @ lyftKinesis @ lyft
Kinesis @ lyft
Mian Hamid
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
Ioannis Papapanagiotou
 
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layerC. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
Uni Systems S.M.S.A.
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
Jimmy Angelakos
 

Similar to How we scaled Rudder to 10k, and the road to 50k (20)

RedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge DevicesRedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge Devices
 
Introduction to SDN
Introduction to SDNIntroduction to SDN
Introduction to SDN
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net services
 
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudRow #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
Software Defined Networking
Software Defined NetworkingSoftware Defined Networking
Software Defined Networking
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
 
Software Architecture for Cloud Infrastructure
Software Architecture for Cloud InfrastructureSoftware Architecture for Cloud Infrastructure
Software Architecture for Cloud Infrastructure
 
IBM Programmable Network Controller
IBM Programmable Network ControllerIBM Programmable Network Controller
IBM Programmable Network Controller
 
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 
OpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets OpenflowOpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets Openflow
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting Environment
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021
 
Kinesis @ lyft
Kinesis @ lyftKinesis @ lyft
Kinesis @ lyft
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layerC. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 

More from RUDDER

What if configuration management didn't need to be lvl60 in dev?
What if configuration management didn't need to be lvl60 in dev?What if configuration management didn't need to be lvl60 in dev?
What if configuration management didn't need to be lvl60 in dev?
RUDDER
 
Servers compliance: audit, remediation, proof
Servers compliance: audit, remediation, proofServers compliance: audit, remediation, proof
Servers compliance: audit, remediation, proof
RUDDER
 
OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?
OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?
OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?
RUDDER
 
OW2Con - Configurations, do you prove yours?
OW2Con - Configurations, do you prove yours?OW2Con - Configurations, do you prove yours?
OW2Con - Configurations, do you prove yours?
RUDDER
 
The new plugin ecosystem in RUDDER 5.0
The new plugin ecosystem in RUDDER 5.0The new plugin ecosystem in RUDDER 5.0
The new plugin ecosystem in RUDDER 5.0
RUDDER
 
What uses for observing operations of Configuration Management?
What uses for observing operations of Configuration Management?What uses for observing operations of Configuration Management?
What uses for observing operations of Configuration Management?
RUDDER
 
UX challenges of a UI-centric config management tool
UX challenges of a UI-centric config management toolUX challenges of a UI-centric config management tool
UX challenges of a UI-centric config management tool
RUDDER
 
What happened in RUDDER in 2018 and what’s next?
What happened in RUDDER in 2018 and what’s next?What happened in RUDDER in 2018 and what’s next?
What happened in RUDDER in 2018 and what’s next?
RUDDER
 
What is RUDDER and when should I use it?
What is RUDDER and when should I use it?What is RUDDER and when should I use it?
What is RUDDER and when should I use it?
RUDDER
 
Fosdem - Configurations do you prove yours?
Fosdem - Configurations  do you prove yours?Fosdem - Configurations  do you prove yours?
Fosdem - Configurations do you prove yours?
RUDDER
 
L'audit en continu : clé de la conformité démontrable (#POSS 2018)
L'audit en continu : clé de la conformité démontrable (#POSS 2018)L'audit en continu : clé de la conformité démontrable (#POSS 2018)
L'audit en continu : clé de la conformité démontrable (#POSS 2018)
RUDDER
 
Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)
Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)
Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)
RUDDER
 
How we scaled Rudder to 10k, and the road to 50k

  • 1. How we scaled Rudder to 10k nodes, and the road to 50k nodes. Nicolas CHARLES, Co-founder and COO, @nico_charles
  • 2. Scalability? Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth (https://en.wikipedia.org/wiki/Scalability)
  • 3. Scalability – why is it an issue in Rudder? What does Rudder do? ● Users define policies ● Apply them to groups of nodes ● Rudder computes the policies for each node ● Agents apply them, and send back information ● Rudder computes the compliance
  • 4. Scalability – why is it an issue in Rudder? Each of these points needs to be fast ● Process node inventories quickly ● Have a fast UI ● Generate policies in a reasonable time ● Have fast agents, and don’t flood the network ● Keep compliance of the actual state available
  • 6. Rudder Architecture [diagram: Rudder Server Root with Interfaces (CLI, Web UI, API), Uses, Applications (Compliance, Configuration, Inventory, Plugins), Rudder Engine and Techniques; nodes running the Rudder Agent; a Rudder relay node]
  • 7. The origin of Rudder ● At first, Rudder was designed for a hundred or a few hundred nodes ● No real goal for scalability ● It was, in retrospect, an MVP
  • 8. The origin of Rudder ● Scalability improved, driven by ● Users and usage – Frustration over slowdowns – More managed servers ● Features – Some features needed much better performance – Some needed massive architectural changes
  • 9. First bottlenecks to tackle ● Reporting in Rudder ● Display compliance of nodes – Change the data model, as everything was rule-centric in Rudder 2.3 ● Slow display of reports and compliance – Remember, we were supporting PostgreSQL 8.x – Adding relevant indexes ● Agent side ● The agent was already used on critical systems, but impacted the performance of nodes – Rewrite some policies – Add tooling around the agent to prevent clogging ● Rudder 2.5 was not more scalable, but more consistent
  • 10. Scalability – Step by Step [architecture diagram] ● Bandwidth & Network – Flag files to detect new policies – Relay servers
  • 11. Scalability – Step by Step [architecture diagram] ● Scale the uses – Validation workflow – Synchronisation of Rudder servers – API – More Techniques
  • 12. Scalability – Step by Step [architecture diagram] ● Improve performance – Save only changes of inventories (several orders of magnitude faster) – Change the data model for compliance (30% faster compliance)
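The "save only changes of inventories" idea from slide 12 can be sketched as follows. This is a minimal illustration, not Rudder's actual inventory code: the in-memory `store` dictionary stands in for the real inventory backend, and the names `inventory_diff` and `save_inventory` are hypothetical.

```python
# Hypothetical sketch: write an inventory only when something changed.
_MISSING = object()  # sentinel so that a stored value of None is still comparable


def inventory_diff(stored, received):
    """Return (changed, removed): keys with new values, and keys that disappeared."""
    changed = {
        key: value
        for key, value in received.items()
        if stored.get(key, _MISSING) != value
    }
    removed = sorted(key for key in stored if key not in received)
    return changed, removed


def save_inventory(store, node_id, received):
    """Apply only the differences to the store; return the number of attributes touched."""
    stored = store.get(node_id, {})
    changed, removed = inventory_diff(stored, received)
    if changed or removed:
        updated = {k: v for k, v in stored.items() if k not in removed}
        updated.update(changed)
        store[node_id] = updated  # the only write, skipped for identical inventories
    return len(changed) + len(removed)
```

Since most inventories are identical from one run to the next, skipping the write entirely in that case is plausibly where most of the gain comes from.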
  • 13. Scalability – 2.9 & 2.10 ● Improving performance is one of the focuses ● Refactoring and code improvements to reduce policy generation time – Use of hashes and caches ● Fighting with the ORM to get lighter queries – Far fewer commits ● Make the impact on network and nodes adjustable ● Configurable agent run frequency: can be set based on the performance of nodes and the available bandwidth
  • 14. Scalability – 2.9 & 2.10 ● First industrialized performance tests – With Tsung ● Generate inventories automatically, and send them to the endpoint ● Tests with thousands of inventories ● Thank you @cscmeu! http://tsung.erlang-projects.org/
  • 15. Scalability – 2.11 ● Goal: manage thousand nodes ● Distributed setup – Make Rudder scale by adding more servers for components ● UI more responsive to user requests – Async – LDAP optimizations ● No more indexes (everything fits in RAM) ● Much faster policy generation – Changed variable lookup, more caching – Used a bit of parallelism when it was easy ● More performance tests – A big thank you to users pushing the limits
  • 16. Scale the uses – Rudder 2.11 ● Technique Editor: everyone can create techniques ● Uses ncf ● Graphical User Interface to make Techniques easier to write
  • 17. Rudder 3 [architecture diagram] ● Complete change of UI – Design and layout ● Compliance is everywhere – Everything is async – Everything is cached
  • 18. Rudder 3 [architecture diagram] ● New data model: node-centric – Compliance is per node – Cached – And lazily computed
  • 19. Rudder 3 [architecture diagram] ● Lightweight reports – Changes-only reporting – Send reports only for changes ● And much less disk usage
  • 20. Rudder 3 ● For this release, devs had between 1000 and 2000 nodes on their dev systems ● A lot of timing info embedded in Rudder ● This made it possible to identify low-hanging fruit ● As a result, everything was much faster ● 500 ms compute time with 2000 nodes was considered slow, and reported as a bug
  • 21. Rudder 3.1 – 5000 nodes ● Rudder 3.1: reaching the 5000-node limit (well, 7500 at the end of its life) ● This is the land of micro-optimization, pushing the limits of the model – Lazy variables to avoid computing values that are never needed ● Micro-tuning of techniques to make policy generation faster – But we are still talking about 45 minutes for 5000 nodes with policy validation ● Massive performance upgrade of the agent – Changed the complexity of managing big policies
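The "lazy variables" point on slide 21 can be illustrated with a short sketch. Rudder itself is not written in Python; the class `NodeContext` and its attribute `expanded_parameters` are hypothetical names used only to show the technique, which is to defer an expensive computation until first access and then reuse the result.

```python
from functools import cached_property


class NodeContext:
    """Hypothetical per-node context whose expensive value is computed lazily."""

    def __init__(self, hostname, compute_log):
        self.hostname = hostname
        self._compute_log = compute_log  # records when the expensive step really runs

    @cached_property
    def expanded_parameters(self):
        # Stands in for a costly expansion; runs at most once per instance,
        # and never runs if the attribute is not accessed.
        self._compute_log.append(self.hostname)
        return {"hostname": self.hostname.upper()}
```

With thousands of nodes, values that only a few policies need are then computed for those nodes alone instead of for every node on every generation.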
  • 22. Rudder 3.1 – 5000 nodes ● Tooling to generate compliance reports from nodes ● Load servers, detect issues in compliance computing ● Extensive use of pgBadger to analyze PostgreSQL logs – From both test benches and production systems – Finding the slow queries and the limits ● Thank you @matya_j!! https://github.com/dalibo/pgbadger
  • 24. Rudder 4.0: massive changes ● Policies ● Each policy is identified by an id ● Change database model – Use Doobie, an excellent ORM that lets you write proper SQL – Configuration is stored as JSON rather than spread across JOINs ● No « leaking » of policy changes from one node to another – Regenerate only for the nodes that have changed ● Policy generation is much faster – About 30 times faster (without policy validation)
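"Regenerate only for the nodes that have changed" (slide 24) combines naturally with the hashes and caches mentioned for 2.9 & 2.10. The sketch below is one possible way to do it, assuming each node's configuration can be serialized as JSON; the functions `config_hash` and `nodes_to_regenerate` are illustrative names, not Rudder APIs.

```python
import hashlib
import json


def config_hash(config):
    """Stable digest of a node configuration (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def nodes_to_regenerate(previous_hashes, node_configs):
    """Return (changed node ids, current hashes) by comparing with the last generation."""
    current = {node_id: config_hash(cfg) for node_id, cfg in node_configs.items()}
    changed = sorted(
        node_id
        for node_id, digest in current.items()
        if previous_hashes.get(node_id) != digest
    )
    return changed, current
```

The hashes returned by one generation are stored and passed back as `previous_hashes` for the next, so a change that affects a single node triggers a single regeneration.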
  • 25. Rudder 4.0: massive changes ● Compliance ● Compliance is computed when reports are received server side, then cached – Display of compliance is twice as fast with 1000 nodes, and an order of magnitude faster with 5000 nodes ● Audit mode ● New LDAP backend (LMDB-based)
  • 26. Rudder 4.1: the road to 10k ● UI is much faster ● All resources are cached ● Compress everything (big impact on bad networks with large installs and a distant server) ● Policy generation is pretty fast (if we don’t validate the policies) ● About 3 minutes for 7000 nodes ● External data sources ● Updates can be triggered by changes in a remote tool ● Hooks on events ● Allow fine-tuning of the behaviour of node acceptance/deletion/policy generation ● Thank you @FlorianHeigl1!
  • 27. Rudder 4.3: 10k ● Policy engine has been rewritten ● Pluggable, less mutable, a bit faster ● We can manage 10k nodes on one Rudder server ● Recommended configuration is 11 GB for the Web Interface for 10k nodes ● Adding more RAM/CPU/IO is enough to go to 15k nodes ● Still not perfect ● Policy generation is long with 10k nodes and policy validation activated ● UI will be sluggish because of DOM computations – Might be OK with Firefox 59 ● API will be OK
  • 28. What’s next? ● Improve tooling suite ● Working with Florian Heigl to automate a super large test platform – Automatically create nodes, rules, reports – At a high rate – Check application response rate and load ● Find new bottlenecks using sysdig
  • 29. What’s next? ● Improve tooling suite ● Improve usability and documentation of load tools – So that more users/contributors can use them ● Automated UI tests that measure the response time at each commit
  • 30. The road to 50k nodes ● Several types of bottleneck ● Policy validation – We can’t realistically validate 50,000 policies on the server – Policy validation on the client side via two-step policy updates ● GUI – Paginate results on the server side ● Ease client-side burden ● Improve response rate (especially over slow networks) – Switch from Angular to Elm
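Server-side pagination, listed under GUI on slide 30, can be sketched in a few lines. This is a generic illustration under the assumption that the server holds the full result list; the function name `paginate` and the response shape are hypothetical, not part of the Rudder API.

```python
def paginate(rows, page, page_size):
    """Return one page of results plus the metadata a UI needs to render a pager."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    total = len(rows)
    return {
        "items": rows[start:start + page_size],  # only this slice is sent to the browser
        "page": page,
        "total": total,
        "total_pages": (total + page_size - 1) // page_size,  # ceiling division
    }
```

With 50k nodes and a page size of 100, the browser receives and renders 100 rows per request instead of 50,000, which addresses both the DOM cost and the slow-network case mentioned on the slide.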
  • 31. The road to 50k nodes ● Several types of bottleneck ● Network – Current protocol is not fit to update hundreds of thousands of files – Reports are sent back from nodes to the Rudder server via syslog ● Missing compression ● rsyslog-pgsql does one insert/commit in the database per received log :( ● Policy generation – Upgrade or replace StringTemplate to lessen IO – More static files ● Database – Use PostgreSQL 10 partitioning to speed up compliance and archiving
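The cost of "one insert/commit per received log" on slide 31 comes from paying a transaction for every row. A common remedy is to batch rows and commit once per batch, sketched below. The example uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it is self-contained; the table layout and the function names `create_reports_table` and `insert_reports_batched` are assumptions made for illustration, and this is not how Rudder or rsyslog implement it.

```python
import sqlite3


def create_reports_table(conn):
    """Create a minimal, hypothetical reports table."""
    conn.execute("CREATE TABLE reports (node_id TEXT, message TEXT)")


def insert_reports_batched(conn, reports, batch_size=500):
    """Insert reports in batches; return the number of commits performed."""
    commits = 0
    for start in range(0, len(reports), batch_size):
        batch = reports[start:start + batch_size]
        # One transaction per batch: committed on success, rolled back on error.
        with conn:
            conn.executemany(
                "INSERT INTO reports (node_id, message) VALUES (?, ?)", batch
            )
        commits += 1
    return commits
```

For 1200 reports and the default batch size, this performs 3 commits instead of 1200; the trade-off is that a failure loses at most one uncommitted batch rather than a single row.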
  • 32. The road to 50k nodes ● Missing features ● We can’t expect every user of a given installation to need to manage all 50k nodes – Fine-grained authorization (OrBAC) – Multi-tenancy – Federation/Synchronisation of different Rudder servers ● A lot of thinking needs to be put in there ● Improve collaboration – Notifications everywhere! – Warn if another user is modifying the current object ● Change management – Canary testing – Ramp-up deployment
  • 33. Final words ● We are very lucky to have great users pushing the limits ● A special thank you to all of you: Dennis, Olivier, Florian, Christophe, Janos, Pierre, Stéphane, Marc, Alexander, David, Fabrice, Daniel, Dmitry, Ferenc, François, Vincent, Jean, Lionel, Maxime, Michael, Enrico, Ilan, Jean Marie, Jeremy, … (and I’m terribly sorry for all those I did not mention) ● Tools, software and resources evolved during Rudder’s life ● They helped improve the scalability as well
  • 34. How we scaled Rudder to 10k nodes. Questions? Nicolas CHARLES, Co-founder and COO, @nico_charles