This document provides an overview of the Rudder configuration management tool from the perspective of a Rudder ambassador and systems administrator. Some key points:
- Rudder aims to make configuration management easier to use for more people by continuously assessing systems and automatically correcting any drift from the desired configuration.
- Using Rudder results in more reproducible base OS installations, improved abstraction of desired configurations, and more hands-off maintenance of tightly controlled systems over time.
- While some aspects like cluster management remain challenging, the performance and scalability of Rudder have improved significantly. The agent has almost no impact on systems and a single relay can handle thousands of nodes.
- Rudder enables convenient features like automatic security patching,
3. Why I'm here
Dayjob: Freelance Sysadmin-Consultant
I like fixing things and processes
Nightjob: Fix a lot of things. Rant a lot.
Rudder Ambassador
OpenNebula community champion
4. Why I'm here
Liked bleeding edge, using Ansible since 2011
(10-20-800-100-30 nodes)
Some other tools, before that, too.
Not really happy.
Too many cases of: "if our solution doesn't fit, you got the wrong problem (*)"
And then I tried Rudder...
(I might have a backup slide on that)
6. Rudder from where
- Rudder project went public in 2011
- Basic idea: "Drift assessment"
- What parts of my fleet are drifting away?
- How do we best steer all of it back on course?
- This is how you avoid crashes!
- Project started by 3 long-term CM consultants
- Built on real requirements of many people
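The drift-assessment idea can be sketched in a few lines. This is only an illustration of the concept, not how Rudder implements it; the function name `assess_drift` and the settings shown are made up for the example.

```python
def assess_drift(desired: dict, actual: dict) -> dict:
    """Return {setting: (desired_value, actual_value)} for every mismatch."""
    return {
        key: (want, actual.get(key))
        for key, want in desired.items()
        if actual.get(key) != want
    }


# Hypothetical settings for one node; keys and values are invented.
desired = {"ntp_server": "ntp.example.org", "sshd_root_login": "no"}
actual = {"ntp_server": "ntp.example.org", "sshd_root_login": "yes"}

drift = assess_drift(desired, actual)
print(drift)  # {'sshd_root_login': ('no', 'yes')}
```

An empty result means the node is on course; anything else is the list of things to steer back.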
9. Rudder to where
Is it really easier to use?
For whom is it easier to use?
10. Rudder to where
What changes if you use it – short term
- Convenience level is extreme since everything is automatic
- Base OS rebuilds get quite reproducible
- Need to think very cross-OS, helps abstract what you really wanted
- Expect you'll want to rebuild to improve on this
- Track what you're adjusting
11. Rudder to where
What changes if you use it -- medium-long term
- Very hands-off: satisfaction can't come from one-off runs anymore, but from running a tight ship all the time
- CMDB housekeeping: ghost ships are trouble
- Continuously maintained systems get more defensible
13. Rudder to where
Are there unexpected benefits?
- Naming conventions (tiny but powerful)
- Architectur-e-ing
- THE AGENT
1. an agent means no lock-out
2. things can just fix themselves
15. UX
What is easier now?
You don't even need to do most things (dynamic groups)
Having Metrics
Detecting 'weirdness'
Self-Fixing (Not more than glitches in the Matrix)
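Why dynamic groups remove manual work can be shown with a small sketch: membership is a query over inventory facts, so it is recomputed rather than edited. The node data, fact names and `dynamic_group` helper below are hypothetical and do not mirror Rudder's data model.

```python
from typing import Callable, Dict, List


def dynamic_group(nodes: List[Dict], criteria: Callable[[Dict], bool]) -> List[str]:
    """Recompute group membership from the current inventory facts."""
    return sorted(node["hostname"] for node in nodes if criteria(node))


def is_centos(node: Dict) -> bool:
    return node["os"] == "CentOS"


# Hypothetical inventory; hostnames and fact names are invented.
inventory = [
    {"hostname": "web01", "os": "CentOS", "ram_gb": 4},
    {"hostname": "db01", "os": "Debian", "ram_gb": 16},
]

centos_nodes = dynamic_group(inventory, is_centos)

# A new node appears in the inventory: nobody edits the group by hand.
inventory.append({"hostname": "web02", "os": "CentOS", "ram_gb": 4})
centos_nodes_later = dynamic_group(inventory, is_centos)

print(centos_nodes, centos_nodes_later)
```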
16. Rudder to where
What is still hard
- Bending a tool to your will is tricky if you try things you're not (yet) supposed to. Glue is sticky & might not come out right the 1st time
- Auto-acceptance
- What's hard everywhere else: Clusters
17. UX
Some question marks & dreams remain
Policy maintenance over years
(will start JSON-Diffing now)
High-end rollout clockworks
We need to build our Docker support (it's easy)
20. UX
"I didn't imagine it could be this intuitive"
-- junior project manager after about 15 minutes of introduction to Rudder
21. UX
Having a Web Interface can help
- visible documentation
- conformity
- differing skill levels
- large teams
- having a design
- Building bridges
28. Performance
- Gets faster on (almost) each version
- 4.1 is ... fast
1. Good performance -> add Features
2. Features -> Perf cost
3. Cry about it -> Tuning
4. Tuning -> Faster than 1.
30. Performance
- GUI was performing OK up to 1000 nodes
- Many rewrites, much tuning
- 30x faster now
- Smooth, loads 2000 nodes in 10s via Wifi + SSH tunnel :)
32. Performance
- What if you don't manage 1000s of nodes?
- What if your smallest server type has less than 512G RAM?
- Can you run the server on something normal?
34. Performance
- Master: 4GB good starting point, 8GB nicer
- Master: JVM + PostgreSQL + LDAP want RAM
- I combine w/ ElasticSearch + Logstash => 16GB RAM
- Don't combine on AWS t2.* instances. Never.
35. Performance
- Agent: Needs a little disk space, almost no RAM, a bit of CPU (@5min)
- Agent: Syslog traffic bursty, but can limit to "relevant" info
- Relay (Hub): a single 2 core / 2GB Xen VM could handle 2000 nodes
- Relay (Hub): Likely put on anything down to Avoton level
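A back-of-envelope calculation puts the relay figure in perspective. It assumes (this is not stated on the slide) that every node contacts the relay once per 5-minute agent run and that the runs are spread evenly.

```python
# Assumption: one relay contact per node per agent run, evenly spread.
nodes = 2000
run_interval_s = 5 * 60  # agents run every 5 minutes

avg_checkins_per_s = nodes / run_interval_s
print(f"{avg_checkins_per_s:.1f} agent check-ins per second on average")
```

Under those assumptions the relay sees fewer than seven check-ins per second on average, which is consistent with a small VM being enough.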
36. Cool things: OpenSCAP
- Yes, we got that...
1. Automated OVAL fetch
2. Central Validation (OVAL = downloaded XML processed as root!)
3. Automatic Deployment
4. Autoscheduled, time-spread daily Runs
5. Automatic result collection
6. Results integrated in UI (Rudder plugin)
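One common way to get time-spread daily runs is to derive a stable offset from each node's identifier, so that scans do not all start at once. The sketch below shows that general technique; it is an assumption about how such spreading can be done, not a description of the actual plugin, and `daily_offset_minutes` is a made-up helper.

```python
import hashlib


def daily_offset_minutes(node_id: str, window_minutes: int = 24 * 60) -> int:
    """Map a node ID to a stable minute-of-day inside the scan window."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_minutes


# Hypothetical node IDs; each one always gets the same slot.
for node_id in ("node-a", "node-b", "node-c"):
    offset = daily_offset_minutes(node_id)
    print(f"{node_id}: scan at {offset // 60:02d}:{offset % 60:02d}")
```

Because the offset depends only on the node ID, the schedule needs no central coordination and stays the same from day to day.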
37. Cool things: Agent
Just to get that clear...
- Completely AUTONOMOUS
- Owns & Decides to run policy
- Works without master/relays
- Will likely keep policy intact forever
- ...till Cthulhu awakes at the end of time
38. Cool things: A skeleton
- Trivial, but can help everyone
1. Centrally manage /etc/skel
2. creates /home/$user/.ssh
3. touch authorized_keys
4. separate root skel (.vimrc, .inputrc, ...)
- /etc/skel is non-invasive luxury defaults
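Steps 2 and 3 above amount to an idempotent "make sure it exists with the right permissions" operation. The following sketch shows that in plain Python against a temporary directory; it assumes a POSIX system, and the helper name and the 0700/0600 modes are my choices for illustration, not taken from the slides.

```python
import os
import stat
import tempfile
from pathlib import Path


def ensure_ssh_skeleton(home: Path) -> Path:
    """Create ~/.ssh and an empty authorized_keys if missing; fix modes."""
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir honours umask, so set the mode explicitly
    keys = ssh_dir / "authorized_keys"
    keys.touch(mode=0o600, exist_ok=True)
    os.chmod(keys, 0o600)
    return keys


# Demo in a throwaway directory instead of a real /home/$user.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "alice"
    home.mkdir()
    keys = ensure_ssh_skeleton(home)
    ensure_ssh_skeleton(home)  # second run changes nothing
    keys_mode = stat.S_IMODE(keys.stat().st_mode)
    dir_mode = stat.S_IMODE(keys.parent.stat().st_mode)
    keys_size = keys.stat().st_size

print(oct(dir_mode), oct(keys_mode), keys_size)
```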
39. Cool things: Autopatching
- started autopatching systems where I'm allowed to
- yum hooks (post-install triggers)
- used to restart endangered OpenSSL-based services
- need some yum excludes
- just avoid halfassed desktop things like firewalld
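Deciding which services are "endangered" after an OpenSSL update can be done by checking whether a process still maps the old, now-deleted library file. On Linux, `/proc/<pid>/maps` marks such mappings with a trailing `(deleted)`. The sketch below works on sample text only; the service names and map lines are invented, and it is not the hook used in the talk.

```python
from typing import Dict, List


def uses_stale_library(maps_text: str, needle: str = "libssl") -> bool:
    """True if a /proc/<pid>/maps dump shows a deleted library matching needle."""
    return any(
        needle in line and line.rstrip().endswith("(deleted)")
        for line in maps_text.splitlines()
    )


def services_to_restart(maps_by_service: Dict[str, str]) -> List[str]:
    """Return the services whose process still maps the old library."""
    return sorted(
        name for name, maps in maps_by_service.items() if uses_stale_library(maps)
    )


# Invented sample data standing in for real /proc/<pid>/maps contents.
STALE = "7f3a10000000-7f3a10060000 r-xp 00000000 fd:01 131 /usr/lib64/libssl.so.1.0.2k (deleted)\n"
FRESH = "7f3a10000000-7f3a10060000 r-xp 00000000 fd:01 977 /usr/lib64/libssl.so.1.0.2k\n"

restart = services_to_restart({"httpd": STALE, "postfix": FRESH})
print(restart)  # ['httpd']
```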
40. Cool things: Monitoring
- Systems are clean enough to alert
1. Automated Agent config incl. SSH keys
2. Automated Lynis (baseline security scanner) rollout
3. Automated daily security scoring
4. Scores reported to Nagios & alerted
5. Rudder compliance also in Nagios
6. Missing OS patches also in Nagios
7. Put in Service Group/BI Rule "Compliance"
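Reporting a score to Nagios boils down to mapping it onto the standard plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch follows; the thresholds and the function name are assumptions for illustration, not values from the talk.

```python
from typing import Optional, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def score_to_nagios(
    score: Optional[int], warn_below: int = 70, crit_below: int = 50
) -> Tuple[int, str]:
    """Map a daily security score to (exit_code, status_line).

    The thresholds are hypothetical defaults; pick your own.
    """
    if score is None:
        return UNKNOWN, "UNKNOWN - no security score available"
    if score < crit_below:
        return CRITICAL, f"CRITICAL - security score {score}"
    if score < warn_below:
        return WARNING, f"WARNING - security score {score}"
    return OK, f"OK - security score {score}"


code, line = score_to_nagios(65)
print(code, line)  # 1 WARNING - security score 65
```

A real check would print the status line and exit with the code, for example via `sys.exit(code)`.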
41. Cool things: Application setup
- Yes, you can do that...
1. Trigger via Node Properties (can be from CMDB, AWS Tags, ...)
2. Set up application stack
3. Initialize "safe" applications (ES, Redis, ...)
4. Don't initialize "unsafe" applications (PostgreSQL)
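The safe/unsafe split can be expressed as a small planning step driven by node properties. In this sketch the property key `applications`, the function name and the contents of the safe list are hypothetical; only the idea (install everything requested, auto-initialize only what is safe) comes from the slide.

```python
from typing import Dict, List

# Applications that may be initialized automatically (assumed list).
SAFE_TO_INITIALIZE = {"elasticsearch", "redis"}


def plan_application_setup(node_properties: Dict) -> Dict[str, List[str]]:
    """Decide what to install and what to initialize for one node."""
    apps = node_properties.get("applications", [])
    return {
        "install": sorted(apps),
        "initialize": sorted(a for a in apps if a in SAFE_TO_INITIALIZE),
        "leave_uninitialized": sorted(a for a in apps if a not in SAFE_TO_INITIALIZE),
    }


# Hypothetical properties, as they might arrive from a CMDB or AWS tags.
plan = plan_application_setup({"applications": ["redis", "postgresql"]})
print(plan)
```

Data-bearing services such as PostgreSQL are installed but left for a human (or a one-shot tool) to initialize.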
43. Cool things: Audit mode
- Fleet Control killer feature
1. Decide: Enforce or Report Compliance Deltas
1. Per Node
2. Per Setting
3. Per Rule
2. Query via API
3. Think, Plan, Conquer
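The difference between enforcing and reporting a compliance delta can be sketched as follows. This illustrates the concept only; the mode names and the function are chosen for the example and say nothing about how Rudder resolves modes across nodes, settings and rules.

```python
from typing import Tuple


def apply_setting(name: str, desired: str, actual: str, mode: str) -> Tuple[str, str]:
    """Return (resulting_value, report_line) for one setting.

    mode is "audit" (report only) or "enforce" (report and fix).
    """
    if actual == desired:
        return actual, f"{name}: compliant"
    if mode == "audit":
        return actual, f"{name}: non-compliant (expected {desired!r}, found {actual!r})"
    if mode == "enforce":
        return desired, f"{name}: repaired ({actual!r} -> {desired!r})"
    raise ValueError(f"unknown mode: {mode}")


# Same drift, two modes: audit leaves the value alone, enforce fixes it.
audited = apply_setting("PermitRootLogin", "no", "yes", "audit")
enforced = apply_setting("PermitRootLogin", "no", "yes", "enforce")
print(audited)
print(enforced)
```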
44. Cool things: Relay API
- Instant Policy runs anywhere
1. Safe: Relays can only trigger the run
2. Fast
3. Scalable
46. Cool things: sharefile
- Instant File copies everywhere
1. N:N copy between nodes
2. centrally managed
3. Quite fast - can drop a JRE on 60 nodes in 5 minutes
4. Might not be the recommended use case :)
5. Effect?
48. Cool things: Ansible inventory
- Let's make a faster Ansible!
1. Use Rudder's automagic groups, avoid gathers & complex grouping
2. Use Ansible for deployment of unsafe applications
3. One-shot character
4. but build Rules so Rudder can fix
- Also Plugins for: Rundeck, Cobbler, Centreon & some more?
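Feeding existing groups to Ansible means emitting the JSON structure that Ansible expects from a dynamic inventory script: one key per group with a `hosts` list, plus `_meta.hostvars`. The sketch below builds that structure from a plain dict; the group data is invented and stands in for whatever the Rudder plugin actually provides.

```python
import json
from typing import Dict, List, Optional


def to_ansible_inventory(
    groups: Dict[str, List[str]], hostvars: Optional[Dict[str, Dict]] = None
) -> Dict:
    """Build the JSON structure of an Ansible dynamic inventory (--list)."""
    inventory: Dict = {name: {"hosts": sorted(hosts)} for name, hosts in groups.items()}
    # Supplying _meta.hostvars up front spares Ansible a per-host lookup.
    inventory["_meta"] = {"hostvars": hostvars or {}}
    return inventory


# Hypothetical groups, standing in for groups exported from Rudder.
groups = {"centos_web": ["web02", "web01"], "debian_db": ["db01"]}
inventory = to_ansible_inventory(groups, {"db01": {"role": "primary"}})
print(json.dumps(inventory, indent=2, sort_keys=True))
```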
49. Cool things: ARM Agent
- Very fresh, but not raw! Debian/Ubuntu
- Tested: ARMHF, AARCH64, Thunder X2
50. Roadmap
- Right now development is too fast to follow (for me)
- Both minors and majors can introduce shiny things
- Majors: API changes, heavy-lifting features
51. Closing
This was my experience, I am happy with Rudder
- Pretty stable
- darn fast
- always there to save me
You could
- check out www.rudder-project.org
- Test it and give feedback
- Vagrant Box: rudder-vagrant @ GitHub