OSDC 2017 - Florian Heigl - Experiences with rudder, is it really for everyone

Rudder / Experiences
RUDDER FOR EVERYONE?

Why I'm here
Dayjob: Freelance Sysadmin-Consultant
I like fixing things and processes
Nightjob: Fix a lot of things. Rant a lot.
Rudder Ambassador
OpenNebula community champion

Why I'm here
Liked bleeding edge, using Ansible since 2011
(10-20-800-100-30 nodes)
Some other tools, before that, too.
Not really happy.
Too many cases of: "If our solution doesn't fit, you've got the wrong problem (*)"
And then I tried Rudder...
(I might have a backup slide on that)

Rudder from where
• Rudder project went public in 2011
• Basic idea: "Drift assessment"
• Which parts of my fleet are drifting away?
• How do we best steer all of it back on course?
• This is how you avoid crashes!
• Project started by 3 long-term CM consultants
• Built on real requirements of many people

Rudder to where
Rudder claims to be config management for the masses
How does it fare?

Rudder to where
Is it really easier to use?
For whom is it easier to use?

Rudder to where
What changes if you use it – short term
• Convenience level is extreme since everything is automatic
• Base OS rebuilds get quite reproducible
• You need to think very cross-OS, which helps abstract what you really wanted
• Expect that you'll want to rebuild to improve on this
• Track what you're adjusting

Rudder to where
What changes if you use it – medium to long term
• Very hands-off: satisfaction can't come from one-off runs anymore, but from running a tight ship all the time
• CMDB housekeeping: ghost ships are trouble
• Continuously maintained systems get more defensible

Rudder to where
Were there undesired results?
So far, none

Rudder to where
Are there unexpected benefits?
• Naming conventions (tiny but powerful)
• Architectur-e-ing
• THE AGENT
1. An agent means no lock-out
2. Things can just fix themselves

UX
What is easier now?
You don't even need to do most things (dynamic groups)
Having Metrics
Detecting "weirdness"
Self-Fixing (Not more than glitches in the Matrix)

Rudder to where
What is still hard
• Bending a tool to your will is tricky if you try things you're not (yet) supposed to. Glue is sticky and might not come out right the first time
• Auto-acceptance
• What's hard everywhere else: Clusters

UX
Some question marks & dreams remain
Policy maintenance over years
(will start JSON-Diffing now)
High-end rollout clockworks
We need to build our Docker support (it's easy)

UX
Who benefits most?
Devs?
Ops?
Managers?

UX
"I didn't imagine it could be this intuitive"
-- junior project manager after about 15 minutes of introduction to Rudder

UX
Having a Web Interface can help
• Visible documentation
• Conformity
• Differing skill levels
• Large teams
• Having a design
• Building bridges

Performance
• "Monitoring should not negatively impact performance" (Oracle, 1986)
• CPU usage?
• Disk thrashing?
• Run times?

Performance
• Gets faster on (almost) each version
• 4.1 is ... fast
1. Good performance → add features
2. Features → perf cost
3. Cry about it → tuning
4. Tuning → faster than 1.

Performance
• GUI was performing OK up to 1000 nodes
• Many rewrites, much tuning
• 30x faster now
• Smooth: loads 2000 nodes in 10 s via Wi-Fi + SSH tunnel :)

Performance
• What if you don't manage 1000s of nodes?
• What if your smallest server type has less than 512 GB RAM?
• Can you run the server on something normal?

Performance
• Master: 4 GB is a good starting point, 8 GB is nicer
• Master: JVM + PostgreSQL + LDAP want RAM
• I combine it with Elasticsearch + Logstash => 16 GB RAM
• Don't combine on AWS t2.* instances. Never.

Performance
• Agent: Needs a little disk space, almost no RAM, a bit of CPU (every 5 min)
• Agent: Syslog traffic is bursty, but can be limited to "relevant" info
• Relay (Hub): A single 2-core / 2 GB Xen VM could handle 2000 nodes
• Relay (Hub): Can likely run on anything down to Avoton level

Cool things: OpenSCAP
• Yes, we got that...
1. Automated OVAL fetch
2. Central validation (OVAL = downloaded XML processed as root!)
3. Automatic deployment
4. Autoscheduled, time-spread daily runs
5. Automatic result collection
6. Results integrated in UI (Rudder plugin)
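
The "time-spread daily runs" in step 4 can be illustrated with a small sketch. This is not how Rudder or its OpenSCAP plugin schedule scans; it only shows one common way to spread load, by deriving a stable per-node start offset from a hash of the hostname. The function names and hostnames are made up for illustration.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60


def daily_offset_seconds(hostname: str, window: int = SECONDS_PER_DAY) -> int:
    """Map a hostname to a stable offset (in seconds) inside the window."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into the window.
    return int.from_bytes(digest[:8], "big") % window


def format_offset(seconds: int) -> str:
    """Render an offset as HH:MM for humans."""
    hours, rest = divmod(seconds, 3600)
    return f"{hours:02d}:{rest // 60:02d}"


if __name__ == "__main__":
    # Hypothetical hostnames; each node always gets the same slot.
    for host in ("web01.example.org", "db01.example.org", "relay01.example.org"):
        print(host, format_offset(daily_offset_seconds(host)))
```

Because the offset depends only on the hostname, every node keeps its slot across days without any central bookkeeping.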

Cool things: Agent
Just to get that clear...
• Completely AUTONOMOUS
• Owns & decides to run policy
• Works without master/relays
• Will likely keep policy intact forever
• ...till Cthulhu awakes at the end of time

Cool things: A skeleton
• Trivial, but can help everyone
1. Centrally manage /etc/skel
2. Create /home/$user/.ssh
3. Touch authorized_keys
4. Separate root skel (.vimrc, .inputrc, ...)
• /etc/skel provides non-invasive luxury defaults
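
Steps 2 and 3 above amount to an idempotent "make sure this exists" operation. The sketch below shows that idea in plain Python, not as a Rudder technique. The function name `ensure_ssh_skeleton` is hypothetical, and the modes 0700/0600 are the usual OpenSSH expectations rather than something stated on the slide.

```python
import os
import tempfile
from pathlib import Path


def ensure_ssh_skeleton(home: Path) -> Path:
    """Create <home>/.ssh and an empty authorized_keys if they are missing.

    Safe to run repeatedly: existing key content is left untouched.
    """
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is filtered by umask, so enforce it

    keys = ssh_dir / "authorized_keys"
    keys.touch(mode=0o600, exist_ok=True)  # touch never truncates the file
    os.chmod(keys, 0o600)
    return keys


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real home.
    with tempfile.TemporaryDirectory() as tmp:
        keys = ensure_ssh_skeleton(Path(tmp))
        ensure_ssh_skeleton(Path(tmp))  # second run changes nothing
        print(keys.name, oct(keys.stat().st_mode & 0o777))
```

Running it a second time is a no-op, which is the property a continuously running agent relies on.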

Cool things: Autopatching
• Started autopatching systems where I'm allowed to
• yum hooks (post-install triggers)
• Used to restart endangered OpenSSL-based services
• Need some yum excludes
• Just avoid half-assed desktop things like firewalld

Cool things: Monitoring
• Systems are clean enough to alert
1. Automated agent config incl. SSH keys
2. Automated Lynis (baseline security scanner) rollout
3. Automated daily security scoring
4. Scores reported to Nagios & alerted
5. Rudder compliance also in Nagios
6. Missing OS patches also in Nagios
7. Put in Service Group/BI Rule "Compliance"

Cool things: Application setup
• Yes, you can do that...
1. Trigger via Node Properties (can be from CMDB, AWS Tags, ...)
2. Set up application stack
3. Initialize "safe" applications (ES, Redis, ...)
4. Don't initialize "unsafe" applications (PostgreSQL)
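
The decision logic in steps 1 to 4 can be sketched as a small planning function. This is only an illustration: the property key `app_stack` and the function name are hypothetical, and the safe/unsafe split simply mirrors the examples on the slide (Elasticsearch and Redis are initialized, PostgreSQL is installed but left for a human or a one-shot tool).

```python
# Applications that may be initialized automatically (from the slide's examples).
SAFE_TO_INITIALIZE = {"elasticsearch", "redis"}


def plan_application_setup(node_properties: dict) -> dict:
    """Turn node properties into install / initialize / manual-init lists."""
    # "app_stack" is a hypothetical property name, e.g. fed from a CMDB or tags.
    stack = node_properties.get("app_stack", [])
    return {
        "install": list(stack),
        "initialize": [app for app in stack if app in SAFE_TO_INITIALIZE],
        "manual_init": [app for app in stack if app not in SAFE_TO_INITIALIZE],
    }


if __name__ == "__main__":
    props = {"app_stack": ["redis", "postgresql"]}
    print(plan_application_setup(props))
```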

Cool things: Audit mode
• Fleet Control killer feature
1. Decide: Enforce or Report Compliance Deltas
   1. Per Node
   2. Per Setting
   3. Per Rule
2. Query via API
3. Think, Plan, Conquer
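
The difference between enforcing and only reporting compliance deltas can be shown with a toy model. This is a conceptual sketch, not Rudder's implementation or API: node state is modeled as a plain dictionary, and the names `Delta`, `compliance_deltas` and `apply_policy` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    setting: str
    expected: object
    actual: object


def compliance_deltas(desired: dict, actual: dict) -> list:
    """List every setting whose actual value differs from the desired one.

    A setting missing on the node is reported with actual=None.
    """
    return [
        Delta(key, value, actual.get(key))
        for key, value in desired.items()
        if actual.get(key) != value
    ]


def apply_policy(desired: dict, actual: dict, mode: str = "audit") -> list:
    """Report deltas; in 'enforce' mode also correct them on the node."""
    if mode not in ("audit", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    deltas = compliance_deltas(desired, actual)
    if mode == "enforce":
        for delta in deltas:
            actual[delta.setting] = delta.expected
    return deltas


if __name__ == "__main__":
    desired = {"sshd.PermitRootLogin": "no", "ntp.enabled": True}
    node = {"sshd.PermitRootLogin": "yes", "ntp.enabled": True}
    print(apply_policy(desired, node, mode="audit"))    # reports, changes nothing
    print(apply_policy(desired, node, mode="enforce"))  # reports and fixes
```

In audit mode the node is left untouched, so the same policy can be rolled out fleet-wide first and switched to enforce per node, per setting or per rule once the deltas are understood.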

Cool things: Relay API
• Instant Policy runs anywhere
1. Safe: Relays can only trigger the run
2. Fast
3. Scalable

Cool things: sharefile
• Instant file copies everywhere
1. N:N copy between nodes
2. Centrally managed
3. Quite fast: can drop a JRE on 60 nodes in 5 minutes
4. Might not be the recommended use case :)
5. Effect?

Cool things: Ansible inventory
• Let's make a faster Ansible!
1. Use Rudder's automagic groups, avoid gathers & complex grouping
2. Use Ansible for deployment of unsafe applications
3. One-shot character
4. But build Rules so Rudder can fix things
• Also plugins for: Rundeck, Cobbler, Centreon & some more?
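
To make step 1 concrete: Ansible accepts a dynamic inventory as JSON with one entry per group plus a `_meta.hostvars` section. The sketch below converts a group-to-hosts mapping into that shape. How the mapping is obtained from Rudder is deliberately left out; the input dictionary, the group names and the function name are assumptions for illustration only.

```python
import json


def to_ansible_inventory(groups: dict, hostvars: dict = None) -> dict:
    """Build the JSON structure an Ansible inventory script prints for --list."""
    inventory = {name: {"hosts": sorted(hosts)} for name, hosts in groups.items()}
    # Supplying _meta.hostvars up front spares Ansible a per-host --host call.
    inventory["_meta"] = {"hostvars": hostvars or {}}
    return inventory


if __name__ == "__main__":
    # Hypothetical groups, as they might be exported from a CM server.
    groups = {
        "centos7": ["web01.example.org", "db01.example.org"],
        "postgresql_servers": ["db01.example.org"],
    }
    print(json.dumps(to_ansible_inventory(groups), indent=2))
```

With the groups precomputed elsewhere, playbooks can target them directly instead of gathering facts and regrouping hosts on every run.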

Cool things: ARM Agent
• Very fresh, but not raw! Debian/Ubuntu
• Tested: ARMHF, AARCH64, ThunderX2

Roadmap
• Right now development is too fast to follow (for me)
• Both minors and majors can introduce shiny things
• Majors: API changes, heavy-lifting features

Closing
This was my experience, I am happy with Rudder
• Pretty stable
• Darn fast
• Always there to save me
You could
• Check out www.rudder-project.org
• Test it and give feedback
• Vagrant Box: rudder-vagrant @ GitHub
