Nicolas Charles, CfgMgmtCamp 2019.
More and more services expose their state, internal details and metrics to be observable, and improve overall quality of service.
But what about observing the infrastructure they are deployed, configured and maintained on?
What can we learn from that, and what do we need from configuration management to get these features and metrics?
Logs from installation are a good start, but they need centralization, aggregation and, above all, knowledge derived from them. We also need to observe these features over time, to trace changes and correlate them with monitoring.
Rudder was built on the premise that all actions of the configuration agent need to be traced, centralized and exposed in a meaningful way, with agents ensuring the continuous configuration of systems. This talk will show the rationale behind this premise, how we implemented this solution, and the benefits of this approach for the modern IT world.
What uses for observing operations of Configuration Management?
1. rudder.io
What uses for observing operations of
Configuration Management?
Nicolas CHARLES
nicolas@rudder.io - @nico_charles
2. Are we really looking at logs?
I’m sure everyone here does, but...
3. No error nor change in logs means success?
Aren’t we missing something?
4. Getting and understanding the info is complex
Operators, Managers, Experts, APIs have different needs
Frustration if we need a third party to get data
We mistrust what we don’t understand
5. Getting and understanding the info is complex
Putting errors into perspective
Errors can be expected
Errors in production can have catastrophic consequences
Errors in a Vagrant VM are much less critical
10. These concepts are core to Rudder
Everyone/thing can be an actor of configuration management
11. These concepts are core to Rudder
Technique
A set of operations & configurations to reach a state
With variables for configuration
Created by experts
13. These concepts are core to Rudder
Directive
Technique + Parameters
Defines how services must be managed
Driven by business needs, managed by admins or APIs
14. These concepts are core to Rudder
Rule
The application of Directive(s) to Group(s)
Defines the targets of the Directive(s)
Higher-level view of services, managed by admins or APIs
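The three concepts above can be sketched as plain data types. This is a minimal illustration, not Rudder's actual data model: the class fields, the `missing_parameters` and `targets` helpers, and the sample names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Technique:
    """A set of operations and configurations, with variables to fill in."""
    name: str
    variables: tuple[str, ...]


@dataclass
class Directive:
    """A Technique plus the parameters that give its variables a value."""
    name: str
    technique: Technique
    parameters: dict[str, str]

    def missing_parameters(self) -> list[str]:
        # Variables declared by the technique but not set by this directive.
        return [v for v in self.technique.variables if v not in self.parameters]


@dataclass
class Rule:
    """The application of Directive(s) to Group(s)."""
    name: str
    directives: tuple[Directive, ...]
    groups: tuple[str, ...]

    def targets(self) -> list[tuple[str, str]]:
        # Every (group, directive) pair this rule applies.
        return [(g, d.name) for g in self.groups for d in self.directives]


# Hypothetical sample data.
ssh = Technique("ssh_hardening", ("port", "permit_root_login"))
baseline = Directive("SSH baseline", ssh, {"port": "22", "permit_root_login": "no"})
rule = Rule("Security rules - baseline", (baseline,), ("all-linux-nodes",))
```

Keeping the three levels separate is what lets experts, admins and APIs each work on the level relevant to them.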
15. Each can focus on what is relevant
Operators
Security Experts
16. Each can focus on what is relevant
Managers
APIs
"rules": [
  {
    "id": "32377fd7-02fd-43d0-aab7-28460a91347b",
    "name": "Security rules - baseline",
    "compliance": 100,
    "mode": "full-compliance",
    "complianceDetails": {
      "successAlreadyOK": 87.47,
      "successNotApplicable": 12.53
    },
    "directives": [
      {
        "id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
        "name": "CIS Benchmark 2.1.6 - rsh",
        "compliance": 100.0,
        "complianceDetails": {
          "successAlreadyOK": 100.00
        }
      }
    ]
  }
]
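An API consumer could walk a payload of this shape to find what needs attention. The sketch below embeds the values shown on the slide; wrapping them in an outer object and closing the brackets is an assumption, since the slide truncates the output, and the helper names `below_threshold` and `details_total` are hypothetical.

```python
import json

# Values copied from the slide; the enclosing object is assumed.
payload = json.loads("""
{"rules": [{
  "id": "32377fd7-02fd-43d0-aab7-28460a91347b",
  "name": "Security rules - baseline",
  "compliance": 100,
  "mode": "full-compliance",
  "complianceDetails": {"successAlreadyOK": 87.47, "successNotApplicable": 12.53},
  "directives": [{
    "id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
    "name": "CIS Benchmark 2.1.6 - rsh",
    "compliance": 100.0,
    "complianceDetails": {"successAlreadyOK": 100.00}
  }]
}]}
""")


def below_threshold(rules, threshold=100.0):
    """Yield (rule name, directive name, compliance) for directives under threshold."""
    for rule in rules:
        for directive in rule.get("directives", []):
            if directive["compliance"] < threshold:
                yield rule["name"], directive["name"], directive["compliance"]


def details_total(item):
    """Sum the per-status percentages of a rule or directive."""
    return round(sum(item["complianceDetails"].values()), 2)


non_compliant = list(below_threshold(payload["rules"]))
```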
17. What is this compliance?
[Diagram, built up over slides 17 to 21. Labels shown:]
PARAM
RULE: Id, Groups + Directives
DIRECTIVE: Id, Components
GROUP: Id
RUDDER config (global): Policy Mode, Schedule
NODE: Properties, Policy Mode, Schedule
Environmental context
Node configuration: Id, Generated, Files
Change request
Event logs
Historization
22. What is this compliance?
[Diagram, built up over slides 22 to 26. Labels shown:]
Node configuration: Id, Generated, Files
Metadata: Integrity, Signature
Config: Id; for Rule R, Directive D1, Component C
Store expected reports
Expected reports: node id, config id, timestamp, end of validity
Get Policy
Send configuration reports
RUN: Reports, ...
METADATA: node id, config id, run timestamp, Signature
Run reports
Compliance, historized
Historization
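The diagram pairs expected reports (stored when a node configuration is generated) with run reports (sent by the agent). One way to derive compliance from the two can be sketched as follows. This is an illustration under assumptions, not Rudder's implementation: the status labels, the `missing` and `unexpected_config` outcomes, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    """One expected report: a component of a directive applied by a rule."""
    rule: str
    directive: str
    component: str


def compliance(expected, reports, expected_config_id, run_config_id):
    """Return per-status percentages for one run of one node.

    `reports` maps an Expected item to the status the agent reported for it.
    """
    # A run made with another configuration cannot be judged against this one.
    if run_config_id != expected_config_id:
        return {"unexpected_config": 100.0}
    counts = {}
    for item in expected:
        status = reports.get(item, "missing")  # no report received for this item
        counts[status] = counts.get(status, 0) + 1
    total = len(expected)
    return {status: round(100.0 * n / total, 2) for status, n in counts.items()}


expected = [
    Expected("R", "D1", "C"),
    Expected("R", "D1", "C2"),
    Expected("R", "D2", "C"),
    Expected("R", "D2", "C4"),
]
reports = {expected[0]: "success", expected[1]: "success", expected[2]: "error"}
result = compliance(expected, reports, "cfg-1", "cfg-1")
```

Counting absent reports explicitly is the point of the comparison: a run that says nothing about a component is not treated as a success.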
27. Make information available
A lot of information from inside Rudder, usable in the Rudder context
Details of each run (timestamped info)
Policy generation details
Serialization of configurations
Inventories
...
30. Causality and dependencies of events
Diagnosing infrastructures is hard
● Many systems
● Dependencies across systems
● Many actors involved
An issue on one component can impact hundreds of systems
We need to separate the causes from the symptoms
31. Causality and dependencies of events
Monitoring can only correlate
Causes and precedence help root cause analysis
33. Event sourcing & Tracing
Events happen on the whole infrastructure
Describe and analyze across systems
Order events
Contextualize
34. Event sourcing & Tracing
Terminology (Dapper & OpenTracing)
Trace: Description of a “transaction” as it moves through systems
Span: Named and timed operation, piece of workflow (+ tags and logs)
Span context: Trace information that accompanies the transaction
35. Event sourcing & Tracing
What’s in a span?
Operation name
Start & end timestamps
Tags: Set of key:value
Logs: Set of key:value
SpanContext
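The fields listed above map naturally onto a small data structure. The sketch below is a hand-rolled model for illustration, not the OpenTracing API: the class layout, the `start_span` helper, and the choice to store logs as timestamped key:value entries are assumptions.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Trace information that accompanies the transaction."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    parent_id: Optional[str] = None
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)   # key:value, set once per span
    logs: list = field(default_factory=list)   # (timestamp, key:value) entries

    def log(self, **fields):
        self.logs.append((time.time(), fields))

    def finish(self):
        self.end = time.time()


def start_span(name, parent=None):
    """Start a root span, or a child span that shares its parent's trace id."""
    trace_id = parent.context.trace_id if parent else uuid.uuid4().hex
    context = SpanContext(trace_id, uuid.uuid4().hex)
    parent_id = parent.context.span_id if parent else None
    return Span(name, context, parent_id=parent_id)


root = start_span("agent run")
child = start_span("configuration check", parent=root)
child.tags["status"] = "success"
child.log(message="nothing to change")
child.finish()
root.finish()
```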
36. Event sourcing & Tracing
Temporal relationships between Spans in a single Trace
https://www.jaegertracing.io/docs/1.9/architecture/
37. Event sourcing & Tracing
What would be the traces?
Defining the infrastructure state is a trace
Each change before validation is a span
Validating results in a change request closes the trace
Computing the node configurations is a trace
Computing targets, overrides and generating files are spans
The trace closes with the serialization of the node configurations in the database
Each run on a node is a trace
Each configuration check is a span
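The last case, one trace per agent run with one span per configuration check, can be sketched with a context manager that times each check. This is a minimal illustration: the `Trace` class, the `run_checks` function, the record layout and the status values are all hypothetical, and real checks are replaced by callables that just return a status.

```python
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Collects the spans recorded during one agent run."""

    def __init__(self, name):
        self.trace_id = uuid.uuid4().hex
        self.name = name
        self.spans = []

    @contextmanager
    def span(self, operation, **tags):
        record = {"trace_id": self.trace_id, "operation": operation,
                  "tags": dict(tags), "start": time.monotonic()}
        try:
            yield record
        except Exception as exc:
            # Keep the failure on the span, then let it propagate.
            record["tags"]["error"] = repr(exc)
            raise
        finally:
            record["end"] = time.monotonic()
            self.spans.append(record)


def run_checks(node_id, checks):
    """Run each (name, check) pair inside its own span; return the trace."""
    trace = Trace(f"run on {node_id}")
    for name, check in checks:
        with trace.span(name, node_id=node_id) as record:
            record["tags"]["status"] = check()
    return trace


trace = run_checks("node-1", [
    ("ssh configuration", lambda: "success"),
    ("ntp service", lambda: "repaired"),
])
```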
38. Event sourcing & Tracing
[Diagram: the compliance workflow annotated with traces. Labels shown:]
RULE: Id
DIRECTIVE: Id
GROUP: Id
Environmental context
Change request
Defining state: Trace + Spans
Events, Commit Id
Node configuration: Id, Generated, Commit id, Files
Metadata: Integrity, CommitId, Signature
Config: for Rule R, Directive D1, Component C
Trace
Store expected reports
Expected reports: node id, config id, timestamp
Get config
Send configuration reports
RUN: Reports, ...
METADATA: node id, config id, run timestamp, Signature
Run: Trace; each step: span
Run reports
Historisation
Message bus
39. Event sourcing & Tracing
[Diagram: traces published on a message bus. Labels shown:]
Node configuration: Id, Generated, Commit id, Files
Metadata: Integrity, CommitId, Signature
Config: for Rule R, Directive D1, Component C
Trace
Store expected reports
Expected reports: node id, config id, timestamp
Get config
Send configuration reports
RUN
METADATA: node id, config id, run timestamp, Signature
Run: Trace; each step: span
Run reports
Message bus
Compliance
CMDB Hooks
Monitoring
40. Event sourcing & Tracing
Store Traces & Events:
● Integrate with systems in place
● Many tools are compatible with OpenTracing
Correlate with non-observable systems
44. Closing thoughts
What can we do with these billions of events?
Reactive approach
Query, search and analyze traces in case of problems
45. Closing thoughts
What can we do with these billions of events?
Proactive approach
Process mining: Machine Learning on these events
Detect unusual behaviours
Outliers
Inconsistencies across systems
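As a much simpler stand-in for the process mining mentioned above, outlier detection can be illustrated with a robust statistic. The sketch below flags nodes whose run duration has a modified z-score (based on the median absolute deviation) above the conventional 3.5 cutoff. The input shape, the function name and the sample durations are hypothetical.

```python
from statistics import median


def outliers(durations, threshold=3.5):
    """Return node ids whose run duration is an outlier.

    `durations` maps a node id to a run duration in seconds. Uses the
    modified z-score: 0.6745 * |x - median| / MAD.
    """
    values = list(durations.values())
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical: nothing to rank against.
        return []
    return sorted(node for node, v in durations.items()
                  if 0.6745 * abs(v - med) / mad > threshold)


# Hypothetical run durations, in seconds.
durations = {"node-1": 12.0, "node-2": 11.5, "node-3": 12.4,
             "node-4": 11.9, "node-5": 48.0}
unusual = outliers(durations)
```

The median-based score is used here because a single very slow node would distort a mean and standard deviation computed over the same data.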
48. Security?
Events, traces and logs hold critical data
Within a single system, security can be built-in
AuthN/AuthZ
For distributed systems, it’s much harder
Who can see what?
Who defines and enforces the authorizations?
Tags on events for authorizations
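Using tags on events for authorization could look like the following sketch. It assumes one possible policy, chosen for illustration only: an event is visible when every one of its tags has been granted to the viewer, and untagged events are hidden by default. The `auth_tags` field name and the tag values are hypothetical.

```python
def visible_events(events, granted_tags):
    """Keep events whose authorization tags are all granted to the viewer."""
    granted = set(granted_tags)
    return [
        event for event in events
        # Deny by default: an event without tags is shown to nobody.
        if event.get("auth_tags") and set(event["auth_tags"]) <= granted
    ]


# Hypothetical events and tags.
events = [
    {"id": 1, "auth_tags": ["team:web"]},
    {"id": 2, "auth_tags": ["team:web", "env:prod"]},
    {"id": 3, "auth_tags": []},
]
for_web_team = visible_events(events, {"team:web"})
for_prod_admin = visible_events(events, {"team:web", "env:prod"})
```

This leaves open the questions raised above: who assigns the tags, and which component enforces the filter.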
51. Event sourcing & Tracing
Temporal relationships between Spans in a single Trace
––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time
 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]
https://opentracing.io/specification/
52. Event sourcing & Tracing
Every component needs to know the context
● Carry the Span Context along with each event
Add some information for each event
● Save on logging thanks to context
Send these traces on a message bus
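The points above can be sketched as an event that carries only the span context identifiers plus its own fields, serialized and put on a bus. The sketch uses `queue.Queue` as an in-process stand-in for a real message bus; the event layout and the `emit` and `drain` helpers are hypothetical.

```python
import json
import queue
import time

bus = queue.Queue()  # in-process stand-in for a real message bus


def emit(bus, span_context, name, **fields):
    """Publish one event that carries the span context along with it."""
    event = {
        "trace_id": span_context["trace_id"],
        "span_id": span_context["span_id"],
        "name": name,
        "timestamp": time.time(),
        "fields": fields,  # only what is specific to this event
    }
    bus.put(json.dumps(event, sort_keys=True))


def drain(bus):
    """Read every event currently on the bus."""
    events = []
    while True:
        try:
            events.append(json.loads(bus.get_nowait()))
        except queue.Empty:
            return events


context = {"trace_id": "t1", "span_id": "s1"}
emit(bus, context, "file checked", path="/etc/ssh/sshd_config", status="success")
received = drain(bus)
```

Because the trace and span ids link each event back to its context, the event itself does not need to repeat which node, rule or run it belongs to.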