We speak of service observability when services expose internal states and metrics to improve overall availability.
But what about the observability of the infrastructures on which those services are deployed, configured and maintained?
The various logs (centralized, aggregated) are a good starting point for analysis, but systems must also be observed continuously, so that every change is traced and correlated with monitoring. Today, these IT configuration steps should be handled by configuration management tools, which become the gateway to operations observability.
We will show the value of this approach for modern IT management, with feedback on the challenges of implementing it in Rudder, our open source solution for continuous audit and configuration management.
What uses for observing operations of Configuration Management?
RUDDER
Nicolas Charles, CfgMgmtCamp 2019.
More and more services expose their state, internal details and metrics to be observable, and improve overall quality of service.
But what about observing the infrastructure they are deployed, configured and maintained on?
What can we learn from that, and what do we need from configuration management to get these features and metrics?
Logs from installation are a good start, but they need centralization, aggregation and, above all, knowledge derived from them. We also need to observe these systems over time, to trace changes and correlate them with monitoring.
Rudder was built around the premise that all actions of the configuration agent need to be traced, centralized and exposed in a meaningful way, with agents ensuring the continuous configuration of systems. This talk will show the rationale behind this premise, how we implemented the solution, and the benefits of this approach for the modern IT world.
How can we be sure that continuous configuration management is operating properly? How can we expose factual, topic-related reports to devs, security, managers, customers...?
We believe that, in order to deliver the full business and collaboration value of continuous configuration management, the solution needs to go further than simply applying policies: it must ensure configuration reliability, prove (with history) what was applied and its status, share this with other teams, and notify of any drift with relevant context.
This talk will present why and how we should be concerned about transmitting factual measures on infrastructure management to all parties involved. We will also guide you through the journey to include a full-fledged reporting feature in a configuration management solution.
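As a rough illustration of those requirements (the names and data structures here are invented for this sketch, not Rudder's actual model), an audit/enforce loop for a single configuration item could be written as follows: audit only reports, enforce repairs the drift and records that it did.

```python
# Minimal sketch of an audit/enforce loop for one configuration item.
# All names here are illustrative, not Rudder's actual API.
from dataclasses import dataclass

@dataclass
class Policy:
    key: str
    expected: str

def audit(policy, system_state):
    """Compare the observed value with the policy; report, do not fix."""
    actual = system_state.get(policy.key)
    status = "compliant" if actual == policy.expected else "drift"
    return {"key": policy.key, "expected": policy.expected,
            "actual": actual, "status": status}

def enforce(policy, system_state):
    """Repair any drift, and return a report of what was changed."""
    report = audit(policy, system_state)
    if report["status"] == "drift":
        system_state[policy.key] = policy.expected
        report["repaired"] = True
    return report

state = {"sshd.PermitRootLogin": "yes"}
p = Policy("sshd.PermitRootLogin", "no")
print(audit(p, state)["status"])      # drift
print(enforce(p, state)["repaired"])  # True
print(audit(p, state)["status"])      # compliant
```

Keeping every report (rather than only the last one) is what makes the "prove with history" part possible: the sequence of reports is the audit trail.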
Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure
HARMAN Services
Slides from our #GoCloudWebinar series. In this presentation, you will learn how to incorporate the necessary diagnostic tools into your application so you can monitor and take action on your Azure applications. Michael Collier, Principal Cloud Architect at Aditi, and our guest speaker Mike Wood, Technical Evangelist at Cerebrata, give you insights on how best to troubleshoot your Microsoft Azure applications.
Patterns are everywhere. Design patterns have existed for a long time for traditional (monolithic) architectures. Patterns give us a range of predefined design options that can be applied to each business and technology problem, providing an advantage when designing a solution, since they are structures that have been proven repeatedly over time until consolidating into a pattern. However, design patterns have changed with the arrival of the cloud and the microservices approach. This time we will discuss these design patterns and their applicability in depth.
https://www.meetup.com/Cloud-Native-Chile/
Optimizing a React application for Core Web Vitals
Juan Picado
The performance of your web application can define the success of your website; Core Web Vitals are key metrics that help you track and improve the user experience. In this talk we will see how to optimize and measure a React application's performance using some basic techniques, such as code splitting with webpack, SEO optimization, and bottleneck resolution, with examples.
Activity Recognition is a project that aims to recognize your activities like standing, sitting, walking and running in order to keep track of your daily trends.
GitHub page
https://github.com/riccardo97p/IoT_ActivityRecognition
Hackster post
https://www.hackster.io/andreanapoletani/activity-recognition-with-genuino-101-and-aws-iot-fbeea2
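As a rough illustration of the classification idea (the thresholds and the variance feature are invented here for illustration; the actual project uses Genuino 101 sensor data and AWS IoT), one could classify a short window of accelerometer readings by how much they vary:

```python
# Toy sketch: classify an activity window from the variance of
# accelerometer magnitudes. Thresholds are made up for illustration.
import statistics

def classify(window):
    """window: list of accelerometer magnitudes for a short time slice."""
    var = statistics.pvariance(window)
    if var < 0.01:
        return "still"    # sitting or standing
    elif var < 0.5:
        return "walking"
    return "running"

print(classify([1.0, 1.0, 1.01]))      # still
print(classify([0.0, 2.0, 0.0, 2.0]))  # running
```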
Authors:
Alessandro Giannetti
https://www.linkedin.com/in/alessandro-giannetti-2b1864b4/
Andrea Napoletani
https://www.linkedin.com/in/andrea-napoletani-aa0b87166/
Riccardo Pattuglia
https://www.linkedin.com/in/riccardo-pattuglia-3a09ab182/
By attending this webinar, you will learn from the product developers what WSO2 Enterprise Integrator 7.1.0 is and what features it brings to cater to integration with a seamless developer experience. Key features include:
- Support for both centralized ESB and microservices-based deployments
- Streaming ETL support with CDC, file scraping, flow monitoring and more
- New observability solution based on Grafana, Prometheus, Jaeger, and Loki
- A CI/CD pipeline using Docker, Jenkins, Kubernetes and more
- New connectors for CSV transformation, Azure Data Lake and more
- Improvements to WSO2 Integration Studio (Tooling) UI and connector configuration view
On-demand webinar: https://wso2.com/library/webinars/wso2-enterprise-integrator-7-1-0-release/
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
Julian Mazzitelli
Presented at Kubernetes and Cloud Native meetup in Toronto on December 4, 2019
See https://www.youtube.com/watch?v=YmIAatr3Who for a video recording of a similar talk.
Are you looking to get more flexibility out of your CICD platform? Interested in how GitOps fits into the mix? Learn how Argo CD, Workflows, and Events can be combined to craft custom CICD flows, all while staying Kubernetes native, enabling you to leverage existing observability tooling.
Building A Product Assortment Recommendation Engine
Databricks
Amid the increasingly competitive brewing industry, the ability of retailers and brewers to provide optimal product assortments for their consumers has become a key goal for business stakeholders. Consumer trends, regional heterogeneities and massive product portfolios combine to scale the complexity of assortment selection. At AB InBev, we approach this selection problem through a two-step method rooted in statistical learning techniques. First, regression models and collaborative filtering are used to predict product demand in partnering retailers. The second step involves robust optimization techniques to recommend a set of products that enhance business-specified performance indicators, including retailer revenue and product market share.
With the ultimate goal of scaling our approach to over 100k brick-and-mortar retailers across the United States and online platforms, we have implemented our algorithms in custom-built Python libraries using Apache Spark. We package and deploy production versions of Python wheels to a hosted repository for installation to production infrastructure.
To orchestrate the execution of these processes at scale, we use a combination of the Databricks API, Azure App Configuration, Azure Functions, Azure Event Grid and some custom-built utilities to deploy the production wheels to on-demand and interactive Databricks clusters. From there, we monitor execution with Azure Application Insights and log evaluation metrics to Databricks Delta tables on ADLS. To create a full-fledged product and deliver value to customers, we built a custom web application using React and GraphQL which allows users to request assortment recommendations in a self-service, ad-hoc fashion.
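The two-step shape described above can be caricatured in a few lines. Here a fixed dict stands in for the demand models, and a simple greedy ranking stands in for the robust optimization step; none of this reflects AB InBev's actual implementation:

```python
# Step 1 (stubbed): per-product demand estimates, as the regression and
# collaborative-filtering models would supply them.
# Step 2 (simplified): pick an assortment that maximizes predicted
# revenue under a shelf-size limit, here via a greedy ranking.
def recommend_assortment(demand, price, shelf_slots):
    revenue = {p: demand[p] * price[p] for p in demand}
    ranked = sorted(revenue, key=revenue.get, reverse=True)
    return ranked[:shelf_slots]

demand = {"lager": 120, "ipa": 90, "stout": 30}
price = {"lager": 1.0, "ipa": 1.5, "stout": 2.0}
print(recommend_assortment(demand, price, 2))  # ['ipa', 'lager']
```

A real robust-optimization step would account for demand uncertainty and multiple KPIs (revenue, market share) rather than a single point estimate.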
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015
StampedeCon
At the StampedeCon 2015 Big Data Conference: The starting point for this project was a MapReduce application that processed log files produced by the support portal. This application was running on Hadoop with Ruby Wukong. At the time of the project start it was underperforming and did not show good scalability. This made the case for redesigning it using Spark with Scala and Java.
Initial review of the Ruby code revealed that it was using disk IO excessively in order to communicate between MapReduce jobs. Each job was implemented as a separate script passing large data volumes through. Spark is more efficient at managing intermediate data passed between MapReduce jobs – not only does it keep it in memory whenever possible, it often eliminates the need for intermediate data at all. However, that alone did not bring us much improvement, since there were additional bottlenecks at the data aggregation stages.
The application involved a global data ordering step, followed by several localized aggregation steps. This first global sort required significant data shuffle that was inefficient. Spark allowed us to partition the data and convert a single global sort into many local sorts, each running on a single node and not exchanging any data with other nodes. As a result, several data processing steps started to fit into node memory, which brought about a tenfold performance improvement.
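The sort restructuring described above can be sketched in plain Python (no Spark dependency; in the real pipeline the partitions would be Spark partitions spread across cluster nodes):

```python
# Sketch of replacing one global sort with many local sorts: group the
# records into partitions first, then sort each partition independently.
from collections import defaultdict

def partition_then_sort(records, part_key, sort_key):
    """Group records into partitions, then sort each partition locally."""
    partitions = defaultdict(list)
    for r in records:
        partitions[part_key(r)].append(r)
    # Each of these sorts is independent: on a cluster, every node can
    # sort its own partition with no data exchanged between nodes.
    return {k: sorted(v, key=sort_key) for k, v in partitions.items()}

logs = [("host-b", 17), ("host-a", 3), ("host-b", 5), ("host-a", 9)]
per_host = partition_then_sort(logs,
                               part_key=lambda r: r[0],   # partition by host
                               sort_key=lambda r: r[1])   # sort by timestamp
print(per_host["host-a"])  # [('host-a', 3), ('host-a', 9)]
```

This only works when the downstream aggregations are local to a partition, which was exactly the situation described: the global order was never actually needed across partition boundaries.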
Talk delivered at BSides Toronto on Sep 29, 2018 on positioning container security in context of application lifecycle, as well as observed trends and upcoming technologies.
This session will focus on the strategy, the technology, and the review process that customers use to move their most important systems to the cloud. nib Group will discuss their preparations to move a system of record to AWS, with a specific focus on the platform built to meet their security, risk, and resiliency requirements. In this session, learn about what they did, lessons learned, and tips on how you could do the same.
Presenters: Wayne Bozza, Head of Cybersecurity, and Mathew Finch, Head of Emerging Tech, nib Health Funds
IT Application Decommissioning - Application Retirement Services
AvenDATA
Whether you have ERP or CRM systems, unstructured data or files, we will build an archiving system that frees you from your legacy systems while fulfilling legal requirements. Benefit from our many years' experience in the market, reflected in hundreds of archiving projects worldwide.
Due to our specialisation in system archiving, system decommissioning and carve-out, we are significantly more efficient, cost-effective, functional and faster than you could ever imagine.
Why AvenDATA?
For many years we have specialized in archiving legacy systems as part of application decommissioning. We have successfully implemented our software in hundreds of companies from a wide range of industries worldwide, and our experience covers more than 250 systems from various manufacturers, with archives of up to 100+ TB. Benefit from our years of experience. The AvenDATA Group operates worldwide, with headquarters in Berlin and additional offices in Budapest, Mumbai and New York.
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Jonathan Singer
Guild members, join us on Thursday, November 14th at 6pm for our class on Splunk. Our Analyze Guild Master Jonathan Singer will cover centralized logging, SIEM, big data, and much more.
Getting Started: How to Set Up Your "Data as a Feature" Project
TIBCO Jaspersoft
This workshop series features a brand-new demo application—created by the TIBCO Jaspersoft team and projekt202—that illustrates and teaches you how to create answer-generating applications of your own. Over the course of 5 webinars, we will introduce you to the what and the why of data as a feature applications and how you can build your own.
Lesson #2 Agenda:
- Project Overview
- Defining the user experience
- Setting up the application environment
- Preparing data
Streamlining Python Development: A Guide to a Modern Project Setup
Florian Wilhelm
Designed for beginners, this presentation demystifies Python project management using Hatch and delves into pyproject.toml for efficient configuration. We'll guide you through organizing directories, implementing unit testing for code reliability, and using mypy for type checking to enhance code quality. The session concludes with insights into ruff, a modern linter for maintaining Python standards, which is replacing black, isort, and flake8. This talk is a comprehensive toolkit for anyone eager to learn and apply the latest practices in Python development.
The talk was given at PyConDE / PyData Berlin 2024. More details here: https://pretalx.com/pyconde-pydata-2024/talk/CBVTEG/
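As an illustration of the kind of setup the talk covers (the package name and settings below are placeholders, not the speaker's exact configuration), a minimal pyproject.toml combining Hatch's build backend with ruff and mypy might look like:

```toml
# Illustrative pyproject.toml skeleton; values are examples only.
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-package"
version = "0.1.0"
requires-python = ">=3.9"

[tool.ruff]
line-length = 88

[tool.ruff.lint]
# pycodestyle (E), pyflakes (F), and isort-style import sorting (I),
# covering much of what black/isort/flake8 used to handle separately.
select = ["E", "F", "I"]

[tool.mypy]
strict = true
```

Keeping all tool configuration in one pyproject.toml, rather than scattered across setup.cfg, .flake8 and friends, is the main point of the modern layout.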
What if configuration management didn't need to be lvl60 in dev?
RUDDER
Slides from Alexandre BRIANCEAU's talk at #OSSPARIS19 (Open Source Summit Paris 2019).
Server infrastructure automation is not simple. Solutions have existed for years, and most of them rely on infra-as-code to achieve their mission. But why infra-as-code, anyway?
Unfortunately, these solutions require strong development skills. So how can we proceed when the infrastructure team does not have sufficient, and above all homogeneous, expertise? Otherwise, beware of the "Guru Team" effect: infrastructure automation meant to save time ends up as a huge SPOF, because only one person on the team knows how it works.
I would like to discuss this together and briefly introduce you to RUDDER. RUDDER is a configuration management solution, and therefore infra-as-code, that lets you automate your systems while relying entirely on a graphical interface to manage your configurations. Because the infrastructure is complex enough without adding another layer!
Slides from Alexandre BRIANCEAU's talk at #OSSPARIS19 (Open Source Summit Paris 2019).
Security is everyone's business: a single exploited breach is enough. Teams are aware of this, and yet it remains as difficult as ever to ensure, to be confident, and to reassure others (to prove) that one's own part is under control.
And when it comes to server infrastructure, especially at the OS / middleware level, everything gets complicated. Even with an operational security team, it is difficult to ensure that the Information System Security Policy and security recommendations are properly implemented on all servers.
How can we be sure that our security policies are properly applied on all our servers other than through a massive and costly audit? Even if they were when they were created, how do you know if they remain perfectly compliant after a few days / weeks / months?
Let's discover together RUDDER, an open-source solution for continuous compliance based on configuration management to automatically audit and/or correct our systems.
Similar to OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?
Activity Recognition is a project that aims to recognize your activities like standing, sitting, walking and running in order to keep track of your daily trends.
GitHub page
https://github.com/riccardo97p/IoT_ActivityRecognition
Hackster post
https://www.hackster.io/andreanapoletani/activity-recognition-with-genuino-101-and-aws-iot-fbeea2
Authors:
Alessandro Giannetti
https://www.linkedin.com/in/alessandro-giannetti-2b1864b4/
Andrea Napoletani
https://www.linkedin.com/in/andrea-napoletani-aa0b87166/
Riccardo Pattuglia
https://www.linkedin.com/in/riccardo-pattuglia-3a09ab182/
By attending this webinar, you will be able to learn from the product developers on what WSO2 Enterprise Integrator 7.1.0 is, and what features it brings in to cater to integration with seamless developer experience. Key features include:
- Support for both centralized ESB and microservices-based deployments
- Streaming ETL support with CDC, file scraping, flow monitoring and more
- New observability solution based on Grafana, Prometheus, Jaeger, and Loki
- A CI/CD pipeline using Docker, Jenkins, Kubernetes and more
- New connectors for CSV transformation, Azure Data Lake and more
- Improvements to WSO2 Integration Studio (Tooling) UI and connector configuration view
On-demand webinar: https://wso2.com/library/webinars/wso2-enterprise-integrator-7-1-0-release/
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CDJulian Mazzitelli
Presented at Kubernetes and Cloud Native meetup in Toronto on December 4, 2019
See https://www.youtube.com/watch?v=YmIAatr3Who for a video recording of a similar talk.
Are you looking to get more flexibility out of your CICD platform? Interested how GitOps fits into the mix? Learn how Argo CD, Workflows, and Events can be combined to craft custom CICD flows. All while staying Kubernetes native, enabling you to leverage existing observability tooling.
Building A Product Assortment Recommendation EngineDatabricks
Amid the increasingly competitive brewing industry, the ability of retailers and brewers to provide optimal product assortments for their consumers has become a key goal for business stakeholders. Consumer trends, regional heterogeneities and massive product portfolios combine to scale the complexity of assortment selection. At AB InBev, we approach this selection problem through a two-step method rooted in statistical learning techniques. First, regression models and collaborative filtering are used to predict product demand in partnering retailers. The second step involves robust optimization techniques to recommend a set of products that enhance business-specified performance indicators, including retailer revenue and product market share.
With the ultimate goal of scaling our approach to over 100k brick-and-mortar retailers across the United States and online platforms, we have implemented our algorithms in custom-built Python libraries using Apache Spark. We package and deploy production versions of Python wheels to a hosted repository for installation to production infrastructure.
To orchestrate the execution of these processes at scale, we use a combination of the Databricks API, Azure App Configuration, Azure Functions, Azure Event Grid and some custom-built utilities to deploy the production wheels to on-demand and interactive Databricks clusters. From there, we monitor execution with Azure Application Insights and log evaluation metrics to Databricks Delta tables on ADLS. To create a full-fledged product and deliver value to customers, we built a custom web application using React and GraphQL which allows users to request assortment recommendations in a self-service, ad-hoc fashion.
The starting point for this project was a MapReduce application that processed log files produced by the support portal. This application was running on Hadoop with Ruby Wukong. At the time of the project start it was underperforming and did not show good scalability. This made the case for redesigning it using Spark with Scala and Java.
Initial review of the Ruby code revealed that it was using disk IO excessively, in order to communicate between MapReduce jobs. Each job was implemented as a separate script passing large data volumes through. Spark is more efficient in managing intermediate data passed between MapReduce jobs – not only it keeps it in memory whenever possible, it often eliminates the need for intermediate data at all. However, that alone not brought us much improvement since there were additional bottlenecks at data aggregation stages.
The application involved a global data ordering step, followed by several localized aggregation steps. This first global sort required significant data shuffle that was inefficient. Spark allowed us to partition the data and convert a single global sort into many local sorts, each running on a single node and not exchanging any data with other nodes. As a result, several data processing steps started to fit into node memory, which brought about a tenfold performance improvement.
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015StampedeCon
At the StampedeCon 2015 Big Data Conference: The starting point for this project was a MapReduce application that processed log files produced by the support portal. This application was running on Hadoop with Ruby Wukong. At the time of the project start it was underperforming and did not show good scalability. This made the case for redesigning it using Spark with Scala and Java.
Initial review of the Ruby code revealed that it was using disk IO excessively, in order to communicate between MapReduce jobs. Each job was implemented as a separate script passing large data volumes through. Spark is more efficient in managing intermediate data passed between MapReduce jobs – not only it keeps it in memory whenever possible, it often eliminates the need for intermediate data at all. However, that alone not brought us much improvement since there were additional bottlenecks at data aggregation stages.
The application involved a global data ordering step, followed by several localized aggregation steps. This first global sort required significant data shuffle that was inefficient. Spark allowed us to partition the data and convert a single global sort into many local sorts, each running on a single node and not exchanging any data with other nodes. As a result, several data processing steps started to fit into node memory, which brought about a tenfold performance improvement.
Talk delivered at BSides Toronto on Sep 29, 2018 on positioning container security in context of application lifecycle, as well as observed trends and upcoming technologies.
This session will focus on the strategy, the technology, and the review process that customers use to move their most important systems to the cloud. nib Group will discuss their preparations to move a system of record to AWS with a specific focus on the platform built to meet their security, risk, and resiliency requirements. In this session, learn about what they did, lessons learned, and tips on how you could do the same
Presenters: Wayne Bozza, Head of Cybersecurity, and Mathew Finch, Head of Emerging Tech, nib Health Funds
IT Application Decommissioning - Application Retirement ServicesAvenDATA
Whether ERP or CRM systems, unstructured data or files you have, we will build an archiving system for you that will free you from your legacy systems at the same time fulfilling legal requirements. Benefit from our many years‘ experience in the market, which is reflected in the hundreds of our archiving projects worldwide.
Due to our specialisation in system archiving, system decommissioning and carve-out, we are significantly more efficient, cost-effective, functional and faster than you could ever imagine.
Why AvenDATA?
What if configuration management didn't need to be lvl60 in dev? - RUDDER
Slides from Alexandre BRIANCEAU's talk at #OSSPARIS19 (Open Source Summit Paris 2019).
Server infrastructure automation is not simple. Several solutions have existed for years, and most of them rely on infra-as-code to achieve their mission. By the way, why infra-as-code?
Unfortunately, these solutions require strong development skills. So how do you proceed when the infrastructure team does not have sufficient and, above all, homogeneous expertise? Otherwise, beware of the "Guru Team" effect: the infrastructure automation that was meant to save time ends up as a huge SPOF because only one person on the team knows how it works.
I would like to discuss this together and briefly introduce you to RUDDER. RUDDER is a configuration management solution, and therefore infra-as-code, that lets you automate your systems while relying entirely on a graphical interface to manage your configurations. Infrastructure is complex enough without adding another layer!
Slides from Alexandre BRIANCEAU's talk at #OSSPARIS19 (Open Source Summit Paris 2019).
Security is everyone's business: a single exploited breach is enough. Teams are aware of this, and yet it remains as difficult as ever to ensure, be confident, and reassure others (prove) that at least part of the system is under control.
And when it comes to server infrastructure, especially at the OS / middleware level, everything gets complicated. Even with an operational security team, it is difficult to ensure that the Information System Security Policy and security recommendations are properly implemented on all servers.
How can we be sure that our security policies are properly applied on all our servers other than through a massive and costly audit? Even if they were when they were created, how do you know if they remain perfectly compliant after a few days / weeks / months?
Let's discover together RUDDER, an open-source solution for continuous compliance based on configuration management to automatically audit and/or correct our systems.
OW2Con - Configurations, do you prove yours? - RUDDER
How can we be sure that continuous configuration management is operating properly? How can we expose factual, topic-related reports to dev, sec, managers, customers...?
We believe that, in order to deliver the full business and collaboration value of continuous configuration management, the solution needs to go further than simply applying policies - it must ensure configuration reliability; prove historized application and status; share it to other teams; notify of any drift with a relevant context.
This talk will present why and how we should be concerned about transmitting factual measures on infrastructure management to all parties involved. We will also guide you through the journey to include a full-fledged reporting feature in a configuration management solution.
The latest major version of the solution has brought a major new feature to the Rudder solution: a plugin ecosystem.
The Rudder software architect will present the reasons for this new feature, how it works, and what are the different plugins available.
Benoit Peccatte, CfgMgmtCamp 2019.
Benoit Peccatte started out as a developer for air traffic control systems but quickly became more interested in writing code generators to automate his job.
After meeting some smart sysadmins on the beach, he switched jobs and has been automating servers for the past decade.
He stumbled across open source in engineering school, and quickly became convinced that free software is the only way to keep software maintainable whatever happens in the future.
Benoit is now trying to automate his job on Rudder, developing features in Rudder to continuously configure and audit more and more servers.
UX challenges of a UI-centric config management tool - RUDDER
Raphaël Gauthier, CfgMgmtCamp 2019.
One of Rudder’s main focuses is its comprehensive graphical user interface, which allows users to view and manage its configurations without writing a line of code.
The user experience and interface considerations for a tool as technical and complex as a configuration management tool, and with such potential to break things, are certainly a challenge, and in some ways uncharted territory. Rudder's frontend developer will present an analysis of the situation, the issues encountered, and the approach adopted for the UX and UI improvements planned for 2019.
What happened in RUDDER in 2018 and what's next? - RUDDER
Alexis Mousset, CfgMgmtCamp 2019.
Let’s take a look at Rudder’s new features from 2018, both in terms of the features of versions 4.3 and 5.0 as well as the new documentation and our platform for building and distributing binaries.
We will then present the provisional roadmap for 2019: let’s go to Rudder 5.1 and 5.2!
Alexandre Brianceau, CfgMgmtCamp 2019.
Rudder is an open source configuration management tool that includes continuous auditing (with or without remediation), compliance info and graphs and the possibility to configure everything in the UI and/or APIs.
It has been around for more than six years and has users large (think 10 000 nodes) and small around the world.
Let’s take a moment to look at the vision that led us here, how Rudder is different from similar tools, and what users find invaluable, nice (or annoying - I’ll be honest!).
If you’re not familiar with Rudder this is a great talk to attend to get the basics covered.
Continuous audit: the key to demonstrable compliance (#POSS 2018) - RUDDER
Presentation from Alexandre Brianceau's talk in the Cybersecurity track at the Paris Open Source Summit 2018.
Security policies are increasingly complex and demanding for operational teams to implement. How can we be certain that our security policies are properly applied on all our servers, other than through a massive and costly audit? Even if they were compliant when created, how do we know whether they remain fully compliant after a few days / weeks / months?
We will show how to define the technical rules of a security policy in RUDDER, an open source IT compliance automation solution born in the devops world, where automated configuration management is already the norm. These rules are then checked every 5 minutes on each server, feeding a global summary that makes it possible to inspect the problems that need to be fixed.
We will also explain how a successfully deployed audit policy can then be enforced on all systems with the same tool, moving from automatic audit to automatic remediation.
Continuous reliability and compliance in production with Rudder (#BBOOST 2018) - RUDDER
Presentation from Alexandre Brianceau's talk at BBOOST 2018.
An infrastructure whose configurations are not homogeneous, monitored, and continuously kept compliant inevitably ends up drifting, leading to security breaches and production incidents.
While IT reliability has become critical, the traditional method of running audits every X months is showing its limits: drift between two audits can go unnoticed and cause an incident.
RUDDER is a solution that guarantees configuration compliance at all times.
Stay up - the journey of a free software vendor - RUDDER
Here is the feedback of one of Rudder's founders on what being an entrepreneur in free software means, and on the 10-year journey through 4 key stages:
- building the team,
- going through an incubator,
- raising funds (or not),
- and the search for a sustainable business model.
How we scaled Rudder to 10k, and the road to 50k - RUDDER
A management graphical interface, real-time compliance, and ease of use are some of Rudder's core principles. When Rudder was created in 2010, hundreds of servers were considered a large installation, and the constraints and limits of managing systems were totally different from nowadays, as IT now speaks in terms of thousands of nodes. I'll present how we scaled Rudder from hundreds to 10k nodes, across each aspect of the product: changing the way nodes talk to the Rudder server, rewriting the data model, evolving the UI, detecting new limits - further away - and removing them, and making sure these limits don't come back, through tooling and testing. Finally, I'll present the planned evolutions in upcoming releases to reach 50k managed nodes.
Rudder 4.1 was released in March 2017 with:
- an advanced feature to query external APIs and pull in node properties dynamically
- the ability to add "key=value" tags to all Rules and Directives in order to categorize them
- a new API on relay servers to enable node-to-node file sharing and remote run in firewalled environments
- performance improvements
- a new plugin package format
Rudder 4.2 was released in September 2017 and includes a new plugin that adds support for a Windows DSC-based agent. Rudder 4.3 will include:
- Parameters for Technique Editor techniques
- ACLs on the API accounts
- Many architecture improvements
In parallel, new plugins are being developed:
- A plugin to integrate data from external APIs
- Monitoring integration with Centreon
- CMDB integration with iTop
- A reporting plugin for historized compliance
This talk will introduce these new features and show how to use them, hopefully getting you as excited as we are! Then, we will move on to explain about longer-term feature ideas we have for Rudder, and the general vision linked to future developments.
About Nicolas Charles
Nicolas is a tinkerer who likes it when things just work, and tries his best to reach this goal. He started as a developer 15 years ago, and often had to step outside this role to solve issues.
In 2010, he co-founded Normation, and he still enjoys fixing things in Rudder and for its users.
DevOps D-Day 2017 - Configuration management and compliance enforcement at a ... - RUDDER
As a hosting and managed services provider, Jaguar Network faces a twofold evolution:
The market expects a Service Provider to take on an ever-larger share of information system management.
The company's growth creates greater pressure, both quantitatively (scalability) and qualitatively (guaranteeing reliability and security across the entire managed fleet).
Jaguar Network therefore had to find a solution to this twofold problem that more and more companies face: sustaining rapid fleet growth while improving and guaranteeing reliability.
Thanks to RUDDER, a French open-source Continuous Configuration solution dedicated to production constraints, reaching this goal was greatly facilitated. Together with the RUDDER vendor, Jaguar Network will recount the course of this project, from tool deployment to observed results, including integration with the other technologies of the information system.
A concrete and complete feedback session on the concept of Continuous Configuration and its implementation with RUDDER.
RUDDER is an easy to use, web-driven, role-based solution for IT Infrastructure Automation and Compliance. With a focus on continuously checking configurations and centralising real-time status data, RUDDER can show a high-level summary (“ISO 27001 rules are at 100%!”) and break down noncompliance issues to a deep technical level (“Host prod-web-03: SSH server configuration allows root logins”).
A few things that make RUDDER stand out:
- A simple framework allows you to extend the built-in rules to implement specific low-level configuration patterns, however complex they may be, using simple building blocks (“ensure package installed in version X,” “ensure file content,” “ensure line in file,” etc.). A graphical builder lowers the technical level required to use this.
- Each policy can be independently set to be automatically checked or enforced on a policy or host level. In Enforce mode, each remediation action is recorded, showing the value of these invisible fixes.
- RUDDER works on almost every kind of device, so you’ll be managing physical and virtual servers in the data center, cloud instances, and embedded IoT devices in the same way.
- RUDDER is designed for critical environments where a security breach can mean more than a blip in the sales stats. Built-in features include change requests, audit logs, and strong authentication.
- RUDDER relies on an agent that needs to be installed on all hosts to audit. The agent is very lightweight (10 to 20 MB of RAM at peak) and blazingly fast (it’s written in C and takes less than 10 seconds to verify 100 rules). Installation is self-contained, via a single package, and can auto-update to limit agent management burden.
- RUDDER is a true and professional open source solution—the team behind RUDDER doesn’t believe in the dual-speed licensing approach that makes you reinstall everything and promotes open source as little more than a “demo version.”
RUDDER is an established project with tens of thousands of nodes managed, in companies ranging from small to the biggest in their field. Typical deployments manage 100s to 1000s of nodes. The biggest known deployment in 2016 is about 7000 nodes.
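As an illustration of one of the low-level building blocks listed above, an "ensure line in file" check might look like the following sketch, in both audit and enforce modes. The function name and return values are hypothetical; this is not Rudder's actual implementation.

```python
# Hypothetical sketch of the "ensure line in file" building block,
# supporting both audit mode (report only) and enforce mode (repair).
# This is NOT Rudder's real code, just an illustration of the idea.
from pathlib import Path

def ensure_line_in_file(path, line, enforce=False):
    """Return 'compliant', 'noncompliant' (audit) or 'repaired' (enforce)."""
    p = Path(path)
    text = p.read_text() if p.exists() else ""
    if line in text.splitlines():
        return "compliant"
    if not enforce:
        return "noncompliant"  # audit mode: report the drift, change nothing
    with p.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return "repaired"          # enforce mode: the remediation is recorded
```

Returning an explicit "repaired" status is what makes otherwise invisible fixes show up in reporting, as the description above points out.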
"From classic object-oriented programming to the free monad."
This presentation explains the path followed by the #Scala code within @RudderProject, moving from an ersatz of Java toward more functional programming and Type-Driven Development.
Automating the manual - feedback on including existing systems in configurati... - RUDDER
When designing a new infrastructure, weaving configuration management within it is a natural solution nowadays.
However, there are many systems in the wild that are still manually managed, if managed at all: mission-critical servers that can’t be shut down, systems that run proprietary software depending on out-of-date databases, … They may even have been configured using forgotten conventions (which can differ between iterations of the systems).
Using configuration automation tools on these systems can seem like an impossible task, but it is not, and the effort is well worth the benefits.
This talk will present feedback from a couple of projects I’ve worked on, describing how to manage these “existing, manual and critical” systems automatically, most specifically the reverse engineering of existing systems (compiling all documents, inventorying systems, devising the rules, auditing deviations), and the steps to managing them automatically.
OSIS 2019 - What does observability bring to configuration management?
1. OSIS 2019 - The Open Source Innovation Spring 2019
What does observability bring to configuration management?
@nico_charles / nicolas@rudder.io
2. How are the systems?
Does no error nor change in logs mean success?
Aren’t we missing something?
3. Definition
“Configuration management is a systems engineering process for establishing and maintaining consistency of a product [...] throughout its life.”
(Wikipedia: Configuration_management)
4. Let's remember: What does configuration management do?
[Diagram: a configuration is applied toward a target state, which returns feedback]
5. Let's remember: What does configuration management do?
[Diagram: the configuration/feedback cycle repeats continuously toward the target state]
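The loop sketched on these two slides (apply a configuration toward a target state, collect feedback, repeat) can be written down as a minimal convergence loop. All names below are illustrative; this is not any tool's real API.

```python
# Minimal sketch of the configuration management feedback loop shown
# above: detect drift from the target state, remediate, emit feedback.
# Names are illustrative; this is not any tool's real API.

def drift(current, target):
    """Keys whose current value differs from the target state."""
    return {k: v for k, v in target.items() if current.get(k) != v}

def converge(current, target):
    """One cycle of the loop: repair drift and return feedback messages."""
    feedback = []
    for key, wanted in drift(current, target).items():
        current[key] = wanted                      # remediation step
        feedback.append("repaired: %s -> %s" % (key, wanted))
    return feedback or ["compliant: no drift detected"]

state = {"ssh.permit_root_login": "yes"}
target = {"ssh.permit_root_login": "no", "ntp.server": "pool.ntp.org"}
print(converge(state, target))   # two repairs on the first cycle
print(converge(state, target))   # the next cycle reports compliance
```

Running the cycle continuously, rather than once, is exactly what turns configuration management into the feedback source the rest of the deck builds on.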
6. Main challenges faced nowadays
DEV / QA / PRODUCTION / RECOVERY
DEV / SEC / OPS / MGMT / EXTERN
Multiple teams, diluted expertise, harder reporting
Heterogeneous systems, reduced visibility, ease of use and understanding
7. Getting and understanding the info is complex
Operators, managers, experts, and APIs have different needs
Frustration when we need a third party to obtain relevant data
We mistrust what we don’t understand
8. Getting and understanding the info is complex
Putting errors into perspective:
An error can be expected
An error in production can have catastrophic consequences
9. Definition (again)
“Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.”
(Wikipedia: Observability)
10. Monitoring VS Observability: having a factual & deep insight
monitoring VS observability
11. Why do we need observability in configuration management?
Causality: trust and prove configuration states
Perspective: provide insights relevant to different needs
Agency: help teams find the best levers for their job
28. Causality and dependencies of events
Diagnosing infrastructures is hard:
● Many systems
● Dependencies across systems
● Many actors involved
An issue on one component can impact hundreds of systems
We need to separate the causes from the symptoms
29. Causality and dependencies of events
Monitoring can only correlate
Events happen across the whole infrastructure
Causes and precedence relations help root cause analysis
30. Event sourcing & Tracing
Terminology (Dapper & OpenTracing)
Trace: Description of a “transaction” as it moves through systems
Span: Named and timed operation, piece of workflow (+ tags and logs)
Span context: Trace information that accompanies the transaction
31. Event sourcing & Tracing
What’s in a span?
Operation name
Start & end timestamps
Tags: Set of key:value
Logs: Set of key:value
SpanContext
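The span anatomy listed on this slide can be made concrete with a small data structure. This is a minimal sketch following the Dapper/OpenTracing terminology above, not any particular library's API:

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpanContext:
    # Trace information that accompanies the transaction
    trace_id: str
    span_id: str

@dataclass
class Span:
    # A named and timed operation, with tags and logs
    operation_name: str
    context: SpanContext
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)   # set of key:value
    logs: list = field(default_factory=list)   # timestamped key:value events

    def finish(self) -> None:
        self.end = time.time()

# A span for one configuration check on a node (names are hypothetical)
span = Span("check-ntp-installed", SpanContext("trace-1", "span-1"))
span.tags["node.id"] = "node42"
span.logs.append({"event": "package already present"})
span.finish()
```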
32. Event sourcing & Tracing
Temporal relationships between Spans in a single Trace
https://www.jaegertracing.io/docs/1.9/architecture/
33. Event sourcing & Tracing
Configuration management: what would the traces be?
Defining the infrastructure state is a trace
Each change before validation is a span
Validation, which results in a change request, closes the trace
Computing the node configurations is a trace
Computing targets, overrides and generating files are spans
It closes with the serialization of the node configurations in the database
Each run on a node is a trace
Each configuration check is a span
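The last mapping above (a run is a trace, each check a span) could look like this minimal sketch; the data shapes and names are hypothetical, not Rudder's actual format:

```python
import time
import uuid

def new_trace(name):
    # A trace describes one "transaction", e.g. one agent run on a node
    return {"trace_id": uuid.uuid4().hex, "name": name, "spans": []}

def add_span(trace, operation, tags=None):
    # Each configuration check becomes a named, timed span in the trace
    span = {"span_id": uuid.uuid4().hex,
            "operation": operation,
            "start": time.time(),
            "tags": tags or {}}
    trace["spans"].append(span)
    return span

# Each run on a node is a trace; each configuration check is a span
run = new_trace("agent-run on node42")
add_span(run, "check: ntp installed", {"component": "ntp"})
add_span(run, "check: sshd configuration", {"component": "ssh"})
```

Serializing such traces is what lets a server reconstruct, per node and per run, exactly which checks happened and in what order.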
34. Event sourcing & Tracing
[Diagram: the traces in Rudder’s data flow.
Defining the state: params, rules (id), directives (id, components), groups (id) and the environmental context produce event logs and a change request; this is a trace whose edits are spans.
Computing the node configurations: each node configuration (commit id) yields generated files (id, generation date) with metadata (integrity, commit id, signature) and expected reports (node id, config id, timestamp), stored on the server; this is a trace.
The run: the node gets its configuration, executes each check (a span) and sends run reports with metadata (node id, config id, run timestamp, signature) back through a message bus; reports are historised and compliance is computed and historised against the expected reports.]
35. Event sourcing & Tracing
Store Traces & Events:
● Integrate with systems in place
● Many tools are compatible with OpenTracing
Correlate with non-observable systems
36. What to do with these billions of events?
Reactive approach
Query, search and analyze traces in case of problems
Proactive approach
Process mining: Machine Learning on these events
Detect unusual behaviours
Outliers
Inconsistencies across systems
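A proactive use of these events could be as simple as flagging nodes whose run durations deviate from the fleet. This is a toy sketch of outlier detection, not the process-mining approach itself; node names and thresholds are hypothetical:

```python
from statistics import mean, stdev

def find_outliers(durations, threshold=2.0):
    """Flag nodes whose agent run duration is more than `threshold`
    standard deviations away from the fleet mean."""
    values = list(durations.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all runs identical: nothing unusual
    return [node for node, d in durations.items()
            if abs(d - mu) / sigma > threshold]

# Run durations (seconds) per node; node6 behaves unusually
runs = {"node1": 10.2, "node2": 9.8, "node3": 10.1,
        "node4": 10.0, "node5": 9.9, "node6": 58.0}
find_outliers(runs)  # flags node6
```

Real detection of unusual behaviours would work on richer event streams (which checks ran, in what order, with what result), but the principle is the same: learn the usual shape of runs, then surface deviations.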
38. OSIS 2019
THE OPEN SOURCE
INNOVATION SPRING 2019
@nico_charles
nicolas@rudder.io
Thank you !
Any questions ?
39. Security?
Events, traces and logs hold critical data
Within a simple system, security can be built in (AuthN/AuthZ)
For a distributed system, it’s much harder:
Who can see what?
Who defines and enforces the authorizations?
Partial visibility of events/traces
Tags on events for authorizations
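Tag-based authorization for partial visibility could be sketched as a simple filter; the event shape and team names here are hypothetical, chosen only to illustrate the idea:

```python
def visible_events(events, user_teams):
    # Partial visibility: a user only sees the events whose
    # authorization tag matches one of their teams
    return [e for e in events
            if e.get("tags", {}).get("team") in user_teams]

events = [
    {"msg": "sshd config changed on node42", "tags": {"team": "ops"}},
    {"msg": "firewall rule added on gw1", "tags": {"team": "sec"}},
]
visible_events(events, {"ops"})  # only the ops event is visible
```

In a distributed system the hard part is not this filter but agreeing on who assigns the tags and where the policy is enforced, which is exactly the open question on this slide.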