What is observability and how is it different from traditional monitoring? How do we effectively monitor and debug complex, elastic microservice architectures? In this interactive discussion, we’ll answer these questions. We’ll also introduce the idea of an “observability pipeline” as a way to empower teams following DevOps practices. Lastly, we’ll demo cloud-native observability tools that fit this “observability pipeline” model, including Fluentd, OpenTracing, and Jaeger.
16. @tyler_treat
Data Available
Understanding
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
17. @tyler_treat
Data Available
Understanding
Unknown Knowns
• Things we understand but are not
aware of
• “We implemented an orchestrator to
ensure the system is always running”
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
18. @tyler_treat
Data Available
Understanding
Unknown Knowns
• Things we understand but are not
aware of
• “We implemented an orchestrator to
ensure the system is always running”
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
19. @tyler_treat
Data Available
Understanding
Unknown Knowns
• Things we understand but are not
aware of
• “We implemented an orchestrator to
ensure the system is always running”
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
FACTS
20. @tyler_treat
Data Available
Understanding
Unknown Knowns
• Things we understand but are not
aware of
• “We implemented an orchestrator to
ensure the system is always running”
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
FACTS
HYPOTHESES
21. @tyler_treat
Data Available
Understanding
Unknown Knowns
• Things we understand but are not
aware of
• “We implemented an orchestrator to
ensure the system is always running”
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
ASSUMPTIONS FACTS
HYPOTHESES
22. @tyler_treat
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
DISCOVERIES
Data Available
Understanding
Unknown Knowns
• Things we understand but are not
aware of
• “We implemented an orchestrator to
ensure the system is always running”
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
ASSUMPTIONS FACTS
HYPOTHESES
23. @tyler_treat
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
DISCOVERIES
Data Available
Understanding
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
HYPOTHESES
MonitoringObservability
24. @tyler_treat
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
DISCOVERIES
Data Available
Understanding
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
HYPOTHESES
TestingExploring
26. @tyler_treat
Some
challenges…
Observability Data
application logs
system logs
audit logs
application metrics
distributed traces
events
- Locked up inside a single vendor’s solution
- Not readily available across the enterprise
(or in some cases, too readily available)
- Many tools and products needed for
different data and use cases
- Tool and data needs vary from team to
team
- Ever-changing landscape of tools, products,
and services
- Sheer volume of data can be overwhelming
73. @tyler_treat
Evolving to an Observability Pipeline
• Adopt structured logging
• Move log/data collection out of process
• Use a centralized logging system
• Introduce a streaming data solution
• Start adding data consumers