Fernando Mayo
CTO & co-founder, Undefined Labs
Testing in a distributed systems world
About me
Previously
Currently
@fernandomayo
fernando@undefinedlabs.com
Agenda
• Why testing?
• Why is testing microservices so hard?
• Let’s test in production!
• Using distributed context propagation
Debugging an issue in production using observability is great
… but it should be our last resort!
Cost of solving an issue
• Detection cost
• Troubleshooting cost
• Fixing cost
• Verification cost
• User impact cost
• Engineering burnout cost
[Diagram labels: Developer, End user, 💵]
Testing is about reducing the risk of your application not performing as expected, at the lowest possible cost
We want applications to be robust, performant and correct
Testing monoliths
Pre-production:
Robust: test your few known failure modes
Performant: benchmark, load, stress tests
Correct: unit, integration, end-to-end tests
Production:
Monitoring to detect issues (error, latency)
Logging to troubleshoot them
[Diagram labels: Database, Backend, UI]
Testing microservices
Pre-production:
We test each individual service in isolation
Using mocks, contract tests, traffic replays…
Production:
Tracing to troubleshoot issues
Wide divergence
Testing in production
If we can no longer replicate production…
• # of services
• # of third-party APIs
• Data
• Traffic patterns
• Configuration
• Scale
• Release cadence
• OS kernel version
• Service mesh
• Network latency
• DNS records
• SSL certificates
• Backups
• Monitoring
• Serverless
…let’s test in production!
Testing in production
❌ It is NOT a replacement for pre-production testing
✅ It is another testing technique to add to your toolbox
✅ It requires engineering investment to do it right
Types of tests in production
• Integration testing
• End-to-end testing
• Shadowing/traffic mirroring
• Canary deployments
• Feature flags
• Chaos engineering
• API testing/Real user monitoring
Deploy
Operate
Release
Risks of testing in production
• User impact
• State poisoning
• Traffic saturation
• Telemetry data skew
• Misfired alerts
The application needs to be aware of tests being performed in production
Test tenancy
End user
Tests
• Test label is propagated across services per-request
• Services and routing layer are aware of test tenancy
Risks of testing in production
• User impact: test before releasing
• State poisoning: separate writes to datastores
• Traffic saturation: implement QoS based on test label
• Telemetry data skew: mark telemetry with test label
• Misfired alerts: exclude test telemetry from alerts
Context propagation allows developers to attach arbitrary metadata to the current request that will be propagated automatically to all downstream dependencies
Context propagation
✅ It comes for “free” with tracing
✅ Developer-friendly API
✅ Read/write at any point in the request
✅ Compatible with threads and co-routines
✅ Compatible with multiple sync and async protocols
⚠ Increases request size
Context propagation
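OpenTracing
• Name: “Baggage”
• Definition: Key (string): Value (string)
• Serialization: via Tracer.Extract/Tracer.Inject
• Text-based format (e.g. HTTP): tracer-specific
• Binary format (e.g. gRPC): tracer-specific
OpenCensus
• Name: “Tags”
• Definition: TagKey (string): TagValue (string) + TagMetadata
• Serialization: via OpenCensus plugins
• Text-based format (e.g. HTTP): varies
• Binary format (e.g. gRPC): varies
OpenTelemetry
• Name: “DistributedContext”
• Definition: EntryKey (string): EntryValue (string) + EntryMetadata
• Serialization: via OpenTelemetry API
• Text-based format (e.g. HTTP): uses W3C Correlation Context
• Binary format (e.g. gRPC): own binary format similar to W3C Correlation Context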
Context propagation
Trace Context (W3C Candidate Recommendation)
traceparent: 00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01
Fields: version, trace ID (16 bytes, 32 hex characters), parent span ID (8 bytes, 16 hex characters), trace flags
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
Fields: vendor ID = vendor-specific payload
Correlation Context (W3C Editor’s Draft)
Correlation-Context: tenancy=test;ttl=-1,user.id=3
Fields: user-defined key = user-defined value, optionally followed by ;property
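To make the header layout concrete, here is a small Go sketch that splits a `Correlation-Context` value into entries and properties. It is a simplification: it does not percent-decode values or enforce the draft's size limits, and the `Entry` type and `ParseCorrelationContext` name are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is one value from the header plus any ";"-separated properties.
type Entry struct {
	Value      string
	Properties []string
}

// ParseCorrelationContext splits a header value such as
// "tenancy=test;ttl=-1,user.id=3" into entries keyed by name.
// Members without a "key=value" shape are skipped.
func ParseCorrelationContext(header string) map[string]Entry {
	entries := make(map[string]Entry)
	for _, member := range strings.Split(header, ",") {
		parts := strings.Split(member, ";")
		kv := strings.SplitN(strings.TrimSpace(parts[0]), "=", 2)
		if len(kv) != 2 || strings.TrimSpace(kv[0]) == "" {
			continue
		}
		e := Entry{Value: strings.TrimSpace(kv[1])}
		for _, p := range parts[1:] {
			e.Properties = append(e.Properties, strings.TrimSpace(p))
		}
		entries[strings.TrimSpace(kv[0])] = e
	}
	return entries
}

func main() {
	entries := ParseCorrelationContext("tenancy=test;ttl=-1,user.id=3")
	fmt.Println("tenancy:", entries["tenancy"].Value, entries["tenancy"].Properties)
	fmt.Println("user.id:", entries["user.id"].Value)
}
```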
Context propagation
Already used for:
• Tracing information
• Sampling information
But we can also use it for:
• Test traffic label
• Fault injection instructions
• User account information
• Feature flags
Example: integration testing
[Diagram labels: End user, Tests, Proxy, svc v1 (three instances), svc v2, Downstream services]
Correlation-Context: tenancy=test
Instrumenting tests
With OpenTracing:
    func TestIntegration(t *testing.T) {
        span, ctx := opentracing.StartSpanFromContext(context.Background(), t.Name())
        defer span.Finish()
        // Baggage items travel with the span context to downstream services.
        span.SetBaggageItem("tenancy", "test")
        // ... pass ctx to the calls under test
    }
With OpenTelemetry:
    func TestIntegration(t *testing.T) {
        tracer := global.TraceProvider().GetTracer("")
        // Attach the test label to the distributed context before starting the span.
        ctx := distributedcontext.NewContext(context.Background(), key.String("tenancy", "test"))
        ctx, span := tracer.Start(ctx, t.Name())
        defer span.End()
        // ... pass ctx to the calls under test
    }
Managing state
[Diagrams: End users and Tests traffic reaching each of the following setups]
• Multi-tenant service, multi-tenant datastore
• Multi-tenant service, single-tenant datastores
• Single-tenant services, single-tenant datastores
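As a rough illustration of the multi-tenant datastore option, the Go sketch below namespaces every key by tenancy so that test writes cannot overwrite end-user records. The in-memory `Store`, the `production` default namespace and the key format are all assumptions; a real datastore might use separate tables, schemas or a tenancy column instead.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is an in-memory stand-in for a multi-tenant datastore.
type Store struct {
	mu   sync.Mutex
	data map[string]string
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

// namespaced prefixes key with the tenancy; unlabeled traffic maps to "production".
func namespaced(tenancy, key string) string {
	if tenancy == "" {
		tenancy = "production"
	}
	return tenancy + "/" + key
}

// Put writes value under key within the given tenancy.
func (s *Store) Put(tenancy, key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[namespaced(tenancy, key)] = value
}

// Get reads key within the given tenancy only.
func (s *Store) Get(tenancy, key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[namespaced(tenancy, key)]
	return v, ok
}

func main() {
	s := NewStore()
	s.Put("", "user:1", "real profile")
	s.Put("test", "user:1", "synthetic profile")
	v, _ := s.Get("", "user:1")
	fmt.Println("end users see:", v)
	v, _ = s.Get("test", "user:1")
	fmt.Println("tests see:", v)
}
```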
Managing telemetry data
With OpenTelemetry:
    // Init measure
    meter := global.MeterProvider().GetMeter("")
    tenancyKey := key.New("tenancy")
    measure := meter.NewInt64Measure("myMeasure", metric.WithKeys(tenancyKey))
    // Extract tenancy from distributed context
    var labels []core.KeyValue
    if tenancyValue, ok := distributedcontext.FromContext(ctx).Value("tenancy"); ok {
        labels = append(labels, core.KeyValue{Key: tenancyKey, Value: tenancyValue})
    }
    // Attach labels to measurement
    measure.Record(ctx, 123, meter.Labels(labels...))
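Once measurements carry a tenancy label, alerting can be computed on end-user samples only. The Go sketch below shows the idea with a hand-rolled `Sample` type and `ErrorRate` helper; both are assumptions for illustration and do not correspond to any OpenTelemetry API.

```go
package main

import "fmt"

// Sample is one recorded request outcome, labelled with its tenancy.
type Sample struct {
	Tenancy string // "" for end-user traffic, "test" for test traffic
	Failed  bool
}

// ErrorRate returns the fraction of failed samples among those whose tenancy
// matches. It returns 0 when no sample matches.
func ErrorRate(samples []Sample, tenancy string) float64 {
	var total, failed int
	for _, s := range samples {
		if s.Tenancy != tenancy {
			continue
		}
		total++
		if s.Failed {
			failed++
		}
	}
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

func main() {
	samples := []Sample{
		{"", false}, {"", false}, {"", false}, {"", true},
		{"test", true}, {"test", true}, // failing tests must not page anyone
	}
	fmt.Println("end-user error rate:", ErrorRate(samples, ""))
	fmt.Println("test error rate:", ErrorRate(samples, "test"))
}
```

An alert rule would then watch the end-user rate only, while the test rate stays available for troubleshooting the tests themselves.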
Managing other side effects
With OpenTelemetry:
    // Check if current request belongs to test tenancy
    func inTesting(ctx context.Context) bool {
        value, ok := distributedcontext.FromContext(ctx).Value("tenancy")
        return ok && value == core.String("test")
    }
Examples:
• Implementing multi-tenant storage
• Using sandbox accounts from third-party services
Example: shadow e2e testing
[Diagram labels: End user, Tests, Proxy, svc v1 (three instances), svc v2, Downstream services]
Correlation-Context: tenancy=test,svc.target=v2
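A sketch of how a routing layer might honor the `svc.target` entry, written in Go. It assumes the header has already been parsed into a key/value map and that only test-tenancy requests may override the default version. The `PickVersion` name and the `deployed` set are illustrative assumptions.

```go
package main

import "fmt"

// PickVersion returns the service version that should handle a request, given
// the key/value pairs carried in its distributed context. Only test-tenancy
// requests may override defaultVersion, and only with a deployed version.
func PickVersion(entries map[string]string, deployed map[string]bool, defaultVersion string) string {
	if entries["tenancy"] != "test" {
		return defaultVersion
	}
	if target := entries["svc.target"]; deployed[target] {
		return target
	}
	return defaultVersion
}

func main() {
	deployed := map[string]bool{"v1": true, "v2": true}
	endUser := map[string]string{}
	test := map[string]string{"tenancy": "test", "svc.target": "v2"}
	fmt.Println("end user ->", PickVersion(endUser, deployed, "v1"))
	fmt.Println("test     ->", PickVersion(test, deployed, "v1"))
}
```

Ignoring `svc.target` on unlabeled requests keeps an end user from steering themselves onto an unreleased version.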
Example: fault injection testing
[Diagram labels: End user, Tests, Proxy, svc v1 (three instances), 🕘, Downstream services]
Correlation-Context: tenancy=test,svc.fault.http.delay=10s
Example: feature flagging
[Diagram labels: End user, Tests, Proxy, svc v1 (three instances), Downstream services]
Correlation-Context: tenancy=test,svc.feature1.enabled=true
Example: test accounts
[Diagram labels: End user, Proxy, svc v1 (three instances), Auth service, Downstream services]
Correlation-Context: tenancy=test,user.kind=test
Consequences
• No need to replicate the entire application stack anywhere
    • Locally, on CI or staging
• Segregated telemetry allows us to monitor and troubleshoot tests
    • Same tools and visibility as with production traffic
• We can add other types of traffic to our application
    • Examples: “sandbox”, “development” traffic
Key takeaways
• Let’s catch issues as early as possible through proactive testing
• Testing in production can be the most efficient way to test complex systems
• We should design our applications to allow safely testing in production
• Let’s make use of context propagation and make the most of our observability instrumentation
Thank you!
fernando@undefinedlabs.com
@fernandomayo

Testing in a distributed world

  • 1.
  • 2. Fernando Mayo CTO & co-founder, Undefined Labs Testing in a distributed systems world
  • 4. Agenda • Why testing? • Why is testing microservices so hard? • Let’s test in production! • Using distributed context propagation
  • 5. Debugging an issue in production using observability is great … but it should be our last resort!
  • 6. Cost of solving an issue • Detection cost • Troubleshooting cost • Fixing cost • Verification cost • User impact cost • Engineering burnout cost [Diagram labels: 👩‍💻 Developer, End user 👨‍💼, 💵]
  • 7. Testing is about reducing the risk of your application not performing as expected, at the lowest possible cost We want applications to be robust, performant and correct
  • 8. Testing monoliths • Pre-production: Robust: test your few known failure modes; Performant: benchmark, load, stress tests; Correct: unit, integration, end-to-end tests • Production: Monitoring to detect issues (error, latency); Logging to troubleshoot them [Diagram labels: UI, Backend, Database]
  • 9. Testing microservices • Pre-production: We test each individual service in isolation, using mocks, contract tests, traffic replays… • Production: Tracing to troubleshoot issues [Diagram label: Wide divergence]
  • 10. Testing in production If we can no longer replicate production… • # of services • # of third-party APIs • Data • Traffic patterns • Configuration • Scale • Release cadence • OS kernel version • Service mesh • Network latency • DNS records • SSL certificates • Backups • Monitoring • Serverless …let’s test in production!
  • 11. Testing in production ❌ It is NOT a replacement for pre-production testing ✅ It is another testing technique to add to your toolbox ✅ It requires engineering investment to do it right
  • 12. Types of tests in production • Integration testing • End-to-end testing • Shadowing/traffic mirroring • Canary deployments • Feature flags • Chaos engineering • API testing/Real user monitoring [Diagram phases: Deploy, Operate, Release]
  • 13. Risks of testing in production • User impact • State poisoning • Traffic saturation • Telemetry data skew • Misfired alerts. The application needs to be aware of tests being performed in production.
  • 14. Test tenancy • Test label is propagated across services per-request • Services and routing layer are aware of test tenancy [Diagram labels: End user, Tests]
  • 15. Risks of testing in production (risk: mitigation) • User impact: Test before releasing • State poisoning: Separate writes to datastores • Traffic saturation: Implement QoS based on test label • Telemetry data skew: Mark telemetry with test label • Misfired alerts: Exclude test telemetry from alerts
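The "QoS based on test label" mitigation could look roughly like the following. This is only a sketch under assumptions: `TestLimiter`, `NewTestLimiter` and `Admit` are hypothetical names, not part of the talk, and a real service would likely rate-limit in its proxy or mesh layer instead.

```go
package main

import "fmt"

// TestLimiter (hypothetical) caps how many test-labeled requests run at
// once. End-user requests are never limited by it.
type TestLimiter struct {
	slots chan struct{}
}

// NewTestLimiter allows at most maxTests concurrent test requests.
func NewTestLimiter(maxTests int) *TestLimiter {
	return &TestLimiter{slots: make(chan struct{}, maxTests)}
}

// Admit reports whether the request may proceed. When ok is true the
// caller must invoke release once the request has finished.
func (l *TestLimiter) Admit(isTest bool) (release func(), ok bool) {
	if !isTest {
		return func() {}, true // end-user traffic bypasses the cap
	}
	select {
	case l.slots <- struct{}{}:
		return func() { <-l.slots }, true
	default:
		return nil, false // cap reached: shed the test request
	}
}

func main() {
	limiter := NewTestLimiter(1)
	release, first := limiter.Admit(true)
	_, second := limiter.Admit(true) // rejected: the only slot is taken
	_, user := limiter.Admit(false)  // end users are always admitted
	release()
	_, third := limiter.Admit(true) // admitted again after release
	fmt.Println(first, second, user, third) // true false true true
}
```

Shedding test requests rather than queueing them keeps saturation from tests away from end users; whether rejecting or delaying is preferable depends on the test suite.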
  • 16. Context propagation allows developers to attach arbitrary metadata to the current request that will be propagated automatically to all downstream dependencies
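To make the definition concrete, here is a minimal sketch using only Go's standard library. It assumes a hypothetical `X-Tenancy` header and hypothetical helpers (`WithTenancy`, `TenancyFrom`, `Extract`, `Inject`, `downstreamHeader`); in practice a tracing library would do the extraction and injection.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tenancyKey is an unexported context key type, so it cannot collide
// with keys from other packages.
type tenancyKey struct{}

// headerName is an assumption made for this sketch only.
const headerName = "X-Tenancy"

// WithTenancy attaches the label to the request context.
func WithTenancy(ctx context.Context, tenancy string) context.Context {
	return context.WithValue(ctx, tenancyKey{}, tenancy)
}

// TenancyFrom reads the label back at any point in the request.
func TenancyFrom(ctx context.Context) (string, bool) {
	v, ok := ctx.Value(tenancyKey{}).(string)
	return v, ok
}

// Extract is server middleware: it copies the incoming label into the
// request context.
func Extract(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if v := r.Header.Get(headerName); v != "" {
			r = r.WithContext(WithTenancy(r.Context(), v))
		}
		next.ServeHTTP(w, r)
	})
}

// Inject copies the label from ctx onto an outgoing request.
func Inject(ctx context.Context, out *http.Request) {
	if v, ok := TenancyFrom(ctx); ok {
		out.Header.Set(headerName, v)
	}
}

// downstreamHeader simulates one hop without touching the network: a
// service receives a request labeled with incoming, prepares an outgoing
// call, and we return the label that call would carry.
func downstreamHeader(incoming string) string {
	var forwarded string
	service := Extract(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		out, err := http.NewRequestWithContext(r.Context(), http.MethodGet, "http://downstream.invalid/", nil)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		Inject(r.Context(), out)
		forwarded = out.Header.Get(headerName)
	}))
	in := httptest.NewRequest(http.MethodGet, "/", nil)
	if incoming != "" {
		in.Header.Set(headerName, incoming)
	}
	service.ServeHTTP(httptest.NewRecorder(), in)
	return forwarded
}

func main() {
	fmt.Printf("%q %q\n", downstreamHeader("test"), downstreamHeader(""))
}
```

The handler itself never mentions the label; the middleware pair is what makes it travel to downstream dependencies.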
  • 17. Context propagation ✅ It comes for “free” with tracing ✅ Developer-friendly API ✅ Read/write at any point in the request ✅ Compatible with threads and co-routines ✅ Compatible with multiple sync and async protocols ⚠ Increases request size
  • 18. Context propagation (columns: OpenTracing | OpenCensus | OpenTelemetry)
      Name: “Baggage” | “Tags” | “DistributedContext”
      Definition: Key (string): Value (string) | TagKey (string): TagValue (string) + TagMetadata | EntryKey (string): EntryValue (string) + EntryMetadata
      Serialization: Via Tracer.Extract/Tracer.Inject | Via OpenCensus plugins | Via OpenTelemetry API
      Text-based format (e.g. HTTP): Tracer-specific | Varies | Uses W3C Correlation Context
      Binary format (e.g. gRPC): Tracer-specific | Varies | Own binary format similar to W3C Correlation Context
  • 19. Context propagation
      Trace Context (W3C Candidate Recommendation):
        traceparent: 00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01
          fields: version (00), trace ID (32 hex characters, 16 bytes), parent span ID (16 hex characters, 8 bytes), trace flags (01)
        tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
          entries: vendor ID = vendor-specific payload
      Correlation Context (W3C Editor’s Draft):
        Correlation-Context: tenancy=test;ttl=-1,user.id=3
          entries: user-defined key = user-defined value, optionally followed by ;property
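The two header layouts on this slide can be parsed with a few lines of Go. The sketch below is illustrative, not a spec-complete parser: `ParseTraceParent` and `ParseCorrelationContext` are hypothetical names, only the four-field version-00 `traceparent` shape is handled, and values are not percent-decoded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the four dash-separated fields of a traceparent header.
type TraceParent struct {
	Version  string // 2 hex characters
	TraceID  string // 32 hex characters (16 bytes)
	ParentID string // 16 hex characters (8 bytes)
	Flags    string // 2 hex characters
}

// isLowerHex reports whether s contains only 0-9 and a-f.
func isLowerHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// ParseTraceParent checks field count, lengths and hex encoding.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: expected 4 fields")
	}
	want := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != want[i] || !isLowerHex(p) {
			return TraceParent{}, fmt.Errorf("traceparent: bad field %d", i)
		}
	}
	return TraceParent{Version: parts[0], TraceID: parts[1], ParentID: parts[2], Flags: parts[3]}, nil
}

// Entry is one Correlation-Context member: a value plus optional properties.
type Entry struct {
	Value      string
	Properties []string
}

// ParseCorrelationContext splits "k1=v1;prop,k2=v2" into entries by key.
func ParseCorrelationContext(h string) map[string]Entry {
	out := make(map[string]Entry)
	for _, member := range strings.Split(h, ",") {
		fields := strings.Split(member, ";")
		key, value, ok := strings.Cut(strings.TrimSpace(fields[0]), "=")
		if !ok || key == "" {
			continue // skip malformed members
		}
		e := Entry{Value: strings.TrimSpace(value)}
		for _, p := range fields[1:] {
			e.Properties = append(e.Properties, strings.TrimSpace(p))
		}
		out[strings.TrimSpace(key)] = e
	}
	return out
}

func main() {
	tp, err := ParseTraceParent("00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01")
	fmt.Println(tp.TraceID, tp.ParentID, err)

	cc := ParseCorrelationContext("tenancy=test;ttl=-1,user.id=3")
	fmt.Println(cc["tenancy"].Value, cc["tenancy"].Properties, cc["user.id"].Value)
}
```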
  • 20. Context propagation Already used for: • Tracing information • Sampling information But we can also use it for: • Test traffic label • Fault injection instructions • User account information • Feature flags
  • 21. Example: integration testing [Diagram: End user and Tests send requests through a Proxy to svc v1 and svc v2 instances and on to downstream services; header shown: Correlation-Context: tenancy=test]
  • 22. Instrumenting tests
      With OpenTracing:
        func TestIntegration(t *testing.T) {
            span, ctx := opentracing.StartSpanFromContext(context.Background(), t.Name())
            defer span.Finish()
            // Baggage set on the span travels with ctx to downstream calls
            span.SetBaggageItem("tenancy", "test")
            // ...
        }
      With OpenTelemetry:
        func TestIntegration(t *testing.T) {
            tracer := global.TraceProvider().GetTracer("")
            // Attach the tenancy label before starting the span
            ctx := distributedcontext.NewContext(context.Background(), key.String("tenancy", "test"))
            ctx, span := tracer.Start(ctx, t.Name())
            defer span.End()
            // ...
        }
  • 23. Managing state [Diagram: four layouts in which End users and Tests share or split services and datastores, pairing multi-tenant or single-tenant services with multi-tenant or single-tenant datastores]
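One of the layouts on this slide, a multi-tenant service writing to single-tenant datastores, might be sketched as follows. Everything here is an assumption for illustration: `Store` is an in-memory stand-in for a real datastore client, and `Stores`, `For`, `WithTenancy`, `inTesting` and `demo` are hypothetical names built on the standard `context` package rather than a tracing library.

```go
package main

import (
	"context"
	"fmt"
)

type tenancyKey struct{}

// WithTenancy labels a request context (stand-in for propagated baggage).
func WithTenancy(ctx context.Context, tenancy string) context.Context {
	return context.WithValue(ctx, tenancyKey{}, tenancy)
}

// inTesting reports whether the current request carries the test label.
func inTesting(ctx context.Context) bool {
	v, _ := ctx.Value(tenancyKey{}).(string)
	return v == "test"
}

// Store is an in-memory stand-in for a real datastore client.
type Store map[string]string

// Stores pairs the production datastore with a separate one for tests.
type Stores struct {
	Prod Store
	Test Store
}

// For picks the datastore for this request, so test writes never land
// in production data.
func (s Stores) For(ctx context.Context) Store {
	if inTesting(ctx) {
		return s.Test
	}
	return s.Prod
}

// demo writes one end-user record and two test records through the same
// code path and returns the row count of each datastore.
func demo() (prodRows, testRows int) {
	stores := Stores{Prod: Store{}, Test: Store{}}
	userCtx := context.Background()
	testCtx := WithTenancy(context.Background(), "test")

	stores.For(userCtx)["order:1"] = "real order"
	stores.For(testCtx)["order:2"] = "synthetic order"
	stores.For(testCtx)["order:3"] = "synthetic order"
	return len(stores.Prod), len(stores.Test)
}

func main() {
	prodRows, testRows := demo()
	fmt.Println(prodRows, testRows) // 1 2
}
```

A multi-tenant datastore would instead keep one store and tag or prefix each record with the tenancy; the selection point stays the same.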
  • 24. Managing telemetry data
      With OpenTelemetry:
        // Init measure
        meter := global.MeterProvider().GetMeter("")
        tenancyKey := key.New("tenancy")
        measure := meter.NewInt64Measure("myMeasure", metric.WithKeys(tenancyKey))

        // Extract tenancy from distributed context
        var labels []core.KeyValue
        if tenancyValue, ok := distributedcontext.FromContext(ctx).Value("tenancy"); ok {
            labels = append(labels, core.KeyValue{Key: tenancyKey, Value: tenancyValue})
        }

        // Attach labels to measurement
        measure.Record(ctx, 123, meter.Labels(labels...))
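Once measurements carry the tenancy label, alerting can ignore the test series. The following standard-library sketch shows that idea only; `ErrorCounter`, `ShouldAlert`, `tenancyOf` and `simulate` are hypothetical names, and a real setup would express the exclusion as a filter in the monitoring system's alert rule.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type tenancyKey struct{}

// WithTenancy labels a request context (stand-in for propagated baggage).
func WithTenancy(ctx context.Context, tenancy string) context.Context {
	return context.WithValue(ctx, tenancyKey{}, tenancy)
}

// tenancyOf returns the label, defaulting to "production" when unset.
func tenancyOf(ctx context.Context) string {
	if v, ok := ctx.Value(tenancyKey{}).(string); ok && v != "" {
		return v
	}
	return "production"
}

// ErrorCounter counts errors separately per tenancy label.
type ErrorCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewErrorCounter() *ErrorCounter {
	return &ErrorCounter{counts: make(map[string]int)}
}

// Inc records one error under the tenancy found in ctx.
func (c *ErrorCounter) Inc(ctx context.Context) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[tenancyOf(ctx)]++
}

// ShouldAlert sums every series except "test" and compares the total
// with the threshold, so failing tests cannot page anyone.
func (c *ErrorCounter) ShouldAlert(threshold int) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	total := 0
	for tenancy, n := range c.counts {
		if tenancy != "test" {
			total += n
		}
	}
	return total >= threshold
}

// simulate records the given number of errors per tenancy and reports
// whether the alert would fire.
func simulate(prodErrors, testErrors, threshold int) bool {
	c := NewErrorCounter()
	for i := 0; i < prodErrors; i++ {
		c.Inc(context.Background())
	}
	testCtx := WithTenancy(context.Background(), "test")
	for i := 0; i < testErrors; i++ {
		c.Inc(testCtx)
	}
	return c.ShouldAlert(threshold)
}

func main() {
	fmt.Println(simulate(1, 50, 5)) // false: test errors are excluded
	fmt.Println(simulate(5, 0, 5))  // true: production errors hit the threshold
}
```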
  • 25. Managing other side effects
      With OpenTelemetry:
        // Check if current request belongs to test tenancy
        func inTesting(ctx context.Context) bool {
            value, ok := distributedcontext.FromContext(ctx).Value("tenancy")
            return ok && value == core.String("test")
        }
      Examples: • Implementing multi-tenant storage • Using sandbox accounts from third-party services
  • 26. Example: shadow e2e testing [Diagram: End user and Tests send requests through a Proxy to svc v1 and svc v2 instances and on to downstream services; header shown: Correlation-Context: tenancy=test,svc.target=v2]
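The routing decision implied by `svc.target=v2` could be sketched like this. It is a guess at one reasonable policy, not the talk's implementation: `parseEntries` and `pickVersion` are hypothetical helpers, and the rule that only test-tenancy requests may override the default version is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEntries splits a Correlation-Context style header ("k1=v1,k2=v2")
// into a map. Properties after ';' are dropped in this sketch.
func parseEntries(header string) map[string]string {
	entries := make(map[string]string)
	for _, member := range strings.Split(header, ",") {
		kv, _, _ := strings.Cut(member, ";")
		key, value, ok := strings.Cut(kv, "=")
		if ok {
			entries[strings.TrimSpace(key)] = strings.TrimSpace(value)
		}
	}
	return entries
}

// pickVersion returns the service version a proxy should route to.
// Assumed policy: only test-tenancy requests may override the default,
// and only with a version that is actually deployed.
func pickVersion(header, defaultVersion string, deployed map[string]bool) string {
	entries := parseEntries(header)
	target := entries["svc.target"]
	if entries["tenancy"] == "test" && deployed[target] {
		return target
	}
	return defaultVersion
}

func main() {
	deployed := map[string]bool{"v1": true, "v2": true}
	fmt.Println(pickVersion("tenancy=test,svc.target=v2", "v1", deployed)) // v2
	fmt.Println(pickVersion("svc.target=v2", "v1", deployed))              // v1
	fmt.Println(pickVersion("", "v1", deployed))                           // v1
}
```

Ignoring the override for unlabeled traffic keeps an end user from steering themselves onto an unreleased version by forging the header.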
  • 27. Example: fault injection testing [Diagram: End user and Tests send requests through a Proxy to svc v1 instances and on to downstream services, with a delay (🕘) injected; header shown: Correlation-Context: tenancy=test,svc.fault.http.delay=10s]
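A service could honor the `svc.fault.http.delay` instruction with middleware along these lines. This is a sketch under assumptions: `faultDelay`, `InjectFaults` and the 30-second cap `maxInjectedDelay` are hypothetical, and the delay is applied only to requests labeled `tenancy=test`.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// maxInjectedDelay is an assumed safety cap for this sketch.
const maxInjectedDelay = 30 * time.Second

// faultDelay returns the delay requested in a Correlation-Context style
// header, or 0 if the request is not test traffic or the value is invalid.
func faultDelay(header string) time.Duration {
	var tenancy, raw string
	for _, member := range strings.Split(header, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(member), "=")
		if !ok {
			continue
		}
		switch key {
		case "tenancy":
			tenancy = value
		case "svc.fault.http.delay":
			raw = value
		}
	}
	if tenancy != "test" || raw == "" {
		return 0
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d < 0 {
		return 0
	}
	if d > maxInjectedDelay {
		d = maxInjectedDelay
	}
	return d
}

// InjectFaults delays test requests before passing them on, and stops
// early if the caller cancels while waiting.
func InjectFaults(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if d := faultDelay(r.Header.Get("Correlation-Context")); d > 0 {
			select {
			case <-time.After(d):
			case <-r.Context().Done():
				return // caller gave up during the injected delay
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(faultDelay("tenancy=test,svc.fault.http.delay=10s")) // 10s
	fmt.Println(faultDelay("svc.fault.http.delay=10s"))              // 0s
}
```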
  • 28. Example: feature flagging [Diagram: End user and Tests send requests through a Proxy to svc v1 instances and on to downstream services; header shown: Correlation-Context: tenancy=test,svc.feature1.enabled=true]
  • 29. Example: test accounts [Diagram: End user requests pass through a Proxy and an Auth service to svc v1 instances and on to downstream services; header shown: Correlation-Context: tenancy=test,user.kind=test]
  • 30. Consequences • No need to replicate the entire application stack anywhere (locally, on CI or staging) • Segregated telemetry allows us to monitor and troubleshoot tests (same tools and visibility as with production traffic) • We can add other types of traffic to our application (examples: “sandbox”, “development” traffic)
  • 31. Key takeaways • Let’s catch issues as early as possible through proactive testing • Testing in production can be the most efficient way to test complex systems • We should design our applications to allow safely testing in production • Let’s make use of context propagation and make the most of our observability instrumentation