SlideShare a Scribd company logo
>>> 5 must-have
Patterns for your web-scale
Microservices
@aliostad
Ali Kheyrollahi, ASOS
@aliostad
> stackoverflow
> £1.5 bln
global fashion
destination
> 35% every year
/// ASOS in numbers
2 0 1 6 T u r n O v e r → £1.5 bln
A c t i v e C u s t o m e r s → 12 M
N e w P r o d u c t s / w k → 4 k
U n i q u e V i s i t s / m o → 123 M
P a g e V i e w s / d a y → 95 M
P l a t f o r m T e a m s → 40
A z u r e D a t a C e n t r e s → 5
@aliostad
/// Microservices
Architecture
@aliostad
/// why microservices
> Scaling people not the solution
> Decentralising decision centres => Agility
> Frequent deployment => Agility
> Reduced complexity of each ms (Divide/Conquere) => Agility
> Cost of mistakes and bad decisions smaller ...
@aliostad
/// anecdote
Often you can measure your success in
implementing Microservice Architecture not
by the number of services you build, but by
the number you decommission.
@aliostad
/// microservices vs soa
SOA Microservices
Main Goal Architectual Decoupling Business Agility
Set out to solve Architectural Coupling
Scaling People,
Frequent Deployment
Audience Mainly Architecture Business (Everyone)
Law Conway’s Reverse Conway’s
Impact on Structure of
Organisation
Minimal Huge
Service Cardinality Usually up to a dozen >40 (Commonly >100)
When to do Always teams > ~5**
** Debateable. There are articles and discussions on this very topic
@aliostad
/// microservice challenges
> Very difficult to build a complete mental picture of solution
> When things go wrong, need to know where before why
> Potentially increased latency
> Performance outliers intractable to solve
> A complete mind-shift requiring a new operating model
@aliostad
/// probability distribution
Response Time
Probabilty
@aliostad
/// performance outliers
Microservice
A
Microservie
B
99th Percentile = 500ms 99th Percentile = 500ms
A B Total
<1s 99% 99% 98.01%
>500m 1% 99% 0.99%
>500m 99% 1% 0.99%
>1s 1% 1% 0.01%
@aliostad
/// probability dist
Service A Prob.
10ms 33%
15ms 33%
25ms 33%
Service B Prob.
15ms 33%
20ms 33%
30ms 33%
A + B Prob.
25ms 11%
30ms 11%
40ms 11%
30ms 11%
35ms 11%
45ms 11%
40ms 11%
45ms 11%
55ms 11%
X
=
66th=15ms 66th=20ms
44th=35ms
66th=40ms
@aliostad
/// Retry and
Timeout Policy
@aliostad
/// Failure
Microservice
A
1% chance of failure
X
Wait (back-off)
X
Wait (back-off longer)
Microservice
B
1% chance of failure
@aliostad
/// Preemptive Timeout
Microservice
A
X
retry
X
retry
Short timeout
Short timeout
Microservice
B
@aliostad
/// Backup requests with cross server
cancelation
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/Berkeley-
Latency-Mar2012.pdf
BRCSC Paper
Google, 2012
@aliostad
/// Timeout
C
B
A
A > B > C
A > B + C
@aliostad
/// Choosing a timeout?
Static => Based on Server SLO
Dynamic => 95th percentile
@aliostad
/// IO
Monitor
@aliostad
/// Blame Game
“If there is a single place where
you can play blame game,
instead of collective responsibility,
it is in
Microservices troubleshooting”
@aliostad
/// Did you say IO??
Microservice
DB
API
Cache
Measure...
every time your code
goes out of your process
@aliostad
/// Recording Methods
> Explicitly by calling record()
> Asking the library to record a closure
> Aspect-oriented
Java (spf4j)
private static final MeasurementRecorder recorder
= RecorderFactory.createScalableCountingRecorder(forWhat, unitOfMeasurement,
sampleTimeMillis);
…
recorder.record(measurement);
.NET (PerfIt)
var ins = new SimpleInstrumentor(new InstrumentationInfo()
{
Counters = CounterTypes.StandardCounters,
Description = "test",
InstanceName = "Test instance",
CategoryName = TestCategory
});
ins.Instrument(() => Thread.Sleep(100), "test...");
Java and .NET
@PerformanceMonitor(warnThresholdMillis=1, errorThresholdMillis=100, recorderSource =
RecorderSourceInstance.Rs5m.class)
[PerfItFilter(“PerfItTests", InstanceName = "Test")]
public string Get()
{
return Guid.NewGuid().ToString();
}
@aliostad
/// Publishing Methods
> Local file (various to logstash)
> TCP and HTTP (many, to zipkin, influxdb)
> UDP (statsd, collectd to graphite, logstash)
> Raising Kernel-level event (Windows ETW)
> Local communication (statsd)
@aliostad
/// Dapper
Dapper Paper
Google, 2010
Scalable
Transparent Low-overhead
-> Span Id
-> Parent Id
-> Trace Id
@aliostad
/// zipkin
zipkin
by Twitter
zipkin
Collector
Storage
Query
WebWebWebWebWebWebWebWebWeb
@aliostad
/// zipkin
Collector
Kafka
Scribe
EventHub*
Storage
Cassandra
MySQL
Elasticsearch
@aliostad
/// Sampling
“The first production version of Dapper used a uniform sampling
probability for all processes at Google, averaging one sampled trace for
every 1024 candidates… [however] we are in the process of deploying
an adaptive sampling scheme that is parameterized not by a uniform
sampling probability, but by a desired rate of sampled traces per unit
time.”
Dapper Paper
Zipkin samples in the collector using a strategy pattern: an
implementation of CollectorSampler abstract class.
@aliostad
/// Circuit-
Breaker
@aliostad
/// tri-state
> Closed traffic can flow normally
> Open traffic does not flow
> Half-open circuit breaker tests the waters again
Closed
Open
Half-open
Test
Failure
Wait timeout
@aliostad
/// Netflix Hysterix
RequestVolumeThreshold
ErrorThresholdPercentage
SleepWindowInMilliseconds
TimeInMilliseconds
NumBuckets
@aliostad
/// Fallback
> Custom: e.g. serve content from a local cache (status 206)
> Silent: return null/no-data/empty (status 200/204)
> Fail-fast: Customer experience is important (status 5xx)
@aliostad
/// ActivityId
Propagator
@aliostad
/// ActivityId
> Every customer request matters
> Every request is unique
> Every request creates a chain (or tree) of calls/events
> Activities are correlated
> You need an ActivityId (or CorrelationId) to link calls/events
@aliostad
/// ActivityId
Microservice
Id
IdId Thread Local Storage
Id
To Other APIs
Id
Event
@aliostad
/// ActivityId - HTTP
Request
GET /api/v2/foo HTTP/1.1
host: foo.com
activity-id: 96c5a1f106ce468ebcca8303ed7464bd
Response
200 OK
activity-id: 96c5a1f106ce468ebcca8303ed7464bd
@aliostad
/// ActivityId - .NET
// .NET 4.5.2 and below
CallContext.LogicalSetData(“activityId”, activityId)
...
var activityId = (string) CallContext.LogicalGetData(“activityId”);
// .NET 4.6 and above
private static AsyncLocal<string> _activityId = new AsyncLocal<string>();
...
_activityId.Value = “a sensible value”;
...
var activityId = _activityId.Value;
@aliostad
/// Quick Quiz
class Program
{
private static AsyncLocal<string> _activityId = new AsyncLocal<string>();
static void Main(string[] args)
{
_activityId.Value = "initial";
Console.WriteLine(_activityId.Value); // output
new Thread(() =>
{
Console.WriteLine(_activityId.Value);
_activityId.Value = "Inside Thread”; // output
}).Start();
Thread.Sleep(100);
Console.WriteLine(_activityId.Value); // output
}
}
A)
“initial”
“”
“initial”
B)
“initial”
“initial”
“initial”
C)
“initial”
“”
“Inside Thread”
D)
“initial”
“initial”
“Inside Thread”
@aliostad
/// Canary and
Health Endpoint
@aliostad
/// Health Endpoints
Ping returns a success code when invoked
Canary returns a connectivity status and
latency on the service and dependencies
“… none of them invoke any application code”
@aliostad
/// Ping
Request
GET /api/health HTTP/1.1
host: foo.com
Response
200 OK
Response
500 Server Error
@aliostad
/// Canary
Request
GET /api/canary HTTP/1.1
host: foo.com
Response
200 OK
{
[Nested Structure]
}
@aliostad
/// ChirpResult
{
"serviceName": "foo",
"latency": "00:00:00.0542172",
"statusCode": 200,
"isCritical": true
}
@aliostad
/// ChirpResult
@aliostad
/// ChirpResult - critical failure
API
NC
NC
C
200
200
500
500
@aliostad
/// ChirpResult - non-critical failure
API
NC
NC
C
500
200
200
200
@aliostad
/// AOP / Declarative (c#)
[AzureStorageCanary("Foo-AzureStorage-BarDatabaseServer", “config-key-for-cn“)]
[SqlCanary("SQL-BazActiveDatabase", null, typeof(SqlConnectionFactory))]
[CanaryEndpointCanary("Dependency-Api", “config-key-for-endpoint“)]
public class CanaryController : CanaryBaseController
{
… // some boilerplate code
}
@aliostad
/// Deep vs Shallow
API
API
“Deep”“Shallow”
/api/canary?deep=false
@aliostad
/// Wrap-up
> If you have more than ~5 teams, consider Microservices
> Logging/Monitoring/Alerting: single most important asset
> Use ActivityId Propagator to correlate (consider zipkin)
> Cloud is a jungleTM
. Without retry/timeout you won’t survive
> Monitor and measure all calls to external services (blame game)
> Protect your systems with circuit-breakers (and isolation)
> Canary helps you detect connectivity from customer view
@aliostad
Thomas Wood: Daisy Picture
Thomas Au: Thermometer Picture
Torbakhopper: Cables Picture
Dam Picture - Japan
Hsiung: Lights Picture
Health Endpoint in API Design

More Related Content

Similar to 5 must have patterns for your microservice - techorama

Similar to 5 must have patterns for your microservice - techorama (20)

Random numbers
Random numbersRandom numbers
Random numbers
 
Being HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeBeing HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on Purpose
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Innovative Specifications for Better Performance Logging and Monitoring
Innovative Specifications for Better Performance Logging and MonitoringInnovative Specifications for Better Performance Logging and Monitoring
Innovative Specifications for Better Performance Logging and Monitoring
 
Un monde où 1 ms vaut 100 M€ - Devoxx France 2015
Un monde où 1 ms vaut 100 M€ - Devoxx France 2015Un monde où 1 ms vaut 100 M€ - Devoxx France 2015
Un monde où 1 ms vaut 100 M€ - Devoxx France 2015
 
How we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHow we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.no
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
 
Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11
 
Performance and how to measure it - ProgSCon London 2016
Performance and how to measure it - ProgSCon London 2016Performance and how to measure it - ProgSCon London 2016
Performance and how to measure it - ProgSCon London 2016
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Actor Concurrency
Actor ConcurrencyActor Concurrency
Actor Concurrency
 
Lambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter LawreyLambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter Lawrey
 
Is writing performant code too expensive?
Is writing performant code too expensive? Is writing performant code too expensive?
Is writing performant code too expensive?
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 
ConFoo Montreal - Approaches for application request throttling
ConFoo Montreal - Approaches for application request throttlingConFoo Montreal - Approaches for application request throttling
ConFoo Montreal - Approaches for application request throttling
 
JavaScript Refactoring
JavaScript RefactoringJavaScript Refactoring
JavaScript Refactoring
 
Curator intro
Curator introCurator intro
Curator intro
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Tt subtemplates-caching
Tt subtemplates-cachingTt subtemplates-caching
Tt subtemplates-caching
 
Undefined Behavior and Compiler Optimizations (NDC Oslo 2018)
Undefined Behavior and Compiler Optimizations (NDC Oslo 2018)Undefined Behavior and Compiler Optimizations (NDC Oslo 2018)
Undefined Behavior and Compiler Optimizations (NDC Oslo 2018)
 

More from Ali Kheyrollahi

More from Ali Kheyrollahi (16)

Autonomous agents with deep reinforcement learning - Oredev 2018
Autonomous agents with deep reinforcement learning - Oredev 2018Autonomous agents with deep reinforcement learning - Oredev 2018
Autonomous agents with deep reinforcement learning - Oredev 2018
 
Buildstuff - what do you need to know about RPC comeback
Buildstuff - what do you need to know about RPC comebackBuildstuff - what do you need to know about RPC comeback
Buildstuff - what do you need to know about RPC comeback
 
Deep learning for developers - oredev
Deep learning for developers - oredevDeep learning for developers - oredev
Deep learning for developers - oredev
 
Microservice Architecture at ASOS - DevSum 2017
Microservice Architecture at ASOS - DevSum 2017Microservice Architecture at ASOS - DevSum 2017
Microservice Architecture at ASOS - DevSum 2017
 
From Power Chord to the Power of Models - Oredev
From Power Chord to the Power of Models - OredevFrom Power Chord to the Power of Models - Oredev
From Power Chord to the Power of Models - Oredev
 
From Hard Science to Baseless Opinions - Oredev
From Hard Science to Baseless Opinions  - OredevFrom Hard Science to Baseless Opinions  - Oredev
From Hard Science to Baseless Opinions - Oredev
 
From hard science to baseless opinions
From hard science to baseless opinionsFrom hard science to baseless opinions
From hard science to baseless opinions
 
Microservice architecture at ASOS
Microservice architecture at ASOSMicroservice architecture at ASOS
Microservice architecture at ASOS
 
Us Elections 2016 - Iran Elections 2005
Us Elections 2016 - Iran Elections 2005Us Elections 2016 - Iran Elections 2005
Us Elections 2016 - Iran Elections 2005
 
5 Anti-Patterns in Api Design - NDC London 2016
5 Anti-Patterns in Api Design - NDC London 20165 Anti-Patterns in Api Design - NDC London 2016
5 Anti-Patterns in Api Design - NDC London 2016
 
From power chords to the power of models
From power chords to the power of modelsFrom power chords to the power of models
From power chords to the power of models
 
5 Anti-Patterns in Api Design - buildstuff
5 Anti-Patterns in Api Design - buildstuff5 Anti-Patterns in Api Design - buildstuff
5 Anti-Patterns in Api Design - buildstuff
 
5 Anti-Patterns in API Design - DDD East Anglia 2015
5 Anti-Patterns in API Design - DDD East Anglia 20155 Anti-Patterns in API Design - DDD East Anglia 2015
5 Anti-Patterns in API Design - DDD East Anglia 2015
 
5 Anti-Patterns in API Design
5 Anti-Patterns in API Design5 Anti-Patterns in API Design
5 Anti-Patterns in API Design
 
Topic Modelling and APIs
Topic Modelling and APIsTopic Modelling and APIs
Topic Modelling and APIs
 
Http caching 101 and a bit of CacheCow
Http caching 101 and a bit of CacheCowHttp caching 101 and a bit of CacheCow
Http caching 101 and a bit of CacheCow
 

Recently uploaded

Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 

Recently uploaded (20)

top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 

5 must have patterns for your microservice - techorama