Fault tolerant microservices - LJC Skills Matter, 4th Nov 2014

Christopher Batey, Freelance Software Engineer
Fault tolerant microservices
BSkyB
@chbatey
Who is this guy? 
● Enthusiastic nerd 
● Senior software engineer at BSkyB 
● Builds a lot of distributed applications 
● Apache Cassandra MVP
Agenda 
1. Setting the scene 
○ What do we mean by a fault? 
○ What is a microservice? 
○ Monolith application vs the micro(ish) service 
2. A worked example 
○ Identify an issue 
○ Reproduce/test it 
○ Show how to deal with the issue
So… what do applications look like? 
So... what do systems look like now?
But different things go wrong... 
[Diagram labels: down, slow network, slow app, 2 second max, GC :(, missing packets]
Fault tolerance 
1. Don’t take forever - Timeouts 
2. Don’t try if you can’t succeed 
3. Fail gracefully 
4. Know if it’s your fault 
5. Don’t whack a dead horse 
6. Turn broken stuff off 
Time for an example... 
● All examples are on GitHub
● Technologies used:
○ Dropwizard 
○ Spring Boot 
○ Wiremock 
○ Hystrix 
○ Graphite 
○ Saboteur
Example: Movie player service 
[Diagram: Play Movie requests reach several Shiny App instances, which call the User Service, Device Service and Pin Service]
Testing microservices 
You don’t know a service is fault tolerant if you don’t test faults
Isolated service tests 
[Diagram: the acceptance test primes mocks of the User, Device and Pin services and calls Play Movie on the Shiny App]
1 - Don’t take forever
● If at first you don’t succeed, don’t take forever to tell someone
● Time out and fail fast
Which timeouts? 
● Socket connection timeout 
● Socket read timeout 
Your service hung for 30 seconds :( 
[Image labels: Customer, You :(]
Which timeouts? 
● Socket connection timeout 
● Socket read timeout 
● Resource acquisition 
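As a rough illustration of the first two timeouts, the sketch below sets a connection timeout and a read timeout on the JDK's HttpURLConnection. The class name TimeoutConfig and any values passed in are assumptions made for illustration; they are not taken from the talk's example projects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper, not part of the talk's example projects. */
final class TimeoutConfig {
    private TimeoutConfig() {}

    /** Prepares (but does not yet connect) a connection with both socket timeouts set. */
    static HttpURLConnection open(String url, int connectTimeoutMillis, int readTimeoutMillis) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Maximum time to wait for the TCP connection to be established.
            connection.setConnectTimeout(connectTimeoutMillis);
            // Maximum time to block on a single read; the clock restarts for each read.
            connection.setReadTimeout(readTimeoutMillis);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the read timeout applies to each individual read, a server that trickles bytes can keep a request alive far longer than the configured value. Neither setting covers time spent waiting to acquire a resource such as a pooled connection or a worker thread.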
Your service hung for 10 minutes :( 
Let’s think about this
A little more detail
Wiremock + Saboteur + Vagrant
● Vagrant - launches + provisions local VMs
● Saboteur - uses tc, iptables to simulate network issues
● Wiremock - used to mock HTTP dependencies
● Cucumber - acceptance tests
I can write an automated test for that? 
[Diagram: a Vagrant + VirtualBox VM runs Wiremock (mocking the User, Device and Pin services) alongside Saboteur; the acceptance test primes Saboteur to drop traffic, calls the Play Movie service, then resets]
Implementing reliable timeouts
● Homemade: Worker Queue + Thread pool (executor)
● Hystrix
● Spring Cloud Netflix
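The homemade option can be sketched with nothing but java.util.concurrent: run the dependency call on a separate pool and wait for the result with a deadline. The class name TimeLimitedCaller, the pool size and the fallback parameter are assumptions made for illustration, and this is not how Hystrix is implemented internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a call on its own pool and gives up after a deadline. */
final class TimeLimitedCaller {
    // Daemon worker threads keep slow dependency calls off the request thread.
    // Note: newFixedThreadPool has an unbounded queue; bounding it is the
    // subject of "2 - Don't try if you can't succeed".
    private final ExecutorService pool = Executors.newFixedThreadPool(2, runnable -> {
        Thread thread = new Thread(runnable, "dependency-worker");
        thread.setDaemon(true);
        return thread;
    });

    /** Returns the task's result, or the fallback if it fails or misses the deadline. */
    <T> T call(Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(task);
        try {
            // The caller waits at most timeoutMillis, whatever the task is doing.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it can stop early
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        }
    }
}
```

The deadline here covers everything the task does, including time spent queued behind other tasks, which is what socket level timeouts cannot promise.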
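A simple Spring RestController
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Resource {
    private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

    @Autowired
    private ScaryDependency scaryDependency;

    @RequestMapping("/scary")
    public String callTheScaryDependency() {
        // Log on entry so the handling thread's name shows up in the output.
        LOGGER.info("RestController: I wonder which thread I am on!");
        return scaryDependency.getScaryString();
    }
}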
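Scary dependency
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        }
        try {
            Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return "Really slow scary string";
    }
}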
All on the Tomcat thread
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [http-nio-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
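Seriously this simple now?
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    @HystrixCommand // run this method as a Hystrix command, off the request thread
    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        }
        try {
            Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return "Really slow scary string";
    }
}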
What an annotation can do...
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
Timeouts take home
● You can’t use network level timeouts for SLAs
● Test your SLAs - if someone says you can’t, hit them with a stick
● Scary things happen without network issues
2 - Don’t try if you can’t succeed 
Complexity
● When an application grows in complexity it will eventually start sending emails
● Or rather: it will eventually contain queues and thread pools
Don’t try if you can’t succeed
● Executors with unbounded queues :(
○ newFixedThreadPool
○ newSingleThreadExecutor
○ newCachedThreadPool (unbounded threads rather than an unbounded queue)
● Bound your queues and threads
● Fail quickly when the queue / maxPoolSize is met
● Know your drivers
This is a functional requirement
● Set the timeout very high
● Use Wiremock to add a large delay to the requests
● Set queue size and thread pool size to 1
● Send in 2 requests to use the thread and fill the queue
● What happens on the 3rd request?
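The scenario above can be reproduced in miniature with a plain ThreadPoolExecutor: one thread, a queue of one, and a rejection policy that fails immediately. The class and method names below are assumptions for illustration; a real test would drive the service over HTTP with Wiremock supplying the delay.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a bounded pool that fails fast when saturated. */
final class BoundedPool {
    private BoundedPool() {}

    /** Fixed number of threads, bounded queue, immediate rejection when both are full. */
    static ThreadPoolExecutor create(int threads, int queueSize) {
        return new ThreadPoolExecutor(
                threads, threads,                      // core size == max size
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),   // bounded, unlike newFixedThreadPool
                new ThreadPoolExecutor.AbortPolicy()); // throw rather than block or queue
    }

    /** Returns true if the task was accepted, false if it was rejected straight away. */
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // caller can fail the request quickly instead of waiting
        }
    }
}
```

With create(1, 1), the first blocked task occupies the thread, the second fills the queue, and the third is rejected without waiting.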
3 - Fail gracefully 
Expect rubbish 
● Expect invalid HTTP 
● Expect malformed response bodies 
● Expect connection failures 
● Expect huge / tiny responses 
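One way to act on this is to parse responses strictly. The pin check later in this deck uses Boolean.valueOf, which quietly maps any unexpected body to false; the sketch below instead separates "false" from "rubbish". The class name PinCheckParser and the 16 character limit are assumptions for illustration.

```java
import java.util.Optional;

/** Hypothetical strict parser for a body that should be exactly "true" or "false". */
final class PinCheckParser {
    // Assumed upper bound; anything longer is treated as a bad response.
    private static final int MAX_BODY_CHARS = 16;

    private PinCheckParser() {}

    /** Returns the parsed value, or empty if the body is missing, oversized or malformed. */
    static Optional<Boolean> parse(String body) {
        if (body == null || body.length() > MAX_BODY_CHARS) {
            return Optional.empty();
        }
        String trimmed = body.trim();
        if (trimmed.equalsIgnoreCase("true")) {
            return Optional.of(Boolean.TRUE);
        }
        if (trimmed.equalsIgnoreCase("false")) {
            return Optional.of(Boolean.FALSE);
        }
        return Optional.empty(); // e.g. an HTML error page or an empty body
    }
}
```

An empty result lets the caller choose a deliberate fallback rather than treating a broken dependency as a failed pin check.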
Testing with Wiremock
// Java DSL (static imports from com.github.tomakehurst.wiremock.client.WireMock)
stubFor(get(urlEqualTo("/dependencyPath"))
        .willReturn(aResponse()
                .withFault(Fault.MALFORMED_RESPONSE_CHUNK)));

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "RANDOM_DATA_THEN_CLOSE"
    }
}

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "EMPTY_RESPONSE"
    }
}
4 - Know if it’s your fault 
What to record
● Metrics: Timings, errors, concurrent incoming requests, thread pool statistics, connection pool statistics
● Logging: Boundary logging, Elasticsearch / Logstash
● Request identifiers
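Boundary logging with a request identifier might look like the sketch below: every call out to a dependency produces one line carrying the identifier, the dependency name, the outcome and the elapsed time. The class name BoundaryLog, the line format and the Consumer used as a log sink are assumptions for illustration; a real service would hand the line to its logging framework.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper that logs one line per dependency call. */
final class BoundaryLog {
    private BoundaryLog() {}

    /** Runs the call, then logs the request id, dependency, outcome and timing. */
    static <T> T timed(String requestId, String dependency,
                       Supplier<T> call, Consumer<String> log) {
        long start = System.nanoTime();
        String outcome = "ok";
        try {
            return call.get();
        } catch (RuntimeException e) {
            outcome = "error";
            throw e; // logging must not swallow the failure
        } finally {
            long tookMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            log.accept(String.format("requestId=%s dependency=%s outcome=%s tookMs=%d",
                    requestId, dependency, outcome, tookMs));
        }
    }
}
```

Passing the same identifier to every downstream call makes it possible to line up one user request across the logs of several services.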
Graphite + Codahale 
Response times
Separate resource pools
● Don’t flood your dependencies
● Be able to answer the questions:
○ How many connections will you make to dependency X?
○ Are you getting close to your max connections?
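A minimal way to give each dependency its own budget is a semaphore per dependency. The sketch below caps concurrent calls and exposes how many are in flight, which answers both questions on the slide. The class name DependencyLimiter is an assumption for illustration; Hystrix achieves the same isolation with a thread pool or semaphore per command group.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical per-dependency limiter: one instance per downstream service. */
final class DependencyLimiter {
    private final int maxConcurrent;
    private final Semaphore permits;

    DependencyLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the call if a permit is free; otherwise returns the fallback immediately. */
    <T> T call(Supplier<T> dependencyCall, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // at the limit: do not queue, do not flood the dependency
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release();
        }
    }

    /** Number of calls currently in progress, suitable for publishing as a gauge. */
    int inFlight() {
        return maxConcurrent - permits.availablePermits();
    }
}
```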
So easy with Dropwizard + Hystrix
@Override
public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
    // Publish Hystrix metrics through Dropwizard's metric registry.
    HystrixCodaHaleMetricsPublisher metricsPublisher =
            new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
    HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
}

metrics:
  reporters:
    - type: graphite
      host: 192.168.10.120
      port: 2003
      prefix: shiny_app
5 - Don’t whack a dead horse 
[Diagram: the same Shiny App instances and User, Device and Pin services as in the movie player example]
What to do...
● Yes, this will happen...
● Mandatory dependency - fail *really* fast
● Throttling
● Fallbacks
Circuit breaker pattern 
Implementation with Hystrix
@GET
@Timed
public String integrate() {
    LOGGER.info("I best do some integration!");
    // Each dependency call is wrapped in its own Hystrix command.
    String user = new UserServiceDependency(userService).execute();
    String device = new DeviceServiceDependency(deviceService).execute();
    Boolean pinCheck = new PinCheckDependency(pinService).execute();
    return String.format("[User info: %s]\n[Device info: %s]\n[Pin check: %s]\n",
            user, device, pinCheck);
}
Implementation with Hystrix
public class PinCheckDependency extends HystrixCommand<Boolean> {
    private final HttpClient httpClient;

    // The constructor is not shown on the slide; this one, including the
    // group key name, is an assumption.
    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
        this.httpClient = httpClient;
    }

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }

    // Used when run() fails, times out, is rejected, or the circuit is open.
    @Override
    public Boolean getFallback() {
        return true;
    }
}
Triggering the fallback 
● Error threshold percentage 
● Bucket of time for the percentage 
● Minimum number of requests to trigger 
● Time before trying a request again 
● Disable 
● Per instance statistics 
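To make the first four settings concrete, here is a deliberately small circuit breaker written against the JDK only. It is a sketch under simplifying assumptions, not Hystrix's implementation: it uses a single fixed window instead of rolling buckets, and after the sleep period it lets requests through until one of them reports back. The class name and constructor parameters are hypothetical, and the clock is injected so the behaviour can be tested without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical, simplified circuit breaker; not the Hystrix implementation. */
final class SimpleCircuitBreaker {
    private final int errorThresholdPercent; // error percentage that opens the circuit
    private final int minimumRequests;       // requests needed before the percentage counts
    private final long windowMillis;         // bucket of time the percentage is measured over
    private final long sleepMillis;          // time to wait before trying a request again
    private final LongSupplier clock;        // current time in milliseconds

    private long windowStart;
    private int requests;
    private int failures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                         long windowMillis, long sleepMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.windowMillis = windowMillis;
        this.sleepMillis = sleepMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    synchronized boolean isOpen() {
        return open;
    }

    /** Runs the call unless the circuit is open; failures and short circuits return the fallback. */
    <T> T call(Supplier<T> dependencyCall, T fallback) {
        if (!allowRequest()) {
            return fallback; // short circuit: the dependency is not called at all
        }
        try {
            T result = dependencyCall.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback;
        }
    }

    private synchronized boolean allowRequest() {
        // While open, only allow a trial request once the sleep period has passed.
        return !open || clock.getAsLong() - openedAt >= sleepMillis;
    }

    private synchronized void record(boolean success) {
        long now = clock.getAsLong();
        if (open) {
            if (success) {
                open = false;      // trial request worked: close the circuit
                resetWindow(now);
            } else {
                openedAt = now;    // trial request failed: sleep again
            }
            return;
        }
        if (now - windowStart >= windowMillis) {
            resetWindow(now);
        }
        requests++;
        if (!success) {
            failures++;
        }
        if (requests >= minimumRequests
                && failures * 100 >= errorThresholdPercent * requests) {
            open = true;
            openedAt = now;
        }
    }

    private void resetWindow(long now) {
        windowStart = now;
        requests = 0;
        failures = 0;
    }
}
```

The minimum request count matters in practice: without it, a single failure on a quiet service would be a 100% error rate and would open the circuit.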
6 - Turn off broken stuff 
● The kill switch 
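At its simplest, a kill switch is a flag checked before a feature runs, flipped by an operator at runtime. The sketch below is an assumption about what such a switch could look like; the class name and the way it would be exposed (admin endpoint, config reload) are not described in the talk.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical kill switch guarding one feature or dependency. */
final class KillSwitch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    void turnOff() {
        enabled.set(false);
    }

    void turnOn() {
        enabled.set(true);
    }

    /** Runs the feature only while the switch is on; otherwise returns the given value. */
    <T> T call(Supplier<T> feature, T whenDisabled) {
        return enabled.get() ? feature.get() : whenDisabled;
    }
}
```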
To recap 
1. Don’t take forever - Timeouts 
2. Don’t try if you can’t succeed 
3. Fail gracefully 
4. Know if it’s your fault 
5. Don’t whack a dead horse 
6. Turn broken stuff off 
Links 
● Examples: 
○ https://github.com/chbatey/spring-cloud-example 
○ https://github.com/chbatey/dropwizard-hystrix 
○ https://github.com/chbatey/vagrant-wiremock-saboteur 
● Tech: 
○ https://github.com/Netflix/Hystrix 
○ https://www.vagrantup.com/ 
○ http://wiremock.org/ 
○ https://github.com/tomakehurst/saboteur
Questions? 
● Thanks for listening! 
● http://christopher-batey.blogspot.co.uk/ 
Developer takeaways
● Learn about TCP
● Love Vagrant, Docker etc. to enable testing
● Don’t trust libraries
Hystrix cost - do this yourself 
Hystrix metrics
● Failure count
● Percentiles from Hystrix’s point of view
● Error percentages
How to test metric publishing?
● Stub out Graphite and verify calls?
● Programmatically call Graphite and verify numbers?
● Make metrics + logs part of the story demo
Fault tolerant microservices - LJC Skills Matter 4thNov2014

  • 2. @chbatey Who is this guy? ● Enthusiastic nerd ● Senior software engineer at BSkyB ● Builds a lot of distributed applications ● Apache Cassandra MVP
  • 3. @chbatey Agenda 1. Setting the scene ○ What do we mean by a fault? ○ What is a microservice? ○ Monolith application vs the micro(ish) service 2. A worked example ○ Identify an issue ○ Reproduce/test it ○ Show how to deal with the issue
  • 4. So… what do applications look like? @chbatey
  • 5. So... what do systems look like now? @chbatey
  • 6. But different things go wrong... @chbatey down slow network slow app 2 second max GC :( missing packets
  • 7. Fault tolerance 1. Don’t take forever - Timeouts 2. Don’t try if you can’t succeed 3. Fail gracefully 4. Know if it’s your fault 5. Don’t whack a dead horse 6. Turn broken stuff off @chbatey
  • 8. Time for an example... ● All examples are on github ● Technologies used: @chbatey ○ Dropwizard ○ Spring Boot ○ Wiremock ○ Hystrix ○ Graphite ○ Saboteur
  • 9. Example: Movie player service @chbatey Shiny App User Service Device Service Pin Service Shiny App Shiny App Shiny App User Service User Service Device Service Play Movie
  • 10. Testing microservices You don’t know a service is fault tolerant if you don’t test faults @chbatey
  • 11. Isolated service tests Shiny App @chbatey Mocks User Device Pin service Acceptance Play Movie Test Prime
  • 12. 1 - Don’t take forever @chbatey ● If at first you don’t succeed, don’t take forever to tell someone ● Timeout and fail fast
  • 13. Which timeouts? ● Socket connection timeout ● Socket read timeout @chbatey
  • 14. Your service hung for 30 seconds :( @chbatey Customer You :(
  • 15. Which timeouts? ● Socket connection timeout ● Socket read timeout ● Resource acquisition @chbatey
  • 16. Your service hung for 10 minutes :( @chbatey
  • 17. Let’s think about this @chbatey
  • 18. A little more detail @chbatey
  • 19. Wiremock + Saboteur + Vagrant ● Vagrant - launches + provisions local VMs ● Saboteur - uses tc, iptables to simulate network issues ● Wiremock - used to mock HTTP dependencies ● Cucumber - acceptance tests @chbatey
  • 20. I can write an automated test for that? @chbatey Vagrant + Virtual box VM Wiremock User Service Device Service Pin Service Saboteur Play Movie Service Acceptance Test prime to drop traffic reset
  • 21. Implementing reliable timeouts ● Homemade: Worker Queue + Thread pool (executor) @chbatey
  • 22. Implementing reliable timeouts ● Homemade: Worker Queue + Thread pool (executor) ● Hystrix @chbatey
  • 23. Implementing reliable timeouts ● Homemade: Worker Queue + Thread pool (executor) ● Hystrix ● Spring Cloud Netflix @chbatey
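The "homemade" option on these slides can be sketched with the JDK alone: hand the call to a worker thread and wait on the `Future` for a bounded time. This is a minimal sketch rather than the talk's implementation; `TimeoutSketch`, `callWithTimeout` and `demo` are hypothetical names, and the sleeping lambda stands in for the slow dependency.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {
    // Hypothetical helper: runs the call on a worker thread and waits at most timeoutMillis.
    static String callWithTimeout(ExecutorService pool, Callable<String> call,
                                  long timeoutMillis, String onTimeout) {
        Future<String> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it stops holding the thread
            return onTimeout;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return onTimeout;
        } catch (ExecutionException e) {
            throw new IllegalStateException("dependency call failed", e.getCause());
        }
    }

    // Demo only: a dependency that sleeps for sleepMillis, called with the given timeout.
    // newFixedThreadPool has an unbounded queue, which is fine for a demo but not for production.
    static String demo(long sleepMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return callWithTimeout(pool, () -> {
                Thread.sleep(sleepMillis);
                return "Scary String";
            }, timeoutMillis, "timed out");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The caller's wait is bounded regardless of what the socket layer does, which is the property network-level timeouts alone cannot give.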
  • 24. A simple Spring RestController @chbatey @RestController public class Resource { private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class); @Autowired private ScaryDependency scaryDependency; @RequestMapping("/scary") public String callTheScaryDependency() { LOGGER.info("RestContoller: I wonder which thread I am on!"); return scaryDependency.getScaryString(); } }
  • 25. Scary dependency @chbatey @Component public class ScaryDependency { private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class); public String getScaryString() { LOGGER.info("Scary dependency: I wonder which thread I am on!"); if (System.currentTimeMillis() % 2 == 0) { return "Scary String"; } else { try { Thread.sleep(10000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } return "Really slow scary string"; } } }
  • 26. All on the tomcat thread 13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestContoller: I wonder which thread I am on! 13:07:32.896 [http-nio-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on! @chbatey
  • 27. Seriously this simple now? @chbatey @Component public class ScaryDependency { private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class); @HystrixCommand public String getScaryString() { LOGGER.info("Scary dependency: I wonder which thread I am on!"); if (System.currentTimeMillis() % 2 == 0) { return "Scary String"; } else { try { Thread.sleep(10000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } return "Really slow scary string"; } } }
  • 28. What an annotation can do... 13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on! 13:07:32.896 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! @chbatey
  • 29. Timeouts take home ● You can’t use network level timeouts for SLAs ● Test your SLAs - if someone says you can’t, hit them with a stick ● Scary things happen without network issues @chbatey
  • 30. 2 - Don’t try if you can’t succeed @chbatey
  • 31. Complexity ● When an application grows in complexity it will eventually start sending emails @chbatey
  • 32. Complexity ● When an application grows in complexity it will eventually start sending emails contain queues and thread pools @chbatey
  • 33. Don’t try if you can’t succeed ● Executor Unbounded queues :( ○ newFixedThreadPool ○ newSingleThreadExecutor ○ newCachedThreadPool ● Bound your queues and threads ● Fail quickly when the queue / maxPoolSize is met ● Know your drivers @chbatey
  • 34. This is a functional requirement ● Set the timeout very high ● Use wiremock to add a large delay to the requests ● Set queue size and thread pool size to 1 ● Send in 2 requests to use the thread and fill the queue ● What happens on the 3rd request? @chbatey
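The experiment on slide 34 can be reproduced with a plain `ThreadPoolExecutor`. The sketch below assumes one thread, a queue of one and the JDK's `AbortPolicy`; `BoundedPoolSketch` and `countRejected` are hypothetical names, and a latch stands in for the slow dependency. The first request occupies the thread, the second fills the queue, and every further request is rejected immediately instead of waiting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolSketch {
    // Submits `requests` blocking tasks to a pool with 1 thread and a queue of 1
    // and returns how many were rejected straight away.
    static int countRejected(int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),            // bounded queue
                new ThreadPoolExecutor.AbortPolicy());  // reject instead of queueing forever
        CountDownLatch release = new CountDownLatch(1); // keeps every task "slow" until released
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // fail quickly: the caller finds out at once
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return rejected;
    }
}
```

With three requests, `countRejected(3)` returns 1: the third request is the one that fails fast.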
  • 35. 3 - Fail gracefully @chbatey
  • 36. Expect rubbish ● Expect invalid HTTP ● Expect malformed response bodies ● Expect connection failures ● Expect huge / tiny responses @chbatey
  • 37. Testing with Wiremock @chbatey stubFor(get(urlEqualTo("/dependencyPath")) .willReturn(aResponse() .withFault(Fault.MALFORMED_RESPONSE_CHUNK))); { "request": { "method": "GET", "url": "/fault" }, "response": { "fault": "RANDOM_DATA_THEN_CLOSE" } } { "request": { "method": "GET", "url": "/fault" }, "response": { "fault": "EMPTY_RESPONSE" } }
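On the consuming side, "expect rubbish" can mean refusing to interpret anything that is not exactly what was promised. The sketch below is an illustration under assumptions, not code from the talk: `PinResponseParser` and the 16-character cap are hypothetical. It differs from `Boolean.valueOf`, which returns `false` for any text other than "true" and so cannot tell a failed pin check from a garbage response.

```java
import java.util.Optional;

final class PinResponseParser {
    // Assumed upper bound for a body that should only ever be "true" or "false".
    static final int MAX_BODY_CHARS = 16;

    // Returns the parsed value, or empty when the body is missing, oversized or not a boolean.
    static Optional<Boolean> parse(String body) {
        if (body == null || body.length() > MAX_BODY_CHARS) {
            return Optional.empty();
        }
        String trimmed = body.trim();
        if (trimmed.equalsIgnoreCase("true")) {
            return Optional.of(Boolean.TRUE);
        }
        if (trimmed.equalsIgnoreCase("false")) {
            return Optional.of(Boolean.FALSE);
        }
        return Optional.empty(); // rubbish: let the caller decide how to fail gracefully
    }
}
```

An empty result can then be routed to the same path as a connection failure rather than being treated as a valid answer.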
  • 38. 4 - Know if it’s your fault @chbatey
  • 39. What to record ● Metrics: Timings, errors, concurrent incoming requests, thread pool statistics, connection pool statistics ● Logging: Boundary logging, elasticsearch / logstash ● Request identifiers @chbatey
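As a rough illustration of the list above, the sketch below wraps a dependency call and records a call count, an error count, total time and the number of calls in flight, and tags the failure log line with a request identifier. `CallRecorder` and its fields are hypothetical; a real service would more likely use a metrics library, as the Dropwizard and Hystrix slides do.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class CallRecorder {
    final LongAdder calls = new LongAdder();
    final LongAdder errors = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
    final AtomicInteger inFlight = new AtomicInteger();

    // Runs the call, recording timing and outcome; rethrows failures after counting them.
    <T> T record(String requestId, Supplier<T> call) {
        inFlight.incrementAndGet();
        long start = System.nanoTime();
        try {
            return call.get();
        } catch (RuntimeException e) {
            errors.increment();
            // Boundary logging: the request id lets this line be correlated across services.
            System.err.println("requestId=" + requestId + " failed: " + e);
            throw e;
        } finally {
            totalNanos.add(System.nanoTime() - start);
            calls.increment();
            inFlight.decrementAndGet();
        }
    }
}
```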
  • 42. Separate resource pools ● Don’t flood your dependencies ● Be able to answer the questions: ○ How many connections will you make to dependency X? ○ Are you getting close to your max connections? @chbatey
  • 43. So easy with Dropwizard + Hystrix @chbatey @Override public void initialize(Bootstrap<AppConfig> appConfigBootstrap) { HystrixCodaHaleMetricsPublisher metricsPublisher = new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry()); HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher); } metrics: reporters: - type: graphite host: 192.168.10.120 port: 2003 prefix: shiny_app
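One way to make both questions answerable is to give each dependency its own fixed budget of concurrent calls. The sketch below uses a `Semaphore` per dependency; `DependencyBulkhead`, `call` and `inFlight` are hypothetical names, and Hystrix's per-command thread pools achieve a similar isolation by different means.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class DependencyBulkhead {
    private final int maxConcurrent;
    private final Semaphore permits;

    DependencyBulkhead(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call if a permit is free; otherwise returns whenFull without touching the dependency.
    <T> T call(Supplier<T> dependencyCall, T whenFull) {
        if (!permits.tryAcquire()) {
            return whenFull;
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release();
        }
    }

    // "Are you getting close to your max connections?"
    int inFlight() {
        return maxConcurrent - permits.availablePermits();
    }

    // "How many connections will you make to dependency X?"
    int limit() {
        return maxConcurrent;
    }
}
```

With one bulkhead per dependency, a slow Pin Service can exhaust only its own permits and leaves the User and Device Service budgets untouched.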
  • 44. 5 - Don’t whack a dead horse @chbatey Shiny App User Service Device Service Pin Service Shiny App Shiny App Shiny App User Service User Service Device Service Play Movie
  • 45. What to do.. ● Yes this will happen.. ● Mandatory dependency - fail *really* fast ● Throttling ● Fallbacks @chbatey
  • 47. Implementation with Hystrix @chbatey @GET @Timed public String integrate() { LOGGER.info("I best do some integration!"); String user = new UserServiceDependency(userService).execute(); String device = new DeviceServiceDependency(deviceService).execute(); Boolean pinCheck = new PinCheckDependency(pinService).execute(); return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n", user, device, pinCheck); }
  • 48. Implementation with Hystrix @chbatey public class PinCheckDependency extends HystrixCommand<Boolean> { @Override protected Boolean run() throws Exception { HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck"); HttpResponse pinCheckResponse = httpClient.execute(pinCheck); String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity()); return Boolean.valueOf(pinCheckInfo); } }
  • 49. Implementation with Hystrix @chbatey public class PinCheckDependency extends HystrixCommand<Boolean> { @Override protected Boolean run() throws Exception { HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck"); HttpResponse pinCheckResponse = httpClient.execute(pinCheck); String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity()); return Boolean.valueOf(pinCheckInfo); } @Override public Boolean getFallback() { return true; } }
  • 50. Triggering the fallback ● Error threshold percentage ● Bucket of time for the percentage ● Minimum number of requests to trigger ● Time before trying a request again ● Disable ● Per instance statistics @chbatey
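The first four knobs on this slide map onto a small state machine. The sketch below is a simplified illustration and not Hystrix's implementation: it uses a single statistics bucket instead of rolling buckets, serialises calls with `synchronized`, and all names (`CircuitBreakerSketch`, `bucketMillis`, `sleepMillis`) are hypothetical. The clock is injected so the behaviour can be tested without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class CircuitBreakerSketch {
    private final int errorThresholdPercent; // error threshold percentage
    private final int minimumRequests;       // minimum number of requests to trigger
    private final long bucketMillis;         // bucket of time for the percentage
    private final long sleepMillis;          // time before trying a request again
    private final LongSupplier clock;        // milliseconds, injected for testing

    private long bucketStart;
    private int total;
    private int failures;
    private boolean open;
    private long openedAt;

    CircuitBreakerSketch(int errorThresholdPercent, int minimumRequests,
                         long bucketMillis, long sleepMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.bucketMillis = bucketMillis;
        this.sleepMillis = sleepMillis;
        this.clock = clock;
        this.bucketStart = clock.getAsLong();
    }

    synchronized boolean isOpen() {
        return open;
    }

    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (open && now - openedAt < sleepMillis) {
            return fallback.get(); // circuit open: do not whack the dead horse
        }
        if (now - bucketStart >= bucketMillis) { // start a fresh statistics bucket
            bucketStart = now;
            total = 0;
            failures = 0;
        }
        try {
            T result = primary.get(); // normal call, or the single trial after the sleep window
            if (open) {               // trial succeeded: close and forget old statistics
                open = false;
                total = 0;
                failures = 0;
                bucketStart = now;
            }
            total++;
            return result;
        } catch (RuntimeException e) {
            total++;
            failures++;
            boolean overThreshold = total >= minimumRequests
                    && failures * 100 >= errorThresholdPercent * total;
            if (open || overThreshold) { // trip, or re-open after a failed trial
                open = true;
                openedAt = now;
            }
            return fallback.get();
        }
    }
}
```

While the circuit is open the primary call is never attempted, so a struggling dependency gets time to recover and callers receive the fallback immediately.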
  • 51. 6 - Turn off broken stuff ● The kill switch @chbatey
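A kill switch can be as small as a flag checked before each call to the feature. The sketch below assumes an in-process `AtomicBoolean`; `KillSwitch` and its methods are hypothetical names, and how the flag gets flipped at runtime (admin endpoint, config reload) is left open.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

final class KillSwitch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    void turnOff() {
        enabled.set(false);
    }

    void turnOn() {
        enabled.set(true);
    }

    // Runs the feature only while the switch is on; otherwise returns whenOff without calling it.
    <T> T call(Supplier<T> feature, T whenOff) {
        return enabled.get() ? feature.get() : whenOff;
    }
}
```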
  • 52. To recap 1. Don’t take forever - Timeouts 2. Don’t try if you can’t succeed 3. Fail gracefully 4. Know if it’s your fault 5. Don’t whack a dead horse 6. Turn broken stuff off @chbatey
  • 53. @chbatey Links ● Examples: ○ https://github.com/chbatey/spring-cloud-example ○ https://github.com/chbatey/dropwizard-hystrix ○ https://github.com/chbatey/vagrant-wiremock-saboteur ● Tech: ○ https://github.com/Netflix/Hystrix ○ https://www.vagrantup.com/ ○ http://wiremock.org/ ○ https://github.com/tomakehurst/saboteur
  • 54. Questions? ● Thanks for listening! ● http://christopher-batey.blogspot.co.uk/ @chbatey
  • 55. Developer takeaways ● Learn about TCP ● Love vagrant, docker etc to enable testing ● Don’t trust libraries @chbatey
  • 56. Hystrix cost - do this yourself @chbatey
  • 57. Hystrix metrics ● Failure count ● Percentiles from Hystrix point of view ● Error percentages @chbatey
  • 58. How to test metric publishing? ● Stub out graphite and verify calls? ● Programmatically call graphite and verify numbers? ● Make metrics + logs part of the story demo @chbatey
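The first option, stubbing out Graphite, can be sketched with a throwaway TCP listener. Graphite's plaintext protocol is one line per metric in the form `<path> <value> <timestamp>`, and port 2003 in the Dropwizard config is its conventional port. Everything else here is an assumption for illustration: `GraphiteStubSketch`, the hand-rolled `publish` method (a real service would use its metrics reporter) and the metric name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class GraphiteStubSketch {
    // Sends one metric in Graphite's plaintext format: "<path> <value> <epochSeconds>\n".
    static void publish(String host, int port, String path, long value, long epochSeconds)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(path + " " + value + " " + epochSeconds + "\n");
        }
    }

    // Starts a stub listener on a free local port, publishes one metric to it
    // and returns the line the stub received.
    static String publishToStub(String path, long value, long epochSeconds) {
        try (ServerSocket stub = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            stub.setSoTimeout(2000); // never hang the test if nothing connects
            publish(stub.getInetAddress().getHostAddress(), stub.getLocalPort(),
                    path, value, epochSeconds);
            try (Socket accepted = stub.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(accepted.getInputStream(), StandardCharsets.UTF_8))) {
                accepted.setSoTimeout(2000);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In an acceptance test the service under test would be pointed at the stub's port, and the assertion would be made on the lines it receives.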