DevNexus 2020 "Break me if you can: practical guide to building fault-tolerant systems" slides
1.
Break Me If You Can
Practical Guide to Building Fault-tolerant Systems
DevNexus, Atlanta, GA
February 20, 2020
Alex Borysov, Software Engineer @ Netflix
Mykyta Protsenko, Software Engineer @ Netflix
2.
Who are we?
Alex Borysov
Software Engineer @Netflix
Mykyta Protsenko
Software Engineer @Netflix
@aiborisov
@mykyta_p
@WeAreNetflix
5.
Fault
incorrect internal state
Picture by Bob McMillan. Public domain. See slides ##178-181 for details
6.
Error
visibly incorrect behaviour
Picture by David Goehring. CC BY 2.0. See slides ##178-181 for details
7.
Failure
main functionality is broken
Picture by Camerafiend. CC BY-SA 3.0. See slides ##178-181 for details
8.
RMS Titanic vs Miracle on the Hudson
Willy Stöwer. Public domain. See slides ##178-181 for details
By Greg Lam Pak Ng. CC BY 2.0. See slides ##178-181 for details
9.
RMS Titanic
Fault: Hitting an iceberg
Error: Water in the hull
Failure: Sinking
Willy Stöwer. Public domain. See slides ##178-181 for details
10.
Miracle on the Hudson
Fault: Hitting geese at 859 m
Error: Engines shut down
No Failure!
By Greg Lam Pak Ng. CC BY 2.0. See slides ##178-181 for details
13.
Fault Tolerance
Code and Design Patterns
Product-Driven Decisions
Communication
By Greg Lam Pak Ng. CC BY 2.0. See slides ##178-181 for details
15.
Dodging Geese Architecture
Diagram: API Gateway, Geese Service, Clouds Service, Leaderboard Service (TOP-5)
See slides ##178-181 for licensing details
25.
gRPC Service Definitions
service GeeseService {
  // Return next line of geese.
  rpc GetGeese (GetGeeseRequest) returns (GeeseResponse);
}
service CloudsService {
  // Return next line of clouds.
  rpc GetClouds (GetCloudsRequest) returns (CloudsResponse);
}
27.
gRPC Gateway Service
service FixtureService {
  // Return next line of geese and clouds.
  rpc GetFixture (GetFixtureRequest) returns (FixtureResponse);
}
Geese + Clouds = Fixture
29.
Gateway Fixture Service
Diagram: API Gateway, Geese Service, Clouds Service
30.
Non-Blocking Calls
Don’t block
Send requests in parallel
Combine results when ready
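The three rules above (don't block, send requests in parallel, combine results when ready) can be sketched with the JDK's own `CompletableFuture` (Java 9+), standing in for the gRPC future stubs and Guava `ListenableFuture`s used on the following slides. The `slowCall` helper and its fixed 200 ms delays are hypothetical placeholders for remote calls.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingCalls {

    // Hypothetical stand-in for a remote call: sleeps, then returns a value.
    static String slowCall(String value, long delayMillis) {
        try {
            TimeUnit.MILLISECONDS.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return value;
    }

    // Sends both requests in parallel and combines the results once both are ready.
    // The caller gets a future back immediately; nothing here waits for the answers.
    static CompletableFuture<List<String>> fetchFixture(ExecutorService pool) {
        CompletableFuture<String> geese =
                CompletableFuture.supplyAsync(() -> slowCall("geese", 200), pool);
        CompletableFuture<String> clouds =
                CompletableFuture.supplyAsync(() -> slowCall("clouds", 200), pool);
        return geese.thenCombine(clouds, (g, c) -> List.of(g, c));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            long start = System.nanoTime();
            // join() is used only at the edge of this demo, to print the result.
            List<String> fixture = fetchFixture(pool).join();
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(fixture + " in ~" + elapsedMillis + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the two calls overlap, total latency is roughly that of the slower call rather than the sum of both.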
32.
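Gateway Service Implementation
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.protobuf.GeneratedMessageV3;
import io.grpc.stub.StreamObserver;
import java.util.List;

public class FixtureService extends FixtureServiceImplBase {
    // Future (non-blocking) stubs for the downstream services.
    private final GeeseServiceFutureStub geeseClient = ...;
    private final CloudsServiceFutureStub cloudsClient = ...;
    @Override
    public void getFixture(GetFixtureRequest request, StreamObserver<FixtureResponse> response) {
        // Both calls are sent in parallel; neither blocks the calling thread.
        ListenableFuture<GeeseResponse> geese = geeseClient.getGeese(toGeese(request));
        ListenableFuture<CloudsResponse> clouds = cloudsClient.getClouds(toClouds(request));
        // Succeeds when both responses arrive; fails if either call fails.
        ListenableFuture<List<GeneratedMessageV3>> geeseAndClouds =
                Futures.allAsList(geese, clouds);
        ...
    }
}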
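The rest of `getFixture` is elided on the slide. One property worth keeping in mind: Guava's `Futures.allAsList` returns a failed future if any of its inputs fails, so the gateway must report that failure to the caller rather than leave the call hanging. The sketch below reproduces the pattern with plain JDK futures (Java 9+); `ResponseObserver` is a hypothetical minimal stand-in for gRPC's `StreamObserver` (same `onNext`/`onError`/`onCompleted` names), and a `String` stands in for `FixtureResponse`.

```java
import java.util.concurrent.CompletableFuture;

public class CombineOrFail {

    // Hypothetical minimal stand-in for io.grpc.stub.StreamObserver.
    interface ResponseObserver<T> {
        void onNext(T value);
        void onError(Throwable t);
        void onCompleted();
    }

    // Combines two results; if either future fails, the combined one fails too,
    // and the error is sent to the observer instead of being silently dropped.
    static void respond(CompletableFuture<String> geese,
                        CompletableFuture<String> clouds,
                        ResponseObserver<String> response) {
        geese.thenCombine(clouds, (g, c) -> g + " + " + c)
             .whenComplete((fixture, error) -> {
                 if (error != null) {
                     response.onError(error);
                 } else {
                     response.onNext(fixture);
                     response.onCompleted();
                 }
             });
    }

    // Records what the observer received, for the demo below.
    static class RecordingObserver implements ResponseObserver<String> {
        final StringBuilder log = new StringBuilder();
        public void onNext(String value) { log.append("next:").append(value).append(';'); }
        public void onError(Throwable t) { log.append("error;"); }
        public void onCompleted() { log.append("completed;"); }
    }

    public static void main(String[] args) {
        RecordingObserver ok = new RecordingObserver();
        respond(CompletableFuture.completedFuture("geese"),
                CompletableFuture.completedFuture("clouds"), ok);
        System.out.println(ok.log); // next:geese + clouds;completed;

        RecordingObserver failed = new RecordingObserver();
        respond(CompletableFuture.completedFuture("geese"),
                CompletableFuture.failedFuture(new RuntimeException("clouds down")), failed);
        System.out.println(failed.log); // error;
    }
}
```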
87.
Deadlines
Diagram: API Gateway
See slides ##178-181 for licensing details
88.
Deadlines
Diagram: a call arrives at the API Gateway with a 500 ms deadline
89.
Deadlines
Diagram: a call arrives at the API Gateway with a 500 ms deadline; the gateway spends 300 ms, then calls the next service
90.
Deadlines
Diagram: 500 ms deadline at the API Gateway; 300 ms spent in the gateway, then 250 ms spent in the downstream service. X: 300 + 250 = 550 ms, past the 500 ms deadline
91.
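Deadlines
Diagram: 500 ms deadline at the API Gateway; 300 ms spent in the gateway, then 250 ms spent in the downstream service. X: 550 ms is past the deadline, yet the downstream response still travels back (→) after the caller has given up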
92.
Deadline Propagation
Diagram: a call arrives at the API Gateway with a 500 ms deadline
93.
Deadline Propagation
Diagram: 500 ms deadline at the API Gateway; after spending 300 ms, the gateway calls the downstream service with the remaining 200 ms as its deadline
94.
Deadline Propagation
Diagram: 500 ms deadline at the API Gateway; 300 ms spent, downstream call made with a 200 ms deadline, downstream spends 250 ms; X
95.
Deadline Propagation
Diagram: 500 ms deadline at the API Gateway; 300 ms spent, downstream deadline 200 ms, 250 ms spent downstream, leaving 200 - 250 = -50 ms. X: the downstream service knows the deadline has already expired
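The arithmetic on the Deadline Propagation slides (500 ms budget, 300 ms spent in the gateway, 200 ms left for the downstream call, which then takes 250 ms and overruns by 50 ms) can be sketched as a small deadline object that is passed along instead of giving every hop its own fixed timeout. gRPC Java models this with `io.grpc.Deadline` and stubs' `withDeadlineAfter`; the `Budget` class and `callDownstream` method below are hypothetical illustrations that take elapsed times as plain numbers so the results are deterministic.

```java
public class DeadlinePropagation {

    // Hypothetical illustration: an absolute deadline on a simulated clock (ms).
    static final class Budget {
        private final long deadlineAtMillis;

        Budget(long nowMillis, long timeoutMillis) {
            this.deadlineAtMillis = nowMillis + timeoutMillis;
        }

        // Remaining time at the given moment; negative means the deadline has passed.
        long remainingMillis(long nowMillis) {
            return deadlineAtMillis - nowMillis;
        }

        boolean isExpired(long nowMillis) {
            return remainingMillis(nowMillis) <= 0;
        }
    }

    // Hypothetical downstream call that honours the propagated deadline:
    // it refuses to start when no time is left and reports whether it fits.
    static boolean callDownstream(Budget budget, long startMillis, long costMillis) {
        if (budget.isExpired(startMillis)) {
            return false; // fail fast: nobody is waiting for this answer any more
        }
        return costMillis <= budget.remainingMillis(startMillis);
    }

    public static void main(String[] args) {
        long now = 0;                          // call arrives at the gateway
        Budget budget = new Budget(now, 500);  // 500 ms deadline

        now += 300;                            // gateway spends 300 ms
        System.out.println("downstream deadline: " + budget.remainingMillis(now) + " ms"); // 200 ms
        System.out.println("fits in time: " + callDownstream(budget, now, 250));          // false

        now += 250;                            // downstream spends 250 ms
        System.out.println("remaining: " + budget.remainingMillis(now) + " ms");          // -50 ms
        System.out.println("expired: " + budget.isExpired(now));                          // true
    }
}
```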
106.
Monitoring
APM
Service metrics
Distributed tracing
Business metrics
Picture by Alex Borysov. CC BY 2.0. See slides ##178-181 for details
127.
Per-instance limits
128.
Door Capacity
Why didn’t Rose make room for Jack on the door?
Willy Stöwer. Public domain. See slides ##178-181 for details
133.
Door Capacity
Why didn’t Rose make room for Jack on the door?
“The answer is very simple because it says on page 147 that Jack dies.”
James Cameron
Willy Stöwer. Public domain. See slides ##178-181 for details
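Like the door, a service instance can only carry so much load. One simple way to enforce a per-instance limit, assumed here rather than taken from the slides, is a semaphore that rejects work immediately once the limit is reached instead of queueing it. `ConcurrencyLimiter` and `maxConcurrent` are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical per-instance concurrency limiter (illustrative sketch only).
public class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the work if a permit is free; otherwise sheds the request at once
    // (tryAcquire never blocks) and returns the "rejected" value instead.
    <T> T call(Supplier<T> work, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected;
        }
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(1);
        // A nested call while the single permit is held gets rejected.
        String result = limiter.call(() -> limiter.call(() -> "inner", "rejected"), "outer rejected");
        System.out.println(result);              // rejected
        System.out.println(limiter.available()); // 1
    }
}
```

Rejecting early keeps the instance responsive for the requests it has already admitted, rather than letting every request slow down together.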
147.
Deployments can be risky
148.
Exploding Whale
Engineering solution
Half a ton of dynamite
Illustration by Greg Williams. CC BY 3.0. See slides ##178-181
149.
Exploding Whale
Engineering solution
Half a ton of dynamite
Oops! Unlimited blast radius
Learn more at TheExplodingWhale.com
Illustration by Greg Williams. CC BY 3.0. See slides ##178-181
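The lesson of the whale for deployments is to limit the blast radius. One common technique, assumed here since these slides do not name one, is a canary: send a small, stable slice of traffic to the new version first. The sketch hashes a request key into 100 buckets with CRC32; `CanaryRouter` and `canaryPercent` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical canary router (illustrative sketch only).
public class CanaryRouter {
    private final int canaryPercent; // share of traffic for the new version, 0..100

    CanaryRouter(int canaryPercent) {
        if (canaryPercent < 0 || canaryPercent > 100) {
            throw new IllegalArgumentException("canaryPercent must be in [0, 100]");
        }
        this.canaryPercent = canaryPercent;
    }

    // Maps a key to a bucket in [0, 100) with a stable hash, so the same key
    // always lands on the same version for the duration of the canary.
    static int bucket(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    boolean routeToCanary(String key) {
        return bucket(key) < canaryPercent;
    }

    public static void main(String[] args) {
        CanaryRouter router = new CanaryRouter(5);
        int canary = 0;
        int total = 10_000;
        for (int i = 0; i < total; i++) {
            if (router.routeToCanary("user-" + i)) {
                canary++;
            }
        }
        // Expect roughly 5% of keys on the new version; a bad release hurts only them.
        System.out.println(canary + " of " + total + " requests routed to the canary");
    }
}
```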
177.
AMA @ the Netflix Booth (#13)
Thursday, Feb 20th
12:50 PM ask Mykyta Protsenko & Alex Borysov anything about fault tolerance
3:10 PM ask Nadav Cohen anything about developer tooling
Friday, Feb 21st
10:00 AM ask Philip Fisher-Ogden anything about Netflix’s streaming architecture
12:50 PM ask Sangeeta Narayanan anything about cloud native services
3:10 PM ask Vinod Viswanathan anything about media processing
178.
Images and Licensing
Images of geese, clouds, pilots, a plane, arrows, a cup, and an airport traffic control tower are the property of Mykyta Protsenko and Alex Borysov unless stated otherwise (see below). All Rights Reserved.
Other images used:
commons.wikimedia.org/wiki/File:FEMA_-_16381_-_Photograph_by_Bob_McMillan_taken_on_09-28-2005_in_Texas.jpg
- Picture by Bob McMillan, a US federal government work, public domain
www.flickr.com/photos/carbonnyc/3290528875
- Picture by David Goehring. Attribution 2.0 Generic (CC BY 2.0): creativecommons.org/licenses/by/2.0
- changes were made
www.flickr.com/photos/carbonnyc/3290528875
- Picture by Camerafiend. Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0): creativecommons.org/licenses/by-sa/3.0/deed.en
- no changes were made
commons.wikimedia.org/wiki/File:Titanic_sinking,_painting_by_Willy_St%C3%B6wer.jpg
- Willy Stöwer. Public domain work of art
179.
Images and Licensing
www.flickr.com/photos/22608787@N00/3200086900
- Picture by Greg Lam Pak Ng. Attribution 2.0 Generic (CC BY 2.0): creativecommons.org/licenses/by/2.0
- no changes were made
piq.codeus.net/picture/31994/Blue-Game-Boy-Color
- Blue Game Boy Color by kure
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- changes were made
piq.codeus.net/picture/191706/The-Sun
- The Sun by Vinicius615
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- changes were made
Slide #106:
- Picture by Alex Borysov. Attribution 2.0 Generic (CC BY 2.0): creativecommons.org/licenses/by/2.0
180.
Images and Licensing
piq.codeus.net/picture/254492/CVsantahat
- Santa hat for CommanderVideo, CVsantahat by anonymous
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- no changes were made
piq.codeus.net/picture/423109/UFO
- UFO by anonymous
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- no changes were made
piq.codeus.net/picture/334023/beer
- beer by Investa
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- changes were made
181.
Images and Licensing
piq.codeus.net/picture/444498/Beer-Bottle
- Beer Bottle by jacklrj
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- changes were made
https://piq.codeus.net/picture/330338/Deal-With-It
- Deal With It by Shiro
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- changes were made
https://commons.wikimedia.org/wiki/File:Whale_WikiWorld.png
- Cartoon illustration has been created by Greg Williams in cooperation with the Wikimedia Foundation
- Attribution 3.0 Unported (CC BY 3.0): creativecommons.org/licenses/by/3.0
- changes were made