© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent
A Day in the Life of a Cloud Network Engineer at Netflix
Donavan Fritz: Sr. Cloud Network SRE
Joel Kodama: Sr. Cloud Network SRE
NET303
109,000,000 Global Subscribers
1,000,000+ Requests Per Second
150,000+ EC2 Instances
75+ Accounts
4 AWS Regions
AWS Infrastructure
[Diagram: EC2 instances in AWS EC2-Classic (10.0.0.0/8) spread across Account A, Account B, and Account C; VPCs with public and private subnets, a VPC NAT Gateway to the Internet, and VPC peering; connectivity over the Internet and an MPLS backbone.]
Globally Unique IP Space is Good
100.64.0.0/10
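A minimal Python sketch of why one globally unique block helps: if every VPC CIDR is carved from 100.64.0.0/10 (the RFC 6598 shared address space), checking a new allocation against existing ones is trivial. Carving the block into /16s, one per VPC, is our assumption for illustration; the deck does not say how the block is subdivided.

```python
import ipaddress

# The block from the slide; RFC 6598 reserves it as shared address space.
POOL = ipaddress.ip_network("100.64.0.0/10")

# Assumption for illustration: each VPC gets its own /16 from the pool.
vpc_cidrs = list(POOL.subnets(new_prefix=16))

def overlaps_any(cidr, allocated):
    """True if cidr collides with any already-allocated network."""
    candidate = ipaddress.ip_network(cidr)
    return any(candidate.overlaps(net) for net in allocated)

print(POOL.network_address, POOL.broadcast_address)  # 100.64.0.0 100.127.255.255
print(len(vpc_cidrs))                                 # 64
print(vpc_cidrs[17])                                  # 100.81.0.0/16
print(overlaps_any("100.81.0.0/20", vpc_cidrs[:17]))  # False: 100.64-100.80 only
print(overlaps_any("100.81.0.0/20", vpc_cidrs))       # True
```

Hosts in the netstat output later in the deck, such as 100.81.51.147, fall inside this block.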
IP Management is Hard
Infrastructure Insight
DNS
api.netflix.com
www.netflix.com
DNS Insight
Availability and Performance
Network Insight
“Hi there, can someone help me resolve a
network connectivity issue between one
microservice to another?”
- Sr. Platform Engineer
“Does anyone know if there are any
network weather events in us-east-1?
We’ve seen a couple hosts run into
network partitions.”
- Sr. Database Engineer
“I'm thinking this might be due to
networking unpleasantness...”
- Sr. Edge Engineer
“I am seeing what seem to be network
related errors on start-up.”
- Stunning Colleague #1
VPC Flow Logs
Really Good, Meaningless Data.
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
172.31.16.139 → 172.31.16.21
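The record above follows the default (version 2) VPC Flow Logs format: version, account ID, interface ID, source and destination address, source and destination port, IANA protocol number, packets, bytes, start and end of the capture window, action, and log status. A short Python sketch that parses it makes the slide's point concrete: every field is an address or a counter, and nothing names an application. Only `OK` records are handled here.

```python
from dataclasses import dataclass

IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers

@dataclass
class FlowRecord:
    """One record in the default (version 2) VPC Flow Logs format."""
    version: int
    account_id: str
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    bytes: int
    start: int       # capture window start, Unix seconds
    end: int         # capture window end, Unix seconds
    action: str      # ACCEPT or REJECT
    log_status: str  # OK, NODATA, or SKIPDATA

    @property
    def proto_name(self) -> str:
        return IP_PROTOCOLS.get(self.protocol, str(self.protocol))

def parse_flow_record(line: str) -> FlowRecord:
    """Parse one space-separated record. Assumes log_status OK: NODATA and
    SKIPDATA records use '-' placeholders, which this sketch does not handle."""
    v = line.split()
    if len(v) != 14:
        raise ValueError(f"expected 14 fields, got {len(v)}")
    return FlowRecord(int(v[0]), v[1], v[2], v[3], v[4], int(v[5]), int(v[6]),
                      int(v[7]), int(v[8]), int(v[9]), int(v[10]), int(v[11]),
                      v[12], v[13])

rec = parse_flow_record(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
print(rec.srcaddr, rec.srcport, "->", rec.dstaddr, rec.dstport, rec.proto_name)
# 172.31.16.139 20641 -> 172.31.16.21 22 TCP
```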
Network Segmentation
[Diagram: EC2 instances in the Foo, Bar, and Baz Auto Scaling groups alongside a Classic Load Balancer, a Lambda function, an RDS DB instance, an Application Load Balancer, an ElastiCache Redis instance, and Amazon SQS. In the flow log view each resource appears only as an IP address, for example 172.31.16.139 and 172.31.16.21 for the Foo instances and 72.21.207.173 for Amazon SQS.]
What app has these IPs?
IP Address: 172.16.100.100
[Timeline from t0 to tn: the same IP address is used by an EC2 instance (t1), then a Lambda function (t2), then an EC2 instance (t3).]
What app had these IPs, at this time?
[Diagram: EC2 instances in the Foo and Bar Auto Scaling groups, each in a network using the same 172.31.0.0/16 range.]
What app had these IPs, at this time, in this routing domain?
VPC Flow Logs: Challenges
IP Addresses Mean Nothing
Stateless
A flow log record carries no connection state: it does not say which side opened the connection, and port numbers alone can mislead.
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
172.31.16.139 (TCP/20641) → 172.31.16.21 (TCP/22)
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK
172.31.16.139 (TCP/11211) → 172.31.16.21 (TCP/8008): memcached???
Not memcached: the source is a Classic Load Balancer whose ephemeral port happens to be 11211, and the traffic is HTTP to an EC2 instance on TCP/8008.
Fragmented
Each elastic network interface logs every flow it sees, with each direction recorded separately, so a single connection is split across several records:
instance → instance: 1 TCP connection, 4 VPC Flow Log records (one elastic network interface per instance)
instance → VPC NAT Gateway → Amazon SQS: 1 TCP connection, 6 VPC Flow Log records
instance, Classic Load Balancer, instance, and VPC NAT Gateway: 2 TCP connections, 12 VPC Flow Log records
What I care about: instance → instance
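A minimal Python sketch of collapsing the four records of one instance-to-instance connection into a single entry, using a direction-independent 5-tuple key. The ENI IDs below are hypothetical placeholders. This only works when both ends keep their real addresses; stitching hops where a NAT gateway or load balancer rewrites addresses is outside the scope of the sketch.

```python
from collections import defaultdict

def connection_key(src, sport, dst, dport, proto):
    """Direction-independent key: A->B and B->A map to the same tuple.

    The ordering is lexicographic on (ip_string, port); it is not numeric
    IP order, but it only needs to be consistent.
    """
    a, b = (src, sport), (dst, dport)
    lo, hi = (a, b) if a <= b else (b, a)
    return (proto, lo, hi)

# One TCP connection between two instances, seen by both ENIs in both
# directions (eni-aaa and eni-bbb are hypothetical IDs).
records = [
    ("eni-aaa", "172.31.16.139", 20641, "172.31.16.21", 22, 6),
    ("eni-aaa", "172.31.16.21", 22, "172.31.16.139", 20641, 6),
    ("eni-bbb", "172.31.16.139", 20641, "172.31.16.21", 22, 6),
    ("eni-bbb", "172.31.16.21", 22, "172.31.16.139", 20641, 6),
]

connections = defaultdict(list)
for eni, src, sport, dst, dport, proto in records:
    connections[connection_key(src, sport, dst, dport, proto)].append(eni)

print(len(connections))  # 1
```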
We have a lot of Flow Logs
1,000,000+ Requests Per Second
4 AWS Regions
75+ accounts
150,000+ EC2 Instances
10,000,000+ Flow Log Records Every Second
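To put the last figure in perspective, a quick Python calculation scales the slide's rate to a day. Since the slide says "10,000,000+", the result is a lower bound.

```python
RECORDS_PER_SECOND = 10_000_000   # slide: "10,000,000+ Flow Log Records Every Second"
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

records_per_day = RECORDS_PER_SECOND * SECONDS_PER_DAY
print(f"{records_per_day:,} records per day (lower bound)")
# 864,000,000,000 records per day (lower bound)
```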
VPC Flow Logs: Solutions
IP Addresses Mean Nothing
What app had these IPs, at this time, in this routing domain?
f(domain, ip, time) = app
Sonar
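The lookup f(domain, ip, time) = app can be sketched in Python as a per-(domain, ip) list of non-overlapping ownership intervals searched with binary search. This is an illustration under stated assumptions, not Sonar's implementation: the class name, the half-open interval convention, and every timestamp and app except "foo" and "bar" are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Optional

class IpOwnership:
    """Answers f(domain, ip, time) = app from half-open [start, end) intervals.

    Assumes intervals for the same (domain, ip) never overlap; end=None means
    the IP is still assigned.
    """

    def __init__(self) -> None:
        self._starts = defaultdict(list)  # (domain, ip) -> sorted start times
        self._spans = defaultdict(list)   # (domain, ip) -> (start, end, app), same order

    def add(self, domain: int, ip: str, start: int, end: Optional[int], app: str) -> None:
        key = (domain, ip)
        i = bisect.bisect_left(self._starts[key], start)
        self._starts[key].insert(i, start)
        self._spans[key].insert(i, (start, end, app))

    def lookup(self, domain: int, ip: str, ts: int) -> Optional[str]:
        starts = self._starts.get((domain, ip))
        if not starts:
            return None
        i = bisect.bisect_right(starts, ts) - 1  # last interval starting at or before ts
        if i < 0:
            return None
        _, end, app = self._spans[(domain, ip)][i]
        return app if end is None or ts < end else None

# Hypothetical ownership history; only "foo" and "bar" come from the deck.
owners = IpOwnership()
owners.add(0, "172.31.16.139", 1418400000, 1418500000, "qux")
owners.add(0, "172.31.16.139", 1418520000, None, "foo")
owners.add(0, "172.31.16.21", 1418500000, None, "bar")
owners.add(1, "172.31.16.139", 1418400000, None, "baz")  # same IP, other routing domain

print(owners.lookup(0, "172.31.16.139", 1418530010))  # foo
print(owners.lookup(1, "172.31.16.139", 1418530010))  # baz
```

Keying on the routing domain is what lets the same 172.31.0.0/16 address resolve to different apps in different networks.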
Extract Transform Load
Inputs: AWS APIs / Logs, Netflix APIs / Logs, CloudWatch Events, DNS Crawling, Polling
Event Processing
+ marks an IP being associated with a resource; - marks it being released.
Netflix Events
t1: ip(172.31.2.2) + eni-123
t2: ip(172.31.2.2) + i-abcdef
t3: ip(172.31.2.2) + titus cabc
t10: ip(172.31.2.2) - titus cabc
t11: ip(172.31.2.2) - eni-123
t12: ip(172.31.2.2) - i-abcdef
...
IP Change Events
t20: ip(1.1.1.1) + AWS SNS
t21: ip(2.2.2.2) + AWS SQS
t30: ip(2.2.2.2) - AWS SQS
t31: ip(1.1.1.1) - AWS SNS
...
Sonar
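A Python sketch of replaying the +/- events above into attachment intervals. Several resources (the ENI, the instance, and the Titus container) hold 172.31.2.2 at once, so some precedence rule is needed; choosing the most recently attached resource is our assumption, not something the deck states. The slide's t1..t12 are written as plain integers.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str, str, str]  # (time, ip, "+" or "-", resource)
Span = Tuple[int, Optional[int]]   # [start, end); end None = still attached

def build_intervals(events: List[Event]) -> Dict[Tuple[str, str], List[Span]]:
    """Replay +/- IP change events into attachment intervals per (ip, resource)."""
    open_since: Dict[Tuple[str, str], int] = {}
    spans: Dict[Tuple[str, str], List[Span]] = {}
    for t, ip, op, resource in sorted(events):
        key = (ip, resource)
        if op == "+":
            open_since.setdefault(key, t)
        elif op == "-" and key in open_since:
            spans.setdefault(key, []).append((open_since.pop(key), t))
    for key, start in open_since.items():  # never released: still attached
        spans.setdefault(key, []).append((start, None))
    return spans

def owner_at(spans: Dict[Tuple[str, str], List[Span]], ip: str, t: int) -> Optional[str]:
    """Most recently attached resource holding ip at time t (assumed precedence)."""
    best: Optional[Tuple[int, str]] = None
    for (span_ip, resource), intervals in spans.items():
        if span_ip != ip:
            continue
        for start, end in intervals:
            if start <= t and (end is None or t < end):
                if best is None or start > best[0]:
                    best = (start, resource)
    return best[1] if best else None

events = [
    (1, "172.31.2.2", "+", "eni-123"),
    (2, "172.31.2.2", "+", "i-abcdef"),
    (3, "172.31.2.2", "+", "titus cabc"),
    (10, "172.31.2.2", "-", "titus cabc"),
    (11, "172.31.2.2", "-", "eni-123"),
    (12, "172.31.2.2", "-", "i-abcdef"),
]
spans = build_intervals(events)
print(owner_at(spans, "172.31.2.2", 5))   # titus cabc
print(owner_at(spans, "172.31.2.2", 10))  # i-abcdef
```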
VPC Flow Logs: Solutions
Stateless
Listening ports (TCP/80, TCP/443, TCP/8080, TCP/8443, ...) reported by the SSM Agent on EC2 instances
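With listening-port data per host, the direction of a flow can be inferred: if the destination port is a listening port, the source opened the connection. A Python sketch of that heuristic follows. The function name, the "Outbound Response" and "Unknown" labels, and the listening-port data are hypothetical; only "Outbound Request" appears in the deck.

```python
from typing import Dict, Set

def classify(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             listening: Dict[str, Set[int]]) -> str:
    """Guess which side opened the connection that produced a flow record.

    listening maps an IP to the TCP ports it listens on. Hosts without data
    (for example, a load balancer with no agent) are simply absent.
    """
    src_ports = listening.get(src_ip)
    dst_ports = listening.get(dst_ip)
    if src_ports is not None and src_port in src_ports:
        return "Outbound Response"  # hypothetical label: the source is the server
    if dst_ports is not None and dst_port in dst_ports:
        return "Outbound Request"   # the destination is the server
    if src_ports is not None:
        return "Outbound Request"   # source port is not a listening port
    return "Unknown"

# Hypothetical listening-port data for the destination instance.
LISTENING = {"172.31.16.21": {22, 8008}}

print(classify("172.31.16.139", 20641, "172.31.16.21", 22, LISTENING))    # Outbound Request
print(classify("172.31.16.21", 22, "172.31.16.139", 20641, LISTENING))    # Outbound Response
print(classify("172.31.16.139", 11211, "172.31.16.21", 8008, LISTENING))  # Outbound Request
```

The last call is the TCP/11211 → TCP/8008 case: knowing the instance listens on 8008 identifies 11211 as the load balancer's ephemeral port rather than memcached.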
Known Deficiency
Dredge
Amazon VPC Flow Logs (via Amazon Kinesis) + IP Change Events (Sonar) → Stream Joins → Netflix Data Pipeline
Stream Joins
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
Routing domain (0 in this example), IPv4 addresses, and timestamp are the join keys:
f(0, 172.31.16.139, 1418530010) = foo
f(0, 172.31.16.21, 1418530010) = bar
172.31.16.139:20641 = Not Listening → Outbound Request
Transform
{
  srcIP: '172.31.16.139',
  dstIP: '172.31.16.21',
  srcPort: 20641,
  dstPort: 22,
  packets: 20,
  bytes: 4249,
  startTs: 1418530010,
  endTs: 1418530070,
  action: 'ACCEPT',
  srcApp: 'foo',
  dstApp: 'bar',
  state: 'Outbound Request',
  …
}
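The transform step can be sketched per record in Python: parse the raw flow log line, look up the app on each side at the record's start time, and derive the state from listening-port data. This shows only the join logic, not the Kinesis streaming machinery. `enrich`, `APPS`, and `LISTENING` are hypothetical, the "Outbound Response" and "Unknown" labels are ours, and the app lookup below ignores time for brevity, whereas a real lookup would be time-aware.

```python
from typing import Callable, Dict, Optional, Set

AppLookup = Callable[[int, str, int], Optional[str]]  # f(domain, ip, time) -> app

def enrich(line: str, domain: int, app_of: AppLookup,
           listening: Dict[str, Set[int]]) -> dict:
    """Join one default-format flow record with app ownership and socket state."""
    v = line.split()
    src_ip, dst_ip = v[3], v[4]
    src_port, dst_port = int(v[5]), int(v[6])
    start = int(v[10])
    src_ports = listening.get(src_ip)
    if src_ports is None:
        state = "Unknown"            # no listening-port data for the source
    elif src_port in src_ports:
        state = "Outbound Response"  # hypothetical label: source is the server
    else:
        state = "Outbound Request"   # source port not listening: source initiated
    return {
        "srcIP": src_ip, "dstIP": dst_ip,
        "srcPort": src_port, "dstPort": dst_port,
        "packets": int(v[8]), "bytes": int(v[9]),
        "startTs": start, "endTs": int(v[11]),
        "action": v[12],
        "srcApp": app_of(domain, src_ip, start),
        "dstApp": app_of(domain, dst_ip, start),
        "state": state,
    }

# Hypothetical stand-ins for the Sonar lookup and the SSM listening-port data.
APPS = {(0, "172.31.16.139"): "foo", (0, "172.31.16.21"): "bar"}
LISTENING = {"172.31.16.139": {8080}, "172.31.16.21": {22}}

out = enrich("2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
             "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
             domain=0,
             app_of=lambda d, ip, t: APPS.get((d, ip)),  # ignores time in this sketch
             listening=LISTENING)
print(out["srcApp"], out["dstApp"], out["state"])  # foo bar Outbound Request
```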
Load
Netflix Data Pipeline
Results
Slack App
$ netstat -tpna
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 100.81.51.147:35024 100.81.3.96:8080 TIME_WAIT -
tcp 0 0 100.81.51.147:27945 100.76.196.35:12345 ESTABLISHED -
tcp 0 0 100.81.51.147:40881 100.76.155.222:12345 ESTABLISHED -
tcp 0 0 100.81.51.147:58127 100.81.77.56:8080 ESTABLISHED -
tcp 0 0 100.81.51.147:57581 100.76.157.241:12345 ESTABLISHED -
tcp 0 0 100.81.51.147:8080 100.81.213.243:47269 ESTABLISHED -
tcp 0 0 100.81.51.147:42184 100.81.50.229:8080 ESTABLISHED -
tcp 0 0 100.81.51.147:37429 100.81.57.18:8080 TIME_WAIT -
tcp 0 0 100.81.51.147:38336 100.81.75.198:8080 ESTABLISHED -
tcp 0 0 100.81.51.147:21432 100.81.90.93:8080 TIME_WAIT -
tcp 0 0 100.81.51.147:22824 100.76.228.39:12345 ESTABLISHED -
tcp 1 0 100.81.51.147:13514 100.81.107.125:8080 CLOSE_WAIT -
tcp 0 0 100.81.51.147:63556 100.81.160.56:8080 ESTABLISHED -
tcp 0 0 100.81.51.147:21591 100.81.19.52:8080 TIME_WAIT -
tcp 0 0 100.81.51.147:41689 100.81.59.253:8081 ESTABLISHED -
tcp 0 0 100.81.51.147:8080 100.81.37.100:31639 FIN_WAIT2 -
tcp 54 0 100.81.51.147:52883 52.218.128.113:443 ESTABLISHED -
tcp 0 0 100.81.51.147:27556 100.76.198.44:12345 ESTABLISHED -
tcp 1 0 100.81.51.147:25435 100.81.79.120:8080 CLOSE_WAIT -
tcp 1 54 100.81.51.147:14703 52.218.128.121:443 ESTABLISHED -
tcp 1 0 100.81.51.147:53777 100.81.107.125:8080 CLOSE_WAIT -
tcp 0 0 100.81.51.147:38366 100.76.157.217:12345 ESTABLISHED -
tcp 1 0 100.81.51.147:62763 100.81.107.125:8080 ESTABLISHED -
tcp 0 0 100.81.51.147:55510 100.81.22.63:8080 TIME_WAIT -
tcp 0 0 100.81.51.147:8080 100.81.234.159:27884 ESTABLISHED -
+-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
| Direction | ForeignKind | ExtraInfo | Account | Region | State | Qty |
+-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
| inbound | Instance | asg: bastion-v078 | 111111111 | us-west-1 | ESTABLISHED | 1 |
| outbound | Instance | asg: ledo-v004 | 222222222 | us-east-1 | ESTABLISHED | 80 |
| outbound | Instance | asg: ledo-v003 | 222222222 | us-east-1 | ESTABLISHED | 80 |
| outbound | AwsService | dynamodb | | us-east-1 | ESTABLISHED | 19 |
| outbound | AwsService | kinesis | | us-east-1 | ESTABLISHED | 14 |
| outbound | Instance | asg: brigo-us-east-1e-v011 | 333333333 | us-east-1 | ESTABLISHED | 8 |
| outbound | Instance | asg: brigo-us-east-1d-v011 | 333333333 | us-east-1 | ESTABLISHED | 8 |
| outbound | Instance | asg: brigo-us-east-1c-v012 | 333333333 | us-east-1 | ESTABLISHED | 8 |
| outbound | Instance | asg: berberb-v012 | 333333333 | us-east-1 | ESTABLISHED | 3 |
| outbound | Instance | asg: pikango-v003 | 444444444 | us-east-1 | ESTABLISHED | 2 |
| outbound | Instance | asg: endai-v003 | 555555555 | us-east-1 | ESTABLISHED | 1 |
| outbound | Instance | asg: kotts-v111 | 444444444 | us-east-1 | ESTABLISHED | 1 |
| outbound | Instance | asg: akrah-v000 | 333333333 | us-east-1 | ESTABLISHED | 1 |
| outbound | Instance | asg: barta-v095 | 333333333 | us-east-1 | ESTABLISHED | 1 |
| outbound | Instance | asg: padok-v061 | 333333333 | us-east-1 | ESTABLISHED | 1 |
| outbound | AwsService | kinesis | | us-east-1 | TIME_WAIT | 3 |
| outbound | Instance | asg: ledo-v004 | 222222222 | us-east-1 | TIME_WAIT | 2 |
| outbound | Instance | asg: ledo-v003 | 222222222 | us-east-1 | TIME_WAIT | 1 |
| outbound | Instance | asg: berberb-v012 | 333333333 | us-east-1 | TIME_WAIT | 1 |
+-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
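A Python sketch of how netstat output could be rolled up into a summary like the table above: classify each socket as inbound or outbound by whether its local port is a listening port, label the foreign IP, and count by (direction, owner, state). The listening-port set and the IP-to-owner mapping are hypothetical stand-ins for the Sonar and SSM data; the sample lines are taken from the netstat output above.

```python
from collections import Counter

NETSTAT = """\
tcp 0 0 100.81.51.147:35024 100.81.3.96:8080 TIME_WAIT -
tcp 0 0 100.81.51.147:27945 100.76.196.35:12345 ESTABLISHED -
tcp 0 0 100.81.51.147:8080 100.81.213.243:47269 ESTABLISHED -
tcp 0 0 100.81.51.147:58127 100.81.77.56:8080 ESTABLISHED -
"""

LISTENING_PORTS = {8080}  # hypothetical: ports this host listens on

# Hypothetical IP -> owner mapping standing in for a Sonar lookup.
OWNER = {
    "100.81.3.96": "asg: app-a-v001",
    "100.81.77.56": "asg: app-a-v001",
    "100.76.196.35": "asg: app-b-v001",
}

def summarize(netstat_text: str) -> Counter:
    """Count sockets by (direction, foreign owner, TCP state)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp":
            continue  # skip headers and non-IPv4 TCP lines
        local, foreign, state = fields[3], fields[4], fields[5]
        local_port = int(local.rsplit(":", 1)[1])
        foreign_ip = foreign.rsplit(":", 1)[0]
        direction = "inbound" if local_port in LISTENING_PORTS else "outbound"
        counts[(direction, OWNER.get(foreign_ip, "unknown"), state)] += 1
    return counts

for (direction, owner, state), qty in sorted(summarize(NETSTAT).items()):
    print(direction, owner, state, qty)
```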
Donavan Fritz: Sr. Cloud Network SRE
dfritz@netflix.com
Joel Kodama: Sr. Cloud Network SRE
jkodama@netflix.com

Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent 2017

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent. A Day in the Life of a Cloud Network Engineer at Netflix. Donavan Fritz: Sr. Cloud Network SRE. Joel Kodama: Sr. Cloud Network SRE. NET303
  • 9. [Diagram: AWS EC2-Classic, 10.0.0.0/8, with EC2 instances belonging to Account A, Account B, and Account C]
  • 10. [Diagram: a VPC with Public and Private subnets of EC2 instances, a VPC NAT Gateway, and the Internet]
  • 11. [Diagram: EC2 instances in separate VPCs connected by VPC peering]
  • 13. Globally Unique IP Space is Good: 100.64.0.0/10
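Why a single, globally unique address space helps: when every VPC CIDR is carved out of one block such as 100.64.0.0/10 (the RFC 6598 shared address space), an address identifies exactly one place in the network. A minimal sketch using Python's standard `ipaddress` module, assuming a hypothetical inventory of VPC CIDRs (the VPC names and prefixes are illustrative, not Netflix's allocation), that flags overlapping allocations and any VPC outside the shared block:

```python
import ipaddress
from itertools import combinations

# Hypothetical inventory: VPC name -> CIDR. Illustrative values only.
VPC_CIDRS = {
    "vpc-a": "100.64.0.0/16",
    "vpc-b": "100.65.0.0/16",
    "vpc-c": "100.64.128.0/17",      # overlaps vpc-a
    "vpc-default": "172.31.0.0/16",  # AWS default-VPC range, outside the shared block
}

SHARED_BLOCK = ipaddress.ip_network("100.64.0.0/10")

def overlapping_pairs(cidrs):
    """Return every pair of VPC names whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(c) for name, c in cidrs.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]

def outside_block(cidrs, block=SHARED_BLOCK):
    """Return VPC names whose CIDR is not fully contained in the given block."""
    return [name for name, c in sorted(cidrs.items())
            if not ipaddress.ip_network(c).subnet_of(block)]

if __name__ == "__main__":
    print(overlapping_pairs(VPC_CIDRS))  # [('vpc-a', 'vpc-c')]
    print(outside_block(VPC_CIDRS))      # ['vpc-default']
```

Any overlap reported here is exactly the ambiguity that makes a bare IP address meaningless without extra context.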
  • 17. DNS
  • 23. “Hi there, can someone help me resolve a network connectivity issue between one microservice to another?” (Sr. Platform Engineer)
      “Does anyone know if there are any network weather events in us-east-1? We’ve seen a couple hosts run into network partitions.” (Sr. Database Engineer)
      “I'm thinking this might be due to networking unpleasantness...” (Sr. Edge Engineer)
      “I am seeing what seem to be network related errors on start-up.” (Stunning Colleague #1)
  • 24. VPC Flow Logs: Really Good, Meaningless Data.
  • 25-27. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK (slide 27 highlights the addresses 172.31.16.139 and 172.31.16.21)
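The record above is in the default (version 2) VPC Flow Logs format, whose space-separated fields are, in order: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status. A minimal Python sketch that parses one such line into named fields; it assumes the default format only (custom formats reorder or add fields) and does not handle NODATA/SKIPDATA records, where most fields are `-`:

```python
from typing import NamedTuple

class FlowLogRecord(NamedTuple):
    """One record in the default (version 2) VPC Flow Logs format."""
    version: int
    account_id: str    # kept as a string: account IDs may have leading zeros
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int      # IANA protocol number, e.g. 6 = TCP
    packets: int
    bytes: int
    start: int         # Unix seconds
    end: int           # Unix seconds
    action: str        # ACCEPT or REJECT
    log_status: str    # OK, NODATA, or SKIPDATA

INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}

def parse_flow_log(line: str) -> FlowLogRecord:
    """Split a default-format record and convert the numeric fields."""
    parts = line.split()
    if len(parts) != len(FlowLogRecord._fields):
        raise ValueError(f"expected {len(FlowLogRecord._fields)} fields, got {len(parts)}")
    values = [int(v) if name in INT_FIELDS else v
              for name, v in zip(FlowLogRecord._fields, parts)]
    return FlowLogRecord(*values)

record = parse_flow_log(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
print(record.srcaddr, record.srcport, "->", record.dstaddr, record.dstport)
# 172.31.16.139 20641 -> 172.31.16.21 22
```

Everything the record offers is addresses, ports, counters, and a time window; nothing in it names an application, which is the problem the following slides address.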
  • 29. [Diagram: Foo, Bar, and Baz Auto Scaling groups of EC2 instances, a Classic Load Balancer, a Lambda function, an RDS DB instance, an Application Load Balancer, and an ElastiCache Redis instance]
  • 30. [Diagram: the same resources labeled only by IP address. Foo Auto Scaling group: 172.31.16.139, 172.31.16.21; Bar Auto Scaling group: 172.31.16.54, 172.31.16.248; Baz Auto Scaling group: 172.31.61.95, 172.16.31.10; other resources: 172.31.16.22, 172.31.16.19, 172.31.16.60, 172.31.16.133, 172.31.16.231]
  • 31. [Diagram: EC2 instances in the Foo Auto Scaling group and Amazon SQS]
  • 32. [Diagram: Foo Auto Scaling group (172.31.16.139, 172.31.16.21) and 72.21.207.173]
  • 33. What app has these IPs?
  • 34. IP Address: 172.16.100.100 [Timeline from t0 to tn: EC2 instance at t1, Lambda function at t2, EC2 instance at t3]
  • 36. What app had these IPs, at this time?
  • 37. [Diagram: Foo Auto Scaling group (EC2 instances) in 172.31.0.0/16; Bar Auto Scaling group (EC2 instances) also in 172.31.0.0/16]
  • 38. What app had these IPs, at this time, in this routing domain?
  • 39. VPC Flow Logs Challenges: IP Addresses Mean Nothing
  • 40. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
  • 41. 172.31.16.139 TCP/20641 and 172.31.16.21 TCP/22. Challenges: IP Addresses Mean Nothing; Stateless
  • 42. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/11211 and 172.31.16.21 TCP/8008: ???
  • 43. The same record, now labeled memcached
  • 44-45. [Diagram: the same record, with a Classic Load Balancer and an EC2 instance as its endpoints and the memcached label]
  • 46-47. [Diagram: the same record, with a Classic Load Balancer and an EC2 instance as its endpoints and TCP/8008 labeled HTTP]
  • 48. VPC Flow Logs Challenges: IP Addresses Mean Nothing; Stateless; Fragmented
  • 49. [Diagram: instance to instance] 1 TCP connection. Challenge: Fragmented
  • 50. 1 TCP connection, 4 VPC Flow Log records
  • 51. 1 TCP connection, 4 VPC Flow Log records [Diagram: an elastic network interface on each instance]
  • 52. [Diagram: instance to Amazon SQS through a VPC NAT Gateway] 1 TCP connection, 6 VPC Flow Log records
  • 53. [Diagram: instance, Classic Load Balancer, and instance, with a VPC NAT Gateway] 2 TCP connections, 12 VPC Flow Log records
  • 54. [Diagram: instance to instance] What I care about
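One way to reconcile those counts (an interpretation, not stated on the slides): each elastic network interface logs each direction of a flow as its own record, so a connection between two instances yields 2 interfaces × 2 directions = 4 records. A minimal Python sketch that folds such records back into one logical connection by keying on a direction-independent 5-tuple; the `Flow` type and the sample data are hypothetical. It only covers untranslated traffic: behind a NAT gateway the public-side records carry different addresses and would need the gateway's translation state to join.

```python
from collections import defaultdict
from typing import NamedTuple

class Flow(NamedTuple):
    # Hypothetical subset of flow log fields needed for grouping.
    interface_id: str
    srcaddr: str
    srcport: int
    dstaddr: str
    dstport: int
    protocol: int
    bytes: int

def connection_key(f: Flow) -> tuple:
    """Same key for A->B and B->A: protocol plus both endpoints in sorted order."""
    endpoints = sorted([(f.srcaddr, f.srcport), (f.dstaddr, f.dstport)])
    return (f.protocol, endpoints[0], endpoints[1])

def group_connections(flows):
    """Map each direction-independent connection key to the records describing it."""
    groups = defaultdict(list)
    for f in flows:
        groups[connection_key(f)].append(f)
    return dict(groups)

# One TCP connection, seen on both ENIs in both directions (illustrative data).
flows = [
    Flow("eni-aaa", "172.31.16.139", 20641, "172.31.16.21", 22, 6, 4249),
    Flow("eni-aaa", "172.31.16.21", 22, "172.31.16.139", 20641, 6, 9000),
    Flow("eni-bbb", "172.31.16.139", 20641, "172.31.16.21", 22, 6, 4249),
    Flow("eni-bbb", "172.31.16.21", 22, "172.31.16.139", 20641, 6, 9000),
]
groups = group_connections(flows)
print(len(flows), "records ->", len(groups), "connection")  # 4 records -> 1 connection
```

Note that the same bytes appear on both interfaces, so per-direction volume should be taken from one interface rather than summed across the group.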
  • 55. VPC Flow Logs Challenges: IP Addresses Mean Nothing; Stateless; Fragmented; We Have a Lot of Flow Logs
  • 56. 1,000,000+ Requests Per Second; 4 AWS Regions; 75+ accounts; 150,000+ EC2 Instances
  • 57. 10,000,000+ Flow Log Records Every Second
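For scale, the slide's lower bound of 10,000,000 records per second works out to the following daily volume (simple arithmetic on the slide's figure, not a number quoted in the talk):

```latex
10^{7}\ \tfrac{\text{records}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}}
  = 8.64 \times 10^{11}\ \tfrac{\text{records}}{\text{day}}
```

That is roughly 864 billion records per day to parse, join, and enrich.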
  • 58. VPC Flow Logs Solutions: IP Addresses Mean Nothing; Stateless; Fragmented; We Have a Lot of Flow Logs
  • 59. What app had these IPs, at this time, in this routing domain?
  • 60. f(domain, ip, time) = app
  • 61. Sonar
  • 62. Sonar: Extract, Transform, Load
      Inputs: AWS APIs / Logs, Netflix APIs / Logs, CloudWatch Events, Netflix Events, DNS
      Processing: Crawling, Polling, Event Processing
      Output, IP Change Events:
        t1: ip(172.31.2.2) + eni-123
        t2: ip(172.31.2.2) + i-abcdef
        t3: ip(172.31.2.2) + titus cabc
        t10: ip(172.31.2.2) - titus cabc
        t11: ip(172.31.2.2) - eni-123
        t12: ip(172.31.2.2) - i-abcdef
        ...
        t20: ip(1.1.1.1) + AWS SNS
        t21: ip(2.2.2.2) + AWS SQS
        t30: ip(2.2.2.2) - AWS SQS
        t31: ip(1.1.1.1) - AWS SNS
        ...
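The `+`/`-` IP change events above are enough to answer f(domain, ip, time). A minimal Python sketch, assuming each event carries a timestamp, a routing-domain id, an IP, an add or remove marker, and an owner label (the `IpHistory` class and the event tuple layout are hypothetical, not Sonar's API): each `+` opens an interval for that owner and the matching `-` closes it, so a lookup returns every owner whose interval covers the timestamp. One IP can legitimately have several owners at once (an ENI, an instance, and a Titus container).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Interval:
    start: int
    end: Optional[int]  # None means the assignment is still active
    owner: str

class IpHistory:
    """Hypothetical index of which owners held an IP, per routing domain, over time."""

    def __init__(self):
        self._intervals = defaultdict(list)  # (domain, ip) -> [Interval]
        self._open = {}                      # (domain, ip, owner) -> open Interval

    def apply(self, ts: int, domain: int, ip: str, op: str, owner: str) -> None:
        if op == "+":
            iv = Interval(ts, None, owner)
            self._intervals[(domain, ip)].append(iv)
            self._open[(domain, ip, owner)] = iv
        elif op == "-":
            iv = self._open.pop((domain, ip, owner), None)
            if iv is not None:  # ignore removals for assignments we never saw added
                iv.end = ts
        else:
            raise ValueError(f"unknown op {op!r}")

    def lookup(self, domain: int, ip: str, ts: int) -> Set[str]:
        """f(domain, ip, time): owners whose interval covers ts (start inclusive, end exclusive)."""
        return {
            iv.owner
            for iv in self._intervals.get((domain, ip), [])
            if iv.start <= ts and (iv.end is None or ts < iv.end)
        }

history = IpHistory()
events = [
    (1, 0, "172.31.2.2", "+", "eni-123"),
    (2, 0, "172.31.2.2", "+", "i-abcdef"),
    (3, 0, "172.31.2.2", "+", "titus cabc"),
    (10, 0, "172.31.2.2", "-", "titus cabc"),
    (11, 0, "172.31.2.2", "-", "eni-123"),
    (12, 0, "172.31.2.2", "-", "i-abcdef"),
]
for ts, domain, ip, op, owner in events:
    history.apply(ts, domain, ip, op, owner)

print(sorted(history.lookup(0, "172.31.2.2", 5)))   # ['eni-123', 'i-abcdef', 'titus cabc']
print(sorted(history.lookup(0, "172.31.2.2", 11)))  # ['i-abcdef']
```

The linear scan per IP keeps the sketch short; at flow log volumes an index sorted by start time (or an interval tree) would be the natural replacement.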
  • 63. VPC Flow Logs Solutions: IP Addresses Mean Nothing; Stateless; Fragmented; We Have a Lot of Flow Logs
  • 64. [Diagram: EC2 instances running the SSM Agent; listening ports TCP/80, TCP/443, TCP/8080, TCP/8443, ...]
  • 65. VPC Flow Logs Solutions: IP Addresses Mean Nothing; Stateless; Fragmented; We Have a Lot of Flow Logs
  • 66. Known Deficiency
  • 67. VPC Flow Logs Solutions: IP Addresses Mean Nothing; Stateless; Fragmented; We Have a Lot of Flow Logs
  • 68. Dredge
  • 69. [Diagram: Dredge. Amazon VPC Flow Logs (via Kinesis) and IP Change Events (Sonar) feed Stream Joins, which feed the Netflix Data Pipeline]
  • 70. VPC Flow Logs (via Amazon Kinesis)
  • 71. Stream Joins: 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
  • 72. f(domain, ip, time) = app, evaluated with the record's IPv4 addresses, timestamp, and routing domain
  • 73. f(0, 172.31.16.139, 1418530010) = foo; f(0, 172.31.16.21, 1418530010) = bar
  • 74. 172.31.16.139:20641 = Not Listening, so the record is an Outbound Request
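Because a flow log record does not say which side opened the connection, slide 74 resolves it with listening-socket data: if the source endpoint is not listening, the record is an outbound request. A minimal Python sketch of that rule, assuming a hypothetical set of (ip, port) listening sockets like the inventory on slide 64; the labels "Outbound Response" and "Unknown" for the other cases are assumptions, since only "Outbound Request" appears on the slides:

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)

def classify_state(src: Endpoint, dst: Endpoint, listening: Set[Endpoint]) -> str:
    """Infer a flow record's role from which endpoint is a listening socket.

    Slide 74's rule: a source that is not listening is the client side.
    "Outbound Response" and "Unknown" are hypothetical labels.
    """
    if src not in listening:
        return "Outbound Request"
    if dst not in listening:
        return "Outbound Response"
    return "Unknown"  # both endpoints listening: stale inventory or port reuse

# Hypothetical inventory: the destination host listens on TCP/22 and TCP/8008.
listening = {("172.31.16.21", 22), ("172.31.16.21", 8008)}

print(classify_state(("172.31.16.139", 20641), ("172.31.16.21", 22), listening))    # Outbound Request
print(classify_state(("172.31.16.139", 11211), ("172.31.16.21", 8008), listening))  # Outbound Request
print(classify_state(("172.31.16.21", 22), ("172.31.16.139", 20641), listening))    # Outbound Response
```

The second call is the slide 42 case: TCP/11211 looks like memcached by port number alone, but because it is not a listening socket it is treated as the client's ephemeral port.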
  • 75. Transform: { srcIP: '172.31.16.139', dstIP: '172.31.16.21', srcPort: 20641, dstPort: 22, packets: 20, bytes: 4249, startTs: 1418530010, endTs: 1418530070, action: 'ACCEPT', srcApp: 'foo', dstApp: 'bar', state: 'Outbound Request', … }
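The transform step amounts to joining each parsed record with two ownership lookups and a state label, then emitting the field names shown on slide 75. A minimal Python sketch, assuming the lookup and state functions are supplied by the caller (here replaced with hypothetical stand-ins that reproduce the slide's values); the output keys are copied from the slide, while `transform` itself is illustrative rather than Dredge's code:

```python
from typing import Callable, Dict

# Default (version 2) flow log field order.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport",
          "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]

def transform(line: str, domain: int,
              app_for: Callable[[int, str, int], str],
              state_for: Callable[[str, int, str, int], str]) -> Dict[str, object]:
    """Join one flow log line with app ownership and connection state."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    r = dict(zip(FIELDS, parts))
    src_port, dst_port, start = int(r["srcport"]), int(r["dstport"]), int(r["start"])
    return {
        "srcIP": r["srcaddr"],
        "dstIP": r["dstaddr"],
        "srcPort": src_port,
        "dstPort": dst_port,
        "packets": int(r["packets"]),
        "bytes": int(r["bytes"]),
        "startTs": start,
        "endTs": int(r["end"]),
        "action": r["action"],
        # f(domain, ip, time) = app, evaluated at the record's start time
        "srcApp": app_for(domain, r["srcaddr"], start),
        "dstApp": app_for(domain, r["dstaddr"], start),
        "state": state_for(r["srcaddr"], src_port, r["dstaddr"], dst_port),
    }

# Hypothetical stand-ins for the IP ownership lookup and the listening-socket inventory.
OWNERS = {(0, "172.31.16.139"): "foo", (0, "172.31.16.21"): "bar"}
LISTENING = {("172.31.16.21", 22)}

def app_for(domain: int, ip: str, ts: int) -> str:
    return OWNERS.get((domain, ip), "unknown")  # ignores ts; a real lookup would not

def state_for(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    return "Outbound Response" if (src_ip, src_port) in LISTENING else "Outbound Request"

enriched = transform(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    domain=0, app_for=app_for, state_for=state_for,
)
print(enriched["srcApp"], enriched["dstApp"], enriched["state"])  # foo bar Outbound Request
```

Passing the lookups in as functions keeps the transform independent of where ownership and listening data come from, which suits a stream join whose lookup side is maintained separately.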
  • 76. Load into the Netflix Data Pipeline: { srcIP: '172.31.16.139', dstIP: '172.31.16.21', srcPort: 20641, dstPort: 22, packets: 20, bytes: 4249, startTs: 1418530010, endTs: 1418530070, action: 'ACCEPT', srcApp: 'foo', dstApp: 'bar', state: 'Outbound Request', … }
  • 85. $ netstat -tpna
      Proto Recv-Q Send-Q Local Address          Foreign Address        State
      tcp        0      0 100.81.51.147:35024    100.81.3.96:8080       TIME_WAIT   -
      tcp        0      0 100.81.51.147:27945    100.76.196.35:12345    ESTABLISHED -
      tcp        0      0 100.81.51.147:40881    100.76.155.222:12345   ESTABLISHED -
      tcp        0      0 100.81.51.147:58127    100.81.77.56:8080      ESTABLISHED -
      tcp        0      0 100.81.51.147:57581    100.76.157.241:12345   ESTABLISHED -
      tcp        0      0 100.81.51.147:8080     100.81.213.243:47269   ESTABLISHED -
      tcp        0      0 100.81.51.147:42184    100.81.50.229:8080     ESTABLISHED -
      tcp        0      0 100.81.51.147:37429    100.81.57.18:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:38336    100.81.75.198:8080     ESTABLISHED -
      tcp        0      0 100.81.51.147:21432    100.81.90.93:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:22824    100.76.228.39:12345    ESTABLISHED -
      tcp        1      0 100.81.51.147:13514    100.81.107.125:8080    CLOSE_WAIT  -
      tcp        0      0 100.81.51.147:63556    100.81.160.56:8080     ESTABLISHED -
      tcp        0      0 100.81.51.147:21591    100.81.19.52:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:41689    100.81.59.253:8081     ESTABLISHED -
      tcp        0      0 100.81.51.147:8080     100.81.37.100:31639    FIN_WAIT2   -
      tcp       54      0 100.81.51.147:52883    52.218.128.113:443     ESTABLISHED -
      tcp        0      0 100.81.51.147:27556    100.76.198.44:12345    ESTABLISHED -
      tcp        1      0 100.81.51.147:25435    100.81.79.120:8080     CLOSE_WAIT  -
      tcp        1     54 100.81.51.147:14703    52.218.128.121:443     ESTABLISHED -
      tcp        1      0 100.81.51.147:53777    100.81.107.125:8080    CLOSE_WAIT  -
      tcp        0      0 100.81.51.147:38366    100.76.157.217:12345   ESTABLISHED -
      tcp        1      0 100.81.51.147:62763    100.81.107.125:8080    ESTABLISHED -
      tcp        0      0 100.81.51.147:55510    100.81.22.63:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:8080     100.81.234.159:27884   ESTABLISHED -
  • 86.
      +-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
      | Direction | ForeignKind | ExtraInfo                  | Account   | Region    | State       | Qty |
      +-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
      | inbound   | Instance    | asg: bastion-v078          | 111111111 | us-west-1 | ESTABLISHED |   1 |
      | outbound  | Instance    | asg: ledo-v004             | 222222222 | us-east-1 | ESTABLISHED |  80 |
      | outbound  | Instance    | asg: ledo-v003             | 222222222 | us-east-1 | ESTABLISHED |  80 |
      | outbound  | AwsService  | dynamodb                   |           | us-east-1 | ESTABLISHED |  19 |
      | outbound  | AwsService  | kinesis                    |           | us-east-1 | ESTABLISHED |  14 |
      | outbound  | Instance    | asg: brigo-us-east-1e-v011 | 333333333 | us-east-1 | ESTABLISHED |   8 |
      | outbound  | Instance    | asg: brigo-us-east-1d-v011 | 333333333 | us-east-1 | ESTABLISHED |   8 |
      | outbound  | Instance    | asg: brigo-us-east-1c-v012 | 333333333 | us-east-1 | ESTABLISHED |   8 |
      | outbound  | Instance    | asg: berberb-v012          | 333333333 | us-east-1 | ESTABLISHED |   3 |
      | outbound  | Instance    | asg: pikango-v003          | 444444444 | us-east-1 | ESTABLISHED |   2 |
      | outbound  | Instance    | asg: endai-v003            | 555555555 | us-east-1 | ESTABLISHED |   1 |
      | outbound  | Instance    | asg: kotts-v111            | 444444444 | us-east-1 | ESTABLISHED |   1 |
      | outbound  | Instance    | asg: akrah-v000            | 333333333 | us-east-1 | ESTABLISHED |   1 |
      | outbound  | Instance    | asg: barta-v095            | 333333333 | us-east-1 | ESTABLISHED |   1 |
      | outbound  | Instance    | asg: padok-v061            | 333333333 | us-east-1 | ESTABLISHED |   1 |
      | outbound  | AwsService  | kinesis                    |           | us-east-1 | TIME_WAIT   |   3 |
      | outbound  | Instance    | asg: ledo-v004             | 222222222 | us-east-1 | TIME_WAIT   |   2 |
      | outbound  | Instance    | asg: ledo-v003             | 222222222 | us-east-1 | TIME_WAIT   |   1 |
      | outbound  | Instance    | asg: berberb-v012          | 333333333 | us-east-1 | TIME_WAIT   |   1 |
      +-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
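Slide 86 condenses raw socket listings like slide 85 into counts per direction, foreign owner, and TCP state. A minimal Python sketch of that kind of aggregation, assuming netstat-style IPv4 rows as input, treating a connection as inbound when its local port is one of the host's listening ports, and using a hypothetical `owner_of` mapping from foreign IP to owner label in place of a real IP ownership lookup (the sample owner labels are illustrative and not taken from the slides):

```python
from collections import Counter
from typing import Dict, Iterable, Set

def summarize(netstat_lines: Iterable[str],
              listening_ports: Set[int],
              owner_of: Dict[str, str]) -> Counter:
    """Count TCP connections by (direction, foreign owner, state) from netstat-style rows."""
    counts: Counter = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp":
            continue  # skip the command line, headers, and non-TCP rows
        local, foreign, state = fields[3], fields[4], fields[5]
        local_port = int(local.rsplit(":", 1)[1])   # IPv4 "ip:port" only
        foreign_ip = foreign.rsplit(":", 1)[0]
        direction = "inbound" if local_port in listening_ports else "outbound"
        owner = owner_of.get(foreign_ip, "unknown")
        counts[(direction, owner, state)] += 1
    return counts

# A few rows from slide 85; the owner labels below are hypothetical.
rows = [
    "Proto Recv-Q Send-Q Local Address Foreign Address State",
    "tcp 0 0 100.81.51.147:8080 100.81.213.243:47269 ESTABLISHED -",
    "tcp 0 0 100.81.51.147:58127 100.81.77.56:8080 ESTABLISHED -",
    "tcp 0 0 100.81.51.147:42184 100.81.50.229:8080 ESTABLISHED -",
    "tcp 0 0 100.81.51.147:35024 100.81.3.96:8080 TIME_WAIT -",
    "tcp 54 0 100.81.51.147:52883 52.218.128.113:443 ESTABLISHED -",
]
owners = {
    "100.81.77.56": "asg: example-v001",
    "100.81.50.229": "asg: example-v001",
    "100.81.3.96": "asg: example-v001",
    "52.218.128.113": "example-aws-service",
}
counts = summarize(rows, listening_ports={8080}, owner_of=owners)
for (direction, owner, state), qty in counts.most_common():
    print(direction, owner, state, qty)
```

Direction here comes from the host's own listening ports, the same signal the Dredge state label relies on, while the owner column depends entirely on the quality of the IP ownership data.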
  • 89. Donavan Fritz: Sr. Cloud Network SRE, dfritz@netflix.com. Joel Kodama: Sr. Cloud Network SRE, jkodama@netflix.com