© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent
Monitoring, Logging and Debugging Containerized Services
Nare Hayrapetyan, Senior Software Engineer, AWS
Calvin French-Owen, Co-founder, Segment
CON320
November 28, 2017
Monitoring, Logging, and Debugging
• Microservices are hard to monitor and debug
• A lot of interacting components:
  • load balancers
  • instances
  • clusters
  • tasks
• Containers are transient
• Need visibility into all parts
Monitoring, Logging, and Debugging
• Can customers access my service?
• If not, what is failing?
• What trends and anomalies emerge?
Key Metrics
Infrastructure
• CPU, memory
• load balancers
• disk space
Key Metrics
Application
• error rates
• request volume
• request latencies
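As a rough illustration of the application metrics above, here is a minimal Python sketch that computes an error rate and latency percentiles over one window of requests. The sample data, the choice of 5xx as "error", and the nearest-rank percentile method are assumptions for illustration, not something the talk specifies.

```python
import math


def error_rate(status_codes):
    """Fraction of requests in the window that ended in a 5xx response."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def percentile(latencies_ms, pct):
    """Nearest-rank percentile of a list of latencies (milliseconds)."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical one-minute window of requests.
codes = [200, 200, 500, 200, 503, 200, 200, 200, 200, 200]
latencies = [12, 15, 250, 14, 300, 13, 16, 11, 18, 17]

summary = {
    "request_volume": len(codes),
    "error_rate": error_rate(codes),
    "p50_ms": percentile(latencies, 50),
    "p99_ms": percentile(latencies, 99),
}
print(summary)
```

Tail percentiles (p99) tend to surface problems that an average would hide, which is why the sketch reports them alongside the median.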
Monitoring with Amazon CloudWatch
• ECS metrics are sent to CloudWatch
• Agent version >= 1.4.0
• CPU, memory reservation and utilization
Monitoring with Amazon CloudWatch
• ECS provides 1-minute metrics
• High resolution metrics: 5 seconds
• Custom metrics: disk space
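One way the disk-space custom metric could be published is sketched below. The metric name `DiskUsedPercent`, the `Custom/ECS` namespace, and the cluster and instance identifiers are hypothetical. The payload follows the shape of CloudWatch's `PutMetricData` request, where `StorageResolution` set to 1 marks a high-resolution metric and 60 a standard one. The `publish` helper assumes `boto3` is installed and credentials are configured, so it is defined but not called here.

```python
import shutil


def disk_used_percent(path="/"):
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def build_disk_metric(cluster, instance_id, used_percent, high_resolution=False):
    """Build one MetricData entry for CloudWatch PutMetricData."""
    return {
        "MetricName": "DiskUsedPercent",  # hypothetical metric name
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "InstanceId", "Value": instance_id},
        ],
        "Value": used_percent,
        "Unit": "Percent",
        # 1 = high-resolution metric, 60 = standard resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }


def publish(metric, namespace="Custom/ECS"):
    """Send the metric to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the sketch runs without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=[metric]
    )


# Hypothetical cluster and instance identifiers.
metric = build_disk_metric(
    "demo-cluster", "i-0123456789abcdef0", disk_used_percent("/")
)
print(metric)
```

Running something like this from cron or a small daemon on each container instance is one option; the talk does not say which mechanism was used.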
Monitoring with Datadog
Monitoring with Sysdig
Logging
• Call volume is up!
• Error rates spiked!
• What’s going on?
Logging
• Send logs to a logging framework
• ECS supports multiple log drivers
• Configure in task definition
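To make "configure in task definition" concrete, the sketch below builds a container definition that uses the `awslogs` log driver and prints the resulting task definition as JSON. The family, image, log group, region, and prefix values are placeholders. The `logConfiguration` keys (`awslogs-group`, `awslogs-region`, `awslogs-stream-prefix`) are the documented options for this driver.

```python
import json


def container_with_awslogs(name, image, log_group, region, stream_prefix):
    """Container definition that ships stdout/stderr to CloudWatch Logs."""
    return {
        "name": name,
        "image": image,
        "essential": True,
        "memory": 256,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                "awslogs-stream-prefix": stream_prefix,
            },
        },
    }


# Placeholder names; substitute your own service, image, and log group.
task_definition = {
    "family": "web",
    "containerDefinitions": [
        container_with_awslogs(
            "web", "nginx:latest", "/ecs/web", "us-west-2", "web"
        )
    ],
}
print(json.dumps(task_definition, indent=2))
```

The printed JSON could be saved to a file and registered with `aws ecs register-task-definition --cli-input-json file://...`.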
ECS Supported Log Drivers
AWSLogs Driver
Amazon CloudWatch Logs
Alarms with Amazon CloudWatch
• How do you know something’s wrong?
• Page yourself!
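A possible shape for such an alarm is sketched below: it builds the arguments for CloudWatch's `PutMetricAlarm` on the `AWS/ECS` `CPUUtilization` metric of one service. The threshold, evaluation window, names, and the SNS topic ARN (assumed to be wired to a pager) are illustrative choices, not values from the talk.

```python
def build_cpu_alarm(cluster, service, sns_topic_arn, threshold=80.0):
    """Arguments for CloudWatch PutMetricAlarm on an ECS service's CPU."""
    return {
        "AlarmName": f"{cluster}-{service}-cpu-high",
        "AlarmDescription": f"CPU above {threshold}% for 3 minutes",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 3,  # consecutive breaching datapoints
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Hypothetical SNS topic that forwards to an on-call pager.
        "AlarmActions": [sns_topic_arn],
    }


alarm = build_cpu_alarm(
    "demo-cluster",
    "web",
    "arn:aws:sns:us-west-2:123456789012:oncall-pager",
)
print(alarm["AlarmName"])
```

With `boto3` available, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`.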
Debugging Task Failures
• New deployment
• Service is not launching tasks
• Tasks go into PENDING and disappear
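When tasks vanish shortly after PENDING, the stopped task usually still carries a reason. The sketch below summarizes the `stoppedReason` and per-container `reason` and `exitCode` fields from an ECS `DescribeTasks` response. The sample response is hand-written for illustration. In practice it would come from something like `boto3.client("ecs").list_tasks(desiredStatus="STOPPED")` followed by `describe_tasks`.

```python
def summarize_stopped_tasks(response):
    """Flatten a DescribeTasks response into one row per container."""
    rows = []
    for task in response.get("tasks", []):
        task_id = task["taskArn"].rsplit("/", 1)[-1]
        for container in task.get("containers", []):
            rows.append(
                {
                    "task": task_id,
                    "stopped_reason": task.get("stoppedReason", ""),
                    "container": container.get("name", ""),
                    "exit_code": container.get("exitCode"),
                    "container_reason": container.get("reason", ""),
                }
            )
    return rows


# Hand-written sample in the shape of a DescribeTasks response.
sample_response = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:us-west-2:123456789012:task/abc123",
            "lastStatus": "STOPPED",
            "stoppedReason": "Essential container in task exited",
            "containers": [
                {"name": "web", "exitCode": 1, "reason": ""},
            ],
        }
    ]
}

rows = summarize_stopped_tasks(sample_response)
for row in rows:
    print(row)
```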
Debugging Service Failures
Instance Logs
• Instance, ecs-agent and Docker daemon logs in CloudWatch
• Install and configure CloudWatch Logs agent
Trends
• Billions of log lines
• Extract trends and patterns
• Search and analytics tools
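At small scale, the idea of extracting trends from log lines can be sketched in a few lines of Python: count errors per minute, then flag minutes that stand out. The log format (ISO timestamp, level, message) and the "three times the median" rule are assumptions for illustration. At billions of lines this work belongs in the search and analytics tools the slide mentions.

```python
import re
from collections import Counter

# Assumed line format: "<ISO-8601 timestamp> <LEVEL> <message>".
LINE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\S*\s+(?P<level>[A-Z]+)\s"
)


def errors_per_minute(lines):
    """Count ERROR lines grouped by minute."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("minute")] += 1
    return counts


def spikes(counts, factor=3.0):
    """Minutes whose error count exceeds `factor` times the median."""
    if not counts:
        return []
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    return sorted(m for m, n in counts.items() if n > factor * median)


# Hypothetical log lines: a quiet period followed by a burst of errors.
sample = (
    ["2017-11-28T02:00:05Z ERROR upstream timeout"]
    + ["2017-11-28T02:00:09Z INFO request served"]
    + ["2017-11-28T02:01:12Z ERROR upstream timeout"]
    + ["2017-11-28T02:02:%02dZ ERROR upstream timeout" % s for s in range(5)]
)

counts = errors_per_minute(sample)
print(dict(counts))
print(spikes(counts))
```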
[Diagram: log pipeline from Amazon ECS through Amazon CloudWatch Logs to Amazon S3 (store), Amazon Kinesis (stream), AWS Lambda (process), and Amazon Elasticsearch Service (search)]
Lessons learned
• Metrics and logs are critical
• Monitor at all levels
• Log and alarm
• Use tools to aggregate and visualize data and pinpoint anomalies
SEGMENT
Calvin French-Owen, Co-Founder
It’s 2 a.m., I’m getting paged…
…now what?
Segment by the numbers
- 140 billion monthly events
- 160k events/second peak
- 16,000 containers
- 350 Amazon Container Service (ECS)
Three steps to the (debugging) epiphany
1. Build your mental model
2. Dig in
3. Lean into the cloud
1. Build your mental model
Step 1: Build your mental model
- How does the system fit together?
- How is my service configured?
- What should I even be looking at?
Step 1: Build your mental model
- How does the system fit together? specs
- How is my service configured?
- What should I even be looking at?
Running specs
$ docker run segment/specs
https://github.com/segmentio/specs
Step 1: Build your mental model
- How does the system fit together? specs
- How is my service configured? specs and terraform
- What should I even be looking at?
A terraform ECS service
Terraform service—under the hood
Step 1: Build your mental model
- How does the system fit together? specs
- How is my service configured? specs and terraform
- What should I even be looking at? Amazon CloudWatch and Datadog
2. Dig in
Step 2: Dig in
• Stats?
• Logging?
• Tracing?
Step 2: Dig in
• Stats? Amazon CloudWatch and statsd/veneur
• Logging?
• Tracing?
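For the stats path, services typically emit StatsD-style datagrams over UDP to a local aggregator such as veneur or the Datadog agent. The sketch below formats a counter in the DogStatsD wire format (`name:value|type|#tag:value,...`) and sends it fire-and-forget. The metric name, tags, and the `127.0.0.1:8125` address are assumptions; the port your aggregator listens on may differ.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Render one metric in DogStatsD wire format."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one datagram; failures are ignored so metrics never break the app."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("utf-8"), (host, port))
    except OSError:
        pass  # no aggregator listening, or no network: drop the metric


# Hypothetical counter for one served request.
line = format_metric("api.requests", 1, "c", {"service": "api", "status": "200"})
send_metric(line)
print(line)
```

UDP keeps instrumentation cheap: the service does not block or fail when the aggregator is down, at the cost of occasionally losing datapoints.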
[Diagram: services → veneur and datadog-agent → api.datadog.com and CloudWatch]
Step 2: Dig in
• Stats? Amazon CloudWatch and statsd/veneur
• Logging? CloudWatch, ECS-logs, cwlogs
• Tracing?
CloudWatch Logs
journald.conf
…
RateLimitIntervalSec=1m
RateLimitBurst=200000
…
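The two settings above can be read as "allow at most `RateLimitBurst` messages per `RateLimitIntervalSec`, and drop the rest until the interval ends". The sketch below is a simplified model of that behaviour, not journald's actual implementation, which applies the limit per service and has further details. The class name and the small demo numbers are made up for illustration.

```python
class IntervalBurstLimiter:
    """Simplified model: at most `burst` messages per `interval_sec` window."""

    def __init__(self, interval_sec, burst):
        self.interval_sec = interval_sec
        self.burst = burst
        self.window_start = None
        self.count = 0
        self.dropped = 0

    def allow(self, now):
        """Return True if a message arriving at time `now` is accepted."""
        if self.window_start is None or now - self.window_start >= self.interval_sec:
            # Start a fresh window and reset the accepted-message counter.
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        self.dropped += 1
        return False


# Demo with a tiny burst so the effect is visible: 3 messages per 60 seconds.
limiter = IntervalBurstLimiter(interval_sec=60, burst=3)
results = [limiter.allow(t) for t in (0, 1, 2, 3, 60)]
print(results, "dropped:", limiter.dropped)
```

Under this model, a noisy container that exceeds the burst loses log lines until the next interval starts, which is the scenario a raised `RateLimitBurst` or a rate-limiting proxy in front of the logger is meant to address.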
Logging
• github.com/segmentio/ecs-logs
• github.com/segmentio/ecs-logs-go
• github.com/segmentio/ecs-logs-js
• github.com/segmentio/rate-limiting-log-proxy
• github.com/segmentio/cwlogs
Step 2: Dig in
• Stats? Amazon CloudWatch and statsd/veneur
• Logging? CloudWatch, ECS-logs, cwlogs
• Tracing? BCC, pprof-server
BCC
• BPF Compiler Collection
• No kernel modules, no instrumentation
• A lot of very useful tools
• https://github.com/iovisor/bcc
Pprof-server
• pprof is automatically exposed by the Go runtime
• Gives you profiling, heatmaps, memory dumps, and more
• Nice visualizations, one URL click away
• Server exposed by Consul
• github.com/segmentio/pprof-server
Pprof-server
$ docker run -it --rm -p 6061:6061 \
    segment/pprof-server \
    -registry consul://172.17.0.1:8500
github.com/segmentio/pprof-server
3. Lean into the cloud
Step 3: Lean into the cloud
- Cattle, not pets
- Auto-scale and pre-scale
Cattle, not pets
- Reproducible machine images
- Built with Packer
- Run via systems
- Out-of-the-box autoscaling
- Created with Terraform
- github.com/segmentio/roll-instances
Auto Scaling everywhere
- Comes default with any service
- Tuned for systems like Amazon DynamoDB
- No autoscaling == not ready for prod
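A default scaling policy for an ECS service could look like the sketch below, which builds the arguments for Application Auto Scaling's `RegisterScalableTarget` and `PutScalingPolicy` calls using target tracking on average CPU. The cluster and service names, capacity bounds, target value, and cooldowns are placeholder assumptions. The talk describes Segment's setup as Terraform-based, so this illustrates the underlying API shape rather than their configuration.

```python
def build_service_autoscaling(cluster, service, min_tasks, max_tasks, target_cpu=60.0):
    """Arguments for Application Auto Scaling on an ECS service's task count."""
    resource_id = f"service/{cluster}/{service}"
    scalable_target = {
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }
    scaling_policy = {
        "PolicyName": f"{service}-cpu-target-tracking",
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,  # keep average CPU near this percent
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
            "ScaleOutCooldown": 60,   # seconds; illustrative values
            "ScaleInCooldown": 300,
        },
    }
    return scalable_target, scaling_policy


# Placeholder cluster and service names.
target, policy = build_service_autoscaling("demo-cluster", "web", 2, 20)
print(target["ResourceId"], policy["PolicyType"])
```

With `boto3`, the two dictionaries map onto `register_scalable_target(**target)` and `put_scaling_policy(**policy)` on the `application-autoscaling` client.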
Lessons learned
- Build tools to surface actionable information first
- Auto-scaling is a huge win
- Give developers alerting + scaling policies out of the box
- Passive tools are easy to build adoption around
Thank you!
C L I C K T O A D D T E X T
C L I C K T O A D D T E X T

More Related Content

What's hot

Serverless DevOps to the Rescue - SRV330 - re:Invent 2017
Serverless DevOps to the Rescue - SRV330 - re:Invent 2017Serverless DevOps to the Rescue - SRV330 - re:Invent 2017
Serverless DevOps to the Rescue - SRV330 - re:Invent 2017Amazon Web Services
 
ENT212-An Overview of Best Practices for Large-Scale Migrations
ENT212-An Overview of Best Practices for Large-Scale MigrationsENT212-An Overview of Best Practices for Large-Scale Migrations
ENT212-An Overview of Best Practices for Large-Scale MigrationsAmazon Web Services
 
Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017
Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017
Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017Amazon Web Services
 
MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...
MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...
MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...Amazon Web Services
 
CON318_Interstella 8888 Monolith to Microservices with Amazon ECS
CON318_Interstella 8888 Monolith to Microservices with Amazon ECSCON318_Interstella 8888 Monolith to Microservices with Amazon ECS
CON318_Interstella 8888 Monolith to Microservices with Amazon ECSAmazon Web Services
 
MCL306_Making IoT Smarter with AWS Rekognition.pdf
MCL306_Making IoT Smarter with AWS Rekognition.pdfMCL306_Making IoT Smarter with AWS Rekognition.pdf
MCL306_Making IoT Smarter with AWS Rekognition.pdfAmazon Web Services
 
What's New in Serverless - SRV305 - re:Invent 2017
What's New in Serverless - SRV305 - re:Invent 2017What's New in Serverless - SRV305 - re:Invent 2017
What's New in Serverless - SRV305 - re:Invent 2017Amazon Web Services
 
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Amazon Web Services
 
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web ServicesCMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web ServicesAmazon Web Services
 
NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017
NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017
NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017Amazon Web Services
 
Building Best Practices and the Right Foundation for your 1st Production Work...
Building Best Practices and the Right Foundation for your 1st Production Work...Building Best Practices and the Right Foundation for your 1st Production Work...
Building Best Practices and the Right Foundation for your 1st Production Work...Amazon Web Services
 
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...Amazon Web Services
 
IOT308-One Message to a Million Things Done in 60 seconds with AWS IoT
IOT308-One Message to a Million Things Done in 60 seconds with AWS IoTIOT308-One Message to a Million Things Done in 60 seconds with AWS IoT
IOT308-One Message to a Million Things Done in 60 seconds with AWS IoTAmazon Web Services
 
GPSTEC306-Continuous Compliance for Healthcare and Life Sciences
GPSTEC306-Continuous Compliance for Healthcare and Life SciencesGPSTEC306-Continuous Compliance for Healthcare and Life Sciences
GPSTEC306-Continuous Compliance for Healthcare and Life SciencesAmazon Web Services
 
MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...
MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...
MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...Amazon Web Services
 
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdfDEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdfAmazon Web Services
 
GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...
GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...
GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...Amazon Web Services
 
SRV310_Designing Microservices with Serverless
SRV310_Designing Microservices with ServerlessSRV310_Designing Microservices with Serverless
SRV310_Designing Microservices with ServerlessAmazon Web Services
 

What's hot (20)

Serverless DevOps to the Rescue - SRV330 - re:Invent 2017
Serverless DevOps to the Rescue - SRV330 - re:Invent 2017Serverless DevOps to the Rescue - SRV330 - re:Invent 2017
Serverless DevOps to the Rescue - SRV330 - re:Invent 2017
 
ENT212-An Overview of Best Practices for Large-Scale Migrations
ENT212-An Overview of Best Practices for Large-Scale MigrationsENT212-An Overview of Best Practices for Large-Scale Migrations
ENT212-An Overview of Best Practices for Large-Scale Migrations
 
Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017
Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017
Deep Dive: AWS Direct Connect and VPNs - NET403 - re:Invent 2017
 
MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...
MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...
MBL209_Learn How MicroStrategy on AWS is Helping Vivint Solar Deliver Clean E...
 
SID402_An AWS Security Odyssey
SID402_An AWS Security OdysseySID402_An AWS Security Odyssey
SID402_An AWS Security Odyssey
 
CON318_Interstella 8888 Monolith to Microservices with Amazon ECS
CON318_Interstella 8888 Monolith to Microservices with Amazon ECSCON318_Interstella 8888 Monolith to Microservices with Amazon ECS
CON318_Interstella 8888 Monolith to Microservices with Amazon ECS
 
MCL306_Making IoT Smarter with AWS Rekognition.pdf
MCL306_Making IoT Smarter with AWS Rekognition.pdfMCL306_Making IoT Smarter with AWS Rekognition.pdf
MCL306_Making IoT Smarter with AWS Rekognition.pdf
 
What's New in Serverless - SRV305 - re:Invent 2017
What's New in Serverless - SRV305 - re:Invent 2017What's New in Serverless - SRV305 - re:Invent 2017
What's New in Serverless - SRV305 - re:Invent 2017
 
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
 
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web ServicesCMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
 
NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017
NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017
NEW LAUNCH! AWS IoT Device Management - IOT330 - re:Invent 2017
 
Building Best Practices and the Right Foundation for your 1st Production Work...
Building Best Practices and the Right Foundation for your 1st Production Work...Building Best Practices and the Right Foundation for your 1st Production Work...
Building Best Practices and the Right Foundation for your 1st Production Work...
 
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
 
IOT308-One Message to a Million Things Done in 60 seconds with AWS IoT
IOT308-One Message to a Million Things Done in 60 seconds with AWS IoTIOT308-One Message to a Million Things Done in 60 seconds with AWS IoT
IOT308-One Message to a Million Things Done in 60 seconds with AWS IoT
 
Introducing Amazon EKS
Introducing Amazon EKSIntroducing Amazon EKS
Introducing Amazon EKS
 
GPSTEC306-Continuous Compliance for Healthcare and Life Sciences
GPSTEC306-Continuous Compliance for Healthcare and Life SciencesGPSTEC306-Continuous Compliance for Healthcare and Life Sciences
GPSTEC306-Continuous Compliance for Healthcare and Life Sciences
 
MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...
MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...
MBL309_User Engagement, Messaging, and Analytics Using Amazon Pinpoint from A...
 
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdfDEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
 
GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...
GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...
GPSWKS404-GPS Game Changing C2S Services To Transform Your Customers Speed To...
 
SRV310_Designing Microservices with Serverless
SRV310_Designing Microservices with ServerlessSRV310_Designing Microservices with Serverless
SRV310_Designing Microservices with Serverless
 

Similar to CON320_Monitoring, Logging and Debugging Containerized Services

re:Invent CON320 Tracing and Debugging for Containerized Services
re:Invent CON320 Tracing and Debugging for Containerized Servicesre:Invent CON320 Tracing and Debugging for Containerized Services
re:Invent CON320 Tracing and Debugging for Containerized ServicesCalvin French-Owen
 
SID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamSID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamAmazon Web Services
 
Serverless: State of The Union I AWS Dev Day 2018
Serverless: State of The Union I AWS Dev Day 2018Serverless: State of The Union I AWS Dev Day 2018
Serverless: State of The Union I AWS Dev Day 2018AWS Germany
 
Introduction to the Serverless Cloud
Introduction to the Serverless CloudIntroduction to the Serverless Cloud
Introduction to the Serverless CloudAmazon Web Services
 
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Amazon Web Services
 
How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017
How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017
How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017Amazon Web Services
 
Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...
Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...
Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...Amazon Web Services
 
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...Amazon Web Services
 
SRV210 Improving Microservice and Serverless Observability with Monitoring Data
SRV210 Improving Microservice and Serverless Observability with Monitoring DataSRV210 Improving Microservice and Serverless Observability with Monitoring Data
SRV210 Improving Microservice and Serverless Observability with Monitoring DataNew Relic
 
FSV305-Optimizing Payments Collections with Containers and Machine Learning
FSV305-Optimizing Payments Collections with Containers and Machine LearningFSV305-Optimizing Payments Collections with Containers and Machine Learning
FSV305-Optimizing Payments Collections with Containers and Machine LearningAmazon Web Services
 
ARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersAmazon Web Services
 
Improving Microservice and Serverless Observability with Monitoring Data - SR...
Improving Microservice and Serverless Observability with Monitoring Data - SR...Improving Microservice and Serverless Observability with Monitoring Data - SR...
Improving Microservice and Serverless Observability with Monitoring Data - SR...Amazon Web Services
 
AWS re:Invent 2017 | CloudHealth Tech Session
AWS re:Invent 2017 |  CloudHealth Tech SessionAWS re:Invent 2017 |  CloudHealth Tech Session
AWS re:Invent 2017 | CloudHealth Tech SessionCloudHealth by VMware
 
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...Amazon Web Services
 
Leveraging a Cloud Policy Framework - From Zero to Well Governed - ENT318 - r...
Leveraging a Cloud Policy Framework - From Zero to Well Governed - ENT318 - r...Leveraging a Cloud Policy Framework - From Zero to Well Governed - ENT318 - r...
Leveraging a Cloud Policy Framework - From Zero to Well Governed - ENT318 - r...Amazon Web Services
 
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_SingaporeAmazon Web Services
 
DEV206_Life of a Code Change to a Tier 1 Service
DEV206_Life of a Code Change to a Tier 1 ServiceDEV206_Life of a Code Change to a Tier 1 Service
DEV206_Life of a Code Change to a Tier 1 ServiceAmazon Web Services
 

Similar to CON320_Monitoring, Logging and Debugging Containerized Services (20)

re:Invent CON320 Tracing and Debugging for Containerized Services
re:Invent CON320 Tracing and Debugging for Containerized Servicesre:Invent CON320 Tracing and Debugging for Containerized Services
re:Invent CON320 Tracing and Debugging for Containerized Services
 
SID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamSID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security Team
 
Serverless: State of The Union I AWS Dev Day 2018
Serverless: State of The Union I AWS Dev Day 2018Serverless: State of The Union I AWS Dev Day 2018
Serverless: State of The Union I AWS Dev Day 2018
 
Introduction to the Serverless Cloud
Introduction to the Serverless CloudIntroduction to the Serverless Cloud
Introduction to the Serverless Cloud
 
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
 
How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017
How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017
How Amazon.com Uses AWS Management Tools - DEV340 - re:Invent 2017
 
Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...
Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...
Cache Me If You Can Minimizing Latency While Optimizing Cost Through Advanced...
 
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
 
Serverless - State of the Union
Serverless - State of the UnionServerless - State of the Union
Serverless - State of the Union
 
SRV210 Improving Microservice and Serverless Observability with Monitoring Data
SRV210 Improving Microservice and Serverless Observability with Monitoring DataSRV210 Improving Microservice and Serverless Observability with Monitoring Data
SRV210 Improving Microservice and Serverless Observability with Monitoring Data
 
FSV305-Optimizing Payments Collections with Containers and Machine Learning
CON320_Monitoring, Logging and Debugging Containerized Services

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent: Monitoring, Logging and Debugging Containerized Services. Nare Hayrapetyan, Senior Software Engineer, AWS; Calvin French-Owen, Co-founder, Segment. CON320, November 28, 2017.
  • 2. Monitoring, Logging, and Debugging
      • Microservices are hard to monitor and debug
      • Many interacting components: load balancers, instances, clusters, tasks
      • Containers are transient
      • Need visibility into all parts
  • 3. Monitoring, Logging, and Debugging
      • Can customers access my service?
      • If not, what is failing?
      • What trends and anomalies emerge?
  • 4. Key Metrics: Infrastructure
      • CPU, memory
      • Load balancers
      • Disk space
  • 5. Key Metrics: Application
      • Error rates
      • Request volume
      • Request latencies
  • 6. Monitoring with Amazon CloudWatch
      • ECS metrics are sent to CloudWatch
      • ECS agent version >= 1.4.0
      • CPU and memory reservation and utilization
  • 7. Monitoring with Amazon CloudWatch
  • 8. Monitoring with Amazon CloudWatch
      • ECS provides 1-minute metrics
      • High-resolution metrics: 5 seconds
      • Custom metrics: disk space
  • 9. Monitoring with Amazon CloudWatch
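A custom metric such as disk space has to be published by something running on the instance. The sketch below is one possible way to do that, not the presenters' implementation: it measures disk usage with the standard library and builds arguments shaped for CloudWatch's PutMetricData API. The namespace `Custom/ECS`, the metric name `DiskUsedPercent`, and the cluster and instance identifiers are assumptions made for illustration.

```python
import shutil
from datetime import datetime, timezone


def disk_used_percent(path="/"):
    """Return the percentage of space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def build_disk_metric(cluster, instance_id, used_percent, high_resolution=True):
    """Build keyword arguments shaped for CloudWatch's PutMetricData API."""
    return {
        "Namespace": "Custom/ECS",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "DiskUsedPercent",  # hypothetical metric name
                "Dimensions": [
                    {"Name": "ClusterName", "Value": cluster},
                    {"Name": "InstanceId", "Value": instance_id},
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": round(used_percent, 2),
                "Unit": "Percent",
                # 1 marks a high-resolution metric; 60 is standard resolution.
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


# Placeholder cluster and instance identifiers.
payload = build_disk_metric("demo-cluster", "i-0123456789abcdef0", disk_used_percent("/"))
print(payload["MetricData"][0]["MetricName"], payload["MetricData"][0]["Value"])
```

With boto3 installed and credentials configured, the payload could be sent with `boto3.client("cloudwatch").put_metric_data(**payload)`, for example from a cron job or a small daemon task on each instance.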
  • 10. Monitoring with Datadog
  • 11. Monitoring with Sysdig
  • 12. Logging
      • Call volume is up!
      • Error rates spiked!
      • What’s going on?
  • 13. Logging
      • Send logs to a logging framework
      • ECS supports multiple log drivers
      • Configure in the task definition
  • 14. ECS Supported Log Drivers
  • 15. AWSLogs Driver
  • 16. Amazon CloudWatch Logs
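The log driver is set per container in the task definition. As a rough sketch of what that looks like for the awslogs driver, the snippet below assembles a task definition as a Python dictionary and prints it as JSON. The family name, image, memory value, log group, and region are placeholders, not values from the talk.

```python
import json


def awslogs_container(name, image, log_group, region, stream_prefix):
    """Return a container definition that sends stdout/stderr to CloudWatch Logs."""
    return {
        "name": name,
        "image": image,
        "essential": True,
        "memory": 256,  # placeholder memory limit in MiB
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                "awslogs-stream-prefix": stream_prefix,
            },
        },
    }


task_definition = {
    "family": "web",  # hypothetical task family
    "containerDefinitions": [
        awslogs_container("web", "nginx:latest", "/ecs/web", "us-west-2", "web")
    ],
}

print(json.dumps(task_definition, indent=2))
```

When a stream prefix is set, each task's log stream is named `prefix/container-name/ecs-task-id`, which makes it possible to trace a log line back to the task that produced it even after the container is gone.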
  • 17. Alarms with Amazon CloudWatch
      • How do you know something’s wrong?
      • Page yourself!
  • 18. Alarms with Amazon CloudWatch
  • 19. Alarms with Amazon CloudWatch
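"Page yourself" usually means an alarm whose action is a notification topic wired to a pager. The following is a minimal sketch, assuming an existing SNS topic, of arguments shaped for CloudWatch's PutMetricAlarm API on the ECS service CPU metric. The threshold, evaluation window, alarm name format, and ARN are illustrative choices, not settings shown in the slides.

```python
def build_cpu_alarm(cluster, service, sns_topic_arn, threshold=80.0):
    """Build keyword arguments shaped for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"{cluster}-{service}-high-cpu",  # hypothetical naming scheme
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,             # seconds per datapoint
        "EvaluationPeriods": 3,   # three consecutive breaching datapoints
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # topic that notifies the on-call engineer
    }


alarm = build_cpu_alarm(
    "demo-cluster",
    "web",
    "arn:aws:sns:us-west-2:123456789012:oncall",  # placeholder ARN
)
print(alarm["AlarmName"])
```

With boto3 available, `boto3.client("cloudwatch").put_metric_alarm(**alarm)` would create or update the alarm.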
  • 20. Debugging Task Failures
      • New deployment
      • Service is not launching tasks
      • Tasks go into PENDING and disappear
  • 21. Debugging Service Failures
  • 22. Debugging Task Failures
  • 23. Debugging Task Failures
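When tasks go to PENDING and then vanish, the stopped task usually still carries a reason. The sketch below summarizes a response shaped like the output of the ECS DescribeTasks API, reading the `stoppedReason` field on each task and the `reason` or `exitCode` of each container. The sample response is made up for illustration.

```python
def summarize_stopped_tasks(response):
    """Return one readable line per stopped task and per container in it."""
    lines = []
    for task in response.get("tasks", []):
        if task.get("lastStatus") != "STOPPED":
            continue
        task_id = task["taskArn"].rsplit("/", 1)[-1]
        lines.append(f"{task_id}: {task.get('stoppedReason', 'unknown')}")
        for container in task.get("containers", []):
            # Prefer the agent-reported reason; fall back to the exit code.
            detail = container.get("reason") or f"exit code {container.get('exitCode')}"
            lines.append(f"  {container['name']}: {detail}")
    return lines


# Hypothetical DescribeTasks-style response.
sample = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:us-west-2:123456789012:task/abc123",
            "lastStatus": "STOPPED",
            "stoppedReason": "Essential container in task exited",
            "containers": [{"name": "web", "exitCode": 1}],
        },
        {
            "taskArn": "arn:aws:ecs:us-west-2:123456789012:task/def456",
            "lastStatus": "RUNNING",
            "containers": [],
        },
    ]
}

for line in summarize_stopped_tasks(sample):
    print(line)
```

A real response could come from `boto3.client("ecs").describe_tasks(cluster=..., tasks=[...])`, with the task ARNs taken from `list_tasks(..., desiredStatus="STOPPED")`.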
  • 24. Instance Logs
      • Instance, ecs-agent, and Docker daemon logs in CloudWatch
      • Install and configure the CloudWatch Logs agent
  • 25. Instance Logs
  • 26. Trends
      • Billions of log lines
      • Extract trends and patterns
      • Search and analytics tools
  • 27. Trends
      [Diagram: Amazon ECS, Amazon CloudWatch Logs, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon Elasticsearch Service; stages: Store, Stream, Process, Search]
  • 28. Trends
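The "Process" stage of such a pipeline can be a Lambda function reading from Kinesis. The sketch below assumes the stream is fed by a CloudWatch Logs subscription, in which case each Kinesis record holds a base64-encoded, gzip-compressed JSON document with `logGroup`, `logStream`, and `logEvents` fields. The exact wiring used in the talk is not recoverable from the slides, so treat this as one plausible arrangement; indexing into a search cluster is left out.

```python
import base64
import gzip
import json


def decode_log_record(kinesis_data_b64):
    """Decode one Kinesis record carrying a CloudWatch Logs subscription payload."""
    payload = json.loads(gzip.decompress(base64.b64decode(kinesis_data_b64)))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []  # control messages carry no log events
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp": event["timestamp"],
            "message": event["message"],
        }
        for event in payload["logEvents"]
    ]


def handler(event, context):
    """Lambda entry point for a Kinesis event source."""
    docs = []
    for record in event["Records"]:
        docs.extend(decode_log_record(record["kinesis"]["data"]))
    # A real function would forward `docs` to a search index here.
    return {"decoded": len(docs)}
```

Filtering on `messageType` matters because CloudWatch Logs also sends control messages to check that the destination is reachable.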
  • 29. Lessons learned
      • Metrics and logs are critical
      • Monitor at all levels
      • Log and alarm
      • Use tools to aggregate and visualize data and pinpoint anomalies
  • 30. Segment: Calvin French-Owen, Co-Founder
  • 32. It’s 2 a.m., I’m getting paged…
  • 33. It’s 2 a.m., I’m getting paged… …now what?
  • 34. Segment by the numbers
      - 140 billion monthly events
      - 160k events/second peak
      - 16,000 containers
      - 350 Amazon Container Service (ECS)
  • 35. Three steps to the (debugging) epiphany
      1. Build your mental model
      2. Dig in
      3. Lean into the cloud
  • 36. 1. Build your mental model
  • 37. Step 1: Build your mental model
      - How does the system fit together?
      - How is my service configured?
      - What should I even be looking at?
  • 38. Step 1: Build your mental model
      - How does the system fit together? → specs
      - How is my service configured?
      - What should I even be looking at?
  • 42. Running specs
      $ docker run segment/specs
      https://github.com/segmentio/specs
  • 43. Step 1: Build your mental model
      - How does the system fit together? → specs
      - How is my service configured? → specs and Terraform
      - What should I even be looking at?
  • 44. A Terraform ECS service
  • 45. Terraform service: under the hood
  • 48. Step 1: Build your mental model
      - How does the system fit together? → specs
      - How is my service configured? → specs and Terraform
      - What should I even be looking at? → Amazon CloudWatch and Datadog
  • 51. Step 1: Build your mental model
      - How does the system fit together? → specs
      - How is my service configured? → specs and Terraform
      - What should I even be looking at? → Amazon CloudWatch and Datadog
  • 52. 2. Dig in
  • 53. Step 2: Dig in
      • Stats?
      • Logging?
      • Tracing?
  • 54. Step 2: Dig in
      • Stats? → Amazon CloudWatch and statsd/veneur
      • Logging?
      • Tracing?
  • 58. [Diagram: service (×3), veneur, datadog-agent, api.datadog.com, CloudWatch]
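Services typically emit stats to a statsd-compatible collector as small UDP datagrams. The sketch below formats a metric in the DogStatsD text format (`name:value|type|#tag:value,...`), which veneur accepts, and sends it over UDP. The metric name, tags, and the collector address `127.0.0.1:8125` are assumptions for illustration, not Segment's actual configuration.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Format one metric as a DogStatsD line, e.g. 'requests.count:1|c|#env:prod'."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort tags so the output is stable regardless of dict ordering.
        tag_str = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
        line += f"|#{tag_str}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line as a UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


line = format_metric("requests.count", 1, "c", {"service": "api", "env": "prod"})
print(line)  # requests.count:1|c|#env:prod,service:api
# send_metric(line) would deliver it to a collector listening on the assumed address.
```

Because UDP is connectionless, a slow or missing collector does not block the service, which is why this pattern is cheap to add to every code path.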
  • 62. Step 2: Dig in
      • Stats? → Amazon CloudWatch and statsd/veneur
      • Logging? → CloudWatch, ecs-logs, cwlogs
      • Tracing?
  • 63. CloudWatch Logs
  • 64. journald.conf
      …
      RateLimitIntervalSec=1m
      RateLimitBurst=200000
      …
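These two journald settings cap how many messages a service may log per interval: once more than `RateLimitBurst` messages arrive within `RateLimitIntervalSec`, further messages are dropped until the interval ends. The sketch below illustrates that behavior with a simple fixed-window counter. It is an illustration of the semantics only, not journald's actual implementation, and the class name is made up.

```python
class WindowRateLimiter:
    """Allow at most `burst` messages per `interval_sec` window; drop the rest."""

    def __init__(self, burst, interval_sec):
        self.burst = burst
        self.interval_sec = interval_sec
        self.window_start = None
        self.count = 0
        self.dropped = 0

    def allow(self, now):
        """Return True if a message arriving at time `now` (seconds) is kept."""
        if self.window_start is None or now - self.window_start >= self.interval_sec:
            # Start a new window and reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        self.dropped += 1
        return False


# Tiny numbers for readability; the slide uses a 1-minute interval and a burst of 200000.
limiter = WindowRateLimiter(burst=3, interval_sec=60)
results = [limiter.allow(t) for t in (0, 1, 2, 3, 59, 60, 61)]
print(results, limiter.dropped)
```

A high burst value trades disk and CPU for completeness: a chatty service keeps its logs during an incident, at the cost of more work for the journal.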
  • 71. Logging
      • github.com/segmentio/ecs-logs
      • github.com/segmentio/ecs-logs-go
      • github.com/segmentio/ecs-logs-js
      • github.com/segmentio/rate-limiting-log-proxy
      • github.com/segmentio/cwlogs
  • 72. Step 2: Dig in
      • Stats? → Amazon CloudWatch and statsd/veneur
      • Logging? → CloudWatch, ecs-logs, cwlogs
      • Tracing? → BCC, pprof-server
  • 73. BCC
      • BPF Compiler Collection
      • No kernel modules, no instrumentation
      • A lot of very useful tools
      • https://github.com/iovisor/bcc
  • 76. Pprof-server
      • Pprof is automatically exposed by the Go runtime
      • Gives you profiling, heatmaps, memory dumps, and more
      • Nice visualizations, one URL click away
      • Server exposed by Consul
      • github.com/segmentio/pprof-server
  • 80. Pprof-server
      $ docker run -it --rm -p 6061:6061 segment/pprof-server -registry consul://172.17.0.1:8500
      github.com/segmentio/pprof-server
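For context, a Go service that imports `net/http/pprof` serves its profiles under `/debug/pprof/` on whatever HTTP port it listens on, and a tool like pprof-server points at those endpoints. The small helper below builds such URLs. The host and port are placeholders, and the helper is only an illustration, not part of pprof-server.

```python
from urllib.parse import urlencode


def pprof_url(host, port, profile, **params):
    """Build the URL of a standard net/http/pprof endpoint on a Go service."""
    base = f"http://{host}:{port}/debug/pprof/{profile}"
    return f"{base}?{urlencode(params)}" if params else base


# Placeholder address of a Go service exposing pprof.
print(pprof_url("10.0.0.5", 6060, "profile", seconds=30))  # 30-second CPU profile
print(pprof_url("10.0.0.5", 6060, "heap"))                 # heap snapshot
print(pprof_url("10.0.0.5", 6060, "goroutine", debug=2))   # full goroutine dump
```

Any of these URLs can be passed to `go tool pprof` to fetch and inspect the profile from a workstation.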
  • 81. Step 2: Dig in
      • Stats? → Amazon CloudWatch and statsd/veneur
      • Logging? → CloudWatch, ecs-logs, cwlogs
      • Tracing? → BCC, pprof-server
  • 82. 3. Lean into the cloud
  • 83. Step 3: Lean into the cloud
      - Cattle, not pets
      - Auto-scale and pre-scale
  • 84. Cattle, not pets
      - Reproducible machine images
      - Built with Packer
      - Run via systems
      - Out-of-the-box autoscaling
      - Created with Terraform
      - github.com/segmentio/roll-instances
  • 88. Auto Scaling everywhere
      - Comes default with any service
      - Tuned for systems like Amazon DynamoDB
      - No autoscaling == not ready for prod
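One way to give every service a scaling policy by default is a target-tracking policy on the ECS service's desired count. The sketch below builds arguments shaped for the Application Auto Scaling PutScalingPolicy API. The 50% CPU target, cooldowns, and policy name are illustrative assumptions, and Segment's own policies (created with Terraform per the slides) may look different.

```python
def build_target_tracking_policy(cluster, service, target_cpu=50.0):
    """Build keyword arguments shaped for Application Auto Scaling's PutScalingPolicy."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,  # keep average CPU near this percentage
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds to wait after adding tasks
            "ScaleInCooldown": 300,   # seconds to wait after removing tasks
        },
    }


policy = build_target_tracking_policy("demo-cluster", "web")
print(policy["ResourceId"])
```

The service must first be registered as a scalable target (`register_scalable_target` with minimum and maximum task counts) before `boto3.client("application-autoscaling").put_scaling_policy(**policy)` will succeed.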
  • 90. Lessons learned
      - Build tools to surface actionable information first
      - Auto-scaling is a huge win
      - Give developers alerting + scaling policies out of the box
      - Passive tools are easy to build adoption around
  • 91. Thank you!