100% VISIBILITY
HOLISTICALLY VIEWING SYSTEMS
AMBIGUOUS CYLINDERS
PERSPECTIVE MATTERS
JASON YEE
DOCS & TALKS
TRAVEL HACKER
POKEMON TRAINER
WHISKEY HUNTER
TW: @gitbisect
EM: jyee@datadoghq.com
DATADOG
SAAS-BASED MONITORING
TRILLIONS OF POINTS/DAY
WE’RE HIRING:
jobs.datadoghq.com
TW: @datadoghq
WHERE ARE WE GETTING VISIBILITY?
WHAT IS DEVOPS?
DEVOPS IS
Culture, Automation, Metrics, Sharing
CAMS
• Culture - collaboration & learning
• Automation - accelerate tasks & reduce errors
• Measurement - know how you’re doing & improve
• Sharing - spread information
This is not DevOps
WITH CAMS THERE ARE NO VENN DIAGRAMS!
THE TRADITIONAL VIEW OF THE STACK
WE NEED TO RETHINK THE STACK
INFRASTRUCTURE VISIBILITY
The Data
• Metrics
• Logs
The Tools
• Infrastructure Monitoring
• Log Management
WHAT IS A METRIC?
VALUE-BASED DATA
METRICS
• Often combined or aggregated
• Useful for spotting trends/patterns
• Send alerts from metrics
• Help catch known unknowns
Unown Pokemon
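
As a rough illustration of the metrics bullets, the sketch below aggregates raw values into windows and alerts when an aggregated point crosses a threshold. The sample latencies, the window size, and the 300 ms threshold are all made up for the example.

```python
from statistics import mean


def aggregate(samples, window):
    """Average raw samples into fixed-size windows (one point per window)."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


def breaches(points, threshold):
    """Return the indexes of aggregated points that exceed the threshold."""
    return [i for i, value in enumerate(points) if value > threshold]


# Hypothetical response times in milliseconds, one sample per second.
latency_ms = [110, 120, 115, 125, 480, 510, 495, 505]

points = aggregate(latency_ms, window=4)   # two aggregated points
alerts = breaches(points, threshold=300)   # windows that should page someone

print(points, alerts)
```

The alert only fires for a condition someone thought to define in advance, which is why metrics are good at known unknowns.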
LOGS
• Event-based
• Easy to read for humans
• Well structured & easy to parse/grep for computers
• Ideally verbose & contain a lot of information
• Useful for finding details of an event
• Help catch unknown unknowns
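
One common way to get logs that are both readable and easy to parse is to write one JSON object per event. This is a minimal sketch of that idea, not something the slides prescribe; the event and field names are hypothetical.

```python
import json


def format_event(event, **fields):
    """Render one event as a single JSON line: readable and easy to parse."""
    return json.dumps({"event": event, **fields}, sort_keys=True)


def find_events(lines, **match):
    """Parse JSON log lines and keep the events whose fields all match."""
    parsed = (json.loads(line) for line in lines)
    return [e for e in parsed if all(e.get(k) == v for k, v in match.items())]


# Hypothetical events; verbose fields make later investigation easier.
log_lines = [
    format_event("login", user="alice", status="ok"),
    format_event("login", user="bob", status="failed", reason="bad_hash"),
]

failed = find_events(log_lines, status="failed")
print(failed)
```

Because every field is kept, the same lines can later answer questions nobody planned for when the log statement was written.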
BACKEND VISIBILITY
The Data
• Metrics
• Logs
• Traces
The Tools
• Application Monitoring
• Log Management
• APM
TRACES
• Request-based
• Follow activity from a request across function and service calls
• Useful for following code to answer “Where?” and “How long?”
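
To make "Where?" and "How long?" concrete, here is a toy tracer that records a timed span for each step of one request. It is only a sketch: real APM libraries do far more, and the span names and sleep times are invented stand-ins for real work.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects (name, depth, seconds) spans for a single request."""

    def __init__(self):
        self.spans = []
        self._depth = 0

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            # Depth 0 is the request itself; deeper spans are nested calls.
            self.spans.append((name, self._depth, time.perf_counter() - start))


def handle_request(trace):
    # Hypothetical request: the sleeps stand in for real work.
    with trace.span("GET /puppies"):
        with trace.span("db.query"):
            time.sleep(0.01)
        with trace.span("image_service.resize"):
            time.sleep(0.03)


trace = Trace()
handle_request(trace)

# "Where?" is the span name, "How long?" is its duration.
slowest = max((s for s in trace.spans if s[1] > 0), key=lambda s: s[2])
print(slowest[0], round(slowest[2], 3))
```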
FRONTEND VISIBILITY
The Data
• Metrics
The Tools
• Real-User Monitoring (RUM)
• Synthetics
PEOPLE & ROBOTS
• RUM & Synthetics work best together
• RUM provides insight into how users actually use a product
• Synthetics operate independently of users
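
A synthetic check is essentially a scripted request that runs on a schedule whether or not any user is online. The sketch below shows the shape of one; `fake_fetch` and the `cdn.example.com` URLs are placeholders standing in for a real HTTP client and real assets.

```python
import time


def synthetic_check(fetch, url, max_seconds):
    """Run one scripted request and report whether it passed."""
    start = time.perf_counter()
    try:
        status = fetch(url)
    except Exception as exc:
        return {"url": url, "ok": False, "error": str(exc)}
    elapsed = time.perf_counter() - start
    ok = status == 200 and elapsed <= max_seconds
    return {"url": url, "ok": ok, "status": status, "seconds": elapsed}


def fake_fetch(url):
    """Stand-in for a real HTTP client; returns a status code."""
    if url.endswith("/missing.jpg"):
        raise ConnectionError("CDN unreachable")
    return 200


good = synthetic_check(fake_fetch, "https://cdn.example.com/puppy.jpg", max_seconds=1.0)
bad = synthetic_check(fake_fetch, "https://cdn.example.com/missing.jpg", max_seconds=1.0)
print(good["ok"], bad["ok"])
```

Passing the fetch function in as an argument keeps the check testable without a network connection.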
WHAT’S IT ALL MEAN?
DATE-A-DOG
TINDER FOR PUPS
THIS APP IS GREAT!
WHO’S A GOOD BOY?!?
I GOTTA TELL MY FRIENDS ABOUT THIS APP!
THEY’RE SO CUTE!!!
AND MY FRIENDS ARE GONNA TELL THEIR FRIENDS…
AAAWWWWWWW!!!
WHAT JUST HAPPENED?!?
WHERE’D THE PUPPIES GO?
HOW DO WE KNOW SOMETHING WENT WRONG?
USERS ARE HAVING A HORRIBLE EXPERIENCE
REAL-USER MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
SYNTHETICS
HOW DO WE KNOW SOMETHING WENT WRONG?
SCENARIO: THIRD-PARTY CDN OUTAGE
We host puppy photos on Fastly & the app pulls directly from the Fastly CDN. Fastly suffers a massive DDoS attack.
• RUM & Synthetics: Will alert and can show which assets are slow or are not being served.
• APM, Application and Infrastructure Monitoring: No alerts. Everything is fine!
TRACING (APM)
HOW DO WE KNOW SOMETHING WENT WRONG?
HOW DO WE KNOW WHAT WENT WRONG?
SCENARIO: SERVICE OUTAGE
We use an image resizing/optimizing service that resizes images asynchronously. It has issues. Images are returned slowly.
• RUM & Synthetics: Might alert, but cannot show where the problem is.
• Application & Infrastructure Monitoring: Everything is fine!
• APM: Can alert on latency and show where in the code you are making the API calls.
APPLICATION MONITORING
HOW DO WE KNOW?
SCENARIO: DEV DEPLOYS BAD CODE
Developer accidentally deploys code that improperly checks password hashes, so all user logins fail.
• RUM & Synthetics, APM: No alerts. Everything is fine!
• Infrastructure Monitoring: No alerts. Everything is fine!
• Application Monitoring: Will alert on the impact to custom metrics and can help identify why.
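
The custom metric in this scenario could be as simple as a login success/failure counter with an alert on the failure rate. The sketch below assumes a hypothetical `LoginMetrics` class, and the thresholds are arbitrary choices for illustration.

```python
from collections import Counter


class LoginMetrics:
    """Hypothetical custom metric: counts login outcomes."""

    def __init__(self):
        self.counts = Counter()

    def record(self, success):
        self.counts["success" if success else "failure"] += 1

    def total(self):
        return sum(self.counts.values())

    def failure_rate(self):
        total = self.total()
        return self.counts["failure"] / total if total else 0.0


def should_alert(metrics, max_failure_rate=0.5, min_attempts=10):
    """Alert once enough attempts were seen and most of them failed."""
    return metrics.total() >= min_attempts and metrics.failure_rate() > max_failure_rate


before_deploy = LoginMetrics()
for ok in [True] * 9 + [False]:
    before_deploy.record(ok)

after_deploy = LoginMetrics()
for _ in range(10):
    after_deploy.record(False)   # the bad hash check rejects every login

print(should_alert(before_deploy), should_alert(after_deploy))
```

Requests still return quickly and hosts stay healthy in this scenario, so only a metric that counts login outcomes reveals the problem.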
APPLICATION MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
INFRASTRUCTURE MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
SCENARIO: WE’RE TOO POPULAR
Everyone loves puppies and we’re completely out of resources.
• RUM & Synthetics, APM, Application Monitoring: Alerts that latency is high. Will not be able to help identify why.
• Infrastructure Monitoring: Alerts on high resource use and may be able to trigger automatic remediation.
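
Automatic remediation for this scenario often means scaling out. As one possible sketch, the function below picks an instance count so that average CPU moves toward a target; the 60% target and the cap of 20 instances are assumptions for the example, not values from the talk.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60, max_instances=20):
    """Scale the instance count proportionally so average CPU approaches the target."""
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Never drop below one instance or exceed the configured cap.
    return max(1, min(max_instances, wanted))


# Hypothetical readings: 4 instances running hot, then running cool.
scale_out = desired_instances(current=4, cpu_percent=95)
scale_in = desired_instances(current=4, cpu_percent=30)
capped = desired_instances(current=10, cpu_percent=200)

print(scale_out, scale_in, capped)
```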
ANOMALY DETECTION
HOW DO WE KNOW SOMETHING WENT WRONG?
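
The slides do not say which anomaly detection method is used. One simple approach, shown here purely as an illustration, flags a point when it sits several standard deviations away from the recent history (a rolling z-score). The window size, the threshold, and the traffic numbers are made up.

```python
from statistics import mean, stdev


def anomalies(values, window=5, z_threshold=3.0):
    """Flag indexes whose value is far from the mean of the preceding window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history has sigma == 0; skip it to avoid dividing by zero.
        if sigma > 0 and abs(values[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical requests per minute with one sudden spike.
requests_per_min = [100, 102, 98, 101, 99, 100, 250, 101]

spikes = anomalies(requests_per_min)
print(spikes)
```

Unlike a fixed threshold, this adapts to whatever "normal" looked like recently, at the cost of being briefly desensitized right after a spike enters the window.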
HOW DO WE KNOW WHAT WENT WRONG?
RECURSE RECURSE RECURSE UNTIL YOU FIND THE CAUSES
LOGS
EXPLORING WHAT WENT WRONG
HOW TO GET 100% VISIBILITY?
• Think about your system as a whole
• Get multiple perspectives
• Consider all 5 observability tools:
  • RUM
  • Synthetics
  • Tracing
  • Application+Infrastructure Monitoring
  • Logs
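
The four scenarios can be tabulated to show why multiple perspectives matter. The sketch below transcribes which tools alert in each scenario slide and lists what each tool would miss on its own. It assumes "might alert" counts as an alert, and it leaves logs out because the scenarios use them for investigation rather than alerting.

```python
# Which tools raise an alert in each scenario, transcribed from the scenario slides.
# Assumption: "might alert" is counted as an alert.
ALERTS = {
    "third-party CDN outage": {"RUM & Synthetics"},
    "service outage": {"RUM & Synthetics", "APM"},
    "dev deploys bad code": {"Application Monitoring"},
    "we're too popular": {"RUM & Synthetics", "APM", "Application Monitoring",
                          "Infrastructure Monitoring"},
}
TOOLS = ["RUM & Synthetics", "APM", "Application Monitoring",
         "Infrastructure Monitoring"]


def blind_spots(alerts, tools):
    """Map each tool to the scenarios it would miss entirely."""
    return {tool: [s for s, who in alerts.items() if tool not in who]
            for tool in tools}


missed = blind_spots(ALERTS, TOOLS)
for tool, scenarios in missed.items():
    print(f"{tool}: misses {len(scenarios)} of {len(ALERTS)} scenarios")
```

Every tool misses at least one scenario, while together they cover all four.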
QUESTIONS?
@GITBISECT
JYEE@DATADOGHQ.COM
SLIDES: http://bit.ly/cm-100viz

100% Visibility - Jason Yee - Codemotion Amsterdam 2018
