Instrumentation as a Living Documentation: Teaching Humans About Complex Systems
 


Instrumentation of complex systems is necessary and addresses the problems of static documentation of those systems. Instrumentation has flaws of its own, which can be resolved with an intentional kind of documentation.

Given at Write the Docs, Portland OR 2014.

    Presentation Transcript

    • Instrumentation as a Living Documentation: TEACHING HUMANS ABOUT COMPLEX SYSTEMS
    • I do things to/with computers.
    • I build real-time systems.
    • I build distributed systems.
    • I build critical systems.
    • AdRoll
    • LESS THIS
    • MORE THIS
    • WE’RE AN AD TECH COMPANY.
    • REAL-TIME BIDDING
    • The nature of the problem domain:
        • Low latency (< 100 ms per transaction)
        • Firm real-time system
        • Highly concurrent (> 55 billion transactions per day)
        • Global, 24/7 operation
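To put those figures in perspective, here is a back-of-the-envelope calculation. It treats the slide's bounds (55 billion transactions per day, 100 ms per transaction) as if they were exact, steady-state values. That is an assumption made only for illustration, since real traffic is bursty and the slide gives only limits.

```python
# Back-of-the-envelope sizing from the figures on the slide.
# Assumption: the bounds are treated as exact, evenly spread values.
TRANSACTIONS_PER_DAY = 55_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60
LATENCY_BUDGET_S = 0.100

# Average arrival rate over a whole day.
avg_per_second = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY

# Little's law: items in flight = arrival rate * time spent in the system.
max_in_flight = avg_per_second * LATENCY_BUDGET_S

print(f"{avg_per_second:,.0f} transactions per second on average")
print(f"{max_in_flight:,.0f} in flight if each one used the full budget")
```

Under these assumptions the average works out to roughly 640,000 transactions per second, with tens of thousands in flight at any instant.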
    • I build Complex Systems
    • Complex Systems
        • Non-linear feedback
        • Tightly coupled to external systems
        • Difficult to model, understand
        • Usually a solution to some “wicked problem”
    • “[WICKED PROBLEMS ARE] SOCIAL PROBLEMS WHICH ARE ILL FORMULATED, WHERE THE INFORMATION IS CONFUSING, WHERE THERE ARE MANY CLIENTS AND DECISION-MAKERS WITH CONFLICTING VALUES, AND WHERE THE RAMIFICATIONS IN THE WHOLE SYSTEM ARE THOROUGHLY CONFUSING. […] THE ADJECTIVE ‘WICKED’ IS SUPPOSED TO DESCRIBE THE MISCHIEVOUS AND EVEN EVIL QUALITY OF THESE PROBLEMS, WHERE PROPOSED ‘SOLUTIONS’ OFTEN TURN OUT TO BE WORSE THAN THE SYMPTOMS.” - C. WEST CHURCHMAN, “GUEST EDITORIAL: WICKED PROBLEMS”, MANAGEMENT SCIENCE VOL. 4, 1967
    • Bad things happen when Complex Systems fail.
    • Complex Systems often create worse problems than those they solve.
    • “HUMANS ARE BAD AT PREDICTING THE PERFORMANCE OF COMPLEX SYSTEMS (…). OUR ABILITY TO CREATE LARGE AND COMPLEX SYSTEMS FOOLS US INTO BELIEVING THAT WE’RE ALSO ENTITLED TO UNDERSTAND THEM.” - CARLOS BUENO, “MATURE OPTIMIZATION HANDBOOK”
    • The key challenge to sustaining a complex system is maintaining our understanding of it.
    • We write documentation.
    • Complex systems are fiendishly difficult to communicate about.
    • Miscommunications are accidents in the making.
    • Documentation reduces accidents.
    • IF YOU DON’T KNOW HOW THE SYSTEM SHOULD BEHAVE YOU CAN’T SAY HOW IT SHOULDN’T OR ISN’T.
    • Trouble is, documentation goes out of date.
    • Complex Systems evolve and written words “rot” as the system moves on.
    • Engineers fail to update documentation as the system changes.
    • ONE OPERATOR (…) WAS CONFUSED BY THE LOGBOOK. HE CALLED SOMEONE ELSE TO INQUIRE. “WHAT SHALL I DO?” HE ASKED. “IN THE PROGRAM THERE ARE INSTRUCTIONS OF WHAT TO DO, AND THEN A LOT OF THINGS CROSSED OUT.” THE OTHER PERSON THOUGHT FOR A MINUTE, THEN REPLIED, “FOLLOW THE CROSSED OUT INSTRUCTIONS.” - DAVID E. HOFFMAN, “THE DEAD HAND: THE UNTOLD STORY OF THE COLD WAR ARMS RACE AND ITS DANGEROUS LEGACY”
    • Engineers can be unaware of the system as it is actually used.
    • CLEARLY THE TEXTBOOKS (…) DIDN’T TELL YOU WHAT REALLY HAPPENED IN THE FIELD. (…) (T)HERE WAS A WAY YOU WERE SUPPOSED TO DO THINGS – AND THE WAY THINGS GOT DONE. RFHCO SUITS WERE HOT AND CUMBERSOME (…) AND IF A MAINTENANCE TASK COULD BE ACCOMPLISHED QUICKLY WITHOUT AN OFFICER NOTICING, SOMETIMES THE SUITS WEREN’T WORN. - ERIC SCHLOSSER, COMMAND AND CONTROL: NUCLEAR WEAPONS, THE DAMASCUS ACCIDENT, AND THE ILLUSION OF SAFETY
    • (Normal) Accidents happen.
    • THE FIRST DISASTER IN SPACE HAD OCCURRED, AND NO ONE KNEW WHAT HAD HAPPENED. ON THE GROUND, THE FLIGHT CONTROLLERS WERE NOT EVEN SURE THAT ANYTHING HAD. - HENRY S. F. COOPER, JR., XIII: THE APOLLO FLIGHT THAT FAILED
    • Documentation doesn’t necessarily reflect the reality of the system.
    • What can we do?
    • INSTRUMENTATION
    • Instrumentation reflects the reality of the system as it exists.
    • Instrumentation allows users and engineers to explore the system as it exists.
    • Exploration, done honestly, guides us to a new, better understanding of the system.
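As a concrete picture of what "instrumentation" can mean in code, here is a minimal sketch using only the Python standard library. The `Instruments` class, the metric names, and the `handle_bid_request` function are all hypothetical and are not taken from the talk or from any of the libraries it mentions; a production system would more likely use one of those libraries.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Instruments:
    """Hypothetical in-process registry of counters and timings."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, amount=1):
        self.counters[name] += amount

    @contextmanager
    def timed(self, name):
        # Record the wall-clock duration of the wrapped block, even on error.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)

    def snapshot(self):
        # A plain-dict view that a person (or a dashboard) can explore.
        return {
            "counters": dict(self.counters),
            "timings": {
                name: {"count": len(values), "max_s": max(values)}
                for name, values in self.timings.items()
            },
        }


def handle_bid_request(instruments, request):
    """Hypothetical handler; the 'price' field is an illustrative assumption."""
    instruments.incr("bid_requests.received")
    with instruments.timed("bid_requests.latency_s"):
        if request.get("price", 0) <= 0:
            instruments.incr("bid_requests.rejected")
            return None
        return {"bid": request["price"]}
```

The point of the sketch is that the snapshot describes what the running system actually did, independent of what any written document says it should do.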
    • THIS “COLLECTIVE ENTITY” WAS ORGANIZED AROUND THE PILOT TO MAKE IT “SAFER AND MORE EFFICIENT IF THERE WAS A FOCAL POINT. AND I WAS THE FOCAL POINT. JIM FED THINGS INTO MY EARS. THE MOON FED THINGS INTO MY EYES AND I COULD FEEL THE MACHINE OPERATING.” - COMMANDER DAVID SCOTT, AS QUOTED IN DAVID A. MINDELL'S DIGITAL APOLLO: HUMAN AND MACHINE IN SPACEFLIGHT
    • Instrumentation democratizes the organization around a complex system.
    • Case Studies
    • Case Study: Exchange Throttling
    • Healthy pattern of bid requests
    • The trough of throttling
    • BAD / GOOD
    • Problem confirmed with Exchange
    • Findings:
        • All other metrics (run-queue, CPU, network IO) were fine.
        • Confirmed that no changes had been made to the running systems via deployment.
        • Amazon data showed no network issues to our machines.
    • What happened?
    • We hit an implicit exchange limit. (Arguably, a bug.)
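The "trough of throttling" in this case study was spotted by eye on a graph. A minimal sketch of the same comparison in code, assuming you have per-interval bid request counts and a baseline for the same intervals, might look like this. The function name, the 50% threshold, and the sample numbers are illustrative assumptions, not values from the talk.

```python
def find_troughs(observed, baseline, min_fraction=0.5):
    """Return indexes of intervals where observed traffic fell below
    min_fraction of the baseline for that interval."""
    if len(observed) != len(baseline):
        raise ValueError("observed and baseline must cover the same intervals")
    return [
        i
        for i, (seen, expected) in enumerate(zip(observed, baseline))
        if expected > 0 and seen < min_fraction * expected
    ]


# Hypothetical bid requests per interval: a healthy day and a throttled one.
baseline = [1000, 1100, 1200, 1200, 1100, 1000]
observed = [990, 1080, 400, 350, 1090, 1010]

troughs = find_troughs(observed, baseline)
print(troughs)
```

A check like this only says that traffic dropped. As in the case study, ruling out local causes (CPU, network, deployments) and confirming with the exchange still takes people.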
    • Case Study: Timeout Jumps
    • Healthy pattern of background timeouts
    • Unhealthy timeouts
    • Healthy bid requests
    • Unhealthy bid requests: cliff of throttling
    • Findings:
        • The timeout jump occurred only in US East; US West was fine.
        • All other metrics (as above) checked out.
        • System deployment strongly correlated with the timeout jump.
        • Rollback to the previous release reduced timeouts to acceptable levels.
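The observation that the jump followed a deployment and appeared in only one region can be sketched as a before/after comparison per region. The sample tuples, region names, and timestamps below are made up for illustration; the talk does not give the underlying numbers.

```python
from collections import defaultdict


def timeout_rates_around_deploy(samples, deploy_time):
    """Compute the timeout rate per region before and after a deployment.

    samples: iterable of (timestamp, region, timeouts, requests).
    Returns {region: {"before": rate_or_None, "after": rate_or_None}}.
    """
    totals = defaultdict(lambda: {"before": [0, 0], "after": [0, 0]})
    for timestamp, region, timeouts, requests in samples:
        window = "before" if timestamp < deploy_time else "after"
        totals[region][window][0] += timeouts
        totals[region][window][1] += requests
    return {
        region: {
            window: (t / r if r else None) for window, (t, r) in windows.items()
        }
        for region, windows in totals.items()
    }


# Hypothetical samples: (timestamp, region, timeouts, requests).
samples = [
    (1, "us-east", 10, 1000), (2, "us-east", 12, 1000),
    (3, "us-east", 80, 1000), (4, "us-east", 90, 1000),
    (1, "us-west", 11, 1000), (2, "us-west", 9, 1000),
    (3, "us-west", 10, 1000), (4, "us-west", 12, 1000),
]

rates = timeout_rates_around_deploy(samples, deploy_time=3)
for region, windows in sorted(rates.items()):
    print(region, windows)
```

A correlation like this justifies a rollback, but it does not explain the cause.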
    • What happened?
    • Who can say? ¯\_(シ)_/¯
    • Lessons Learned
    • It is possible to have too little information.
    • (THE FIREFIGHTERS) TRIED TO BEAT DOWN THE FLAMES (OF CHERNOBYL REACTOR 4). THEY KICKED AT THE BURNING GRAPHITE WITH THEIR FEET. … THE DOCTORS KEPT TELLING THEM THEY’D BEEN POISONED BY GAS. - SVETLANA ALEXIEVICH, VOICES FROM CHERNOBYL: THE ORAL HISTORY OF A NUCLEAR DISASTER
    • It is possible to collect too much information, or present it badly.
    • SAFETY SYSTEMS, SUCH AS WARNING LIGHTS, ARE NECESSARY, BUT THEY HAVE THE POTENTIAL FOR DECEPTION. (…) ONE OF THE LESSONS OF COMPLEX SYSTEMS AND (THREE MILE ISLAND) IS THAT ANY PART OF THE SYSTEM MIGHT BE INTERACTING WITH OTHER PARTS IN UNANTICIPATED WAYS. - CHARLES PERROW, NORMAL ACCIDENTS: LIVING WITH HIGH-RISK TECHNOLOGIES
    • Instrumentation is not a panacea.
    • Instruments may be misleading.
    • Must know some Mathematics.
    • Too much information hampers interpretation.
    • Instruments may be inaccurate.
    • Instruments may be ignored.
    • Instrumentation may be used for undesirable purposes.
    • What can we do?
    • Write documentation!
    • Misleading Instruments: Context reduces misinterpretations.
    • Must Know Math: Procedure manuals and visualizations reduce the need for math background.
    • Too Much Information: The more contextual layers you add, the more you reduce “big boards of blinky lights”.
    • INSTRUMENTATION IS LIKE A SUIT. IT NEEDS TO FIT YOUR OWN MIND. - VALENTINO VOLONGHI
    • Inaccuracy: Cross-checks and documented error margins mitigate instrument inaccuracy.
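One way to read "cross-checks and documented error margins" is to have two independent instruments measure the same quantity, write down how far off each one may be, and flag disagreement beyond those margins. The sketch below assumes exactly that; the `Reading` type, the source names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    source: str
    value: float
    margin: float  # documented error margin, in the same unit as value


def agree(a: Reading, b: Reading) -> bool:
    """Two readings agree when their error intervals overlap."""
    return abs(a.value - b.value) <= a.margin + b.margin


# Hypothetical counts of bid requests seen in the same minute.
app_counter = Reading("application counter", 1000.0, 20.0)
balancer_log = Reading("load balancer log", 1015.0, 10.0)
exchange_report = Reading("exchange report", 1200.0, 25.0)

print(agree(app_counter, balancer_log))
print(agree(app_counter, exchange_report))
```

When two sources disagree by more than their documented margins, at least one instrument (or one margin) is wrong, and that is worth investigating before trusting either.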
    • IF YOU DON'T TRUST A COMPUTER BECAUSE SOMETIMES IT DOESN'T TELL YOU THE TRUTH, TELLING IT TO TELL YOU TO TRUST IT IS ASKING IT TO LIE TO YOU SOMETIMES. - MIKE SASSAK, CURBSIDE
    • May be Ignored: Checklists with references to instrumentation at decision points.
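A checklist that references instrumentation at its decision points could be sketched as data: each step names the metric to look at, what counts as acceptable, and what to do otherwise. Everything in the example below (step wording, metric names, thresholds) is a hypothetical illustration rather than a procedure from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class ChecklistStep:
    instruction: str
    metric: str                      # which instrument to consult
    is_ok: Callable[[float], bool]   # the documented decision rule
    if_not_ok: str                   # what the operator should do


def run_checklist(steps, readings: Mapping[str, float]):
    """Walk the checklist, stopping at the first failing decision point."""
    log = []
    for step in steps:
        value = readings.get(step.metric)
        if value is None:
            log.append((step.instruction, f"no reading for {step.metric}; stop and investigate"))
            break
        if step.is_ok(value):
            log.append((step.instruction, "ok"))
        else:
            log.append((step.instruction, step.if_not_ok))
            break
    return log


# Hypothetical post-deployment checklist.
post_deploy = [
    ChecklistStep("Check bid request volume", "bid_requests_per_min",
                  lambda v: v >= 500, "confirm with the exchange"),
    ChecklistStep("Check timeout rate", "timeout_rate",
                  lambda v: v <= 0.02, "roll back to the previous release"),
]

result = run_checklist(post_deploy, {"bid_requests_per_min": 1000, "timeout_rate": 0.08})
for instruction, outcome in result:
    print(f"{instruction}: {outcome}")
```

Tying each step to a named metric makes it harder to skip the instrument, because the step cannot be completed without a reading.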
    • Undesirable Purposes: Collaborative workplaces, cooperatives, unions, laws, etc.
    • I PROPOSE THAT MEN AND WOMEN BE RETURNED TO WORK AS CONTROLLERS OF MACHINES, AND THAT THE CONTROL OF PEOPLE BY MACHINES BE CURTAILED. I PROPOSE, FURTHER, THAT THE EFFECTS OF CHANGES IN TECHNOLOGY AND ORGANIZATION ON LIFE PATTERNS BE TAKEN INTO CAREFUL CONSIDERATION, AND THAT THE CHANGES BE WITHHELD OR INTRODUCED ON THE BASIS OF THIS CONSIDERATION. - KURT VONNEGUT, PLAYER PIANO
    • TL;DR: Instrumentation addresses the problems of documentation; documentation addresses the problems of instrumentation.
    • Complex Systems need them both.
    • How do I get started?
    • Exometer
    • Dropwizard’s Metrics
    • Scales
    • DataDog, NewRelic, Librato
    • Questions?
    • Thanks! <3 @bltroutwine