2. Best Practices
Best Practice: Steps to Take
Assure code quality
• Version control
• Test coverage
Dedicate time to performance and security
• Dedicated positions are ideal
• For new projects, build it into time estimates
• For older projects, dedicate time to revisiting performance and security
Avoid technical debt
• Be proactive about updating versions
• Don’t write untested code
Continuous integration
• Avoid doing work on production!
• Use a proper dev/stage/prod workflow
• Use CI tools to integrate and test code commits
• Stage should be a (sanitized) clone of prod, but completely separated from prod
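One way to read "sanitized clone" is that stage keeps the shape and volume of prod data while personal details are replaced. A minimal Python sketch, assuming user rows are plain dicts; the field names (`email`, `name`, `password_hash`) and the `sanitize_user` helper are hypothetical and not taken from any particular system:

```python
import hashlib

# Hypothetical sensitive columns; adjust to the real schema.
SENSITIVE_FIELDS = ("name", "password_hash")


def sanitize_user(row: dict) -> dict:
    """Return a copy of a prod user row that is safer to load into stage."""
    clean = dict(row)
    # Derive a stable token from the primary key so sanitized rows stay
    # unique and repeated runs produce the same output.
    token = hashlib.sha256(str(row["id"]).encode()).hexdigest()[:12]
    # The .invalid TLD is reserved, so stage cannot email real users.
    clean["email"] = f"user-{token}@example.invalid"
    for field in SENSITIVE_FIELDS:
        if field in clean:
            clean[field] = f"redacted-{token}"
    return clean
```

Non-sensitive columns pass through unchanged, so queries on stage should still behave roughly as they do on prod.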
3. But … Stuff Happens
Unexpected Situations
• Relevant employees are inaccessible
• Unknown dependencies
• Emergencies affecting the stakeholder
Bugs that only show up at scale
• Table locks
• ‘Noisy neighbor’ problems
• Queries lacking indexes or requiring temp tables on disk (occurs with blob/text columns, or string columns longer than 512 characters in GROUP BY, DISTINCT, or UNION)
• Rolling outages
• Users with atypical behavior, like thousands of comments or dozens of “tabs”
Bugs caused by third parties
• DNS
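The missing-index item above can be checked before a query ever meets production-sized data by asking the database for its query plan. The sketch below uses SQLite from the Python standard library purely as a stand-in (the temp-table thresholds on the slide belong to a different database engine); the `comments` table, the index name, and the `query_plan` helper are hypothetical:

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The description is the last column of each plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
sql = "SELECT body FROM comments WHERE user_id = 42"

# Without an index, the plan is a scan whose cost grows with the table.
plan_before = query_plan(conn, sql)

# With an index on the filtered column, the plan becomes an index search.
conn.execute("CREATE INDEX idx_comments_user ON comments (user_id)")
plan_after = query_plan(conn, sql)
```

Comparing `plan_before` and `plan_after` in a test is one way to catch a query that is fast on a small dev database but scans the whole table at scale.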
4. 5 Steps to…
1. Writing code that won't cause debugging headaches
2. Structuring deployments to keep them responsive and reversible
3. Continuous monitoring across production
4. Agility in following problems through complex systems
5. Making sure the right team members are aware of problems
5. Step 1: Writing code that won't cause debugging headaches
1. Confusing execution flow
2. Dynamic dispatch
3. Evented or highly asynchronous code
4. Difficult data structures (e.g., closures)
5. Include monitoring considerations in your codebase
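Building monitoring into the codebase can start as small as timing the calls that matter. A minimal Python sketch using only the standard library; the logger name `app.timing`, the `timed` decorator, and the `load_comments` function are hypothetical names chosen for illustration:

```python
import functools
import logging
import time

logger = logging.getLogger("app.timing")


def timed(func):
    """Log how long each call takes and whether it raised."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            outcome = "error"
            raise
        finally:
            # Runs on both success and failure, so every call is recorded.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info(
                "call=%s outcome=%s elapsed_ms=%.1f",
                func.__name__, outcome, elapsed_ms,
            )
    return wrapper


@timed
def load_comments(user_id):
    # Placeholder body; a real handler would query the database here.
    return [f"comment for user {user_id}"]
```

The key=value message format is a design choice: it keeps the log lines easy to grep and easy for a log aggregator to parse later.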
6. Step 2: Structuring deployments to keep them responsive & reversible
1. Know what goes into a deployment
2. Consider the cost of backing out: a full database restore can take a huge amount of time on a large site, so ideally you want to be able to do selective reverts
3. Know which changes are irreversible and which aren’t
4. Keep deployments as small as possible
5. Gradually roll out production changes, rather than all at once
6. Consider where you can make architectural decisions that enable fast changes, e.g. setting a very low DNS TTL
7. Deploy during office hours when possible, and not on Fridays
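Gradual rollout is often done by placing each user in a stable bucket and enabling a change only for buckets below a threshold. A minimal Python sketch of that idea; the `in_rollout` function and its parameters are hypothetical, and a real setup would read `percent` from configuration rather than hard-coding it:

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash feature and user together so each feature gets its own split,
    # and the same user always lands in the same bucket for that feature.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because buckets are stable, raising `percent` only adds users to the rollout, and setting it back to 0 turns the change off for everyone, which keeps the rollout reversible as long as the change itself is.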
7. Step 3: Continuous monitoring across production
1. Infrastructure health
2. Log files
3. Availability
4. Latency
5. Cache effectiveness
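Latency and cache effectiveness both reduce to small calculations over raw counters. The Python sketch below shows one common way to compute them, assuming latency samples in milliseconds and hit/miss counts are already being collected; the function names are hypothetical. Percentiles are used for latency because an average can hide a slow tail:

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to limit floating-point rounding in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 0.0
```

Tracking both a median and a high percentile (such as p95) over time gives an early signal when only some requests are slowing down.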
8. Step 4: Agility in following problems through complex systems
1. A big part of why DevOps matters
2. Might require dedicated time for building tooling
3. Playbooks for when it goes well, and also for when it doesn’t
4. Making monitoring accessible
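One piece of tooling that helps when following a problem across systems is a request ID attached to every log line, so entries from different components can be matched up. A minimal Python sketch using the standard `logging` and `contextvars` modules; the `request_id` variable, `RequestIdFilter`, and `handle_request` are hypothetical names, and how the ID travels between services (for example in a header) is left as an assumption:

```python
import contextvars
import logging
import uuid
from typing import Optional

# Holds the ID of the request currently being handled; "-" outside a request.
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        return True


def handle_request(log: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was sent, otherwise start a new one."""
    rid = incoming_id or uuid.uuid4().hex
    token = request_id.set(rid)
    try:
        log.info("request started")
        log.info("request finished")
        # The ID would be forwarded on any downstream calls made here.
        return rid
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id.reset(token)
```

To use it, add `RequestIdFilter()` to a log handler and include `%(request_id)s` in that handler's format string.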
9. Step 5: Making sure the right team members are aware of problems
Doesn’t have to be a “flat hierarchy”, but how effective are your channels?
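One way to make a channel more effective is to write down who owns what, so an alert reaches a named recipient instead of a general inbox. A minimal Python sketch of such a routing table; the component names, team names, severity labels, and the `route` function are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    component: str  # e.g. "database" or "dns"
    severity: str   # "page" needs attention now; "notify" can wait


# Hypothetical ownership map; real team names and channels will differ.
OWNERS = {"database": "dba-oncall", "dns": "infra-oncall"}
FALLBACK = "engineering-leads"


def route(alert: Alert) -> list:
    """Return the recipients that should hear about this alert."""
    owner = OWNERS.get(alert.component, FALLBACK)
    recipients = [owner]
    # Paging alerts also go to the fallback, so an unreachable owner
    # does not leave the problem unseen.
    if alert.severity == "page" and FALLBACK not in recipients:
        recipients.append(FALLBACK)
    return recipients
```

Keeping the map in version control means a change in ownership is reviewed like any other change, and components with no owner fall through to a visible default rather than being dropped.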