Ops for Developers

Ops for Developers, training session, presented by Ben Klang at Lone Star Ruby Conference 6 2012
Transcript

  • 1. spkr8.com/t/13191 Ops for Developers Or: How I Learned To Stop Worrying And Love The Shell Ben Klang bklang@mojolingo.com (Friday, August 10, 2012)
  • 2. Prologue Introductions
  • 3. Who Am I? Ben Klang @bklang (GitHub/Twitter) bklang@mojolingo.com
  • 4. What are my passions? • Telephony Applications • Information Security • Performance and Availability Design • Open Source
  • 5. What do I do? • Today I write code and run Mojo Lingo • But yesterday...
  • 6. This was my world
  • 7. Ops Culture
  • 8. “I am allergic to downtime”
  • 9. It’s About Risk • If something breaks, it will be my pager that goes off at 2am • New software == new ways to break • If I can’t see it, I can’t manage it or monitor it, and it will break
  • 10. Agenda • 9:00 - 10:30: Operating Systems & Hardware; All About Bootup • 10:30 - 11:00: Break • 11:00 - 12:30: Observing a Running System; Optimization/Tuning • 12:30 - 1:30: Lunch • 1:30 - 3:00: Autopsy of an HTTP Request; Dealing with Murphy • 3:00 - 3:30: Break • 3:30 - 5:00: Scaling Up; Deploying Apps; Audience Requests
  • 11. Part I Operating Systems & Hardware
  • 12. OS History Lesson BSD, System V, Linux and Windows
  • 13. UNICS (Sep. 1969), soon renamed “Unix Time Sharing System Version 1” → UNIX Time Sharing System Version 5 (Jun. 1974) → 1BSD (Mar. 1978) and UNIX Sys III (Nov. 1981) → UNIX Sys V (Jan. 1983) and 4.3BSD (Jun. 1986)
  • 14. (image slide: no text)
  • 15. Hardware Components
  • 16. Common Architectures • Intel x86 (i386, x86_64) • SPARC • POWER • ARM • But none of this really matters anymore
  • 17. CPU Configurations • Individual CPU • SMP: Symmetric Multi-Processing • Multiple Cores • Hyperthreading/Virtual Cores
  • 18. (Virtual) Memory • RAM + Swap = Available Memory • Swapping strategies vary across OSes • What your code sees is a complete virtualization of this • x86/32-bit processes can only “see” 3GB of RAM from a 4GB address space
  • 19. Storage Types • Local Storage (SATA, SAS, USB, FireWire) • Network Storage (NFS, SMB, iSCSI, AoE) • Storage Network (FibreChannel, Fabrics)
  • 20. Networking • LAN (100Mb still common; 1Gbit standard; 10Gb and 100Gb on the horizon) • WAN (T-1, Frame Relay, ATM, MetroE) • Important Characteristics • Throughput • Loss • Delay
  • 21. Part II All About Bootup
  • 22. Phases • BIOS • Kernel Bootstrap • Hardware Detection • Init System
  • 23. System Services • Varies by OS • Common: SysV Init Scripts; /etc/inittab; rc.local • Solaris: SMF • Ubuntu: Upstart • Debian: SysV default; Upstart optional • OSX: launchd • RedHat/CentOS: SysV Init Scripts
  • 24. SysV Init Scripts • Created in /etc/init.d; symlinked into runlevel directories • Symlinks prefixed with special characters to control startup/shutdown order • Prefixed with “S” or “K” to start or stop the service in each level • Numeric prefix determines order • /etc/rc3.d/S10sshd -> /etc/init.d/sshd
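The symlink scheme above can be sketched safely in a scratch directory; paths under `$root` stand in for the real /etc, and the stub `sshd` script is a placeholder:

```shell
# Mimic the SysV layout in a temp dir so this is safe to run anywhere.
root=$(mktemp -d)
mkdir -p "$root/init.d" "$root/rc3.d"

# A stand-in init script (a real one would implement start/stop/restart).
printf '#!/bin/sh\necho "sshd: $1"\n' > "$root/init.d/sshd"
chmod +x "$root/init.d/sshd"

# "S" = start at this runlevel, "10" = ordering (lower starts earlier).
ln -s "$root/init.d/sshd" "$root/rc3.d/S10sshd"

# The init system simply iterates the runlevel dir in sorted order:
for link in "$root"/rc3.d/S*; do "$link" start; done
```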
  • 25. rc.local • Single “dumb” startup script • Run at end of system startup • Quick/dirty mechanism to start something at bootup
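A minimal rc.local might look like this (a sketch; `/usr/local/bin/myapp` is a made-up example, and on most systems the file must remain executable and end with `exit 0`):

```shell
#!/bin/sh
# /etc/rc.local -- runs once, as root, at the end of multi-user startup.
# Nothing supervises what you launch here; if it dies, it stays dead.

# Hypothetical app: redirect output, then let the boot continue.
/usr/local/bin/myapp --daemonize >> /var/log/myapp.log 2>&1

exit 0
```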
  • 26. /etc/inittab • The original process supervisor • Not (easily) scriptable • Starts a process in a given runlevel • Restarts the process when it dies
  • 27. Supervisor Processes • Solaris SMF • Ubuntu Upstart • OSX launchd • daemontools
  • 28. Ruby Integrations • Supervisor Processes • Bluepill • God • Startup Script Generator • Foreman
  • 29. Choosing a Boot Mechanism • Is automatic recovery desirable? (Hint: sometimes it’s not) • Does it integrate with monitoring? • Is it a one-off that will get forgotten? • Does it integrate into OS startup/shutdown? • How much work to integrate with your app?
  • 30. Part III Observing a Running System
  • 31. Common Tools • top • free • vmstat • netstat • fuser • ps • sar (not always installed by default)
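A first-pass health check needs little beyond the stock tools; this sketch leans on /proc (Linux-specific) and coreutils so it works even on minimal installs:

```shell
# Load averages without even needing top (Linux exposes them in /proc).
cat /proc/loadavg        # 1-, 5-, 15-minute averages, runqueue, last PID

# Disk fullness -- a quietly full disk is one of the most common pitfalls.
df -hP

# A rough process count straight from /proc (every PID gets a directory).
ls /proc | grep -c '^[0-9]'
```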
  • 32. Power Tools • lsof • iostat • iftop • pstree • Tracing tools • strace • tcpdump/wireshark
  • 33. Observing CPU • Go-to tools: top, ps • CPU is not just about computation • Most Important: %user, %system, %nice, %idle, %wait • Other: hardware/software interrupts, “stolen” time (especially on EC2)
  • 34. The Mystical Load Avg. • Broken into 1, 5 and 15 minute averages • Gives a coarse view of overall system load • Based on # processes waiting for CPU time • Rule of thumb: stay below the number of CPUs in a system (e.g. a 4-CPU host should be below a 4.00 load average)
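The rule of thumb above is easy to script on Linux, where the averages live in /proc/loadavg (a sketch; `nproc` is GNU coreutils):

```shell
# Fields of /proc/loadavg: 1-min, 5-min, 15-min, running/total, last PID.
read -r one five fifteen _ < /proc/loadavg
cpus=$(nproc)

echo "15-min load average: $fifteen across $cpus CPU(s)"

# Compare load to CPU count; awk handles the floating-point comparison.
if awk -v l="$fifteen" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
  echo "sustained load exceeds CPU count: possibly CPU bound"
else
  echo "load is within the CPU budget"
fi
```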
  • 35. When am I CPU bound? • 15-minute load average exceeding the number of non-HT processors • %user + %system consistently above 90%
  • 36. Observing RAM • Go-to tools: top, vmstat • Available memory isn’t just “Free” • Buffers + Cache fill to consume available RAM (this is a good thing!)
  • 37. RAM vs. Swap • RAM is the amount of physical memory • Swap is disk used to augment RAM • Swap is orders of magnitude slower • Some VM types have no meaningful swap • Rule of thumb: pretend swap doesn’t exist
  • 38. Paging Strategies • Solaris: Page in advance • Linux: Page on demand (last resort) • Windows: Craziness
  • 39. When am I memory bound? • Free + buffers + cache < 15% of RAM • Swap utilization above 10% of avail. swap (Linux only) • Check for high disk utilization to confirm “thrashing”
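The free+buffers+cache test can be checked directly against /proc/meminfo (Linux-only sketch; the 15% threshold is the slide's rule of thumb, not a kernel constant):

```shell
# Sum the pools the slide treats as "available" (values are in kB).
total=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
avail=$(awk '/^MemFree:|^Buffers:|^Cached:/ { sum += $2 } END { print sum }' /proc/meminfo)

pct=$(( avail * 100 / total ))
echo "free + buffers + cache: ${pct}% of RAM"

if [ "$pct" -lt 15 ]; then
  echo "below 15%: possibly memory bound; check swap and disk to confirm thrashing"
fi
```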
  • 40. Observing Disk • Go-to tools: iostat, top • Disk is usually the hardest thing to observe • Better in recent Linux kernels (> 2.6.20)
  • 41. RAID • Redundant Array of Inexpensive Drives • Different strategies have different performance/durability tradeoffs • RAID-0 • RAID-1 • RAID-10 • RAID-5 • RAID-6
  • 42. When am I disk bound? • %wait is consistently above 10% to 20% • ... though %wait can be network too • SCSI and FC command queues are long • Known failure mode: disk more than 85% full causes tremendous VFS overheadFriday
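The 85%-full failure mode is the easiest of these to watch for mechanically (a sketch using portable `df -P` output):

```shell
# Print any filesystem above the 85%-full danger threshold.
df -P | awk 'NR > 1 {
  pct = $5; sub(/%/, "", pct)
  if (pct + 0 > 85) printf "%s is %s full -- expect VFS slowdowns\n", $6, $5
}'
```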
  • 43. Observing Network • Go-to tools: netstat, iftop, wireshark • Be wary of choke-points • Switch interconnects • WAN links • Firewalls
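On Linux, per-interface byte counters come straight from /proc/net/dev, which is essentially what iftop-style tools read; sampling twice and diffing gives throughput (a sketch):

```shell
# Skip the two header lines; everything before the colon is the interface
# name, and of the counters after it, field 1 is RX bytes, field 9 is TX.
report=$(awk -F: 'NR > 2 {
  iface = $1; gsub(/ /, "", iface)      # trim the padded interface name
  split($2, f, " ")                     # counters after the colon
  printf "%-8s rx_bytes=%s tx_bytes=%s\n", iface, f[1], f[9]
}' /proc/net/dev)
echo "$report"
```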
  • 44. Link Optimization • Use Jumbo Frames for Gbit+ links • Port aggregation for throughput: • Best: many-to-many • Good: one-to-many • Useless: one-to-one • ... but still useful for HA
  • 45. When am I network bound? • This one is easy: 99% of the time it is link saturation • Gotchas: which link? • Addendum: loss/delay (especially for TCP) can wreak havoc on throughput • ... but usually only a problem across the WAN
  • 46. Part IV Optimization & Performance Tuning
  • 47. Hardware Options • A.K.A. “Throw hardware at it” • Not the first thing to try • Are the services tuned? SQL queries, application behavior, caching options • Is something broken, causing performance degradation?
  • 48. Hardware Options • RAM is usually the single biggest performance win (cost/benefit tradeoff) • Faster disk is next best • Then look at CPU and/or Network • ... but do the work to figure out why your performance is limited in the first place
  • 49. Kernel Tunables • Not as necessary as in the “old days” • Almost all settings can be adjusted at runtime on Linux and Solaris • Most valuable settings are buffer limits or counters/timers • There be dragons! Read carefully before twisting these knobs
  • 50. Environment Settings • ulimits • max files • stack size • memory limits • core dumps • others • Still subject to system-wide (kernel) limits
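Inspecting and adjusting ulimits from a shell looks like this (a sketch; as the next slide notes, an unprivileged user can only move the soft limit up to the hard ceiling):

```shell
# Soft limit: enforced now. Hard limit: the most the soft one may be raised to.
echo "open files (soft): $(ulimit -S -n)"
echo "open files (hard): $(ulimit -H -n)"

# Lowering the soft limit always works; raising past the hard limit fails.
if ulimit -S -n 256 2>/dev/null; then
  echo "soft limit now: $(ulimit -S -n)"
fi
```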
  • 51. Environment Limits • Hard limits cannot be raised by unprivileged users • PAM configuration may also be in effect
  • 52. Application Tunables • There are not many for C-Ruby • The JVM has many • Mostly related to how RAM is allocated and garbage collected • Very dependent on the application • Any time an “xVM” is involved, there is probably a tunable (JVM, CLR) • But we are developers! Tune/profile your app before looking to the environment
  • 53. Performance Management Tools • sysstat (sar) • SNMP (and related tools like Cacti) • Integrated Monitoring + Trending tools • Zabbix • OpenNMS • and a plethora of commercial tools
  • 54. Part V Putting It All Together Autopsy of a single HTTP request, end-to-end
  • 55. Live Demo/Whiteboard
  • 56. Part VI Pulling It All Apart Anticipating Murphy and his Law
  • 57. Most Common Pitfalls • Disk Full • DNS Unavailable/Slow • Insufficient RAM • Suboptimal Service Configuration • Firewall misconfiguration • Archaic: Network mismatch (Full/Half Duplex)
  • 58. DNS and Performance • Possibly the most-overlooked perf. impact • Everything uses DNS • If you make nothing else redundant, make this redundant!
  • 59. Part VII Scaling Up
  • 60. Horizontal or Vertical? • Vertical: Making one server/instance go faster • Horizontal: Parallelizing requests to get more things done in the same amount of time
  • 61. Clustering • Parallelizing requests to increase overall throughput: horizontal scaling • Techniques to make information more available: • Caching (memcache, file-based caching) • Distributed data sets • Replication
  • 62. Distributing Data • Replication • Split Reads (one writer/master; multiple slaves/readers) • Multiple Masters (dangerous!) • Sharding (must consider HA)
  • 63. Failover/HA • Consistency requires a concept of Quorum • The losing partition gets killed: STONITH • Multi-master systems ignore this at the cost of potential non-determinism
  • 64. Tuning Services • Some VM types (especially JVM or CLR) have tunables for memory consumption • Databases usually have memory settings • These can make dramatic differences • Very workload dependent • Deep troubleshooting: strace, wireshark
  • 65. Part VIII Deploying Applications
  • 66. 12 Factor Application • Deployability starts with application design • Clear line between configuration and logic • Permit easy horizontal scaling • Be OS-agnostic (yay Ruby!) • Minimize differences between dev and prod • http://12factor.net - by a Heroku cofounder
  • 67. Deployment Tools • Capistrano • The de facto standard • Requires effort to set up and test • Requires integration with system startup • Most flexible
  • 68. Deployment Tools • “Move it to the cloud” • Heroku • Cloud Foundry