Devops down-under
 

    Devops down-under: Presentation Transcript

    • DevOps Down Under 2011: Sprinkling DevOps Magic in Other People's Environments (Robert Postill)
    • How's This Gonna Go Down?
      • Everybody's got a story and this is ours
      • Our first architecture
      • Learning from failure
      • A brief aside
      • Getting better
      • Tough messages
      • Where to from here
    • C3's Story
      • There was a dream
      • It makes the Excel go into the data warehouse
      • And it's done badly
      • So we built a prototype
      • Then we made a sale
    • A little bit of how it works
    • Priorities of our first architecture
      • Works!
      • Restarts when the machine restarts
      • Remotely deploy updates
      • Not a lot of state on the VM
    • Our first architecture
    • Lesson: Most customers will accept a small selection of services if you give them a report from that service
    • create_deployment.sh
      • Poor man's capistrano
      • A shell script that:
        • Fetched the latest from github
        • Exported it to a datestamped directory
        • Made a set of symlinks point to the right places
        • Restarted the app
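
The four steps above can be sketched as follows. This is a minimal illustration in Ruby rather than the original shell script, which is not shown in the talk. The function names, the `origin/master` ref, the directory layout and the restart command are all assumptions made for the example.

```ruby
require "fileutils"
require "open3"

# Build a datestamped release directory name, e.g. releases/20110701120000.
def release_path(releases_dir, now = Time.now)
  File.join(releases_dir, now.strftime("%Y%m%d%H%M%S"))
end

# Point `link` at `target` by creating a temporary symlink and renaming it
# over the old one, so the link never disappears mid-deploy.
def repoint_symlink(target, link)
  tmp = "#{link}.tmp"
  FileUtils.rm_f(tmp)
  File.symlink(target, tmp)
  File.rename(tmp, link)
end

# Hypothetical deploy: every path, the ref and the restart command are
# assumptions supplied by the caller, not details from the talk.
def deploy(repo_dir:, releases_dir:, links:, restart_cmd:, ref: "origin/master")
  # 1. Fetch the latest from GitHub.
  system("git", "-C", repo_dir, "fetch", "origin") or raise "git fetch failed"

  # 2. Export it to a datestamped directory.
  release = release_path(releases_dir)
  FileUtils.mkdir_p(release)
  statuses = Open3.pipeline(
    ["git", "-C", repo_dir, "archive", ref],
    ["tar", "-x", "-f", "-", "-C", release]
  )
  raise "export failed" unless statuses.all?(&:success?)

  # 3. Make the symlinks point at the new release.
  links.each do |link, path_in_release|
    repoint_symlink(File.join(release, path_in_release), link)
  end

  # 4. Restart the app.
  system(*restart_cmd) or raise "restart failed"
  release
end
```

Keeping each release in its own datestamped directory means a rollback is just a matter of pointing the symlinks back at an older directory.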
    • Flaws
      • We knew practically nothing about what was happening on the box
      • The logs... THE LOGS FIX THOSE FREAKING LOGS!!!
    • And the worst flaw of all...
      • We started to get calls that started with:
        • “Integrity’s down, what's the score?”
      • Then we'd have a look...
      • And it would be the database
    • Lesson: Things you don't own go badly wrong and the first people to know are the end users
    • A lot of sad face
    • So we revved the architecture
    • Then more stuff happened...
      • We continued to get calls that started with:
        • “Integrity’s down, what's the score?”
      • Then we'd have a look...
      • And it would be the VM, with its disks mounted read-only
    • Lesson: Virtual Machines are prone to at least a couple of novel modes of failure
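
One way to catch the read-only-disk failure before the end users do is a small write probe run on a schedule. The sketch below is an assumption about how such a canary could look, not something described in the talk. The function name and the return symbols are made up for the example.

```ruby
require "securerandom"

# Try to create and remove a tiny file in `dir`.
# Returns :ok, :read_only (filesystem mounted read-only) or :unwritable
# (any other system-level failure, e.g. missing directory or permissions).
def writable_canary(dir)
  probe = File.join(dir, ".canary-#{SecureRandom.hex(4)}")
  File.write(probe, "ok")
  File.delete(probe)
  :ok
rescue Errno::EROFS
  :read_only
rescue SystemCallError
  :unwritable
end
```

A monitoring job could call `writable_canary` on each data directory the app depends on and raise an alert on anything other than `:ok`.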
    • Which started to lead to the inevitable
    • So the next problem... Us
      • New Relic gives you slow transaction reports
      • In Ruby, select, collect and friends are ways of making in-memory decisions over collections of things
      • Which works on test set sizes of ten or so
      • But doesn't on large volumes of things, like say a couple of million objects
      • We'd created a technical debt mountain
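
The cost of eager `select` and `collect` can be shown with a toy example. This is only a sketch: `Record` and `each_record` are hypothetical stand-ins for a large table, and the lazy enumerator (Ruby 2.0 or later) stands in for any approach that avoids loading everything first, such as pushing the filter into the database. The talk does not say which fix was actually used.

```ruby
Record = Struct.new(:id, :amount)

# Stand-in for a large table. Every record handed out bumps counter[:loaded],
# so we can see how many objects each approach had to build.
def each_record(count, counter)
  return enum_for(:each_record, count, counter) unless block_given?
  count.times do |i|
    counter[:loaded] += 1
    yield Record.new(i, i % 100)
  end
end

# Eager: select walks the whole collection and builds a full intermediate
# array before collect and first ever run.
eager_counter = { loaded: 0 }
eager = each_record(100_000, eager_counter)
          .select { |r| r.amount > 90 }
          .collect(&:id)
          .first(5)

# Lazy: records flow through the chain one at a time and iteration stops
# as soon as five matches have been found.
lazy_counter = { loaded: 0 }
lazy = each_record(100_000, lazy_counter)
         .lazy
         .select { |r| r.amount > 90 }
         .collect(&:id)
         .first(5)

puts "eager loaded #{eager_counter[:loaded]} records, lazy loaded #{lazy_counter[:loaded]}"
```

Both chains return the same five ids, but the eager one touches every record to get them, which is the behaviour that looks fine on ten objects and falls over on a couple of million.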
    • Hiring someone new
    • A brief trip to the metaworld
      • We're devops by necessity
      • There is no ops department
      • Our devs cover a lot of ground
        • Architecture
        • Operations
        • Database Administration
        • Networking
        • Support
        • Business Analysis
    • Behold the AnDevOpSuptecht
      • It used to be that a lot of places had Systems Programmers
      • Now it feels like architects are going the same way
      • Where's the limit going to be drawn on the responsibility of an individual...
      • Are we thinking about the roles we play in the wrong way?
    • Crap Maths Applied To Recruitment
      • Australian population: 21,874,900
      • Melbourne population: 3,478,138
      • 22.6% 'professionals' in 2006 census: 786,059
      • Professionals in 'information, media and telecoms': 14,246
      • Spolsky says 1 in 200 dev applicants can dev, leaving: 712
      • TIOBE Index says Ruby is used by 1.484% of devs: 10
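
The funnel above can be replayed as straight arithmetic. The sketch below uses only the figures on the slide. The variable names are made up for the example. It suggests that the slide's 712 matches a 1-in-20 ratio rather than 1-in-200, and which of the two the talk intended is not recoverable from the transcript.

```ruby
melbourne        = 3_478_138
professionals    = (melbourne * Rational(226, 1000)).floor  # 22.6% of Melbourne
it_professionals = 14_246                                   # taken from the slide as given
ruby_share       = Rational(1484, 100_000)                  # 1.484%

one_in_200 = it_professionals / 200  # integer division
one_in_20  = it_professionals / 20

puts "professionals:          #{professionals}"
puts "1 in 200 can dev:       #{one_in_200}"
puts "1 in 20 can dev:        #{one_in_20}"
puts "Ruby devs (from 1/200): #{(one_in_200 * ruby_share).floor}"
puts "Ruby devs (from 1/20):  #{(one_in_20 * ruby_share).floor}"
```

Either way the pool of candidates is tiny, which is the point the slide is making.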
    • So...
      • Before we look into
        • Team fit
        • Seniority
        • Skills (Ubuntu, Databases, Business intelligence...)
      • I need a lie down :(
      • Congratulations to those of you in Melbourne who do hire devops!
      • Do we need to think about apprenticeships?
    • Lesson: You need good people, really good people
    • Meanwhile, back at the point...
    • Looking To Get Smart
      • We wanted to start deploying to larger numbers of machines (> 10)
      • We needed a way to start automating deployment
      • Have you seen this chef thing?
      • So we started creating recipes
    • But we had issues
      • I don't want to beat up on chef
      • The development of our architecture was *much* slower through chef
      • We lost our chef database
      • We tried to run chef server internally on two instances
      • We spent a lot of time learning things like: never use the UI, only ever use data bags
      • chef changed too fast and we also changed too fast
    • Lesson: The tools may not be mature enough and more importantly you may not be mature enough to use them
    • So now we...
      • Take a stock Ubuntu VM
      • Customise via capistrano scripts
      • Snapshot, distribute
      • Update via capistrano and create_deployment.sh
      • Distribute SSH keys via chef
    • And the customers kept on ringing
      • In particular there was the terrible case of the wild performance swings
      • New Relic would give us 6x, 4x, 12x performance swings depending on the week
      • We'd see CPU spikes and terrible loads applied to the mongrels as users got frustrated
      • “Integrity’s slow, what's the score?”
      • And we'd see... not much
    • And that got difficult
      • We had to start asking for VMware metrics
      • Our working assumption was that the same version of the app should not pitch and roll like this
      • Let's be honest: what we're saying is “we don't think you can manage your own infrastructure”
      • Explicitly :(
    • A lot of thinking...
    • Little by little we ground out answers
      • We found out there wasn't a lot of separation between VMs
      • Then we found out the VMs were moving over different physical hosts (vMotion)
      • And then we started to get a handle on overcommitment
    • Lesson: Smart tools can play havoc with performance
    • Lesson: VMware (or its competitors) is not a magic well
    • Where we are now
    • There's plenty for us still to do
      • Retire create_deployment.sh
      • Automate deployment
      • Refactor the architecture to give us scalability over numerous machines
      • Deploy to only part of the architecture
      • Deploy based on need
    • Wrapping Up
      • Pushing your stuff into other people's environments is hard
      • Back yourself with the stats and share them
      • Make sure your app has sufficient canaries
      • Find good people
      • Prepare for tough conversations
    • Questions?
    • Photo credits (in order of appearance):
      • http://www.flickr.com/photos/ricoslounge/38351363/ - ricoslounge
      • http://www.flickr.com/photos/jima/3435396513/ - jima
      • http://www.flickr.com/photos/34495711@N06/3613301938/ - Aaron Frutman
      • http://www.flickr.com/photos/dancoulter/21042744/ - Dan Coulter
      • http://www.flickr.com/photos/abennett96/2639105060/ - BenSpark
      • http://www.flickr.com/photos/bcymet/1923368669/ - bcymet