Building DevOps with Beer & Whiteboards

Velocity 2013 - How Edmunds learned from failure, began opening communications between silos, and built a DevOps culture over beer and whiteboards.

(HINT: Download to see the presenter's notes for what may not make sense without a speaker!)

Published in: Technology, News & Politics

  • - The automotive resource of the Internet - Originally in print, then Gopher in 1994, Web in 1996
  • - Our environment is highly distributed. When you visit Edmunds.com you’re interacting with one or more of our 30 web apps spread out across a couple hundred hosts. - The website itself is built on Apache Tomcat, Solr, MongoDB, and Oracle Coherence. - Internally, you’ll also find ActiveMQ, Oracle, and some lingering WebLogic services we’ll soon be doing away with. - We rely heavily on a mix of different tools to build and support all this: Chef, Jenkins, CloudStack, AppDynamics, Splunk, to name a few. - But I’m getting ahead of myself because how we got to this architecture is part of the tale of how Edmunds came to embrace a DevOps mindset.
  • - So then where does our story start? - Let me be up front: WE STUMBLED. WE PERFECTED THE FACEPALM. - The specifics of our situations when the shit hit the fan may have felt unique, but they’re not. - We learned from our mistakes with the intent of getting better. - Let’s talk facepalms...
  • - This may be familiar... - In 2005, we had 30 servers. In 2006, we burst up to 300 and held steady for a few years with slow growth. - In 2009, we saw a radical jump in server deployment - We grew in number of servers, but not in the number of admins - We had Kickstart, but that’s only good at bootstrap time - BladeLogic + AnthillPro seemed a good solution, but there were major issues - Growth is painful
  • - One very specific breakdown in our history stands out to me. - 2007 - Edmunds 2.0: Introducing CMS for the business - All content was locked to a monthly release cycle - Six months of functional testing, without any performance validation. - Two months before launch, performance testing uncovered scalability issues. - Ops response: double application infrastructure and throw a hardware cache appliance at it. - Breakdown in relationships between Dev/Ops led to major business costs. - Fast forward to 2009; remember that big jump in the number of servers we were deploying?
  • - 2010 Edmunds Redesign: Complete rewrite of all website code + modular breakout of applications. - Good collaboration between Dev/Ops to understand requirements on all sides. - But QA + BETA were built brick-by-brick, and not easily reproducible. - Armed with BladeLogic + AnthillPro, build and deploy were more automated but weren’t coupled together! - Production environment took 3 months to build while BETA served the new website. - We started to realize that the real challenge wasn’t technology but culture.
  • We wanted to stop working like this...
  • and start building like this.
  • We really wanted to get out of here.
  • - And go here - This is the Daily Pint! Let me buy you a beer! - This is where the wildest of ideas are born - Disagreements are worked through with positive jest and jeers - It is where we talked it over
  • - Then we’d take it here! - THE MOST UNDERRATED TOOL YOU ALREADY HAVE. - Floor-to-ceiling whiteboards where we worked out our ideas. - We talked gaps in handoffs, failure rates due to manual builds, linking tools together - “self-service”, automated testing, and much much more. - What happened there was no “ops”, no “dev”. We were technologists working to solve problems with no boundaries of roles in the way. - Our proposal: tear down silos. - We did just that!
  • - So who made this happen, and how? - TechLeads who spent too much time in war rooms started chewing on the problem together. - Identified gaps in provisioning/config management and app deployment tools. - Scott McNealy was right about hardware/software dependencies. - Two teams, Production Engineering & Automation Engineering, set about to provide tools which bridged the divide. - (ProdEng = Ops) + (AutoEng = Dev) == How we really started making inroads. (NOT IDEAL!) - Members of both these teams shed traditional views on what they were supposed to do and just did it. - The results were improved relationships, better tooling, and a clearer perspective on how future projects could work.
  • - So we started linking all our tools together! - “Your tools don’t make your culture, but they do have an impact on the people who do.”
  • - We now talk about the data that our tools provide us - You can talk from your gut, but you better back it up with data - We pushed ownership and accountability by leveraging what we found with data. - The metrics were clearly pointing out our failures, allowing us to learn how to prevent them in the future.
  • - Armed with a tighter toolchain and a new way of working together, we were once again about to be put to the test. - Edmunds began investing resources into “the cloud”. - Heavily virtualized since 2010, but no clear “cloud” offerings - Two teams, one objective: make edmunds.com work on $x cloud platform - Why two? DIVERSIFY.
  • - This was our first shot at a “new” project armed with our new practices + tooling - They were uncharted waters, even though we’d been virtualized for a few years. “Cloud” is a different beast. - But with familiar tooling + improved communications, these teams produced successful results that were easily measured. - Environment build time down to less than a week. - Done with 95% of the same toolset for both cloud platforms.
  • - We’ve all spent our careers as firefighters. - Street cred with co-workers, bosses, executives as cool headed during a mess - So what about when there are fewer - or different - kinds of fires? - By increasing accountable individuals, more “self-service”, fewer fires == increased capacity for business acumen. - This is the business value that what we call DevOps is leading us to.
  • - To go from this to this... - By investing in addressing systemic issues around communication + partnerships, we increase our capacity to take on other challenges - No big secret, it’s been talked about by Damon Edwards, John Willis - Covered beautifully in “The Phoenix Project” - Technologists in the age of the Internet are no longer back-office workers keeping the lights on - We help shape the direction of our companies; direct impact on revenue in ways our field sees change now yearly. - We needed to change the way we work together to free ourselves for “bigger things”. - An exciting time to be working in our field!
  • - Okay, back to our cloud initiatives... - With this additional capacity, here are a few things we learned to give value to our company - Cloud isn’t free; server sprawl can be expensive and lack of education with “self-service” becomes a major issue. - How much does it cost to operate your environment? It’s tough to calculate! - Licensing by host or CPUs is costly at scale, so look for alternatives to those things you pay a premium for. - Managing operating costs starts with understanding where the money is going!
  • - A great growing experience the last few years @ Edmunds. - No rose-tinted glasses to suggest we’ve solved all our problems! BUT WE GOT SOME BIG ONES! - And today we work a helluva lot more like this! - So, let’s take on the challenge of showing some metrics of success by adopting a DevOps culture...
  • - Application Availability has increased. Not the holy metric of “four 9’s”, but a bump all the same! - The number of high-severity INCs has dropped 50% year-over-year - The number of TKTs filed has dropped 50% year-over-year --- Self-service is slick! - The MTTR of pre-production issues has been drastically reduced from 5 days to 2 days, and is even faster than that in most situations. - The time it takes us to build runways has gone down from 3 months to 1 week! - With deeper inspection of our costs-per-host, we’re expecting to begin shaving off overall operating costs drastically for next year’s budget. - Team morale? Well...
  • We got out of here.
  • And into here, so it’s pretty good.
  • - Always more to be done! You’re never “finished” growing. - Devs on-call! (You build it, you run it!) - Reducing infrastructure footprint == reducing operating costs - More RESTful applications - Other cloud offerings?
  • Transcript

    • 1. BUILDING DEVOPS WITH BEER AND WHITEBOARDS - JOHN MARTIN @tekBuddha - STEVE BURTON @BurtonSays
    • 2. CALL OF DUTY: DEV OPS
    • 3. the challenge - GAME SELECT: DEVELOPER, OPERATIONS, DEVOPS, NOOPS - DEVOPS MISSION PARAMETERS: - MISSION OBJECTIVES: KILL YOUR COMPETITORS - DEVELOP, TEST, DEPLOY, OPERATE - AUTOMATION & BUSINESS AGILITY - RECOMMENDED ESSENTIALS: BEER, WHITEBOARDS, COMMUNICATION
    • 4. but what is success?
    • 5. “success is going from failure to failure without losing enthusiasm” - Winston Churchill
    • 6. failure
    • 7. mean time to innocence (MTTI)
    • 8. mean time to resolution (MTTR) - Weeks, Days, Hours or Minutes?
    • 9. mean time between failure (MTBF) - Weeks, Days, Hours or Minutes?
    • 10. availability? 99.9% - The most meaningless metric in IT today.
    • 11. business metrics > revenue > throughput > performance > productivity
    • 12. Edmunds.com - EXPERT CAR ADVICE - FOUNDED IN 1966 - 550 EMPLOYEES - 650K DAILY UNIQUES
    • 13. whoami - SR DIRECTOR PRODUCTION ENGINEERING - A DECADE SUPPORTING JAVA ARCHITECTURES - FUELED BY METRICS, WHITEBOARDS, LOGS, AND BEER
    • 14. Our environment.
    • 15. Compelling Events (Source: http://is.gd/iJU4et)
    • 16. Growing Pains
    • 17. Communication
    • 18. 2010 Redesign (Source: http://is.gd/L77vl1) - COMPLETE REWRITE OF PLATFORM - QA & BETA WORKED GREAT! - BETA BECOMES PROD - 3 MONTHS IN A WAR ROOM
    • 19. NOT LIKE THIS (Source: http://is.gd/PFLRmW)
    • 20. LIKE THIS (Source: http://is.gd/iJU4et)
    • 21. OUT OF HERE (Source: http://is.gd/oFCXNH)
    • 22. IN TO HERE (Source: http://is.gd/iJU4et)
    • 23. ONE OF THE MOST UNDERRATED TOOLS YOU ALREADY HAVE. THE WHITEBOARD
    • 24. TEARING IT DOWN (Source: http://is.gd/Vrnwu4)
    • 25. The Toolshed
    • 26. Communicating with Metrics (Source: http://is.gd/L77vl1) - DATA DRIVEN CULTURE - CHECK THE GUT - DRIVE ACCOUNTABILITY - LEARN FROM FAILURE
    • 27. CLOUDY SKIES (Source: http://is.gd/arBZ4M)
    • 28. Putting It All Together (Source: http://is.gd/L77vl1) - UNCHARTED WATERS - FAMILIAR TOOLING - IMPROVED COMMUNICATIONS - MEASURABLE SUCCESS STORIES
    • 29. A Personal Note (Source: http://is.gd/L77vl1)
    • 30. A Personal Note (Source: http://is.gd/L77vl1)
    • 31. The Business Proposition (Source: http://is.gd/L77vl1) - THE CLOUD ISN’T FREE - COST PER HOST CAN GET SCARY - LOOK FOR THE FREEBIES
    • 32. AWESOMENESS (Source: http://is.gd/iJU4et)
    • 33. Measuring Success (Source: http://is.gd/L77vl1)

        Metric                        | Before   | After    | Benefit     | $ Savings
        Application Availability %    | 99.91%   | 99.95%   | > 0.04%     | $167k revenue protection
        # of High Severity Incidents  | 21       | 10       | < 50%       | $307k productivity
        # of Help desk Tickets        | 196      | 99       | < 50%       |
        MTTR in Pre-Production        | 5 Days   | 2 Days   | < 45%       | $320k productivity
        Time To Build Runways         | 3 Months | < 1 Week | Seriously?! |
        Operating Costs               | $$$$     | TBD      |             |
        Team Morale                   | Bummered | Beer     |             |

    • 34. OUT OF HERESource: http://is.gd/oFCXNH
    • 35. IN TO HERESource: http://is.gd/iJU4et
    • 36. Source: http://is.gd/xKdI6EWHERE NEXT?
    • 37. JOHN MARTINjmartin@edmunds.com@tekBuddhaSTEVE BURTONsburton@appdynamics.com@BurtonSaysQUESTIONS?We’re hiring!Stop by our booth!