Building DevOps with Beer & Whiteboards
 


Velocity 2013 - How Edmunds learned from failure, began opening communications between silos, and built a DevOps culture over beer and whiteboards.

(HINT: Download to see the presenter's notes for what may not make sense without a speaker!)

  • - The automotive resource of the Internet - Originally in print, then Gopher in 1994, Web in 1996
  • - Our environment is highly distributed. When you visit Edmunds.com you’re interacting with one or more of our 30 web apps spread out across a couple hundred hosts. - The website itself is built on Apache Tomcat, Solr, MongoDB, and Oracle Coherence. - Internally, you’ll also find ActiveMQ, Oracle, and some lingering WebLogic services we’ll soon be doing away with. - We rely heavily on a mix of different tools to build and support all this: Chef, Jenkins, CloudStack, AppDynamics, Splunk, to name a few. - But I’m getting ahead of myself because how we got to this architecture is part of the tale of how Edmunds came to embrace a DevOps mindset.
  • - So then where does our story start? - Let me be up front: WE STUMBLED. WE PERFECTED THE FACEPALM. - The specifics of our situations when the shit hit the fan may have felt unique, but they’re not. - We learned from our mistakes with the intent of getting better. - Let’s talk facepalms...
  • - This may be familiar... - In 2005, we had 30 servers. In 2006, we burst up to 300 and held steady for a few years with slow growth. - In 2009, we saw a radical jump in server deployment. - We grew in number of servers, but not in the number of admins. - We had Kickstart, but that’s only good at bootstrap time. - BladeLogic + AnthillPro seemed a good solution, but there were major issues. - Growth is painful.
  • - One very specific breakdown in our history stands out to me. - 2007 - Edmunds 2.0: Introducing CMS for the business - All content was locked to a monthly release cycle. - Six months of functional testing, without any performance validation. - Two months before launch, performance testing uncovered scalability issues. - Ops response: double the application infrastructure and throw in a hardware cache appliance. - The breakdown in relationships between Dev/Ops led to major business costs. - Fast forward to 2009; remember that big jump in the number of servers we were deploying?
  • - 2010 Edmunds Redesign: Complete rewrite of all website code + modular breakout of applications. - Good collaboration between Dev/Ops to understand requirements on all sides. - But QA + BETA were built brick-by-brick, and not easily reproducible. - Armed with BladeLogic + AnthillPro, build and deploy were more automated but weren’t coupled together! - Production environment took 3 months to build while BETA served the new website. - We started to realize that the real challenge wasn’t technology but culture.
  • We wanted to stop working like this...
  • and start building like this.
  • We really wanted to get out of here.
  • - And go here - This is the Daily Pint! Let me buy you a beer! - This is where the wildest of ideas are born - Disagreements are worked through with positive jest and jeers - It is where we talked it over
  • - Then we’d take it here! - THE MOST UNDERRATED TOOL YOU ALREADY HAVE. - Floor-to-ceiling whiteboards where we worked out our ideas. - We talked gaps in handoffs, failure rates due to manual builds, linking tools together - “self-service”, automated testing, and much much more. - What happened there was no “ops”, no “dev”. We were technologists working to solve problems with no boundaries of roles in the way. - Our proposal: tear down silos. - We did just that!
  • - So who made this happen, and how? - TechLeads who spent too much time in war rooms started chewing on the problem together. - Identified gaps in provisioning/config management and app deployment tools. - Scott McNealy was right about hardware/software dependencies. - Two teams, Production Engineering & Automation Engineering, set about providing tools which bridged the divide. - (ProdEng = Ops) + (AutoEng = Dev) == How we really started gaining inroads. (NOT IDEAL!) - Members of both these teams shed traditional views on what they were supposed to do and just did it. - The result was improved relationships, better tooling, and a clearer perspective on how future projects could work.
  • - So we started linking all our tools together! - “Your tools don’t make your culture, but they do have an impact on the people who do.”
  • - We now talk about the data that our tools provide us. - You can talk from your gut, but you better back it up with data. - We pushed ownership and accountability by leveraging what we found with data. - The metrics were clearly pointing out our failures, allowing us to learn how to prevent them in the future.
  • - Armed with a tighter toolchain and a new way of working together, we were once again about to be put to the test. - Edmunds began investing resources into “the cloud”. - Heavily virtualized since 2010, but no clear “cloud” offerings - Two teams, one objective: make edmunds.com work on $x cloud platform - Why two? DIVERSIFY.
  • - This was our first shot at a “new” project armed with our new practices + tooling. - These were uncharted waters, even though we’d been virtualized for a few years. “Cloud” is a different beast. - But with familiar tooling + improved communications, these teams produced successful results that were easily measured. - Environment build time down to less than a week. - Done with 95% of the same toolset for both cloud platforms.
  • - We’ve all spent our careers as firefighters. - Street cred with co-workers, bosses, executives as cool headed during a mess - So what about when there are fewer - or different - kinds of fires? - More accountable individuals, more “self-service”, fewer fires == increased capacity for business acumen. - This is the business value that what we call DevOps is leading us to.
  • - To go from this to this... - By investing in addressing systemic issues around communication + partnerships, we increase our capacity to take on other challenges. - No big secret, it’s been talked about by Damon Edwards, John Willis - Covered beautifully in “The Phoenix Project” - Technologists in the age of the Internet are no longer back-office workers keeping the lights on. - We help shape the direction of our companies, with direct impact on revenue, in a field that now sees change yearly. - We needed to change the way we work together to free ourselves for “bigger things”. - An exciting time to be working in our field!
  • - Okay, back to our cloud initiatives... - With this additional capacity, here are a few things we learned that gave value to our company. - Cloud isn’t free; server sprawl can be expensive and lack of education with “self-service” becomes a major issue. - How much does it cost to operate your environment? It’s tough to calculate! - Licensing by host or CPUs is costly at scale, so look for alternatives to those things you pay a premium for. - Managing operating costs starts with understanding where the money is going!
  • - A great growing experience the last few years @ Edmunds. - No rose-tinted glasses to suggest we’ve solved all our problems! BUT WE GOT SOME BIG ONES! - And today we work a helluva lot more like this! - So, let’s take on the challenge of showing some metrics of success by adopting a DevOps culture...
  • - Application Availability has increased. Not the holy metric of “four 9’s”, but a bump all the same! - The number of high-severity INCs has dropped 50% year-over-year - The number of TKTs filed has dropped 50% year-over-year --- Self-service is slick! - The MTTR of pre-production issues has drastically reduced from 5 days to 2 days and even faster than that in most situations. - The time it takes us to build runways has gone down from 3 months to 1 week! - With deeper inspection of our costs-per-host, we’re expecting to begin shaving off overall operating costs drastically for next year’s budget. - Team morale? Well...
  • We got out of here.
  • And into here, so it’s pretty good.
  • - Always more to be done! You’re never “finished” growing. - Devs on-call! (You build it, you run it!) - Reducing infrastructure footprint == reducing operating costs - More RESTful applications - Other cloud offerings?
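
The before/after figures in these notes are easy to sanity-check with a little arithmetic. Below is a minimal sketch in Python, not anything shown in the talk: it assumes a 365-day year of continuous operation, treats availability as the fraction of time the site is up, and uses the numbers from the "Measuring Success" slide. The function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of continuous operation


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours per year a service is down at the given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Figures taken from the "Measuring Success" slide.
    for label, pct in (("before", 99.91), ("after", 99.95)):
        hours = annual_downtime_hours(pct)
        print(f"availability {label}: {pct}% -> {hours:.2f} h/year down")
    print(f"high-severity incidents: {percent_reduction(21, 10):.1f}% fewer")
    print(f"help desk tickets: {percent_reduction(196, 99):.1f}% fewer")
```

Under these assumptions, the move from 99.91% to 99.95% is roughly three and a half fewer hours of downtime per year, which is one way to make a small-looking percentage bump tangible.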

Presentation Transcript

  • BUILDING DEVOPS WITH BEER AND WHITEBOARDS - JOHN MARTIN @tekBuddha - STEVE BURTON @BurtonSays
  • CALL OF DUTY: DEV OPS
  • the challenge - GAME SELECT: DEVELOPER, OPERATIONS, DEVOPS, NOOPS - DEVOPS MISSION PARAMETERS - MISSION OBJECTIVES: KILL YOUR COMPETITORS - DEVELOP, TEST, DEPLOY, OPERATE - AUTOMATION & BUSINESS AGILITY - RECOMMENDED ESSENTIALS: BEER, WHITEBOARDS, COMMUNICATION
  • but what is success?
  • “success is going from failure to failure without losing enthusiasm” - Winston Churchill
  • failure
  • mean time to innocence (MTTI)
  • mean time to resolution (MTTR) - Weeks, Days, Hours or Minutes?
  • mean time between failure (MTBF) - Weeks, Days, Hours or Minutes?
  • availability? 99.9% - The most meaningless metric in IT today.
  • business metrics - revenue - throughput - performance - productivity
  • Edmunds.com - EXPERT CAR ADVICE - FOUNDED IN 1966 - 550 EMPLOYEES - 650K DAILY UNIQUES
  • whoami - SR DIRECTOR PRODUCTION ENGINEERING - A DECADE SUPPORTING JAVA ARCHITECTURES - FUELED BY METRICS, WHITEBOARDS, LOGS, AND BEER
  • Our environment.
  • Compelling Events (Source: http://is.gd/iJU4et)
  • Growing Pains
  • Communication
  • 2010 Redesign (Source: http://is.gd/L77vl1) - COMPLETE REWRITE OF PLATFORM - QA & BETA WORKED GREAT! - BETA BECOMES PROD - 3 MONTHS IN A WAR ROOM
  • NOT LIKE THIS (Source: http://is.gd/PFLRmW)
  • LIKE THIS (Source: http://is.gd/iJU4et)
  • OUT OF HERE (Source: http://is.gd/oFCXNH)
  • IN TO HERE (Source: http://is.gd/iJU4et)
  • ONE OF THE MOST UNDERRATED TOOLS YOU ALREADY HAVE. THE WHITEBOARD
  • TEARING IT DOWN (Source: http://is.gd/Vrnwu4)
  • The Toolshed
  • Communicating with Metrics (Source: http://is.gd/L77vl1) - DATA DRIVEN CULTURE - CHECK THE GUT - DRIVE ACCOUNTABILITY - LEARN FROM FAILURE
  • CLOUDY SKIES (Source: http://is.gd/arBZ4M)
  • Putting It All Together (Source: http://is.gd/L77vl1) - UNCHARTED WATERS - FAMILIAR TOOLING - IMPROVED COMMUNICATIONS - MEASURABLE SUCCESS STORIES
  • A Personal Note (Source: http://is.gd/L77vl1)
  • A Personal Note (Source: http://is.gd/L77vl1)
  • The Business Proposition (Source: http://is.gd/L77vl1) - THE CLOUD ISN’T FREE - COST PER HOST CAN GET SCARY - LOOK FOR THE FREEBIES
  • AWESOMENESS (Source: http://is.gd/iJU4et)
  • Measuring Success (Source: http://is.gd/L77vl1)
    Metric | Before | After | Benefit | $ Savings
    Application Availability % | 99.91% | 99.95% | > 0.04% | $167k revenue protection
    # of High Severity Incidents | 21 | 10 | < 50% | $307k productivity
    # of Help desk Tickets | 196 | 99 | < 50% |
    MTTR in Pre-Production | 5 Days | 2 Days | < 45% | $320k productivity
    Time To Build Runways | 3 Months | < 1 Week | Seriously?! |
    Operating Costs | $$$$ | TBD | |
    Team Morale | Bummered | Beer | |
  • OUT OF HERE (Source: http://is.gd/oFCXNH)
  • IN TO HERE (Source: http://is.gd/iJU4et)
  • WHERE NEXT? (Source: http://is.gd/xKdI6E)
  • JOHN MARTIN - jmartin@edmunds.com - @tekBuddha - STEVE BURTON - sburton@appdynamics.com - @BurtonSays - QUESTIONS? We’re hiring! Stop by our booth!
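
The failure metrics named on the early slides (MTTR, MTBF, availability) can all be derived from a plain list of incident start and end times. Below is a minimal sketch in Python, under stated assumptions: the `Incident` record and the sample data are hypothetical, not Edmunds data; MTBF is taken as the mean uptime between the end of one incident and the start of the next; and availability uses the common steady-state approximation MTBF / (MTBF + MTTR).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record: when the outage began and when it was fixed."""
    started: datetime
    resolved: datetime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolution: average outage duration."""
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean uptime between one incident's resolution and the next one's start.

    Needs at least two incidents to have a gap to measure.
    """
    ordered = sorted(incidents, key=lambda i: i.started)
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def availability(mean_uptime: timedelta, mean_repair: timedelta) -> float:
    """Steady-state approximation: fraction of time the service is up."""
    return mean_uptime / (mean_uptime + mean_repair)


# Made-up sample data for illustration only.
SAMPLE_INCIDENTS = [
    Incident(datetime(2013, 1, 1, 0, 0), datetime(2013, 1, 1, 2, 0)),
    Incident(datetime(2013, 1, 11, 2, 0), datetime(2013, 1, 11, 3, 0)),
    Incident(datetime(2013, 1, 21, 3, 0), datetime(2013, 1, 21, 6, 0)),
]

if __name__ == "__main__":
    repair = mttr(SAMPLE_INCIDENTS)
    uptime = mtbf(SAMPLE_INCIDENTS)
    print(f"MTTR: {repair}, MTBF: {uptime}")
    print(f"availability: {availability(uptime, repair):.4%}")
```

Computing all three from the same records shows why an availability percentage on its own says little: the same figure can come from rare long outages or frequent short ones, and MTTR and MTBF are what tell those cases apart.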