DevOps and Performance - Why, How and Best Practices - DevOps Meetup Sydney


Talk given at DevOps Meetup in Sydney

Published in: Technology, Software
  • Who knows what that is?

    It’s the FIFA World Cup Trophy
  • Teams are currently competing in the qualifiers for Brazil 2014
  • This is “my” Austrian national soccer team.

    Their GOAL is to qualify for Brazil 2014. After many failed attempts in the past we hired a new coach whose goal is to form a new team that PERFORMS well enough to qualify
  • In order to get there the team competed in many test games, which gave them a lot of confidence because they played against teams that were “easier” to beat.

    At the end of these tests we even started the qualification with some wins against teams that we were expected to beat

    So – at the end of these “test and easy qualification games” we thought: “ALL GOOD – THE ROAD IS OPEN FOR 2014 – NOT ONLY WILL WE QUALIFY BUT WE ALSO BELIEVE WE HAVE SUCH A STRONG TEAM THAT WE WILL ALSO DO WELL AT THE WORLD CUP”
  • Then reality kicked in when we faced our first “real competitor” – it was the first qualification game against a team whose quality is at the level we have to expect at the World Cup.
    The competing team was Germany – and – based on these images you can see how the game went
  • The coach is responsible for watching the game and seeing how things are going.
    Like in other sports – soccer has a couple of Key Performance Indicators such as Ball Possession, Fouls and the actual score

    The first 5 minutes actually didn’t look too bad
  • After the first 5 minutes the game changes – with Germany taking over the game in their typical way. The KPIs make this very clear

    The coach is responsible for reacting based on these values and how the game goes
  • The coach should use more data for detailed analysis on what is going wrong in the game
  • One of his options is to substitute players – or even change tactics

    Does this succeed based on the KPIs that we have seen before?
  • Well – not always. Just replacing players – putting in some who are faster at chasing the ball – doesn’t always help
  • Story
    New Build Deployed on Thursday Evening
    Everything runs smoothly on Friday Daytime
    An Ad Campaign hits the Air Friday Night
    The site crashes under load -> ALERTS GO OFF
    Restarting Server -> SERVER DOESN’T START
    Adding more Servers-> PROBLEM REMAINS
    Calling in the “App Experts” and Pizza Delivery!
  • Well – I guess there is not much more to say about this. The attitude between these teams doesn’t help in solving issues any faster
  • DevOps and Performance - Why, How and Best Practices - DevOps Meetup Sydney

    1. DevOps and Performance Why, How & Best Practices @grabnerandi
    2. What you may have heard about Austrians
    3. And just very recently @ Euro Song Contest
    4. How we would like the world to see us 
    5. What we are also proud of 
    6. What you should check out
    7. The stuff we did when we were a Start Up and we All were Devs, Testers and Ops
    8. YOU ARE NOT ALONE: Popularity on Google
    9. Who is doing it? How many successful deployments can they do? 300 Deployments / Year; 50-60 Deployments / Day; 10+ Deployments / Day; Every 11.6 seconds
    10. More on Amazon’s Story: 75% fewer outages since 2006; 90% fewer outage minutes; ~0.001% of deployments cause a problem; Instantaneous automatic rollback; Deploying every 11.6s
    11. Testing is Important – and gives Confidence
    12. But are we ready for “The Real” world?
    13. Measure Performance during the game (Minute 1 - 5): Ball Possession 40 : 60; Fouls 0 : 0; Score 0 : 0
    14. Measure Performance during the game (Minute 6 - 35): Ball Possession 80 : 20; Fouls 2 : 12; Score 0 : 0
    15. Deep Dive Analysis
    16. Options “To Fix” the situation
    17. Not always a happy ending (Minute 90): Ball Possession 80 : 20; Fouls 4 : 25; Score 3 : 0
    18. FRUSTRATED FANS!! 25
    19. How does that relate to Software?
    20. From Deploy to … Timeline: Deploy, Promotion/Event, Problems, Ops Playbook, War Room
    21. The “War Room” – back then: “Houston, we have a problem” (NASA Mission Control Center, Apollo 13, 1970)
    22. The “War Room” – NOW: Facebook, December 2012
    23. 3 Situations on WHY this happens, HOW to avoid it
    24. Image taken from
    25. #Disconnected Teams
    26. “Teamwork” between Dev and Ops: SEV1 Problem in Production. Need access to log files. Where are they? Can’t get them. Need to increase log level. Can’t do! Can’t change config files in prod!
    27. “Solution”: Implement a Custom “On Demand” Remote Logger
    28. Implementation and Rollout: Implemented Custom Logger. Worked well in Load Testing
    29. What happened? ~1 million Lock Exceptions in 30 minutes
    30. Root Cause: A special WebSphere Setting! Log Service provides a synchronized log file across ALL JVMs
    31. Metrics: # Log Messages, # Exceptions. Share: Same Server Settings. DevOps: Agree on Data for Troubleshooting
    32. #No “Agile” Deployment
    33. Ad on Air: Load Spike resulted in Unavailability
    34. Alternative: “GoDaddy goes DevOps”. 1h before Super Bowl kickoff; 1h after game ended
    35. Behind the Scenes
    36. Metrics: Availability, Page Size, # Objects, # Hosts, # Connections. DevOps: “Feature” Switches
    37. #Push without a Plan
    38. Mobile Landing Page of Super Bowl Ad: 434 Resources in total on that page (230 JPEGs, 75 PNGs, 50 GIFs, …). Total size of ~20 MB
    39. redirects to ALL CSS and JS files are redirected to the www domain This is a lot of time “wasted” especially on high latency mobile connections
    40. Critical Pages not Optimized! Browse, Search and Product Info perform well … Critical Pages such as Shopping Cart are very slow … because they don’t follow best practices: 87 Requests, 28 Redirects, …
    41. Metrics: Load Time, # Resources (Images, …), # HTTP 3xx, 4xx, 5xx. Dev: Build for Mobile. Test: Test on Mobile. Ops: Monitor Mobile
    42. # of Requests / User, # of Log Messages, # of Exceptions, # Objects Allocated, # Objects In Cache, Cache Hit Ratio, # of Images, # of SQLs, # SQLs per Request, Availability, # HTTP 3xx, 4xx, Page Size
    43. 54
    44. Commit Stage (Compile, Execute Unit Test, Code Analysis, Build installers); Automated Acceptance Testing; Automated Capacity Testing; Manual testing (Key showcases, Exploratory testing); Release. Unit & Integration Tests; Functional Tests; Performance Tests; Production Monitoring; Functional Tests
    45. If we do all that
    46. Which gives you more time for the really important things in life …
    47. Want MORE of these and more details?
    48. Recommended Book
    49. FREE Products & More Info • dynaTrace Enterprise – Full End-to-End Visibility in your Java, .NET, PHP Apps – Sign up for a 15 Days Free Trial on • dynaTrace AJAX Edition – Browser Diagnostics for IE + FF – Download @ • Our Blog:
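
The metric slides above (31, 36, 41 and 42) come down to Dev, Test and Ops agreeing on a small set of numbers and watching them at every stage. As a rough illustration only, here is a minimal Python sketch of how such an agreement could be checked automatically for each build. Everything in it is an assumption rather than something shown in the talk: the `Budget` class, the `check_build` helper, the metric names and the limits are hypothetical. Of the sample values, only the 28 redirects are taken from slide 40.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Upper limit for one metric that Dev, Test and Ops agreed on."""
    metric: str
    max_value: float


# Hypothetical budgets: the metric names echo the slides, the limits are
# placeholder numbers chosen for illustration only.
BUDGETS = [
    Budget("log_messages_per_request", 20),
    Budget("exceptions_per_request", 0),
    Budget("sql_statements_per_request", 10),
    Budget("http_3xx_per_page", 2),
    Budget("page_size_kb", 1500),
]


def check_build(measured, budgets=BUDGETS):
    """Return one readable violation per missing or exceeded metric."""
    violations = []
    for budget in budgets:
        value = measured.get(budget.metric)
        if value is None:
            # A metric nobody measured is treated as a failure, so that
            # gaps in monitoring are visible instead of silently passing.
            violations.append(f"{budget.metric}: not measured")
        elif value > budget.max_value:
            violations.append(
                f"{budget.metric}: {value} exceeds budget {budget.max_value}"
            )
    return violations


if __name__ == "__main__":
    # Sample measurements: only the 28 redirects come from slide 40,
    # every other value is made up for this example.
    sample = {
        "log_messages_per_request": 12,
        "exceptions_per_request": 0,
        "sql_statements_per_request": 4,
        "http_3xx_per_page": 28,
        "page_size_kb": 900,
    }
    for problem in check_build(sample):
        print(problem)
```

A check like this could run after the unit, functional and performance test stages from slide 44, so that a regression in one of the agreed metrics stops a build before it reaches production. Where the limits should sit is a team decision that the sketch does not try to answer.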