How StepStone achieved a Cloud Centre of Excellence


What would you do if shortly after migrating the majority of your workloads to the public cloud, you began to struggle with unexpected cost increases, lack of visibility, and overrunning budgets by hundreds of thousands of dollars? This is what happened to StepStone, one of the largest online job boards in Europe.

Published in: Technology

  2. Let’s have some context…
  3. WHO ARE STEPSTONE? • Successful online job board business in 20 countries. • 60 million visits / 600,000 jobs advertised per month. • 29 million resumes. • 24 million job alert subscribers. • 2,200+ employees. • Latest acquisition: Universum (Sweden).
  5. WHO AM I? • 20 years’ experience in Software Engineering. • Head of Development for NHS Jobs. ◦ The NHS employs 1.3 million staff (5th largest employer in the world). • Led migration of Jobsite to AWS. ◦ Key success criteria: “Don’t destroy the business”. ◦ “Right First Time”. • Now Group AWS Programme Manager at StepStone.
  6. Would you like to hear a story?
  7. TOTALJOBS: FIRST TO MIGRATE TO AWS • Acquired by StepStone in 2012. • Deadline set by previous owners to vacate their Data Centre. • Decision made to migrate to AWS. • ‘Lift and shift’ / ‘Forklift’ strategy. • Successful, leading the way for further group adoption.
  8. OUR GROWTH IN AWS (UP TO JULY 2017) [Chart: AWS Hosting Costs for StepStone Group, Jan-15 to Jul-17. Annotations: StepStone Brand build starts, StepStone Brand launches, StepWeb, Data Wizards & StepMatch, Workray and Good & Co, Career Junction, Saongroup, Jobsite.]
  9. TOO FAST! [Chart: AWS Hosting Costs for StepStone Group, Jan-15 to Jul-17. Annotation: “Costs had been predicted to flatten out here for the year…”]
  10. 2017: COST EXPLOSION! • Budget considered ‘blown’ by mid 2017. • Major concern to stakeholders (e.g. CFO): ◦ “Costs out of control” ◦ “We didn’t have this with the Data Centre” ◦ “How can we forecast with this happening?” • Finance teams frustrated with lack of accuracy. • No group oversight.
  11. … AND ANOTHER THING … • Brands operating in silos. • Development teams in UK, Germany, Poland, South Africa, South America, United States… • Security and cloud architecture best practices: are they being used and checked? • DevOps: are we building and releasing things in the best way for the cloud? • Have we moved on from ‘lift and shift’ migrations?
  12. How do you solve a problem like that?
  13. ESTABLISHING A CENTRE OF EXCELLENCE • Security • Cost Control • Best Practice
  14. SECURITY • Standards agreed in sessions with the Account Owners across the entire Group. • Monitored / enforced by Group Security Team. • Examples: ◦ MFA everywhere. ◦ SSO. ◦ Akamai integration (Application Firewall, DDoS protection). • Agreed timetables for implementation. • Areas to target agreed within the Group each quarter.
  15. COST CONTROL • All accounts have a responsible Account Owner. • Monthly budget meetings backed by an agreed forecasting process. • Whole year forecast and refined as the year progresses. • Any exceptions (typically 10% variance) followed up: full explanation and plan of action required. • CloudHealth (an enterprise-level dashboard tool) rolled out everywhere to enable cost tracking, analysis and alerting. • Group Target for Reservations…
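The 10% exception rule on this slide can be sketched as a small check. This is an illustration only: the account names, the figures and the `flag_exceptions` helper are hypothetical, and the slides do not say whether underspend counts as an exception; the sketch assumes variance is measured in both directions.

```python
from dataclasses import dataclass


@dataclass
class AccountSpend:
    account: str     # hypothetical account label
    forecast: float  # agreed monthly forecast
    actual: float    # actual monthly spend


def variance_pct(item: AccountSpend) -> float:
    """Signed variance of actual spend against forecast, in percent."""
    if item.forecast <= 0:
        raise ValueError("forecast must be positive")
    return (item.actual - item.forecast) * 100.0 / item.forecast


def flag_exceptions(items, threshold_pct=10.0):
    """Return the accounts whose absolute variance exceeds the threshold."""
    return [i.account for i in items if abs(variance_pct(i)) > threshold_pct]


# Made-up figures for one monthly budget meeting.
month = [
    AccountSpend("brand-a", forecast=100_000, actual=104_000),  # +4%
    AccountSpend("brand-b", forecast=50_000, actual=58_000),    # +16%
    AccountSpend("brand-c", forecast=80_000, actual=68_000),    # -15%
]
print(flag_exceptions(month))  # ['brand-b', 'brand-c']
```

Each flagged account would then need the explanation and plan of action the slide describes.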
  16. COST CONTROL: RESERVATIONS [Chart: EC2 capacity, % reserved, 2017-07 to 2018-07.] • July 2017: 24% EC2 capacity reserved. • July 2018: 59% EC2 capacity reserved.
  17. COST CONTROL: RESERVATIONS II [Chart: EC2 capacity, % savings, 2017-08 to 2018-07.] • July 2017: 50% Reservation Savings. • July 2018: 68% Reservation Savings.
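As a rough illustration of the two reservation charts, both metrics can be read as simple ratios. The slides do not define either one, so this sketch assumes that coverage is reserved instance-hours over total instance-hours, and that savings is the discount of the reserved cost against the On Demand cost of the same hours. The function names and all input figures are made up.

```python
def reserved_coverage_pct(reserved_hours: float, total_hours: float) -> float:
    """Share of EC2 instance-hours covered by reservations, in percent."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return reserved_hours * 100.0 / total_hours


def reservation_savings_pct(on_demand_cost: float, reserved_cost: float) -> float:
    """Discount achieved on the reserved hours versus On Demand, in percent."""
    if on_demand_cost <= 0:
        raise ValueError("on_demand_cost must be positive")
    return (on_demand_cost - reserved_cost) * 100.0 / on_demand_cost


# Hypothetical inputs chosen to reproduce the July 2018 figures on the slides.
print(reserved_coverage_pct(reserved_hours=590, total_hours=1000))         # 59.0
print(reservation_savings_pct(on_demand_cost=1000.0, reserved_cost=320.0))  # 68.0
```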
  18. COST CONTROL: WHAT ABOUT SPOT? • Spot instances: EC2 savings of up to 90%. • Ideal for certain application types, e.g. Big Data processing. • Case study: StepWeb. • 6 stage pipeline, 5.5 TB database, circa 100 x r3.8xlarge instances. • EMR was running monthly On Demand. • Migrated to Spot. • 87% savings.
  19. COST CONTROL: HOW DID 2018 TURN OUT? • 2018 was a much better year. • We came in comfortably under budget with more accurate and explained forecasting. • ‘No surprises’ concept for Stakeholders (CTO, CFO, Finance Teams…).
  20. BEST PRACTICE • Monthly Community of Practice (online) sessions: Account Owners, DevOps teams… anyone who is interested! • Quarterly Workshops (location rotates): more detailed presentations, including third-party guests. Broadcast live and recorded for later. • Presentations include: Reservations, Spot Instances, Security, DevOps, project walkthroughs… and more. • Spreads Best Practice throughout the Group, learning from both inside and outside. • Slack used extensively, including third-party guests for ‘instant’ access.
  21. BUZZWORD BINGO: GAMIFICATION • Teams invited to enter our yearly competition. • Entries assessed on all of the things I have mentioned! • Winning = £££££.
  23. EVERYONE MUST GET CERTIFIED • All users who can make changes to Production environments need to be certified to a minimum of Associate standard (EOY 2019). • Not a difficult standard for anyone ‘hands on’, but it ensures that the principles of cloud architecture and best design are understood. • Variety of training methods available. A Cloud Guru has the greatest adoption right now.
  24. NOW WE HAVE A CENTRE OF EXCELLENCE! • It has been a real adventure to get to this point. • We have Security, Cost and Best Practice under control. • We have adoption across the StepStone Group (multiple brands and countries). • We have the confidence of our Stakeholders. So we’re done, right? Right?!
  25. What next?
  27. RESILIENCE – FUTURE STEPS • Best Practice is a moving target. • We already build out in multiple AZs, Regions etc. • Ensure full Active / Active infrastructure everywhere. • Not just tested: CHAOS TESTED (the ‘Terminate What You Like’ test). • Game Days: ensure team sharpness. • AWS gives you the tools, but correct implementation is down to you.
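The ‘Terminate What You Like’ idea can be sketched as removing a randomly chosen instance and checking that the service would still be up. This is a toy, in-memory sketch under stated assumptions: `Fleet`, `terminate`, `is_healthy` and `chaos_round` are hypothetical names, no AWS API is called, and "healthy" is assumed to mean that instances remain in at least two availability zones. It is not StepStone's actual test.

```python
import random


class Fleet:
    """Toy model of instances spread over availability zones (hypothetical)."""

    def __init__(self, instances):
        # instances: mapping of instance id -> availability zone
        self.instances = dict(instances)

    def terminate(self, instance_id):
        """Remove one instance from the fleet."""
        del self.instances[instance_id]

    def is_healthy(self, min_zones=2):
        """Assumed health rule: instances remain in at least `min_zones` zones."""
        return len(set(self.instances.values())) >= min_zones


def chaos_round(fleet, rng):
    """Terminate one randomly chosen instance and report whether we survived."""
    victim = rng.choice(sorted(fleet.instances))
    fleet.terminate(victim)
    return victim, fleet.is_healthy()


# Two instances per zone, so losing any single instance should be survivable.
fleet = Fleet({"i-a1": "az-1", "i-a2": "az-1", "i-b1": "az-2", "i-b2": "az-2"})
victim, healthy = chaos_round(fleet, random.Random(42))
print(f"terminated {victim}, still healthy: {healthy}")
```

With only one instance per zone the same round would report an unhealthy fleet, which is the kind of gap a Game Day is meant to surface.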
  28. Conclusions
  29. WHAT HAVE WE LEARNED? • Establish regular Community of Practice sessions. Get the right people there! Spread the word! • Encourage presentations from all teams, and include the outside world. • Establish frameworks for Security and Cost Control that are easy to understand and use, to ensure adoption. • Invest in the right tools to help (“CloudHealth is the difference between life and death” – DevOps Lead). • Continue to drive Best Practice to ensure the strongest architecture.
  30. Thank you!