How to run a global, cloud scale event for 10.000 people


  1. How to run a global, cloud scale event for 10.000 people
  2. How to run a global, cloud scale event for 10.000 people @ROBBOS81
  3. 35 countries 10.000 participants 1340 web apps 4500 YouTube views 1530 resource groups 7 Azure DevOps organizations 4 Azure subscriptions 3 outages in Azure DevOps Half of the budget Dedicated Microsoft SREs on call 36 hours 4 million impressions Free community event
  4. WHAT’S IN IT FOR YOU? This does not only work for community events Shows working together in a globally distributed team Technical design decisions Design with the end in mind
  5. @ROBBOS81 WHAT IS GDBC? • Global DevOps BootCamp • Free • Community event • Saturday of learning
  6. @ROBBOS81 ORIGINATION • DevOps as a topic • Issues with other bootcamps: • Only global by name • Create your own: • content • workshop material • Lots of work EVENT OUT OF THE BOX
  7. @ROBBOS81 EVENT OUT OF THE BOX Worldwide event Same content Same exercises Global vibe
  8. @ROBBOS81 EVENT OUT OF THE BOX FOR LOCAL ORGANIZERS Provide location WIFI Host the local event Local speaker Provide enough proctors Engage local community
  9. • Global keynote 2018: Buck Hodges Director of Engineering, Azure DevOps 2019: Niall Murphy Global Head of Azure SRE 2017: Donovan Brown Cloud Advocate Manager, Methods and Practices EVENT OUT OF THE BOX FOR US
  10. @ROBBOS81 EVENT OUT OF THE BOX FOR US • Global keynote • Local keynote • Content around a theme • Exercises for attendees • Scoreboard • Infrastructure • Communication • Marketing & Branding • Social media
  11. GDBC CORE TEAM Team to create + run the global event Volunteers from the community Planning / meetings Sponsoring
  12. SEARCH FOR VENUE ORGANIZERS Get local venues: • Google Forms + website • MVP Summit • Radio TFS Marketing: • Self-promotion • Local communities
  13. IS KEY Build a community of local venues and GDBC Core Team • Slack • Community calls • Record and reshare • Centralized wiki
  14. STAKEHOLDER MANAGEMENT
  15. @ROBBOS81 WHO ARE WE DOING IT FOR? 90 venues worldwide
  16. WHAT DO THE VENUES NEED? Attendees Content
  17. @ROBBOS81 ATTENDEES Platform to handle: • Attendee registration • Local landing page • Social marketing Eventbrite • Platform as a service • Venue organizers
  18. ATTENDEES Eventbrite • API is not complete • Co-admins are hard • Not everything can be automated Automate as much as possible • ConsoleApp with Selenium to add co-admins • Auto-invite to Slack
  19. SAAS OVER PAAS OVER IAAS
  20. REACHING ATTENDEES - SOCIAL
  21. PREPARING THE CONTENT Keynotes Exercises Workspace Video content Styling Guide Theme
  22. THEME 2018 THEME 2019 THEME 2017
  23. WORKSPACE
  24. CHALLENGES • SSL certificate expired
  25. CHALLENGES • SSL certificate expired • Flaky connection • Credential leak • Exception rate goes up • DDOS after CEO message • Supply chain attack
  26. DDOS AFTER CEO MESSAGE
  27. @ROBBOS81 COMMUNITY • What is GDBC • Webshop • Challenges
  28. SETTING UP AZURE
  29. @ROBBOS81 WORKING WEBSHOP + +
  30. @ROBBOS81 WHAT DO WE NEED Resource group App Service SQL Server Database Application Insights Azure Active Directory • Venue admin account • Per team: • User account + user group • Service principal for Azure DevOps X 1200
  31. REGIONALLY DIVIDED • Spread the load • Reduce latency for users
  32. @ROBBOS81 USE WHAT IS BEST FOR YOU
  33. AZURE DEVOPS TO THE RESCUE
  34. • SQL Servers India: max capacity reached • Application Insights Central US unavailable • Resource types off by default
  35. SQL SERVER LIMITS 20 per region per subscription 200 max. per subscription
  36. 2000 role assignments per subscription
  37. DON’T TRUST THE DEFAULT • During testing, a default SQL database was created as an S1 Database • Cost: € 25 / month • During the last week, the default changed to a DS3v1 • Cost: € 315 / month • We created 1200 databases…
  38. COST MANAGEMENT
  39. AZURE HAS BEEN TACKLED
  40. AZURE DEVOPS
  41. @ROBBOS81 REQUIREMENTS FOR AZURE DEVOPS Organization Team project Git repository Build pipeline (CI) Deployment pipeline (CD) Service connection to Azure Artifact feed Azure Active Directory Link X 1200
  42. @ROBBOS81 AZURE DEVOPS PROVISIONING Multiple organizations to spread the load: Use a service account for setup • Australia • Brazil • Canada • East Asia • West Europe • India • United Kingdom • United States
  43. BUILD AGENTS 7 organizations 1200 teams ±200 teams per org 1-10 concurrent pipelines 150 sponsored pipelines per org:
  44. @ROBBOS81
  45. @ROBBOS81 Every day in preparation Azure Infrastructure Azure DevOps Certificates
  46. @ROBBOS81
  47. LEARNINGS AZURE DEVOPS PROVISIONING Hitting the service at scale can trigger some weird issues • Build pipeline outage • Two regions • Quick fix: 1 concurrent pipeline • Australia networking outage • Agent scale set network issue
  48. @ROBBOS81 AZURE DEVOPS – PRODUCT TEAM Responsive team 24 hours 3 SREs assigned
  49. SPONSOR BUY IN
  50. AZURE DEVOPS
  51. @ROBBOS81 EVENT DAY
  52. @ROBBOS81 CHALLENGES WEBSITE 1. Explanation 2. Detect 3. Respond: Quick fix 4. Post-mortem 5. Recover
  53. @ROBBOS81 CHALLENGE Docker containers to disrupt the webshop Start, stop, validate and scoring • Isolated • Own technology stack • Parameters injected
  54. @ROBBOS81 CONTAINER REQUIREMENTS • Asynchronous • Fast • Scalable X 30.000
  55. @ROBBOS81 WHERE TO RUN Azure Container Instances No initial setup Slow start of containers Soft limits Azure Kubernetes Service Provision cluster Limited to available hardware Limited to nodes, scale up
  56. @ROBBOS81 MONITORING
  57. @ROBBOS81 AKS CLUSTER MONITORING
  58. YOU BUILD IT, YOU RUN IT 36 hours
  59. RISK MANAGEMENT
  60. @ROBBOS81 LEARNINGS 36 hours monitoring is hard! Find an SRE in a different time zone • Preparation is key • Isolation and independence • Caching and scaling really help • Insights and control 4:00 AM
  61. CUSTOMER HAPPINESS
  62. TAKEAWAYS START WITH A VISION DESIGN WITH THE END IN MIND ENGAGE SPONSORS AND STAKEHOLDERS THINK BIG BUILD VS BUY RISK MANAGEMENT CUSTOMER HAPPINESS
  63. @ROBBOS81 MOST IMPORTANTLY DON’T THINK, ACT!
  64. @ROBBOS81 LOCAL DEVOPS BOOTCAMP https://localdevopsbootcamp.com • SSL certificate expired • Flaky connection • Credential leak • Exception rate goes up • DDOS after CEO message • Supply chain attack
  65. https://xpir.it/LinksGDBC ROB BOS - @ROBBOS81
  66. How to run a global, cloud scale event for 10.000 people @ROBBOS81
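
The SQL Server limits and the database pricing quoted on the slides can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the tooling the team used: the figures are taken from the slides and the Editor's Notes, while the function names and the assumption of exactly one SQL Server and one database per team are illustrative.

```python
import math

# Figures quoted in the deck; the names are illustrative, not from the talk.
TEAMS = 1200                      # webshops to provision ("X 1200")
SUBSCRIPTIONS = 4                 # Azure subscriptions used
SQL_SOFT_LIMIT_PER_REGION = 20    # SQL Servers per region per subscription
SQL_HARD_LIMIT_PER_SUB = 200      # SQL Servers per subscription, maximum
PRICE_S1_EUR = 25                 # monthly cost of the default seen during testing
PRICE_NEW_DEFAULT_EUR = 315       # monthly cost of the default in the last week


def servers_per_subscription(teams: int, subscriptions: int) -> int:
    """SQL Servers each subscription must host, assuming one server per team."""
    return math.ceil(teams / subscriptions)


def min_subscriptions(teams: int, hard_limit: int) -> int:
    """Smallest number of subscriptions that stays under the hard limit."""
    return math.ceil(teams / hard_limit)


def monthly_cost(databases: int, price_per_db: int) -> int:
    """Total monthly cost when every database uses the same tier."""
    return databases * price_per_db


per_sub = servers_per_subscription(TEAMS, SUBSCRIPTIONS)
regions_needed = math.ceil(per_sub / SQL_SOFT_LIMIT_PER_REGION)

print(per_sub)                                        # 300, above the 200 maximum
print(regions_needed)                                 # 15 regions at the soft limit
print(min_subscriptions(TEAMS, SQL_HARD_LIMIT_PER_SUB))  # 6
print(monthly_cost(TEAMS, PRICE_S1_EUR))              # 30000
print(monthly_cost(TEAMS, PRICE_NEW_DEFAULT_EUR))     # 378000
```

Under these assumptions the planned 300 servers per subscription exceed both quoted limits, and the changed default multiplies the monthly database bill from € 30.000 to € 378.000, which is why the deck pairs the limits with the "don't trust the default" slide.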

Editor's Notes

  • We started in New Zealand
    Stopped in Seattle
  • You just saw the intro video for the Global DevOps Bootcamp 2019, a community event we have organized for the last three years. We want to tell you the story of how we ran it across 35 countries with 10.000 participants and 7 Azure DevOps organizations, causing 3 outages in Azure DevOps along the way.
  • Who are you doing it for / with?
    Vision

    Working together in a globally distributed team
    Open source / daily job
    Clearly defined purpose
    Empowerment
    Isolated architecture as a starting point
    Communication

    Technical design decisions
    No big design up front
    Do think big
    Scalable
    Monetary restrictions
    Technical restrictions

    Design with the end in mind
  • 2017: What is DevOps
    2018: DevOps at Microsoft
    2019: SRE & DevOps
  • Team of volunteers to create content
    Volunteers
    Sponsoring in time/money/Azure Credits/Tweet wall/Snyk package scanner support
    Planning: Weekly meeting in MS Teams
  • Message from the CEO of Parts Unlimited
  • SWITCH SPEAKER

    What do we need to provision in Azure
  • Goal is a working webshop for a team of attendees

    Webshop: App Service / Sql DB / Application Insights

    Deliver working webshop for attendees
    + CI/CD pipeline in Azure DevOps
  • Webshop has been selected

    Team of 5 attendees
  • Started with Azure CLI
    Switched to ConsoleApp.exe
    Scalable, repeatable process → Azure DevOps

  • Our own pipeline

    Azure Separated for fast iterations during preparation phase

  • SQL Server: 20 is a soft limit, resolved through a support ticket (two days too late!). 1200 / 4 subs = 300 SQL Servers…
    No interesting limits on e.g. App Service Plans (100 per resource group). 200 is a hard limit

  • Crucial factor, Credit Card
  • Inception! What now?

    Rolling out webshops to production

    1 Azure DevOps Project to provision Azure DevOps team projects (x1200)
  • SWITCH SPEAKER
  • Per team needed
  • 7 Azure DevOps organizations: one for each supported region
    All attached to same AAD: so one account to rule them all
    Concurrent hosted pipelines: 100 or 150 per organization:
    200 teams per organization
  • Peak usage: 700 concurrent pipelines
  • Full pipeline overview
  • Certificate separate : timing issue with App Service Ready
  • Part 1: Export dataset = ConsoleApp.exe
    Provision Azure DevOps team projects + AD
    DNS here, because it takes a while to be ready

    Part 2: Init AzDo Team Project
    Part 3: Trigger all the builds (would incur cost)
  • Build pipeline outage in 2 regions: starting 400 pipelines (with 400 CD releases after them!) puts considerable load on Azure DevOps
    Scaled down to 1 concurrent pipeline in all regions

    Australia: networking issue on the scale set → npm restore failed
    Rene in call until 12 PM
  • This is the break
  • SWITCH SPEAKER

  • Switch to gdbc-challenge-com
    Secured by Azure AD integration
    Table storage for team state
  • ACI: nice, no orchestration needed, might be cheaper
    ACI: too slow to start a container
    ACI constraint: max 300 container creates per hour


  • Google Analytics
    Application Insights
  • Checking the load on the cluster – Region Europe
    We see that pods are starting, stopping isn’t that visible

    We use short lived pods, but the default garbage collection is on 12.000 pods, so that line doesn’t go down
  • Custom events with tracking ID
  • Slack channel + Bridge on Teams
    Slack channel #SRE
    Bridge on Microsoft Teams
    Command Center
  • Monitoring, be in control
    Fallbacks, caching and backups
  • Preparation is key
    Support tickets for raising SQL Server limits took 2 weeks and were too late!
    Twitter preparation

    Event starts in New Zealand: that is 11 PM for us in NL

    New Zealand: 11:00 PM Started
    Europe: 09:00 AM
    West coast US ended at 02:00 AM
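
The ACI constraint in the notes can be set against the container requirement on slide 54 with a short calculation. This is a rough Python sketch under a stated simplification: it assumes the 30.000 container runs are spread evenly over the 36 hours of the event, which the deck does not claim, and the names are illustrative.

```python
import math

CONTAINER_RUNS = 30_000        # "X 30.000" on the container requirements slide
ACI_CREATES_PER_HOUR = 300     # ACI constraint quoted in the notes
EVENT_HOURS = 36               # event duration quoted in the deck


def hours_needed(runs: int, creates_per_hour: int) -> float:
    """Hours required to create all containers at a fixed create rate."""
    return runs / creates_per_hour


def required_rate(runs: int, hours: int) -> int:
    """Creates per hour needed to finish all runs within the event window."""
    return math.ceil(runs / hours)


print(hours_needed(CONTAINER_RUNS, ACI_CREATES_PER_HOUR))   # 100.0
print(required_rate(CONTAINER_RUNS, EVENT_HOURS))           # 834
```

Even with an even spread, the quoted create rate would need about 100 hours for a 36 hour event, which is consistent with the decision to run the challenge containers on a Kubernetes cluster instead.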