-
1.
How to run a global, cloud scale event for
10,000 people
-
2.
How to run a global, cloud scale event for
10,000 people
@ROBBOS81
-
3.
35 countries
10,000 participants
1340 web apps
4500 YouTube views
1530 resource groups
7 Azure DevOps organizations
4 Azure subscriptions
3 outages in Azure DevOps
Half of the budget
Dedicated Microsoft SREs on call
36 hours
4 million impressions
Free community event
-
4.
WHAT’S IN IT FOR YOU?
This does not only work for community events
Shows working together in a globally distributed team
Technical design decisions
Design with the end in mind
-
5.
WHAT IS GDBC?
• Global DevOps BootCamp
• Free
• Community event
• Saturday of learning
-
6.
ORIGINATION
• DevOps as a topic
• Issues with other bootcamps:
• Only global by name
• Create your own:
• content
• workshop material
• Lots of work
EVENT OUT OF THE BOX
-
7.
EVENT OUT OF THE BOX
Worldwide event
Same content
Same exercises
Global vibe
-
8.
EVENT OUT OF THE BOX
FOR LOCAL ORGANIZERS
Provide location
WIFI
Host the local event
Local speaker
Provide enough proctors
Engage local community
-
9.
EVENT OUT OF THE BOX
FOR US
• Global keynote
2017: Donovan Brown, Cloud Advocate Manager, Methods and Practices
2018: Buck Hodges, Director of Engineering, Azure DevOps
2019: Niall Murphy, Global Head of Azure SRE
-
10.
EVENT OUT OF THE BOX
FOR US
• Global keynote
• Local keynote
• Content around a theme
• Exercises for attendees
• Scoreboard
• Infrastructure
• Communication
• Marketing & Branding
• Social media
-
11.
GDBC CORE TEAM
Team to create + run the global event
Volunteers from the community
Planning / meetings
Sponsoring
-
12.
SEARCH FOR VENUE ORGANIZERS
Get local venues:
• Google Forms + website
• MVP Summit
• Radio TFS
Marketing:
• Self-promotion
• Local communities
-
13.
IS KEY
Build a community of local venues and GDBC Core Team
• Slack
• Community calls
• Record and reshare
• Centralized wiki
-
14.
STAKEHOLDER MANAGEMENT
-
15.
WHO ARE WE DOING IT FOR?
90 venues worldwide
-
16.
WHAT DO THE VENUES NEED?
Attendees
Content
-
17.
ATTENDEES
Platform to handle:
• Attendee registration
• Local landing page
• Social marketing
Eventbrite
• Platform as a service
• Venue organizers
-
18.
ATTENDEES
Eventbrite
• API is not complete
• Co-admins are hard
• Not everything can be automated
Automate as much as possible
• ConsoleApp with Selenium to add co-admins
• Auto-invite to Slack
-
19.
SAAS OVER PAAS OVER IAAS
-
20.
REACHING ATTENDEES - SOCIAL
-
21.
PREPARING THE CONTENT
Keynotes
Exercises
Workspace
Video content
Styling Guide
Theme
-
22.
THEME 2017
THEME 2018
THEME 2019
-
23.
WORKSPACE
-
24.
CHALLENGES
• SSL certificate expired
-
25.
CHALLENGES
• SSL certificate expired
• Flaky connection
• Credential leak
• Exception rate goes up
• DDOS after CEO message
• Supply chain attack
-
26.
DDOS AFTER CEO MESSAGE
-
27.
COMMUNITY
• What is GDBC
• Webshop
• Challenges
-
28.
SETTING UP AZURE
-
29.
WORKING WEBSHOP
-
30.
WHAT DO WE NEED
Resource group
App Service
SQL Server Database
Application Insights
Azure Active Directory
• Venue admin account
• Per team:
• User account + user group
• Service principal for Azure DevOps
X 1200
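The per-team list above can be thought of as one record per team, generated from a venue list. A minimal sketch, assuming a made-up naming scheme (the `rg-`, `app-`, `sql-` prefixes, the `example.com` domain and the `plan_team` / `plan_event` helpers are all hypothetical, not the names used for the event):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TeamPlan:
    """The resources one team needs, per the slide. All names are hypothetical."""
    resource_group: str
    app_service: str
    sql_database: str
    app_insights: str
    user_account: str
    user_group: str
    service_principal: str


def plan_team(venue: str, number: int) -> TeamPlan:
    """Derive every resource name for one team from a single slug."""
    slug = f"{venue.lower().replace(' ', '-')}-team{number:02d}"
    return TeamPlan(
        resource_group=f"rg-{slug}",
        app_service=f"app-{slug}",
        sql_database=f"sql-{slug}",
        app_insights=f"ai-{slug}",
        user_account=f"{slug}@example.com",
        user_group=f"grp-{slug}",
        service_principal=f"sp-{slug}",
    )


def plan_event(teams_per_venue: Dict[str, int]) -> List[TeamPlan]:
    """Expand a venue -> team count mapping into one plan per team."""
    return [
        plan_team(venue, number)
        for venue, count in teams_per_venue.items()
        for number in range(1, count + 1)
    ]


plans = plan_event({"Amsterdam": 2, "New York": 1})
for plan in plans:
    print(plan.resource_group)
```

Deriving every name from one slug keeps the plan repeatable: running it twice for the same venue list yields the same 1200 records.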
-
31.
REGIONALLY DIVIDED
• Spread the load
• Reduce latency for users
-
32.
USE WHAT IS BEST FOR YOU
-
33.
AZURE DEVOPS TO THE RESCUE
-
34.
• SQL Servers India: max capacity reached
• Application Insights Central US unavailable
• Resource types off by default
-
35.
SQL SERVER LIMITS
20 per region per subscription
200 max. per subscription
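A quick check of these limits against the event's numbers, as worked in the speaker notes (1200 teams over 4 subscriptions). This is only arithmetic, and it assumes one SQL server per team, which is what the notes' 1200 / 4 calculation implies:

```python
import math

TEAMS = 1200
SUBSCRIPTIONS = 4
LIMIT_PER_REGION = 20         # SQL servers per region per subscription (slide)
LIMIT_PER_SUBSCRIPTION = 200  # SQL servers per subscription (slide)


def servers_per_subscription(teams: int, subscriptions: int) -> int:
    """SQL servers each subscription must hold, assuming one server per team."""
    return math.ceil(teams / subscriptions)


def regions_needed(servers: int, per_region_limit: int) -> int:
    """Regions one subscription must spread over to stay under the regional limit."""
    return math.ceil(servers / per_region_limit)


def subscriptions_needed(teams: int, per_subscription_limit: int) -> int:
    """Minimum subscriptions that keep every subscription under its maximum."""
    return math.ceil(teams / per_subscription_limit)


per_sub = servers_per_subscription(TEAMS, SUBSCRIPTIONS)
print(f"Servers per subscription: {per_sub}")
print(f"Fits the per-subscription maximum: {per_sub <= LIMIT_PER_SUBSCRIPTION}")
print(f"Regions needed at the regional limit: {regions_needed(per_sub, LIMIT_PER_REGION)}")
print(f"Subscriptions needed at the maximum: {subscriptions_needed(TEAMS, LIMIT_PER_SUBSCRIPTION)}")
```

With four subscriptions, each one needs 300 servers, which exceeds both limits on the slide.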
-
36.
2000
role assignments
per subscription
-
37.
DON’T TRUST THE DEFAULT
• During testing, a SQL database created with default settings was an S1 database
• Cost: € 25 / month
• During the last week, the default changed to a DS3v1
• Cost: € 315 / month
• We created 1200 databases…
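To make the impact concrete, here is the arithmetic using only the figures on this slide. It assumes every database is billed at the listed monthly price for a full month, which overstates the cost of a short-lived event environment:

```python
DATABASES = 1200
PRICE_S1_EUR = 25    # default tier seen during testing, per database per month
PRICE_NEW_EUR = 315  # default tier in the last week, per database per month


def monthly_cost(databases: int, price_per_database: int) -> int:
    """Total monthly cost in euros for a fleet of identical databases."""
    return databases * price_per_database


expected = monthly_cost(DATABASES, PRICE_S1_EUR)
actual = monthly_cost(DATABASES, PRICE_NEW_EUR)

print(f"Budgeted: EUR {expected:,} / month")
print(f"With the new default: EUR {actual:,} / month")
print(f"Difference: EUR {actual - expected:,} / month ({actual / expected:.1f}x)")
```

Pinning the tier explicitly in the provisioning scripts, rather than relying on the default, avoids this class of surprise.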
-
38.
COST MANAGEMENT
-
39.
AZURE HAS BEEN TACKLED
-
40.
AZURE DEVOPS
-
41.
REQUIREMENTS FOR AZURE DEVOPS
Organization
Team project
Git repository
Build pipeline (CI)
Deployment pipeline (CD)
Service connection to Azure
Artifact feed
Azure Active Directory Link
X 1200
-
42.
AZURE DEVOPS PROVISIONING
Multiple organizations to spread the load:
• Australia
• Brazil
• Canada
• East Asia
• West Europe
• India
• United Kingdom
• United States
Use a service account for setup
-
43.
BUILD AGENTS
7 organizations
1200 teams
±200 teams per org
Per org:
• 1-10 concurrent pipelines
• 150 sponsored pipelines
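A rough capacity check on these numbers. It assumes teams are spread evenly over the organizations and that each sponsored pipeline is one parallel slot; neither is stated on the slide:

```python
ORGANIZATIONS = 7
TEAMS = 1200
SPONSORED_PIPELINES_PER_ORG = 150

# Average team count per organization under an even spread (an assumption).
teams_per_org = TEAMS / ORGANIZATIONS

# Total parallel slots across all organizations.
total_parallel = ORGANIZATIONS * SPONSORED_PIPELINES_PER_ORG

# Teams left queueing if every team started a pipeline at the same moment.
queued_at_full_load = max(0, TEAMS - total_parallel)

print(f"Average teams per org: {teams_per_org:.0f}")
print(f"Parallel pipelines in total: {total_parallel}")
print(f"Teams queued if all build at once: {queued_at_full_load}")
```

The speaker notes mention a peak of 700 concurrent pipelines, which stays under the 1050 total computed here.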
-
44.
@ROBBOS81
-
45.
Every day in preparation
Azure Infrastructure
Azure DevOps
Certificates
-
46.
@ROBBOS81
-
47.
LEARNINGS AZURE DEVOPS PROVISIONING
Hitting the service at scale can trigger some weird issues
• Build pipeline outage
• Two regions
• Quick fix: 1 concurrent pipeline
• Australia networking outage
• Agent scale set network issue
-
48.
AZURE DEVOPS – PRODUCT TEAM
Responsive team
24 hours, 3 SREs assigned
-
49.
SPONSOR BUY IN
-
50.
AZURE DEVOPS
-
51.
EVENT DAY
-
52.
CHALLENGES WEBSITE
1. Explanation
2. Detect
3. Respond: Quick fix
4. Post-mortem
5. Recover
-
53.
CHALLENGE
Docker containers to disrupt the webshop
Start, stop, validate and scoring
• Isolated
• Own technology stack
• Parameters injected
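The talk does not say how parameters were injected into the challenge containers. Environment variables are one common option, so the sketch below builds a `docker run` command that passes each parameter with `-e`. The image name and parameter names are made up for illustration:

```python
from typing import Dict, List


def docker_run_command(image: str, params: Dict[str, str]) -> List[str]:
    """Build a `docker run` argv that injects params as environment variables."""
    command = ["docker", "run", "--rm"]  # --rm removes the container on exit
    for name, value in sorted(params.items()):
        command += ["-e", f"{name}={value}"]
    command.append(image)
    return command


# Hypothetical image and parameters for one team's challenge action.
command = docker_run_command(
    "example/challenge-ssl",
    {"TEAM": "amsterdam-team01", "ACTION": "start"},
)
print(" ".join(command))
```

Because the container only sees its injected parameters, each challenge can keep its own technology stack and stay isolated from the others.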
-
54.
CONTAINER REQUIREMENTS
• Asynchronous
• Fast
• Scalable
X 30,000
-
55.
WHERE TO RUN
Azure Container Instances
No initial setup
Slow start of containers
Soft limits
Azure Kubernetes Services
Provision cluster
Limited to available hardware
Limited to nodes, scale up
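The ACI soft limit matters at this scale. The speaker notes give a constraint of 300 container creates per hour, and the requirements slide asks for 30,000 container runs over the 36-hour event. A back-of-the-envelope check, assuming the limit applies to a single quota scope and that no increase is granted:

```python
CONTAINER_RUNS = 30_000     # from the container requirements slide
ACI_CREATES_PER_HOUR = 300  # constraint quoted in the speaker notes
EVENT_HOURS = 36

# Hours needed to create every container at the quoted rate.
hours_at_aci_limit = CONTAINER_RUNS / ACI_CREATES_PER_HOUR

# Average create rate needed to fit all runs inside the event window.
required_rate = CONTAINER_RUNS / EVENT_HOURS

print(f"Hours needed at the ACI limit: {hours_at_aci_limit:.0f}")
print(f"Creates per hour needed within the event: {required_rate:.0f}")
print(f"Fits in the event window: {hours_at_aci_limit <= EVENT_HOURS}")
```

The average alone is almost three times the quoted limit, before accounting for peaks when many teams start a challenge at once.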
-
56.
MONITORING
-
57.
AKS CLUSTER MONITORING
-
58.
YOU BUILD IT, YOU RUN IT
36
hours
-
59.
RISK MANAGEMENT
-
60.
LEARNINGS
36 hours monitoring is hard!
Find an SRE in a different time zone
• Preparation is key
• Isolation and independence
• Caching and scaling really help
• Insights and control
4:00 AM
-
61.
CUSTOMER HAPPINESS
-
62.
TAKEAWAYS
START WITH A VISION
DESIGN WITH THE END IN MIND
ENGAGE SPONSORS AND STAKEHOLDERS
THINK BIG
BUILD VS BUY
RISK MANAGEMENT
CUSTOMER HAPPINESS
-
63.
MOST IMPORTANTLY
DON’T THINK,
ACT!
-
64.
LOCAL DEVOPS BOOTCAMP
https://localdevopsbootcamp.com
• SSL certificate expired
• Flaky connection
• Credential leak
• Exception rate goes up
• DDOS after CEO message
• Supply chain attack
-
65.
https://xpir.it/LinksGDBC
ROB BOS - @ROBBOS81
-
66.
How to run a global, cloud scale event for 10,000 people
We started in New Zealand
Stopped in Seattle
You just saw the intro video for the Global DevOps BootCamp 2019, a community event we have organized for the last three years. We want to tell you the story of how we ran it across 35 countries with 10,000 participants and 7 Azure DevOps environments, causing 3 outages in Azure DevOps along the way.
Who are you doing it for / with?
Vision
Working together in a globally distributed team
Open source / daily job
Clearly defined purpose
Empowerment
Isolated architecture as a starting point
Communication
Technical design decisions
No big design up front
Do think big
Scalable
Monetary restrictions
Technical restrictions
Design with the end in mind
2017: What is DevOps
2018: DevOps at Microsoft
2019: SRE & DevOps
Team of volunteers to create content
Volunteers
Sponsoring in time/money/Azure Credits/Tweet wall/Snyk package scanner support
Planning: Weekly meeting in MS Teams
Message from the CEO of Parts Unlimited
SWITCH SPEAKER
What do we need to provision in Azure
Goal is a working webshop for a team of attendees
Webshop: App Service / Sql DB / Application Insights
Deliver working webshop for attendees
+ CI/CD pipeline in Azure DevOps
Webshop has been selected
Team of 5 attendees
Started with Azure CLI
Switched to ConsoleApp.exe
Scalable, Repeatable process Azure DevOps
Our own pipeline
Azure Separated for fast iterations during preparation phase
SQL Server: 20 per region is a soft limit, resolved through a support ticket (two days too late!); 200 per subscription is a hard limit. 1200 / 4 subs = 300 SQL Servers…
No interesting limits on e.g. App Service Plans (100 per resource group).
Crucial factor, Credit Card
Inception! What now?
Rolling out webshops to production
1 Azure DevOps Project to provision Azure DevOps team projects (x1200)
SWITCH SPEAKER
Per team needed
7 Azure DevOps organizations: one for each supported region
All attached to same AAD: so one account to rule them all
Concurrent hosted pipelines: 100 or 150 per organization
200 teams per organization
Peak usage: 700 concurrent pipelines
Full pipeline overview
Certificate separate: timing issue with the App Service being ready
Part 1: Export dataset = ConsoleApp.exe
Provision Azure DevOps team projects + AD
DNS here, because it takes a while to be ready
Part 2: Init AzDo Team Project
Part 3: Trigger all the builds (would incur cost)
Build pipeline outage in 2 regions: starting 400 pipelines (with 400 CD releases after them!) causes some load on Azure DevOps
Scaled down to 1 concurrent pipeline on all regions
Australia: networking issue on the scale set; npm restore failed
Rene in call until 12 PM
This is the break
SWITCH SPEAKER
Switch to gdbc-challenge-com
Secured by Azure AD integration
Table storage for team state
ACI: nice, no orchestration needed, might be cheaper
ACI: too slow to start a container
ACI constraint: max 300 container creates per hour
Google Analytics
Application Insights
Checking the load on the cluster – Region Europe
We see that pods are starting; stopping isn’t that visible
We use short-lived pods, but the default garbage collection only kicks in at 12,000 pods, so that line doesn’t go down
Custom events with tracking ID
Slack channel + Bridge on Teams
Slack channel #SRE
Bridge on Microsoft Teams
Command Center
Monitoring, be in control
Fallbacks, caching and backups
Preparation is key
Support tickets for raising the SQL Server limits took 2 weeks and were too late!
Twitter preparation
Event starts in New Zealand: that is 11 PM for us in NL
New Zealand: 11:00 PM Started
Europe: 09:00 AM
West coast US ended at 02:00 AM