SlideShare a Scribd company logo
1 of 38
Download to read offline
How people build software!
MySQL Infrastructure
Testing Automation 

@ GitHub
IkeWalker
GitHub
Boston MySQL Meetup
December 11, 2017
1
!
How people build software!
Agenda
• About
• MySQL @ GitHub
• Automation
• Backup/restores
• Failovers
• Schema migrations
2
!
How people build software!
About me
• Database Architect
• Working with MySQL since 2006
• Organizer of Boston MySQL Meetup
github.com/ikewalker
@iowalker
3
!
How people build software! 4
• The world’s largest Octocat T-shirt and stickers store
• And water bottles
• And hoodies
• We also do stuff related to things
• Word is new swag is coming up
GitHub
How people build software!
GitHub
• 66M repositories
• 24M developers
• 117K businesses
• More than a million teams
• World’s largest open source hosting
• Alexa top 100
• Critical path in build flows
5
!
How people build software!
MySQL at GitHub
• GitHub stores repositories in git, and uses MySQL
as the backend database for all related metadata:
• Repository metadata, users, issues, pull
requests, comments etc.
• Website/API/Auth/more all use MySQL.
• We run a few (growing number of) clusters, totaling
over 100 MySQL servers.
• The setup isn’t very large but very busy.
6
!
How people build software!
MySQL at GitHub
• Our MySQL servers must be available, responsive
and in good state
• GitHub has 99.95% SLA
• Availability issues must be handled quickly, as
automatically as possible.
7
!
How people build software!
github/database-infrastructure
• @ggunson, @jessbreckenridge, @jonahberquist,
@shlomi-noach, @tomkrouper, @gtowey
• Concerned with:
• Data availability
• Data integrity
8
!
How people build software!
Testing
9
!
How people build software!
Backups/restores
that ^
10
How people build software!
Your data
It’s important
11
!
How people build software!
Restores
• Dedicated restore servers.
• One per cluster.
• Continuously restores, catches up with replication,
restores, catches up with replication, restores, …
• Sending a “success” event at the end of each cycle.
• We monitor for number of “success” events in past
24-ish hours, per cluster.
12
!
How people build software! 13
!
!
!
!
!
production replicas
auto-restore replica
master
!
auto-restore replicas
""""""
backup replica
How people build software!
Restores
• New host provisioning uses same flow as restore.
• A human may kick a restore/reclone manually.
• Chatops: 

.mysql backup-restore -H restore.this.host -r
14
!
How people build software!
Restore failure
• A specific backup/restore may fail because
computers.
• No reason for panic.
• Previous backup/restores proven to be working
• At most we lose time
• Two sequential failures, or failures across clusters
are incidents to be investigated
15
!
How people build software!
Restore: delayed replica
• One delayed replica per cluster
• Lagging at 4 hours
• Chatops: .mysql panic
16
!
How people build software!
Failovers
^ that, too
17
How people build software!
MySQL setup @ GitHub
• Plain-old single writer master-replicas
asynchronous replication.
• Not yet semi-sync
• Cross DC, multiple data centers
• 5.7, RBR
• Servers with special roles: production replica,
backup, auto-restore, migration-test, analytics, …
• 2-3 tiers of replication
• Occasional cluster split (functional sharding)
• Very dynamic, always changing
18
!
How people build software!
Points of failure
• Master failure, sev1
• Intermediate masters failure
19
!
! !
!
!
!
! !
!
!
How people build software!
orchestrator
• Topology discovery
• Refactoring
• Failovers for masters and intermediate masters
• Open source, Apache 2 license
• github.com/github/orchestrator
20
!
How people build software!
orchestrator failovers @ GitHub
• Automated master & intermediate master failovers
for all clusters.
• On failover, runs GitHub-specific hooks
• Grabbing VIP/DNS
• Updating server role
• Kicking services (e.g. pt-heartbeat)
• Notifying chat
• Running puppet
21
!
How people build software!
Testing cluster
• Dedicated testing cluster in production
• Does not take production traffic
• “load-test” traffic
• Resembles a production topology:
• OS, MySQL Versions
• Data centers
• Server roles
• DNS
• Proxy
• Used for many of our deployment tests
22
!
How people build software!
Failover testing
• Multiple times per day:
• Setup the cluster in desired topology layout
• Inject failure (kill/block/reject)
• Wait, expect recovery
• Check topology:
• Expect new master, correct DNS changes,
replica capacity, …
• Restore old master from backup
• (an implicit backup/restore test)
• “success/failure” event
23
!
How people build software!
Failover in production
• We expect < 30s failover
• Intermediate master failover has low impact on
subset of users, depending on cluster/DC/server
• Master failover implies outage
• Planned master switchover takes a few seconds
24
!
How people build software!
A moment of reflection
25
How people build software!
What builds trust in failovers?
A testing environment?
26
!
How people build software!
Chaos testing in production
• First steps into regular testing
• Manual
• Supported by our peers
• Learning, understanding impact
27
!
How people build software!
Tests that go wrong
• Many things can go wrong
• Corrupt replication
• Invalidated servers
• Unassigned DNS
• Cleanups
28
!
How people build software!
Schema migrations
29
How people build software!
Is your data correct?
The data you see is merely a ghost of your original data
30
!
How people build software!
gh-ost
• Young. 16 months old.
• In production at GitHub since born.
• Software
• Bugs
• Development
• Bugs
31
How people build software!
gh-ost testing
• gh-ost works perfectly well on our data
• Tested, re-tested, and tested again
• Full coverage of production tables
32
How people build software!
gh-ost testing servers
• Dedicated servers that run continuous tests
33
How people build software! 34
!
!
!
#
!
!
production replicas
testing replica
master
!
gh-ost testing replicas
!
!
!
#
!
!
production replicas
testing replica
master
!
How people build software!
gh-ost testing
• Trivial ENGINE=INNODB migration
• Stop replication
• Cut-over, cut-back
• Checksum both tables, compare
• Checksum failure: stop the world, alert
• Success/failure: event
• Drop ghost table
• Catch up
• Next table
35
How people build software!
gh-ost development cycle
• Work on branch

.deploy gh-ost/mybranch to prod/mysql_role=ghost_testing
• Let continuous tests run
• Depending on nature of change, observe hours/days/more.
• Merge
• Tests run regardless of deployed branch
36
How people build software!
Conclusion
• Backup & restore
• Failovers
• Schema migrations
37
How people build software!
Thank you!
Questions?
github.com/ikewalker
@iowalker
38
!

More Related Content

What's hot

Severalnines Self-Training: MySQL® Cluster - Part II
Severalnines Self-Training: MySQL® Cluster - Part IISeveralnines Self-Training: MySQL® Cluster - Part II
Severalnines Self-Training: MySQL® Cluster - Part IISeveralnines
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance TuningFromDual GmbH
 
Severalnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part VSeveralnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part VSeveralnines
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
Run Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in KubernetesRun Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in KubernetesBernd Ocklin
 
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreConnector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreFilipe Silva
 
MySQL Performance - Best practices
MySQL Performance - Best practices MySQL Performance - Best practices
MySQL Performance - Best practices Ted Wennmark
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMario Beck
 
01 upgrade to my sql8
01 upgrade to my sql8 01 upgrade to my sql8
01 upgrade to my sql8 Ted Wennmark
 
Představení produktové řady Oracle SPARC S7
Představení produktové řady Oracle SPARC S7Představení produktové řady Oracle SPARC S7
Představení produktové řady Oracle SPARC S7MarketingArrowECS_CZ
 
MySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architecturesMySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architecturesFromDual GmbH
 
MySQL configuration - The most important Variables
MySQL configuration - The most important VariablesMySQL configuration - The most important Variables
MySQL configuration - The most important VariablesFromDual GmbH
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Fran Navarro
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopHazelcast
 
MOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major AnnouncementsMOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major AnnouncementsMonica Li
 
Understanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageUnderstanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageIBM Power Systems
 
Presenta completaoow2013
Presenta completaoow2013Presenta completaoow2013
Presenta completaoow2013Fran Navarro
 

What's hot (20)

Severalnines Self-Training: MySQL® Cluster - Part II
Severalnines Self-Training: MySQL® Cluster - Part IISeveralnines Self-Training: MySQL® Cluster - Part II
Severalnines Self-Training: MySQL® Cluster - Part II
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance Tuning
 
Severalnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part VSeveralnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part V
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
Run Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in KubernetesRun Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in Kubernetes
 
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreConnector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
 
MySQL Performance - Best practices
MySQL Performance - Best practices MySQL Performance - Best practices
MySQL Performance - Best practices
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDB
 
01 upgrade to my sql8
01 upgrade to my sql8 01 upgrade to my sql8
01 upgrade to my sql8
 
Představení produktové řady Oracle SPARC S7
Představení produktové řady Oracle SPARC S7Představení produktové řady Oracle SPARC S7
Představení produktové řady Oracle SPARC S7
 
Super cluster oracleday cl 7
Super cluster oracleday cl 7Super cluster oracleday cl 7
Super cluster oracleday cl 7
 
MySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architecturesMySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architectures
 
MySQL configuration - The most important Variables
MySQL configuration - The most important VariablesMySQL configuration - The most important Variables
MySQL configuration - The most important Variables
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster
 
Security a SPARC M7 CPU
Security a SPARC M7 CPUSecurity a SPARC M7 CPU
Security a SPARC M7 CPU
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
 
MOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major AnnouncementsMOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major Announcements
 
Understanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageUnderstanding the IBM Power Systems Advantage
Understanding the IBM Power Systems Advantage
 
Presenta completaoow2013
Presenta completaoow2013Presenta completaoow2013
Presenta completaoow2013
 

Similar to MySQL Infrastructure Testing Automation at GitHub

OSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy HawkinsOSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy HawkinsNETWAYS
 
Crash reports pycodeconf
Crash reports pycodeconfCrash reports pycodeconf
Crash reports pycodeconflauraxthomson
 
Testing Below the Application
Testing Below the ApplicationTesting Below the Application
Testing Below the ApplicationAsh Winter
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedTim Callaghan
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at ParseTravis Redman
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at ParseMongoDB
 
Tuning Your SharePoint Environment
Tuning Your SharePoint EnvironmentTuning Your SharePoint Environment
Tuning Your SharePoint Environmentvmaximiuk
 
Tuenti Release Workflow
Tuenti Release WorkflowTuenti Release Workflow
Tuenti Release WorkflowTuenti
 
Ensuring Consistency in a Replicated World
Ensuring Consistency in a Replicated WorldEnsuring Consistency in a Replicated World
Ensuring Consistency in a Replicated WorldYelp Engineering
 
5 Common Mistakes You are Making on your Website
 5 Common Mistakes You are Making on your Website 5 Common Mistakes You are Making on your Website
5 Common Mistakes You are Making on your WebsiteAcquia
 
Automatize everything
Automatize everythingAutomatize everything
Automatize everythingBoris Bucha
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
Application Deployment at UC Riverside
Application Deployment at UC RiversideApplication Deployment at UC Riverside
Application Deployment at UC RiversideMichael Kennedy
 
There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...
There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...
There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...Chef
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...
Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...
Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...Blackboard APAC
 
Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)lauraxthomson
 

Similar to MySQL Infrastructure Testing Automation at GitHub (20)

Migrating big data
Migrating big dataMigrating big data
Migrating big data
 
OSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy HawkinsOSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy Hawkins
 
Case study
Case studyCase study
Case study
 
Crash reports pycodeconf
Crash reports pycodeconfCrash reports pycodeconf
Crash reports pycodeconf
 
Testing Below the Application
Testing Below the ApplicationTesting Below the Application
Testing Below the Application
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
 
Tuning Your SharePoint Environment
Tuning Your SharePoint EnvironmentTuning Your SharePoint Environment
Tuning Your SharePoint Environment
 
Tuenti Release Workflow
Tuenti Release WorkflowTuenti Release Workflow
Tuenti Release Workflow
 
Ensuring Consistency in a Replicated World
Ensuring Consistency in a Replicated WorldEnsuring Consistency in a Replicated World
Ensuring Consistency in a Replicated World
 
5 Common Mistakes You are Making on your Website
 5 Common Mistakes You are Making on your Website 5 Common Mistakes You are Making on your Website
5 Common Mistakes You are Making on your Website
 
Dibi Conference 2012
Dibi Conference 2012Dibi Conference 2012
Dibi Conference 2012
 
Automatize everything
Automatize everythingAutomatize everything
Automatize everything
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Application Deployment at UC Riverside
Application Deployment at UC RiversideApplication Deployment at UC Riverside
Application Deployment at UC Riverside
 
There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...
There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...
There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned...
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...
Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...
Machine Data to Readable Reports - System Monitoring, Alerting and Reporting ...
 
Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)
 

Recently uploaded

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

MySQL Infrastructure Testing Automation at GitHub

  • 1. How people build software! MySQL Infrastructure Testing Automation 
 @ GitHub IkeWalker GitHub Boston MySQL Meetup December 11, 2017 1 !
  • 2. How people build software! Agenda • About • MySQL @ GitHub • Automation • Backup/restores • Failovers • Schema migrations 2 !
  • 3. How people build software! About me • Database Architect • Working with MySQL since 2006 • Organizer of Boston MySQL Meetup github.com/ikewalker @iowalker 3 !
  • 4. How people build software! 4 • The world’s largest Octocat T-shirt and stickers store • And water bottles • And hoodies • We also do stuff related to things • Word is new swag is coming up GitHub
  • 5. How people build software! GitHub • 66M repositories • 24M developers • 117K businesses • More than a million teams • World’s largest open source hosting • Alexa top 100 • Critical path in build flows 5 !
  • 6. How people build software! MySQL at GitHub • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata: • Repository metadata, users, issues, pull requests, comments etc. • Website/API/Auth/more all use MySQL. • We run a few (growing number of) clusters, totaling over 100 MySQL servers. • The setup isn’t very large but very busy. 6 !
  • 7. How people build software! MySQL at GitHub • Our MySQL servers must be available, responsive and in good state • GitHub has 99.95% SLA • Availability issues must be handled quickly, as automatically as possible. 7 !
  • 8. How people build software! github/database-infrastructure • @ggunson, @jessbreckenridge, @jonahberquist, @shlomi-noach, @tomkrouper, @gtowey • Concerned with: • Data availability • Data integrity 8 !
  • 9. How people build software! Testing 9 !
  • 10. How people build software! Backups/restores that ^ 10
  • 11. How people build software! Your data It’s important 11 !
  • 12. How people build software! Restores • Dedicated restore servers. • One per cluster. • Continuously restores, catches up with replication, restores, catches up with replication, restores, … • Sending a “success” event at the end of each cycle. • We monitor for number of “success” events in past 24-ish hours, per cluster. 12 !
  • 13. How people build software! 13 ! ! ! ! ! production replicas auto-restore replica master ! auto-restore replicas """""" backup replica
  • 14. How people build software! Restores • New host provisioning uses same flow as restore. • A human may kick a restore/reclone manually. • Chatops: 
 .mysql backup-restore -H restore.this.host -r 14 !
  • 15. How people build software! Restore failure • A specific backup/restore may fail because computers. • No reason for panic. • Previous backup/restores proven to be working • At most we lose time • Two sequential failures, or failures across clusters are incidents to be investigated 15 !
  • 16. How people build software! Restore: delayed replica • One delayed replica per cluster • Lagging at 4 hours • Chatops: .mysql panic 16 !
  • 17. How people build software! Failovers ^ that, too 17
  • 18. How people build software! MySQL setup @ GitHub • Plain-old single writer master-replicas asynchronous replication. • Not yet semi-sync • Cross DC, multiple data centers • 5.7, RBR • Servers with special roles: production replica, backup, auto-restore, migration-test, analytics, … • 2-3 tiers of replication • Occasional cluster split (functional sharding) • Very dynamic, always changing 18 !
  • 19. How people build software! Points of failure • Master failure, sev1 • Intermediate masters failure 19 ! ! ! ! ! ! ! ! ! !
  • 20. How people build software! orchestrator • Topology discovery • Refactoring • Failovers for masters and intermediate masters • Open source, Apache 2 license • github.com/github/orchestrator 20 !
  • 21. How people build software! orchestrator failovers @ GitHub • Automated master & intermediate master failovers for all clusters. • On failover, runs GitHub-specific hooks • Grabbing VIP/DNS • Updating server role • Kicking services (e.g. pt-heartbeat) • Notifying chat • Running puppet 21 !
  • 22. How people build software! Testing cluster • Dedicated testing cluster in production • Does not take production traffic • “load-test” traffic • Resembles a production topology: • OS, MySQL Versions • Data centers • Server roles • DNS • Proxy • Used for many of our deployment tests 22 !
  • 23. How people build software! Failover testing • Multiple times per day: • Setup the cluster in desired topology layout • Inject failure (kill/block/reject) • Wait, expect recovery • Check topology: • Expect new master, correct DNS changes, replica capacity, … • Restore old master from backup • (an implicit backup/restore test) • “success/failure” event 23 !
  • 24. How people build software! Failover in production • We expect < 30s failover • Intermediate master failover has low impact on subset of users, depending on cluster/DC/server • Master failover implies outage • Planned master switchover takes a few seconds 24 !
  • 25. How people build software! A moment of reflection 25
  • 26. How people build software! What builds trust in failovers? A testing environment? 26 !
  • 27. How people build software! Chaos testing in production • First steps into regular testing • Manual • Supported by our peers • Learning, understanding impact 27 !
  • 28. How people build software! Tests that go wrong • Many things can go wrong • Corrupt replication • Invalidated servers • Unassigned DNS • Cleanups 28 !
  • 29. How people build software! Schema migrations 29
  • 30. How people build software! Is your data correct? The data you see is merely a ghost of your original data 30 !
  • 31. How people build software! gh-ost • Young. 16 months old. • In production at GitHub since born. • Software • Bugs • Development • Bugs 31
  • 32. How people build software! gh-ost testing • gh-ost works perfectly well on our data • Tested, re-tested, and tested again • Full coverage of production tables 32
  • 33. How people build software! gh-ost testing servers • Dedicated servers that run continuous tests 33
  • 34. How people build software! 34 ! ! ! # ! ! production replicas testing replica master ! gh-ost testing replicas ! ! ! # ! ! production replicas testing replica master !
  • 35. How people build software! gh-ost testing • Trivial ENGINE=INNODB migration • Stop replication • Cut-over, cut-back • Checksum both tables, compare • Checksum failure: stop the world, alert • Success/failure: event • Drop ghost table • Catch up • Next table 35
  • 36. How people build software! gh-ost development cycle • Work on branch
 .deploy gh-ost/mybranch to prod/mysql_role=ghost_testing • Let continuous tests run • Depending on nature of change, observe hours/days/more. • Merge • Tests run regardless of deployed branch 36
  • 37. How people build software! Conclusion • Backup & restore • Failovers • Schema migrations 37
  • 38. How people build software! Thank you! Questions? github.com/ikewalker @iowalker 38 !