Building 2TB Highly Available
MySQL Database
Alex Gorbachev


Insight-Out Database Symposium
Tokyo, 2011
Alex Gorbachev

    • CTO, The Pythian Group
    • Blogger

    • OakTable Network member
    • Oracle ACE Director

    • BattleAgainstAnyGuess.com

    • President, Oracle RAC SIG




2                           © 2009/2010 Pythian
Why Companies Trust Pythian
    • Recognized Leader:
    •   Global industry leader in remote database administration services and consulting for Oracle,
        Oracle Applications, MySQL and SQL Server
    •   Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and
        MDS Inc. to help manage their complex IT deployments

    • Expertise:
    •   One of the world’s largest concentrations of dedicated, full-time DBA expertise.

    • Global Reach & Scalability:
    •   24/7/365 global remote support for DBA and consulting, systems administration, special
        projects or emergency response




© 2011 Pythian
Agenda

    • Migration
     •   Schema, data, application code
    • HA infrastructure
     •   Options available
     •   Implemented: Heartbeat cold failover cluster
    • Acceptance testing
     •   How we simulated failures
    • DR setup & backups
     •   Replication between two data centers
     •   2 TB on MySQL: that’s not a simple e-commerce website


Project profile

    • Document management solution
     •   Archival & retrieval
    • Web front-end
    • Critical availability requirements

    • 1 TB 2 years ago, grown to 2+ TB by now




Migration from Oracle RDB

    • MySQL Migration Toolkit
     •   RDB has a package to connect via Oracle TNS
     •   Java-based
    • Create and review the schema
    • Pump the data (1 TB)




Schema conversion

    •   Numeric size mismatches: SMALLINT, MEDIUMINT, DECIMAL(10,2), etc.
    •   DATE VMS => DATE or DATETIME
    •   MEDIUMBLOB + LONGBLOB
    •   No DEFERRABLE constraints in MySQL
    •   Character set / VARCHAR behavior (trailing spaces)
    •   Sequences => AUTO_INCREMENT
    •   InnoDB storage => file per table
        •   We wanted Oracle-style tablespaces there!
        •   Page size 16 KB
    •   No conversion of stored procedures and modules
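Two of the conversions above lend themselves to a quick sketch. The hypothetical Python helpers below are not part of the MySQL Migration Toolkit: one picks DATE or DATETIME for a DATE VMS column by sampling its values (our own heuristic), the other emits the statement that sets a table's AUTO_INCREMENT counter to the old sequence's next value. MySQL already advances the counter when rows are inserted with explicit ids, so the reseed only matters when the sequence had moved past the highest stored key.

```python
from datetime import datetime, time
from typing import Iterable


def choose_mysql_date_type(samples: Iterable[datetime]) -> str:
    """Pick a MySQL type for an Rdb DATE VMS column (hypothetical heuristic).

    If every sampled value falls exactly on midnight, the column appears to hold
    dates only and DATE is enough; otherwise keep the time part with DATETIME.
    An empty sample yields DATE.
    """
    for value in samples:
        if value.time() != time.min:
            return "DATETIME"
    return "DATE"


def auto_increment_reseed(table: str, next_sequence_value: int) -> str:
    """Continue numbering where the Rdb sequence left off (table name is hypothetical)."""
    return f"ALTER TABLE {table} AUTO_INCREMENT = {next_sequence_value}"


if __name__ == "__main__":
    print(choose_mysql_date_type([datetime(2008, 3, 1), datetime(2008, 3, 2)]))
    print(choose_mysql_date_type([datetime(2008, 3, 1, 14, 5)]))
    print(auto_increment_reseed("documents", 42000))
```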


1 TB data move

    • ARCHIVE part
     •   Separated and loaded in advance: 800 GB
    • LIVE part
     •   200 GB in 30 hours
    • MySQL Migration Toolkit
     •   Agent mode to speed up data transfer
    • Speeding up
     •   Disable binary logs
     •   Build indexes and constraints afterwards
    • Our bottleneck: the single-threaded MySQL Migration Toolkit
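For scale, 200 GB in 30 hours is roughly 7 GB per hour, or about 2 MB/s. The speed-ups listed above can be sketched as a load script. The Python below is our illustration, not the toolkit's agent mode: it wraps per-table load statements with session settings that skip binary logging and relax uniqueness and foreign-key checks, then re-enables the checks and adds indexes and constraints once all data is in, so the constraints are validated against the loaded rows. `SET SESSION sql_log_bin = 0` needs the SUPER privilege; all table, index, and constraint names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Applied before the bulk load: no binary logging, relaxed per-row checks.
BULK_LOAD_PREAMBLE = [
    "SET SESSION sql_log_bin = 0",
    "SET SESSION unique_checks = 0",
    "SET SESSION foreign_key_checks = 0",
]

# Re-enabled before the DDL so new constraints are checked against loaded rows.
RESTORE_CHECKS = [
    "SET SESSION foreign_key_checks = 1",
    "SET SESSION unique_checks = 1",
]


@dataclass
class TableSpec:
    name: str
    # index name -> indexed columns; created only after the data is loaded
    indexes: Dict[str, List[str]] = field(default_factory=dict)
    # full "CONSTRAINT ... FOREIGN KEY ..." clauses, also added after the load
    constraints: List[str] = field(default_factory=list)


def post_load_ddl(table: TableSpec) -> List[str]:
    """DDL that builds indexes and constraints once the table is populated."""
    stmts = [
        f"CREATE INDEX {idx} ON {table.name} ({', '.join(cols)})"
        for idx, cols in table.indexes.items()
    ]
    stmts += [f"ALTER TABLE {table.name} ADD {c}" for c in table.constraints]
    return stmts


def build_load_script(tables: List[TableSpec], load_stmts: Dict[str, List[str]]) -> List[str]:
    """Order the work: settings, all data, checks back on, indexes/constraints."""
    script = list(BULK_LOAD_PREAMBLE)
    for table in tables:
        script.extend(load_stmts.get(table.name, []))
    script.extend(RESTORE_CHECKS)
    for table in tables:
        script.extend(post_load_ddl(table))
    script.append("SET SESSION sql_log_bin = 1")
    return script
```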

Hardware

    • Primary data center
     •   2 x IBM x3850 servers
         •   Each in a different chassis
         •   4 x quad-core Intel Xeon E7330, 2.4 GHz
         •   16 GB RAM
     •   Storage: IBM DS4700 Express Model 72
         •   Fibre Channel
         •   RAID 5 with 6 x 300 GB disks + spare = 1.5 TB usable
    • DR data center
     •   1 x IBM x3850 server
     •   Same storage

Primary DC HA: Options
 • MySQL replication
     •   - Can lose some data (a few seconds' worth); not reliable
     •   - Double storage requirements
     •   + Potential to scale out
 • DRBD replication
     •   - Performance impact in synchronous mode
     •   - Double storage requirements
     •   - No scale-out (primary + mirror only)
     •   + Reliable
 • Third-party replication
     •   - Additional cost and additional vendor
     •   + More reliable than standard replication

Primary DC HA: cold failover cluster

     • Heartbeat controls the resources
     • Shared storage
      •   LUNs accessible from both servers
      •   ext3, mounted on the active node *only*
      •   No LVM: LVM is not cluster-aware
     • Virtual IP (VIP)
      •   Up on only one node
     • MySQL 5.0.67 instance runs on the active node
      •   Read-write data must be InnoDB
      •   Read-only data can be MyISAM
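After a crash or failover, MyISAM tables may need repair while InnoDB recovers from its logs, so it is worth checking that no read-write table slipped in as MyISAM. A minimal sketch, assuming rows fetched with a query such as `SELECT table_schema, table_name, engine FROM information_schema.TABLES` (available since MySQL 5.0); the schema names used below are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

Row = Tuple[str, str, Optional[str]]  # (table_schema, table_name, engine)


def non_innodb_read_write_tables(rows: Iterable[Row], read_write_schemas: Set[str]) -> List[str]:
    """Return 'schema.table' for every table in a read-write schema that is not InnoDB.

    Views appear in information_schema.TABLES with a NULL engine; they are skipped.
    """
    offenders = []
    for schema, table, engine in rows:
        if schema not in read_write_schemas or engine is None:
            continue
        if engine.upper() != "INNODB":
            offenders.append(f"{schema}.{table}")
    return sorted(offenders)
```

In this setup the read-only ARCHIVE schema would simply be left out of `read_write_schemas`, since MyISAM is acceptable there.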


Heartbeat - simple clustering solution

     • Linux-HA.org




Heartbeat and network infrastructure
[Diagram: Database Server 1 (Chassis 1) and Database Server 2 (Chassis 2). Data traffic uses a single NIC per server, connected to Switch 3 and Switch 4; the RSA and management ports connect to a management switch; the heartbeat runs over a CAT5 crossover cable (single NIC), with an RS485 serial backup link (crossover, DB9 female).]
Heartbeat and network infrastructure

     • Private heartbeat network
      •   Crossover Ethernet patch cord
      •   ++ Simple $100 switch: works great
      •   --- Expensive switch and VLAN: no good
     • Serial link heartbeat
      •   Redundant to Ethernet
     • Access to RSA2 cards
      •   Remote reset and remote power-off (lights-out)
      •   Dedicated management network and management switches




Shared storage setup

     • Linux multipathing (MPIO)
      •   2 HBAs per server
      •   2 controllers on the SAN box
     • Added a second SAN box (cheap SATA disks)
     • errors=panic in mount options
      •   Default is to remount read-only; a panic lets the peer take over instead
     • SAN LUNs visible from both nodes
     • NEVER MOUNT THE FILESYSTEM ON BOTH NODES!!!
      •   ext3 is not a cluster filesystem
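A periodic check can enforce both rules on this slide. The sketch below is a hypothetical helper, not something shipped with Heartbeat or mon: it parses text in the /proc/mounts format and reports a shared mount point that is missing `errors=panic` on the active node, or that is mounted at all on the standby node. Whether `errors=` is listed in /proc/mounts can depend on the kernel, so treat the option check as an assumption to verify; the mount point `/data` is also hypothetical.

```python
from typing import Dict, List, Tuple


def parse_mounts(text: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map mount point -> (filesystem type, option list) from /proc/mounts text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        mounts[mount_point] = (fstype, options.split(","))
    return mounts


def check_shared_mounts(text: str, shared: List[str], active: bool) -> List[str]:
    """Return a list of problems; an empty list means the node looks sane."""
    mounts = parse_mounts(text)
    problems = []
    for mp in shared:
        if not active:
            # Standby node: ext3 is not a cluster filesystem, so it must not be mounted here.
            if mp in mounts:
                problems.append(f"{mp}: mounted on standby node")
            continue
        if mp not in mounts:
            problems.append(f"{mp}: not mounted on active node")
        elif "errors=panic" not in mounts[mp][1]:
            problems.append(f"{mp}: mounted without errors=panic")
    return problems


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(check_shared_mounts(f.read(), ["/data"], active=True))
```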




Heartbeat and monitoring

     • Heartbeat 1.0
     •   Starts and stops resources in sequence
     •   Detects failures during start
     •   No resource monitoring: that requires Heartbeat 2.0
         •   Not sure if 2.0 is stable enough
     • mon 1.2.0 Service Monitoring Daemon
     •   mon.wiki.kernel.org
     •   Stable
     •   Ships with a number of “monitors” out of the box
     •   Can write custom monitors
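Custom mon monitors are ordinary executables. As we understand mon's monitor interface, a monitor exits with a non-zero status when it detects a failure, and the first line of its output is used as the summary. A minimal Python sketch of a mount point monitor along those lines, assuming the mount points to check arrive as command-line arguments (how they are configured in mon.cf is left out):

```python
import os
import sys
from typing import List, Tuple


def check_mount_points(paths: List[str]) -> Tuple[int, str]:
    """Return (exit status, summary line) for the given mount points."""
    missing = [p for p in paths if not os.path.ismount(p)]
    if missing:
        return 1, "not mounted: " + " ".join(missing)
    return 0, "all mount points present: " + " ".join(paths)


def main(argv: List[str]) -> int:
    status, summary = check_mount_points(argv[1:])
    print(summary)  # first line of output: the summary mon reports
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```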


Heartbeat resources

          Start sequence (stop is reverse)
     1.   Virtual / floating IP
     2.   SAN mount points
     3.   MySQL daemon / instance
     4.   mon
     5.   mon-shadow


          mon monitors all resources and initiates a failover
          mon-shadow monitors and restarts mon only
          mon monitors and restarts mon-shadow
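The ordering rules on this slide can be modeled in a few lines. This is our simplified Python model, not Heartbeat's ResourceManager code: resources start in list order and stop in reverse, and a failure during start stops whatever had already been started (again in reverse) so the node does not keep half a resource group. The resource names mirror the slide; the start and stop callables are placeholders.

```python
from typing import Callable, List, Tuple

# (name, start action, stop action)
Resource = Tuple[str, Callable[[], None], Callable[[], None]]


def start_group(resources: List[Resource], log: List[str]) -> bool:
    """Start resources in order; on failure, stop the started ones in reverse."""
    started: List[Resource] = []
    for name, start, stop in resources:
        try:
            start()
        except Exception as exc:  # any start failure aborts the whole group
            log.append(f"start failed: {name}: {exc}")
            for done_name, _start, done_stop in reversed(started):
                done_stop()
                log.append(f"stopped {done_name}")
            return False
        started.append((name, start, stop))
        log.append(f"started {name}")
    return True


def stop_group(resources: List[Resource], log: List[str]) -> None:
    """Stop resources in the reverse of their start order."""
    for name, _start, stop in reversed(resources):
        stop()
        log.append(f"stopped {name}")


if __name__ == "__main__":
    noop = lambda: None
    group = [(n, noop, noop) for n in ["vip", "san-mounts", "mysqld", "mon", "mon-shadow"]]
    log: List[str] = []
    start_group(group, log)
    stop_group(group, log)
    print("\n".join(log))
```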

“mon” monitors

     • msql-mysql.monitor
     • fping.monitor
     • freespace.monitor
     • Custom mount point monitor
     • mon.monitor

      On resource failure, the node switches to the standby role and its peer takes over the resources.
      Other possible actions: stop Heartbeat, reboot, or reset.




Improving failover

     • innodb_max_dirty_pages_pct=5 in my.cnf
      •   Fewer dirty pages to flush, so MySQL shuts down faster during a switchover
     • service_startup_timeout=60 in /etc/init.d/mysql

     • Heartbeat ResourceManager retries taking a resource offline 10 times
      •   /usr/lib64/heartbeat/ResourceManager => ${HA_STOPRETRYMAX=10}
      •   Changed to 1
     • mysql.pid: don’t place it on shared storage
     • mon didn’t have timeout functionality
      •   Patched the Perl script to add a timeout
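Our fix for mon was a patch to its Perl script. The same idea, sketched in Python rather than as that patch: run a monitor as a child process with a hard timeout and treat a hang as a failure, so a stuck check (for example, a MySQL query blocked on frozen storage) cannot stall failover. `subprocess.run(..., timeout=...)` kills the child when the timeout expires; the timeout value is an assumption to tune.

```python
import subprocess
from typing import List, Tuple


def run_monitor(cmd: List[str], timeout_s: float) -> Tuple[int, str]:
    """Run a monitor command; return (exit status, first output line).

    A monitor that does not finish within timeout_s is killed and reported as
    failed, instead of blocking the monitoring loop indefinitely.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 1, f"monitor timed out after {timeout_s}s"
    lines = result.stdout.splitlines()
    return result.returncode, (lines[0] if lines else "")


if __name__ == "__main__":
    print(run_monitor(["sleep", "5"], timeout_s=1))
```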




Other gotchas

     • Improved the standard MySQL monitor
      •   Added an insert/delete on a dummy table
     • Standard /etc/init.d/mysql is not LSB-compliant
      •   "mysql start" returns an error when MySQL is already up
      •   "mysql stop" returns an error when MySQL is already down
     • SELINUX=disabled

     • innodb_flush_method = O_DSYNC or O_DIRECT
     • ibmrsa-telnet STONITH plug-in has a bug
      •   http://lists.community.tummy.com/pipermail/linux-ha/2008-June/033279.html
     • Heartbeat’s test suite: BasicSanityCheck
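The dummy-table improvement matters because a SELECT-only check can succeed while writes are blocked (for example, on a filesystem that went read-only). A minimal sketch of such a write probe through the Python DB-API, assuming a dedicated table (named `ha_probe` here, hypothetical) with an integer `id` column; sqlite3 stands in for a MySQL driver so the example runs anywhere, and the SQL avoids driver-specific parameter placeholders.

```python
import sqlite3


def write_probe(conn, table: str = "ha_probe") -> bool:
    """Insert and delete one row in a dummy table; True if both succeed."""
    try:
        cur = conn.cursor()
        cur.execute(f"INSERT INTO {table} (id) VALUES (1)")
        cur.execute(f"DELETE FROM {table} WHERE id = 1")
        conn.commit()
        return True
    except Exception:  # any driver error counts as a failed probe
        try:
            conn.rollback()
        except Exception:
            pass  # the connection itself may be gone
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ha_probe (id INTEGER PRIMARY KEY)")
    print(write_probe(conn))                   # healthy table
    print(write_probe(conn, "missing_table"))  # simulated failure
```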
Acceptance testing - 42 individual tests (1)

     • Node down
      •   Power-off, halt command, CPU overload
     • Network tests
      •   (ifconfig) heartbeat NIC down, app NIC down, management NIC down
      •   Spam the serial link: cat /dev/zero >/dev/ttyS0
      •   Pull heartbeat cables, one at a time and together
     • Storage tests
      •   Freeze I/O: dmsetup suspend --noflush lunmultipathproddb-01
      •   Pull cables (one HBA and both HBA ports)
      •   Mess up mount points between the two servers

Acceptance testing - 42 individual tests (2)

     • MySQL daemon tests
     •   MySQL dies: kill -9 {mysqld_pid} {mysqld_safe_pid}
     •   MySQL hangs: kill -STOP {mysqld_pid}
     •   MySQL refuses connections (max_connections reached)
     • “mon” tests
     •   kill -9, kill -STOP, manual start on the wrong node (including mon-shadow)
     • Heartbeat
     •   kill -9, kill -STOP
     •   Stopping and starting
     •   Graceful switchover between the nodes
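During failover tests it helps to measure downtime as clients see it rather than from logs. A hypothetical sketch: poll a probe (here a TCP connect to the VIP and MySQL port; the address is a placeholder) and report the time between the first failed probe and the next successful one. The clock and sleep functions are parameters only so the logic can be exercised without waiting.

```python
import socket
import time
from typing import Callable, Optional


def tcp_probe(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def measure_outage(
    probe: Callable[[], bool],
    give_up_after_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Seconds from the first failed probe to the next successful one.

    Returns None if the service never went down or did not come back within
    give_up_after_s.
    """
    start = clock()
    down_since: Optional[float] = None
    while clock() - start < give_up_after_s:
        ok = probe()
        now = clock()
        if not ok and down_since is None:
            down_since = now
        elif ok and down_since is not None:
            return now - down_since
        sleep(interval_s)
    return None


if __name__ == "__main__":
    # 10.0.0.10:3306 is a placeholder for the cluster VIP and MySQL port.
    print(measure_outage(lambda: tcp_probe("10.0.0.10", 3306), give_up_after_s=600))
```

Start the measurement first, then inject the failure (kill -9, pulled cable, and so on).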


Backup infrastructure

    • Split into LIVE and ARCHIVE
     •   LIVE: InnoDB, 200-500 GB
     •   ARCHIVE: MyISAM, 2 TB
    • ARCHIVE backup: on production
     •   Can lock + rsync
     •   No LVM => no snapshot
     •   Storage snapshot is expensive
    • LIVE backup: on the slave
     •   FLUSH ... WITH READ LOCK
     •   Stop slave SQL thread
     •   LVM snapshot or rsync
    • Restore
     •   LIVE first, as a whole instance
     •   ARCHIVE later: it’s MyISAM
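The LIVE backup steps can be wrapped so that the lock and the stopped SQL thread are always released, even if the snapshot fails. A minimal sketch, assuming a DB-API cursor connected to the slave and a hypothetical `take_snapshot` callable (for example an LVM snapshot or rsync, as on the slide). It uses FLUSH TABLES WITH READ LOCK and, as our own choice, stops the SQL thread before taking the lock; the slide lists the two steps the other way round. Recording the slave's replication position (from SHOW SLAVE STATUS) at this point is left out for brevity.

```python
from typing import Callable


def backup_live_on_slave(cursor, take_snapshot: Callable[[], None]) -> None:
    """Take a consistent file-level backup of the slave's LIVE instance.

    Replication is paused and writes are blocked only while take_snapshot()
    runs; both are released even if the snapshot raises.
    """
    cursor.execute("STOP SLAVE SQL_THREAD")  # stop applying replicated changes
    try:
        cursor.execute("FLUSH TABLES WITH READ LOCK")  # flush tables, block writes
        try:
            take_snapshot()
        finally:
            cursor.execute("UNLOCK TABLES")
    finally:
        cursor.execute("START SLAVE SQL_THREAD")  # let the slave catch up again
```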
Disaster recovery infrastructure




Where are we 3 years after migration?
     • Data size grown to 2+ TB
     • The HB cluster saved our behind a number of times
      •   Various system failures
      •   Failover takes only 2-3 minutes
     • Switched over to DR several times
      •   Planned power outages and other maintenance
     • The HB cluster helped a lot with maintenance
      •   OS patching: switchover takes tens of seconds
     • Recovery has been verified and tested
     • Plans?
      •   MySQL 5.5, Red Hat Cluster Suite or HB 2.0 (consolidate other DBs)


Q&A

     Please fill in your evaluations!




     Email me - gorbachev@pythian.com
     Read my blog - http://www.pythian.com
     Follow me on Twitter - @AlexGorbachev
     Join Pythian fan club on Facebook & LinkedIn


[INSIGHT OUT 2011] A25 2 TB highly available mysql solution(alex)

  • 1.
    Building 2TB HighlyAvailable MySQL Database Alex Gorbachev Insight-Out Database Symposium Tokyo, 2011
  • 2.
    Alex Gorbachev • CTO, The Pythian Group • Blogger • OakTable Network member • Oracle ACE Director • BattleAgainstAnyGuess.com • President, Oracle RAC SIG 2 © 2009/2010 Pythian
  • 3.
    Why Companies TrustPythian • Recognized Leader: • Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server • Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments • Expertise: • One of the world’s largest concentrations of dedicated, full-time DBA expertise. • Global Reach & Scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response 3 8 © 2011 Pythian
  • 4.
    Agenda • Migration • Schema, data, application code • HA infrastructure • Options available • Implemented - Heartbeat cold failover cluster • Acceptance testing • How we simulated failures • DR setup & backups • Replication between two data-centers • 2 TB on MySQL - that’s not a simple e-commerce web-site 4 © 2009/2010 Pythian
  • 5.
    Project profile • Document management solution • Archival & retrieve • Web front-end • Critical availability requirements • 1 TB 2 years ago, grown to 2+ TB by now 5 © 2009/2010 Pythian
  • 6.
    Migration from OracleRDB • MySQL Migration Toolkit • RDB has a package to connect via Oracle TNS • Java • Create and review schema • Pump the data (1TB) 6 © 2009/2010 Pythian
  • 7.
    Schema conversion • Integer sizes mismatch - smallint, mediumint, decimal(10.2), etc... • DATE VMS => DATE or DATETIME • MEDIUMBLOB + LONGBLOB • no DEFERRABLE constraints in MySQL • character set / VARCHAR behavior (trailing space) • Sequences => AUTO INCREMENT • InnoDB storage => file per table • want Oracle tablespaces there! • page size 16 KB • No stored procedures and modules conversion 7 © 2009/2010 Pythian
  • 8.
    1 TB datamove • ARCHIVE part • Separate and load in advance - 800 GB • LIVE part • 200 GB - 30 hours • MySQL migration toolkit • agent mode to speed up data transfer • Speeding up • Disable binlogs • Build indexes and constraints later • Our bottleneck - single threaded MySQL Migration Toolkit 8 © 2009/2010 Pythian
  • 9.
    Hardware • Primary data-center • 2 x IBM x3850 Servers • Each in different chassis • 4 quad core Intel XEON E7330, 2.4 GHz • 16 GB RAM • Storage IBM DS4700 Express Model 72 • Fiber-channel • RAID5 with 6 300GB disks +spare = 1.5 TB • DR data-center • 1 x IBM x3850 Servers • Same storage 9 © 2009/2010 Pythian
  • 10.
    Primary DC HA:Options • MySQL replication • - Can loose some data (seconds), not reliable • - Double storage requirements • + potential to scale out • DRBD replication • - Performance impact in SYNC mode • - Double storage requirements • - no scale out (primary + mirror only) • + reliable • Third-party replication • - additional cost and additional vendor • + more reliable than standard replication 10 © 2009/2010 Pythian
  • 11.
    Primary DC HA:cold failover cluster • Heartbeat controls resources • Shared storage • LUN’s accessible from two servers • ext3 - mounted on active node *only* • no LVM - LVM is not clustered • Virtual IP / VIP • Up only on one node • MySQL 5.0.67 instance is running on active node • read-write data - must be InnoDB • read-only data - can be MyISAM 11 © 2009/2010 Pythian
  • 12.
    Heartbeat - simpleclustering solution • Linux-HA.org 12 © 2009/2010 Pythian
  • 13.
    Heartbeat and networkinfrastructure Chassis 1 Chassis 2 Switch 3 Switch 4 Single NIC used Management Switch Single NIC used Data Data RSA RSA Port Port Management Port Management Port Crossover DB9 Female HA Backup - RS485 HA – CAT5 Database Server 1 Single NIC crossover Database Server 2 13 © 2009/2010 Pythian
  • 14.
    Heartbeat and networkinfrastructure • Private heartbeat network • Cross-over ethernet patch-cord • ++ Simple $100 switch - works great • --- Expensive switch and VLAN - no good • Serial link heartbeat • Redundant to ethernet • Access to RSA2 cards • Remote reset and remote power off / lights-out • Dedicated management network and management switches 14 © 2009/2010 Pythian
  • 15.
    Shared storage setup • Linux multipathing MPIO • 2 HBA’s per server • 2 controllers on SAN box • Added the 2nd SAN box (cheap SATA disks) • errors=panic in mount options • default is make it read-only • SANLUN’s visible from both nodes • NEVER MOUNT FILESYSTEM ON BOTH NODES!!! • ext3 is not clustered 15 © 2009/2010 Pythian
  • 16.
    Heartbeat and monitoring • Heartbeat 1.0 • Starts and stops resources in sequence • Failure detected during start • No resources monitoring - required Heartbeat 2.0 • Not sure if 2.0 is stable enough • mon 1.2.0 Service Monitoring Daemon • mon.wiki.kernel.org • Stable • Has number of “monitors” out-of-the-box • Can write custom monitors 16 © 2009/2010 Pythian
  • 17.
    Heartbeat resources Start sequence (stop is reverse) 1. Virtual / floating IP 2. SAN mount points 3. MySQL daemon / instance 4. mon 5. mon-shadow mon monitors all resource and initiates a failover mon-shadow monitors and restarts mon only mon monitors and restarts mon-shadow 17 © 2009/2010 Pythian
  • 18.
    “mon” monitors • msql-mysql.monitor • fping.monitor • freespace.monitor custom mount point monitor • mon.monitor On resource failure - goes to standby role. Other potential options - stop heartbeat or reboot or reset. 18 © 2009/2010 Pythian
  • 19.
    Improving failover • innodb_max_dirty_pages_pct=5 in my.cnf • service_startup_timeout=60 in /etc/init.d/mysql • Heartbeat resource manager retries offline 10 times • /usr/lib64/heartbeat/ResourceManager => ${HA_STOPRETRYMAX=10} • Changed to one • mysql.pid- don’t place it on shared storage • mon didn’t have timeout functionality • Hacked the perl script and added timeout 19 © 2009/2010 Pythian
  • 20.
    Other gotchas • Standard MySQL monitor improvement • Added insert/delete from a dummy table • Standard /etc/init.d/mysql is not POSIX compliant • mysql start returns error when MySQL is already up • mysql stop returns error when MySQL is already down • SELINUX=disabled • innodb-flush-method= O_DSYNC or O_DIRECT • ibmrsa-telnet STONITH plug-in has a bug • http://lists.community.tummy.com/pipermail/linux-ha/2008-June/ 033279.html • Heartbeat’s test suite - BasicSanityCheck 20 © 2009/2010 Pythian
  • 21.
    Acceptance testing -42 individual tests (1) • Node down • power-off, halt command, cpu overload • Network tests • (ifconfig) -Heartbeat NIC down, app NIC down, management NIC down • spam serial link - cat /dev/zero >/dev/ttyS0 • pulling heartbeat cables - one at a time and together • Storage tests • freeze IO - dmsetup suspend --noflush lunmultipathproddb-01 • pull cables (one HBA and both HBA ports) • mess up mount points between two servers 21 © 2009/2010 Pythian
  • 22.
    Acceptance testing -42 individual tests (2) • MySQL daemon test • MySQL dies - kill -9 {mysqld_pid} {mysql_safe_pid} • MySQL hangs - kill -STOP {mysqld_pid} • MySQL can’t connect (max connections) • “mon” tests • kill -9, kill -STOP, manual start on wrong node (including shadow) • Heartbeat • kill -9, kill -STOP • Stopping and starting • Graceful switchover between the nodes 22 © 2009/2010 Pythian
  • 23.
    • Split into LIVE and ARCHIVE Backup infrastructure • LIVE - InnoDB 200-500GB • ARCHIVE - MyISAM 2 TB • ARCHIVE backup - production • can lock + rsync • no LVM => no snapshot • storage snapshot is expensive • LIVE backup - on slave • FLUSH ... WITH READ LOCK • Stop slave SQL thread • LVM snapshot or RSYNC • Restore • LIVE first as a whole instance • ARCHIVE later - it’s MyISAM 23 © 2009/2010 Pythian
  • 24.
  • 25.
    Where are we3 years after migration? • Data size grown to 2+ TB • HB Cluster saved out behind number of times • Various system failure • failover takes only 2-3 minutes • Several times switched over to DR • Planned power outages and other maintenance • HB Cluster helped a lot with maintenance • OS patching - switchover takes tens of seconds • Recovery has been verified and tested • Plans? • MySQL 5.5, RedHat Cluster Suite or HB 2.0 (consolidate other DBs) 25 © 2009/2010 Pythian
  • 26.
    Q&A Please fill in your evaluations! Email me - gorbachev@pythian.com Read my blog - http://www.pythian.com Follow me on Twitter - @AlexGorbachev Join Pythian fan club on Facebook & LinkedIn 26 © 2009/2010 Pythian