[INSIGHT OUT 2011] A25 2 TB highly available mysql solution(alex)
Transcript

  • 1. Building a 2 TB Highly Available MySQL Database • Alex Gorbachev • Insight-Out Database Symposium, Tokyo, 2011
  • 2. Alex Gorbachev • CTO, The Pythian Group • Blogger • OakTable Network member • Oracle ACE Director • BattleAgainstAnyGuess.com • President, Oracle RAC SIG • © 2009/2010 Pythian
  • 3. Why Companies Trust Pythian • Recognized leader: • Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server • Works with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments • Expertise: • One of the world’s largest concentrations of dedicated, full-time DBA expertise • Global reach & scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
  • 4. Agenda • Migration • Schema, data, application code • HA infrastructure • Options available • Implemented: Heartbeat cold failover cluster • Acceptance testing • How we simulated failures • DR setup & backups • Replication between two data centers • 2 TB on MySQL: that’s not a simple e-commerce website
  • 5. Project profile • Document management solution • Archival & retrieval • Web front end • Critical availability requirements • 1 TB two years ago, grown to 2+ TB by now
  • 6. Migration from Oracle RDB • MySQL Migration Toolkit • RDB has a package to connect via Oracle TNS • Java • Create and review schema • Pump the data (1 TB)
  • 7. Schema conversion • Integer size mismatches: smallint, mediumint, decimal(10,2), etc. • VMS DATE => DATE or DATETIME • MEDIUMBLOB + LONGBLOB • No DEFERRABLE constraints in MySQL • Character set / VARCHAR behavior (trailing spaces) • Sequences => AUTO_INCREMENT • InnoDB storage => file per table • Want Oracle tablespaces there! • Page size 16 KB • No conversion of stored procedures and modules
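Because a VMS DATE carries both a date and a time, each column has to be mapped to either DATE or DATETIME. A minimal sketch, assuming the decision is made by scanning the source values (already converted to Python `datetime` objects) and keeping DATETIME whenever any value has a non-midnight time part. The helper `choose_mysql_date_type` is hypothetical and not part of the MySQL Migration Toolkit.

```python
from datetime import datetime
from typing import Iterable, Optional


def choose_mysql_date_type(values: Iterable[Optional[datetime]]) -> str:
    """Hypothetical helper: pick DATE or DATETIME for a migrated VMS DATE column.

    If every non-NULL value is exactly midnight, the time part carries no
    information and DATE is enough; otherwise DATETIME is kept.
    An all-NULL (or empty) column returns DATE; a cautious migration
    might prefer DATETIME in that case.
    """
    midnight = datetime.min.time()  # 00:00:00
    for value in values:
        if value is None:
            continue  # NULLs say nothing about the time part
        if value.time() != midnight:
            return "DATETIME"
    return "DATE"
```

Scanning only a sample of rows is faster but can miss the one row that has a time component, so a full scan is the safer default for a one-off migration.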
  • 8. 1 TB data move • ARCHIVE part • Separated and loaded in advance: 800 GB • LIVE part • 200 GB in 30 hours • MySQL Migration Toolkit • Agent mode to speed up data transfer • Speeding up • Disable binlogs • Build indexes and constraints later • Our bottleneck: single-threaded MySQL Migration Toolkit
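The two speed-ups on this slide (no binary logging, indexes and constraints built after the data) amount to a fixed statement order for the load session. A minimal sketch, assuming each table is described by a bare `CREATE TABLE` with only the primary key, a load statement, and separately kept secondary index and foreign key clauses; the `TablePlan` structure and all table and index names are hypothetical. `sql_log_bin`, `unique_checks` and `foreign_key_checks` are standard MySQL session variables (setting `sql_log_bin` needs the SUPER privilege).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TablePlan:
    """Hypothetical description of one table to migrate."""
    name: str
    create_sql: str  # CREATE TABLE with the primary key only
    load_sql: str    # e.g. LOAD DATA INFILE ... or INSERT ... SELECT
    secondary_indexes: List[str] = field(default_factory=list)  # "INDEX idx (col)"
    foreign_keys: List[str] = field(default_factory=list)       # "CONSTRAINT fk ..."


def bulk_load_script(tables: List[TablePlan]) -> List[str]:
    """Return SQL statements in bulk-load order for a single session."""
    stmts = [
        # Session scope only: keep the bulk load out of the binary log.
        "SET SESSION sql_log_bin = 0",
        "SET SESSION unique_checks = 0",
        "SET SESSION foreign_key_checks = 0",
    ]
    for t in tables:
        stmts.append(t.create_sql)
        stmts.append(t.load_sql)
    # Secondary indexes and constraints are added once all data is loaded.
    for t in tables:
        additions = [f"ADD {ix}" for ix in t.secondary_indexes]
        additions += [f"ADD {fk}" for fk in t.foreign_keys]
        if additions:
            stmts.append(f"ALTER TABLE {t.name} " + ", ".join(additions))
    stmts += [
        "SET SESSION foreign_key_checks = 1",
        "SET SESSION unique_checks = 1",
        "SET SESSION sql_log_bin = 1",
    ]
    return stmts
```

One consequence of loading with `sql_log_bin = 0` is that no replica receives the data, so any slave (such as a DR copy) has to be seeded from a copy of the loaded database rather than from the binlog.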
  • 9. Hardware • Primary data center • 2 x IBM x3850 servers • Each in a different chassis • 4 quad-core Intel Xeon E7330, 2.4 GHz • 16 GB RAM • Storage: IBM DS4700 Express Model 72 • Fibre Channel • RAID 5 with 6 x 300 GB disks + spare = 1.5 TB • DR data center • 1 x IBM x3850 server • Same storage
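The 1.5 TB figure follows from RAID 5 spending one disk's worth of capacity on parity per array, with the spare assumed to hold no data until a disk fails (decimal units, as disk vendors count them):

```latex
C_{\text{usable}} = (n - 1)\,s = (6 - 1) \times 300\ \text{GB} = 1500\ \text{GB} \approx 1.5\ \text{TB}
```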
  • 10. Primary DC HA: options • MySQL replication • - Can lose some data (seconds), not reliable • - Double storage requirements • + Potential to scale out • DRBD replication • - Performance impact in SYNC mode • - Double storage requirements • - No scale-out (primary + mirror only) • + Reliable • Third-party replication • - Additional cost and additional vendor • + More reliable than standard replication
  • 11. Primary DC HA: cold failover cluster • Heartbeat controls resources • Shared storage • LUNs accessible from both servers • ext3, mounted on the active node *only* • No LVM: LVM is not clustered • Virtual IP / VIP • Up on only one node • MySQL 5.0.67 instance runs on the active node • Read-write data must be InnoDB • Read-only data can be MyISAM
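The InnoDB/MyISAM split matters at failover time: after an unclean stop, InnoDB runs crash recovery on startup, while a writable MyISAM table can be left marked as crashed and need repair. A minimal sketch of a pre-flight check, assuming the rows come from the `information_schema.tables` query shown and that the team keeps a hand-maintained set of read-only tables (for example the ARCHIVE ones); the function name and table names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Query whose (schema, table, engine) result rows the check expects.
ENGINE_QUERY = (
    "SELECT table_schema, table_name, engine "
    "FROM information_schema.tables "
    "WHERE table_type = 'BASE TABLE' "
    "AND table_schema NOT IN ('mysql', 'information_schema')"
)


def non_crash_safe_tables(rows: Iterable[Tuple[str, str, str]],
                          read_only: Set[str]) -> List[str]:
    """Return sorted 'schema.table' names that are writable but not InnoDB.

    `read_only` is a hypothetical, hand-maintained set of 'schema.table'
    names that are allowed to stay on MyISAM.
    """
    problems = []
    for schema, table, engine in rows:
        full_name = f"{schema}.{table}"
        if full_name in read_only:
            continue  # read-only data may use MyISAM
        if (engine or "").upper() != "INNODB":
            problems.append(full_name)
    return sorted(problems)
```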
  • 12. Heartbeat: simple clustering solution • Linux-HA.org
  • 13. Heartbeat and network infrastructure • [Diagram: Database Server 1 (chassis 1) and Database Server 2 (chassis 2), each using a single data NIC to Switch 3 / Switch 4 and an RSA management port to a management switch; heartbeat over a CAT5 crossover cable plus a backup RS485 serial link (DB9 female)]
  • 14. Heartbeat and network infrastructure • Private heartbeat network • Crossover Ethernet patch cord • ++ Simple $100 switch: works great • --- Expensive switch and VLAN: no good • Serial link heartbeat • Redundant to Ethernet • Access to RSA2 cards • Remote reset and remote power-off / lights-out • Dedicated management network and management switches
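The two heartbeat paths on this slide (a private Ethernet link plus a redundant serial cable) correspond to separate media entries in Heartbeat 1.x's `ha.cf`. A minimal sketch that renders such a file, assuming the interface name, serial device, node names and timing values passed in; those values, and `auto_failback off`, are illustrative choices rather than the settings used in this project.

```python
from typing import List


def render_ha_cf(nodes: List[str], hb_interface: str, serial_port: str,
                 keepalive_s: int = 2, deadtime_s: int = 30) -> str:
    """Render a minimal Heartbeat 1.x ha.cf with two redundant heartbeat paths."""
    if len(nodes) != 2:
        raise ValueError("a cold failover pair has exactly two nodes")
    if deadtime_s <= keepalive_s:
        raise ValueError("deadtime must be longer than keepalive")
    lines = [
        "logfacility local0",
        f"keepalive {keepalive_s}",
        f"deadtime {deadtime_s}",
        # Extra grace period at boot; commonly at least twice deadtime.
        f"initdead {deadtime_s * 2}",
        # Path 1: private Ethernet heartbeat (crossover cable or cheap switch).
        f"bcast {hb_interface}",
        # Path 2: serial link, redundant to Ethernet.
        f"serial {serial_port}",
        "baud 19200",
        # Assumption: do not move resources back automatically after a failback.
        "auto_failback off",
    ]
    lines += [f"node {n}" for n in nodes]
    return "\n".join(lines) + "\n"
```

For example, `render_ha_cf(["db1", "db2"], "eth1", "/dev/ttyS0")` produces a file with both `bcast eth1` and `serial /dev/ttyS0`, so losing either cable alone does not look like a dead peer.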
  • 15. Shared storage setup • Linux multipathing (MPIO) • 2 HBAs per server • 2 controllers on the SAN box • Added a second SAN box (cheap SATA disks) • errors=panic in mount options • The default is to remount read-only • SAN LUNs visible from both nodes • NEVER MOUNT THE FILESYSTEM ON BOTH NODES!!! • ext3 is not clustered
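Since ext3 is not clustered, a shared LUN must only ever be mounted by Heartbeat on the active node. One local safeguard, assumed here rather than taken from the slides, is to keep any fstab entry for a shared mount point marked `noauto` (so neither node mounts it at boot) and carrying `errors=panic`. The sketch below audits fstab text for that; the mount points are hypothetical. It cannot see what the other node has mounted, so the real guarantee still comes from Heartbeat and STONITH.

```python
from typing import Dict, List


def check_shared_fstab(fstab_text: str,
                       shared_mount_points: List[str]) -> Dict[str, List[str]]:
    """Return problems found for shared mount points that have an fstab entry."""
    entries = {}
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 4:
            continue  # blank, comment-only or malformed line
        _device, mount_point, fstype, options = fields[:4]
        entries[mount_point] = (fstype, options.split(","))

    problems: Dict[str, List[str]] = {}
    for mp in shared_mount_points:
        if mp not in entries:
            continue  # assumed to be mounted only by Heartbeat's Filesystem resource
        fstype, opts = entries[mp]
        issues = []
        if fstype != "ext3":
            issues.append(f"unexpected filesystem type {fstype}")
        if "noauto" not in opts:
            issues.append("missing noauto: the OS would mount it at boot on every node")
        if "errors=panic" not in opts:
            issues.append("missing errors=panic")
        if issues:
            problems[mp] = issues
    return problems
```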
  • 16. Heartbeat and monitoring • Heartbeat 1.0 • Starts and stops resources in sequence • Failure detected during start • No resource monitoring (requires Heartbeat 2.0) • Not sure if 2.0 is stable enough • mon 1.2.0 Service Monitoring Daemon • mon.wiki.kernel.org • Stable • Has a number of “monitors” out of the box • Can write custom monitors
  • 17. Heartbeat resources • Start sequence (stop is the reverse): 1. Virtual / floating IP 2. SAN mount points 3. MySQL daemon / instance 4. mon 5. mon-shadow • mon monitors all resources and initiates a failover • mon-shadow monitors and restarts only mon • mon monitors and restarts mon-shadow
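The start/stop rule on this slide (start in the listed order, stop in reverse) can be modeled as a small resource group. A simplified sketch, not Heartbeat's own code: each resource is a name plus hypothetical `start`/`stop` callables, and if one fails to start, the resources already started are stopped again in reverse order so the node is left clean for the peer to take over.

```python
from typing import Callable, List, Tuple

# (name, start, stop); start/stop return True on success.
Resource = Tuple[str, Callable[[], bool], Callable[[], bool]]


def stop_group(resources: List[Resource], log: List[str]) -> bool:
    """Stop resources in reverse start order; True only if all stopped."""
    ok = True
    for name, _start, stop in reversed(resources):
        log.append(f"stop {name}")
        if not stop():
            log.append(f"FAILED stop {name}")
            ok = False
    return ok


def start_group(resources: List[Resource], log: List[str]) -> bool:
    """Start resources in order; on failure, roll back what was started."""
    started: List[Resource] = []
    for res in resources:
        name, start, _stop = res
        log.append(f"start {name}")
        if not start():
            log.append(f"FAILED {name}")
            stop_group(started, log)  # undo in reverse order
            return False
        started.append(res)
    return True
```

With the slide's order (`vip`, `san_mounts`, `mysqld`, `mon`, `mon-shadow`, names chosen for illustration), a failed MySQL start unmounts the SAN file systems and then releases the VIP, in that order.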
  • 18. “mon” monitors • msql-mysql.monitor • fping.monitor • freespace.monitor • Custom mount point monitor • mon.monitor • On resource failure, the node goes into the standby role • Other possible actions: stop Heartbeat, reboot, or reset
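The custom mount point monitor is not shown in the slides. A minimal sketch of what one could look like, assuming it follows mon's usual monitor convention (non-zero exit status on failure, summary on the first output line) and that the mount points are passed as absolute paths among the monitor's arguments. It checks that each path is listed in `/proc/mounts` and still accepts a write, which catches a file system that went read-only; a hung write is left to a monitor timeout.

```python
import tempfile
from typing import List, Tuple


def mounted_points(mounts_text: str) -> set:
    """Mount points listed in /proc/mounts-formatted text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def check_mount_points(paths: List[str], mounts_text: str) -> List[Tuple[str, str]]:
    """Return (path, reason) for every path that is unmounted or not writable."""
    mounted = mounted_points(mounts_text)
    failures = []
    for path in paths:
        if path not in mounted:
            failures.append((path, "not mounted"))
            continue
        try:
            # Write probe: the temporary file is created and removed again.
            with tempfile.NamedTemporaryFile(dir=path, prefix=".mon-probe-"):
                pass
        except OSError as exc:
            failures.append((path, f"not writable: {exc.strerror}"))
    return failures


def main(argv: List[str]) -> int:
    # mon appends host names to a monitor's arguments; keep only absolute paths.
    paths = [a for a in argv[1:] if a.startswith("/")]
    with open("/proc/mounts") as f:
        failures = check_mount_points(paths, f.read())
    if failures:
        print(" ".join(path for path, _ in failures))  # summary line
        for path, reason in failures:
            print(f"{path}: {reason}")
        return 1
    return 0
```

Installed as a monitor script, the entry point would be `sys.exit(main(sys.argv))`.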
  • 19. Improving failover • innodb_max_dirty_pages_pct=5 in my.cnf • service_startup_timeout=60 in /etc/init.d/mysql • Heartbeat resource manager retries taking a resource offline 10 times • /usr/lib64/heartbeat/ResourceManager => ${HA_STOPRETRYMAX=10} • Changed to one • mysql.pid: don’t place it on shared storage • mon didn’t have timeout functionality • Hacked the Perl script and added a timeout
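The mon timeout was added by patching its Perl code. The same idea in a short Python sketch (a swapped-in language, not the patch itself): run a monitor command with a time limit and treat a hang exactly like a failure, so a frozen SAN or a stopped mysqld still triggers a failover. `run_monitor` is a hypothetical name.

```python
import subprocess
from typing import List, Tuple


def run_monitor(cmd: List[str], timeout_s: float) -> Tuple[bool, str]:
    """Run a monitor command; return (healthy, summary line)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return False, f"timed out after {timeout_s}s"
    summary = result.stdout.splitlines()[0] if result.stdout else ""
    return result.returncode == 0, summary
```

One caveat: a child stuck in uninterruptible I/O (for example on a suspended multipath device) may not die on SIGKILL, and `subprocess.run` waits for it after killing, so a fully robust watchdog would run the check in a separate thread or process.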
  • 20. Other gotchas • Standard MySQL monitor improvement • Added insert/delete from a dummy table • Standard /etc/init.d/mysql is not POSIX compliant • mysql start returns an error when MySQL is already up • mysql stop returns an error when MySQL is already down • SELINUX=disabled • innodb_flush_method = O_DSYNC or O_DIRECT • ibmrsa-telnet STONITH plug-in has a bug • http://lists.community.tummy.com/pipermail/linux-ha/2008-June/033279.html • Heartbeat’s test suite: BasicSanityCheck
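The monitor improvement (insert and delete a row in a dummy table) checks that the database accepts writes, not just connections. A minimal sketch against any Python DB-API connection, assuming a hypothetical probe table `ha_probe (id INT PRIMARY KEY, note VARCHAR(16))`, InnoDB on MySQL. The SQL uses literals only, so it runs unchanged on drivers with different parameter styles.

```python
def write_probe(conn) -> bool:
    """Insert and delete one row in the hypothetical ha_probe table.

    Returns True if the database accepted and committed the write.
    """
    try:
        cur = conn.cursor()
        try:
            cur.execute("INSERT INTO ha_probe (id, note) VALUES (1, 'mon')")
            cur.execute("DELETE FROM ha_probe WHERE id = 1")
        finally:
            cur.close()
        conn.commit()
        return True
    except Exception:
        # Each DB-API driver raises its own Error class, so catch broadly.
        try:
            conn.rollback()
        except Exception:
            pass  # the connection itself may be gone
        return False
```

Because the insert and delete are committed together, a healthy probe leaves the table empty, and a failed one is rolled back instead of leaving a row that would break the next run.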
  • 21. Acceptance testing: 42 individual tests (1) • Node down • Power-off, halt command, CPU overload • Network tests • ifconfig down: Heartbeat NIC, app NIC, management NIC • Spam the serial link: cat /dev/zero >/dev/ttyS0 • Pull heartbeat cables, one at a time and together • Storage tests • Freeze I/O: dmsetup suspend --noflush lunmultipathproddb-01 • Pull cables (one HBA and both HBA ports) • Mess up mount points between the two servers
  • 22. Acceptance testing: 42 individual tests (2) • MySQL daemon tests • MySQL dies: kill -9 {mysqld_pid} {mysqld_safe_pid} • MySQL hangs: kill -STOP {mysqld_pid} • MySQL can’t accept connections (max connections) • “mon” tests • kill -9, kill -STOP, manual start on the wrong node (including mon-shadow) • Heartbeat • kill -9, kill -STOP • Stopping and starting • Graceful switchover between the nodes
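For each of these tests it helps to record how long the service was unavailable. A minimal sketch, assuming the timer is started once the failure has been injected and MySQL on the VIP is confirmed down: poll the VIP's MySQL port until a TCP connection succeeds and report the elapsed time. The VIP address and deadline are hypothetical; a successful connect only shows that mysqld is listening, not that it accepts writes.

```python
import socket
import time
from typing import Optional


def wait_for_port(host: str, port: int, deadline_s: float,
                  interval_s: float = 1.0) -> Optional[float]:
    """Poll host:port until a TCP connect succeeds.

    Returns the elapsed seconds, or None if the deadline passes first.
    """
    start = time.monotonic()
    while time.monotonic() - start <= deadline_s:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            time.sleep(interval_s)  # refused or timed out: retry
    return None
```

For example, right after `kill -9` of mysqld on the active node, `wait_for_port("10.0.0.10", 3306, deadline_s=600)` (hypothetical VIP) returns roughly the failover time that the cluster has to meet.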
  • 23. Backup infrastructure • Split into LIVE and ARCHIVE • LIVE: InnoDB, 200-500 GB • ARCHIVE: MyISAM, 2 TB • ARCHIVE backup on production • Can lock + rsync • No LVM => no snapshot • Storage snapshot is expensive • LIVE backup on the slave • FLUSH ... WITH READ LOCK • Stop slave SQL thread • LVM snapshot or rsync • Restore • LIVE first, as a whole instance • ARCHIVE later (it’s MyISAM)
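The LIVE backup on the slave is a short, ordered sequence. A minimal sketch that lists the steps, assuming an LVM-backed slave; the volume group, logical volume, snapshot size, mount point and backup destination are all hypothetical. This sketch stops the SQL thread before taking the read lock (the slide lists the lock first); either order gives a consistent copy. The SQL steps must run in one client session that stays open until `UNLOCK TABLES`, because the global read lock is released when that session ends.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (where to run it: "sql" or "shell", command)


def live_backup_plan(vg: str, lv: str, snap_size: str,
                     dest: str = "/backup/live",
                     snap_name: str = "mysql_snap") -> List[Step]:
    """Ordered steps for an LVM-snapshot backup of a MySQL slave."""
    origin = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap_name}"
    mnt = f"/mnt/{snap_name}"
    return [
        ("sql", "STOP SLAVE SQL_THREAD"),          # stop applying replicated changes
        ("sql", "FLUSH TABLES WITH READ LOCK"),    # hold this session open
        ("sql", "SHOW SLAVE STATUS"),              # save Relay_Master_Log_File / Exec_Master_Log_Pos
        ("shell", f"lvcreate --snapshot --size {snap_size} --name {snap_name} {origin}"),
        ("sql", "UNLOCK TABLES"),                  # lock held only for the snapshot
        ("sql", "START SLAVE SQL_THREAD"),
        ("shell", f"mount -o ro {snap_dev} {mnt}"),
        ("shell", f"rsync -a {mnt}/ {dest}/"),     # copy at leisure from the snapshot
        ("shell", f"umount {mnt}"),
        ("shell", f"lvremove -f {snap_dev}"),
    ]
```

The lock and the stopped SQL thread last only as long as `lvcreate` takes; the slow copy runs afterwards from the snapshot while replication catches up.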
  • 24. Disaster recovery infrastructure
  • 25. Where are we 3 years after migration? • Data size grown to 2+ TB • The Heartbeat cluster saved our behinds a number of times • Various system failures • Failover takes only 2-3 minutes • Switched over to DR several times • Planned power outages and other maintenance • The Heartbeat cluster helped a lot with maintenance • OS patching: switchover takes tens of seconds • Recovery has been verified and tested • Plans? • MySQL 5.5, Red Hat Cluster Suite or Heartbeat 2.0 (consolidate other DBs)
  • 26. Q&A • Please fill in your evaluations! • Email me: gorbachev@pythian.com • Read my blog: http://www.pythian.com • Follow me on Twitter: @AlexGorbachev • Join the Pythian fan club on Facebook & LinkedIn