Scaling MySQL in 3D



Different organizations mean different things when they talk about scaling. Sarah will offer some tips on a few of the different ways this term is thrown around for MySQL databases. Each dimension (data volume, read volume, and write volume) presents different challenges to the operations and development staff working with the system.

  • Relational databases are good at storing lots of small bits of information related to lots of other small bits of information. MySQL is particularly good at fast reads, can be made to do fast writes, and is a lovely store for large datasets.
  • The API should know the difference between reads and writes; if that isn't possible, a proxy is an option. Scaling writes: write your own interfaces? (Check on it.) Once a B-tree index is larger than memory, writes get _slow_. Memcached is good for short-lived data and caching.
  • Consistently InnoDB now, though MyISAM still has some good uses: append-only, non-volatile data with many reads. Volume manager.
  • Personally, I say large objects shouldn't be in a database. They can be handled more gracefully in a key-value store such as a filesystem; to keep them in a database, store them as a pointer to the filesystem. There are lots of other options in the NoSQL space.
  • Volume managers. Volume managers. Volume managers.
  • Developers and operations need to be able to try things. It's hard to justify test databases with all that storage, so rapidly stand up a slave and try it there.
  • Operationally this is a great big cudgel, but it's often cheaper than lots of staff or consulting time.
  • InnoDB is evolving; MyISAM is not being developed. There are specific cases where MyISAM is faster: single-row retrieval and full table scans on a single table. Mark Callaghan had a good post about this back in March. The recent InnoDB plugin has data packing.
  • Compressing data costs CPU, but it's worth it for the I/O savings. MyISAM has compression; InnoDB has it in the latest plugin. Retrieve only the number of rows you need.
  • InnoDB clusters data around the primary key, so retrieval by primary key is less expensive. To decrease fragmentation, run ALTER TABLE tablename ENGINE=InnoDB, which rebuilds the table and reorders the data by primary key. ALTER TABLE can be expensive because it locks the table.
  • If you're worried about swapping, use huge page support for MySQL (Linux doesn't swap out huge pages).
  • Journaled filesystems and write-back caching: make sure you're optimizing for write speed as well as recoverability. Benchmark.
  • The size of the table slows down index insertion by log N, assuming B-tree indexes. With innodb_flush_log_at_trx_commit=0 the log is still flushed about once per second, but there is a risk of losing up to a second of transactions. I/O schedulers: CFQ (Completely Fair Queueing) vs. deadline. InnoDB read-ahead.
  • Disk; memory; up/down; cache hit (baseline definition); replication; mk-table-sync; mk-table-checksum.
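The note that the API should know the difference between reads and writes can be sketched as a tiny router that sends writes to the master and spreads reads across slaves. This is a minimal sketch under loud assumptions: the class name ReadWriteRouter and the server names are hypothetical, servers are plain strings rather than real MySQL connections, and any statement starting with SELECT (other than SELECT ... FOR UPDATE) is treated as safe to read from a slave, which ignores replication lag.

```python
from itertools import cycle


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas.

    Servers are plain names here; a real application would hold connections.
    """

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # Round-robin over replicas; None means "no replicas, use the primary".
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        # Assumption: a statement starting with SELECT is a read. Locking reads
        # (SELECT ... FOR UPDATE) must run on the primary, so they go there too.
        statement = sql.lstrip().upper()
        is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM scores WHERE user_id = 42"))  # replica-1
print(router.route("INSERT INTO scores VALUES (42, 100)"))      # primary-db
print(router.route("select 1"))                                  # replica-2
```

When the application code can't be changed, the same decision can live in a proxy in front of the servers, which is the fallback the notes mention.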
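To put the log N note in numbers: if index-insert cost grows roughly with log N (the note's B-tree assumption), a thousandfold larger table costs only about 1.5x more per insert. This is a minimal sketch; relative_insert_cost is a hypothetical helper that models only the log N term and deliberately leaves out caching and disk I/O, which is exactly the cost the separate note about B-tree indexes larger than memory warns about.

```python
import math


def relative_insert_cost(rows_before: int, rows_after: int) -> float:
    """Ratio of log N index-insert costs after vs. before a table grows.

    Hypothetical model: only the log N term of a B-tree descent, no I/O or caching.
    The log base cancels out in the ratio, so the natural log is fine.
    """
    if rows_before < 2 or rows_after < 2:
        raise ValueError("need at least 2 rows for a meaningful log N")
    return math.log(rows_after) / math.log(rows_before)


# One million rows growing to one billion: log(10**9) / log(10**6) = 9 / 6.
print(round(relative_insert_cost(10**6, 10**9), 3))  # 1.5
```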

    1. Scaling MySQL in 3D
         Sarah Novotny, [email_address]
         Open databases and LAMP services
         www.BlueGecko.net
    2. Large datasets; high-volume reads; high-volume writes
    3. Things you've heard about scale:
         • write 1 / read many
         • partitioning / sharding
         • multimaster / rings
         • memcached / NoSQL
    4. Storage choices:
         • engine options
         • storage engine
         • filesystem
         • volume manager
         • hardware
    5. Large datasets:
         • large objects
         • many rows
    6. Storage flexibility, reliability, clone-ability
    7.
    8. High-volume reads:
         • more memory
         • fast disks
         • more memory
    9. MyISAM vs. InnoDB
    10. Not to be obvious, but: read less data! Compress data (if you can). Don't use LIMIT.
    11. Use thoughtful primary keys
    12. A short diversion: to swap or not to swap, that is the question
    13.
    14. High-volume writes:
         • choose your filesystem well
         • understand how your filesystem and RAID controller work together
         • tune them to work in concert
    15. Facebook game case:
         • highly concurrent writes
         • low risk of "omg, I lost my most recent score!"
    16. Shard data; innodb_flush_log_at_trx_commit=0; benchmark I/O schedulers
    17. Free tools:
         • innotop
         • Maatkit
         • MySQL Proxy
         • monitoring/trending
             • Cacti templates
             • $monitoring_server: the one you know
    18. Additional resources:
         • #mysql
         • #maatkit
         • HPM2e (High Performance MySQL, 2nd edition): Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, and Jeremy Zawodny
    19. Credits:
         • swap image
         • special thanks to Gabriel Cain and Mike Hamrick for suggestions on content and slides
    20. Blue Gecko and contact info:
         • [email_address]
         • [email_address]
         • @sarahnovotny
         • @bluegecko
         • senk on #mysql
        Blue Gecko provides remote DBA services for companies around the world: 7x24x365 support, including monitoring, performance analysis, proactive maintenance, and architectural guidance for small and large datasets.
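For the Facebook game case, "shard data" means spreading players across several MySQL servers so that no single server absorbs every write. This is a minimal sketch assuming a hypothetical scheme in which a player's shard is a stable hash of the player ID modulo the shard count; shard_for and the game-db host names are illustrative, not part of the talk.

```python
import zlib


def shard_for(player_id: int, shard_count: int) -> int:
    """Pick a shard index for a player with a stable hash (hypothetical scheme).

    zlib.crc32 returns the same value across processes and runs, unlike the
    built-in hash() for str/bytes, which is randomized per process.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    key = str(player_id).encode("ascii")
    return zlib.crc32(key) % shard_count


SHARDS = ["game-db-0", "game-db-1", "game-db-2", "game-db-3"]  # hypothetical hosts

for player in (42, 1001, 31337):
    print(player, "->", SHARDS[shard_for(player, len(SHARDS))])
```

Modulo hashing is the simplest scheme to show; its drawback is that changing the shard count remaps most players, which is why directory lookups or consistent hashing are common alternatives.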
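The advice to compress data (if you can) trades CPU for I/O. This minimal sketch makes that trade-off visible with Python's zlib on made-up, repetitive row-like text. It illustrates the general idea only, not how MyISAM or the InnoDB plugin actually compress their data; the sample rows are invented.

```python
import time
import zlib

# Made-up, repetitive row data: the kind of content that tends to compress well.
rows = "\n".join(f"{i},player_{i % 100},level_{i % 10},active" for i in range(10_000))
raw = rows.encode("utf-8")

start = time.perf_counter()
packed = zlib.compress(raw, 6)  # level 6 is zlib's default speed/size balance
elapsed_ms = (time.perf_counter() - start) * 1000

assert zlib.decompress(packed) == raw  # lossless round trip
print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(packed)} ({len(packed) / len(raw):.1%} of original)")
print(f"CPU time to compress: {elapsed_ms:.2f} ms")
```

Fewer bytes means fewer pages to read from disk; the elapsed time is the CPU cost paid for that saving, and both numbers will vary with the data and the machine.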