Copyright 2011 Severalnines AB
Control your database infrastructure

9th Installment
MySQL Cluster Self-Training
Part 8 – Designing a MySQL Cluster

Topics
• Node Placement
• Capacity Planning and Dimensioning
• Hardware recommendations
• Best practice configuration
• Storage calculations

Node Placement
• Data nodes should use dedicated instances
  – Heavy users of RAM, CPU, and disk
• API nodes (e.g. SQL nodes) should preferably be on dedicated instances
  – Heavy users of CPU, but little disk
  – RAM usage depends on the workload
• Management servers
  – Negligible use of CPU, disk, and RAM

Co-location
• Do not co-locate API nodes with data nodes
  – They will compete for CPU
  – RAM usage for API nodes may grow and compete with the resources of the data node (causing swapping and node failures)
• Do not co-locate management servers with data nodes
  – You lose protection from split brain/network partitioning
• API nodes and management servers can be co-located

Cluster Size
• Number of data nodes
  – Depends on storage and throughput requirements
  – Use Sizer (http://www.severalnines.com/sizer) to calculate storage requirements for your data
  – At least two for redundancy
• Number of API nodes
  – Depends on the expected level of throughput
  – At least two for redundancy
  – Usually recommended to have twice as many API nodes as data nodes (e.g. 2 data nodes, 4 API nodes), especially for API nodes using the synchronous NDB API (mysqld, Cluster/J)
• Number of management servers
  – Two for redundancy. Always!
  – Having one management server on every API node does not make sense
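
The sizing rules on this slide can be written down as a small helper. This is only a sketch of the rules of thumb stated above (at least two data nodes, twice as many API nodes as data nodes, exactly two management servers); the function name `initial_cluster_size` and the returned dictionary keys are hypothetical, and real node counts still depend on measured storage and throughput.

```python
def initial_cluster_size(data_nodes: int = 2) -> dict:
    """Return starting node counts derived from the slide's rules of thumb.

    Hypothetical helper: the name and the dict keys are illustrative only.
    """
    if data_nodes < 2:
        # The slide asks for at least two data nodes for redundancy.
        raise ValueError("at least two data nodes are needed for redundancy")
    return {
        "data_nodes": data_nodes,
        # Rule of thumb: 2x API nodes compared to data nodes, never fewer than two.
        "api_nodes": max(2, 2 * data_nodes),
        # Two management servers for redundancy, regardless of cluster size.
        "management_servers": 2,
    }


if __name__ == "__main__":
    print(initial_cluster_size(2))
```

Treat the result as a starting point to be checked by benchmarking, not as a final layout.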

Good Initial Setup (1)
[Diagram: ACCESS LAYER with two hosts, each labeled "API node" and "ndb_mgmd"; STORAGE LAYER (NDBCLUSTER) with two "ndbmtd" data nodes; CLUSTER CONTROL with "mysqld" and "cmon".]

Good Initial Setup (2)
• Easy to scale:
  – Data nodes can be added online (it is not easy, but possible)
  – API nodes can be added online (as long as there are free [mysqld] slots in config.ini)
• Can be extended
  – Replicating out to an InnoDB database for reporting
    http://johanandersson.blogspot.se/2012/09/mysql-cluster-to-innodb-replication.html
  – Using the Hadoop Applier (Oracle)
    https://blogs.oracle.com/MySQL/entry/announcing_the_mysql_hadoop_applier
• The suggested setup is only a starting point
  – The questions on the next slides might help determine if you need more nodes

Good Initial Setup (3)
• Can you load in the data that you need?
  – YES: good
  – NO:
    • Can you add more RAM to the data nodes? If not, create a new cluster with four nodes. Try to load the data again.
    • Can some of the less active tables use DISK DATA storage? Avoid DISK DATA tables for frequently used data.
    • Use Severalnines Sizer (http://www.severalnines.com/sizer/), a capacity planning tool. Create the schema in NDB Cluster, run sizer and import the result into a spreadsheet. Manipulate the row count.
    • Use sizer to verify growth scenarios

Good Initial Setup (4)
• Can you handle the throughput you need?
  – Verify with Bencher (www.severalnines.com/bencher/)
  – YES: good
  – NO:
    • Are the data nodes the bottleneck? Run:
        top -Hd1
      Are any of the data node threads running at more than 90%?
      YES: create a new cluster with 2x the number of nodes.
      NO: the API nodes may be the bottleneck. Add more API nodes.
    • Tune the schema and queries/requests (possibly experiment with the NDB cluster connection pool as well)

Hardware for Data Nodes
• 8 cores or more
  – A fast CPU and memory bus are important
• As much RAM as you need
  – Memory tables, and indexes for DISK DATA tables, must fit in RAM
• Disk subsystem:
  – SATA2 (7200RPM) is the absolute minimum, but not really suitable for production
  – Better options are:
    • SAS
    • SSD
    • On AWS, preferably provisioned IOPS volumes
  – RAID 1+0 (requires 4 disks)
• Disk storage capacity
  – 10 x DataMemory (for REDO LOG and LCP)
• If you use Disk Data tables
  – One disk for LCP
  – One disk for the tablespace (SSD could be an option)
  – One disk for UNDO/REDO

Hardware for API Nodes
• 8 cores or more
  – A fast CPU and memory bus are important
• Disks
  – Replication servers:
    • Disk space must be dimensioned to store binary logs/relay logs
    • If 5MB/s is written into NDB, the binary logs grow by 5MB/s
  – Disk is not important for the other API nodes
    • API nodes do not save any state information to disk (except small metadata such as .frm files)
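
The binary log dimensioning above is simple arithmetic: the logs grow at the same rate as data is written into NDB, so the space needed is the write rate multiplied by how long the logs are kept. The sketch below assumes a retention window, which the slide does not mention; `binlog_disk_bytes` and `retention_hours` are hypothetical names, and 1 MB is taken as 1024 x 1024 bytes.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def binlog_disk_bytes(write_rate_mb_per_s: float, retention_hours: float) -> int:
    """Bytes of binary log accumulated over the retention window.

    Assumes the binary log grows at the same rate as writes into NDB,
    as stated on the slide. The retention window is an assumed input.
    """
    seconds = retention_hours * 3600
    return int(write_rate_mb_per_s * MB * seconds)


if __name__ == "__main__":
    needed = binlog_disk_bytes(write_rate_mb_per_s=5, retention_hours=24)
    print(f"{needed / 1024**3:.1f} GiB of binary logs for 24 hours at 5 MB/s")
```

Relay logs on the replica side need a similar allowance, and leaving some headroom above the computed figure is prudent.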

Network
• The network interconnect is important
  – Ethernet
    • 1Gig-E is most common
    • 10Gig-E is coming
  – InfiniBand
    • IPoIB (IP over InfiniBand)
    • Lower latency than Ethernet
• Load balancing
  – Hardware: F5, Extreme Summit, Cisco
  – Software: HAProxy, LVS

Storage Calculations
• Two things to consider
  – Disk space
  – Memory consumption

Disk Space
• One data node needs (DM = DataMemory)
  – 3 x DM for LCP (3x gives headroom; 2x is at the limit)
  – 4-6 x DM for the redo log
    • 4x for read-mostly applications
    • 6x for write-intensive applications
  – Tablespace
    • Depends on how much data you plan to store on disk
    • Storage needed per table per node:
        2 x (#records x size_of_non_indexed_cols + 40B) x NoOfReplicas / #nodes
      Note: 40B is the record overhead
  – Space for one or more backups
    • 1 x DM for each backup
• This adds up to at least 8 x DataMemory of disk space
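
The multipliers on this slide can be added up in a few lines. The sketch below only restates the slide's rules (3 x DM for LCP, 4 x or 6 x DM for the redo log, 1 x DM per backup); the function name `disk_per_data_node` and its parameters are hypothetical, and the tablespace size is passed in as a precomputed number rather than derived here.

```python
def disk_per_data_node(data_memory_bytes: int,
                       write_intensive: bool = False,
                       backups: int = 1,
                       tablespace_bytes: int = 0) -> int:
    """Disk space one data node needs, following the slide's multipliers.

    Hypothetical helper: names and defaults are illustrative only.
    """
    lcp = 3 * data_memory_bytes                      # local checkpoints, with headroom
    redo_factor = 6 if write_intensive else 4        # 6x write intensive, 4x read mostly
    redo = redo_factor * data_memory_bytes
    backup = backups * data_memory_bytes             # 1 x DM for each backup kept
    return lcp + redo + backup + tablespace_bytes


if __name__ == "__main__":
    dm = 16 * 1024**3  # example only: 16 GiB of DataMemory
    total = disk_per_data_node(dm, write_intensive=True, backups=1)
    print(f"{total / 1024**3:.0f} GiB of disk for {dm / 1024**3:.0f} GiB DataMemory")
```

With one backup and no Disk Data tables this gives 8 x DataMemory for read-mostly workloads and 10 x DataMemory for write-intensive ones.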

DataMemory and IndexMemory
• IndexMemory (total) = 20B x total number of records (summed over all tables)
• DataMemory per table = (40B + avg_record_size) x number of records
• Per node:
  – DataMemory = SUM(DataMemory per table) x NO_OF_REPLICAS / NO_OF_NODES
  – IndexMemory = IndexMemory (total) x NO_OF_REPLICAS / NO_OF_NODES
• Easy way:
  – www.severalnines.com/sizer
    • Provision a data model in the cluster
    • Run: ./sizer -a
    • Import the CSV data into the Excel template
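
As a rough cross-check of the memory figures, the calculation can be sketched as follows. It assumes that the 40B record overhead and the 20B index cost apply to every record, and that each data node holds a NoOfReplicas / number-of-data-nodes share of the total, in line with the per-node tablespace formula on the Disk Space slide. The function name `memory_per_node` and the input format are hypothetical; Sizer remains the more accurate route.

```python
RECORD_OVERHEAD_BYTES = 40    # per-record overhead in DataMemory (from the slide)
INDEX_BYTES_PER_RECORD = 20   # per-record cost in IndexMemory (from the slide)


def memory_per_node(tables, no_of_nodes: int, no_of_replicas: int):
    """Estimate (DataMemory, IndexMemory) in bytes needed on each data node.

    tables: iterable of (record_count, avg_record_size_bytes) pairs.
    Assumption: each node stores no_of_replicas / no_of_nodes of the total.
    """
    tables = list(tables)
    total_data = sum(rows * (RECORD_OVERHEAD_BYTES + avg_size)
                     for rows, avg_size in tables)
    total_index = INDEX_BYTES_PER_RECORD * sum(rows for rows, _ in tables)

    def share(total: int) -> int:
        # Integer ceiling division so the estimate never rounds down.
        return (total * no_of_replicas + no_of_nodes - 1) // no_of_nodes

    return share(total_data), share(total_index)


if __name__ == "__main__":
    # Example schema (made-up numbers): two tables on a 4-node, 2-replica cluster.
    data_mem, index_mem = memory_per_node(
        [(1_000_000, 100), (500_000, 60)], no_of_nodes=4, no_of_replicas=2)
    print(f"DataMemory per node:  {data_mem / 1024**2:.1f} MiB")
    print(f"IndexMemory per node: {index_mem / 1024**2:.1f} MiB")
```

The estimate ignores ordered indexes and variable-size column overhead, so it is best used for a first sanity check before running Sizer against the real schema.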

Disk Data tables
• Not everything has to stay in RAM.
  – Log data, archives, and other data that is not frequently accessed can be stored in DISK DATA tables:
    http://johanandersson.blogspot.se/2012/04/mysql-cluster-disk-data-config.html
  – Indexed columns will always stay in RAM for DISK DATA tables.
  – Disk data access is not fast, but SSDs help a lot.
  – The Disk Data tablespace can be increased over time, online.

Performance Planning
• Transaction capacity planning requires benchmarking
  – Throughput and response time requirements affect the number of nodes, both data nodes and MySQL servers.
• Benchmark the common use cases
  – Severalnines Bencher lets you drive a high load and test individual queries.
  – JMeter etc. can be used to drive web load
  – Try to simulate expected peak traffic.
• Can the cluster handle the load? If not, add resources online where needed.

Coming next in Installment 10:
Troubleshooting MySQL Cluster

We hope these training slides are useful to you!
Please visit our website to view the next section of this training.
For any questions, comments or feedback, please contact us at:
firstname.lastname@example.org
Thank you!