Oracle MySQL Cluster CGE: Presentation Transcript

    • MySQL Cluster Carrier Grade Edition. Alexander Yu, Principal Sales Consultant | MySQL Asia Pacific & Japan. 2011-07-20
    • Agenda / Topics • Oracle MySQL Strategy • MySQL Server Pluggable Storage Engine Architecture • High Availability Solutions • MySQL Cluster Carrier Grade (Internals, Geographical Replication, Scale Out, Backup & Restore) • Q&A © 2011 Oracle Corporation 2
    • About MySQL • Founded, first release in 1995 • MySQL Acquired by Sun Microsystems Feb 2008 • Oracle Acquires Sun Microsystems Jan 2010 • +12M Product Installations • 65K+ Downloads Per Day • Part of the rapidly growing open source LAMP stack Customers across every major operating system, hardware vendor, geography, industry, and application type High Performance ▪ Reliable ▪ Easy to Use© 2011 Oracle Corporation 3
    • Oracle’s Strategy: Complete. Open. Integrated. • Built together • Tested together • Managed together • Serviced together • Based on open standards • Lower cost • Lower risk • More reliable© 2011 Oracle Corporation 4
    • Complete. Open. Integrated. MySQL Completes The Stack • Oracle never settles for being second best at any level of the stack • “Complete” means we meet most customer requirements at every level That’s why MySQL matters to Oracle and Oracle customers© 2011 Oracle Corporation 5
    • The “M” in the LAMP Stack: Operating System (L), Application Server (A), Database (M), Scripting (P) © 2011 Oracle Corporation 6
    • Investment in MySQL Rapid Innovation • Make MySQL a Better MySQL • #1 Open Source Database for Web Applications • Most Complete LAMP Stack • Telecom & Embedded • Develop, Promote and Support MySQL • Improve engineering, consulting and support • Leverage 24x7, World-Class Oracle Support • MySQL Community Edition • Source and binary releases • GPL license© 2011 Oracle Corporation 7
    • Oracle + MySQL Customers • Product Integration • Oracle GoldenGate (Complete!) • Oracle Enterprise Linux + Oracle VM (Complete!) HA Template Available • Oracle Secure Backup (CY 2011) • Oracle Audit Vault (CY 2011) • Oracle Enterprise Manager (CY 2011) • Support • Leverage 24x7, World-Class Oracle Support • MyOracle Support© 2011 Oracle Corporation 8
    • Serving Key Markets and Industry Leaders Powering Data Management on the Web & in the Network Web OEM / ISV’s SaaS, Hosting Telecommunications Enterprise 2.0© 2011 Oracle Corporation 9
    • MySQL in Communications http://www.mysql.com/industry/communications/resources.html#customer_case_studies© 2011 Oracle Corporation 10
    • MySQL Server Pluggable Storage Engine Architecture© 2011 Oracle Corporation 11
    • Pluggable Storage Engine Architecture. MySQL Server layers: Connectors for clients and apps (native C API, JDBC, ODBC, .NET, PHP, Ruby, Python, VB, Perl); Connection Pool (authentication, thread reuse, connection limits, check memory, caches); Enterprise Management Services and Utilities (Backup & Recovery, Security, Replication, Cluster, Partitioning, Instance Manager, Information_Schema, MySQL Workbench); SQL Interface (DDL, DML, stored procedures, views, triggers); Parser (query translation, object privileges); Optimizer (access paths, statistics); Caches (global and engine-specific caches and buffers); Pluggable Storage Engines providing memory, index and storage management (InnoDB, MyISAM, Cluster, partner and community engines); Filesystems, files and logs (redo, undo, data, index, binary, error, query and slow logs) © 2011 Oracle Corporation 12
    • MySQL Cluster Architecture. Shared-nothing distributed database with no single point of failure: high read and write performance and 99.999% uptime. Clients and application nodes (SQL nodes / MySQL Server, NDB API (C++), ClusterJ and OpenJPA (Java), JDBC, PHP/Perl/ODBC, OpenLDAP) connect to the MySQL Cluster data nodes; management (MGM) nodes and management clients use the MGM API (C) © 2011 Oracle Corporation 13
    • Workload Qualification: InnoDB vs MySQL Cluster (Workload / InnoDB / MySQL Cluster)
      – Packaged Applications (i.e. standard business applications): Yes / No, unless mainly PK access
      – Custom Applications: Yes / Yes
      – OLTP Applications: Yes / Yes
      – DSS Applications (i.e. Data Marts, Analytics, etc.): Yes / No
      – Content Management: Yes / Limited support
      – In-Network Telecoms Applications (HLR, HSS, SDP, etc.): No / Yes
      – Web Session Management: Yes / Yes
      – User Profile Management & AAA: Yes / Yes
      – eCommerce Databases: Yes / Yes
      © 2011 Oracle Corporation 14
    • Feature Comparison: InnoDB vs MySQL Cluster (Feature / InnoDB / MySQL Cluster)
      – Latest MySQL 5.5 & InnoDB 1.1 performance enhancements: Yes / No
      – Storage limits: 64TB / 2TB (a)
      – Foreign keys: Yes / No
      – MVCC non-blocking reads: Yes / No
      – Optimized for complex multi-table JOINs with thousands of accesses: Yes / No (b)
      – Hash indexes: No / Yes
      – Compressed data: Yes / No
      – Support for 8KB+ row sizes: Yes / Only via BLOBs (c)
      – Built-in clustering support for 99.999% HA: No / Yes
      – Minimum number of physical hosts for redundancy: 2 (Active/Passive) / 2 + 1 (A/A & Mgmt) (d)
      – Time to recovery after node failure: 30s - hours / Sub-second
      – Real-time performance: No / Yes
      – Option for in-memory storage of tables with disk persistence: No / Yes
      – Non-SQL access methods to data (i.e. NDB API): No / Yes
      – Max number of nodes for parallel write performance: 1 / 48 (f)
      – Write scalability without application partitioning: No / Yes (e)
      – Conflict resolution & detection across multiple replication masters: No / Yes
      – Virtualization support: Yes / No
      © 2011 Oracle Corporation 15
    • Storage Engines (Feature / MyISAM / NDB / Archive / InnoDB / Memory)
      – Storage limits: No / Yes / No / 64TB / Yes
      – Transactions: No / Yes / No / Yes / No
      – Locking granularity: Table / Row / Row / Row / Table
      – MVCC snapshot read: No / No / No / Yes / No
      – Geospatial support: Yes / No / Yes / Yes / No
      – Data caches: No / Yes / No / Yes / NA
      – Index caches: Yes / Yes / No / Yes / NA
      – Compressed data: Yes / No / Yes / No / No
      – Storage cost (relative to other engines): Small / Med / Small / Med / NA
      – Memory cost (relative to other engines): Low / High / Low / High / High
      – Bulk insert speed: High / High / Highest / Med / High
      – Replication support: Yes / Yes / Yes / Yes / Yes
      – Foreign key support: No / No / No / Yes / No
      – Built-in cluster / high-availability support: No / Yes / No / No / No
      Storage engines can be added and removed dynamically; change the storage engine on a table with “ALTER TABLE …” (see the example below) © 2011 Oracle Corporation 16
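    A minimal sketch of the “ALTER TABLE …” note above, assuming a hypothetical existing table mydb.t1:

        mysql> SELECT ENGINE, SUPPORT FROM INFORMATION_SCHEMA.ENGINES;   -- check which engines this mysqld supports
        mysql> ALTER TABLE mydb.t1 ENGINE = NDBCLUSTER;                  -- move the table into MySQL Cluster (copies the data)
        mysql> SHOW CREATE TABLE mydb.t1\G                               -- confirm the engine change took effect

    The ALTER is a copying operation, so on a large table it should be scheduled like any other maintenance task.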
    • Why Users Adopt MySQL Cluster MySQL Already in UseHigh Read/Write 99.999% MySQLThroughputReal Time Performance Scale-Out, On-Demand© 2011 Oracle Corporation 17
    • Why Users Buy MySQL Cluster CGE Standardized on Open SourceBlend of Web & Deploying Mission Critical ApplicationsTelecoms Capabilities HA MySQL Management & Monitoring Global 24x7 support Tools Embedding MySQL Cluster Real-Time, High Read/ Write Performance Scale-Out, Shared Nothing © 2011 Oracle Corporation 18
    • High Availability Solutions© 2011 Oracle Corporation 19
    • Selecting the Right HA Architecture© 2011 Oracle Corporation 20
    • Mapping HA Architecture to Applications (matrix of application classes against Data Replication, Clustered/Virtualized, and Shared-Nothing Geo-Replicated Cluster architectures). Application classes: E-Commerce / Trading, Session Management, User Authentication / Accounting, Feeds, Blogs, Wikis, Data Refinery, OLTP, Data Warehouse/BI, Content Management, CRM / SCM, Collaboration, Packaged Software, Telco Apps (HLR/HSS/SDP…) © 2011 Oracle Corporation 21
    • MySQL High Availability Solutions (availability level / solution)
      – 95.000%: MySQL Replication
      – 99.000%: MySQL Replication with Clustering Software
      – 99.900%: DRBD with Clustering Software
      – 99.900%: Shared Storage with Clustering Software (A/P - A/A)
      – 99.990%: DRBD and Replication with Clustering Software
      – 99.990%: Shared Storage and Replication with Clustering SW
      – 99.990%: Shared Storage Replication
      – 99.990%: Virtualised Environment
      – 99.999%: MySQL Cluster
      – 99.999%: MySQL Cluster & Replication
      – 99.999%: MySQL Cluster Carrier Grade Edition
      © 2011 Oracle Corporation 22
    • MySQL Replication • Native in MySQL • Used for scalability and HA • Asynchronous as standard • Semi-synchronous support added in MySQL 5.5 • Each slave adds minimal load on the master (changes flow from the master's binary log into each slave's relay log) © 2011 Oracle Corporation 22
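    A minimal sketch of turning on the MySQL 5.5 semi-synchronous option mentioned above, assuming Linux builds where the plugins ship as .so files:

        -- on the master
        mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
        mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
        mysql> SET GLOBAL rpl_semi_sync_master_timeout = 1000;   -- fall back to asynchronous after 1s without an ack

        -- on each slave
        mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
        mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
        mysql> STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;      -- restart the IO thread so it registers as semi-sync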
    • Replication Topologies Single Chain Circular Multiple Multi - Master Multi - Circular© 2011 Oracle Corporation 24
    • MySQL Replication Read Scalability Clients MySQL Replication Slaves Master • Used by leading web properties for scale-out • Reads are directed to slaves, writes to master • Delivers higher performance & scale with efficient resource utilization© 2011 Oracle Corporation 22
    • MySQL Replication Failure Detection & Failover • Linux Heartbeat implements heartbeat protocol between nodes • Failover initiated by Cluster Resource Manager (Pacemaker) if heartbeat message is not received • Virtual IP address failed over to ensure failover is transparent to apps© 2011 Oracle Corporation 22
    • Shared Disk Clusters A/P - A/A READS/WRITES Applications VIP Shared Storage • Reliability • High Availability - Commonly used solution - Data handled by a SAN or NAS and always available • Fault Tolerance - Automatic fail-over - No single point of failure with appropriate hardware • Simplified Management© 2011 Oracle Corporation 27
    • Distributed Replicated Block Device • DRBD creates transaction-safe hot standby configuration • MySQL updates written to block device on the Active Server • DRBD synchronously replicates updates to the Passive Server • Linux Heartbeat fails over from Active to Passive in event of failure© 2011 Oracle Corporation 28
    • Sharding aka Application Partitioning Master Clients Slave Reads Writes Partitioning Logic 1 2 3 4 5 Shards Slaves© 2011 Oracle Corporation 29
    • Oracle VM Template for MySQL: integrated & tested OS, VM and database stack. Fastest, simplest & most reliable way to deploy virtualized, cloud-ready MySQL instances, certified for production use • Rapid DEPLOYMENT • Increased RELIABILITY • Higher AVAILABILITY • Lower COST (Oracle VM Server Pool, Oracle VM Servers) © 2011 Oracle Corporation 30
    • Template Components Certified for Production Deployment Oracle VM Oracle VM Automatic Fault Detection & Recovery • Oracle Linux 5 Update 6 with the Unbreakable Enterprise Kernel • Oracle VM 2.2.1 Secure Live Migration (SSL) • Oracle VM Manager 2.1.5 Oracle VM Server Pool • Oracle Cluster File System 2 (OCFS2) Oracle VM Manager • MySQL Database 5.5.10 (Enterprise Edition) Oracle VM Servers Pre-Installed & Pre-Configured ocfs2 Full Integration & QA Testing SAN / iSCSI Single Point of Support© 2011 Oracle Corporation 31
    • Positioning Current Solutions (Requirement / MySQL Replication / Heartbeat + DRBD / Oracle VM Template / MySQL Cluster)
      Availability:
      – Platform Support: All supported by MySQL Server / Linux / Oracle Linux / All supported by MySQL Cluster
      – Automated IP Failover: No / Yes / Yes / Depends on connector and configuration
      – Automated Database Failover: No / Yes / Yes / Yes
      – Automatic Data Resynchronization: No / Yes / N/A (shared storage) / Yes
      – Typical Failover Time: User or script dependent / Configuration dependent, 60 seconds and above / Configuration dependent, 60 seconds and above / 1 second and less
      – Synchronous Replication: No, asynchronous and semi-synchronous / Yes / N/A (shared storage) / Yes
      – Geographic Redundancy Support: Yes / Yes, via MySQL Replication / Yes, via MySQL Replication / Yes, via MySQL Replication
      Scalability:
      – Number of Nodes: One master, multiple slaves / One active (primary), one passive (secondary) node / One active (primary), one passive (secondary) node / 255
      – Built-in Load Balancing: Reads, via MySQL Replication / Reads, via MySQL Replication / Reads, via MySQL Replication / Yes, reads and writes & during failover
      – Read-Intensive Workloads: Yes / Yes / Yes / Yes
      – Write-Intensive Workloads: Yes, via application-level sharding / Yes, via application-level sharding to multiple active/passive pairs / Yes, via application-level sharding to multiple active/passive pairs / Yes, via auto-sharding
      – Scale On-Line (add nodes, repartition, etc.): No / No / No / Yes
      © 2011 Oracle Corporation 32
    • MySQL Cluster Real-time Carrier Grade Database© 2011 Oracle Corporation 33
    • Customers & Applications • Web – User profile management – Session stores – eCommerce – On-Line Gaming – Application Servers • Telecoms – Subscriber Databases (HLR/HSS) – Service Delivery Platforms – VoIP, IPTV & VoD – Mobile Content Delivery – On-Line app stores and portals – IP Management – Payment Gateways http://www.mysql.com/industry/telecom/© 2011 Oracle Corporation 34
    • MySQL Cluster - NDB Storage Engine© 2011 Oracle Corporation 35
    • MySQL Cluster Architecture. Shared-nothing distributed database with no single point of failure: high read and write performance and 99.999% uptime. Clients and application nodes (SQL nodes / MySQL Server, NDB API (C++), ClusterJ and OpenJPA (Java), JDBC, PHP/Perl/ODBC, OpenLDAP) connect to the MySQL Cluster data nodes; management (MGM) nodes and management clients use the MGM API (C) © 2011 Oracle Corporation 36
    • MySQL Cluster Nodes SQL Based Applications JDBC/ODBC MySQL/ API API API Node Management SQL Node Node Node Client NDB API Data MySQL Cluster Data MGM API Node Node Management Node NDB API Data Data Node Node© 2011 Oracle Corporation 37
    • MySQL Cluster Nodes
      – SQL Node (MySQL): standard SQL interface; scale out for performance; enables replication
      – API Node (Application): high-performance NDB API; C, C++ & Java, LDAP, HTTP; see the Developer's Guide
      – Data Node (NDB storage engine): data storage (memory/disk); automatic & user-defined partitioning; local & global checkpoints; scale out or scale up for capacity & redundancy; scale dynamically with on-line add node
      – Management Node: administration and configuration; arbitration; use two for redundancy
      © 2011 Oracle Corporation 38
    • Replication Flexibility • Synchronous replication within a Cluster node group for HA • Bi-directional asynchronous replication to a remote Cluster for geographic redundancy • Asynchronous replication to non-Cluster databases (e.g. InnoDB, MyISAM) for specialised activities such as report generation • Mix and match replication types © 2011 Oracle Corporation 39
    • MySQL Cluster Loads • The MySQL Cluster software (management & data nodes) included with MySQL Community Server should not be used • The MySQL Server included with the MySQL Cluster loads is different from the regular MySQL Server; always use this special version of MySQL Server when accessing MySQL Cluster data • MySQL Cluster CGE is downloaded from oem.mysql.com • GA GPL Community versions are downloaded from www.mysql.com/downloads • In-development GPL Community versions are downloaded from dev.mysql.com/downloads/ © 2011 Oracle Corporation 40
    • MySQL Cluster System Requirements (System Component / Requirement)
      – Hosts: maximum of 255 total nodes (48 data nodes)
      – Hardware: COTS or Advanced TCA; 32- & 64-bit x86 & SPARC
      – Memory: varies with the size of the database, # of hosts, # of replicas
      – Data Storage: shared-nothing, memory & disk; SCSI or RAID for I/O performance
      – Network: >1 Gigabit recommended, SCI supported
      – Operating System: Linux (Red Hat, SuSE), Solaris, HP-UX, Mac OS X, Windows, others…
      © 2011 Oracle Corporation 41
    • MySQL Cluster 6.2© 2011 Oracle Corporation 42
    • MySQL Cluster 6.3 http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-changes-5-1-ndb-6-3.html© 2011 Oracle Corporation 43
    • MySQL Cluster 7.0 –GA April 2009 http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php© 2011 Oracle Corporation 44
    • Scale out – multi core environments© 2011 Oracle Corporation 45
    • MySQL Cluster vs MySQL MEMORY: 30x Higher Throughput / 1/3rd the Latency on a single node • Table level locking inhibits MEMORY scalability beyond a single client connection • Check-pointing & logging enabled, MySQL Cluster still delivers durability • 4 socket server, 64GB RAM, running Linux© 2011 Oracle Corporation 46
    • Scale-Out Reads & Writes on Commodity Hardware • NDB API Performance 4.33 M Queries per second! • 8 Intel servers, dual-6-core CPUs @2.93 GHz, 24GB RAM • 2 Data Nodes per server • flexAsync benchmark – 16 parallel threads, each issuing 256 simultaneous transactions – Read / Write 100KB attribute • Interim results from 2 days testing – watch this space: mikaelronstrom.blogspot.com© 2011 Oracle Corporation 47
    • MySQL Cluster CGE 7.1 – Key Enhancements http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php© 2011 Oracle Corporation 48
    • MySQL Cluster 7.1 Momentum 1,000 Downloads per Day Windows GA Pro-active Cluster Monitoring Fully Automated “MySQL Cluster 7.1 gave us the Management perfect combination of extreme levels of transaction throughput, low 10x Higher Java latency & carrier-grade availability, Performance while reducing TCO” Phani Naik, Pyro Group© 2011 Oracle Corporation 49
    • MySQL Cluster 7.1: ndbinfo mysql> use ndbinfo • New database (ndbinfo) which mysql> show tables; presents real-time metric data +-------------------+ in the form of tables | Tables_in_ndbinfo | +-------------------+ • Exposes new information | blocks | together with providing a | config_params | simpler, more consistent way to | counters | access existing data | logbuffers | | logspaces | • Examples include: | memoryusage | • Resource usage (memory, buffers) | nodes | • Event counters (such as number of | resources | READ operations since last restart) | transporters | • Data node status and connection +-------------------+ status© 2011 Oracle Corporation 50
    • MySQL Cluster 7.1: ndbinfo • Example 1: Check memory usage/availability mysql> select * from ndbinfo.memoryusage; +---------+--------------+--------+------------+-----------+-------------+ | node_id | memory_type | used | used_pages | total | total_pages | +---------+--------------+--------+------------+-----------+-------------+ | 3 | Data memory | 917504 | 28 | 104857600 | 3200 | | 3 | Index memory | 221184 | 27 | 11010048 | 1344 | | 4 | Data memory | 917504 | 28 | 104857600 | 3200 | | 4 | Index memory | 221184 | 27 | 11010048 | 1344 | +---------+--------------+--------+------------+-----------+-------------+ • Note that there is a DATA_MEMORY and INDEX_MEMORY row for each data node in the cluster • If the Cluster is nearing the configured limit then increase the DataMemory and/or IndexMemory parameters in config.ini and then perform a rolling restart© 2011 Oracle Corporation 51
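    A small variation on the query above, computing the percentage used so a monitoring script can alert before DataMemory or IndexMemory runs out (column names as shown on the slide):

        mysql> SELECT node_id, memory_type,
                      ROUND(100 * used / total, 1) AS pct_used   -- percentage of the configured memory in use
               FROM ndbinfo.memoryusage;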
    • MySQL Cluster 7.1: ndbinfo • Example 2: Check how many table scans have been performed on each data node since the last restart mysql> SELECT node_id AS 'data node', val AS 'Table Scans' FROM ndbinfo.counters WHERE counter_name='TABLE_SCANS'; +-----------+-------------+ | data node | Table Scans | +-----------+-------------+ | 3 | 3 | | 4 | 4 | +-----------+-------------+ • You might check this if your database performance is lower than anticipated • If this figure is rising faster than you expected, examine your application to understand why there are so many table scans © 2011 Oracle Corporation 52
    • Latest news on MySQL Cluster 7.1 • As of MySQL Cluster 7.1.9a: • InnoDB plugin included • New view in ndbinfo: mysql> SELECT node_id, page_requests_direct_return AS hit, page_requests_wait_io AS miss, 100*page_requests_direct_return/(page_requests_direct_return+page_requests_wait_io) AS hit_rate FROM ndbinfo.diskpagebuffer; +---------+------+------+----------+ | node_id | hit | miss | hit_rate | +---------+------+------+----------+ | 3 | 6 | 3 | 66.6667 | | 4 | 10 | 3 | 76.9231 | +---------+------+------+----------+ • MEM2.3 includes new Cluster Advisor/graphs© 2011 Oracle Corporation 53
    • MySQL Enterprise Monitor 2.3© 2011 Oracle Corporation 54
    • Online Operations • Scale the cluster for throughput or capacity – Data and SQL Nodes • Repartition tables • Recover failed nodes • Upgrade / patch servers & OS • Upgrade / patch MySQL Cluster • Back-Up • Evolve the schema on-line, in real-time© 2011 Oracle Corporation 55
    • Real-Time, On-Line Schema Changes CREATE OFFLINE INDEX b ON t1(b); • Fully online – transaction response Query OK, 1356 rows affected (2.20 sec)‫‏‬ times unchanged • Add and remove indexes, add new columns and tables DROP OFFLINE INDEX b ON t1; • No temporary table creation Query OK, 1356 rows affected (2.03 sec)‫‏‬ • No recreation of data or deletion required CREATE ONLINE INDEX b ON t1(b); • Faster and better performing table Query OK, 0 rows affected (0.58 sec)‫‏‬ maintenance operations • Less memory and disk requirements DROP ONLINE INDEX b ON t1; Query OK, 0 rows affected (0.46 sec)‫‏‬ ALTER ONLINE TABLE t1 ADD COLUMN d INT; Query OK, 0 rows affected (0.36 sec)‫‏‬© 2011 Oracle Corporation 56
    • Performance I Flexibility I Simplification • SQL and NoSQL Access Methods to tables – SQL: complex queries, rich ecosystem of apps & expertise – Simple Key/Value interfaces bypassing SQL layer for blazing fast reads & writes – Real-time interfaces for micro-second latency – Developers free to work in their preferred environment© 2011 Oracle Corporation 57
    • Scaling Distributed Joins 7.2DM Adaptive Query Localization • ‘Complex’ joins traditionally slower in MySQL Cluster – Complex = lots of levels and interim results in JOIN • JOIN was implemented in the MySQL Server: – Nested Loop join – When data is needed, it must be fetched over the mysqld network from the Data Nodes; row by row – This causes latency and consumes resources • Can now push the execution down into the data Data Nodes nodes, greatly reducing the network trips AQL • 25x-40x performance gain in customer PoC! mysqld Data Nodes The existence, content and timing of future releases described here is included for information only and may be changed at Oracles discretion.http://www.mysql.com/news-and-events/on-demand-webinars/display-od-583.html © 2011 Oracle Corporation 58
    • Adaptive Query Localization: Current Limitations • Columns to be joined – must use exactly the same data type – cannot be any of the BLOB or TEXT types – columns to be joined must be part of a table index or primary key • AQL can be disabled using the ndb_join_pushdown server system variable – enabled by default© 2011 Oracle Corporation 59
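    A short sketch of working with the ndb_join_pushdown switch described above; t1 and t2 are hypothetical NDB tables joined on an indexed column:

        mysql> SHOW VARIABLES LIKE 'ndb_join_pushdown';   -- enabled by default in the 7.2 development milestone
        mysql> EXPLAIN SELECT t1.a, t2.b
               FROM t1 JOIN t2 ON t2.a = t1.a;            -- the Extra column indicates whether the join was pushed to the data nodes
        mysql> SET SESSION ndb_join_pushdown = OFF;       -- disable AQL for this session to compare execution times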
    • Early Adopter Speaks! “Testing of Adaptive Query Localization has yielded over 20x higher performance on complex queries within our application, enabling Docudesk to expand our use of MySQL Cluster into a broader range of highly dynamic web services.” Casey Brown, Manager, Development & DBA Services, Docudesk © 2011 Oracle Corporation 60
    • MySQL Cluster: SQL & NoSQL Combined Mix & Match! Same data accessed simultaneously through SQL & NoSQL interfaces• NoSQL – Multiple ways to bypass SQL, and maximize performance: • NDB API. C++ for highest performance, lowest latency • Cluster/J for optimized access in Java • NEW! Memcached. Use all your existing clients/applications© 2011 Oracle Corporation 61
    • Which to Choose ?© 2011 Oracle Corporation 62
    • Performance© 2011 Oracle Corporation 63
    • NoSQL With NDB API: best possible performance • The application embeds the NDB API C++ interface library • The NDB API makes intelligent decisions (where possible) about which data node to send queries to – with a little planning in the schema design, you can achieve linear scalability • Used by all of the other application nodes (MySQL, LDAP, ClusterJ, …) • Best possible performance, but requires greater development skill • Favourite API for real-time network applications • Foundation for all interfaces © 2011 Oracle Corporation 64
    • NoSQL with memcached 7.2DM • Memcached is a distributed memory based hash-key/value store with no persistence to disk Memcached protocol • NoSQL, simple API, popular with developers • MySQL Cluster already provides scalable, in- memory performance with NoSQL (hashed) access as well as persistence • Provide the Memcached API but map to NDB API calls • Writes-in-place, so no need to invalidate cache • Simplifies architecture as caching & database integrated into 1 tier • Access data from existing relational tables© 2011 Oracle Corporation 65
    • NoSQL with Memcached 7.2DM Pre-GA version available from labs.mysql.com Flexible: Simple: • Deployment options set maidenhead 0 0 3 SL6 • Multiple Clusters STORED • Simultaneous SQL Access • Can still cache in Memcached server get maidenhead • Flat key-value store or map to multiple tables/ VALUE maidenhead 0 3 SL6 columns END© 2011 Oracle Corporation 66
    • MySQL Cluster Manager 1.1 Features Delivered as part of MySQL Cluster CGE 7.1© 2011 Oracle Corporation 67
    • How Does MySQL Cluster Manager Help ? Example: Initiating upgrade from MySQL Cluster 6.3 to 7.1 Before MySQL Cluster Manager With MySQL Cluster Manager •1 x preliminary check of cluster state upgrade cluster --package=7.1 mycluster; •8 x ssh commands per server •8 x per-process stop commands •4 x scp of configuration files (2 x mgmd & 2 x Total: 1 Command - mysqld) Unattended Operation •8 x per-process start commands •8 x checks for started and re-joined processes • Results •8 x process completion verifications • Reduces the overhead and complexity of •1 x verify completion of the whole cluster. managing database clusters •Excludes manual editing of each configuration file. • Reduces the risk of downtime resulting from Total: 46 commands - administrator error 2.5 hours of attended operation • Automates best practices in database cluster management© 2011 Oracle Corporation 68
    • Terms used by MySQL Cluster Manager • Site: the set of physical hosts which are to run Cluster processes to be managed by MySQL Cluster Manager. A site can include 1 or more Site clusters. Host Host Host Host • Cluster: represents a MySQL Cluster deployment. A Cluster contains 1 or more Cluster processes running on 1 or more hosts • Host: Physical machine, running the MySQL Process Process Process Process Process Process Process Cluster Manager agent Cluster • Agent: The MySQL Cluster Manager process running on each host Process Process Process • Process: an individual MySQL Cluster node; one of: ndb_mgmd, ndbd, ndbmtd, mysqld & agent agent agent agent ndbapi* • Package: A copy of a MySQL Cluster installation directory as downloaded from mysql.com, stored on each host *ndbapi is a special case, representing a slot for an external application process to connect to the cluster using the NDB API© 2011 Oracle Corporation 69
    • Example configuration mysql client • MySQL Cluster Manager agent runs on each physical host 7. mysqld 8. mysqld • No central process for Cluster Manager – 1. ndb_mgmd 2. ndb_mgmd agents co-operate, each one responsible agent agent for its local nodes • Agents are responsible for managing all 192.168.0.10 192.168.0.11 nodes in the cluster 3. ndbd 4. ndbd • Management responsibilities • Starting, stopping & restarting nodes 5. ndbd 6. ndbd • Configuration changes agent agent • Upgrades 192.168.0.12 192.168.0.13 • Host & Node status reporting • Recovering failed nodes n. mysqld MySQL Server (ID=n) n. ndb_mgmd Management Node (ID=n) n. ndbd Data Node (ID=n) agent MySQL Cluster Manager agent© 2011 Oracle Corporation 70
    • Creating & Starting a Cluster mysql 1.Define the site: client Mysql> create site --hosts=192.168.0.10,192.168.0.11, -> 192.168.0.12,192.168.0.13 mysite; 2.Expand the MySQL Cluster tar-ball(s) from mysql.com to known directory 7. mysqld 8. mysqld 3.Define the package(s): 1. ndb_mgmd 2. ndb_mgmd Mysql> add package --basedir=/usr/local/mysql_6_3_26 6.3; Mysql> add package --basedir=/usr/local/mysql_7_0_7 7.0; agent agent Note that the basedir should match the directory used in Step 2. 192.168.0.10 192.168.0.11 4.Create the Cluster Mysql> create cluster --package=6.3 3. ndbd 4. ndbd -> --processhosts=ndb_mgmd@192.168.0.10,ndb_mgmd@192.168.0.11, -> ndbd@192.168.0.12,ndbd@192.168.0.13, ndbd@192.168.0.12, -> ndbd@192.168.0.13,mysqld@192.168.9.10,mysqld@192.168.9.11 5. ndbd 6. ndbd -> mycluster; agent agent This is where you define what nodes/processes make up the Cluster and where they should run 192.168.0.12 192.168.0.13 5.Start the Cluster: Mysql> start cluster mycluster;© 2011 Oracle Corporation 71
    • Upgrade Cluster mysql client • Upgrade from MySQL Cluster 6.3.26 to 7.0.7: 7. mysqld 8. mysqld mysql> upgrade cluster --package=7.0 mycluster; 1. ndb_mgmd 2. ndb_mgmd agent agent • Automatically upgrades each node and restarts the process – in the correct order to avoid any loss of service 192.168.0.10 192.168.0.11 • Without MySQL Cluster Manager, the 3. ndbd 4. ndbd administrator must stop each process in turn, start the process with the new version and wait 5. ndbd 6. ndbd for the node to restart before moving onto the agent agent next one 192.168.0.12 192.168.0.13© 2011 Oracle Corporation 72
    • MySQL Cluster Manager GA 1st November 2010 Mgmt Mgmt Mgmt Mgmt 33 mysqld Node 34 mysqld Node 33 mysqld mysqld Node 34 mysqld mysqld Node Data Data Data Data Data Data 31 Node 32 Node 31 Node 32 Node 35 Node 36 Node • On-line add-node mysql> add hosts --hosts=192.168.0.35,192.168.0.36 mysite; mysql> add package --basedir=/usr/local/mysql_7_0_7 – hosts=192.168.0.35,192.168.0.36 7.0; mysql> add process -- processhosts=mysqld@192.168.0.33,mysqld@192.168.0.34,ndbd@192.1 68.0.35,ndbd@192.168.0.36 mycluster; mysql> start process --added mycluster; • Restart optimizations • Fewer nodes restarted on some parameter changes© 2011 Oracle Corporation 73
    • General Design Considerations • MySQL Cluster is designed for – Short transactions – Many parallel transactions • Utilize Simple access patterns to fetch data – Use efficient scans and batching interfaces • Analyze what your most typical use cases are – optimize for those Overall design goal Minimize network roundtrips for your most important requests!© 2011 Oracle Corporation 74
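    As a simple illustration of minimizing round trips, assuming a hypothetical subscriber table with primary key sub_id, batching several primary-key lookups into one statement lets the SQL node fetch the rows in far fewer trips to the data nodes than issuing them one by one:

        -- three separate round trips:
        mysql> SELECT * FROM subscriber WHERE sub_id = 19724;
        mysql> SELECT * FROM subscriber WHERE sub_id = 84539;
        mysql> SELECT * FROM subscriber WHERE sub_id = 74574;

        -- one statement; the server can batch the key lookups to the data nodes:
        mysql> SELECT * FROM subscriber WHERE sub_id IN (19724, 84539, 74574);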
    • Best Practice: Primary Keys • To avoid problems with Cluster-to-Cluster replication, recovery and application behaviour (KEY NOT FOUND, etc.), ALWAYS DEFINE A PRIMARY KEY ON THE TABLE! • A hidden PRIMARY KEY is added if no PK is specified, BUT this is NOT recommended – the hidden primary key is, for example, not replicated between Clusters • There are problems in this area, so avoid them: always have at least id BIGINT AUTO_INCREMENT PRIMARY KEY, even if you don't “need” it for your application © 2011 Oracle Corporation 75
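    A minimal sketch of the rule above, using a hypothetical session-store table; the surrogate BIGINT key is defined explicitly even though the application looks rows up by user_id:

        mysql> CREATE TABLE session_store (
                   id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- explicit PK, never rely on the hidden one
                   user_id  INT UNSIGNED NOT NULL,
                   payload  VARBINARY(4096),
                   INDEX (user_id)
               ) ENGINE=NDBCLUSTER;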
    • Best Practice: Distribution-Aware Apps • The partition is selected using a hash on the partition key (the primary key by default; the user can override this in the table definition) • The MySQL Server (or NDB API) will attempt to send the transaction to the correct data node • If all data for the transaction is in the same partition, there is less messaging -> faster • Aim to have all rows for high-running queries in the same partition • Example table towns (town, country, population): SELECT SUM(population) FROM towns WHERE country=“UK”; must touch several partitions, while SELECT SUM(population) FROM towns WHERE town=“Boston”; can be routed to a single partition when town is the partition key © 2011 Oracle Corporation 76
    • Best Practice: Distribution Aware – Multiple Tables • Extend partition awareness over multiple tables • Same rule – aim to have all data for an instance of high-running transactions in the same partition • Example: a subscriber table keyed on sub_id and a service_ids table (service, sub_id, svc_id) with primary key (service, sub_id); ALTER TABLE service_ids PARTITION BY KEY(sub_id); co-locates each subscriber's service rows with the subscriber row © 2011 Oracle Corporation 77
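    A hedged sketch of the slide's two tables (the subscriber table name and the column types are assumptions); PARTITION BY KEY (sub_id) keeps each subscriber's service rows in the same partition as the subscriber row, so transactions touching both tables stay on one data node:

        mysql> CREATE TABLE subscriber (
                   sub_id  INT UNSIGNED NOT NULL PRIMARY KEY,    -- partitioned on the PK (sub_id) by default
                   age     TINYINT UNSIGNED,
                   gender  ENUM('male','female')
               ) ENGINE=NDBCLUSTER;

        mysql> CREATE TABLE service_ids (
                   service VARCHAR(32) NOT NULL,
                   sub_id  INT UNSIGNED NOT NULL,
                   svc_id  BIGINT UNSIGNED NOT NULL,
                   PRIMARY KEY (service, sub_id)
               ) ENGINE=NDBCLUSTER
               PARTITION BY KEY (sub_id);                        -- override the default so sub_id alone drives distribution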
    • MySQL Cluster Internals© 2011 Oracle Corporation 78
    • Automatic Data Partitioning (4 partitions * 2 replicas = 8 fragments) • Table T1 is split into partitions P1–P4; each partition is stored as a primary fragment and a secondary fragment (fragment replica), so the number of fragments = # of partitions * # of replicas • Fragments are spread across Data Nodes 1–4: Data Nodes 1 and 2 mirror fragments F1 and F3 (each holding one copy as primary and the other node's copy as secondary), and Data Nodes 3 and 4 likewise mirror F2 and F4 • Node groups are created automatically: # of groups = # of data nodes / # of replicas (here Data Nodes 1 & 2 form Node Group 1 and Data Nodes 3 & 4 form Node Group 2) • As long as one data node in each node group is running, the cluster has a complete copy of the data • If no complete copy of the data remains (both nodes in a node group fail), the cluster shuts down automatically © 2011 Oracle Corporation 79-94
    • Data Partitioning • Automatic distribution/partitioning – Primary Key hash value (partitioning by key) • Transparent load balancing – distribution awareness: the data node is chosen based on the PK hash value, or on proximity (SQL node – shared memory, localhost, remote host) • Support for user-defined partitioning • Key concepts – Partition: horizontal; # of partitions = # of data nodes – Fragment: copy of a partition – Replica: complete copy of the data – Node Group: groups data nodes automatically, determined by the order in the configuration file; # of groups = # of data nodes / # of replicas © 2011 Oracle Corporation 95
    • Internal Replication • Replication between Data Nodes • Synchronous Replication – To ensure minimal failover time – Data Nodes have the same information at the same point in time – Achieved by Two-phase commit protocol • Two-phase commit – 1. Prepare/update phase • All fragments (primary/secondary) gets updated – 2. Commit phase • The changes are committed – Every Data Node has Transaction Coordinator – One is elected to be the transaction coordinator – The information goes from the Transaction Coordinator (TC) to primary fragments and further to secondary fragments© 2011 Oracle Corporation 96
    • Internal Replication: Prepare Phase Data Node insert into T1 values (...) Data Node 1 Transaction Coordinator Transaction Coordinator 4 2 Local Query Handler 3 Local Query Handler ACC TUP 1. Calc hash on PK ACC TUP 2. Forward request to LQH Index F1 F2 where primary fragment is Index F2 F1 Memory 3. Prepare secondary fragment Memory Data Memory Data Memory 4. Prepare phase done© 2011 Oracle Corporation 97
    • Internal Replication: Commit Phase Data Node insert into T1 values (...) Data Node 4 Transaction Coordinator Transaction Coordinator 1 3 Local Query Handler 2 Local Query Handler ACC TUP ACC TUP Index F1 F2 Index F2 F1 Memory Memory Data Memory Data Memory© 2011 Oracle Corporation 98
    • Transactions • Transaction Coordinator – The elected TC starts the transaction – TC calculates a hash on the primary key – Each transaction contains one or more Read/Insert/Update or Delete Operations – Operations are forwarded to the LQH of the Data Node having the data for the operation • Isolation Level – Committed Read • Read both from primary and secondary fragment • No lock required • Update/Insert/Delete – Locks on index entry in ACC – Both primary and secondary fragments • Read exclusive/Read shared – Locks the index entry in ACC on primary and secondary fragments© 2011 Oracle Corporation 99
    • Scans • Full table • Ordered Index (range queries) • Scans are started on the nodes having the primary fragment • Table and Index Scans – Parallel on all Data Nodes • Scans with engine condition push down – Send the WHERE clause to be evaluated by Data Nodes – Returns a smaller result set back to the SQL Node, i.e., MySQL Server© 2011 Oracle Corporation 100
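    A brief sketch of engine condition pushdown on the towns table from the earlier distribution-awareness example; the engine_condition_pushdown variable is the switch used by the Cluster releases of this era (later versions fold it into optimizer_switch), and the exact EXPLAIN output varies by version:

        mysql> SET engine_condition_pushdown = ON;                  -- make sure condition pushdown is enabled
        mysql> EXPLAIN SELECT * FROM towns WHERE population > 1000000\G
        -- expect the Extra column to report something like:
        --   Using where with pushed condition                      (the WHERE clause is evaluated on the data nodes)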
    • Indexes • Hash Index – Each NDB table always has a hash index • Usually on Primary Key – If no Primary Key is explicitly defined a hidden auto increment BIGINT PK is created – Maintained in Index Memory, takes up approx. 25B per record + PK size • Ordered Index – Using T-trees (balanced tree) – Maintained in Data Memory, takes up approx. 10B per record + key size – Can be created implicit or explicit • Created whenever DDL is used to create a non-unique index • Implicitly created when DDL defines a Primary Key – Can be suppressed by ‘USING HASH’ • Unique Indexes – Also implicitly creates ordered index. Can be suppressed by ‘USING HASH’ – Non-Unique index is implicitly created except when suppressed ‘USING HASH’© 2011 Oracle Corporation 101
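    A minimal example (table and index names hypothetical) of suppressing the ordered index with USING HASH, as described above, which saves roughly 10 bytes of DataMemory per row for each suppressed ordered index:

        mysql> CREATE TABLE t_hash (
                   id   INT UNSIGNED NOT NULL,
                   val  VARCHAR(32) NOT NULL,
                   PRIMARY KEY (id) USING HASH          -- hash index only; no implicit ordered index is created
               ) ENGINE=NDBCLUSTER;

        mysql> CREATE UNIQUE INDEX uk_val USING HASH ON t_hash (val);   -- unique hash index, again without an ordered index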
    • Checkpointing and Logs • Global Checkpoint Protocol / Group Commit (GCP) – REDO log, synchronized between the data nodes – Writes transactions that have been recorded in the REDO log buffer to disk (the REDO log) – Frequency controlled by the TimeBetweenGlobalCheckpoints setting – Default is 2000ms – Size of the REDO log set by NoOfFragmentLogFiles © 2011 Oracle Corporation 102
    • Checkpointing and Logs • Local Checkpoint Protocol (LCP) – Flushes the data nodes' data to disk; after 2 LCPs the REDO log can be cut – Frequency controlled by the TimeBetweenLocalCheckpoints setting – Specifies the amount of data that can change before flushing to disk – Not a time! Base-2 logarithm of the number of 4-byte words – E.g. the default value of 20 means 4 * 2^20 = 4MB of data changes, a value of 21 = 8MB © 2011 Oracle Corporation 103
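    An illustrative config.ini fragment (the values are the defaults quoted above, not a tuning recommendation) showing where these checkpoint parameters live:

        [NDBD DEFAULT]
        TimeBetweenGlobalCheckpoints: 2000    # GCP flushed to the REDO log every 2000ms
        TimeBetweenLocalCheckpoints: 20       # base-2 log of 4-byte words: 4 * 2^20 = 4MB of changes triggers an LCP
        NoOfFragmentLogFiles: 16              # sizes the REDO log; must cover the changes made between two LCPs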
    • Checkpointing and Logs • LCP and REDO Log are used to bring back the cluster online – System failure or planned shutdown – 1st Data Nodes are restored using the latest LCP – 2nd the REDO logs are applied until the latest GCP© 2011 Oracle Corporation 104
    • Failure Detection: Heartbeats • Node failure – Heartbeat: each node is responsible for performing periodic heartbeat checks of other nodes – Requests/responses: a node makes a request and the response serves as an indicator, i.e. a heartbeat – Failed heartbeat/response: the node detecting the failed node reports the failure to the rest of the cluster • Data nodes are organized in a logical circle; heartbeat messages are sent to the next data node in the circle © 2011 Oracle Corporation 105
    • Failure Detection: Arbitration • Split brain scenario or network partitioning – Data Nodes lose communication with each other • Cluster splits into two • Who is in charge? • Arbitration – Management Node/Server or any other API node can act as the Arbitrator • Default is the Management server – Decides which part of the split cluster is to be running© 2011 Oracle Corporation 106
    • Failure Detection: Split brain scenario Data Node A Data Node B© 2011 Oracle Corporation 107
    • Failure Detection: Split brain scenario Who is in charge? Data Node A Data Node B A cannot communicate with B B cannot communicate with A Or did the network between A and B go down??© 2011 Oracle Corporation 108
    • Failure Detection: Split brain scenario The Arbitrator decides! Data Node A Data Node B MGM Node At least three Nodes are needed for minimum HA© 2011 Oracle Corporation 109
    • Failure Detection: Majority Rules! • Node 1, 3, 4 • Node 2 – One node from each node group? => Yes – One node from each node group? => No – All nodes from all node groups? => Yes – Shutdown – Continue as cluster. No arbitrator needed. Network split/shutdown Data Node 1 Data Node 2 Data Node 3 Data Node 4© 2011 Oracle Corporation 110
    • Single Node Failure • Data Node 2 fails • The node failure is detected; fragment F3 fails over to its replica on Data Node 1, which takes over as primary – fail-over time depends on the underlying OS and configuration (usually sub-second) • Restart and recovery of Data Node 2 – Data Node 2 recovers F3 and F1 from Data Node 1 and rejoins the cluster © 2011 Oracle Corporation 111
    • Node Recovery • Data Node 2 recovers by – Re-joining the heartbeat circle (announcing its return to the cluster) – Copying meta-data (table, index and cluster info) – Recovering from the most recent local checkpoint (LCP) – Copying the changed data from the primary node and regaining primary status • While only one replica exists, all reads and updates go to the surviving fragment; as soon as there are two replicas again, update/insert/delete operations are applied to both fragments while reads are served from only one fragment © 2011 Oracle Corporation 112
    • Multi Node Failure • Data Node 1 and 4 fails • Data Node failure is detected. F1 handled by Data Node 2, F4 handled by Data Node 3 – Cluster available as long there is one node running in each node group • Restart and recovery of Data Node 1 and 4 – During recovery, there is degradation in cluster performance Node Group 1 Node Group 2 Data Node 1 Data Node 2 Data Node 1 Data Node 3 F1 F3 F2 F4 Data Node 3 Data Node 4 Data Node 2 Data Node 4 F3 F1 F4 F2 Physical View Logical View© 2011 Oracle Corporation 113
    • Disk Data Tables • When there is not enough memory to store all data, non-indexed columns can be stored on disk (indexed columns on disk are on the roadmap) • Tablespace – a disk-data table stores its data in a tablespace, which contains one or more data files • Log file groups – in order to facilitate rollback, undo data is kept in one or more undo log files © 2011 Oracle Corporation 114
    • Disk Data Tables • Create a log file group and add one or more undo files: CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_1.dat' INITIAL_SIZE 16M UNDO_BUFFER_SIZE 2M ENGINE NDB; • Verify that the undo files are created (/var/lib/mysql-cluster/ndb_nodeid_fs): SELECT LOGFILE_GROUP_NAME, LOGFILE_GROUP_NUMBER, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE FILE_NAME = 'undo_1.dat'; • Create a tablespace associated with the log file group: CREATE TABLESPACE ts_1 ADD DATAFILE 'data_1.dat' USE LOGFILE GROUP lg_1 INITIAL_SIZE 32M ENGINE NDB; • Verify that the data files are created: SELECT FILE_NAME, LOGFILE_GROUP_NAME, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE TABLESPACE_NAME = 'ts_1' AND FILE_TYPE = 'DATAFILE'; © 2011 Oracle Corporation 115
    • Disk Data Tables • Create a log file group – Add one or more undo files • Verify that undo files are created – /var/lib/mysql-cluster/ndb_nodeid_fs • Create a tablespace – Associate with a log file group • Verify that data files are created – /var/lib/mysql-cluster/ndb_nodeid_fs • Create a disk data table CREATE TABLE DiskTable_1 ( memberId INT UNSIGNED NOT NULL PRIMARY KEY, lName VARCHAR(50) NOT NULL, fName VARCHAR(50) NOT NULL, dob DATE NOT NULL, joined DATE NOT NULL, INDEX(lName, fName) ) TABLESPACE ts_1 STORAGE DISK ENGINE NDB;© 2011 Oracle Corporation 116
    • MySQL Cluster Geographical Replication© 2011 Oracle Corporation 117
    • Scaling Across Data Centers: Geographic Replication with Multi-Master Replication • Synchronous replication within a Cluster node group for HA • Bi-directional asynchronous replication to a remote Cluster for geographic redundancy • Master-slave or multi-master • Automated conflict detection and resolution • Asynchronous replication to non-Cluster databases (e.g. InnoDB) for specialised activities such as report generation • Mix and match replication types © 2011 Oracle Corporation 118
    • Geographical Replication • Geographical redundancy – across data centers, bringing data closer to customers • Load balancing across clusters – master cluster for writes, slave cluster for reads • Asynchronous replication – micro-GCP, slave batching • Various topologies – master-master, master-slave, ring, hub, etc. • Conflict resolution – multi-master, ring © 2011 Oracle Corporation 119
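    A hedged sketch of wiring up one replication channel between two clusters; host addresses, the account and the password are placeholders, and the master SQL node is assumed to already have server-id and log-bin set in my.cnf:

        -- on the master cluster's SQL node (the binlog injector):
        mysql> CREATE USER 'repl'@'10.0.1.%' IDENTIFIED BY 'secret';
        mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.1.%';

        -- on the slave cluster's SQL node:
        mysql> CHANGE MASTER TO
                   MASTER_HOST     = '10.0.0.1',
                   MASTER_PORT     = 3306,
                   MASTER_USER     = 'repl',
                   MASTER_PASSWORD = 'secret';
        mysql> START SLAVE;
        mysql> SHOW SLAVE STATUS\G        -- check Slave_IO_Running / Slave_SQL_Running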
    • MySQL Replication Internals© 2011 Oracle Corporation 120
    • Geographical Replication • A SQL Node/MySQL Server is responsible for the replication – NDB Binlog injector thread Master Slave Binlog Relay – Subscribes to events in MySQL Cluster Binlog Replication Channel Binlog • Injects rows into the Binlog – All changes in the cluster – Row-based replication • Two replication channels Data Node 1 Data Node 2 Data Node 1 Data Node 2 – No single point of failure Data Node 3 Data Node 4 Data Node 3 Data Node 4 Binlog Binlog Replication Channel Relay Binlog Master Slave© 2011 Oracle Corporation 121
    • MySQL Cluster as Master DB VIP Read/Write replication to slaves failover to SQL Node A Master DB SQL Node A SQL Node B Binlog Binlog geo replication to DR site Relay Relay Binlog Binlog secondary geo replication channel primary geo replication channel Data Node 1 Data Node 2© 2011 Oracle Corporation 122
    • Disaster Recovery North South SQL Node (M-S) SQL Node (M-N) Binlog Binlog Relay Relay Primary Replication Channel Binlog Binlog Master DB Master DB Data Node 1 Data Node 2 Data Node 1 Data Node 2 SQL Node (S-S) SQL Node (S-N) Binlog Binlog Relay Relay Secondary Replication Channel Binlog Binlog© 2011 Oracle Corporation 123
    • Geographical Replication Examples • Master for writes • Slaves for reads Master Binlog Replication Channel Replication Channel Slave Slave Binlog Data Node 1 Data Node 2 Binlog Relay Relay Binlog Binlog Data Node 1 Data Node 2 Data Node 1 Data Node 2© 2011 Oracle Corporation 124
    • Geographical Replication Examples • Multi-Master Master Master • Conflict resolution Binlog Relay – timestamp (“value”) Binlog Binlog Replication Channel Data Node 1 Data Node 2 Data Node 1 Data Node 2 Data Node 3 Data Node 4 Data Node 3 Data Node 4© 2011 Oracle Corporation 125
    • Geographical Replication Examples • Migrate MySQL Cluster Data – To any storage engine • To InnoDB tables – CRM Slave Binlog • or MyISAM tables – Web Slave Data Node 1 Data Node 2 Binlog Master Slave/Master Binlog Relay Binlog Binlog Slave Data Node 3 Data Node 4 Binlog Replication Channel Slave Binlog© 2011 Oracle Corporation 126
    • MySQL Cluster Scale Out© 2011 Oracle Corporation 127
    • Cluster Configuration config.ini [NDBD DEFAULT] NoOfReplicas: 2 DataDir: /var/lib/mysql-cluster FileSystemPath: /var/lib/mysql-cluster # Data Memory, Index Memory, and String Memory DataMemory: 600M IndexMemory: 100M MGM Node BackupMemory: 64M [MGM DEFAULT] PortNumber: 1186 DataDir: /var/lib/mysql-cluster [TCP DEFAULT] SendBufferMemory=2M 192.168.100.1 ReceiveBufferMemory=1M Data Node 1 Data Node 2 [NDB_MGMD] HostName: 192.168.100.1 [NDBD] HostName: 192.168.100.2 192.168.100.2 192.168.100.3 [NDBD] HostName: 192.168.100.3 [NDBD] HostName: 192.168.100.4 Data Node 3 Data Node 4 [NDBD] HostName: 192.168.100.5 # # Note: The following can be MySQLD connections or # NDB API application connecting to the cluster # [API] 192.168.100.4 192.168.100.5 [API] [API] [API]© 2011 Oracle Corporation 128
    • Online Add Node • Scale storage capacity online – start with, e.g., 2 data nodes and extend the size of the Cluster over time – prior versions (<= 6.3) and most other vendors require downtime to accomplish this • Scale transaction-handling capacity – more data nodes -> more transactions can be handled – scale the application layer online by adding more SQL nodes (this has always been online) • Online means no service interruption! • Other features – no extra memory is needed on existing data nodes – ongoing range scans are not disturbed – geo-replication stays consistent and is not affected © 2011 Oracle Corporation 129
    • Online Add Node (1) – add node group authid (PK)‫‏‬ fname lname country 1 Albert Camus France 2 Ernest Hemingway USA Application 3 Johann Goethe Germany 4 Junichiro Tanizaki Japan Node Group New Node Group authid (PK)‫‏‬ fname lname country 1 Albert Camus France 2 Ernest Hemingway USA 3 Johann Goethe Germany 4 Junichiro Tanizaki Japan© 2011 Oracle Corporation 130
    • Online Add Node (2)‫ – ‏‬copy data authid (PK)‫‏‬ fname lname country 1 Albert Camus France 2 Ernest Hemingway USA Application 3 Johann Goethe Germany 4 Junichiro Tanizaki Japan Node Group New Node Group authid (PK)‫‏‬ fname lname country authid (PK)‫‏‬ fname lname country 1 Albert Camus France authid (PK)‫‏‬ fname lname country 1 Albert Camus France 2 Ernest Hemingway USA 2 Ernest Hemingway USA 2 Ernest Hemingway USA 3 Johann Goethe Germany 4 Junichiro Tanizaki Japan 3 Johan Goethe Germany 4 Junichiro Tanizaki Japan 4 Junichiro Tanizaki Japan No extra space needed on existing nodes!© 2011 Oracle Corporation 131
    • Online Add Node (3)‫ – ‏‬switch distribution authid (PK)‫‏‬ fname lname country 1 Albert Camus France 2 Ernest Hemingway USA Application 3 Johan Goethe Germany 4 Junichiro Tanizaki Japan Node Group New Node Group authid (PK)‫‏‬ fname lname country 1 Albert Camus France authid (PK)‫‏‬ fname lname country 2 Ernest Hemingway USA 2 Ernest Hemingway USA 3 Johann Goethe Germany 4 Junichiro Tanizaki Japan 4 Junichiro Tanizaki Japan© 2011 Oracle Corporation 132
    • Online Add Node (4)‫ - ‏‬delete rows authid (PK)‫‏‬ fname lname country 1 Albert Camus France Dynamic scaling of a 2 Ernest Hemingway USA running Cluster – no Application 3 Johan Goethe Germany interruption to service 4 Junichiro Tanizaki Japan Node Group 1 Node Group 2 authid (PK)‫‏‬ fname lname country authid (PK)‫‏‬ fname lname country 1 Albert Camus France 2 Ernest Hemingway USA 3 Johann Goethe Germany 4 Junichiro Tanizaki Japan© 2011 Oracle Corporation 133
    • Online Add Node • Stop (all) ndb_mgmd (MGM node) • Edit config.ini – add Y new data nodes • Start ndb_mgmd (MGM node) • For each data node in the cluster – perform a restart, i.e. a rolling restart • For each application and MySQL Server (API node / SQL node) in the cluster – perform a restart, i.e. a rolling restart • Start the new data nodes (with --initial) • For each pair of new data nodes – run ndb_mgm -e "CREATE NODEGROUP <nodeidX>,<nodeidY>" where nodeidX and nodeidY are the new nodes • For each old table – ALTER ONLINE TABLE <tablename> REORGANIZE PARTITION; (see the worked example below) © 2011 Oracle Corporation 134
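    A worked example of the last two steps under stated assumptions: the two newly started data nodes received node IDs 5 and 6, and mydb.t1 is one of the existing NDB tables:

        shell> ndb_mgm -e "CREATE NODEGROUP 5,6"                   # form a node group from the new data nodes
        mysql> ALTER ONLINE TABLE mydb.t1 REORGANIZE PARTITION;    -- spread the table's partitions onto the new node group
        mysql> OPTIMIZE TABLE mydb.t1;                             -- reclaim the space freed on the original node group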
    • MySQL Cluster Manager GA 1st November 2010 Mgmt Mgmt Mgmt Mgmt 33 mysqld Node 34 mysqld Node 33 mysqld mysqld Node 34 mysqld mysqld Node Data Data Data Data Data Data 31 Node 32 Node 31 Node 32 Node 35 Node 36 Node • On-line add-node mysql> add hosts --hosts=192.168.0.35,192.168.0.36 mysite; mysql> add package --basedir=/usr/local/mysql_7_0_7 – hosts=192.168.0.35,192.168.0.36 7.0; mysql> add process -- processhosts=mysqld@192.168.0.33,mysqld@192.168.0.34,ndbd@192.1 68.0.35,ndbd@192.168.0.36 mycluster; mysql> start process --added mycluster; • Restart optimizations • Fewer nodes restarted on some parameter changes© 2011 Oracle Corporation 135
    • MySQL Cluster Backup & Restore© 2011 Oracle Corporation 136
    • Backup • Online Backup for the Data only – Schema dump via mysqldump • What to configure – Backup path • Local disk/SAN/NAS – Backup files can then be picked up by external backup managers • Compressed backup – Space optimization/saving • Backup Initiation – Initiated via the Cluster Management Daemon CLI on the SQL Node – Backup snapshot at end of backup or start of backup • Backup command – $ ndb_mgm -e “START BACKUP SNAPSHOTSTART" • Backup files – <backup-path>/<set-of-backup-files>© 2011 Oracle Corporation 137
    • Backup: What files • Backup files per Data Node in <backup-path> – BACKUP-backup_id.node_id.ctl • Control information and metadata – BACKUP-backup_id-0.node_id.data • A data file containing the table records, which are saved on a per-fragment basis • Different nodes save different fragments during the backup – BACKUP-backup_id.node_id.log • Committed transactions that have not made it to the “.data” file – Backup Output: Waiting for completed, this may take several minutes Node 2: Backup 1 started from node 1 Node 2: Backup 1 started from node 1 completed StartGCP: 177 StopGCP: 180 #Records: 7362 #LogRecords: 0 Data: 453648 bytes Log: 0 bytes© 2011 Oracle Corporation 138
• Restore: Prerequisites
  • Backup files in <backup-path> for all Data Nodes
    – BACKUP-backup_id.node_id.ctl
      • Control information and metadata
    – BACKUP-backup_id-0.node_id.data
      • A data file containing the table records, which are saved on a per-fragment basis
      • Different nodes save different fragments during the backup
    – BACKUP-backup_id.node_id.log
      • Committed transactions that have not made it to the ".data" file
  • A running Cluster
    – All components of the cluster started and running
    – Data Nodes started with --initial to indicate a fresh start
    – SQL Nodes and Cluster Management Daemons
  • Cluster running in Single User Mode
    – shell> ndb_mgm -e "ENTER SINGLE USER MODE 5"
  • Restore utility: ndb_restore
© 2011 Oracle Corporation 139
• Restore Backup: 3 simple steps
  • 1st: restore the metadata / db schema
    – $ ndb_restore -c 10.0.0.1 -b 4 -m /usr/local/mysql/var/BACKUP/BACKUP-4/
    – -m restores only the metadata information
    – Metadata needs to be restored only once for all data nodes, so this command is executed only once
  • 2nd: restore the data / table records
    – $ ndb_restore -c 10.0.0.1 -n 2 -b 4 -r /usr/local/mysql/var/BACKUP/BACKUP-4/
    – -r restores the data records and applies the logs
    – -n is the ID of the data node
    – -b is the backup ID
  • 3rd: only for MySQL Cluster replication slaves / geographical replication
    – $ ndb_restore -c 10.0.0.1 -b 4 -e /usr/local/mysql/var/BACKUP/BACKUP-4/
    – -e restores the epoch needed by a MySQL Cluster replication slave
    – The row in mysql.ndb_apply_status with id 0 will be updated/inserted
  • The 2nd step must be executed once per Data Node; steps 1 and 3 need only be executed once for the entire restore procedure
  • (a scripted end-to-end example follows below)
© 2011 Oracle Corporation 140
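Putting the three steps together – a minimal sketch for a two-data-node cluster (node IDs 2 and 3) restoring backup ID 4, with the management node on 10.0.0.1. IDs and paths are hypothetical, and step 3 is only needed when seeding a replication slave.

    $ BACKUP_DIR=/usr/local/mysql/var/BACKUP/BACKUP-4/

    # keep application traffic out while restoring (node ID 5 is the API slot used by ndb_restore)
    $ ndb_mgm -c 10.0.0.1 -e "ENTER SINGLE USER MODE 5"

    # step 1: metadata, once
    $ ndb_restore -c 10.0.0.1 -b 4 -n 2 -m $BACKUP_DIR

    # step 2: data, once per data node
    $ ndb_restore -c 10.0.0.1 -b 4 -n 2 -r $BACKUP_DIR
    $ ndb_restore -c 10.0.0.1 -b 4 -n 3 -r $BACKUP_DIR

    # step 3 (replication slaves only): epoch, once
    $ ndb_restore -c 10.0.0.1 -b 4 -n 2 -e $BACKUP_DIR

    $ ndb_mgm -c 10.0.0.1 -e "EXIT SINGLE USER MODE"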
    • Recommended Hardware© 2011 Oracle Corporation 141
• The Perfect MySQL Server
  • 16–24 cores for MySQL 5.5 and above
  • x86_64 – 64-bit matters because it allows more memory
    – A data/memory ratio of 1/3 to 1/10 is good (rule of thumb)
    – The more memory the better
  • Linux or Solaris best; Windows and Unix also fine
  • RAID 10 for most workloads; RAID 5 OK if very read-intensive
  • Hardware RAID with a battery-backed write cache is critical!
    – More disks are always better
    – 4+ recommended; 8–16 can increase I/O performance if needed
  • At least 2 x NICs for redundancy
  • Slaves should be as powerful as the Master
  • Oracle Sun X4170, for example
© 2011 Oracle Corporation 142
• MySQL Cluster Hardware Selection – RAM & CPU
  • Storage layer (Data Nodes)
    – One data node can (7.0+) use 8 cores
    – CPU: 2 x 4 cores (Nehalem works really well); a faster CPU means faster processing of messages
    – RAM: as much as you need
      • A 10GB data set will require 20GB of RAM (because of redundancy)
      • Each data node then needs (2 x 10GB) / number of data nodes; with 2 data nodes that is 10GB of RAM each, so 16GB of RAM per server is a good fit
  • SQL layer (MySQL Servers)
    – CPU: 2–16 cores
    – RAM: not as important – 4GB is enough (depends on connections and buffers)
  • (a sizing sketch follows below)
© 2011 Oracle Corporation 143
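A back-of-the-envelope sizing sketch for the rule of thumb above, assuming a 10GB in-memory data set, two replicas and two data nodes; the DataMemory/IndexMemory values are illustrative placeholders, not tuning advice.

    # 10GB of data x 2 replicas            = 20GB to store across the cluster
    # 20GB / 2 data nodes                  = ~10GB per data node
    # + indexes, buffers and OS headroom   = ~16GB of RAM per data node host

    # config.ini (hypothetical values)
    [ndbd default]
    NoOfReplicas=2
    DataMemory=10G
    IndexMemory=2G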
• MySQL Cluster Hardware Selection – Disk Subsystem for Checkpoints & Logs
  • Low-end: 1 x SATA 7200 RPM for LCP + REDO LOG
    – For read-mostly workloads
    – No disk redundancy (but the other data node is the mirror)
  • Mid-end: 1 x SAS 10K RPM for LCP + REDO LOG
    – Heavy duty (many MB/s)
    – No disk redundancy (but the other data node is the mirror)
  • High-end: 4 x SAS 10K RPM for LCP / REDO LOG
    – Heavy duty (many MB/s)
    – Disk redundancy (RAID 1+0), hot swap
  • REDO, LCP and BACKUP are written sequentially in small chunks (256KB)
  • If possible, use ODirect=1 (see the config sketch below)
© 2011 Oracle Corporation 144
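A hedged config.ini fragment for the checkpoint and redo-log settings mentioned above; the path and sizes are placeholders to adapt to the chosen disk layout.

    [ndbd default]
    ODirect=1                    # O_DIRECT writes for LCP, backup and redo log
    FileSystemPath=/data/ndb     # point LCP and redo log at the dedicated spindle(s)
    NoOfFragmentLogFiles=16      # total redo log = 4 x NoOfFragmentLogFiles x FragmentLogFileSize
    FragmentLogFileSize=256M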
• MySQL Cluster Hardware Selection – Disk Data Storage
  • Minimal recommended: 2 x SAS 10K RPM (preferably)
    – LCP, REDO LOG and UNDO LOG on one device, TABLESPACE on the other
  • High-end: 4 x SAS 10–15K RPM (preferably)
    – Separate devices for LCP, REDO LOG / UNDO LOG, TABLESPACE 1 and TABLESPACE 2
  • Use the high-end layout for heavy read/write workloads (1000s of 10KB records per second), e.g. content delivery platforms
  • SSD for the TABLESPACE is an option to consider
  • Having the TABLESPACE on a separate disk is good for read performance
  • Enable WRITE_CACHE on the devices
  • (the SQL to create the disk data objects follows below)
© 2011 Oracle Corporation 145
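Disk Data tables are created through SQL objects that map onto the tablespace and undo-log devices above; a minimal sketch with hypothetical names and sizes.

    mysql> CREATE LOGFILE GROUP lg_1
             ADD UNDOFILE 'undo_1.log'
             INITIAL_SIZE 512M
             UNDO_BUFFER_SIZE 32M
             ENGINE NDBCLUSTER;

    mysql> CREATE TABLESPACE ts_1
             ADD DATAFILE 'data_1.dat'
             USE LOGFILE GROUP lg_1
             INITIAL_SIZE 2G
             ENGINE NDBCLUSTER;

    mysql> CREATE TABLE content (
             id BIGINT NOT NULL PRIMARY KEY,   -- indexed columns stay in memory
             payload VARBINARY(8000)           -- non-indexed columns are stored on disk
           ) TABLESPACE ts_1 STORAGE DISK ENGINE NDBCLUSTER;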
• MySQL Cluster Hardware Selection – Network
  • Dedicated >= 1Gb/s networking
    – On Oracle Sun CMT servers it may be necessary to bond 4 or more NICs together, because typically many data nodes run on the same physical host
  • Prevent network failures (2 x NICs, bonding, dual switches)
  • Use a dedicated network for cluster communication
    – Put Data Nodes and MySQL Servers on e.g. a 10.0.1.0 network and let MySQL listen on a "public" interface
  • There is no security layer to the management node
    – Enable port 1186 access only from cluster nodes and administrators
  • (an illustrative network split follows below)
© 2011 Oracle Corporation 146
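A sketch of the private-network split and the port-1186 restriction described above, using made-up addresses.

    # config.ini – all cluster traffic on the private 10.0.1.0/24 network
    [ndb_mgmd]
    HostName=10.0.1.1

    [ndbd]
    HostName=10.0.1.11

    [ndbd]
    HostName=10.0.1.12

    [mysqld]
    HostName=10.0.1.21

    # my.cnf on the SQL node – applications connect on the public interface only
    [mysqld]
    ndb-connectstring=10.0.1.1
    bind-address=203.0.113.10     # hypothetical public address

    # firewall on the management node – allow 1186 only from cluster hosts / admins
    $ iptables -A INPUT -p tcp --dport 1186 -s 10.0.1.0/24 -j ACCEPT
    $ iptables -A INPUT -p tcp --dport 1186 -j DROP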
• Resources to Get Started
  • MySQL Cluster 7.1, Architecture and New Features
    – http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
  • MySQL Cluster Manager
    – http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_manager.php
    – http://dev.mysql.com/doc/mysql-cluster-manager/1.1/en/
  • MySQL Cluster Connector for Java white paper
    – http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_connector_for_java.php
  • MySQL Cluster 7.1 Evaluation Guide
    – http://www.mysql.com/why-mysql/white-papers/mysql_cluster_eval_guide.php
  • Getting Started with MySQL Cluster
    – http://www.mysql.com/products/database/cluster/get-started.html
  • MySQL Cluster on the Web
    – http://www.mysql.com/products/database/cluster/
© 2011 Oracle Corporation 147
• Resources to Get Started (cont.)
  • Learn More – GA Release
    – Architecture & New Features Guide: www.mysql.com/cluster/
  • Evaluate MySQL Cluster 7.2
    – Quick Start Guides (Linux, Solaris, Windows): http://tinyurl.com/5wkl4dy
  • Download Today
    – http://dev.mysql.com/downloads/cluster/
    – http://labs.mysql.com (memcached)
  • Session Management Webinar: http://tinyurl.com/3dmtf9r
© 2011 Oracle Corporation 148
• More Product Releases Than Ever Before – Continuous Innovation
  • Timeline Q2 CY2010 – Q2 CY2011: MySQL Cluster 7.1, MySQL Cluster Manager 1.0, MySQL Enterprise Monitor 2.2, MySQL Workbench 5.2, MySQL Database 5.5, MySQL Enterprise Backup 3.5, MySQL Enterprise Monitor 2.3, MySQL Cluster Manager 1.1 – all GA
  • MySQL Database 5.6 and MySQL Cluster 7.2 as Development Milestone Releases (DMR*) and on MySQL Labs
  • A Better MySQL
  • *Development Milestone Release
© 2011 Oracle Corporation 149
    • Case Studies© 2011 Oracle Corporation 152
• Shopatron: eCommerce Platform
  • Applications
    – eCommerce back-end, user authentication, order data & fulfilment, payment data & inventory tracking; supports several thousand queries per second
  • Key business benefits
    – Scale quickly and at low cost to meet demand
    – Self-healing architecture, reducing TCO
  • Why MySQL?
    – Low-cost scalability
    – High read and write throughput
    – Extreme availability
  • "Since deploying MySQL Cluster as our eCommerce database, we have had continuous uptime with linear scalability, enabling us to exceed our uptime requirements" — Sean Collier, CIO & COO, Shopatron Inc
  • http://www.mysql.com/why-mysql/case-studies/mysql_cs_shopatron.php
© 2011 Oracle Corporation 153
• Case Study: UK ISP & Hosting Provider
  • Company overview
    – UK-based retail and wholesale ISP & hosting services
    – 2010 awards for best home broadband and customer service
    – Acquired by BT in 2007
  • Challenges / opportunities
    – Enter the market for wholesale services, demanding more stringent SLAs
    – Re-architect AAA systems for data integrity & continuous availability to support billing systems
    – Consolidate data for ease of reporting and operating efficiency
    – Fast time to market
    – Agility and scale by separating the database from applications
  • Solutions
    – MySQL Cluster
    – MySQL Server with InnoDB
  • Results
    – Continuous system availability, exceeding wholesale SLAs
    – 2x faster time to market for new services
    – Improved management & infrastructure efficiency through database consolidation
  • Customer perspective
    – "Since deploying our latest AAA platform, the MySQL environment has delivered continuous uptime, enabling us to exceed our most stringent SLAs" — Geoff Mitchell, Network Engineer
© 2011 Oracle Corporation 154
• Summary: MySQL Cluster
  • Web-scale performance with carrier-grade availability
  • SQL, JSON/REST, memcached protocol and native Java access
  • No compromise: scale-out, real-time performance, 99.999% uptime
  • Proven: deployed across telecoms networks, powering mission-critical web and internet services
© 2011 Oracle Corporation 155
• MySQL Web Reference Architectures
  • 4 x reference architectures
    – Small
    – Medium
    – Large
    – Extra Large (Social Networking)
  • 4 x common platform components
    – User Authentication & Session Management
    – Content Management
    – eCommerce
    – Analytics
© 2011 Oracle Corporation 156
• Reference Architecture Metrics
                             Small      Medium      Large         Social Network
                                                  (eCommerce)        (Large)
  Queries/Second             <100       <5,000     10,000+          25,000+
  Transactions/Second        <100       <1,000     10,000+          25,000+
  Concurrent Read Users      <100       <5,000     10,000+          25,000+
  Concurrent Write Users     <10        <100       1,000+           2,500+
  Database Size
    Sessions                 <2 GB      <10 GB     20+ GB           40+ GB
    eCommerce                <2 GB      <10 GB     20+ GB           40+ GB
    Analytics                <10 GB     <500 GB    1+ TB            2+ TB
    Content Management       <10 GB     <500 GB    1+ TB            2+ TB
© 2011 Oracle Corporation 157
• Best Practices: Small Web Reference Architecture
  • If future scalability is required, start with the Medium Reference Architecture
  • Complex to tune multiple applications on shared hardware
  • Use the default InnoDB storage engine for all workloads
    – Default MySQL storage engine
    – ACID-compliant, transactional
    – MVCC & row-level locking
    – Foreign keys & constraints
  • If traffic volumes increase, scale session management first
    – Migrate Session Management to a dedicated MySQL server
© 2011 Oracle Corporation 158
• Small: Web Reference Architecture
  • Single MySQL master server supporting all workloads
    – Members/Authentication, eCommerce, Content Management, Search
  • Data replicated via MySQL Replication to two slaves for backup (MySQL Enterprise Backup) & analytics; monitored with MySQL Enterprise Monitor
  • Sizing
    – Queries/Second: max < 2,000
    – Transactions/Second: < 200
    – Concurrent Read Users: < 200
    – Concurrent Write Users: < 20
    – Database Size: < 20GB
  • Only deploy when future traffic growth is very limited
© 2011 Oracle Corporation 159
• Best Practices (1): Medium Web Reference Architecture
  • Server ratio: 8 application servers to each MySQL Server
    – More for PHP applications, less for Java
    – Add more slaves as the application tier scales
  • Content Management
    – Each slave can handle around 3,000 concurrent users
    – Each master can handle up to 30 slaves
    – MySQL Replication for high availability
      • Can include Heartbeat / Clusterware, depending on application failover requirements
    – Metadata & indexing of content assets managed by MySQL
    – File system & physical storage manage the content assets
© 2011 Oracle Corporation 160
• Best Practices (2): Medium Web Reference Architecture
  • Session Management & eCommerce
    – Both deployed onto the InnoDB storage engine
    – Session data maintained for up to 1 hour in a dedicated partition; rolling partitions are used to delete aged data (see the sketch below)
      • More users persist session data to provide greater personalization for repeat visitors
    – Data is captured in the Analytics database
    – Session Management: MySQL Replication with Heartbeat / Clusterware
    – eCommerce: OVM Template or OS-level HA
    – If web traffic grows, move Session Management to MySQL Cluster
      • HA and in-memory data management can reduce the need for external HA mechanisms & memcached servers
© 2011 Oracle Corporation 161
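An illustrative sketch of the rolling-partition pattern for aged session data; the table, column and partition names are hypothetical, daily partitions are used for brevity, and the housekeeping statements would normally run from a scheduled job.

    mysql> CREATE TABLE sessions (
             session_id BINARY(16) NOT NULL,
             user_id    BIGINT,
             data       BLOB,
             created    DATETIME NOT NULL,
             PRIMARY KEY (session_id, created)
           ) ENGINE=InnoDB
           PARTITION BY RANGE (TO_DAYS(created)) (
             PARTITION p20110719 VALUES LESS THAN (TO_DAYS('2011-07-20')),
             PARTITION p20110720 VALUES LESS THAN (TO_DAYS('2011-07-21')),
             PARTITION pmax      VALUES LESS THAN MAXVALUE
           );

    -- roll the window: drop the oldest partition, carve a new day out of pmax
    mysql> ALTER TABLE sessions DROP PARTITION p20110719;
    mysql> ALTER TABLE sessions REORGANIZE PARTITION pmax INTO (
             PARTITION p20110721 VALUES LESS THAN (TO_DAYS('2011-07-22')),
             PARTITION pmax      VALUES LESS THAN MAXVALUE
           );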
• Medium: Web Reference Architecture
  • Diagram: dedicated tiers for Session Management and eCommerce (memcached / application servers with a MySQL master, a heartbeat mechanism and slaves 1–N), Content Management (MySQL master with slaves 1–3), and Analytics, with MySQL Enterprise Monitor and MySQL Enterprise Backup alongside
© 2011 Oracle Corporation 162
• Best Practices: Large Web Reference Architecture
  • Builds on the best practices of the Medium Web Reference Architecture
  • Dedicated infrastructure for each workload: MySQL Replication, memcached, etc.
  • Introduces the Data Refinery
    – Aggregates data across the web components
    – Data cleansing
    – Builds data warehouse dimensions
    – Supports higher-volume content management and analytics
  • Introduces MySQL Cluster
    – Session Management and eCommerce
© 2011 Oracle Corporation 163
• Large: Web Reference Architecture – Conceptual View
  • Diagram: two data centers (East Coast and West Coast), each running Session Management, eCommerce, Data Refinery, Content Management and Analytics, linked by geographic replication
© 2011 Oracle Corporation 164
• Large: Web Reference Architecture
  • Diagram: Session Management and eCommerce run on MySQL Cluster (MySQL Servers in front of data nodes arranged in node groups with fragments F1–F4); Content Management uses a MySQL master with slaves 1–10 over distributed storage; the Data Refinery feeds the Analytics master and slaves; MySQL Enterprise Monitor and MySQL Enterprise Backup manage the estate
© 2011 Oracle Corporation 165
• Best Practices: Social Networking Reference Architecture
  • Builds on the best practices of the Web Reference Architectures
  • MySQL Cluster used for authentication & the look-up table (Shard Catalog)
  • Introduces sharding
    – Implemented at the application layer for scaling a very high volume of writes
    – Data divided into smaller sets, distributed across low-cost hardware
    – Shards based on a hash of a single column, e.g. user ID (see the sketch below)
  • Sharding is complex
    – Recommend the Architecture and Design consulting engagement
    – Sharding is only used in a small percentage of workloads
    – Most Web 2.0 workloads are still read-intensive, i.e. the record is read before updates are applied
© 2011 Oracle Corporation 166
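One possible shape for the shard catalog and the hash-based routing described above; the schema, shard count and hash function are illustrative only, not a prescribed design.

    -- shard catalog, held in MySQL Cluster so every application server sees the same map
    mysql> CREATE TABLE shard_map (
             shard_id INT NOT NULL PRIMARY KEY,
             host     VARCHAR(64) NOT NULL,
             port     INT NOT NULL
           ) ENGINE=NDBCLUSTER;

    -- application-side routing: hash the user ID onto one of 16 shards,
    -- then look up that shard's MySQL endpoint in the catalog
    mysql> SELECT host, port
           FROM shard_map
           WHERE shard_id = MOD(CRC32('user-42'), 16);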
• Large: Social Networks
  • Diagram: application / memcached servers route users to shards (customers 1%–33%, 34%–66%, 67%–100%), each shard being a MySQL master with slaves; the central databases hold the look-up (shard catalog) and authentication services on MySQL Cluster data node groups; a Data Refinery feeds the Analytics master and slaves; MySQL Enterprise Monitor and MySQL Enterprise Backup manage the estate
© 2011 Oracle Corporation 167