MySQL Cluster Carrier Grade Edition. Alexander Yu, Principal Sales Consultant | MySQL Asia Pacific & Japan. 2011-07-20
Agenda / Topics              • Oracle MySQL Strategy              • MySQL Server                Pluggable Storage Engine A...
About MySQL              •   Founded, first release in 1995              •   MySQL Acquired by Sun Microsystems Feb 2008  ...
Oracle’s Strategy:             Complete. Open. Integrated.                            • Built together                    ...
Complete. Open. Integrated.               MySQL Completes The Stack                                • Oracle never settles ...
The “M” in the LAMP Stack                    Operating                     System                                 L       ...
Investment in MySQL              Rapid Innovation           • Make MySQL a Better MySQL             • #1 Open Source Datab...
Oracle + MySQL Customers              • Product Integration                 • Oracle GoldenGate (Complete!)               ...
Serving Key Markets and Industry Leaders            Powering Data Management on the Web & in the Network                  ...
MySQL in Communications                            http://www.mysql.com/industry/communications/resources.html#customer_ca...
MySQL Server            Pluggable Storage Engine Architecture© 2011 Oracle Corporation                           11
Pluggable Storage Engine Architecture           MySQL Server                                                              ...
MySQL Cluster Architecture             Shared-nothing distributed database with no SPOF:                              JDBC...
Workload Qualification InnoDB vs MySQL Cluster       Workload                                                      InnoDB ...
Feature Comparison InnoDB vs MySQL ClusterFeature Qualification                                                    InnoDB ...
Storage Engines Feature                                         MyISAM              NDB      Archive    InnoDB     Memory ...
Why Users Adopt MySQL Cluster                            MySQL Already in UseHigh Read/Write                              ...
Why Users Buy MySQL Cluster CGE                               Standardized on Open SourceBlend of Web &     Deploying Miss...
High Availability Solutions© 2011 Oracle Corporation                   19
Selecting the Right HA Architecture© 2011 Oracle Corporation                           20
Mapping HA Architecture to Applications                                                                                   ...
MySQL High Availability Solutions          9 5. 0 0 0 %      • MySQL Replication          9 9. 0 0 0 %      • MySQL Replic...
MySQL Replication• Native in MySQL• Used for Scalability and HA• Asynchronous as standard• Semi-Synchronous support  added...
Replication Topologies                   Single           Chain            Circular                 Multiple        Multi ...
MySQL Replication                  Read Scalability        Clients                                              MySQL Repl...
MySQL Replication              Failure Detection & Failover  • Linux Heartbeat implements heartbeat protocol between nodes...
Shared Disk Clusters              A/P - A/A                                                      READS/WRITES             ...
Distributed Replicated Block Device • DRBD creates transaction-safe hot standby configuration • MySQL updates written to b...
Sharding aka Application Partitioning      Master                            Clients       Slave      Reads      Writes   ...
Oracle VM Template for MySQL              Integrated & Tested OS, VM and Database Stack                                   ...
Template Components            Certified for Production Deployment                                                        ...
Positioning Current Solutions            Requirement                 MySQL Replication              Heartbeat + DRBD      ...
MySQL Cluster              Real-time Carrier Grade Database© 2011 Oracle Corporation                        33
Customers & Applications              • Web                  –   User profile management                  –   Session stor...
MySQL Cluster - NDB Storage Engine© 2011 Oracle Corporation                          35
MySQL Cluster Architecture             Shared-nothing distributed database with no SPOF:                              JDBC...
MySQL Cluster Nodes        SQL Based Applications                              JDBC/ODBC                     MySQL/       ...
MySQL Cluster Nodes                                        • Standard SQL Interface                    SQL Node           ...
Replication Flexibility                                                                   • Synchronous replication within...
MySQL Cluster Loads         MySQL                    MySQL           MySQL        Community                 Cluster       ...
MySQL Cluster System Requirements               System Component                          Requirement                     ...
MySQL Cluster 6.2© 2011 Oracle Corporation         42
MySQL Cluster 6.3     http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-changes-5-1-ndb-6-3.html© 2011 O...
MySQL Cluster 7.0 –GA April 2009     http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php© 2011 ...
Scale out – multi core environments© 2011 Oracle Corporation                           45
MySQL Cluster vs MySQL MEMORY:              30x Higher Throughput / 1/3rd the Latency on a single node          • Table le...
Scale-Out Reads & Writes on Commodity Hardware                                      • NDB API Performance 4.33 M          ...
MySQL Cluster CGE 7.1 – Key Enhancements    http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php...
MySQL Cluster 7.1 Momentum                                                         1,000 Downloads per Day                ...
MySQL Cluster 7.1: ndbinfo          mysql> use ndbinfo         • New database (ndbinfo) which          mysql> show tables;...
MySQL Cluster 7.1: ndbinfo       • Example 1: Check memory usage/availability                      mysql> select * from nd...
MySQL Cluster 7.1: ndbinfo       • Example 2: Check how many table scans performed on each data node since the last restar...
Latest news on MySQL Cluster 7.1              • As of MySQL Cluster 7.1.9a:                  • InnoDB plugin included     ...
MySQL Enterprise Monitor 2.3© 2011 Oracle Corporation                    54
Online Operations              • Scale the cluster for throughput or capacity                  – Data and SQL Nodes       ...
Real-Time, On-Line Schema Changes                                            CREATE OFFLINE INDEX b ON t1(b);    • Fully o...
Performance I Flexibility I Simplification    • SQL and NoSQL Access Methods to tables         – SQL: complex queries, ric...
Scaling Distributed Joins                                                                                                 ...
Adaptive Query Localization: Current Limitations              • Columns to be joined                  – must use exactly t...
•<Insert Picture Here>                                                     Early Adopter Speaks!“Testing of Adaptive Query...
MySQL Cluster: SQL & NoSQL Combined                                                                      Mix & Match!     ...
Which to Choose ?© 2011 Oracle Corporation         62
Performance© 2011 Oracle Corporation   63
NoSQL With NDB API              Best possible performance                            Clients            • Application embe...
NoSQL with memcached                                                          7.2DM                                       ...
NoSQL with Memcached                                                         7.2DM              Pre-GA version available f...
MySQL Cluster Manager 1.1 Features   Delivered as part of MySQL Cluster CGE 7.1© 2011 Oracle Corporation                  ...
How Does MySQL Cluster Manager Help ?                  Example: Initiating upgrade from MySQL Cluster 6.3 to              ...
Terms used by MySQL Cluster Manager                                                                                       ...
Example configuration     mysql     client                                                                  • MySQL Cluste...
Creating & Starting a Cluster    mysql                                    1.Define the site:    client                    ...
Upgrade Cluster     mysql     client                                                • Upgrade from MySQL Cluster 6.3.26 to...
MySQL Cluster Manager              GA 1st November 2010                       Mgmt                   Mgmt                 ...
General Design Considerations          • MySQL Cluster is designed for               – Short transactions               – ...
Best Practice : Primary Keys          • To avoid problems with               • Cluster 2 Cluster replication              ...
Best Practice: Distribution Aware AppsSELECT SUM(population) FROM townsWHERE country=“UK”;                                ...
Best Practice: Distribution Aware – Multiple Tables        Partition Key          Primary Key        sub_id              a...
MySQL Cluster              Internals© 2011 Oracle Corporation     78
Automatic Data Partitioning                                                  4 Partitions * 2 Replicas = 8 Fragments      ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                       4 Partitions * 2 Replicas = 8 Fragments                 ...
Automatic Data Partitioning                                                  4 Partitions * 2 Replicas = 8 Fragments      ...
Automatic Data Partitioning                                        4 Partitions * 2 Replicas = 8 Fragments                ...
Automatic Data Partitioning                                        4 Partitions * 2 Replicas = 8 Fragments                ...
Automatic Data Partitioning                                        4 Partitions * 2 Replicas = 8 Fragments                ...
Automatic Data Partitioning                                        4 Partitions * 2 Replicas = 8 Fragments                ...
Data Partitioning         • Automatic distribution/partitioning             – Primary Key hash value (partitioning by Key)...
Internal Replication              • Replication between Data Nodes              • Synchronous Replication                 ...
Internal Replication: Prepare Phase                            Data Node              insert into T1 values (...)         ...
Oracle my sql cluster cge

  1. MySQL Cluster Carrier Grade Edition
     Alexander Yu, Principal Sales Consultant | MySQL Asia Pacific & Japan
     2011-07-20
  2. Agenda / Topics
     • Oracle MySQL Strategy
     • MySQL Server Pluggable Storage Engine Architecture
     • High Availability Solutions
     • MySQL Cluster Carrier Grade
       – Internals
       – Geographical Replication
       – Scale Out
       – Backup & Restore
     • Q&A
     © 2011 Oracle Corporation
  3. About MySQL
     • Founded, first release in 1995
     • MySQL acquired by Sun Microsystems, Feb 2008
     • Oracle acquires Sun Microsystems, Jan 2010
     • 12M+ product installations
     • 65K+ downloads per day
     • Part of the rapidly growing open source LAMP stack
     Customers across every major operating system, hardware vendor, geography, industry, and application type
     High Performance ▪ Reliable ▪ Easy to Use
  4. Oracle’s Strategy: Complete. Open. Integrated.
     • Built together
     • Tested together
     • Managed together
     • Serviced together
     • Based on open standards
     • Lower cost
     • Lower risk
     • More reliable
  5. Complete. Open. Integrated. MySQL Completes The Stack
     • Oracle never settles for being second best at any level of the stack
     • “Complete” means we meet most customer requirements at every level
     That’s why MySQL matters to Oracle and Oracle customers
  6. The “M” in the LAMP Stack
     • L: Operating System
     • A: Application Server
     • M: Database
     • P: Scripting
     For Internal Use Only -- Oracle Confidential & Proprietary
  7. Investment in MySQL: Rapid Innovation
     • Make MySQL a Better MySQL
       – #1 Open Source Database for Web Applications
       – Most Complete LAMP Stack
       – Telecom & Embedded
     • Develop, Promote and Support MySQL
       – Improve engineering, consulting and support
       – Leverage 24x7, World-Class Oracle Support
     • MySQL Community Edition
       – Source and binary releases
       – GPL license
  8. Oracle + MySQL Customers
     • Product Integration
       – Oracle GoldenGate (Complete!)
       – Oracle Enterprise Linux + Oracle VM (Complete!), HA Template Available
       – Oracle Secure Backup (CY 2011)
       – Oracle Audit Vault (CY 2011)
       – Oracle Enterprise Manager (CY 2011)
     • Support
       – Leverage 24x7, World-Class Oracle Support
       – My Oracle Support
  9. Serving Key Markets and Industry Leaders
     Powering Data Management on the Web & in the Network
     • Web
     • OEM / ISVs
     • SaaS, Hosting
     • Telecommunications
     • Enterprise 2.0
  10. MySQL in Communications
      http://www.mysql.com/industry/communications/resources.html#customer_case_studies
  11. MySQL Server: Pluggable Storage Engine Architecture
  12. Pluggable Storage Engine Architecture (MySQL Server diagram)
      • Connectors (Clients and Apps): Native C API, JDBC, ODBC, .Net, PHP, Ruby, Python, VB, Perl
      • Connection Pool: Authentication, Thread Reuse, Connection Limits, Check Memory, Caches
      • Enterprise Management Services and Utilities: Backup & Recovery, Security, Replication, Cluster, Partitioning, Instance Manager, Information_Schema, MySQL Workbench
      • SQL Interface: DDL, DML, Stored Procedures, Views, Triggers, etc.
      • Parser: Query Translation, Object Privileges
      • Optimizer: Access Paths, Statistics
      • Caches: Global and Engine Specific Caches and Buffers
      • Pluggable Storage Engines (Memory, Index and Storage Management): InnoDB, MyISAM, Cluster, etc., Partners, Community, more
      • Filesystems, Files and Logs: Redo, Undo, Data, Index, Binary, Error, Query and Slow
  13. MySQL Cluster Architecture
      Shared-nothing distributed database with no SPOF: High Read & Write Performance & 99.999% uptime
      • Clients: JDBC (Java), NDB API (C++), ClusterJ (Java), OpenJPA (Java), PHP/P*/ODBC, OpenLDAP
      • MySQL Cluster Application Nodes: SQL Nodes, ClusterJ, NDB API (C++)
      • Two MGM Nodes: MGM Client, MGM API (C)
      • MySQL Cluster Data Nodes, reached through the NDB API
  14. Workload Qualification: InnoDB vs MySQL Cluster
      Workload | InnoDB | MySQL Cluster
      Packaged Applications (i.e. standard business applications) | Yes | No, unless mainly PK access
      Custom Applications | Yes | Yes
      OLTP Applications | Yes | Yes
      DSS Applications (i.e. Data Marts, Analytics, etc.) | Yes | No
      Content Management | Yes | Limited Support
      In-Network Telecoms Applications (HLR, HSS, SDP, etc.) | No | Yes
      Web Session Management | Yes | Yes
      User Profile Management & AAA | Yes | Yes
      eCommerce Databases | Yes | Yes
  15. Feature Comparison: InnoDB vs MySQL Cluster
      Feature | InnoDB | MySQL Cluster
      Latest MySQL 5.5 & InnoDB 1.1 Performance Enhancements | Yes | No
      Storage Limits | 64TB | 2TB (a)
      Foreign Keys | Yes | No
      MVCC Non-Blocking Reads | Yes | No
      Optimized for Complex Multi-Table JOINs with Thousands of Accesses | Yes | No (b)
      Hash Indexes | No | Yes
      Compressed Data | Yes | No
      Support for 8KB+ Row Sizes | Yes | Only via BLOBs (c)
      Built-in Clustering Support for 99.999% HA | No | Yes
      Minimum Number of Physical Hosts for Redundancy | 2 (Active / Passive) | 2 + 1 (A/A & Mgmt) (d)
      Time to Recovery After Node Failure | 30s - hours | Sub-Second
      Real-Time Performance | No | Yes
      Option for In-Memory Storage of Tables with Disk Persistence | No | Yes
      Non-SQL Access Methods to Data (i.e. NDB API) | No | Yes
      Write Scalability without Application Partitioning | No | Yes (e)
      Max Number of Nodes for Parallel Write Performance | 1 | 48 (f)
      Conflict Resolution & Detection across Multiple Replication Masters | No | Yes
      Virtualization Support | Yes | No
  16. Storage Engines
      Feature | MyISAM | NDB | Archive | InnoDB | Memory
      Storage limits | No | Yes | No | 64TB | Yes
      Transactions | No | Yes | No | Yes | No
      Locking granularity | Table | Row | Row | Row | Table
      MVCC snapshot read | No | No | No | Yes | No
      Geospatial support | Yes | No | Yes | Yes | No
      Data caches | No | Yes | No | Yes | NA
      Index caches | Yes | Yes | No | Yes | NA
      Compressed data | Yes | No | Yes | No | No
      Storage cost (relative to other engines) | Small | Med | Small | Med | NA
      Memory cost (relative to other engines) | Low | High | Low | High | High
      Bulk insert speed | High | High | Highest | Med | High
      Replication support | Yes | Yes | Yes | Yes | Yes
      Foreign Key support | No | No | No | Yes | No
      Built-in Cluster/High-availability support | No | Yes | No | No | No
      Dynamically add and remove storage engines. Change the storage engine on a table with “ALTER TABLE …”
  17. Why Users Adopt MySQL Cluster
      • MySQL Already in Use
      • High Read/Write Throughput
      • 99.999% MySQL
      • Real Time Performance
      • Scale-Out, On-Demand
  18. Why Users Buy MySQL Cluster CGE
      • Standardized on Open Source
      • Blend of Web & Telecoms Capabilities
      • Deploying Mission Critical Applications
      • HA MySQL
      • Management & Monitoring Tools
      • Global 24x7 support
      • Embedding MySQL Cluster
      • Real-Time, High Read/Write Performance
      • Scale-Out, Shared Nothing
  19. High Availability Solutions
  20. Selecting the Right HA Architecture
  21. Mapping HA Architecture to Applications (matrix; cell values not recoverable)
      • Architectures: Data Replication | Clustered / Virtualized | Shared-Nothing, Geo-Replicated Cluster
      • Applications: E-Commerce / Trading; Session Management; User Authentication / Accounting; Feeds, Blogs, Wikis; Data Refinery; OLTP; Data Warehouse/BI; Content Management; CRM / SCM; Collaboration; Packaged Software; Telco Apps (HLR/HSS/SDP…)
  22. MySQL High Availability Solutions
      • 95.000%: MySQL Replication
      • 99.000%: MySQL Replication with Clustering Software
      • 99.900%: DRBD with Clustering Software
      • 99.900%: Shared Storage with Clustering Software (A/P - A/A)
      • 99.990%: DRBD and Replication with Clustering Software
      • 99.990%: Shared Storage and Replication with Clustering SW
      • 99.990%: Shared Storage Replication
      • 99.990%: Virtualised Environment
      • 99.999%: MySQL Cluster
      • 99.999%: MySQL Cluster & Replication
      • 99.999%: MySQL Cluster Carrier Grade Edition
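The availability levels on the High Availability Solutions slide are easier to compare as permitted downtime. The following is a minimal sketch of that arithmetic, not something taken from the deck; the function name `downtime_minutes_per_year` is made up for illustration, and a year is assumed to be 365 days.

```python
# Illustrative helper (not from the slides): convert an availability
# percentage into the downtime it permits over one year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime allowed per year at this availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # The five distinct levels listed on the slide.
    for level in (95.0, 99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(level)
        print(f"{level:7.3f}% -> {minutes:10.2f} min/year ({minutes / 60:7.2f} h)")
```

Under these assumptions, "five nines" (99.999%) allows a little over five minutes of downtime per year, while 95% allows more than eighteen days.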
  23. MySQL Replication
      • Native in MySQL
      • Used for Scalability and HA
      • Asynchronous as standard
      • Semi-Synchronous support added in MySQL 5.5
      • Each slave adds minimal load on master
      (Diagram label: Relay Log)
  24. Replication Topologies
      Single | Chain | Circular | Multiple | Multi-Master | Multi-Circular
  25. MySQL Replication: Read Scalability
      (Diagram: Clients, Master, Slaves, MySQL Replication)
      • Used by leading web properties for scale-out
      • Reads are directed to slaves, writes to master
      • Delivers higher performance & scale with efficient resource utilization
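The read scale-out pattern on the Read Scalability slide (reads to slaves, writes to master) could be sketched on the application side roughly as follows. This is an assumption-laden illustration, not a MySQL API: the class `ReadWriteRouter`, the host names, and the rule of treating only `SELECT` and `SHOW` as reads are all hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Pick a target server name: writes go to the master,
    reads rotate round-robin over the slaves (hypothetical helper)."""

    READ_VERBS = ("SELECT", "SHOW")  # assumption: everything else is a write

    def __init__(self, master, slaves):
        self.master = master
        # Cycle endlessly over the slaves; fall back to the master if none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.READ_VERBS and self._slaves is not None:
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
    for statement in ("SELECT * FROM t1", "INSERT INTO t1 VALUES (1)", "select 1"):
        print(statement, "->", router.route(statement))
```

Because standard replication is asynchronous, a slave may briefly lag the master, so reads that must see a just-committed write would still need to be sent to the master.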
  26. MySQL Replication: Failure Detection & Failover
      • Linux Heartbeat implements heartbeat protocol between nodes
      • Failover initiated by Cluster Resource Manager (Pacemaker) if heartbeat message is not received
      • Virtual IP address failed over to ensure failover is transparent to apps
  27. Shared Disk Clusters (A/P - A/A)
      (Diagram: Applications send READS/WRITES through a VIP to servers on Shared Storage)
      • Reliability
      • High Availability
        – Commonly used solution
        – Data handled by a SAN or NAS and always available
      • Fault Tolerance
        – Automatic fail-over
        – No single point of failure with appropriate hardware
      • Simplified Management
  28. Distributed Replicated Block Device
      • DRBD creates transaction-safe hot standby configuration
      • MySQL updates written to block device on the Active Server
      • DRBD synchronously replicates updates to the Passive Server
      • Linux Heartbeat fails over from Active to Passive in event of failure
  29. Sharding aka Application Partitioning
      (Diagram: Clients send Reads and Writes through Partitioning Logic to Shards 1 to 5, each a Master with Slaves)
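The "Partitioning Logic" box on the sharding slide is application code that maps each key to one of the shards. A minimal sketch of one common approach, hash-modulo placement, is shown below; the slide does not say which scheme is used, so the CRC32 hash, the function name `shard_for_key`, and the 1-based shard numbering are assumptions for illustration.

```python
import zlib


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a key to a shard number in 1..shard_count.

    CRC32 is used because it gives the same value on every run and host;
    Python's built-in hash() is randomized per process for strings.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return zlib.crc32(key.encode("utf-8")) % shard_count + 1


if __name__ == "__main__":
    # Five shards, as drawn on the slide; the user ids are made up.
    for user_id in ("alice", "bob", "carol"):
        print(user_id, "-> shard", shard_for_key(user_id, 5))
```

One consequence of hash-modulo placement is that changing the number of shards remaps most keys, which is part of why the deck contrasts application-level sharding with MySQL Cluster's automatic partitioning.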
  30. Oracle VM Template for MySQL
      Integrated & Tested OS, VM and Database Stack
      Fastest, simplest & most reliable way to deploy virtualized, cloud-ready MySQL instances, certified for production use
      • Rapid DEPLOYMENT
      • Increased RELIABILITY
      • Higher AVAILABILITY
      • Lower COST
      (Diagram: Oracle VM instances in an Oracle VM Server Pool on Oracle VM Servers)
  31. Template Components
      Certified for Production Deployment
      • Oracle Linux 5 Update 6 with the Unbreakable Enterprise Kernel
      • Oracle VM 2.2.1
      • Oracle VM Manager 2.1.5
      • Oracle Cluster File System 2 (OCFS2)
      • MySQL Database 5.5.10 (Enterprise Edition)
      Pre-Installed & Pre-Configured | Full Integration & QA Testing | Single Point of Support
      (Diagram labels: Automatic Fault Detection & Recovery, Secure Live Migration (SSL), Oracle VM Server Pool, Oracle VM Manager, Oracle VM Servers, ocfs2, SAN / iSCSI)
  32. Positioning Current Solutions
      Requirement | MySQL Replication | Heartbeat + DRBD | Oracle VM Template | MySQL Cluster
      Availability
      Platform Support | All Supported by MySQL Server | Linux | Oracle Linux | All Supported by MySQL Cluster
      Automated IP Failover | No | Yes | Yes | Depends on Connector and Configuration
      Automated Database Failover | No | Yes | Yes | Yes
      Automatic Data Resynchronization | No | Yes | N/A - Shared Storage | Yes
      Typical Failover Time | User / Script Dependent | Configuration Dependent, 60 seconds and Above | Configuration Dependent, 60 seconds and Above | 1 Second and Less
      Synchronous Replication | No, Asynchronous and Semi-Synchronous | Yes | N/A - Shared Storage | Yes
      Geographic Redundancy Support | Yes | Yes, via MySQL Replication | Yes, via MySQL Replication | Yes, via MySQL Replication
      Scalability
      Number of Nodes | One Master, Multiple Slaves | One Active (primary), one Passive (secondary) Node | One Active (primary), one Passive (secondary) Node | 255
      Built-in Load Balancing | Reads, via MySQL Replication | Reads, via MySQL Replication | Reads, via MySQL Replication & During Failover | Yes, Reads and Writes
      Read-Intensive Workloads | Yes | Yes | Yes | Yes
      Write-Intensive Workloads | Yes, via Application-Level Sharding | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Auto-Sharding
      Scale On-Line (add nodes, repartition, etc.) | No | No | No | Yes
  33. MySQL Cluster: Real-time Carrier Grade Database
  34. Customers & Applications
      • Web
        – User profile management
        – Session stores
        – eCommerce
        – On-Line Gaming
        – Application Servers
      • Telecoms
        – Subscriber Databases (HLR/HSS)
        – Service Delivery Platforms
        – VoIP, IPTV & VoD
        – Mobile Content Delivery
        – On-Line app stores and portals
        – IP Management
        – Payment Gateways
      http://www.mysql.com/industry/telecom/
  35. MySQL Cluster - NDB Storage Engine
  36. MySQL Cluster Architecture
      Shared-nothing distributed database with no SPOF: High Read & Write Performance & 99.999% uptime
      • Clients: JDBC (Java), NDB API (C++), ClusterJ (Java), OpenJPA (Java), PHP/P*/ODBC, OpenLDAP
      • MySQL Cluster Application Nodes: SQL Nodes, ClusterJ, NDB API (C++)
      • Two MGM Nodes: MGM Client, MGM API (C)
      • MySQL Cluster Data Nodes, reached through the NDB API
  37. MySQL Cluster Nodes (diagram)
      • SQL Based Applications connect over JDBC/ODBC to the MySQL/SQL Node
      • SQL Node and API Nodes reach the Data Nodes through the NDB API
      • Management Client reaches the MySQL Cluster Management Node through the MGM API
  38. MySQL Cluster Nodes
      • SQL Node (MySQL)
        – Standard SQL Interface
        – Scale-out for Performance
        – Enables Replication
      • NDB API (Application)
        – High Performance
        – C, C++ & Java, LDAP, HTTP API
        – Developer’s Guide
      • Data Node (NDB Storage Engine)
        – Data Storage (Memory/Disk)
        – Automatic & User-Defined Partitioning
        – Local & Global Checkpoints
        – Scale-out or scale-up for Capacity & Redundancy
        – Scale dynamically with on-line add node
      • Management Node
        – Administration and Configuration
        – Arbitration
        – Use Two for Redundancy
  39. Replication Flexibility
      • Synchronous replication within a Cluster node group for HA
      • Bi-directional asynchronous replication to remote Cluster for geographic redundancy
      • Asynchronous replication to non-Cluster databases for specialised activities such as report generation
      • Mix and match replication types
      (Diagram: Cluster 1 and Cluster 2 with MyISAM, MyISAM and InnoDB slaves; legend: Synchronous replication, Asynchronous replication)
  40. MySQL Cluster Loads
      Component | MySQL Community Server | | MySQL Cluster (GPL) | | MySQL Cluster CGE
      MySQL Server | MySQL Server | ≠ | MySQL Server | = | MySQL Server
      InnoDB | InnoDB | ≠ | InnoDB | = | InnoDB
      Data Node | | ≠ | Data Node | = | Data Node
      Mgmt Node | | ≠ | Mgmt Node | = | Mgmt Node
      • MySQL Cluster software (Management & Data Nodes) included with MySQL Community Server should not be used
      • The MySQL Server included with MySQL Cluster loads is different from the regular MySQL Server
      • Always use this special version of MySQL Server when accessing MySQL Cluster data
      • MySQL Cluster CGE downloaded from oem.mysql.com
      • GA GPL Community versions downloaded from www.mysql.com/downloads
      • In-development GPL Community versions downloaded from dev.mysql.com/downloads/
  41. MySQL Cluster System Requirements
      System Component | Requirement
      Hosts | Maximum of 255 total nodes (48 Data Nodes)
      Hardware | COTS – Advanced TCA; 32 & 64-bit x86 & SPARC
      Memory | Varies on size of database, # of hosts, # of replicas
      Data Storage | Shared-Nothing - Memory & Disk; SCSI or RAID for I/O performance
      Network | >1 Gigabit recommended, SCI supported
      Operating System | Linux (Red Hat, SuSE), Solaris, HP-UX, Mac OSX, Windows, others…
  42. MySQL Cluster 6.2
  43. MySQL Cluster 6.3
      http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-changes-5-1-ndb-6-3.html
  44. MySQL Cluster 7.0 (GA April 2009)
      http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
  45. Scale out: multi core environments
  46. MySQL Cluster vs MySQL MEMORY: 30x Higher Throughput / 1/3rd the Latency on a single node
      • Table level locking inhibits MEMORY scalability beyond a single client connection
      • Check-pointing & logging enabled, MySQL Cluster still delivers durability
      • 4 socket server, 64GB RAM, running Linux
  47. Scale-Out Reads & Writes on Commodity Hardware
      • NDB API Performance: 4.33 M Queries per second!
      • 8 Intel servers, dual-6-core CPUs @2.93 GHz, 24GB RAM
      • 2 Data Nodes per server
      • flexAsync benchmark
        – 16 parallel threads, each issuing 256 simultaneous transactions
        – Read / Write 100KB attribute
      • Interim results from 2 days testing; watch this space: mikaelronstrom.blogspot.com
  48. MySQL Cluster CGE 7.1: Key Enhancements
      http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
  49. MySQL Cluster 7.1 Momentum
      • 1,000 Downloads per Day
      • Windows GA
      • Pro-active Cluster Monitoring
      • Fully Automated Management
      • 10x Higher Java Performance
      “MySQL Cluster 7.1 gave us the perfect combination of extreme levels of transaction throughput, low latency & carrier-grade availability, while reducing TCO” (Phani Naik, Pyro Group)
  50. MySQL Cluster 7.1: ndbinfo
      • New database (ndbinfo) which presents real-time metric data in the form of tables
      • Exposes new information together with providing a simpler, more consistent way to access existing data
      • Examples include:
        – Resource usage (memory, buffers)
        – Event counters (such as number of READ operations since last restart)
        – Data node status and connection status

      mysql> use ndbinfo
      mysql> show tables;
      +-------------------+
      | Tables_in_ndbinfo |
      +-------------------+
      | blocks            |
      | config_params     |
      | counters          |
      | logbuffers        |
      | logspaces         |
      | memoryusage       |
      | nodes             |
      | resources         |
      | transporters      |
      +-------------------+
  51. MySQL Cluster 7.1: ndbinfo
      • Example 1: Check memory usage/availability

      mysql> select * from ndbinfo.memoryusage;
      +---------+--------------+--------+------------+-----------+-------------+
      | node_id | memory_type  | used   | used_pages | total     | total_pages |
      +---------+--------------+--------+------------+-----------+-------------+
      |       3 | Data memory  | 917504 |         28 | 104857600 |        3200 |
      |       3 | Index memory | 221184 |         27 |  11010048 |        1344 |
      |       4 | Data memory  | 917504 |         28 | 104857600 |        3200 |
      |       4 | Index memory | 221184 |         27 |  11010048 |        1344 |
      +---------+--------------+--------+------------+-----------+-------------+

      • Note that there is a DATA_MEMORY and INDEX_MEMORY row for each data node in the cluster
      • If the Cluster is nearing the configured limit then increase the DataMemory and/or IndexMemory parameters in config.ini and then perform a rolling restart
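To judge whether a data node is "nearing the configured limit", the `used` and `total` columns from the memoryusage example can be turned into a percentage. The sketch below does that for the four sample rows from the slide; the helper names and the 80% warning threshold are assumptions for illustration, not values recommended by the deck.

```python
# Sample rows copied from the slide: (node_id, memory_type, used, total), in bytes.
SAMPLE_ROWS = [
    (3, "Data memory", 917504, 104857600),
    (3, "Index memory", 221184, 11010048),
    (4, "Data memory", 917504, 104857600),
    (4, "Index memory", 221184, 11010048),
]


def percent_used(used: int, total: int) -> float:
    """Return used/total as a percentage."""
    return 100.0 * used / total


def rows_over_threshold(rows, threshold_percent):
    """Return (node_id, memory_type, percent) for rows at or above the threshold."""
    flagged = []
    for node_id, memory_type, used, total in rows:
        pct = percent_used(used, total)
        if pct >= threshold_percent:
            flagged.append((node_id, memory_type, pct))
    return flagged


if __name__ == "__main__":
    for node_id, memory_type, used, total in SAMPLE_ROWS:
        print(f"node {node_id} {memory_type}: {percent_used(used, total):.3f}% used")
    # 80% is an arbitrary example threshold.
    print("over threshold:", rows_over_threshold(SAMPLE_ROWS, 80.0))
```

With the sample figures both memory types are far below any plausible threshold (under 1% of DataMemory and about 2% of IndexMemory in use).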
  52. 52. MySQL Cluster 7.1: ndbinfo
      • Example 2: Check how many table scans have been performed on each data node since the last restart

      mysql> select node_id as 'data node', val as 'Table Scans'
          -> from ndbinfo.counters where counter_name='TABLE_SCANS';
      +-----------+-------------+
      | data node | Table Scans |
      +-----------+-------------+
      |         3 |           3 |
      |         4 |           4 |
      +-----------+-------------+

      • You might check this if your database performance is lower than anticipated
      • If this figure is rising faster than you expected then examine your application to understand why there are so many table scans
  53. 53. Latest news on MySQL Cluster 7.1
      • As of MySQL Cluster 7.1.9a:
          • InnoDB plugin included
          • New view in ndbinfo:

      mysql> SELECT node_id,
          ->   page_requests_direct_return AS hit,
          ->   page_requests_wait_io AS miss,
          ->   100*page_requests_direct_return/(page_requests_direct_return+page_requests_wait_io) AS hit_rate
          -> FROM ndbinfo.diskpagebuffer;
      +---------+------+------+----------+
      | node_id | hit  | miss | hit_rate |
      +---------+------+------+----------+
      |       3 |    6 |    3 |  66.6667 |
      |       4 |   10 |    3 |  76.9231 |
      +---------+------+------+----------+

      • MEM 2.3 includes new Cluster Advisor/graphs
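The hit_rate column in the query above is plain arithmetic: hits as a percentage of all page requests. A small sketch that reproduces the two sample values (the function name is made up for this illustration):

```python
def hit_rate(hits, misses):
    """Disk page buffer hit rate in percent, or None if there were no requests."""
    total = hits + misses
    if total == 0:
        return None  # avoid dividing by zero on an idle node
    return 100.0 * hits / total


if __name__ == "__main__":
    # (node_id, hit, miss) taken from the sample output on the slide
    for node_id, hit, miss in [(3, 6, 3), (4, 10, 3)]:
        print(node_id, round(hit_rate(hit, miss), 4))
```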
  54. 54. MySQL Enterprise Monitor 2.3
  55. 55. Online Operations
      • Scale the cluster for throughput or capacity
          – Data and SQL Nodes
      • Repartition tables
      • Recover failed nodes
      • Upgrade / patch servers & OS
      • Upgrade / patch MySQL Cluster
      • Back-Up
      • Evolve the schema on-line, in real-time
  56. 56. Real-Time, On-Line Schema Changes
      • Fully online – transaction response times unchanged
      • Add and remove indexes, add new columns and tables
      • No temporary table creation
      • No recreation of data or deletion required
      • Faster and better performing table maintenance operations
      • Lower memory and disk requirements

      CREATE OFFLINE INDEX b ON t1(b);
      Query OK, 1356 rows affected (2.20 sec)
      DROP OFFLINE INDEX b ON t1;
      Query OK, 1356 rows affected (2.03 sec)
      CREATE ONLINE INDEX b ON t1(b);
      Query OK, 0 rows affected (0.58 sec)
      DROP ONLINE INDEX b ON t1;
      Query OK, 0 rows affected (0.46 sec)
      ALTER ONLINE TABLE t1 ADD COLUMN d INT;
      Query OK, 0 rows affected (0.36 sec)
  57. 57. Performance | Flexibility | Simplification
      • SQL and NoSQL Access Methods to tables
          – SQL: complex queries, rich ecosystem of apps & expertise
          – Simple Key/Value interfaces bypassing SQL layer for blazing fast reads & writes
          – Real-time interfaces for microsecond latency
          – Developers free to work in their preferred environment
  58. 58. Scaling Distributed Joins [7.2DM]
      Adaptive Query Localization
      • ‘Complex’ joins traditionally slower in MySQL Cluster
          – Complex = lots of levels and interim results in JOIN
      • JOIN was implemented in the MySQL Server:
          – Nested Loop join
          – When data is needed, it must be fetched over the network from the Data Nodes, row by row
          – This causes latency and consumes resources
      • Can now push the execution down into the data nodes, greatly reducing the network trips
      • 25x-40x performance gain in customer PoC!
      [Figure: mysqld and Data Nodes, without and with AQL]
      The existence, content and timing of future releases described here is included for information only and may be changed at Oracle's discretion.
      http://www.mysql.com/news-and-events/on-demand-webinars/display-od-583.html
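To make the "network trips" argument concrete, here is a deliberately simplified counting model. It assumes each outer row matches exactly one row per joined table and that the server does no batching; both are assumptions of this sketch, not statements about how mysqld or the data nodes actually behave, and it makes no attempt to reproduce the 25x-40x figure.

```python
def nested_loop_round_trips(outer_rows, tables_in_join):
    """Round trips when mysqld drives the join and fetches row by row.

    One request scans the outer table; every outer row then needs one
    lookup per additional table in the join.
    """
    if tables_in_join < 1:
        raise ValueError("a join needs at least one table")
    return 1 + outer_rows * (tables_in_join - 1)


def pushed_down_round_trips():
    """Round trips when the whole join is sent to the data nodes at once."""
    return 1


if __name__ == "__main__":
    for rows in (10, 1000):
        print(rows, "outer rows, 3 tables:",
              nested_loop_round_trips(rows, 3), "trips vs",
              pushed_down_round_trips())
```

In this toy model the gap grows with the number of outer rows and join levels, which is the case the slide calls "complex".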
  59. 59. Adaptive Query Localization: Current Limitations
      • Columns to be joined
          – must use exactly the same data type
          – cannot be any of the BLOB or TEXT types
          – must be part of a table index or primary key
      • AQL can be disabled using the ndb_join_pushdown server system variable
          – enabled by default
  60. 60. Early Adopter Speaks!
      “Testing of Adaptive Query Localization has yielded over 20x higher performance on complex queries within our application, enabling Docudesk to expand our use of MySQL Cluster into a broader range of highly dynamic web services.”
      Casey Brown, Manager, Development & DBA Services, Docudesk
  61. 61. MySQL Cluster: SQL & NoSQL Combined
      Mix & Match! Same data accessed simultaneously through SQL & NoSQL interfaces
      • NoSQL – Multiple ways to bypass SQL, and maximize performance:
          • NDB API. C++ for highest performance, lowest latency
          • Cluster/J for optimized access in Java
          • NEW! Memcached. Use all your existing clients/applications
  62. 62. Which to Choose?
  63. 63. Performance
  64. 64. NoSQL With NDB API
      Best possible performance
      • Application embeds the NDB API C++ interface library
      • NDB API makes intelligent decisions (where possible) about which data node to send queries to
          – With a little planning in the schema design, achieve linear scalability
      • Used by all of the other application nodes (MySQL, LDAP, ClusterJ,…)
      • Best possible performance but requires greater development skill
      • Favourite API for real-time network applications
      • Foundation for all interfaces
      [Figure: Clients -> Applications with embedded NDB API Library -> MySQL Cluster Data Nodes]
  65. 65. NoSQL with memcached [7.2DM]
      • Memcached is a distributed memory based hash-key/value store with no persistence to disk
      • NoSQL, simple API, popular with developers
      • MySQL Cluster already provides scalable, in-memory performance with NoSQL (hashed) access as well as persistence
      • Provide the Memcached API but map to NDB API calls
      • Writes-in-place, so no need to invalidate cache
      • Simplifies architecture as caching & database integrated into 1 tier
      • Access data from existing relational tables
      [Figure: clients reach the cluster through the Memcached protocol]
  66. 66. NoSQL with Memcached [7.2DM]
      Pre-GA version available from labs.mysql.com
      Flexible:
      • Deployment options
      • Multiple Clusters
      • Simultaneous SQL Access
      • Can still cache in Memcached server
      • Flat key-value store or map to multiple tables/columns
      Simple:
      set maidenhead 0 0 3
      SL6
      STORED

      get maidenhead
      VALUE maidenhead 0 3
      SL6
      END
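The "Simple:" exchange above is the standard memcached text protocol. The sketch below only builds and parses the bytes for that exchange (set with flags 0, expiry 0 and a 3-byte value, then get); it does not open a connection, and the helper names are invented for this illustration.

```python
def build_set(key, value, flags=0, exptime=0):
    """Bytes for a memcached text-protocol 'set' of a string value."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def build_get(key):
    """Bytes for a memcached text-protocol 'get' of a single key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw):
    """Return the value bytes from a single-key 'get' reply, or None on a miss."""
    if raw == b"END\r\n":
        return None
    header, rest = raw.split(b"\r\n", 1)
    # Header layout: VALUE <key> <flags> <bytes>
    _, _key, _flags, nbytes = header.split(b" ")
    return rest[: int(nbytes)]


if __name__ == "__main__":
    print(build_set("maidenhead", "SL6"))
    print(build_get("maidenhead"))
    print(parse_get_response(b"VALUE maidenhead 0 3\r\nSL6\r\nEND\r\n"))
```

Because the wire format is unchanged, an existing memcached client should be able to send the same bytes whether the server is a plain cache or is backed by MySQL Cluster, which is the point the slide makes.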
  67. 67. MySQL Cluster Manager 1.1 Features
      Delivered as part of MySQL Cluster CGE 7.1
  68. 68. How Does MySQL Cluster Manager Help?
      Example: Initiating upgrade from MySQL Cluster 6.3 to 7.1
      Before MySQL Cluster Manager:
      • 1 x preliminary check of cluster state
      • 8 x ssh commands per server
      • 8 x per-process stop commands
      • 4 x scp of configuration files (2 x mgmd & 2 x mysqld)
      • 8 x per-process start commands
      • 8 x checks for started and re-joined processes
      • 8 x process completion verifications
      • 1 x verify completion of the whole cluster
      • Excludes manual editing of each configuration file
      Total: 46 commands - 2.5 hours of attended operation
      With MySQL Cluster Manager:
      upgrade cluster --package=7.1 mycluster;
      Total: 1 Command - Unattended Operation
      • Results
          • Reduces the overhead and complexity of managing database clusters
          • Reduces the risk of downtime resulting from administrator error
          • Automates best practices in database cluster management
  69. 69. Terms used by MySQL Cluster Manager
      • Site: the set of physical hosts which are to run Cluster processes to be managed by MySQL Cluster Manager. A site can include 1 or more clusters.
      • Cluster: represents a MySQL Cluster deployment. A Cluster contains 1 or more processes running on 1 or more hosts
      • Host: Physical machine, running the MySQL Cluster Manager agent
      • Agent: The MySQL Cluster Manager process running on each host
      • Process: an individual MySQL Cluster node; one of: ndb_mgmd, ndbd, ndbmtd, mysqld & ndbapi*
      • Package: A copy of a MySQL Cluster installation directory as downloaded from mysql.com, stored on each host
      *ndbapi is a special case, representing a slot for an external application process to connect to the cluster using the NDB API
      [Figure: a Site containing four Hosts, each running an agent and several Processes; a Cluster spans processes on those hosts]
  70. 70. Example configuration
      • MySQL Cluster Manager agent runs on each physical host
      • No central process for Cluster Manager – agents co-operate, each one responsible for its local nodes
      • Agents are responsible for managing all nodes in the cluster
      • Management responsibilities
          • Starting, stopping & restarting nodes
          • Configuration changes
          • Upgrades
          • Host & Node status reporting
          • Recovering failed nodes
      [Figure: a mysql client connects to four hosts, each running an agent]
          192.168.0.10: 1. ndb_mgmd, 7. mysqld, agent
          192.168.0.11: 2. ndb_mgmd, 8. mysqld, agent
          192.168.0.12: 3. ndbd, 5. ndbd, agent
          192.168.0.13: 4. ndbd, 6. ndbd, agent
      Legend: n. mysqld = MySQL Server (ID=n); n. ndb_mgmd = Management Node (ID=n); n. ndbd = Data Node (ID=n); agent = MySQL Cluster Manager agent
  71. 71. Creating & Starting a Cluster
      1. Define the site:
      mysql> create site --hosts=192.168.0.10,192.168.0.11,
          -> 192.168.0.12,192.168.0.13 mysite;
      2. Expand the MySQL Cluster tar-ball(s) from mysql.com to a known directory
      3. Define the package(s):
      mysql> add package --basedir=/usr/local/mysql_6_3_26 6.3;
      mysql> add package --basedir=/usr/local/mysql_7_0_7 7.0;
      Note that the basedir should match the directory used in Step 2.
      4. Create the Cluster:
      mysql> create cluster --package=6.3
          -> --processhosts=ndb_mgmd@192.168.0.10,ndb_mgmd@192.168.0.11,
          -> ndbd@192.168.0.12,ndbd@192.168.0.13,ndbd@192.168.0.12,
          -> ndbd@192.168.0.13,mysqld@192.168.0.10,mysqld@192.168.0.11
          -> mycluster;
      This is where you define what nodes/processes make up the Cluster and where they should run
      5. Start the Cluster:
      mysql> start cluster mycluster;
      [Figure: same four-host layout as the example configuration: ndb_mgmd 1-2 and mysqld 7-8 on 192.168.0.10 and 192.168.0.11, ndbd 3-6 on 192.168.0.12 and 192.168.0.13, an agent on each host]
  72. 72. Upgrade Cluster
      • Upgrade from MySQL Cluster 6.3.26 to 7.0.7:
      mysql> upgrade cluster --package=7.0 mycluster;
      • Automatically upgrades each node and restarts the process – in the correct order to avoid any loss of service
      • Without MySQL Cluster Manager, the administrator must stop each process in turn, start the process with the new version and wait for the node to restart before moving on to the next one
      [Figure: same four-host layout as the example configuration]
  73. 73. MySQL Cluster Manager
      GA 1st November 2010
      • On-line add-node
      mysql> add hosts --hosts=192.168.0.35,192.168.0.36 mysite;
      mysql> add package --basedir=/usr/local/mysql_7_0_7
          -> --hosts=192.168.0.35,192.168.0.36 7.0;
      mysql> add process
          -> --processhosts=mysqld@192.168.0.33,mysqld@192.168.0.34,ndbd@192.168.0.35,ndbd@192.168.0.36
          -> mycluster;
      mysql> start process --added mycluster;
      • Restart optimizations
          • Fewer nodes restarted on some parameter changes
      [Figure: before and after on-line add-node. Before: data nodes on hosts 31 and 32, mysqld and management node on hosts 33 and 34. After: an extra mysqld on hosts 33 and 34 and new data nodes on hosts 35 and 36]
  74. 74. General Design Considerations
      • MySQL Cluster is designed for
          – Short transactions
          – Many parallel transactions
      • Use simple access patterns to fetch data
          – Use efficient scans and batching interfaces
      • Analyze what your most typical use cases are
          – optimize for those
      Overall design goal: Minimize network roundtrips for your most important requests!
  75. 75. Best Practice: Primary Keys
      • To avoid problems with
          • Cluster-to-Cluster replication
          • Recovery
          • Application behavior (KEY NOT FOUND, etc.)
      • ALWAYS DEFINE A PRIMARY KEY ON THE TABLE!
      • A hidden PRIMARY KEY is added if no PK is specified. BUT..
          • .. NOT recommended
          • The hidden primary key is, for example, not replicated (between Clusters)!!
          • There are problems in this area, so avoid the problems!
      • So always, at least, have
      id BIGINT AUTO_INCREMENT PRIMARY KEY
          • Even if you don't “need” it for your applications
  76. 76. Best Practice: Distribution Aware Apps
      • Partition selected using hash on Partition Key
          • Primary Key by default
          • User can override in table definition
      • MySQL Server (or NDB API) will attempt to send transaction to the correct data node
          • If all data for the transaction are in the same partition, less messaging -> faster
      • Aim to have all rows for high-running queries in same partition

      SELECT SUM(population) FROM towns WHERE country="UK";
      SELECT SUM(population) FROM towns WHERE town="Boston";

      [Figure: the towns table is shown once per query, with its Partition Key and Primary Key columns marked]
      | town       | country | population |
      | Maidenhead | UK      |      78000 |
      | Paris      | France  |    2193031 |
      | Boston     | UK      |      58124 |
      | Boston     | USA     |     617594 |
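A rough sketch of why the choice of partition key matters for the two queries above. The MD5-modulo function is only a stand-in chosen for this illustration and is not claimed to be the hash MySQL Cluster uses; the four-partition count and the helper names are likewise assumptions.

```python
import hashlib

# Rows from the towns table on the slide: (town, country, population)
TOWNS = [
    ("Maidenhead", "UK", 78000),
    ("Paris", "France", 2193031),
    ("Boston", "UK", 58124),
    ("Boston", "USA", 617594),
]


def partition_for(key_values, num_partitions):
    """Map a partition-key value (list of strings) to a partition number."""
    digest = hashlib.md5("|".join(key_values).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partitions_touched(rows, key_columns, predicate, num_partitions=4):
    """Set of partitions holding the rows that satisfy the predicate."""
    return {
        partition_for([str(row[c]) for c in key_columns], num_partitions)
        for row in rows
        if predicate(row)
    }


if __name__ == "__main__":
    boston = lambda row: row[0] == "Boston"
    # Partition key = town only: both Boston rows hash identically.
    print("key (town):", partitions_touched(TOWNS, [0], boston))
    # Partition key = (town, country): the Boston rows may land apart.
    print("key (town, country):", partitions_touched(TOWNS, [0, 1], boston))
```

When the partition key is the column used in the WHERE clause, all matching rows are guaranteed to share a partition, so the query can be served by one data node.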
  77. 77. Best Practice: Distribution Aware – Multiple Tables
      • Extend partition awareness over multiple tables
      • Same rule – aim to have all data for an instance of a high-running transaction in the same partition

      ALTER TABLE service_ids PARTITION BY KEY(sub_id);

      [Figure: two tables with their Partition Key and Primary Key columns marked]
      | sub_id | age | gender |
      |  19724 |  25 | male   |
      |  84539 |  43 | female |
      |  19724 |  16 | female |
      |  74574 |  21 | female |

      | service  | sub_id | svc_id   |
      | twitter  |  19724 | 76325732 |
      | twitter  |  84539 | 67324782 |
      | facebook |  19724 | 83753984 |
      | facebook |  73642 | 87324793 |
  78. 78. MySQL Cluster Internals
  79. 79. Automatic Data Partitioning
      4 Partitions * 2 Replicas = 8 Fragments
      [Figure: table T1 is split into partitions P1-P4; their fragments are spread over four data nodes]
          Node Group 1: Data Node 1 (F1, F3), Data Node 2 (F3, F1)
          Node Group 2: Data Node 3 (F2, F4), Data Node 4 (F4, F2)
      Legend: Px = Partition; Fx = Primary Fragment or Secondary Fragment (fragment replica)
      - Node groups are created automatically
      - # of groups = # of data nodes / # of replicas
  80. 80. Automatic Data Partitioning
      4 Partitions * 2 Replicas = 8 Fragments
      [Figure: table T1 with partitions P1-P4 and four empty data nodes]
      • A fragment is a copy of a partition (aka fragment replica)
      • Number of fragments = # of partitions * # of replicas
  91. 91. Automatic Data Partitioning
      4 Partitions * 2 Replicas = 8 Fragments
      [Figure: same layout. Node Group 1: Data Node 1 (F1, F3), Data Node 2 (F3, F1). Node Group 2: Data Node 3 (F2, F4), Data Node 4 (F4, F2)]
      As long as one data node in each node group is running we have a complete copy of the data
  94. 94. Automatic Data Partitioning
      4 Partitions * 2 Replicas = 8 Fragments
      [Figure: same layout. Node Group 1: Data Node 1 (F1, F3), Data Node 2 (F3, F1). Node Group 2: Data Node 3 (F2, F4), Data Node 4 (F4, F2)]
      When no data node in a node group is running:
      - No complete copy of the data
      - Cluster shuts down automatically
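The availability rule from these slides (a complete copy of the data exists as long as at least one data node per node group is running) can be written as a short check. This is only a sketch of the rule as stated; grouping nodes by their order in a list mirrors the "order in configuration file" remark later in the deck, and the function names are invented here.

```python
def node_groups(data_nodes, replicas):
    """Split data nodes into node groups of `replicas` nodes each, in list order."""
    if replicas < 1 or len(data_nodes) % replicas != 0:
        raise ValueError("number of data nodes must be a multiple of replicas")
    return [data_nodes[i:i + replicas] for i in range(0, len(data_nodes), replicas)]


def has_complete_copy(groups, running):
    """True if every node group still has at least one running data node."""
    return all(any(node in running for node in group) for group in groups)


if __name__ == "__main__":
    groups = node_groups([1, 2, 3, 4], replicas=2)
    print(groups)
    print(has_complete_copy(groups, running={1, 3}))  # one survivor per group
    print(has_complete_copy(groups, running={1, 2}))  # node group 2 is gone
```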
  95. 95. Data Partitioning
      • Automatic distribution/partitioning
          – Primary Key hash value (partitioning by Key)
      • Transparent load balancing
          – Distribution awareness
              • Data Node chosen based on PK hash value
          – Or proximity (SQL Node - shared memory, localhost, remote host)
      • Support for user defined partitioning
      • Key Concepts
          – Partition
              • Horizontal
              • # of partitions = # of data nodes
          – Fragment
              • Copy of a partition
          – Replica
              • Complete copy of the data
          – Node Group
              • Groups data nodes (automatically)
              • Determined by the order in configuration file
              • # of groups = # of data nodes / # of replicas
      [Figure: 4 Partitions * 2 Replicas = 8 Fragments. Table T1, partitions P1-P4. Node Group 1: Data Node 1 (F1, F3), Data Node 2 (F3, F1). Node Group 2: Data Node 3 (F2, F4), Data Node 4 (F4, F2)]
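The counting rules listed under Key Concepts can be checked against the 4-node, 2-replica example in the figure. A small sketch using only the formulas stated on the slide (the function and key names are illustrative):

```python
def layout_counts(data_nodes, replicas):
    """Apply the slide's formulas to a cluster size and replica count."""
    if replicas < 1 or data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a positive multiple of replicas")
    partitions = data_nodes               # of partitions = # of data nodes
    return {
        "partitions": partitions,
        "fragments": partitions * replicas,   # partitions * replicas
        "node_groups": data_nodes // replicas,  # data nodes / replicas
    }


if __name__ == "__main__":
    print(layout_counts(data_nodes=4, replicas=2))
```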
  96. 96. Internal Replication
      • Replication between Data Nodes
      • Synchronous Replication
          – To ensure minimal failover time
          – Data Nodes have the same information at the same point in time
          – Achieved by Two-phase commit protocol
      • Two-phase commit
          – 1. Prepare/update phase
              • All fragments (primary/secondary) get updated
          – 2. Commit phase
              • The changes are committed
          – Every Data Node has a Transaction Coordinator (TC)
          – One is elected to be the transaction coordinator
          – The information goes from the Transaction Coordinator to primary fragments and further to secondary fragments
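As a toy illustration of the two phases described above, the sketch below prepares a write on the primary fragment and then on the secondary, and commits only if every fragment prepared successfully. It is a generic two-phase commit outline with invented class and function names; it does not model the actual message ordering, locking, or failure handling inside the data nodes.

```python
class Fragment:
    """In-memory stand-in for one fragment replica."""

    def __init__(self, name, accept=True):
        self.name = name
        self.accept = accept   # set False to simulate a failed prepare
        self.pending = {}      # prepared but not yet committed rows
        self.committed = {}    # committed rows

    def prepare(self, key, value):
        if not self.accept:
            return False
        self.pending[key] = value
        return True

    def commit(self, key):
        self.committed[key] = self.pending.pop(key)

    def abort(self, key):
        self.pending.pop(key, None)


def two_phase_write(primary, secondaries, key, value):
    """Phase 1: prepare primary, then secondaries. Phase 2: commit everywhere."""
    chain = [primary] + list(secondaries)
    prepared = []
    for fragment in chain:
        if not fragment.prepare(key, value):
            for done in prepared:      # roll back what was already prepared
                done.abort(key)
            return False
        prepared.append(fragment)
    for fragment in chain:
        fragment.commit(key)
    return True


if __name__ == "__main__":
    f1_primary, f1_secondary = Fragment("F1 primary"), Fragment("F1 secondary")
    print(two_phase_write(f1_primary, [f1_secondary], "pk1", "row data"))
    print(f1_primary.committed == f1_secondary.committed)
```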
  97. 97. Internal Replication: Prepare Phase
      insert into T1 values (...)
      1. Calc hash on PK
      2. Forward request to LQH where primary fragment is
      3. Prepare secondary fragment
      4. Prepare phase done
      [Figure: two Data Nodes, each with a Transaction Coordinator, a Local Query Handler (ACC, TUP), Index Memory and Data Memory; one node holds fragments F1, F2 and the other F2, F1; arrows are numbered with the steps above]
