Successfully reported this slideshow.
Your SlideShare is downloading. ×

ClustrixDB: how distributed databases scale out

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 28 Ad

ClustrixDB: how distributed databases scale out

ClustrixDB, now part of MariaDB, is a fully distributed and transactional RDBMS for applications with the highest scalability requirements. In this session Robbie Mihalyi, VP of Engineering for ClustrixDB, provides an introduction to ClustrixDB, followed by an in-depth technical overview of its architecture, with a focus on distributed storage, transactions and query processing – and its unique approach to index partitioning.

ClustrixDB, now part of MariaDB, is a fully distributed and transactional RDBMS for applications with the highest scalability requirements. In this session Robbie Mihalyi, VP of Engineering for ClustrixDB, provides an introduction to ClustrixDB, followed by an in-depth technical overview of its architecture, with a focus on distributed storage, transactions and query processing – and its unique approach to index partitioning.

Advertisement
Advertisement

More Related Content

Slideshows for you (20)

Similar to ClustrixDB: how distributed databases scale out (20)

Advertisement

More from MariaDB Corporation (14)

Recently uploaded (20)

Advertisement

ClustrixDB: how distributed databases scale out

  1. 1. MARIADB Clustrix Overview Robbie Mihalyi Matthew White Peter Friedenbach VP Engineering – Clustrix Director of Engineering Performance Architect
  2. 2. Scaling
  3. 3. Application Scaling  Scales Capacity  Provides Fault Tolerance  Dynamic and Flexible DBMS Server Load Balancer
  4. 4. Clustrix Scaling  Scales Reads & Writes  Provides Fault Tolerance  Dynamic and Flexible Clustrix Load Balancer MaxScale
  5. 5. SCALE-OUT RDBMS: CLUSTRIX ● Built from the ground-up - Shared nothing architecture ● All nodes are equal – any node can accept connections and can Read / Write ● Fine grained data distribution with data protection – built-in fault tolerance ● Distributed parallel query execution
  6. 6. SCALE-OUT RDBMS: CLUSTRIX ● To the application it looks like one Database System ● DDL and DML SQL statements are MySQL/MariaDB compatible ○ MySQL/MariaDB data types (including JSON support) ○ Referential integrity ○ Transactions – ACID compliant ○ Triggers and Stored Procedures ○ Complex Joins
  7. 7. SCALE-OUT RDBMS: CLUSTRIX ● Administration Features and Capabilities ○ On-line Schema changes ○ Replication ○ On-line Backup / Restore ● Multi-zone scale and availability
  8. 8. MARIADB Clustrix Technology Matthew White Director of Engineering - Clustrix MariaDB Corporation
  9. 9. Clustrix Technology DATA DISTRIBUTION
  10. 10. Terminology - REPRESENTATION 26 BASE REPRESENTATION Primary Key ID col1 col2 col3 1 16 36 JANUARY 2 17 35 FEBRUARY 3 18 34 MARCH 4 19 33 APRIL 5 20 32 MAY K1 REP Index (col2) col2 ID 32 5 33 4 34 3 35 2 36 1 K2 REP Index col(3, 1) col3 col1 ID APRIL 19 4 FEBRUARY 17 2 JANUARY 16 1 MARCH 18 3 MAY 20 5
  11. 11. Terminology - SLICES Slicing ● Each representation is split into slices ● The hash of keys mapped to slices ● Distribute representations independently Independent Key Distribution ● Adapts to diverse access patterns ● Allows for broader range query evaluation ● Query plans scale with node count
  12. 12. Terminology - REPLICAS Fault Tolerance ● Each slice has one or more copies ○ One node will never contain two replicas for the same slice ● Replicas created on-line without blocking writes ● Default replicas is 2 ● Can have replicas = number of nodes ○ Great for read-heavy tables ● Support for Availability Zones
  13. 13. ClustrixDB S1 S2 S2 S3 S3 S4 S4 S5 S5 Dynamic Data Distribution ● Tables auto-split into slices ● Every slice has a replica on another node ○ Slices are auto distributed, auto-protected S1 BillionsofRows Database Tables
  14. 14. ClustrixDB S1 S2 S3 S3 S4 S4 S5 Adding Nodes – Flex Up ● Easy and simple Flex Up & Flex Down ○ Single minimal interruption of service ● Data is automatically rebalanced across the cluster ○ Tables are online for reads & writes ● All servers handle writes + reads ○ Workload is spread across more servers after Flex Up ClustrixDB Overview 30 S1 ClustrixDB S2 S5
  15. 15. ClustrixDB S2 S5 S1 S2 S3 S3 S4 S4 S5 Loss of a Node – Automatic Fault Tolerance ● ClustrixDB detects the loss of a node ○ System automatically re-protects ○ Data is automatically redistributed ● Slices lost on the failed node are rapidly re-protected ○ Re-protection while tables are available for reads & writes ● Automated self-healing ○ After re-protect the cluster is fully protected and operational ClustrixDB Overview 31 S1 ClustrixDB S2 S5 S2 S5
  16. 16. Complexity Simplified Q: How do you ensure data stays well distributed in a clustered environment? • Initial Data Distributes the data into even slices across nodes • Failed Nodes Re-protects slices to ensure proper replicas exist • Data Growth Splits large slices into smaller slices • Flex-Up/Flex-Down Moves slices to leverage new nodes and/or evacuate nodes • Skewed Data Re-distributes the data to even out across nodes • Hot Slice Balancing Finds hot slices and balances then across nodes • Availability Zone Support Rebalancer will distribute replicas across zones …while the DB stays open for reads & writesPatent 8,543,538 Patent 8,554,726 A: You let the REBALANCER handle it!
  17. 17. Clustrix Technology QUERY PROCESSING
  18. 18. Query Processing Model SQL-based Applications HW or SW Load Balancer ● Load balancer spreads DB connections to all nodes ● A session is established on any node ● Session controls query execution  Parse  Plan  Compile UPDATE users SET online = 1 WHERE id = 8797; Session VM VM ID: 8797 |…| ONLINE:0 Fragment VM
  19. 19. Query Processing Model SQL-based Applications HW or SW Load Balancer ● Load balancer spreads DB connections to all nodes ● A session is established on any node ● Session controls query execution ○ Parse SQL ○ Generate the execution plan ○ Compile into fragments ○ Start the execution of initial fragment ○ Send fragments ○ Coordinate transaction completion GTM UPDATE users SET online = 1 WHERE id = 8797; Session VM VM ID: 8797 |…| ONLINE:1 ID: 8797 |…| ONLINE:1 Commit Commit Commit VM
  20. 20. Parallel Query Processing SQL-based Applications HW or SW Load Balancer VM VM SELECT SUM(amount) FROM donations;  Parse  Plan  Compile Session VM ● Load balancer spreads DB connections to all nodes ● A session is established on any node ● Session controls query execution ○ Parse SQL ○ Generate the execution plan ○ Compile into fragments ○ Lookup record(s) location ○ Send fragments ○ Coordinate transaction completion ● Aggregate in parallel on all nodes FragmentFragment
  21. 21. Parallel Query Processing SQL-based Applications VM VM Session VM ● Load balancer spreads DB connections to all nodes ● A session is established on any node ● Session controls query execution ○ Parse SQL ○ Generate the execution plan ○ Compile into fragments ○ Lookup record(s) location ○ Send fragments ○ Coordinate transaction completion ● Aggregate in parallel on all nodes AGGREGATE LOCALLY AGGREGATE LOCALLY AGGREGATE LOCALLY AGGREGATE RESULTS SELECT SUM(amount) FROM donations; HW or SW Load Balancer
  22. 22. Clustrix Technology ADMINISTRATION
  23. 23. Table Online Schema Change ● Allows reads & writes during ALTER TABLE operations Table QueueQueueQueue MYTABLE __building_MYTABLE Atomic Flip Reads & Writes ALTER TABLE mytable ADD (foo int);  Queue is created to track changes  Table is copied with new column  Queues are applied to new copy until empty  Atomic flip to new table
  24. 24. Table Online Schema Change ● Allows reads & writes during ALTER TABLE operations Table MYTABLE__retiring_MYTABLE Atomic Flip Reads & Writes ALTER TABLE mytable ADD (foo int);   Queue is created to track changes  Table is copied with new column  Queues are applied to new copy until empty  Atomic flip to new table  Original table is deleted
  25. 25. Backup, Replication & Disaster Recovery Asynchronous multi-point Replication ClustrixDB Parallel Backup Replicate to any cloud, any datacenter, anywhere
  26. 26. ClustrixGUI: Performance Monitoring
  27. 27. MARIADB Clustrix Live Demo Peter Friedenbach Performance Architect MariaDB Corporation
  28. 28. THANK YOU!

Editor's Notes

  • Title Slide for OpenWorks
  • Note that DB in MariaDB is not bolded
  • Note that DB in MariaDB is not bolded
  • Title Slide for OpenWorks
  • Note that DB in MariaDB is not bolded
  • Simple queries
    Fielded by any node
    Routed to data node

    Complex queries
    Split into query fragments
    Process fragments in parallel
  • Simple queries
    Fielded by any node
    Routed to data node

    Complex queries
    Split into query fragments
    Process fragments in parallel
  • Simple queries
    Fielded by any node
    Routed to data node

    Complex queries
    Split into query fragments
    Process fragments in parallel
  • Note that DB in MariaDB is not bolded
  • Note that DB in MariaDB is not bolded
  • Clustrix support MySQL replication both as master and slave – so you can replicate both ways.
    Within a cluster we saw earlier that all data has multiple copies

    For Disaster Recovery (when a whole region loses power) Clustrix has 2 options
    Fast Parallel Backup – This is in addition to slower MySqlDump backup
    Fast Parallel Replication – This is asynchronous across two Clustrix Clusters
  • Title Slide for OpenWorks
  • OpenWorks End Slide

×