Ndb cluster 80_ycsb_mem

NDB Cluster, the World's fastest Key-Value Store

Published in: Software

  1. MySQL NDB Cluster 8.0, YCSB In-Memory Benchmark
     Mikael Ronström, MySQL Cluster Development
     Copyright © 2020 Oracle and/or its affiliates.
  2. YCSB Benchmark
     • Yahoo! Cloud Serving Benchmark
     • De facto cloud benchmark
     • The benchmark cannot be changed
     • NDB is the #1 player in this realm
     • NDB Cluster is the fastest distributed, in-memory, transactional database in the world!
  3. Setting
     • NDB Cluster 7.6.10 on the YCSB benchmark
     • 50/50 read/write runs, Workload A, 10 fields, 1 kB rows, uniform distribution
     • 12 BM.DenseIO and 30 BM.Standard instances
         • "old" X5 1.36 with 36 cores, Hyper-Threading and 512 GB RAM
         • DenseIO instances have NVMe drives
         • Oracle Linux 7 instances spread evenly across 3 Availability Domains (AD), 4 + 10 per domain
  4. Benchmark setup
     • 1 Data Node per DenseIO instance (52 cores, 8 NVMe drives)
         • 2 and 4 Data Node clusters in the same Availability Domain
         • 8 Data Node clusters split across 2 Availability Domains
         • Node Groups split across Availability Domains
     • 2 MySQL Servers per BM36.Standard instance
         • One MySQL Server per CPU socket/NUMA node (36 CPUs / socket)
         • 1 YCSB client per MySQL Server, co-located and locked onto the same socket/NUMA node
  5. Benchmark runs
     • YCSB with 1 kB rows, 10 fields of 100 bytes each
     • Most runs with 10M and 100M rows due to time constraints
         • Using varying numbers of threads and clients
         • Testing throughput versus latency
     • Few runs with 300M and 600M rows
     • Different setups with
         • Varying number of YCSB/MySQL Server pairs
         • Data nodes using 8, 16 and 32 data manager (LDM) threads
         • 8 and 16 LDM running on 1 NUMA node, NUMA off (memory allocated on local NUMA node)
         • 32 LDM on both sockets, NUMA memory interleaved in these cases
  6. YCSB and MySQL Cluster 7.6 set-up
     • MySQL Server on BM.Standard
         • 2 server instances per host
     • Data Nodes on DenseIO
         • full duplication of data, 2 replicas
         • strongly consistent across both replicas
         • ACID (read committed)
     • YCSB JDBC driver, standard SQL used
         • competitors use ClusterJ-ish NoSQL API
         • unmodified downloaded binaries
         • version 0.15.0, co-located with MySQL Server
         • 1 kB rows, 10 columns (default config), uniform distribution
     [Diagram: BM36.Standard instances, each with a YCSB JDBC client on NUMA0 and on NUMA1; BM.DenseIO instances, 1 data node / instance]
  7. Scaling and Elasticity: Scaling number of data nodes
     • YCSB 0.15.0 with JDBC / SQL
         • 1 kB records
         • Uniform distribution
     • 2, 4 and 8 data nodes
         • replication factor 2
         • strong consistency
         • ACID (read committed)
     • 8 DenseIO across 2 ADs
         • adding 400 µs network latency
     • Best throughput and latency on the market
     [Chart: transactions per second by number of data nodes: 2 nodes 1.4M, 4 nodes 2.8M, 8 nodes (2 ADs) 3.7M]
  8. Real-time low latency SQL
     • YCSB, 4 data nodes with 300M and 600M rows using JDBC
     • 99% of SQL reads < 1 ms
         • 95% < 0.9 ms
     • 99% of SQL writes < 2 ms
         • 95% < 1.7 ms
     [Chart: throughput (transactions per second) and read/write latency for 300M and 600M rows; 1.25M TPS in both cases, same throughput and latency]
  9. Scaling load
     4 Data Nodes; optimize NDB for throughput or latency by adapting the load generators
     Configuration (threads per client)   64 threads x 20 clients   256 threads x 20 clients
     Rows                                 600M                      600M
     95th %tile Read Latency              0.8 ms                    1.8 ms
     99th %tile Read Latency              0.9 ms                    2.4 ms
     95th %tile Update Latency            1.8 ms                    3.2 ms
     99th %tile Update Latency            1.9 ms                    3.9 ms
     Throughput (ops/s)                   1.3M                      2.9M
     [Chart: latency vs throughput (transactions per second)]
  10. Scaling number of rows
      4 Data Nodes; the number of rows in the cluster has no performance impact!
      Configuration (threads per client)   128 threads x 10 clients   128 threads x 10 clients
      Rows                                 300M                       600M
      95th %tile Read Latency              0.9 ms                     0.9 ms
      99th %tile Read Latency              1 ms                       1 ms
      95th %tile Update Latency            1.7 ms                     1.7 ms
      99th %tile Update Latency            2 ms                       2 ms
      Throughput (ops/s)                   1.26M                      1.25M
      [Chart: transactions per second and latency for both configurations; same throughput and latency]
  11. Number of LDM: 16 LDM versus 32 LDM
      • A higher number of LDM threads improves latency and scalability, but it can lower throughput when too many clients are used
      Load configuration            Low / fewer clients Low / many clients  High
      Clients x threads             10 x 128            20 x 64             20 x 192
      LDMs                          16        32        16        32        16        32
      Avg Read Latency (ms)         0.6       0.6       0.7       0.6       0.9       0.9
      95th %tile Read Latency (ms)  0.9       0.9       1.3       0.8       1.6       1.3
      99th %tile Read Latency (ms)  1.1       1.1       1.8       0.9       3.5       1.5
      Avg Upd Latency (ms)          1.4       1.4       2.1       1.4       3.8       2.2
      95th %tile Upd Latency (ms)   1.7       1.7       3.4       1.8       6.3       2.7
      99th %tile Upd Latency (ms)   2         2         4.9       1.9       71 (!)    3.1
      Throughput (ops/s)            1.26M     1.25M     1.78M     1.3M      1.64M     2.5M
  12. Impact of local and remote NUMA memory access
      • Data Node threads on NUMA node 1
      • Memory was allocated
          • on local node 1
          • on remote node 0
          • interleaved on both nodes
      • 20 clients x 128 threads
      • 100M rows
      • 120G DataMemory
      • Run interleaved! Optimum cost/performance ratio.
          • 10% loss @ 100% remote memory access
          • acceptable loss for interleaved memory access (50% / 50% local / remote memory access)
          • optimal performance @ 100% local access
      Memory node                   other     same      interleaved
      Avg Read Latency (ms)         0.78      0.71      0.76
      95th %tile Read Latency (ms)  1.3       1         1.2
      99th %tile Read Latency (ms)  1.9       1.3       1.6
      Avg Upd Latency (ms)          2.1       1.9       1.9
      95th %tile Upd Latency (ms)   3.4       2.5       2.9
      99th %tile Upd Latency (ms)   5.6       3.1       4.2
      Throughput (ops/s)            1.79M     1.99M     1.94M
  13. Outlook
      • During benchmark runs and experiments, NDB's performance proved robust against many "misconfigurations" in the OS (interrupts, network, NUMA, etc.)
      • VARCHAR columns are 4x "faster" than BLOBs in this YCSB variant
  14. YCSB Benchmark
      • YCSB: Yahoo Cloud Serving Benchmark
      • Developed at Yahoo for cloud-scale workloads
      • Widely used to compare scale-out databases, NoSQL databases, and (non-durable) in-memory data grids
      • A series of NoSQL workload types are defined:
          • Workload A: 50% reads, 50% updates
      • The YCSB client cannot be changed
      • DB vendors implement the DB client interface in Java
      • The version and exact configuration matter
      Product comparison (product names are not preserved in this transcript):
      Nodes   TPS/OPS
      32      227k
      2       275k
      3       715k
      6       1.6M
      8       1.6M
      4       2.8M
  15. Conclusions
      • Number of rows has no impact on performance
          • Performance instead depends on
              • Number of client threads
              • Number of Data Nodes
              • Number of LDM threads per Data Node
          • System can be optimised easily for latency versus throughput
      • Cluster scales well with number of data nodes
          • splitting node groups across ADs impacts performance
      • NDB is the fastest transactional distributed in-memory database in the world
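
The claim that the cluster scales well with the number of data nodes can be checked against the slide 7 figures. The sketch below is not from the deck: the helper name `scaling_efficiency` and the choice of the 2-node run as the baseline are assumptions made for illustration. It compares each measured throughput with what perfectly linear scaling from the baseline would give.

```python
# Throughput figures read off the slide 7 chart (transactions per second).
throughput_by_nodes = {2: 1.4e6, 4: 2.8e6, 8: 3.7e6}


def scaling_efficiency(results, baseline_nodes=2):
    """Return measured throughput as a fraction of linear scaling.

    Linear scaling is extrapolated from the per-node throughput of the
    baseline run (hypothetical helper, not part of YCSB or NDB).
    """
    per_node_baseline = results[baseline_nodes] / baseline_nodes
    return {
        nodes: tps / (nodes * per_node_baseline)
        for nodes, tps in sorted(results.items())
    }


efficiency = scaling_efficiency(throughput_by_nodes)
for nodes, eff in efficiency.items():
    print(f"{nodes} data nodes: {eff:.0%} of linear scaling")
```

On these numbers the 4-node cluster matches linear scaling, while the 8-node cluster, which is split across 2 ADs with the added network latency, reaches roughly two thirds of it.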
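
The relationship between client threads, latency and throughput on slides 9 and 11 can be sanity-checked with Little's law (throughput = concurrency / mean latency). This is a technique brought in for illustration, not something the deck states. The sketch assumes a closed-loop load generator in which every client thread always has exactly one request in flight, a 50/50 read/update mix, and the 32-LDM average latencies from slide 11; the function name `predicted_throughput` is hypothetical.

```python
def predicted_throughput(clients, threads_per_client,
                         avg_read_ms, avg_update_ms, read_fraction=0.5):
    """Estimate ops/s from Little's law for a closed-loop workload.

    Assumes each thread issues its next request as soon as the previous
    one completes (no think time); this is an assumption, not a YCSB fact
    taken from the slides.
    """
    concurrency = clients * threads_per_client
    mean_latency_ms = (read_fraction * avg_read_ms
                       + (1.0 - read_fraction) * avg_update_ms)
    return concurrency / (mean_latency_ms / 1000.0)


# (clients, threads, avg read ms, avg update ms, reported ops/s), 32 LDM runs
runs_32_ldm = [
    (10, 128, 0.6, 1.4, 1.25e6),
    (20, 64, 0.6, 1.4, 1.3e6),
    (20, 192, 0.9, 2.2, 2.5e6),
]

predictions = []
for clients, threads, read_ms, upd_ms, reported in runs_32_ldm:
    estimate = predicted_throughput(clients, threads, read_ms, upd_ms)
    predictions.append(estimate)
    print(f"{clients} x {threads}: predicted {estimate / 1e6:.2f}M ops/s, "
          f"reported {reported / 1e6:.2f}M ops/s")
```

For these three runs the estimate lands within a few percent of the reported throughput, which is consistent with the conclusion that performance depends on the number of client threads rather than on the number of rows.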
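
The NUMA results on slide 12 quote a 10% loss for fully remote memory access. A minimal sketch of that arithmetic, using the throughput row of the slide 12 table, is shown here; the helper `loss_vs_local` is a hypothetical name.

```python
# Throughput (ops/s) from the slide 12 table, keyed by memory placement.
throughput = {"same": 1.99e6, "other": 1.79e6, "interleaved": 1.94e6}


def loss_vs_local(results, local_key="same"):
    """Return the fractional throughput loss of each placement
    relative to fully local memory (hypothetical helper)."""
    local = results[local_key]
    return {
        placement: (local - tps) / local
        for placement, tps in results.items()
        if placement != local_key
    }


losses = loss_vs_local(throughput)
for placement, loss in losses.items():
    print(f"{placement}: {loss:.1%} lower throughput than local memory")
```

The remote case comes out at about 10% and the interleaved case at under 3%, which matches the slide's recommendation to run interleaved.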
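
For a rough sense of the in-memory data volume behind the 300M and 600M row runs, the following sketch multiplies out the figures given in the slides (10 fields of 100 bytes, 2 replicas, 4 data nodes). It counts field payload only; per-row overhead, primary keys and indexes are ignored, so real DataMemory usage would be higher. The function name `payload_gb_per_node` is hypothetical.

```python
def payload_gb_per_node(rows, fields=10, field_bytes=100,
                        replicas=2, data_nodes=4):
    """Field payload stored per data node, in decimal gigabytes.

    Lower bound only: ignores row headers, keys and index memory
    (an assumption made to keep the arithmetic simple).
    """
    total_bytes = rows * fields * field_bytes * replicas
    return total_bytes / data_nodes / 1e9


for rows in (300_000_000, 600_000_000):
    gb = payload_gb_per_node(rows)
    print(f"{rows // 1_000_000}M rows: about {gb:.0f} GB of payload per data node")
```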