This document summarizes the results of a benchmark comparing the performance of several cloud database systems, including Cassandra, HBase, Sherpa, and MySQL. The benchmark uses a standard workload and measures key metrics like latency and throughput. Overall, Cassandra showed strong write performance but weaker reads. Sherpa delivered good read and write latency as well as high throughput. HBase read latency was poor. Later versions of Cassandra showed performance improvements over earlier versions.
The Foundations of Multi-DC Kafka (Jakub Korab, Solutions Architect, Confluent), by confluent
1. The document discusses various architectures for running Kafka in a multi-datacenter environment including running Kafka natively in multiple datacenters, mirroring data between datacenters, and using hierarchical Zookeeper quorums.
2. Key considerations for multi-DC Kafka include replication settings, consumer reconfiguration needs during outages, and handling consumer offsets and processing state across datacenters.
3. Native multi-DC Kafka is preferred, but mirroring can be an alternative for inter-region traffic when latency exceeds 30 ms or when datacenters cannot be combined into a single cluster. Asynchronous mirroring behaves differently from a single Kafka cluster and has operational implications.
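As a concrete, hedged illustration of the mirroring approach, a minimal MirrorMaker 2 configuration replicating all topics one way from an active datacenter to a mirror might look like this (the cluster aliases `dc1`/`dc2` and broker addresses are assumptions, not from the talk):

```properties
clusters = dc1, dc2
dc1.bootstrap.servers = dc1-kafka1:9092
dc2.bootstrap.servers = dc2-kafka1:9092

# Replicate everything one way: dc1 is active, dc2 is the mirror.
dc1->dc2.enabled = true
dc1->dc2.topics = .*

# Also mirror committed consumer group offsets so consumers can fail over.
dc1->dc2.sync.group.offsets.enabled = true
```

Note that because this replication is asynchronous, consumers failing over to dc2 may re-read some records, which is part of the operational difference from a single cluster that the talk highlights.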
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R..., by Miguel Araújo
MySQL Webinar, presented on the 25th of April, 2024.
Summary:
MySQL solutions enable the deployment of diverse Database Architectures tailored to specific needs, including High Availability, Disaster Recovery, and Read Scale-Out.
With MySQL Shell's AdminAPI, administrators can seamlessly set up, manage, and monitor these solutions, ensuring efficiency and ease of use in their administration. MySQL Router, on the other hand, provides transparent routing of application traffic to the backend servers in these architectures, requiring minimal configuration.
Completely built in-house and supported by Oracle, these solutions have been adopted by enterprises of all sizes for their business-critical applications.
In this presentation, we'll delve into various database architecture solutions to help you choose the right one based on your business requirements, focusing on technical details and the latest features to maximize the potential of these solutions.
In this tutorial, we cover the different deployment possibilities of MySQL architectures depending on the business requirements for the data. We also deploy some of these architectures and show how to evolve from one to the next.
The tutorial covers the new MySQL Solutions like InnoDB ReplicaSet, InnoDB Cluster, and InnoDB ClusterSet.
MySQL Enterprise Backup - BnR Scenarios, by Keith Hollman
A quick intro to what MEB is, followed by a more hands-on look at how to back up MySQL, what options are available, and how to restore accordingly.
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices, by Kenny Gryp
MySQL InnoDB Cluster provides a complete high availability solution for MySQL. MySQL Shell includes AdminAPI which enables you to easily configure and administer a group of at least three MySQL server instances to function as an InnoDB cluster.
This talk includes best practices.
MySQL InnoDB Cluster / ReplicaSet - Tutorial, by Kenny Gryp
Tutorial on MySQL InnoDB Cluster and ReplicaSet, a fully integrated product built on MySQL technology, by MySQL.
MySQL InnoDB Cluster and ReplicaSet provide failover/high-availability and scaling features baked in, offering an integrated end-to-end solution that is easy to use.
Webinar - Key Reasons to Upgrade to MySQL 8.0 or MariaDB 10.11, by Federico Razzoli
- MySQL 5.7 is no longer supported and will not receive any bugfixes or security updates after October 2023. Users need to upgrade to either MySQL 8.0 or MariaDB 10.11.
- MySQL is developed by Oracle while MariaDB has its own independent foundation. MariaDB aims to be compatible with MySQL but also has unique features like storage engines.
- Both MySQL 8.0 and MariaDB 10.11 are good options to upgrade to. Users should consider each product's unique features and governance model as well as test which one works better for their applications and use cases.
MySQL Database Architectures - High Availability and Disaster Recovery Solution, by Miguel Araújo
MySQL InnoDB ClusterSet brings multi-datacenter capabilities to our solutions and makes it very easy to set up a disaster recovery architecture. Think of multiple MySQL InnoDB Clusters combined into one single database architecture, fully managed from MySQL Shell and with full MySQL Router integration to make it easy to access the entire architecture.
This presentation covers the various solutions of MySQL for High Availability, Replication, and Disaster Recovery, with a special focus on InnoDB ClusterSet:
- The various features of InnoDB ClusterSet
- How to set up MySQL InnoDB ClusterSet
- Ways to migrate from an existing MySQL InnoDB Cluster to MySQL InnoDB ClusterSet
- How to deal with various failures
- The various features of Router integration that make connecting to the database architecture easy
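The setup step listed above can be sketched as a MySQL Shell (Python mode) session. The cluster names, the `clusteradmin` account, and the DR host are assumptions, and live servers are required, so this is an illustrative outline rather than a runnable script:

```python
# Connect to the primary of an existing InnoDB Cluster and promote it
# to a ClusterSet, then add a replica cluster in the DR datacenter.
shell.connect('clusteradmin@primary-dc1:3306')
cluster = dba.get_cluster()
clusterset = cluster.create_cluster_set('myClusterSet')

# The replica cluster is provisioned from the primary cluster (clone here).
replica = clusterset.create_replica_cluster(
    'clusteradmin@primary-dc2:3306', 'drCluster',
    {'recoveryMethod': 'clone'})
```

Routers bootstrapped against the ClusterSet then follow the primary cluster automatically after a controlled switchover or emergency failover.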
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11, by Kenny Gryp
Oracle's MySQL solutions make it easy to set up various database architectures and achieve high availability with the introduction of MySQL InnoDB Cluster and MySQL InnoDB ReplicaSet, which meet various high availability requirements. MySQL InnoDB ClusterSet provides a popular disaster recovery solution.
Completely built in-house and supported by Oracle, many enterprises large and small have adopted these solutions into business critical applications.
In this presentation, the various database architecture solutions for high availability and disaster recovery will be covered to help you choose the right solution based on your business requirements.
HOST     ROLE           INTERNAL IP
mysql1   master / app   192.168.56.11
mysql2   replica        192.168.56.12
mysql3   n/a            192.168.56.13
The document outlines steps to migrate an asynchronous MySQL replication setup to a MySQL InnoDB Cluster configuration. It describes cloning data from mysql2 to mysql3, creating an InnoDB Cluster with mysql3, configuring asynchronous replication from mysql1 to mysql3, adding mysql2 to the cluster, and bootstrapping a MySQL Router.
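That sequence can be sketched as a MySQL Shell (Python mode) session. The cluster name and the `clusteradmin`/`repl` accounts are assumptions, the initial clone of mysql2's data onto mysql3 is assumed done, and live servers are required, so this is an illustrative outline, not a runnable script:

```python
# 1. On mysql3 (already cloned from mysql2): create the InnoDB Cluster.
shell.connect('clusteradmin@mysql3:3306')
cluster = dba.create_cluster('prodCluster')

# 2. Keep the new cluster in sync with the current primary (mysql1) via a
#    classic asynchronous channel while the application still writes there.
session.run_sql(
    "CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql1', "
    "SOURCE_USER='repl', SOURCE_AUTO_POSITION=1")
session.run_sql("START REPLICA")

# 3. Add mysql2; clone provisioning re-provisions it from the cluster.
cluster.add_instance('clusteradmin@mysql2:3306',
                     {'recoveryMethod': 'clone'})

# 4. Bootstrap a MySQL Router against the cluster (from the OS shell):
#    mysqlrouter --bootstrap clusteradmin@mysql3:3306 --user=mysqlrouter
```

Once applications are repointed at the Router and the async channel is stopped, the old primary can be retired or added to the cluster.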
Since the introduction of replication in MySQL, users have been trying to automate both the promotion of a replica to a primary and the failover of TCP connections from one database server to another in the event of a database failure, planned or unplanned. For over a decade, users and organizations have designed various solutions to achieve this. However, many of these solutions were operated manually or relied on third-party software, mostly open source, to automate and integrate the various architectures.
For more than five years now, MySQL has offered complete and very easy-to-use solutions for setting up database architectures that provide high availability and, more recently, disaster recovery capabilities. Completely built in-house and supported by Oracle, these solutions have been adopted by enterprises large and small for business-critical applications.
Business requirements dictate what type of database architecture is required for your system. Disaster tolerance is key and can be measured at different levels: data loss, data availability, and uptime. In this session, the various MySQL Database Architecture solutions will be covered to help you choose the right solution based on your business requirements.
MySQL InnoDB Cluster - Advanced Configuration & Operations, by Frederic Descamps
The document discusses various methods for provisioning and monitoring new members joining a MySQL InnoDB cluster. It describes the incremental recovery and clone-based provisioning processes. It provides guidance on forcing the use of clone over incremental recovery for both provisioning and recovery scenarios. The document also discusses using MySQL Shell commands and Performance Schema tables to monitor the provisioning and recovery processes, as well as the overall health and performance of the cluster.
Oracle ACFS High Availability NFS Services (HANFS), by Anju Garg
Oracle ACFS High Availability NFS Services (HANFS) allows Oracle ACFS clusters to configure highly available NFS servers. HANFS exposes NFS exports through Highly Available VIPs (HAVIPs) so that if a node hosting an export fails, the HAVIP and corresponding export will fail over to another node, providing uninterrupted NFS service. The document discusses configuring HANFS resources including ACFS file systems, HAVIPs, and ExportFS resources and verifying access to an exported file system from an NFS client.
MySQL Database Architectures - InnoDB ReplicaSet & Cluster, by Kenny Gryp
This document provides an overview and comparison of MySQL InnoDB Cluster and MySQL InnoDB ReplicaSet. It discusses the components, goals, and features of each solution. MySQL InnoDB Cluster uses Group Replication to provide high availability, automatic failover, and data consistency. MySQL InnoDB ReplicaSet uses asynchronous replication and provides availability and read scaling through manual primary/secondary configuration and failover. Both solutions integrate MySQL Shell, Router, and automatic member provisioning for easy management.
Wars of MySQL Cluster (InnoDB Cluster vs Galera), by Mydbops
MySQL clustering over the InnoDB engine has grown a lot over the last decade. Galera began working with InnoDB early, and Group Replication came to the environment later; both feature sets are now rich and robust. This presentation offers a technical comparison of the two.
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle Database, by Sandesh Rao
AHF Insights provides a unified reporting and correlation tool that collects diagnostic data from an AHF stack and generates an offline report. It captures information on system topology, insights like events and issues, and allows drilling down into specific components for root cause analysis. The report includes sections on the cluster summary, resources, databases, metrics, best practices, system changes, software recommendations, and detailed parameters for databases and kernels. It provides visualizations of events and metrics over time and detects operating system issues and configuration problems.
The document discusses two MySQL high availability solutions: MySQL InnoDB Cluster and MySQL NDB Cluster. MySQL InnoDB Cluster provides easy high availability built into MySQL with write consistency, read scalability, and application failover using MySQL Router. MySQL NDB Cluster is an in-memory database that provides automatic sharding, native access via several APIs, read/write consistency, and read/write scalability using the NDB storage engine. The document compares the two solutions and discusses their architectures and key features.
MySQL Group Replication: Handling Network Glitches - Best Practices, by Frederic Descamps
The document discusses best practices for handling network glitches in group replication. It recommends checking replication status using Performance Schema and MySQL Shell to diagnose issues. It also suggests adapting group replication settings to faulty networks by increasing timeouts to avoid expels. These adaptations include increasing write concurrency and transaction size limits to handle higher latencies. The document also recommends configuring rejoin attempts and quorum timeout to deal with failures and prevent unstable members from interfering.
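The timeout and sizing knobs the talk refers to are Group Replication server variables. As a hedged sketch (the specific values are illustrative and depend on your network, not recommendations from the talk):

```sql
-- Tolerate longer network glitches before a member is expelled (seconds).
SET PERSIST group_replication_member_expel_timeout = 30;

-- How long a partitioned minority waits for the majority before giving up.
SET PERSIST group_replication_unreachable_majority_timeout = 30;

-- Let an expelled member attempt to rejoin on its own.
SET PERSIST group_replication_autorejoin_tries = 10;

-- Bound transaction size so large writes don't stall a slow link.
SET PERSIST group_replication_transaction_size_limit = 50000000;
```

All of these are dynamic, so they can be adjusted without restarting the member; replication status before and after can be checked via the `performance_schema.replication_group_members` table.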
Take advantage of ScyllaDB’s wide column NoSQL features such as workload prioritization to balance the needs of OLTP and OLAP in the same cluster. Plus learn about the different compaction strategies and which one would be right for your workload. With additional insights on properly sizing your database and using open source tools for observability.
Up to MySQL 5.5, replication was not crash safe: after an unclean shutdown, it would fail with a “duplicate key” or “row not found” error, or might generate silent data corruption. It looks like 5.6 is much better, right? The short answer is maybe: in the simplest case, it is possible to achieve replication crash safety, but it is not the default setting. MySQL 5.7 is not much better; 8.0 has better defaults, but it is still not replication crash-safe by default, and it is still easy to get things wrong.
Crash safety is impacted by replication positioning (File+Position or GTID), type (single-threaded or MTS), MTS settings (Database or Logical Clock, and with or without slave preserve commit order), the sync-ing of relay logs, the presence of binary logs, log-slave-updates and the sync-ing of binary logs. This is very complicated stuff and even the manual is sometimes confused about it.
In this talk, I will explain the impact of the above and help you find the path to crash safety nirvana. I will also give details about replication internals, so you might learn a thing or two.
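As a starting point, a commonly recommended combination of the settings the abstract enumerates looks like the fragment below. This is a hedged sketch of one crash-safe path on MySQL 8.0, not the talk's full decision tree, and it does not cover every multi-threaded-replication caveat discussed there:

```ini
[mysqld]
# Positioning: use GTIDs rather than file+position.
gtid_mode                      = ON
enforce_gtid_consistency       = ON
# Discard possibly-corrupt relay logs after a crash and refetch from the source.
relay_log_recovery             = ON
# Durably sync the binary log and redo log on every transaction.
sync_binlog                    = 1
innodb_flush_log_at_trx_commit = 1
# If using multi-threaded replication, preserve commit order.
replica_parallel_type          = LOGICAL_CLOCK
replica_preserve_commit_order  = ON
```

Deviating from any one of these (for example `sync_binlog = 0` for speed) reopens a crash-safety gap, which is exactly the kind of interaction the talk walks through.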
In this session, you'll learn how RBD works, including how it:
Uses RADOS classes to make access easier from user space and within the Linux kernel.
Implements thin provisioning.
Builds on RADOS self-managed snapshots for cloning and differential backups.
Increases performance with caching of various kinds.
Uses watch/notify RADOS primitives to handle online management operations.
Integrates with QEMU, libvirt, and OpenStack.
Accelerating Virtual Machine Access with the Storage Performance Development ..., by Michelle Holley
Abstract: Although new non-volatile media inherently offers very low latency, remote access using protocols such as NVMe-oF and presenting the data to VMs via virtualized interfaces such as virtio adds considerable software overhead. One way to reduce the overhead is to use the Storage Performance Development Kit (SPDK), an open-source software project that provides building blocks for scalable and efficient storage applications with breakthrough performance. Comparing the software paths for virtualizing block storage I/O illustrates the advantages of the SPDK-based approach. Empirical data shows that using SPDK can improve CPU efficiency by up to 10x and reduce latency by up to 50% over existing methods. Future enhancements for SPDK will make its advantages even greater.
Speaker Bio: Anu Rao is a product line manager for storage software in the Data Center Group. She helps customers ease into and adopt open-source storage software like the Storage Performance Development Kit (SPDK) and the Intelligent Storage Acceleration Library (ISA-L).
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers, by Cloudera, Inc.
Todd Lipcon presents a solution to avoid full garbage collections (GCs) in HBase by using MemStore-Local Allocation Buffers (MSLABs). The document outlines that write operations in HBase can cause fragmentation in the old generation heap, leading to long GC pauses. MSLABs address this by allocating each MemStore's data into contiguous 2MB chunks, eliminating fragmentation. When MemStores flush, the freed chunks are large and contiguous. With MSLABs enabled, the author saw basically zero full GCs during load testing. MSLABs improve performance and stability by preventing GC pauses caused by fragmentation.
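The allocation pattern behind MSLABs can be sketched in a few lines of Python. This is a toy simulation of the idea (copy each cell into the MemStore's current fixed-size chunk so a flush frees whole chunks), not HBase's actual Java implementation; all names are made up:

```python
CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, HBase's default MSLAB chunk size

class Chunk:
    def __init__(self, size=CHUNK_SIZE):
        self.data = bytearray(size)
        self.offset = 0  # next free byte in this chunk

    def try_alloc(self, n):
        """Return (chunk, start) if n bytes fit, else None."""
        if self.offset + n > len(self.data):
            return None
        start = self.offset
        self.offset += n
        return (self, start)

class MemStoreLAB:
    """Copies each incoming cell into the current chunk instead of letting
    it land wherever the heap allocator puts it, avoiding fragmentation."""
    def __init__(self):
        self.chunks = [Chunk()]

    def copy_cell(self, payload: bytes):
        ref = self.chunks[-1].try_alloc(len(payload))
        if ref is None:                  # current chunk full: open a new one
            self.chunks.append(Chunk())
            ref = self.chunks[-1].try_alloc(len(payload))
        chunk, start = ref
        chunk.data[start:start + len(payload)] = payload
        return chunk, start

lab = MemStoreLAB()
for _ in range(3):
    lab.copy_cell(b"x" * (1024 * 1024))  # three 1 MB cells
# Two cells fill the first chunk exactly; the third opens a second chunk.
print(len(lab.chunks))  # 2
```

When this MemStore flushes, both chunks are dropped as large contiguous blocks, which is why the old generation stops fragmenting; oversized cells would need a separate path, a detail this sketch omits.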
This document provides guidance on using Oracle's Exadata Cloud Service (ExaCS) or Exadata Cloud at Customer (ExaCC) to set up disaster recovery for an on-premises database using Oracle Data Guard or Active Data Guard. It outlines the key benefits of a hybrid cloud/on-premises configuration and provides a 10-step process for implementing this along with considerations for security, networking, and ongoing management after deployment. The document is intended to help technical audiences set up a cloud-based standby database for disaster recovery that follows Oracle Maximum Availability Architecture best practices.
MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally it was an area of responsibility for three different roles: Development, DBA, and System Administrators. Now DevOps handles all of these. But there is a gap. Knowledge gained by MySQL DBAs after years of focusing on a single product is hard to gain when you focus on more than one. This is why I am doing this session. I will show a minimal but most effective set of options to improve MySQL performance. For illustrations, I will use real user stories gained from my Support experience and Percona Kubernetes operators for PXC and MySQL.
YCSB++ is a benchmarking tool that provides extensions to Yahoo!'s Cloud Serving Benchmark (YCSB) to test advanced features of scalable table stores. It allows for distributed, coordinated testing across client nodes using ZooKeeper. It also enables fine-grained, correlated monitoring of systems using the OTUS monitor. The tool is useful for understanding performance problems and debugging complex interactions between components in table stores. Two illustrative examples show how YCSB++ can analyze the tradeoff between fast inserts and weak consistency using batch writing, as well as benchmark features for high-speed ingest like bulk loading and table pre-splitting.
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey (DataStax Academy)
This document discusses strategies for avoiding read-modify-write operations in Cassandra databases. It presents several Cassandra features that allow updating data without explicit read-modify-writes, such as overwriting rows, using collections, and lightweight transactions. It also covers data modeling techniques like journaling, content-addressable storage, and modeling time-series data. The document concludes that Cassandra is well-suited for write-heavy workloads and provides tools to safely perform read-modify-writes when necessary.
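Two of the techniques mentioned, collections and lightweight transactions, look roughly like this in CQL (a hedged sketch; the table and column names are made up for illustration):

```sql
-- Append to a set without reading it first: no read-modify-write needed.
UPDATE user_tags SET tags = tags + {'vip'} WHERE user_id = 42;

-- Lightweight transaction: a compare-and-set for the cases where you truly
-- need read-modify-write semantics (Paxos under the hood, so use sparingly).
INSERT INTO accounts (name, owner) VALUES ('acme', 'alice') IF NOT EXISTS;
```

The first statement is a blind write that Cassandra resolves by timestamp, which is why it suits write-heavy workloads; the second trades latency for a consistency guarantee.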
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11Kenny Gryp
Oracle's MySQL solutions make it easy to setup various database architectures and achieve high availability with the introduction MySQL InnoDB Cluster and MySQL InnoDB ReplicaSet meeting various high availability requirements. MySQL InnoDB ClusterSet provides a popular disaster recovery solution.
Completely built in-house and supported by Oracle, many enterprises large and small have adopted these solutions into business critical applications.
In this presentation the various database architecture solutions for high availability and disaster recovery will be covered and help you choose the right solutions based on your business requirements.
ROLE INTERNAL IP
mysql1 master / app 192.168.56.11
mysql2 replica 192.168.56.12
mysql3 n/a 192.168.56.13
The document outlines steps to migrate an asynchronous MySQL replication setup to a MySQL InnoDB Cluster configuration. It describes cloning data from mysql2 to mysql3, creating an InnoDB Cluster with mysql3, configuring asynchronous replication from mysql1 to mysql3, adding mysql2 to the cluster, and bootstrapping a MySQL Router.
Since the introduction of replication in MySQL, users have been trying to automate the promotion of a replica to a primary as well as automating the failover of TCP connections from one database server to another in the event of a database failure: planned or unplanned. For over a decade, users and organizations have designed various types of solutions to achieve this. Though, many of these solutions were done manually or were using third party software, mostly open source, to automate and integrate various architectures.
For more than 5 years now, MySQL offers complete and very easy-to-use solutions to set up database architectures that provide High-Availability and recently added Disaster Recovery capabilities. Completely built in-house and supported by Oracle, many enterprises large and small have adopted these solutions into business-critical applications.
Business requirements dictate what type of database architecture is required for your system. Disaster tolerance is key and can be measured at different levels: data loss, data availability, and uptime. In this session, the various MySQL Database Architecture solutions will be covered to help you choose the right solution based on your business requirements
MySQL InnoDB Cluster - Advanced Configuration & OperationsFrederic Descamps
The document discusses various methods for provisioning and monitoring new members joining a MySQL InnoDB cluster. It describes the incremental recovery and clone-based provisioning processes. It provides guidance on forcing the use of clone over incremental recovery for both provisioning and recovery scenarios. The document also discusses using MySQL Shell commands and Performance Schema tables to monitor the provisioning and recovery processes, as well as the overall health and performance of the cluster.
Oracle ACFS High Availability NFS Services (HANFS)Anju Garg
Oracle ACFS High Availability NFS Services (HANFS) allows Oracle ACFS clusters to configure highly available NFS servers. HANFS exposes NFS exports through Highly Available VIPs (HAVIPs) so that if a node hosting an export fails, the HAVIP and corresponding export will fail over to another node, providing uninterrupted NFS service. The document discusses configuring HANFS resources including ACFS file systems, HAVIPs, and ExportFS resources and verifying access to an exported file system from an NFS client.
MySQL Database Architectures - InnoDB ReplicaSet & ClusterKenny Gryp
This document provides an overview and comparison of MySQL InnoDB Cluster and MySQL InnoDB ReplicaSet. It discusses the components, goals, and features of each solution. MySQL InnoDB Cluster uses Group Replication to provide high availability, automatic failover, and data consistency. MySQL InnoDB ReplicaSet uses asynchronous replication and provides availability and read scaling through manual primary/secondary configuration and failover. Both solutions integrate MySQL Shell, Router, and automatic member provisioning for easy management.
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Mydbops
MySQL Clustering over InnoDB engines has grown a lot over the last decade. Galera began working with InnoDB early and then Group Replication came to the environment later, where the features are now rich and robust. This presentation offers a technical comparison of both of them.
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle DatabaseSandesh Rao
AHF Insights provides a unified reporting and correlation tool that collects diagnostic data from an AHF stack and generates an offline report. It captures information on system topology, insights like events and issues, and allows drilling down into specific components for root cause analysis. The report includes sections on the cluster summary, resources, databases, metrics, best practices, system changes, software recommendations, and detailed parameters for databases and kernels. It provides visualizations of events and metrics over time and detects operating system issues and configuration problems.
The document discusses two MySQL high availability solutions: MySQL InnoDB Cluster and MySQL NDB Cluster. MySQL InnoDB Cluster provides easy high availability built into MySQL with write consistency, read scalability, and application failover using MySQL Router. MySQL NDB Cluster is an in-memory database that provides automatic sharding, native access via several APIs, read/write consistency, and read/write scalability using the NDB storage engine. The document compares the two solutions and discusses their architectures and key features.
MySQL Group Replication: Handling Network Glitches - Best PracticesFrederic Descamps
The document discusses best practices for handling network glitches in group replication. It recommends checking replication status using Performance Schema and MySQL Shell to diagnose issues. It also suggests adapting group replication settings to faulty networks by increasing timeouts to avoid expels. These adaptations include increasing write concurrency and transaction size limits to handle higher latencies. The document also recommends configuring rejoin attempts and quorum timeout to deal with failures and prevent unstable members from interfering.
Take advantage of ScyllaDB’s wide column NoSQL features such as workload prioritization to balance the needs of OLTP and OLAP in the same cluster. Plus learn about the different compaction strategies and which one would be right for your workload. With additional insights on properly sizing your database and using open source tools for observability.
Up to MySQL 5.5, replication was not crash safe: after an unclean shutdown, it would fail with “duplicate key” or “row not found” error, or might generate silent data corruption. It looks like 5.6 is much better, right ? The short answer is maybe: in the simplest case, it is possible to achieve replication crash safety, but it is not the default setting. MySQL 5.7 is not much better, 8.0 has better defaults, but it is still not replication crash-safe by default, and it is still easy to get things wrong.
Crash safety is impacted by replication positioning (File+Position or GTID), type (single-threaded or MTS), MTS settings (Database or Logical Clock, and with or without slave preserve commit order), the sync-ing of relay logs, the presence of binary logs, log-slave-updates and the sync-ing of binary logs. This is very complicated stuff and even the manual is sometimes confused about it.
In this talk, I will explain the impact of the above and help you find the path to crash safety nirvana. I will also give details about replication internals, so you might learn a thing or two.
In this session, you'll learn how RBD works, including how it:
Uses RADOS classes to make access easier from user space and within the Linux kernel.
Implements thin provisioning.
Builds on RADOS self-managed snapshots for cloning and differential backups.
Increases performance with caching of various kinds.
Uses watch/notify RADOS primitives to handle online management operations.
Integrates with QEMU, libvirt, and OpenStack.
Accelerating Virtual Machine Access with the Storage Performance Development ...Michelle Holley
Abstract: Although new non-volatile media inherently offers very low latency, remote access
using protocols such as NVMe-oF and presenting the data to VMs via virtualized interfaces such as virtio
adds considerable software overhead. One way to reduce the overhead is to use the Storage
Performance Development Kit (SPDK), an open-source software project that provides building blocks for
scalable and efficient storage applications with breakthrough performance. Comparing the software
paths for virtualizing block storage I/O illustrates the advantages of the SPDK-based approach. Empirical
data shows that using SPDK can improve CPU efficiency by up to 10 x and reduce latency up to 50% over
existing methods. Future enhancements for SPDK will make its advantages even greater.
Speaker Bio: Anu Rao is Product line manager for storage software in Data center Group. She helps
customer ease into and adopt open source Storage software like Storage Performance Development Kit
(SPDK) and Intelligent Software Acceleration-Library (ISA-L).
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
Todd Lipcon presents a solution to avoid full garbage collections (GCs) in HBase by using MemStore-Local Allocation Buffers (MSLABs). The document outlines that write operations in HBase can cause fragmentation in the old generation heap, leading to long GC pauses. MSLABs address this by allocating each MemStore's data into contiguous 2MB chunks, eliminating fragmentation. When MemStores flush, the freed chunks are large and contiguous. With MSLABs enabled, the author saw basically zero full GCs during load testing. MSLABs improve performance and stability by preventing GC pauses caused by fragmentation.
This document provides guidance on using Oracle's Exadata Cloud Service (ExaCS) or Exadata Cloud at Customer (ExaCC) to set up disaster recovery for an on-premises database using Oracle Data Guard or Active Data Guard. It outlines the key benefits of a hybrid cloud/on-premises configuration and provides a 10-step process for implementing this along with considerations for security, networking, and ongoing management after deployment. The document is intended to help technical audiences set up a cloud-based standby database for disaster recovery that follows Oracle Maximum Availability Architecture best practices.
MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally these were the responsibility of three different roles: Development, DBAs, and System Administrators. Now DevOps engineers handle them all, but there is a gap: knowledge gained by MySQL DBAs after years of focusing on a single product is hard to acquire when you focus on more than one. This is why I am doing this session. I will show a minimal but highly effective set of options to improve MySQL performance. For illustration, I will use real user stories from my Support experience and the Percona Kubernetes Operators for PXC and MySQL.
YCSB++ is a benchmarking tool that provides extensions to Yahoo!'s Cloud Serving Benchmark (YCSB) to test advanced features of scalable table stores. It allows for distributed, coordinated testing across client nodes using ZooKeeper. It also enables fine-grained, correlated monitoring of systems using the OTUS monitor. The tool is useful for understanding performance problems and debugging complex interactions between components in table stores. Two illustrative examples show how YCSB++ can analyze the tradeoff between fast inserts and weak consistency using batch writing, as well as benchmark features for high-speed ingest like bulk loading and table pre-splitting.
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyDataStax Academy
This document discusses strategies for avoiding read-modify-write operations in Cassandra databases. It presents several Cassandra features that allow updating data without explicit read-modify-writes, such as overwriting rows, using collections, and lightweight transactions. It also covers data modeling techniques like journaling, content-addressable storage, and modeling time-series data. The document concludes that Cassandra is well-suited for write-heavy workloads and provides tools to safely perform read-modify-writes when necessary.
Cassandra is a highly scalable distributed masterless NoSQL database. It values availability and partition tolerance over consistency. All nodes are equal and resilient. Cassandra uses consistent hashing to spread data uniformly and writes data immutably through log structured storage for easy scaling. It is not ACID compliant but offers tunable consistency levels. Compaction cleans up data and repair ensures cluster synchronization. CQL provides a SQL-like interface but Cassandra differs significantly from relational databases.
Advantages of NoSQL Databases, Using MongoDB as an ExampleUNETA
Speaker: Oleg Vinnikov – .NET Developer at Digital Cloud Technologies (https://twitter.com/#!/VinnikovOleg)
Talk topic: "Advantages of NoSQL Databases, Using MongoDB as an Example".
The talk is devoted to NoSQL, a class of concepts offering an alternative to relational DBMSs. You will learn about the main kinds of NoSQL databases and their differences from, and advantages over, relational databases. As the key advantage, the talk examines scaling NoSQL databases, using MongoDB as an example. Key questions covered:
- Why NoSQL;
- A brief overview of the kinds of NoSQL databases;
- Scaling NoSQL databases;
- Sharding and replication, using MongoDB as an example;
http://uneta.ua/community/events/9
This document provides an overview of distributed databases and the Yahoo! Cloud Serving Benchmark (YCSB). It discusses NoSQL databases Cassandra and HBase and how YCSB can be used to benchmark their performance. Experiments were conducted on Amazon EC2 using YCSB to load data and run workloads on Cassandra and HBase clusters. The results showed Cassandra had lower latency and higher throughput than HBase. YCSB provides a way to compare the performance of different databases.
This tutorial was held at IEEE BigData '14 on October 29, 2014 in Bethesda, MD, USA.
Presenters: Chaitan Baru and Tilmann Rabl
More information available at:
http://msrg.org/papers/BigData14-Rabl
Summary:
This tutorial will introduce the audience to the broad set of issues involved in defining big data benchmarks, for creating auditable industry-standard benchmarks that consider performance as well as price/performance. Big data benchmarks must capture the essential characteristics of big data applications and systems, including heterogeneous data, e.g. structured, semi-structured, unstructured, graphs, and streams; large-scale and evolving system configurations; varying system loads; processing pipelines that progressively transform data; workloads that include queries as well as data mining and machine learning operations and algorithms. Different benchmarking approaches will be introduced, from micro-benchmarks to application-level benchmarking.
Since May 2012, five workshops have been held on Big Data Benchmarking including participation from industry and academia. One of the outcomes of these meetings has been the creation of industry’s first big data benchmark, viz., TPCx-HS, the Transaction Processing Performance Council’s benchmark for Hadoop Systems. During these workshops, a number of other proposals have been put forward for more comprehensive big data benchmarking. The tutorial will present and discuss salient points and essential features of such benchmarks that have been identified in these meetings, by experts in big data as well as benchmarking. Two key approaches are now being pursued: one, called BigBench, is based on extending the TPC Decision Support (TPC-DS) benchmark with big data application characteristics. The other, called Deep Analytics Pipeline, is based on modeling processing that is routinely encountered in real-life big data applications. Both will be discussed.
We conclude with a discussion of a number of future directions for big data benchmarking.
This document compares Cassandra and Redis for use as a backend for a Facebook game with 1 million daily users and 10 million total users. Redis was chosen over Cassandra due to its simpler architecture, higher write throughput, and ability to meet the capacity and performance requirements using a single node. The Redis master handled all reads and writes, with a slave for failover. User data was stored in Redis hashes to turn it into a "document DB" and allow for atomic operations on parts of the data.
Covers different types of big data benchmarking and different benchmark suites, with details on Terasort and a demo with TPCx-HS.
Meetup Details of presentation:
http://www.meetup.com/lspe-in/events/203918952/
AWS re:Invent 2016: State of the Union: Amazon Alexa and Recent Advances in C...Amazon Web Services
The way humans interact with machines is at a turning point, and conversational artificial intelligence (AI) is at the center of the transformation. Learn how Amazon is using machine learning and cloud computing to fuel innovation in AI, making Amazon Alexa smarter every day. Alexa VP and Head Scientist Rohit Prasad presents the state of the union for Alexa and recent advances in conversational AI. He addresses Alexa's advances in spoken language understanding and machine learning, and shares Amazon's thoughts about building the next generation of user experiences.
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Polly (MAC204)Amazon Web Services
This session will introduce you to Amazon Polly, a new deep learning service that turns text into lifelike speech. Polly enables existing applications to speak as a first class feature and creates the opportunity for entirely new categories of speech-enabled products – from mobile apps and cars, to devices and appliances. Polly includes 47 lifelike voices and support for 24 languages, so you can select the ideal voice and distribute your speech-enabled applications in many geographies. Polly is easy to use – you just send the text you want converted into speech to the Polly API, and Polly immediately returns the audio stream to your application so you can play it directly or store it in a standard audio file format, such as MP3. Polly supports Speech Synthesis Markup Language (SSML) tags like prosody so you can adjust the speech rate, pitch, or volume. Polly is a secure service that delivers all of these benefits at high scale and at low latency. You can cache and replay Polly’s generated speech at no additional cost. Polly lets you convert 5M characters per month for free during the first year. Polly’s pay-as-you-go pricing, low cost per request, and lack of restrictions on storage and reuse of voice output make it a cost-effective way to enable speech synthesis everywhere. Join this session to learn more and find out how you can get started with Amazon Polly, today!
Why Your Healthcare Business Intelligence Strategy Can't WinHealth Catalyst
Business intelligence may hold tremendous promise but it can’t answer healthcare’s challenges unless it’s built on the solid foundation of a clinical data warehouse. Learn the definition of business intelligence, why a clinical data warehouse is needed for any healthcare BI strategy, the various options in data warehousing, which one is most effective for hospitals and the industry and why.
AWS re:Invent 2016: Building a Smarter Home with Alexa(ALX303)Amazon Web Services
Natural user interfaces, such as those based on speech, enable customers to interact with their home in a more intuitive way. With the VUI (Voice User Interface) smart home, now customers don't need to use their hands or eyes to do things around the home — they only have to ask and it's at their command. This session will address the vision for the VUI smart home and how innovations with Amazon Alexa make it possible.
AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...Amazon Web Services
Amazon AI services bring natural language understanding (NLU), automatic speech recognition (ASR), visual search and image recognition, text-to-speech (TTS), and machine learning (ML) technologies within reach of every developer. Amazon Lex makes it easy to build sophisticated text and voice chatbots, powered by Alexa; Amazon Rekognition provides deep learning-based image recognition; and Amazon Polly turns text into lifelike speech. In this workshop, you'll get a chance to use each of the new deep learning services. We'll see you there!
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Rekognition (MAC203)Amazon Web Services
This session will introduce you to Amazon Rekognition, a new service that makes it easy to add image analysis to your applications. With Rekognition, you can detect objects, scenes, and faces in images. You can also search and compare faces. Rekognition’s API lets you easily build powerful visual search and discovery into your applications. With Amazon Rekognition, you only pay for the images you analyze and the face metadata you store. There are no minimum fees and there are no upfront commitments.
To get started with Rekognition, simply log in to the Rekognition console to try the service with sample photos or your own photos. Join this session and learn more about Amazon Rekognition!
Summary of "YCSB " paper for nosql summer reading in Tokyo" on Sep 15, 2010CLOUDIAN KK
This is the summary materials of "Benchmarking Cloud Serving Systems with YCSB" paper for nosql summer reading in Tokyo on September 15, 2010 at Gemini Mobile Technologies in Shibuya, Tokyo.
AWS re:Invent 2016: Voice-enabling Your Home and Devices with Amazon Alexa an...Amazon Web Services
Want to learn how to Alexa-power your home? Join Brookfield Residential CIO and EVP Tom Wynnyk and Senior Solutions Architect Nathan Grice for an overview of building the next generation of integrated smart homes using Alexa to create voice-first experiences. Understand the technologies used and how best to expose voice experiences to users through Alexa. They cover the difference between custom Alexa skills and Smart Home Skill API skills, and build a home automation control from the ground up using Alexa and AWS IoT.
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...Amazon Web Services
The Jet Propulsion Laboratory designs and creates some of the most advanced space robotics ever imagined. JPL IT is now innovating to help streamline how JPLers will work in the future in order to design, build, operate, and support these spacecraft. They hope to dramatically improve JPLers' workflows and make their work easier for them by enabling simple voice conversations with the room and the equipment across the entire enterprise.
What could this look like? Imagine just talking with the conference room to configure it. What if you could kick off advanced queries across AWS services and kick off AWS Kinesis tasks by simply speaking the commands? What if the laboratory could speak to you and warn you about anomalies or notify you of trends across your AWS infrastructure? What if you could control rovers by having a conversation with them and ask them questions? In this session, JPL will demonstrate how they leveraged AWS Lambda, DynamoDB and CloudWatch in their prototypes of these use cases and more. They will also discuss some of the technical challenges they are overcoming, including how to deploy and manage consumer devices such as the Amazon Echo across the enterprise, and give lessons learned. Join them as they use Alexa to query JPL databases, control conference room equipment and lights, and even drive a rover on stage, all with nothing but the power of voice!
Methods of NoSQL database systems benchmarkingТранслируем.бел
Ilya Bakulin presents methods for benchmarking NoSQL database systems. He discusses the Yahoo Cloud Serving Benchmark (YCSB) framework, which allows benchmarking of NoSQL databases using common workloads. YCSB issues simple operations like insert, update, delete and scan without using SQL. It has adapters for popular NoSQL systems and allows custom workloads and databases to be added. Bakulin demonstrates YCSB by benchmarking Cassandra and sharded MySQL under different read/write ratios. Cassandra performs better in a write-heavy workload while MySQL is better for reads.
SQL Server 2008 Fast Track Data Warehouse 2.0
This was a presentation to the Silicon Valley SQL Server User Group in February 2010.
Speaker: Phil Hummel of WinWire Technologies
Presentation developed by Bruce Campbell
Western Region Data Warehouse Specialist, Microsoft
For more information about the SQL Server User Group, contact Mark Ginnebaugh, President of DesignMind, at mark@designmind.com
Big data architecture on cloud computing infrastructuredatastack
This document provides an overview of using OpenStack and Sahara to implement a big data architecture on cloud infrastructure. It discusses:
- The characteristics and service models of cloud computing
- An introduction to OpenStack, why it is used, and some of its key statistics
- What Sahara is and its role in provisioning and managing Hadoop, Spark, and Storm clusters on OpenStack
- Sahara's architecture, how it integrates with OpenStack, and examples of how it can be used to quickly provision data processing clusters and execute analytic jobs on cloud infrastructure.
Performance Tuning a Cloud Application: A Real World Case Studyshane_gibson
During the OpenStack Icehouse summit in Atlanta, Symantec presented on our vision for a Key Value as a Service storage technology utilizing MagnetoDB. Since then our Cloud Platform Team has rolled the service out in our production environments. Through that process we have learned about tuning requirements of the solution on bare metal versus hosted VMs within an OpenStack environment.
Our initial performance testing was done with MagnetoDB running on bare metal nodes. After migrating the service from bare metal to an OpenStack VM hosted environment, we observed a 50% reduction in performance.
This presentation will dig into the details of the performance baselines, the tuning of the Nova Compute servers, Virtual Machine settings, and the applications itself to increase our performance.
Why the larger community will be interested in this topic
This presentation will dig in to the technical details of performance tuning an application running on an OpenStack Nova Compute cluster. We will examine the performance related configuration settings necessary to improve the hosted application from three different angles:
the underlying compute node Operating System configuration
the hypervisor virtualization layer
and the Guest VM and Application stack
This presentation will provide a real world analysis of the steps taken. In addition, it will provide an outline for other cloud operators to follow when they work towards performance tuning their own cloud stack.
Hive is a data warehouse infrastructure tool used to process large datasets in Hadoop. It allows users to query data using SQL-like queries. Hive resides on HDFS and uses MapReduce to process queries in parallel. It includes a metastore to store metadata about tables and partitions. When a query is executed, Hive's execution engine compiles it into a MapReduce job which is run on a Hadoop cluster. Hive is better suited for large datasets and queries compared to traditional RDBMS which are optimized for transactions.
Rigorous and Multi-tenant HBase Performance MeasurementDataWorks Summit
The document discusses techniques for rigorously measuring HBase performance in both standalone and multi-tenant environments. It begins with an overview of HBase and the Yahoo! Cloud Serving Benchmark (YCSB) for evaluating databases. It then discusses best practices for cluster setup, data loading, and benchmarking techniques like warming the cache, setting target throughput, and using appropriate workloads. Finally, it covers challenges in measuring HBase performance when used alongside other frameworks like MapReduce and Solr in a multi-tenant setting.
Rigorous and Multi-tenant HBase PerformanceCloudera, Inc.
The document discusses techniques for rigorously measuring Apache HBase performance in both standalone and multi-tenant environments. It introduces the Yahoo! Cloud Serving Benchmark (YCSB) and best practices for cluster setup, workload generation, data loading, and measurement. These include pre-splitting tables, warming caches, setting target throughput, and using appropriate workload distributions. The document also covers challenges in achieving good multi-tenant performance across HBase, MapReduce and Apache Solr.
Using OpenStack in a traditional hosting environment posed scaling challenges that required automating provisioning across multiple data centers. OpenStack was chosen for its support, scalability, and ability to support future cloud offerings. Bluehost implemented optimizations like using MySQL slaves, custom schedulers, and replacing Qpid with ZeroMQ to address scalability issues with messaging, databases, and APIs under heavy load. The enhanced OpenStack deployment now supports over 10,000 physical servers, with more being added daily.
Blue host using openstack in a traditional hosting environmentOpenStack Foundation
Using OpenStack in a traditional hosting environment posed scaling challenges that required automating provisioning across multiple data centers. OpenStack was chosen for its open source support, typical cloud features, and ability to transition to a future cloud offering. Bluehost implemented OpenStack across over 10,000 servers, addressing issues like unstable messaging, overloaded MySQL, and premature networking plugins. Solutions involved read-only databases, optimized configurations, and custom scheduler, quantum, and nova components.
Building a big data intelligent application on top of xPatterns using tools that leverage Spark, Shark, Mesos, Tachyon and Cassandra; Jaws, our open-sourced Spark SQL RESTful service; our contributions to the Spark and Mesos projects; and lessons learned.
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton WorksCloudera, Inc.
The Apache Hadoop community is gearing up for the upcoming release of Apache Hadoop 0.23. This release has major enhancements to Hadoop such as HDFS Federation for hyper-scale and a Next Generation MapReduce framework. Arun, the Apache Hadoop Release Master for 0.23, will briefly cover the highlights of the release and pay particular attention to the plans and efforts undertaken to test, stabilize and release Hadoop.next. The talk covers some of the timelines for the release, our plans for compatibility and upgrade paths for existing users of Hadoop.
Apache Hadoop 0.23 at Hadoop World 2011Hortonworks
This document discusses Apache Hadoop 0.23, the first stable release of Hadoop in over 30 months. It introduces the speaker, Arun Murthy, and describes significant new features in Hadoop 0.23 like HDFS federation and YARN. It also covers performance improvements, HDFS high availability, and the extensive testing done for the release across many projects like HBase, Pig and Hive to enable very large deployments of 6000+ nodes.
This document provides an overview of a NoSQL Night event presented by Clarence J M Tauro from Couchbase. The presentation introduces NoSQL databases and discusses some of their advantages over relational databases, including scalability, availability, and partition tolerance. It covers key concepts like the CAP theorem and BASE properties. The document also provides details about Couchbase, a popular document-oriented NoSQL database, including its architecture, data model using JSON documents, and basic operations. Finally, it advertises Couchbase training courses for getting started and administration.
The document summarizes two use cases for Hadoop in biotech companies. The first case discusses a large biotech firm "N" that implemented Hadoop to improve their drug development workflow using next generation DNA sequencing. Hadoop reduced the workflow from 6 weeks to 2 days. The second case discusses challenges at another biotech firm "M" around scaling genomic data analysis and Hadoop's role in addressing those challenges through improved data ingestion, storage, querying and analysis capabilities.
Critical Attributes for a High-Performance, Low-Latency DatabaseScyllaDB
This document discusses the attributes of a high-performance, low-latency database like ScyllaDB. It begins with introductions and an overview of ScyllaDB. It then summarizes how hardware has evolved over 20 years with more cores, memory, and faster disks. ScyllaDB was redesigned from first principles to take advantage of modern hardware, using an asynchronous, shared-nothing architecture with one shard per core. This allows it to achieve significantly higher performance than Cassandra. The document shows benchmark results demonstrating ScyllaDB's lower latencies and ability to scale to higher throughput. It also discusses how ScyllaDB uses workload prioritization to manage different types of workloads.
1) Olivier Tisserand has experience designing and deploying virtual and on-premise infrastructure using tools like Chef, AWS, and Mikrotik equipment.
2) He has managed teams working on automation, testing, and DevOps projects for companies across industries including banking, ecommerce, and marketing analytics.
3) His background includes roles overseeing networking, servers, software development, and continuous integration/delivery processes using Agile methodologies.
Learn from HomeAway Hadoop Development and Operations Best PracticesDriven Inc.
HomeAway's Big Data team shares a number of development best practices using Cascading, a data application development framework. They also review several operational best practices for managing production Big Data applications that are business critical.
Mobile internet development in China is rapidly evolving. Mobile internet usage in China has grown significantly in recent years, with over 500 million mobile internet users projected by 2012. Younger users between ages 18-30 make up the largest demographic of mobile internet users in China. Popular mobile internet activities include using mobile browsers to access news, social networking, search, and entertainment such as music and videos.
Mobile Web Content And Services In Europekevin han
This document summarizes the evolution of mobile content and services in Europe over the past 10 years. It describes the transformation from basic phones to smartphones, the rise of mobile apps, and increasing smartphone and mobile web penetration across different European countries. It also discusses challenges like device diversity, localizing for different markets and carriers, and the difficulty of making money from mobile content. Finally, it provides best practices for success in Europe and predicts further growth of app stores and changes to operators' roles in the coming years.
1. Yahoo! Cloud Serving Benchmark
Overview and results – February 3, 2010
Brian F. Cooper
cooperb@yahoo-inc.com
Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears
System setup and tuning assistance from members of the Cassandra and HBase committers, and the Sherpa engineering team
2. Versions of this deck
• V4.1 – Original set of results from benchmark
• V4.2 – Added Cassandra 0.5 versus 0.4.2 comparison, Cassandra range query results, and varying scan size results
3. Motivation
• There are many “cloud DB” and “nosql” systems out there
– Sherpa/PNUTS
– BigTable
• HBase, Hypertable, HTable
– Megastore
– Azure
– Cassandra
– Amazon Web Services
• S3, SimpleDB, EBS
– CouchDB
– Voldemort
– Dynomite
– Etc: Tokyo, Redis, MongoDB
• How do they compare?
– Feature tradeoffs
– Performance tradeoffs
– Not clear!
4. Goal
• Implement a standard benchmark
– Evaluate different systems on common workloads
– Focus on performance and scale out
• Future additions – availability, replication
• Artifacts
– Open source workload generator
– Experimental study comparing several systems
5. Benchmark tool
• Java application
– Many systems have Java APIs
– Other systems via HTTP/REST, JNI or some other solution
Command-line parameters:
• DB to use
• Target throughput
• Number of threads
• …
Workload parameter file:
• R/W mix
• Record size
• Data set
• …
[Diagram: the YCSB client – a workload executor driving client threads, with stats collection and a pluggable DB client layer – reads the workload parameter file and issues operations against the cloud DB]
Extensible: define new workloads
Extensible: plug in new clients
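For reference, the open-sourced YCSB client (released after this deck) is driven from the command line roughly as below; the flags and class names come from later public releases and may not match the internal version benchmarked here:

```shell
# Load the data set, then run the transaction phase against Cassandra.
# -P       : workload parameter file
# -db      : DB interface class to use
# -target  : offered throughput (ops/sec)
# -threads : number of client threads
java -cp ycsb.jar:cassandra-binding.jar com.yahoo.ycsb.Client -load \
     -db com.yahoo.ycsb.db.CassandraClient -P workloads/workloada
java -cp ycsb.jar:cassandra-binding.jar com.yahoo.ycsb.Client -t \
     -db com.yahoo.ycsb.db.CassandraClient -P workloads/workloada \
     -target 5000 -threads 100
```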
6. Workloads
• Workload – a particular combination of workload parameters, defining one workload
– Defines read/write mix, request distribution, record size, …
– Two ways to define workloads:
• Adjust parameters to an existing workload (via properties file)
• Define a new kind of workload (by writing Java code)
• Experiment – running a particular workload on a particular hardware setup to produce a single graph for 1 or N systems
– Example – vary throughput and measure latency while running a workload against Cassandra and HBase
• Workload package – a collection of related workloads
– Example: CoreWorkload – a set of basic read/write workloads
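A properties file configuring a Workload-A-style CoreWorkload run might look like the sketch below (property names are taken from the open-source YCSB release; the counts are illustrative, not the values used in these experiments):

```properties
# 50/50 read/update mix over 1 KB records (10 fields x 100 bytes)
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=1000000
operationcount=10000000
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
fieldcount=10
fieldlength=100
requestdistribution=zipfian
```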
7. Benchmark tiers
• Tier 1 – Performance
– For constant hardware, increase offered throughput until saturation
– Measure resulting latency/throughput curve
– “Sizeup” in Wisconsin benchmark terminology
• Tier 2 – Scalability
– Scaleup – Increase hardware, data size and workload proportionally. Measure latency; should be constant
– Elastic speedup – Run workload against N servers; while the workload is running, add an (N+1)th server; measure timeseries of latencies (should drop after adding the server)
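The Tier-1 "sizeup" loop can be sketched in a few lines of Python. This is a toy, single-threaded stand-in: `run_point` and the dictionary "database" are illustrative only, and in a real run each point on the latency/throughput curve would come from a separate, much longer experiment against the system under test:

```python
import time


def run_point(target_ops_per_sec, duration_s=0.2):
    """Offer load at a fixed rate; return mean latency (ms) and
    achieved throughput (ops/sec) for one point on the curve."""
    interval = 1.0 / target_ops_per_sec
    store = {}                            # stand-in for the cloud DB
    latencies = []
    ops = 0
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        t0 = time.perf_counter()
        store[ops % 1000] = "x" * 100     # the "database operation"
        latencies.append(time.perf_counter() - t0)
        ops += 1
        time.sleep(interval)              # pace the offered throughput
    mean_ms = 1000 * sum(latencies) / len(latencies)
    return mean_ms, ops / duration_s


# Increase offered throughput until the system saturates
for target in (100, 500, 1000):
    mean_ms, achieved = run_point(target)
    print(f"target={target:5d} achieved={achieved:7.0f} mean={mean_ms:.3f} ms")
```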
8. Test setup
• Setup
– Six server-class machines
• 8 cores (2 x quad-core) 2.5 GHz CPUs, 8 GB RAM, 6 x 146 GB 15K RPM SAS drives in RAID 1+0, Gigabit Ethernet, RHEL 4
– Plus extra machines for clients, routers, controllers, etc.
– Cassandra 0.4.2
– HBase 0.20.2
– MySQL 5.1.32 organized into a sharded configuration
– Sherpa 1.8
– No replication; force updates to disk (except HBase, which does not yet support this)
• Workloads
– 120 million 1 KB records = 20 GB per server
– Reads retrieve the whole record; updates write a single field
– 100 or more client threads
• Caveats
– Write performance would be improved for Sherpa, sharded MySQL and Cassandra with a dedicated log disk
– We tuned each system as well as we knew how, with assistance from the teams of developers
9. Workload A – Update heavy
• 50/50 Read/update
[Charts: Workload A read latency and update latency (ms) versus throughput (ops/sec) for Cassandra, HBase, Sherpa and MySQL]
Comment: Cassandra is optimized for writes, and has better write latency. However, Sherpa has pretty good write latency, comparable read latency, and comparable peak throughput. HBase has good write latency because it does not sync updates to disk, at the cost of lower durability; but its read latency is very bad.
10. Workload B – Read heavy
• 95/5 Read/update
[Charts: Workload B read latency and update latency (ms) versus throughput (ops/sec) for Cassandra, HBase, Sherpa and MySQL]
Comment: Sherpa does very well here, with better read and write latency and peak throughput than Cassandra, and better read latency and peak throughput than HBase. Again, HBase write latency is very low because of no disk syncs. Buffer pool architecture is good for random reads.
11. Workload E – short scans
• Scans of 1-100 records of size 1KB
[Chart: Workload E average scan latency (ms) versus throughput (ops/sec) for HBase, Sherpa and Cassandra]
Comment: HBase and Sherpa are roughly equivalent for latency and peak throughput, even though HBase is “meant” for scans. Cassandra’s performance is poor, but the development team notes that many optimizations still need to be done.
12. Workload E – range size
• Vary size of range scans
[Chart: average range scan latency (ms) versus max range size (records) for HBase and Sherpa]
Comment: For small ranges, queries are similar to random lookups; Sherpa is efficient for random lookups and does well. As the range increases, HBase begins to perform better since it is optimized for large scans.
13. Scale-up
• Read heavy workload with varying hardware
[Chart: average read latency (ms) versus number of servers for Cassandra, HBase and Sherpa]
Comment: Sherpa scales well, with flat latency as system size increases. Cassandra scales less well, with more P2P communication. HBase is very unstable; three servers or fewer perform very poorly. More experiments are needed to get more data points on these curves.
14. Elasticity
• Run a read-heavy workload on 3 servers; add a 4th server after 5 minutes
[Chart: Cassandra elastic read performance – average read latency (ms) over time (min)]
Comment: Cassandra shows nice elasticity; after a fourth server is added, average latency of requests quickly drops by 11% with little or no disruption.
15. Elasticity
• Run a read-heavy workload on 3 servers; add a 4th server after 5 minutes
[Chart: HBase elastic read performance (detail) – average read latency (ms) over time (min)]
Comment: HBase initially exhibits a large latency spike, with some requests taking as much as 1000 ms; then latency settles down and eventually becomes 12% lower than before the server was added.
16. Cassandra 0.5 Results
Workload A - Update heavy
[Chart: Workload A average latency (ms) versus throughput (ops/sec) for Cassandra 0.5 and 0.4.2, reads and updates]
17. Cassandra 0.5 Results
Workload B - Read heavy
[Chart: Workload B average latency (ms) versus throughput (ops/sec) for Cassandra 0.5 and 0.4.2, reads and updates]
18. For more information
• Contact: Brian Cooper (cooperb@yahoo-inc.com)
• Detailed writeup of benchmark:
http://www.brianfrankcooper.net/pubs/ycsb.pdf
• Open source YCSB tool coming soon