EC2 NoSQL Benchmarking


This presentation contains information on the test environment, settings, major criteria for evaluation, and component diagrams that can help you to test a NoSQL data store for your project. It also provides a matrix that compares a number of NoSQL products based on our test results. We also list the issues we encountered and some approaches we used to overcome them. For more independent research into Hadoop, NoSQL, and other big data technologies, please visit www.altoros.com/research-papers or follow @altoros.


© ALTOROS | CONFIDENTIAL
Vitaly Rudenya
Team Leader
vitaly.rudenya@altoros.com

Bi-directional replication across multiple data centers (regions)
Support for active/active reads across regions
Automatic resynchronization of data between regions after a loss of connectivity between them
Support for encryption of data replication traffic across regions
Support for active/active writes across regions
Configurable replication factor (within a cluster in a single region and across regions)
Tunable consistency for reads and writes
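The slide does not say how the evaluated products expose tunable consistency. One common scheme, assumed here purely for illustration, lets each request choose how many replicas must answer a read (R) and acknowledge a write (W) out of N replicas; a read is guaranteed to touch at least one replica holding the latest acknowledged write when R + W > N. The function name below is hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum intersects every write quorum.

    n: replication factor, r: replicas consulted per read,
    w: replicas that must acknowledge a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # Pigeonhole argument: r + w > n forces at least one shared replica.
    return r + w > n


# Replication factor 3: quorum reads plus quorum writes overlap,
# while reading and writing a single replica each does not.
strong = quorums_overlap(3, 2, 2)
weak = quorums_overlap(3, 1, 1)
```

Lower R and W trade this overlap guarantee for latency, which is the tuning knob the requirement refers to.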

[Chart: Results. Axis scale 0 to 1600; the plotted data is not recoverable from the extracted text.]


Performance verification tests evaluate the capabilities of the system under different workloads and provide the required statistics on the tunable requests.
For performance verification, all workloads 1-10 should be run as they are defined in the workload descriptions.
The main goal is to determine, for each workload:
maximum possible throughput
optimal number of threads
optimal latency
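The three goals above can be derived from a sweep over thread counts. The sketch below is one possible reading, not the authors' procedure: it assumes "optimal number of threads" means the smallest thread count whose throughput is within a tolerance of the peak, and "optimal latency" is the latency observed at that thread count. The `Run` type, the `summarize` function, and the sample numbers are all hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    threads: int
    throughput_ops: float   # operations per second
    avg_latency_ms: float


def summarize(runs, tolerance=0.05):
    """Pick peak throughput and the cheapest thread count that gets close to it."""
    best = max(runs, key=lambda r: r.throughput_ops)
    # Keep runs within `tolerance` of the peak, then prefer the fewest threads.
    near_peak = [r for r in runs
                 if r.throughput_ops >= best.throughput_ops * (1 - tolerance)]
    optimal = min(near_peak, key=lambda r: r.threads)
    return {
        "max_throughput_ops": best.throughput_ops,
        "optimal_threads": optimal.threads,
        "latency_at_optimal_ms": optimal.avg_latency_ms,
    }


# Illustrative placeholder numbers only.
sample_runs = [
    Run(8, 4000.0, 2.0),
    Run(16, 7600.0, 2.1),
    Run(32, 9800.0, 3.3),
    Run(64, 10000.0, 6.4),
]
summary = summarize(sample_runs)
```

With these placeholder runs, 32 threads is chosen over 64 because it reaches within 5% of the peak at roughly half the latency.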

• 3 availability zones in one region
• A single security group with all required ports open
• 3 i2.4xlarge 64-bit instances for cluster nodes: 122 GB RAM, 16 vCPU, 53 ECU, high-performance network

• 120 GB SSD ephemeral volume

• No more than 10k ops/sec from a single YCSB client
• Only minor performance changes when the number of threads or YCSB processes on the same client server is increased
• Low resource utilization (network, HDD, memory, CPU)
• Cluster performance increased linearly with each additional YCSB server

• An Amazon VPN was created
• We used 9 YCSB nodes to load the cluster
• Servers were configured to use the same placement group
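The observations above imply simple sizing arithmetic: if one YCSB client tops out at about 10k ops/sec and load grows linearly with client servers, the 9 client nodes could offer at most roughly 90k ops/sec. The helper names below are hypothetical, and the constant is the approximate per-client ceiling reported on the slide, not a general property of YCSB.

```python
import math

PER_CLIENT_CAP_OPS = 10_000  # approximate ceiling observed for one YCSB client


def max_offered_load(client_nodes, per_client_cap=PER_CLIENT_CAP_OPS):
    """Upper bound on offered load, assuming linear scaling across clients."""
    return client_nodes * per_client_cap


def clients_needed(target_ops, per_client_cap=PER_CLIENT_CAP_OPS):
    """Smallest number of client nodes able to offer `target_ops`."""
    return math.ceil(target_ops / per_client_cap)


ceiling_with_nine_nodes = max_offered_load(9)
```

If the cluster can sustain more than this ceiling, the benchmark measures the clients rather than the database, which is why the client tier had to be scaled out.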

Altoros, 2014



Editor's Notes

Often referred to as NoSQL, non-relational databases feature elasticity and scalability. In addition, they can store big data and work with cloud computing systems. All of these factors make them extremely popular.
For the purpose of the test, we divided the databases into three categories. Each database was evaluated based on 22 criteria.
NoSQL solutions address many of these problems.
POINT 1: In 2013, the number of NoSQL products reached 150+, and the figure is still growing. That variety makes it difficult to select the best tool for a particular case.
POINT 2: They come in many types: key-value, columnar, document-oriented, and graph.
POINT 3: All NoSQL databases have one thing in common: they don't use the relational data model. This means they do not use the SQL query language.
POINT 4: NoSQL data management systems are inherently schema-free (with no obsessive complexity and a flexible data model) and eventually consistent (complying with BASE rather than ACID).
POINT 5: They provide APIs to perform various operations. Some NoSQL data stores support query language operations, for example, Cassandra and HBase. However, there is no standard. This is another difference between NoSQL databases and traditional RDBMS.
POINT 6: RDBMS usually have strong data consistency. In contrast, NoSQL data stores operate with eventual consistency: when you add data to the system, it becomes consistent after some time.
POINT 7: NoSQL architectures are designed to run in clusters that consist of several nodes. This makes it possible to scale them horizontally by increasing the number of nodes. In addition, NoSQL data stores serve huge amounts of data and provide high throughput.
POINT 1: NoSQL databases differ from RDBMS in their data models. These systems can be divided into 4 groups:
A. Key-value stores. Key-value stores are similar to maps or dictionaries, where data is addressed by a unique key.
B. Document stores. Document stores encapsulate key-value pairs in JSON or JSON-like documents. Within documents, keys have to be unique. In contrast to key-value stores, values are not opaque to the system and can be queried as well.
C. Column family stores. Column family stores are also known as column-oriented stores, extensible record stores, and wide columnar stores.
D. Graph databases. Key-value stores, document stores, and column family stores have a common feature: they store denormalized data in order to gain advantages in distribution. In contrast to relational databases and the key-oriented NoSQL databases already introduced, graph databases specialize in efficient management of heavily linked data.
POINT 2: All NoSQL data stores have an API to work with data. Some DBs use certain SQL operations. Others support MapReduce aggregation.
POINT 3: Multiversion concurrency control (MVCC) relaxes strict consistency in favor of performance. In order to support transactions without reserving multiple datasets for exclusive access, many stores provide optimistic locking. Before changed data is committed, each transaction checks whether another transaction has made any conflicting modifications to the same datasets.
POINT 4: NoSQL databases differ in the way they distribute data across multiple machines. Since the data models of key-value stores, document stores, and column family stores are key-oriented, the two common partitioning strategies are based on keys, too.
The first strategy distributes datasets by the range of their keys. A routing server splits the whole keyset into blocks and allocates these blocks to different nodes. Afterwards, each node is responsible for storage and request handling of its specific key ranges. In order to find a certain key, clients have to contact the routing server to get the partition table.
Higher availability and a much simpler cluster architecture can be achieved with the second distribution strategy, called consistent hashing. In contrast to range-based partitioning, keys are distributed by using hash functions. Since every server is responsible for a certain hash region, the addresses of certain keys within the cluster can be calculated very fast.
In addition to better read performance through load balancing, replication also brings better availability and durability, because failing nodes can be replaced by other servers. If all replicas of a master server were updated synchronously, the system would not be available until all slaves had committed a write operation. If messages got lost due to network problems, the system would not be available for a longer period of time. This solution is not suitable for platforms that rely on high availability, because even a few milliseconds of latency can have a big influence on user behavior.
POINT 5 (PERFORMANCE: TYPICAL WORKLOADS): Obviously, performance is a very important factor. The performance of data storage solutions can be evaluated using typical scenarios. These scenarios, also known as typical workloads, simulate the most common operations performed by applications that use the data store. The tests that we performed to compare the performance of several NoSQL data stores also used typical workloads.
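The consistent hashing strategy described under POINT 4 can be sketched as a hash ring. This is a minimal illustration, not how any of the tested products implement it: the class name, the use of SHA-256, and the "virtual nodes" (several ring positions per server, a common refinement that the notes do not mention) are assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes; each node owns the hash regions ending at its points."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._hashes = []   # sorted ring positions
        self._owners = []   # node owning the position at the same index
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self._vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._owners.insert(idx, node)

    def remove_node(self, node):
        kept = [(h, o) for h, o in zip(self._hashes, self._owners) if o != node]
        self._hashes = [h for h, _ in kept]
        self._owners = [o for _, o in kept]

    def node_for(self, key):
        if not self._hashes:
            raise LookupError("ring has no nodes")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[idx]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user42")
```

Any client can compute a key's location locally, with no routing server, and removing a node only relocates the keys that node owned.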
1. Device sync
Upload of raw payload from devices.
Config
Table: SYNC
Initial records: imported on initial workload
Operations: 100% custom inserts to SYNC
Custom operation
Parameters: PAYLOAD
Steps:
Generate SYNC_ID
Read PAYLOAD value from pregenerated file
Insert new record (SYNC_ID, PAYLOAD) into SYNC

2. Add to Shopping Cart
Consumer would add to the cart and may not buy: reading from INVENTORY, inserting to ORDERS.
Config
Table (read): INVENTORY
Table (insert): ORDERS
Initial records: imported on initial workload
Operations: 100% adding to Shopping Cart: read of INVENTORY table 50%, insert to ORDERS 50%
Custom operation
Parameters:
USER_ID
SKU_ID
QTY := 1
STATE := 'INCOMPLETE'
Steps:
Retrieve row from INVENTORY with given SKU_ID
Generate ORDER_ID
Insert new Order record (USER_ID, ORDER_ID, SKU_ID, QTY, CR_DATE := SYSDATE, STATE := 'INCOMPLETE') to ORDERS

3. Profile registrations
Registration of new users in a system.
Config
Table: USER
Initial records: imported on initial workload
Operations: 100% adding of new profile: read (checking if user with such an email exists) 50%, insert of new profile 50%
Custom operation
Parameters:
Password := '1234'
Shipping addr := '1 Bowerman Drive, Beaverton, OR, 97005'
Billing addr := '1 Bowerman Drive, Beaverton, OR, 97005'
Steps:
Generate a unique USER_ID [based on Java UUID]: must be unique
Generate FIRST_NAME, LAST_NAME: a random generator can be used, but names must be generated so that the EMAIL constructed from them is unique (see the next point)
Generate an EMAIL: FIRST_NAME.LAST_NAME@test.com, must be unique
LAST_ACTIVITY_DATE will be SYSDATE in TIMESTAMP format
Retrieve row from USER based on EMAIL (simulating a check of uniqueness)
INSERT to USER (USER_ID, EMAIL, FIRST_NAME, LAST_NAME, Password, Shipping addr, Billing addr)

4. Login
Login of user and updating of his last activity date.
Config
Table: USER
Initial records: imported on initial workload
Operations: 100% login of user: read from USER 50%, update USER 50%
Custom operation
Parameters:
User EMAIL
Steps:
Retrieve row from USER based on EMAIL
Set Last Activity Date to SYSDATE in TIMESTAMP format and update USER record

5. Order Create
Customer would add to shopping cart and check out.
Config
Table (read): INVENTORY
Table (update): INVENTORY
Table (insert): ORDERS
Initial records: imported on initial workload
Operations: 100% of create order operation: read from INVENTORY 33.3%, insert to ORDERS 33.3%, update of INVENTORY 33.3%
Custom operation
Parameters:
USER_ID from USER
Valid SKU_ID from INVENTORY
QTY := 2
STATE := 'COMPLETE'
Steps:
Retrieve row from INVENTORY based on SKU_ID
Check if LOCK is 'N'
Check QTY < AVAIL_QTY
Subtract QTY from AVAIL_QTY in INVENTORY row for SKU_ID
Set LOCK to 'Y' for SKU_ID [no other user should be able to place an order for this SKU as long as LOCK is 'Y'] and update this INVENTORY row
CR_DATE will be SYSDATE in TIMESTAMP format
INSERT to ORDERS (USER_ID, SKU_ID, QTY, CR_DATE, STATE)
Set LOCK to 'N' in INVENTORY row and update it again

6. Last 30 user activities
Checking for the last 30 activities on mobile app.
Config
Table: ACTIVITY
Initial records: imported on initial workload
Operations: custom query read 100%
Custom operation
Parameters:
USER_ID from USER
Steps:
Retrieve last 30 rows from ACTIVITY based on USER_ID and sort descending based on DATE column

7. Single activity detail based on activity id
Checking data of a single activity.
Config
Table: ACTIVITY
Initial records: imported on initial workload
Operations: custom read based on ACTIVITY_ID 100%
Custom operation
Parameters:
ACTIVITY_ID
Steps:
Retrieve data from ACTIVITY based on ACTIVITY_ID

8. Total fuel for the last 30 activities
Aggregate query of total fuel earned by user for the last 30 activities.
Config
Table: ACTIVITY
Initial records: imported on initial workload
Operations: custom aggregated read 100%
Custom operation
Parameters:
USER_ID from USER
Steps:
Aggregate FUEL for the last 30 activities from ACTIVITY

9. Delete an activity based on activity id
User's deleting a single activity from their mobile app.
Config
Table: ACTIVITY
Initial records: imported on initial workload
Operations: 100% custom delete based on ACTIVITY_ID
Custom operation
Parameters:
ACTIVITY_ID
Steps:
Delete from ACTIVITY based on ACTIVITY_ID

10. Profile search based on the First Name and Last Name
User record lookup.
Config
Table: USER
Initial records: imported on initial workload
Operations: custom search read 100%
Custom operation
Parameters:
FirstName from USER
LastName from USER
Steps:
Retrieve data based on First name and Last name [case insensitive]
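The steps of workload 5 (order creation) can be traced against plain in-memory dictionaries. This is only a sketch of the control flow, with assumptions labeled: the function name, the dict-based "tables", and the generated UUID used as the ORDERS key are hypothetical, and a real data store would need an atomic conditional update for the LOCK check, which an in-process dict does not model.

```python
import uuid
from datetime import datetime, timezone


def create_order(inventory, orders, user_id, sku_id, qty=2):
    """Follow the workload 5 steps; return the new order key or None if rejected."""
    row = inventory[sku_id]                      # read from INVENTORY
    if row["LOCK"] != "N":                       # another order holds the SKU
        return None
    if not qty < row["AVAIL_QTY"]:               # check QTY < AVAIL_QTY, as specified
        return None
    row["AVAIL_QTY"] -= qty
    row["LOCK"] = "Y"                            # first INVENTORY update
    order_key = str(uuid.uuid4())                # assumed key; not part of the spec
    orders[order_key] = {
        "USER_ID": user_id,
        "SKU_ID": sku_id,
        "QTY": qty,
        "CR_DATE": datetime.now(timezone.utc),   # stands in for SYSDATE
        "STATE": "COMPLETE",
    }
    row["LOCK"] = "N"                            # second INVENTORY update
    return order_key


inventory = {"SKU-1": {"AVAIL_QTY": 5, "LOCK": "N"}}
orders = {}
first = create_order(inventory, orders, user_id="user-1", sku_id="SKU-1")
```

Each successful call performs one read, one insert, and two updates of the same INVENTORY row, which is worth keeping in mind when comparing throughput across workloads.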
Database vendors usually measure the performance of their products with custom hardware and software settings designed to demonstrate the advantages of their solutions. In our tests, we tried to see how NoSQL data stores perform under the same conditions.
POINT 1: For benchmarking, we used the Yahoo Cloud Serving Benchmark (YCSB). The core of YCSB is a framework with a workload generator that creates the test workload, together with a set of workload scenarios.
POINT 2: Developers need to describe the workload scenario by operation type: what operations are performed on what types of records.
POINT 3: Supported operations include insert, update (change one of the fields), read (one random field or all the fields of one record), and scan (read the records in key order, starting from a randomly selected record).
We can define the workload by the data that will be loaded into the database during the loading phase and the operations that will be executed against the data set during the transaction phase.
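Describing a workload "by operation type" amounts to assigning each operation a proportion and drawing operations at random according to that mix. The sketch below illustrates the idea in Python; it is not YCSB code (YCSB itself is a Java framework), and the function name is hypothetical. The example mix is workload 2 from these notes: 50% reads, 50% inserts.

```python
import random


def make_operation_chooser(proportions, seed=None):
    """Return a function that picks the next operation name by weighted draw."""
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("operation proportions must sum to 1.0")
    names = list(proportions)
    weights = [proportions[name] for name in names]
    rng = random.Random(seed)   # private generator so runs can be repeated

    def choose():
        return rng.choices(names, weights=weights, k=1)[0]

    return choose


# Workload 2 mix: read INVENTORY 50%, insert into ORDERS 50%.
choose = make_operation_chooser({"read": 0.5, "insert": 0.5}, seed=1)
sample = [choose() for _ in range(1000)]
```

Fixing the seed makes the operation sequence repeatable, which helps when the same scenario has to be replayed against several data stores.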
This is a component diagram of the YCSB framework, which consists of several modules. The workload executor applies the workload to the data store. For each session in which the client accesses the DB, a client thread is started. Each thread performs a set of operations from the workload. The results, in the form of statistics, are then sent to the statistics module, which prints the test output to the console where the benchmark was started. These tests are then repeated for all the selected solutions. The YCSB framework has connectors for a wide range of DBs. For each database tested with YCSB, a developer needs to specify the type of database, the target throughput, the number of concurrent threads on the client side, and the number of operations to perform. This is necessary to create and start a test.
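The module layout described above (client threads applying operations, a shared statistics module reporting at the end) can be mimicked in a few lines. This is a toy model under stated assumptions, not YCSB itself: `InMemoryStore` stands in for a database connector, and `Stats` and `run_workload` are hypothetical names. Target-throughput throttling is left out.

```python
import threading
import time


class InMemoryStore:
    """Hypothetical stand-in for a database connector."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def insert(self, key, value):
        with self._lock:
            self._data[key] = value

    def read(self, key):
        with self._lock:
            return self._data.get(key)


class Stats:
    """Collects per-operation latencies from all client threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latencies = []

    def record(self, seconds):
        with self._lock:
            self._latencies.append(seconds)

    def summary(self, elapsed):
        count = len(self._latencies)
        return {
            "operations": count,
            "throughput_ops": count / elapsed if elapsed > 0 else 0.0,
            "avg_latency_ms": 1000 * sum(self._latencies) / count if count else 0.0,
        }


def run_workload(store, threads, iterations):
    """Each thread runs `iterations` insert+read pairs, timing every operation."""
    stats = Stats()

    def timed(operation, *args):
        start = time.perf_counter()
        operation(*args)
        stats.record(time.perf_counter() - start)

    def client(thread_id):
        for i in range(iterations):
            key = f"user{thread_id}-{i}"
            timed(store.insert, key, {"field0": "x"})
            timed(store.read, key)

    workers = [threading.Thread(target=client, args=(t,)) for t in range(threads)]
    begin = time.perf_counter()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return stats.summary(time.perf_counter() - begin)


result = run_workload(InMemoryStore(), threads=4, iterations=50)
```

The thread count and operation count are the same knobs the notes say a developer must set before starting a test.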
