This document discusses PlanDas Cache Cloud, a caching solution. It begins by covering concepts like availability, performance, reliability, and manageability as they relate to caching. It then discusses the differences between distributed and global caching approaches. The document outlines how caching can improve performance for web services and help address bottlenecks. It introduces the PlanDas Cache Cloud architecture, which uses consistent hashing for high availability. The document shows how the solution provides a global cache, multi-tenancy, and high performance. It also covers the web management interface and similarities to Redis APIs. Finally, it shares performance test results on AWS and physical machines that show throughput scaling as nodes are added.
9. Disk vs. SSD vs. Memory
From “Scalable Web Architecture And Distributed Systems”
From “The Pathologies of Big Data” (reference site)
Cache : Distributed vs. Global
11. Cache for performance
Local cache / Nodes with local cache
From “Scalable Web Architecture And Distributed Systems”
Cache : Distributed vs. Global
12. Cache: Distributed vs. Global
Distributed cache / Global cache
From “Scalable Web Architecture And Distributed Systems”
Cache : Distributed vs. Global
13. Distributed cache features
• The request nodes do not all hold the same cached data at first.
• The request nodes exchange cached data with one another so that the data becomes the same everywhere.
• As the number of nodes grows, this synchronization traffic increases.
Distributed Cache
From “Scalable Web Architecture And Distributed Systems”
Cache : Distributed vs. Global
14. Global cache Features
• All request nodes see the same cached data.
• The request nodes do not need to exchange cached data with one another.
Global Cache
From “Scalable Web Architecture And Distributed Systems”
Cache : Distributed vs. Global
15. Global Cache
Responsible global cache / Simple global cache
From “Scalable Web Architecture And Distributed Systems”
Responsible global cache:
1. For each request, the Request Node checks the global cache first.
2. If the data is not in the cache, the cache itself pulls the data from the origin, then adds the result to the cache for future requests.

Simple global cache:
1. For each request, the Request Node checks the global cache first.
2. If the data is not in the cache, the Request Node retrieves it from the origin.
3. Once the data is retrieved from the origin, the Request Node adds it to the cache. (A sketch of this lookup path follows this slide.)
Cache : Distributed vs. Global
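To make the simple model concrete, here is a minimal Java sketch of the cache-aside lookup described above, where the request node itself populates the cache on a miss. The `Origin` interface and class names are illustrative assumptions, not part of PlanDas.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal cache-aside sketch of the "simple global cache" model.
// Origin stands in for the data source (e.g. a DBMS query).
interface Origin {
    String load(String key);
}

public class SimpleGlobalCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Origin origin;

    public SimpleGlobalCache(Origin origin) {
        this.origin = origin;
    }

    // 1. Check the global cache first.
    // 2. On a miss, the *request node* retrieves the value from the origin.
    // 3. The request node then adds the value to the cache for future requests.
    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = origin.load(key); // miss: go to the origin
            cache.put(key, value);    // populate for future requests
        }
        return value;
    }
}
```

In the responsible model, the miss handling (steps 2 and 3) moves inside the cache itself, so callers only ever talk to the cache (read-through), as a DBMS buffer or a web cache such as Squid or Varnish does.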
24. Plandas Cache Cloud
1. Plandas Architecture
2. HA: Auto Fail-Over (Client/Server)
3. Web (Admin) Management
4. On-demand provisioning: a service can be used immediately upon request
5. Similar to Redis API
41. Plandasj API
Category: Methods
Keys: exists, persist, type, ttl, del, etc.
Strings: set, get, setbit, getbit, decrBy, incrBy, etc.
Hashes: hset, hget, hmget, hmset, hincrBy, hdel, etc.
Lists: rpush, lpush, llen, lrem, lpop, rpop, etc.
Sets: sadd, spop, smembers, etc.
SortedSets: zadd, zcard, zcount, zincrby, zrange, zrangeByScore, etc.
However, aggregation methods and methods that take two or more keys are not supported,
e.g. the Sets commands sinter, sinterstore, sunion.
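Since Plandasj mirrors the Jedis API, a usage sketch can be written with Jedis calls; the host, port, and key names below are examples, and it is an assumption that Plandasj accepts the same method names listed in the table above.

```java
import redis.clients.jedis.Jedis;

// Usage sketch with the Jedis client, which Plandasj mirrors.
// Host, port, and key names are illustrative.
public class PlandasjStyleExample {
    public static void main(String[] args) {
        Jedis client = new Jedis("cache-node.example.com", 6379);

        // Strings
        client.set("greeting", "hello");
        System.out.println(client.get("greeting"));   // hello
        client.incrBy("counter", 5);

        // Hashes
        client.hset("user:1", "name", "kim");
        System.out.println(client.hget("user:1", "name"));

        // Lists
        client.lpush("queue", "job-1");
        System.out.println(client.rpop("queue"));      // job-1

        // Sets (single-key operations only; sinter/sunion are unsupported)
        client.sadd("tags", "cache", "cloud");
        System.out.println(client.smembers("tags"));

        // Sorted sets
        client.zadd("scores", 42.0, "player-1");
        System.out.println(client.zrangeByScore("scores", 0, 100));

        // Keys
        System.out.println(client.ttl("greeting"));
        client.del("greeting");

        client.close();
    }
}
```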
43. Test environment on AWS
• Test tool : ngrinder 3.1
• Agent VMs: 10 (m1.large)
• Cache Cloud (TPS increases as the number of nodes grows)
• VMs: 3
• Nodes per VM: 2
• Total nodes: 6
51. NEXT
• Support for almost all languages (via a proxy server): completed
• Multi-IDC: completed
• Spatial / Data Store Cloud
• From cache to data store
52. I skate to where the puck is going to be,
not where it has been.
Wayne Gretzky (NHL)
Editor's Notes
Plandas / Cache Cloud
Start: the concept.
Cache composition models.
Caching within a service.
Plandas
The initial goal, starting from the concept, is a cache.
About caching.
A typical web service architecture.
Data -> Database, No-SQL, remote resource.
Users generally want specific information shown on screen.
Typically, data is stored in and queried from a DBMS.
In the end, DBMS performance determines the performance of the whole system.
This figure compares performance by storage location, for random and sequential access.
As a relative comparison:
Random: 1 vs. 6 vs. 120,000.
Sequential: 1.2 vs. 1 vs. 8.5.
Users generally want specific information shown on screen.
Typically, data is stored in and queried from a DBMS.
In the end, DBMS performance determines the performance of the whole system.
In price/performance terms, keeping data in memory means having it ready in advance.
If the data is already in the app server's memory, there is no need to travel far to look it up.
Architecturally, there are two patterns for holding data in advance:
the app server itself holds it, or it is kept in a separate in-memory store.
The biggest problem with caching inside the app server is that
app servers run stateless behind round-robin load balancing,
so the data must be identical across nodes, which incurs the cost of exchanging data between them.
A global cache incurs network cost, but caches the data in one place.
If the design accounts for that network cost, it becomes the best solution.
The most representative use is session storage.
The downside is that care is needed around inserting and evicting cached data.
Looking more closely at the global cache model,
it divides into a responsible model and a simple model.
Responsible: a DBMS's buffer, result caches, and web caches (Squid, Varnish).
Simple: Redis, Memcached.
So where can the cache we have been discussing be used?
The model that comes to mind first is improving DBMS performance.
The ratio of create/update/delete to read is usually around 1:9.
To improve read performance, the DB is tuned, joins are made more efficient, and so on,
but DBMS performance is fundamentally limited, so the reads, which consume the most resources, must be made efficient.
The Redis model also supports acting as a data store.
In a Redis-style setup such as Plandas:
an in-memory key-value database.
In pub/sub models on an MQ, the message queue also uses a DBMS to store messages.
The same performance problem occurs; it becomes a bottleneck.
It can stand in both as a cache model and as a data store model.
And it does not end there: concurrent access from multiple services is also possible.
Beyond a cache, to sharing; and further, to a data store.
Plandas's goal is an in-memory data store. That is still some way off,
so it was built with the focus on caching first.
The order of the Plandas walkthrough:
Architecture.
HA: fail-over, absolutely critical.
Simple management.
Services usable immediately.
Familiarity for users.
Broadly, three layers.
The key components are the Library and the Distributor.
For uninterrupted service: Lib, Coordinator, Distributor (Container), Storage.
Looking at the flow:
0: DNS lookup.
1 -> 2 -> 3 -> 4 -> 5 -> 6.
Distributed client/server; a key hash function.
Next slide.
On a Ketama consistent-hashing ring,
tens of thousands to hundreds of thousands of virtual nodes are created,
and the hash function is applied to each key to distribute it.
In the simplest case, think of a mod function.
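A minimal Java sketch of the Ketama-style ring described here, with many virtual nodes per physical node; the MD5-based position function, replica count, and node addresses are illustrative, not PlanDas's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Ketama-style consistent-hashing ring with virtual nodes.
// A sketch of the general technique; the number of virtual nodes
// per physical node and the addresses are illustrative.
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodesPerNode;

    public ConsistentHashRing(int virtualNodesPerNode) {
        this.virtualNodesPerNode = virtualNodesPerNode;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.put(hash(node + "#" + i), node); // many ring points per physical node
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's position to the first virtual node.
    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("empty ring");
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // First 8 bytes of an MD5 digest packed into a long ring position.
    private static long hash(String s) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] d = md5.digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(10_000);
        ring.addNode("10.0.0.1:6379");
        ring.addNode("10.0.0.2:6379");
        System.out.println(ring.nodeFor("user:42")); // same key -> same node
    }
}
```

Unlike the simple mod function, adding or removing a node only remaps the keys whose ring positions land on that node's virtual points, rather than reshuffling the whole key space.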
Next: HA.
Looking at client fail-over:
the client maintains the list of active addresses internally.
A failure threshold is applied;
invalid addresses are removed and only valid addresses are used.
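A minimal sketch of the threshold-based client fail-over described here; all class, method, and field names are hypothetical, not the actual Plandas client code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of threshold-based client-side fail-over: the client tracks an
// active address list, counts consecutive failures per address, and
// removes an address once the failure threshold is reached.
public class FailoverAddressList {
    private final List<String> activeAddresses = new CopyOnWriteArrayList<>();
    private final Map<String, AtomicInteger> failures = new ConcurrentHashMap<>();
    private final int threshold;

    public FailoverAddressList(List<String> addresses, int threshold) {
        this.activeAddresses.addAll(addresses);
        this.threshold = threshold;
    }

    /** Pick any currently valid address (round-robin, random, etc.). */
    public String pick() {
        if (activeAddresses.isEmpty()) throw new IllegalStateException("no valid addresses");
        return activeAddresses.get(0);
    }

    /** A successful call resets the failure counter for that address. */
    public void reportSuccess(String address) {
        failures.remove(address);
    }

    /** Count a failure; past the threshold the address is removed as invalid. */
    public void reportFailure(String address) {
        int count = failures.computeIfAbsent(address, a -> new AtomicInteger())
                            .incrementAndGet();
        if (count >= threshold) {
            activeAddresses.remove(address); // stop routing to the dead node
            failures.remove(address);
        }
    }
}
```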
A cache node stores its share of the data distributed across the continuum,
as determined by the hash function (in practice each physical machine maps to many virtual nodes, ten thousand or more).
When a machine fails, fail-over proceeds on the client and the server at the same time:
the client fail-over mentioned earlier, plus server-side fail-over.
A brief recap of Plandas.
The "cloud" part is managed through the web service, covered next.
Configuration management for admins.
Traffic monitoring is also provided to the services themselves.
An integrated solution.
The web admin is the web-facing front end;
in practice the management server performs that role.
Statistics.
Cache nodes are managed through the CSMap.
Admins can look up container server information.
To create a service,
the admin creates a service code and an auth code for access;
after that, all that is needed is a cache node to connect to.
The next slide covers the API, which provides the Redis commands.
It compares the Redis Java client, Jedis,
with the dedicated client, Plandasj.
Connection, trigger, pub/sub, and server commands are not supported,
nor are the aggregation functions (union, intersection).
The API provided is nearly identical to Jedis.
The alpha is the next step, planned to provide clients for the various Redis language bindings.
So, what about performance?
AWS and a physical machine (24 cores, 32 GB of memory).
Testing was run with nGrinder.
Scale-up is possible.
Trends by data size on AWS.
Note that AWS instances vary over time, and results depend on AWS instance (VM) performance.
On a physical machine, performance is roughly 2-4x higher, and network latency is negligible, so it has no effect.
The numbers opposite the TPS values are response times in ms (mtt = response time).
Note: in the graph below, a vuser processes more TPS when the mtt is short.
Trends by data size on AWS.
Note that AWS instances vary over time, and results depend on AWS instance (VM) performance.
On a physical machine, performance is roughly 2-4x higher, and network latency is negligible, so it has no effect.
The numbers opposite the TPS values are response times in ms (mtt = response time).
Note: in the graph below, a vuser processes more TPS when the mtt is short.
Node capacity measurement: checking whether adding a fixed amount of resources adds a fixed amount of capacity.
Baseline: 80 TPS (3.2 ms) at a load of 1.
1 node: 12,000 TPS at a load of 150, 3.6 ms.
2 nodes: 22,000 TPS at a load of 300, 6 ms.
3 nodes: 32,000 TPS at a load of 450, 8.3 ms.
Going from a single-machine setup to two or more machines introduces network cost.
Given that this is AWS, roughly 2 ms of that shows up in the response time.
So the conclusion is that adding machines does not increase capacity in exact proportion; what, then, is the slope?
The principle: you gain a much larger pool of shared capacity, while the network cost grows slightly, bit by bit.
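To put a number on that slope, a quick computation over the TPS figures quoted above (nothing here beyond those reported values) shows per-node throughput drifting down while each added node contributes a steady marginal gain:

```java
// Scaling slope from the reported AWS numbers: total TPS per node count.
public class ScalingSlope {
    public static void main(String[] args) {
        int[] nodes = {1, 2, 3};
        int[] tps = {12_000, 22_000, 32_000};
        for (int i = 0; i < nodes.length; i++) {
            System.out.printf("%d node(s): %,d TPS, %,d TPS/node%n",
                    nodes[i], tps[i], tps[i] / nodes[i]);
            if (i > 0) {
                System.out.printf("  marginal gain: %,d TPS per added node%n",
                        tps[i] - tps[i - 1]);
            }
        }
        // Per-node throughput falls from 12,000 to ~10,667 TPS, while each
        // added node contributes a steady ~10,000 TPS — near-linear growth,
        // slightly sub-proportional once network cost appears.
    }
}
```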
Next, the physical machine.
The same procedure as the earlier AWS test.
In the end, the expected TPS was met 100%.
Viewed as a graph:
the figures on the left are TPS,
and on the right are response times in ms.
Performance by stored value size, from 4 KB to 10 KB, can be seen at a glance.
The purple dotted line can be read as the average for the 1 KB size.