The document discusses performance tuning for HBase. It provides tips for garbage collection tuning, using memstore local allocation buffers, compression, optimizing splits and compactions, load balancing, merging regions, best client API practices, and configuration settings. It also recommends using tools like the HBase PerformanceEvaluation tool and YCSB for load testing HBase.
Efficient Memory Mapped File I/O for In-Memory File Systems (HotStorage '17)Jungsik Choi
Recently, with the emergence of low-latency NVM storage, software overhead has become a greater bottleneck than storage latency, and memory mapped file I/O has gained attention as a means to avoid software overhead. However, according to our analysis, memory mapped file I/O incurs a significant amount of additional overhead. To utilize memory mapped file I/O to its true potential, such overhead should be alleviated. We propose map-ahead, mapping cache, and extended madvise techniques to maximize the performance of memory mapped file I/O on low-latency NVM storage systems. This solution can avoid both page fault overhead and page table entry construction overhead. Our experimental results show throughput improvements of 38–70% in microbenchmarks and performance improvements of 6–18% in real applications compared to existing memory mapped I/O mechanisms.
Efficient Memory Mapped File I/O for In-Memory File Systems (HotStorage '17)Jungsik Choi
Recently, with the emergence of low-latency NVM storage, software overhead has become a greater bottleneck than storage latency, and memory mapped file I/O has gained attention as a means to avoid software overhead. However, according to our analysis, memory mapped file I/O incurs a significant amount of additional overhead. To utilize memory mapped file I/O to its true potential, such overhead should be alleviated. We propose map-ahead, mapping cache, and extended madvise techniques to maximize the performance of memory mapped file I/O on low-latency NVM storage systems. This solution can avoid both page fault overhead and page table entry construction overhead. Our experimental results show throughput improvements of 38–70% in microbenchmarks and performance improvements of 6–18% in real applications compared to existing memory mapped I/O mechanisms.
moxi is a memcached proxy with several features which can help keep the memcached contract whole in complicated environments. It also brings several optimizations to memcached deployments, without requiring any changes to the application software using memcached. For more, visit: http://labs.northscale.com/moxi/
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsCloudera, Inc.
The latest Apache HBase releases, 0.92 and 0.94, contain many improvements over prior releases in terms of correctness and performance improvements. We discuss a couple of these improvements from a development and operations perspective. For correctness, we discuss the ACID guarantees of HBase, give a case study of problems with earlier releases, and give an overview of the implementation internals that were improved to fix the issues. For performance, we discuss recent improvements in 0.94 and how to monitor the performance of a cluster with new metrics.
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
HBase 2.0 is the next stable major release for Apache HBase scheduled for early 2017. It is the biggest and most exciting milestone release from the Apache community after 1.0. HBase-2.0 contains a large number of features that is long time in the development, some of which include rewritten region assignment, perf improvements (RPC, rewritten write pipeline, etc), async clients, C++ client, offheaping memstore and other buffers, Spark integration, shading of dependencies as well as a lot of other fixes and stability improvements. We will go into technical details on some of the most important improvements in the release, as well as what are the implications for the users in terms of API and upgrade paths. Existing users of HBase/Phoenix as well as operators managing HBase clusters will benefit the most where they can learn about the new release and the long list of features. We will also briefly cover earlier 1.x release lines and compatibility and upgrade paths for existing users and conclude by giving an outlook on the next level of initiatives for the project.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
The tech talk was gieven by Ranjeeth Kathiresan, Salesforce Senior Software Engineer & Gurpreet Multani, Salesforce Principal Software Engineer in June 2017.
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
on-Volatile-Memory express (NVMe) standard promises and order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM on TB/$. This talk evaluates the use cases and benefits of NVMe drives for its use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
HBase has been in production in hundreds of clusters across the HDP customer base. In this talk, we will go over the best practices and lessons learned in supporting these clusters over the years. We will cover top 10 recurring issues like ZooKeeper, GC, number of regions, HBCK, operating system and coprocessor related issues and more across 1000+ support tickets. We will also cover common solutions and lessons learned and go into details of how to tune, monitor and operate your clusters. We will then cover some of the improvements that we have been adding to HBase and Ambari for easing up some of the pain.
Digital Library Collection Management using HBaseHBaseCon
Speaker: Ron Buckley (OCLC)
OCLC has been working over the last year to move its massive repository to HBase. This talk will focus on the impetus behind the move, implementation details and technology choices we've made (key design, shredding PDFs and other digital objects into HBase, scaling), and the value-add that HBase brings to digital collection management.
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
HBASE is the leading NoSQL database. Tightly integrated with Hadoop ecosystem, it offers random, real-time read/write capabilities on billions of rows and millions of columns. Apache Phoenix offers a SQL interface to HBASE, opening HBase to large community of SQL developers and enabling inter-operability with SQL compliant applications. The session will cover the essentials of HBASE and provide an in-depth insight into Apache Phoenix. Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community. Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=de6d0c435c0761adedf3114a100e7483%20
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
moxi is a memcached proxy with several features which can help keep the memcached contract whole in complicated environments. It also brings several optimizations to memcached deployments, without requiring any changes to the application software using memcached. For more, visit: http://labs.northscale.com/moxi/
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsCloudera, Inc.
The latest Apache HBase releases, 0.92 and 0.94, contain many improvements over prior releases in terms of correctness and performance improvements. We discuss a couple of these improvements from a development and operations perspective. For correctness, we discuss the ACID guarantees of HBase, give a case study of problems with earlier releases, and give an overview of the implementation internals that were improved to fix the issues. For performance, we discuss recent improvements in 0.94 and how to monitor the performance of a cluster with new metrics.
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
HBase 2.0 is the next stable major release for Apache HBase scheduled for early 2017. It is the biggest and most exciting milestone release from the Apache community after 1.0. HBase-2.0 contains a large number of features that is long time in the development, some of which include rewritten region assignment, perf improvements (RPC, rewritten write pipeline, etc), async clients, C++ client, offheaping memstore and other buffers, Spark integration, shading of dependencies as well as a lot of other fixes and stability improvements. We will go into technical details on some of the most important improvements in the release, as well as what are the implications for the users in terms of API and upgrade paths. Existing users of HBase/Phoenix as well as operators managing HBase clusters will benefit the most where they can learn about the new release and the long list of features. We will also briefly cover earlier 1.x release lines and compatibility and upgrade paths for existing users and conclude by giving an outlook on the next level of initiatives for the project.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
The tech talk was gieven by Ranjeeth Kathiresan, Salesforce Senior Software Engineer & Gurpreet Multani, Salesforce Principal Software Engineer in June 2017.
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
on-Volatile-Memory express (NVMe) standard promises and order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM on TB/$. This talk evaluates the use cases and benefits of NVMe drives for its use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
HBase has been in production in hundreds of clusters across the HDP customer base. In this talk, we will go over the best practices and lessons learned in supporting these clusters over the years. We will cover top 10 recurring issues like ZooKeeper, GC, number of regions, HBCK, operating system and coprocessor related issues and more across 1000+ support tickets. We will also cover common solutions and lessons learned and go into details of how to tune, monitor and operate your clusters. We will then cover some of the improvements that we have been adding to HBase and Ambari for easing up some of the pain.
Digital Library Collection Management using HBaseHBaseCon
Speaker: Ron Buckley (OCLC)
OCLC has been working over the last year to move its massive repository to HBase. This talk will focus on the impetus behind the move, implementation details and technology choices we've made (key design, shredding PDFs and other digital objects into HBase, scaling), and the value-add that HBase brings to digital collection management.
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
HBASE is the leading NoSQL database. Tightly integrated with Hadoop ecosystem, it offers random, real-time read/write capabilities on billions of rows and millions of columns. Apache Phoenix offers a SQL interface to HBASE, opening HBase to large community of SQL developers and enabling inter-operability with SQL compliant applications. The session will cover the essentials of HBASE and provide an in-depth insight into Apache Phoenix. Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community. Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=de6d0c435c0761adedf3114a100e7483%20
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
BigBase is a read-optimized version of HBase NoSQL data store and is FULLY, 100% HBase compatible. 100% compatibility means that the upgrade from HBase to BigBase and other way around does not involve data migration and even can be made without stopping the cluster (via rolling restart).
At Salesforce, we have deployed many thousands of HBase/HDFS servers, and learned a lot about tuning during this process. This talk will walk you through the many relevant HBase, HDFS, Apache ZooKeeper, Java/GC, and Operating System configuration options and provides guidelines about which options to use in what situation, and how they relate to each other.
Speaker: Vladimir Rodionov (bigbase.org)
This talks introduces a totally new implementation of a multilayer caching in HBase called BigBase. BigBase has a big advantage over HBase 0.94/0.96 because of an ability to utilize all available server RAM in the most efficient way, and because of a novel implementation of a L3 level cache on fast SSDs. The talk will show that different type of caches in BigBase work best for different type of workloads, and that a combination of these caches (L1/L2/L3) increases the overall performance of HBase by a very wide margin.
In this session you will learn:
HBase Introduction
Row & Column storage
Characteristics of a huge DB
What is HBase?
HBase Data-Model
HBase vs RDBMS
HBase architecture
HBase in operation
Loading Data into HBase
HBase shell commands
HBase operations through Java
HBase operations through MR
To know more, click here: https://www.mindsmapped.com/courses/big-data-hadoop/big-data-and-hadoop-training-for-beginners/
This is a copy of the NoSQL Day 2019 session presented in Washington D.C on May 2019. It covers a series of the most common HBase issues observed among Cloudera customer base, together with RCA and recipes for recovery.
HBase 2.0 is the next stable major release for Apache HBase scheduled for early 2017. It is the biggest and most exciting milestone release from the Apache community after 1.0. HBase-2.0 contains a large number of features that is long time in the development, some of which include rewritten region assignment, perf improvements (RPC, rewritten write pipeline, etc), async clients, C++ client, offheaping memstore and other buffers, Spark integration, shading of dependencies as well as a lot of other fixes and stability improvements. We will go into technical details on some of the most important improvements in the release, as well as what are the implications for the users in terms of API and upgrade paths.
Speaker
Ankit Singhal, Member of Technical Staff, Hortonworks
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
HBase has been in production in hundreds of clusters across the CDH/HDP customer base and Cloudera/Hortonworks support it for many years.
In this talk, based on our support experience, we aim to introduce useful information to troubleshoot HBase clusters efficiently. First off, we (Daisuke at Cloudera support) are going to talk about typical log messages and web UI info which we can use for troubleshooting (especially for struggling with performance issues). Since their meanings have been changing over the past versions, we would like to show the difference and improvements as well (e.g. HBASE-20232 for memstore flush, HBASE-16972 for slow scanner, HBASE-18469 for request counter, and also HBASE-21207 for sorting in web UI). We (Toshihiro at Cloudera, a former Hortonworks employee) will also cover some new tools (e.g. HBASE-21926 Profiler Servlet, HBASE-11062 htop, etc.), which should also be useful for performance troubleshooting.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
7. Optimizing Splits and Compactions
• Split
split major_compact
hbase.hregion.max.filesize region
ex.100GB
• Region Hotspotting
Region Region Hotspotting
Key
Sequential ID NG
Random Key
7
11. Client API: Best Practices
• Java Client API
Disable auto-flush
flushCommits()
Use scanner-caching
scan NG
Limit scan scope
family
Close ResultScanners
Block cache usage
Optimal loading of row keys
filter
Turn off WAL on Puts
...
11