HBase-2.0.0 has been a couple of years in the making. It is chock-a-block full of a long list of new features and fixes. In this session, the 2.0.0 release manager will perform the impossible, describing the release content inside the session time bounds.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
Xinxin Fan and Hongxiang Jiang
First, we will give a brief introduction about the HBase service at Netease,include the basic cluster info and the key HBase service. And then we will talk same tips about the tuning practices for HBase. Last, we will introduce some improvements at the internal HBase version.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
hbaseconasia2017: Large scale data near-line loading method and architectureHBaseCon
Shuaifeng Zhou
When we do real-time data loading to HBase, we use put/putlist interface. After receiving put request, regionserver will write WAL, write data into memory store, flush memory store to disk-store, then compact files again and again. That precedure occupies too much resource and causing read/write performance decrease. To solve the problem, we provide a kind of near-line loading method and architecture, greatly increase the loading bandwidth, and decrease the influence to read operations.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
Zheng Hu
We'll share some HBase experience at XiaoMi:
1. How did we tuning G1GC for HBase Clusters.
2. Development and performance of Async HBase Client.
hbaseconasia2017 hbasecon hbase xiaomi https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon
gohbase is an implementation of an HBase client in pure Go: https://github.com/tsuna/gohbase. In this presentation we'll talk about its architecture and compare its performance against the native Java HBase client as well as AsyncHBase (http://opentsdb.github.io/asynchbase/) and some nice characteristics of golang that resulted in a simpler implementation.
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon
HBase is used to serve online facing traffic in Pinterest. It means no downtime is allowed. However, we were on HBase 94. To upgrade to latest version, we need to figure out a way to live upgrade while keeping Pinterest site live. Recently, we successfully upgrade 94 HBase cluster to 1.2 with no downtime. We made change to both Asynchbase and HBase server side. We will talk about what we did and how we did it. We will also talk about the finding in config and performance tuning we did to achieve low latency.
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
on-Volatile-Memory express (NVMe) standard promises and order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM on TB/$. This talk evaluates the use cases and benefits of NVMe drives for its use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
Xinxin Fan and Hongxiang Jiang
First, we will give a brief introduction about the HBase service at Netease,include the basic cluster info and the key HBase service. And then we will talk same tips about the tuning practices for HBase. Last, we will introduce some improvements at the internal HBase version.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
hbaseconasia2017: Large scale data near-line loading method and architectureHBaseCon
Shuaifeng Zhou
When we do real-time data loading to HBase, we use put/putlist interface. After receiving put request, regionserver will write WAL, write data into memory store, flush memory store to disk-store, then compact files again and again. That precedure occupies too much resource and causing read/write performance decrease. To solve the problem, we provide a kind of near-line loading method and architecture, greatly increase the loading bandwidth, and decrease the influence to read operations.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
Zheng Hu
We'll share some HBase experience at XiaoMi:
1. How did we tuning G1GC for HBase Clusters.
2. Development and performance of Async HBase Client.
hbaseconasia2017 hbasecon hbase xiaomi https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon
gohbase is an implementation of an HBase client in pure Go: https://github.com/tsuna/gohbase. In this presentation we'll talk about its architecture and compare its performance against the native Java HBase client as well as AsyncHBase (http://opentsdb.github.io/asynchbase/) and some nice characteristics of golang that resulted in a simpler implementation.
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon
HBase is used to serve online facing traffic in Pinterest. It means no downtime is allowed. However, we were on HBase 94. To upgrade to latest version, we need to figure out a way to live upgrade while keeping Pinterest site live. Recently, we successfully upgrade 94 HBase cluster to 1.2 with no downtime. We made change to both Asynchbase and HBase server side. We will talk about what we did and how we did it. We will also talk about the finding in config and performance tuning we did to achieve low latency.
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
on-Volatile-Memory express (NVMe) standard promises and order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM on TB/$. This talk evaluates the use cases and benefits of NVMe drives for its use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
Eshcar Hillel and Anastasia Braginsky (Yahoo!)
Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert/update/delete the same keys, such as for high-speed queuing and e-commerce.
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceCloudera, Inc.
The strength of an open source project resides entirely in its developer community; a strong democratic culture of participation and hacking makes for a better piece of software. The key requirement is having developers who are not only willing to contribute, but also knowledgeable about the project’s internal structure and architecture. This session will introduce developers to the core internal architectural concepts of HBase, not just “what” it does from the outside, but “how” it works internally, and “why” it does things a certain way. We’ll walk through key sections of code and discuss key concepts like the MVCC implementation and memstore organization. The goal is to convert serious “HBase Users” into HBase Developer Users”, and give voice to some of the deep knowledge locked in the committers’ heads.
Speakers: Liang Xie and Honghua Feng (Xiamoi)
This talk covers the HBase environment at Xiaomi, including thoughts and practices around latency, hardware/OS/VM configuration, GC tuning, the use of a new write thread model and reverse scan, and block index optimization. It will also include some discussion of planned JIRAs based on these approaches.
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
Infrastructure failures are a given in the cloud, but in a multi-tenant environment separating those failures from usage can be a challenge. I'll be presenting data gathered from over a hundred region server failures at HubSpot along with what we've done to improve our MTTR and what we're contributing back to the community. Covered topics will include separating usage-related failures from infrastructure and hardware failures, as well as steps we've taken to improve MTTR in both scenarios.
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
In this presentation, we will introduce Hotspot's Garbage First collector (G1GC) as the most suitable collector for latency-sensitive applications running with large memory environments. We will first discuss G1GC internal operations and tuning opportunities, and also cover tuning flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We will provide several HBase case studies using Java heaps as large as 100GB that show how to best tune applications to remove unpredicted, protracted GC pauses.
Now that you've seen Base 1.0, what's ahead in HBase 2.0, and beyond—and why? Find out from this panel of people who have designed and/or are working on 2.0 features.
In this session, you will learn the work Xiaomi has done to improve the availability and stability of our HBase clusters, including cross-site data and service backup and a coordinated compaction framework. You'll also learn about the Themis framework, which supports cross-row transactions on HBase based on Google's percolator algorithm, and its usage in Xiaomi's applications.
Date-tiered Compaction Policy for Time-series DataHBaseCon
Clara Xiong (Flurry/Yahoo!)
With petabytes of data on thousands of nodes replicated across multiple data centers, growing at an accelerating rate, we have been running a workload at scale with a bottleneck of IO bandwidth. This talk covers a new compaction policy to improve efficiency for time-range scans of various look-back windows by structuring and maintaining a date-tiered store file layout for time-series data with infrequent updates and deletes.
Yahoo has long been involved in HBase and its community. In 2013, HBase was offered as a hosted service at Yahoo. Since then, adoption has grown rapidly., and today, HBase is used by numerous teams across the company, helping to enable a diverse set of use cases ranging from near real-time processing to data warehousing.
This was made possible thanks to HBase along with some enhancements to support multi-tenancy and scale. As our clusters continue to grow and use cases become more demanding we are working towards supporting a million regions in a single cluster.
In this keynote, we’ll paint a picture of where Yahoo! is today and the enhancements we have been working on to reach today’s scale as well as supporting a million regions and beyond.
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaCloudera, Inc.
If you’re running an HBase cluster in production, you’ve probably noticed that HBase shares a number of useful metrics about everything from your block cache performance to your HDFS latencies over JMX (or Ganglia, or just a file). The problem is that it’s sometimes hard to know what these metrics mean to you and your users. Should you be worried if your memstore SizeMB is 1.5GB? What if your regionservers have a hundred stores each? This talk will explain how to understand and interpret the metrics HBase exports. Along the way we’ll cover some high-level background on HBase’s internals, and share some battle tested rules-of-thumb about how to interpret and react to metrics you might see.
Speakers: Nick Dimiduk (Hortonworks) and Nicolas Liochon (Scaled Risk)
HBase is an online database so response latency is critical. This talk will examine sources of latency in HBase, detailing steps along the read and write paths. We'll examine the entire request lifecycle, from client to server and back again. We'll also look at the different factors that impact latency, including GC, cache misses, and system failures. Finally, the talk will highlight some of the work done in 0.96+ to improve the reliability of HBase.
HBaseCon 2015: HBase at Scale in an Online and High-Demand EnvironmentHBaseCon
Pinterest runs 38 different HBase clusters in production, doing a lot of different types of work—with some doing up to 5 million operations per second. In this talk, you'll get details about how we do capacity planning, maintenance tasks such as online automated rolling compaction, configuration management, and monitoring.
Since 2013, Yahoo! has been successfully running multi-tenant HBase clusters. Our tenants run applications ranging from real-time processing (e.g. content personalization, Ad targeting) to operational warehouses (e.g. advertising, content). Tenants are guaranteed an adequate level of resource isolation and security. This is achieved through the use of open source and in-house developed HBase features such as region server groups, group-based replication, and group-based favored nodes.
Today, with the increase in adoption and new use cases, we are working towards scaling our HBase clusters to support petabytes of data without compromising on performance and operability. A common tradeoff when scaling a cluster to this size is to increase the size of a region, thus avoiding the problem of having too many regions on a cluster. However, large regions negatively affect the performance and operability of a cluster mainly because region size determines the following: 1. granularity for load distribution, and 2. amount of write amplification due to compaction. Thus we are working towards enabling an HBase cluster to host at least a million regions.
In this presentation, we will walk through the key features we have implemented as well as share our experiences working on multi-tenancy and scaling the cluster.
Breaking the Sound Barrier with Persistent Memory HBaseCon
Liqi Yi and Shylaja Kokoori (Intel)
A fully optimized HBase cluster could easily hit the limit of the underlying storage device’s capability, which is beyond the reach of software optimization alone. To get around this constraint, we need a new design that brings data processing and data storage closer together. In this presentation, we will look at how persistent memory will change the way large datasets are stored. We will review the hardware characteristics of 3D XPoint™, a new persistent memory technology with low latency and high capacity. We will also discuss opportunities for further improvement within the HBase framework using persistent memory.
hbaseconasia2017: Building online HBase cluster of Zhihu based on KubernetesHBaseCon
Zhiyong Bai
As a high performance and scalable key value database, Zhihu use HBase to provide online data store system along with Mysql and Redis. Zhihu’s platform team had accumulated some experience in technology of container, and this time, based on Kubernetes, we build flexible platform of online HBase system, create multiple logic isolated HBase clusters on the shared physical cluster with fast rapid,and provide customized service for different business needs. Combined with Consul and DNS server, we implement high available access of HBase using client mainly written with Python. This presentation is mainly shared the architecture of online HBase platform in Zhihu and some practical experience in production environment.
hbaseconasia2017 hbasecon hbase
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketCloudera, Inc.
Solbase is an exciting new open-source, real-time search engine being developed at Photobucket to service the over 30 million daily search requests Photobucket handles. Solbase replaces Lucene’s file system-based index with HBase. This allows the system to update in real-time and linearly scale to serve millions of daily search requests on a large dataset. This session will explore the architecture of Solbase as well as some of Lucene/Solr’s inherent issues we overcame. Finally, we’ll go over performance metrics of Solbase against production traffic.
Logging at OVHcloud :
Logs Data platform est la plateforme de collecte, d'analyse et de gestion centralisée de logs d'OVHcloud. Cette plateforme a pour but de répondre aux challenges que constitue l'indexation de plus de 4000 milliards de logs par une entreprise comme OVHcloud. Cette présentation vous décrira l'architecture générale de Logs Data Platform autour de ses composants centraux Elasticsearch et Graylog et vous décrira les différentes problématiques de scalabilité, disponibilité, performance et d'évolutivité qui sont le quotidien de l'équipe Observability à OVHcloud.
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
Eshcar Hillel and Anastasia Braginsky (Yahoo!)
Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert/update/delete the same keys, such as for high-speed queuing and e-commerce.
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceCloudera, Inc.
The strength of an open source project resides entirely in its developer community; a strong democratic culture of participation and hacking makes for a better piece of software. The key requirement is having developers who are not only willing to contribute, but also knowledgeable about the project’s internal structure and architecture. This session will introduce developers to the core internal architectural concepts of HBase, not just “what” it does from the outside, but “how” it works internally, and “why” it does things a certain way. We’ll walk through key sections of code and discuss key concepts like the MVCC implementation and memstore organization. The goal is to convert serious “HBase Users” into HBase Developer Users”, and give voice to some of the deep knowledge locked in the committers’ heads.
Speakers: Liang Xie and Honghua Feng (Xiamoi)
This talk covers the HBase environment at Xiaomi, including thoughts and practices around latency, hardware/OS/VM configuration, GC tuning, the use of a new write thread model and reverse scan, and block index optimization. It will also include some discussion of planned JIRAs based on these approaches.
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
Infrastructure failures are a given in the cloud, but in a multi-tenant environment separating those failures from usage can be a challenge. I'll be presenting data gathered from over a hundred region server failures at HubSpot along with what we've done to improve our MTTR and what we're contributing back to the community. Covered topics will include separating usage-related failures from infrastructure and hardware failures, as well as steps we've taken to improve MTTR in both scenarios.
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
In this presentation, we will introduce Hotspot's Garbage First collector (G1GC) as the most suitable collector for latency-sensitive applications running with large memory environments. We will first discuss G1GC internal operations and tuning opportunities, and also cover tuning flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We will provide several HBase case studies using Java heaps as large as 100GB that show how to best tune applications to remove unpredicted, protracted GC pauses.
Now that you've seen Base 1.0, what's ahead in HBase 2.0, and beyond—and why? Find out from this panel of people who have designed and/or are working on 2.0 features.
In this session, you will learn the work Xiaomi has done to improve the availability and stability of our HBase clusters, including cross-site data and service backup and a coordinated compaction framework. You'll also learn about the Themis framework, which supports cross-row transactions on HBase based on Google's percolator algorithm, and its usage in Xiaomi's applications.
Date-tiered Compaction Policy for Time-series DataHBaseCon
Clara Xiong (Flurry/Yahoo!)
With petabytes of data on thousands of nodes replicated across multiple data centers, growing at an accelerating rate, we have been running a workload at scale with a bottleneck of IO bandwidth. This talk covers a new compaction policy to improve efficiency for time-range scans of various look-back windows by structuring and maintaining a date-tiered store file layout for time-series data with infrequent updates and deletes.
Yahoo has long been involved in HBase and its community. In 2013, HBase was offered as a hosted service at Yahoo. Since then, adoption has grown rapidly., and today, HBase is used by numerous teams across the company, helping to enable a diverse set of use cases ranging from near real-time processing to data warehousing.
This was made possible thanks to HBase along with some enhancements to support multi-tenancy and scale. As our clusters continue to grow and use cases become more demanding we are working towards supporting a million regions in a single cluster.
In this keynote, we’ll paint a picture of where Yahoo! is today and the enhancements we have been working on to reach today’s scale as well as supporting a million regions and beyond.
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaCloudera, Inc.
If you’re running an HBase cluster in production, you’ve probably noticed that HBase shares a number of useful metrics about everything from your block cache performance to your HDFS latencies over JMX (or Ganglia, or just a file). The problem is that it’s sometimes hard to know what these metrics mean to you and your users. Should you be worried if your memstore SizeMB is 1.5GB? What if your regionservers have a hundred stores each? This talk will explain how to understand and interpret the metrics HBase exports. Along the way we’ll cover some high-level background on HBase’s internals, and share some battle tested rules-of-thumb about how to interpret and react to metrics you might see.
Speakers: Nick Dimiduk (Hortonworks) and Nicolas Liochon (Scaled Risk)
HBase is an online database so response latency is critical. This talk will examine sources of latency in HBase, detailing steps along the read and write paths. We'll examine the entire request lifecycle, from client to server and back again. We'll also look at the different factors that impact latency, including GC, cache misses, and system failures. Finally, the talk will highlight some of the work done in 0.96+ to improve the reliability of HBase.
HBaseCon 2015: HBase at Scale in an Online and High-Demand EnvironmentHBaseCon
Pinterest runs 38 different HBase clusters in production, doing a lot of different types of work—with some doing up to 5 million operations per second. In this talk, you'll get details about how we do capacity planning, maintenance tasks such as online automated rolling compaction, configuration management, and monitoring.
Since 2013, Yahoo! has been successfully running multi-tenant HBase clusters. Our tenants run applications ranging from real-time processing (e.g. content personalization, Ad targeting) to operational warehouses (e.g. advertising, content). Tenants are guaranteed an adequate level of resource isolation and security. This is achieved through the use of open source and in-house developed HBase features such as region server groups, group-based replication, and group-based favored nodes.
Today, with the increase in adoption and new use cases, we are working towards scaling our HBase clusters to support petabytes of data without compromising on performance and operability. A common tradeoff when scaling a cluster to this size is to increase the size of a region, thus avoiding the problem of having too many regions on a cluster. However, large regions negatively affect the performance and operability of a cluster mainly because region size determines the following: 1. granularity for load distribution, and 2. amount of write amplification due to compaction. Thus we are working towards enabling an HBase cluster to host at least a million regions.
In this presentation, we will walk through the key features we have implemented as well as share our experiences working on multi-tenancy and scaling the cluster.
Breaking the Sound Barrier with Persistent Memory HBaseCon
Liqi Yi and Shylaja Kokoori (Intel)
A fully optimized HBase cluster could easily hit the limit of the underlying storage device’s capability, which is beyond the reach of software optimization alone. To get around this constraint, we need a new design that brings data processing and data storage closer together. In this presentation, we will look at how persistent memory will change the way large datasets are stored. We will review the hardware characteristics of 3D XPoint™, a new persistent memory technology with low latency and high capacity. We will also discuss opportunities for further improvement within the HBase framework using persistent memory.
hbaseconasia2017: Building online HBase cluster of Zhihu based on KubernetesHBaseCon
Zhiyong Bai
As a high performance and scalable key value database, Zhihu use HBase to provide online data store system along with Mysql and Redis. Zhihu’s platform team had accumulated some experience in technology of container, and this time, based on Kubernetes, we build flexible platform of online HBase system, create multiple logic isolated HBase clusters on the shared physical cluster with fast rapid,and provide customized service for different business needs. Combined with Consul and DNS server, we implement high available access of HBase using client mainly written with Python. This presentation is mainly shared the architecture of online HBase platform in Zhihu and some practical experience in production environment.
hbaseconasia2017 hbasecon hbase
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketCloudera, Inc.
Solbase is an exciting new open-source, real-time search engine being developed at Photobucket to service the over 30 million daily search requests Photobucket handles. Solbase replaces Lucene’s file system-based index with HBase. This allows the system to update in real-time and linearly scale to serve millions of daily search requests on a large dataset. This session will explore the architecture of Solbase as well as some of Lucene/Solr’s inherent issues we overcame. Finally, we’ll go over performance metrics of Solbase against production traffic.
Logging at OVHcloud :
Logs Data platform est la plateforme de collecte, d'analyse et de gestion centralisée de logs d'OVHcloud. Cette plateforme a pour but de répondre aux challenges que constitue l'indexation de plus de 4000 milliards de logs par une entreprise comme OVHcloud. Cette présentation vous décrira l'architecture générale de Logs Data Platform autour de ses composants centraux Elasticsearch et Graylog et vous décrira les différentes problématiques de scalabilité, disponibilité, performance et d'évolutivité qui sont le quotidien de l'équipe Observability à OVHcloud.
Kat Grigg, Confluent, Senior Customer Success Architect + Jen Snipes, Confluent, Senior Customer Success Architect
This presentation will cover tips and best practices for Apache Kafka. In this talk, we will be covering the basic internals of Kafka and how these components integrate together including brokers, topics, partitions, consumers and producers, replication, and Zookeeper. We will be talking about the major categories of operations you need to be setting up and monitoring including configuration, deployment, maintenance, monitoring and then debugging.
https://www.meetup.com/KafkaBayArea/events/270915296/
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
QuestDB es una base de datos open source de alto rendimiento. Mucha gente nos comentaba que les gustaría usarla como servicio, sin tener que gestionar las máquinas. Así que nos pusimos manos a la obra para desarrollar una solución que nos permitiese lanzar instancias de QuestDB con provisionado, monitorización, seguridad o actualizaciones totalmente gestionadas.
Unos cuantos clusters de Kubernetes más tarde, conseguimos lanzar nuestra oferta de QuestDB Cloud. Esta charla es la historia de cómo llegamos ahí. Hablaré de herramientas como Calico, Karpenter, CoreDNS, Telegraf, Prometheus, Loki o Grafana, pero también de retos como autenticación, facturación, multi-nube, o de a qué tienes que decir que no para poder sobrevivir en la nube.
High-level architecture of a complete MariaDB deploymentFederico Razzoli
In this webinar Federico will illustrate how to design a complete MariaDB setup. This doesn't only include MariaDB, but all the components that are necessary to form a reliable, scalable architecture.
Federico will cover these points:
- MariaDB storage engines
- Asynchronous replication vs Galera
- Load balancing and failover: ProxySQL, MaxScale, HAProxy
- Eliminating SPoFs in the proxy level
- Backup strategies
- Monitoring solutions
Percona XtraBackup - New Features and ImprovementsMarcelo Altmann
Percona XtraBackup is an open-source hot backup utility for MySQL - based servers that doesn't lock your database during the backup. In this talk, we will cover the latest development and new features introduced on Xtrabackup and its auxiliary tools: - Page Tracking - Azure Blob Storage Support - Exponential Backoff - Keyring Components - and more.
Netflix Open Source Meetup Season 4 Episode 2aspyker
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson and Reuters, Diego Pacheco from Ilegra and other third party speakers, internal and external to Netflix, on how these products fit in their stack and roadmap.
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
In the last 24 months, MySQL replication speed has improved a lot thanks to implementing parallel replication. MySQL and MariaDB have different types of parallel replication; in this talk, I present in details the different implementations, with their limitations and the corresponding tuning parameters. I also present benchmark results from real Booking.com workloads. Finally, I discuss some deployments at Booking.com that benefits from parallel replication speed improvements.
HashKV aims for efficient updates and GC atop KV separation. This #PaperReading highlights HashKV's design—hash-based data grouping and hotness-awareness, weighs its pros and cons from 5 metrics, and shares some insights into workflow optimization.
Build real time stream processing applications using Apache KafkaHotstar
This talk was presented at the Hotstar Scale Meetup in Bangalore by Jayesh Sidhwani
In this talk, the presenter introduces Apache Kafka and the Apache Kafka Streams library. Starting from the need for building streaming applications to thinking the use-cases as a streaming job - this talk covers all the technicalities.
It ends with a short description of how Kafka is deployed and used at Hotstar
RubiX: A caching framework for big data engines in the cloud. Helps provide data caching capabilities to engines like Presto, Spark, Hadoop, etc transparently without user intervention.
Jingcheng Du
Apache Beam is an open source and unified programming model for defining batch and streaming jobs that run on many execution engines, HBase on Beam is a connector that allows Beam to use HBase as a bounded data source and target data store for both batch and streaming data sets. With this connector HBase can work with many batch and streaming engines directly, for example Spark, Flink, Google Cloud Dataflow, etc. In this session, I will introduce Apache Beam, and the current implementation of HBase on Beam and the future plan on this.
hbaseconasia2017 hbasecon hbase
https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
hbaseconasia2017: HBase Disaster Recovery Solution at HuaweiHBaseCon
Ashish Singhi
HBase Disaster recovery solution aims to maintain high availability of HBase service in case of disaster of one HBase cluster with very minimal user intervention. This session will introduce the HBase disaster recovery use cases and the various solutions adopted at Huawei like.
a) Cluster Read-Write mode
b) DDL operations synchronization with standby cluster
c) Mutation and bulk loaded data replication
d) Further challenges and pending work
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
hbaseconasia2017: Removable singularity: a story of HBase upgrade in PinterestHBaseCon
Tianying Chang
HBase is used to serve online facing traffic in Pinterest. It means no downtime is allowed. However, we were on HBase 94. To upgrade to latest version, we need to figure out a way to live upgrade while keeping Pinterest site live. Recently, we successfully upgrade 94 HBase cluster to 1.2 with no downtime. We made change to both Asynchbase and HBase server side. We will talk about what we did and how we did it. We will also talk about the finding in config and performance tuning we did to achieve low latency.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
hbaseconasia2017: Ecosystems with HBase and CloudTable service at HuaweiHBaseCon
Jieshan Bi and Yanhui Zhong
1. CTBase: A light-weight HBase client for structured data.
1). Schematized table, more friendly for structured data storage.
2). Global secondary index for HBase.
3). HBase Query DSL. JSON based light-weight API.
4) Cluster table. Pre-joining with keys, a better solution for cross-table join queries from HBase.
2. Tagram: Distributed bitmap index implementation with HBase.
1). Distributed bitmap index for accelerating AD-HOC queries with low cardinality columns.
2). Powerful and flexible query API.
3). Tagram offers millisecond-level query latency.
3. CloudTable Service Introduction: HBase on Huawei cloud.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
As HBase and Hadoop continue to become routine across enterprises, these enterprises inevitably shift priorities from effective deployments to cost-efficient operations. Consolidation of infrastructure, the sum of hardware, software, and system-administrator effort, is the most common strategy to reduce costs. As a company grows, the number of business organizations, development teams, and individuals accessing HBase grows commensurately, creating a not-so-simple requirement: HBase must effectively service many users, each with a variety of use-cases. This is problem is known as multi-tenancy. While multi-tenancy isn’t a new problem, it also isn’t a solved one, in HBase or otherwise. This talk will present a high-level view of the common issues organizations face when multiple users and teams share a single HBase instance and how certain HBase features were designed specifically to mitigate the issues created by the sharing of finite resources.
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon
Hundreds of millions of people use Quora to find accurate, informative, and trustworthy answers to their questions. As it so happens, counting things at scale is both an important and a difficult problem to solve.
In this talk, we will be talking about Quanta, Quora's counting system built on top of HBase that powers our high-volume near-realtime analytics that serves many applications like ads, content views, and many dashboards. In addition to regular counting, Quanta supports count propagation along the edges of an arbitrary DAG. HBase is the underlying data store for both the counting data and the graph data.
We will describe the high-level architecture of Quanta and share our design goals, constraints, and choices that enabled us to build Quanta very quickly on top of our existing infrastructure systems.
In the age of NoSQL, big data storage engines such as HBase have given up ACID semantics of traditional relational databases, in exchange for high scalability and availability. However, it turns out that in practice, many applications require consistency guarantees to protect data from concurrent modification in a massively parallel environment. In the past few years, several transaction engines have been proposed as add-ons to HBase; three different engines, namely Omid, Tephra, and Trafodion were open-sourced in Apache alone. In this talk, we will introduce and compare the different approaches from various perspectives including scalability, efficiency, operability and portability, and make recommendations pertaining to different use cases.
In order to effectively predict and prevent online fraud in real time, Sift Science stores hundreds of terabytes of data in HBase—and needs it to be always available. This talk will cover how we used circuit-breaking, cluster failover, monitoring, and automated recovery procedures to improve our HBase uptime from 99.7% to 99.99% on top of unreliable cloud hardware and networks.
In DiDi Chuxing Company, which is China’s most popular ride-sharing company. we use HBase to serve when we have a bigdata problem.
We run three clusters which serve different business needs. We backported the Region Grouping feature back to our internal HBase version so we could isolate the different use cases.
We built the Didi HBase Service platform which is popular amongst engineers at our company. It includes a workflow and project management function as well as a user monitoring view.
Internally we recommend users use Phoenix to simplify access.even more,we used row timestamp;multidimensional table schema to slove muti dimension query problems
C++, Go, Python, and PHP clients get to HBase via thrift2 proxies and QueryServer.
We run many important buisness applications out of our HBase cluster such as ETA/GPS/History Order/API metrics monitoring/ and Traffic in the Cloud. If you are interested in any aspects listed above, please come to our talk. We would like to share our experiences with you.
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon
Both Spark and HBase are widely used, but how to use them together with high performance and simplicity is a very hard topic. Spark HBase Connector(SHC) provides feature rich and efficient access to HBase through Spark SQL. It bridges the gap between the simple HBase key value store and complex relational SQL queries and enables users to perform complex data analytics on top of HBase using Spark.
SHC implements the standard Spark data source APIs, and leverages the Spark catalyst engine for query optimization. To achieve high performance, SHC constructs the RDD from scratch instead of using the standard HadoopRDD. With the customized RDD, all critical techniques can be applied and fully implemented, such as partition pruning, column pruning, predicate pushdown and data locality. The design makes the maintenance very easy, while achieving a good tradeoff between performance and simplicity.
Also, SHC has supported Phoenix data as input to HBase in addition to Avro data. Defaulting to a simple native binary encoding seems susceptible to future changes and is a risk for users who write data from SHC into HBase. For example, with SHC going forward, backwards compatibility needs to be properly handled. So the default, SHC needs to support a more standard and well tested format like Phoenix.
In this talk, we will demo how SHC works, how to use SHC in secure/non-secure clusters, how SHC works with multi-HBase clusters, etc. This talk will also benefit people who use Spark and other data sources (besides HBase) as it inspires them with ideas of how to support high performance data source access at the Spark DataFrame level.
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon
In this talk we introduce Apache Beam, a unified model to create efficient and portable data processing pipelines. Beam uses a single set of abstractions to implement both batch and streaming computations that can be executed in different environments, e.g. Apache Spark, Apache Flink and Google Dataflow. Beam not only does data processing, but can be used as a tool to ingest/extract data to/from different data stores including HBase. We will present interaction scenarios between HBase and Beam and explore Beam's Input/Output (IO) model and how we leverage it to provide support for HBase.
Our team is responsible for storage at Xiaomi and we provide storage services for dozens of businesses, such as personal cloud storage for smart phones and user profile data. So we will share some practices and improvements of HBase at Xiaomi:
1: We upgraded most of our cluster from 0.94 to 0.98 in the last year and will share some experience about upgrading.
2: We encountered some problems and made some improvements on replication.
3: We fixed or still fixing some confusing behavior from client side.
4: We introduced some improvements on scan to make users easy to use and reduce the time of RPC requests.
5: We implement an asynchronous hbase client which is an important feature for HBase 2.0.
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon
Graphs are well-suited for many use cases to express and process complex relationships among entities in enterprise and social contexts. Fueled by the growing interest in graphs, there are various graph databases and processing systems that dot the graph landscape. JanusGraph is a community-driven project that continues the legacy of Titan, a pioneer of open source graph databases. JanusGraph is a scalable graph database optimized for large scale transactional and analytical graph processing. In the session, we will introduce JanusGraph, which features full integration with the Apache TinkerPop graph stack. We will discuss JanusGraph's optimized storage model that relies on HBase for fast graph transversal and processing.
by Jason Plurad and Jing Chen He of IBM
HBaseCon2017 Warp 10, a novel approach to managing and analyzing time series ...HBaseCon
There are lots of time series solutions out there, but when we started looking for one which could solve the sensor data problem we had none would fit, so we decided to roll our own.
And here we are, four years later, presenting Warp 10, a complete Open Source platform and toolsuite for managing and analyzing time series data.
This talk will describe the architecture of Warp 10 and specifically how we used HBase to support hundreds of trillions of datapoints, ensure query performance even when accessing series with a high label cardinality, and provide deletion capabilities at a scale we did not anticipate.
We will also dive into the data model and analytics capabilities through the WarpScript language and will describe how Warp 10 manages multi-tier data management and interacts with third party systems.
by Mathias Herbert
HBaseCon2017 Analyzing cryptocurrencies in real time with hBase, Kafka and St...HBaseCon
Unlike other blockchain technologies that take hours to settle, Ripple can confirm and settle a transaction in seconds with a throughput of a 1000 transactions a second. Designing a real-time analytics solution needed a something that was scalable, reliable and most importantly guaranteed consistency at the speed of availability.
The combination of hBase as storage, and Kafka and Storm as the processing framework allows Ripple to publish transactional activity within milliseconds of committed transaction. Queries are funnelled through a diverse set of hand crafted API. RippleCharts is the public facing visual analytics on nodeJS(D3) that heavily leverages those API, but we also leverage the same resources to provide to our compliance analysts valuable information for investigations and research.
by Abraham Tom and Warren Anderson of Ripple
HBaseCon2017 Achieving HBase Multi-Tenancy with RegionServer Groups and Favor...HBaseCon
Achieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes
At Yahoo! HBase has been running as a hosted multi-tenant service since 2013. In a single HBase cluster we have around 30 tenants running various types of workloads (ie batch, near real-time, ad-hoc, etc). Typically such a deployment would cause tenant workloads to negatively affect each other because of resource contention (disk, cpu, network, cache thrashing, etc). Using RegionServer Groups we are able to designate a dedicated subset of RegionServers in a cluster to host only tables of a given tenant (HBASE-6721).
Most HBase deployments use HDFS as their distributed filesystem, which in turn does not guarantee that a region’s data is locally available to the hosting regionserver. This poses a problem when providing isolation since the hdfs data blocks may have to be read remotely from a different tenant’s host thus contending for disk or network resources. Favored nodes addresses this problem by providing hints to HDFS on which datanodes data should be stored and only assigns regions to these favored regionservers (HBASE-15531).
We will walk through these features explaining our motivation, how they work as well as our experiences running these multi-tenant clusters. These features will be available in Apache HBase 2.0.
by Francis Liu and Thiruvel Thirumoolan
https://youtu.be/xorEYNyLMbM
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
6. Goals: Compatibility
● Double-down on Semantic Versioning, semver
○ Adopted in hbase-1.0.0
○ MAJOR.MINOR.PATCH[-IDENTIFIER]
■ E.g. 2.0.0-alpha1
7. But….
● Semantic Versioning is about API only
○ What about….
■ Internal/External Interfaces
● Where is Client Interface when Spark/MapReduce
8. API Annotations
● From Hadoop…
○ InterfaceAudience.Public
■ Get/Put/Scan/Connection
○ InterfaceAudience.LimitedPublic
■ Coprocessors, Replication, etc.
○ InterfaceAudience.Private
■ Internal only
9. But (Again)...
● What about…
○ Source/Binary compatibility
○ Serializations
■ Wire
■ Formats in HDFS/Zookeeper
○ Dependencies
● See refguide semver section
10. But still….
● Grey areas…
○ Protobufs
■ Protobufs in API
■ Protobufs as Interface (c++/go clients)
11. Goal: Rolling Upgrade 1.x to 2.x
● SemVer for DML at least in 2.x
○ No admin of hbase-2.x with hbase-1.x client
○ Coprocessors will need to be upgraded
● hbase-1.x client can work against hbase-2.x cluster
○ Even 1.x Coprocessor Endpoints will work with an hbase-2.x cluster
● No Singularity! No downtime.
12. Other Goals
● Scale
○ More Regions, bigger clusters
● Performance
○ Inline read/write but also macro restart, assign, etc.
○ Better resource utilization
■ I/O, RAM
● Fix primary root of operational woes/bugs
○ Master Region Assignment
● Cleanup
○ Spark narrative
○ Interfaces
16. Inside: Main Features
● New Master Core (A.K.A AMv2)
○ Prompt assign of millions of Regions, faster startup, larger scale!
○ Many small Regions instead of a few big ones
○ Promise: new degree of Resilience
● Offheap Read/Write path
○ Less JVM
● Accordion
○ In-memory compaction
○ Less write amplification
● New Async Client
● etc.
17. Assignment Manager
● Assignment Manager v1 root of many operational headaches
● Redo based on custom “ProcedureV2”-based State Machine
○ Scale/Performance
○ All Master ops recast as Pv2 procedures
■ E.g. Move is a Procedure composed of an Unassign and Assign subprocedure...
● Operation aggregation
● Entity locking mechanism
○ Isolate operations on tables, regions, servers
○ Parallelization
● One hbase:meta writer, the Master only
18. Assignment Manager
● No more intermediate state in ZK
○ At other end of an RPC...
○ Record intent in local Master ops WAL, persisted up in HDFS
○ Only final state published to hbase:meta
● No more HBCK, no more RIT
○ No more distributed state
● Faster Assign
○ PE AMv2
● Standalone Testable
19. Offheap
● Smaller JVM heaps, less copying
○ More accounting!
● Offheap Read Path
○ L1 on Heap for Indexes & Blooms
○ L2 off Heap via BucketCache for Data
■ Big
○ Better latency
■ Cache more
■ Less GC, less erratic
● Offheap Write Path
○ RPC=>HDFS data kept offheap (+Async WAL client)
● Follow-ons:
○ BucketCache always-on
20. Accordion
● In-memory LSM
○ In-memory flush from ConcurrentSkipListMap to read-only pipeline of ‘segments’
■ Memory ‘breathes’ like an accordion’s bellows...
■ Optional in-memory compaction of Segments
● Prune deletes/versions early
● Benefit depends on # of versions
■ Less flushing, write amplification
● Flavors:
○ NONE
○ BASIC (Default)
○ ENHANCED
● Symbiosis w/ Offheaping
○ MemStore SLABs offheap
○ Compact in-memory layout (CellChunkMap)
21. Miscellaneous: Complete
● Medium Object Blobs (MOB)
○ Documents, Images, etc.
○ 100k => 10MB
● RegionServer Groups (rsgroups)
○ Coarse Isolation
○ Tables vs RegionServers
● WALs and HFiles in different Filesystems
○ AWS EMR HBase on S3 offering (Amazon S3 Storage Mode)
● New Netty Server/Client chassis
● Filesystem Quotas on Table/Namespace sizes
○ Per Table, per Namespace -- not per User
○ Violation Policy
○ Via Shell
22. Miscellaneous: In progress, BLOCKERs
● Spark module
○ Cleanup of our HBase Spark story
● AsyncDFSClient
○ Done but for the testing
● Update+Relocation of core dependencies
○ hbase-thirdparty
■ Guava 0.12 => 0.22
■ Protobuf 2.5 => 3.3
■ Netty
○ Etc.
23. Miscellaneous: In progress, Nice-to-have
● C++Client
● Backup/Restore
● Hybrid Logical Clock
○ Leap Seconds or Errant clock
○ Overload timestamp to do time and sequence id duty
○ Metadata only for 2.x
24. TODOs: BLOCKERs
● Edit of default configs
○ Stale
● API compatibility review
○ And that it wholesome!
● Testing
○ Each new feature/configuration
○ Rolling upgrade
■ Different data provenance/Starting point
● Operation & Scan Timeout Narrative Cleanup
● Sequence Identifier narrative
● Tie-it-off
○ We keep adding to 2.0.0… it’ll never release.
25. Moving to 2.x
● API Cleanup
○ Interfaces are returned rather than Implementations
■ TableDescriptor instead of HTableDescriptor, etc.
○ Purged Protobuf and Guava classes from API
■ Purged from CLASSPATH too….(relocated)
● Evangelism:
○ Use shaded hbase-client and hbase-server instead going forward
26.
27. hbase-3.0.0
● Split hbase:meta
● Refactor filesystem Interface and Implementation
○ Less dependency on HDFS
○ Scale
● HLC everywhere all the time on all tables
● Replication v2
● Dynamic Online Configuration
28. That’s it!
● Thank you to our host Huawei
● Running 2.0 Status: hbase-2.0.0 Status Doc
● References:
○ Matteo Bertozzi on hbase2: https://speakerdeck.com/matteobertozzi/hbase-2-dot-0-sf-meetup-oct-15
○ Enis Soztutar on hbase2: https://www.slideshare.net/enissoz/meet-hbase-20
● Images:
● Accordion: https://static.roland.com/assets/images/products/gallery/fr-1x_top_open_gal.jpg
● Version2: http://logofaves.com/wp-content/uploads/2009/06/version_m.jpg?9cf02b
● Long Time: http://www.junodownload.com/products/one-night-only-long-time-coming/1957247-02/
● Comic:
http://4.bp.blogspot.com/-upwza0_lLn4/TmXB4lKkPKI/AAAAAAAAAHY/9lA7VYCmSkI/s1600/heap_0001.jpg
● Stork: http://phicenter.org/wp-content/uploads/2012/08/stork_baby_delivery_720x540.jpg
● Apache Helicopter: https://i.ytimg.com/vi/SQp-3fUBefA/maxresdefault.jpg
● Goals: http://az616578.vo.msecnd.net/files/2016/02/29/6359235488335557721545302495_GOALS600.png