SlideShare a Scribd company logo
1 of 35
Download to read offline
New Journey of HBase in Alibaba and Cloud
Chunhui Shen and Long Cao
August 17,2018
八年磨一剑,HBase在阿里巴巴和云上的新征程
Content AliHB-Introduction of Alibaba HBase
History,Tech Overview,Open Source,Core Scenarios
01
Recent Key Challenge & Improvements
GC Trouble,Separation of Computing & Storage,Cold-
Hot Data,Diagnostic System, Migration & Backup
02
03 HBase Ecosystem & Multi-model DB & Cloud
KV,Tabular,SQL,Graph,Time Series,Geospatial ,
Search, Mixed Workloads,Cloud
AliHB-Introduction of Alibaba HBase01
HBase History in Alibaba
• Why HBase
– Began using since 2010
– Active community
– Hadoop ecosystem
– Facebook successful case
– Google famous paper: Big Table
Our Choice in 2010
Open Source
Commercial
Develop New
Big Data
Store System
Cassandra
MySQL、Oracle
Data
Burst
• Used Version
– 0.20->0.90->0.92->0.94->0.98->1.1->2.0
• The earliest case in 2010-2011
– Search Store
– Taobao History Order
– Alipay Risk Management
• Internal branch AliHB
Overview of AliHB
5
• Performance
• High-Performance Data
Structure、Lock-Free、
Group IO
• Feature
• SQL、Secondary Index
• Multi-Tenants、Cold-Hot
Separation、Async API
• Stability
• High Availability Architecture
• Faster MTTR
• Verification in Double 11
Shopping Day
• Efficient Maintenance
• Effective Monitoring
• Full Path Trace
• No-pause migration
• 12000+ Nodes,100+ Clusters ,200+ Million OPS,100+ PB Data
• 20+ BU,6000+ Users, 100+ Production Changes per Day
Open Source and Community
6
• Contributing to open source since 2011
• 3 PMC, 6 Committers in Alibaba
• Sponsor the Chinese HBase Technology Community
• Already Organized 2 HBase Meetup
• At least one HBase Related tech article one day
• Tens of thousands of readers now, and more are coming
• Hosting HBase Con Asia 2018
• Promote the use of HBase through several conference talks
• Hope more people to join in HBase Community
Core Scenarios in Alibaba
7
Ali-HBase
旺旺(IM)
Alipay Bills Cainiao Logistics
Log
Monitor, Log,
Tracking, IoT Data…Message, Orders, Feeds … AI Storage
Ant Intelligent Security
Intelligent Customer Service
Recommendation
Search, BI Report…
Recent Key Challenge & Improvements02
GC Trouble
9
Frequent
Slow Request
Very Slow
Request
Service
Unavailable
GC Problems Under100GB Memory
GC Trouble
10
Only for offline application
Rewriting with C++
Exploring a Thorough
Solution
GC Trouble
11
Type Pause Time Frequency
YGC 100ms+ Once per 5 Secs
CMS 100~500ms Once per 5 Mins
FGC 20s-180s Once per 7~60 Days
Type Pause Time Frequency
YGC 5ms Once per 5 Secs
CMS 100ms Once per 5 Hours
FGC N/A N/A
CCSMap BucketCacheV2
Allocation and reclaim the major memory
by hbase itself, rather than JVM
ZenGC
New GC algorithm in AJDK
Try best to reuse object(In Core Path) when
programming
GC Trouble
New BucketCache in HBase-2.0
CCSMap in HBase-3.0
Separation of Computing & Storage
13
Localized Deployment
– Low IO latency with Short-Circuit Read
– Unbalanced storage space, especially between clusters
– Difficult to increase the usage ratio of CPU and Disk (both), especially when lots of scenarios
– Cluster scaling is slow because of datanode decommission
Separation of Computing & Storage
14
Shared-Storage Deployment
– Big shared storage, more
balanced
– Compute node can scale
independently
– Storage node can scale
independently
– Auto-scaling become
feasible
– Based on load statistics,
smart schedule between
clusters
– Share compute resources
with other applications
Heterogeneous Cold-Hot Storage
15
• HBase has the capability to hold all the data of whole life cycle
• But in most cases, like monitor, trace, order, logistics
• The recently generated data is often accessed, but occupy very little storage space
• The history data is rarely visited, but occupy a lot of storage space
• Common solution
• Cold storage system for history data
• Hot storage system for recent data
• Move the data from hot storage system to cold storage system periodically
Heterogeneous Cold-Hot Storage
16
• Easy To Use
• Auto Tiered
• Heterogeneous
• Read Optimization
Diagnostic System
17
“Request Rush?” — Monitor
“Big Region?” — Web UI
“Full Disk?” — df
“Bad Disk?” — tsar,demsg
……
12000+ Nodes,100+ Clusters ,6000+ Users
HBase Diagnostic Center
1. The unified entrance of trouble shooting
2. Experience/Solution => Function of Diagnostic System
Diagnostic System
18
2
One extra server for all
No Agent
Adding rule dynamically
Runtime information
Check all components
Only 10 seconds for a diagnosis6
Diagnostic System
19
 Compaction
 Stuck
 Balance Abnormal
 Table Abnormal
 Region Offline
 Replication Delay
 Too many files
 High Meta Load
 Multi Assign
 ……
HBase
 ZK Unavailable
 Block Miss
 NameNode Abnormal
 Full capacity of datanode
 Inconsistent state between
two namenodes
 Too much Xceivers
 Disk not mounted
 ……
ZK/HDFS
 Insufficient disk space
 Slow Disk
 Bad Disk
 Too much TCP error
 Slow ping
 CPU hang
 Load too high
 Port is unreachable
 ……
Hardware
50+
Rules
80%+
Accuracy
Shared on
Apsara HBase
Migration & Backup
20
Migration & Backup
21
Independent with HBase
• almost no impact to service
• easy to upgrade
• support multi versions
• support the non-hbase
target
Second-level RPO
Minute-level RTO
HBase Ecosystem & Multi-model DB & Cloud03
Popularity changes per DB category
Ranking scores per category in percent
Data size per day
All in one
RelationalKey Value Doucument Graph Time Series Geospatial
Tabular NoSQL
All in one
RelationalKey Value Doucument Graph Time Series Geospatial
Tabular NoSQL
HBase
HBase Phoenix/AntsDB HBase JanusGraph OpenTSDB GeoMesa
28
Multi-model - Native Or Layer
Neo4j
InfluxDB
CockroachDB
PG
HBase Ecosystem
DataStax
CosmosDB
Multi-model
KVIndex KVIndex
Storage
Multi-model
HBase Meet Cloud – Benefits
Cloud Native
New Hardware Flexibility Cost Savings
(TCO)
RDMA
Flash
GPU
Non-volatile
memory
Fast Add/Remove
Resource
Insight
Fix bugs in time
Self-driven
End up paying for
features
Flexibility
self-driven
Reduce human
……
30
Multi-model - Native Or Layer
Neo4j
InfluxDB
CockroachDB
PG
HBase Ecosystem
DataStax
CosmosDB
Multi-model
KVIndex KVIndex
Storage
Multi-model
ApsaraDB HBase Platform – Cloud Native
HBase
(KV、Tabular 、Doucument)
Solr/ES
(Full Text Index)
Hot data on SSD
SQL
Phoenix
Graph
JanusGraph
Time Series
OpenTSDB
Geospatial
GeoMesa
Spark
Cold data on HDD
and use EC like OSS
Warm data on SSD&HDD
Remote Read/Write use RDMA and 25G network
32
ApsaraDB HBase Platform Advantage
Item
ApsaraDB HBase (ALiyun Product)
https://cn.aliyun.com/product/hbase
Apache HBase (Sofeware)
Basic
High availability 99.9% ~ 99.99% N/A
Data reliability 99.999999999% N/A
Online Ability
Multi-master
clustering
Multi-master clustering,Multi-AZ/Regon NO
GC FGC NO,YGC 5ms GC 20s~100s,YGC 100ms+
Reduce Cost
Storage Cost Cut by 50%+ on share cloud disk,Total 3 Copy Maybe on Cloud Disk,Total 9 Copy
Support Cold
Storage
Support OSS,Cut by 70% at less read NO
Multi-model DB Multi-model DB
KV,Tabular,SQL,Graph,Time Series,Geospatial
Full Text index, Search
KV,Tabular
Enterprise
Characteristics
Disaster recovery Backup and Restore NO,maybe3.0
Security user/password,ACL Kerberos,ACL
Analytics Spark on HBase ,More optimization Spark on HBase
Version upgrade Automatic upgrade N/A
Self-driven
Database
control system
15min Create a DB/Monitor
Online add storage and node/Elastic Power in future
N/A
Diagnostic
Big request ,Big Table merge,Hot Region …… NO
欢迎加入
杭州、硅谷、深圳、北京
谢谢观看
Thanks

More Related Content

What's hot

hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)NAVER D2
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영NAVER D2
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Databricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...confluent
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 

What's hot (20)

hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
Amazon EFS: Deep Dive
Amazon EFS: Deep DiveAmazon EFS: Deep Dive
Amazon EFS: Deep Dive
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Presto
PrestoPresto
Presto
 
[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 

Similar to HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud

AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Dataflow in 104corp - AWS UserGroup TW 2018
Dataflow in 104corp - AWS UserGroup TW 2018Dataflow in 104corp - AWS UserGroup TW 2018
Dataflow in 104corp - AWS UserGroup TW 2018Gavin Lin
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Gavin Lin
 
How Glidewell Moves Data to Amazon Redshift
How Glidewell Moves Data to Amazon RedshiftHow Glidewell Moves Data to Amazon Redshift
How Glidewell Moves Data to Amazon RedshiftAttunity
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Amazon Web Services
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAmazon Web Services
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehousehadoopsphere
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagationRegunath B
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detectionhadooparchbook
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...Amazon Web Services
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Austin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at BazaarvoiceAustin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at Bazaarvoicebazaarvoice_engineering
 

Similar to HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud (20)

AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Dataflow in 104corp - AWS UserGroup TW 2018
Dataflow in 104corp - AWS UserGroup TW 2018Dataflow in 104corp - AWS UserGroup TW 2018
Dataflow in 104corp - AWS UserGroup TW 2018
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018
 
How Glidewell Moves Data to Amazon Redshift
How Glidewell Moves Data to Amazon RedshiftHow Glidewell Moves Data to Amazon Redshift
How Glidewell Moves Data to Amazon Redshift
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouse
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Austin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at BazaarvoiceAustin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at Bazaarvoice
 

More from Michael Stack

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on CloudMichael Stack
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at PinterestMichael Stack
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., LtdMichael Stack
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at DidiMichael Stack
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...Michael Stack
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at TencentMichael Stack
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...Michael Stack
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...Michael Stack
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index ComponentMichael Stack
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at XiaomiMichael Stack
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBaseMichael Stack
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index SolutionMichael Stack
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent MemoryMichael Stack
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLMichael Stack
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBaseMichael Stack
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...Michael Stack
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteMichael Stack
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesMichael Stack
 

More from Michael Stack (20)

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didi
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomi
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 Keynote
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
 

Recently uploaded

I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirtrahman018755
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyDamar Juniarto
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebJie Liau
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfOndejSur
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appscristianmanaila2
 
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresencePC Doctors NET
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfappinfoedgeca
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxChloeMeadows1
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsrahman018755
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?Linksys Velop Login
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkklolsDocherty
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsrahman018755
 
Topology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfTopology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfAnushkaTripathi61
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideVarun Mithran
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.Tortogel
 

Recently uploaded (16)

I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirt
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdf
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of apps
 
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdf
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirts
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirts
 
Topology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfTopology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdf
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's Guide
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
 

HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud

  • 1. New Journey of HBase in Alibaba and Cloud Chunhui Shen and Long Cao August 17,2018 八年磨一剑,HBase在阿里巴巴和云上的新征程
  • 2. Content AliHB-Introduction of Alibaba HBase History,Tech Overview,Open Source,Core Scenarios 01 Recent Key Challenge & Improvements GC Trouble,Separation of Computing & Storage,Cold- Hot Data,Diagnostic System, Migration & Backup 02 03 HBase Ecosystem & Multi-model DB & Cloud KV,Tabular,SQL,Graph,Time Series,Geospatial , Search, Mixed Workloads,Cloud
  • 4. HBase History in Alibaba • Why HBase – Began using since 2010 – Active community – Hadoop ecosystem – Facebook successful case – Google famous paper: Big Table Our Choice in 2010 Open Source Commercial Develop New Big Data Store System Cassandra MySQL、Oracle Data Burst • Used Version – 0.20->0.90->0.92->0.94->0.98->1.1->2.0 • The earliest case in 2010-2011 – Search Store – Taobao History Order – Alipay Risk Management • Internal branch AliHB
  • 5. Overview of AliHB 5 • Performance • High-Performance Data Structure、Lock-Free、 Group IO • Feature • SQL、Secondary Index • Multi-Tenants、Cold-Hot Separation、Async API • Stability • High Availability Architecture • Faster MTTR • Verification in Double 11 Shopping Day • Efficient Maintenance • Effective Monitoring • Full Path Trace • No-pause migration • 12000+ Nodes,100+ Clusters ,200+ Million OPS,100+ PB Data • 20+ BU,6000+ Users, 100+ Production Changes per Day
  • 6. Open Source and Community 6 • Contributing to open source since 2011 • 3 PMC, 6 Committers in Alibaba • Sponsor the Chinese HBase Technology Community • Already Organized 2 HBase Meetup • At least one HBase Related tech article one day • Tens of thousands of readers now, and more are coming • Hosting HBase Con Asia 2018 • Promote the use of HBase through several conference talks • Hope more people to join in HBase Community
  • 7. Core Scenarios in Alibaba 7 Ali-HBase 旺旺(IM) Alipay Bills Cainiao Logistics Log Monitor, Log, Tracking, IoT Data…Message, Orders, Feeds … AI Storage Ant Intelligent Security Intelligent Customer Service Recommendation Search, BI Report…
  • 8. Recent Key Challenge & Improvements02
  • 9. GC Trouble 9 Frequent Slow Request Very Slow Request Service Unavailable GC Problems Under100GB Memory
  • 10. GC Trouble 10 Only for offline application Rewriting with C++ Exploring a Thorough Solution
  • 11. GC Trouble 11 Type Pause Time Frequency YGC 100ms+ Once per 5 Secs CMS 100~500ms Once per 5 Mins FGC 20s-180s Once per 7~60 Days Type Pause Time Frequency YGC 5ms Once per 5 Secs CMS 100ms Once per 5 Hours FGC N/A N/A CCSMap BucketCacheV2 Allocation and reclaim the major memory by hbase itself, rather than JVM ZenGC New GC algorithm in AJDK Try best to reuse object(In Core Path) when programming
  • 12. GC Trouble New BucketCache in HBase-2.0 CCSMap in HBase-3.0
  • 13. Separation of Computing & Storage 13 Localized Deployment – Low IO latency with Short-Circuit Read – Unbalanced storage space, especially between clusters – Difficult to increase the usage ratio of CPU and Disk (both), especially when lots of scenarios – Cluster scaling is slow because of datanode decommission
  • 14. Separation of Computing & Storage 14 Shared-Storage Deployment – Big shared storage, more balanced – Compute node can scale independently – Storage node can scale independently – Auto-scaling become feasible – Based on load statistics, smart schedule between clusters – Share compute resources with other applications
  • 15. Heterogeneous Cold-Hot Storage 15 • HBase has the capability to hold all the data of whole life cycle • But in most cases, like monitor, trace, order, logistics • The recently generated data is often accessed, but occupy very little storage space • The history data is rarely visited, but occupy a lot of storage space • Common solution • Cold storage system for history data • Hot storage system for recent data • Move the data from hot storage system to cold storage system periodically
  • 16. Heterogeneous Cold-Hot Storage 16 • Easy To Use • Auto Tiered • Heterogeneous • Read Optimization
  • 17. Diagnostic System 17 “Request Rush?” — Monitor “Big Region?” — Web UI “Full Disk?” — df “Bad Disk?” — tsar,demsg …… 12000+ Nodes,100+ Clusters ,6000+ Users HBase Diagnostic Center 1. The unified entrance of trouble shooting 2. Experience/Solution => Function of Diagnostic System
  • 18. Diagnostic System 18 2 One extra server for all No Agent Adding rule dynamically Runtime information Check all components Only 10 seconds for a diagnosis6
  • 19. Diagnostic System 19  Compaction  Stuck  Balance Abnormal  Table Abnormal  Region Offline  Replication Delay  Too many files  High Meta Load  Multi Assign  …… HBase  ZK Unavailable  Block Miss  NameNode Abnormal  Full capacity of datanode  Inconsistent state between two namenodes  Too much Xceivers  Disk not mounted  …… ZK/HDFS  Insufficient disk space  Slow Disk  Bad Disk  Too much TCP error  Slow ping  CPU hang  Load too high  Port is unreachable  …… Hardware 50+ Rules 80%+ Accuracy Shared on Apsara HBase
  • 21. Migration & Backup 21 Independent with HBase • almost no impact to service • easy to upgrade • support multi versions • support the non-hbase target Second-level RPO Minute-level RTO
  • 22. HBase Ecosystem & Multi-model DB & Cloud03
  • 23. Popularity changes per DB category
  • 24. Ranking scores per category in percent
  • 26. All in one RelationalKey Value Doucument Graph Time Series Geospatial Tabular NoSQL
  • 27. All in one RelationalKey Value Doucument Graph Time Series Geospatial Tabular NoSQL HBase HBase Phoenix/AntsDB HBase JanusGraph OpenTSDB GeoMesa
  • 28. 28 Multi-model - Native Or Layer Neo4j InfluxDB CockroachDB PG HBase Ecosystem DataStax CosmosDB Multi-model KVIndex KVIndex Storage Multi-model
  • 29. HBase Meet Cloud – Benefits Cloud Native New Hardware Flexibility Cost Savings (TCO) RDMA Flash GPU Non-volatile memory Fast Add/Remove Resource Insight Fix bugs in time Self-driven End up paying for features Flexibility self-driven Reduce human ……
  • 30. 30 Multi-model - Native Or Layer Neo4j InfluxDB CockroachDB PG HBase Ecosystem DataStax CosmosDB Multi-model KVIndex KVIndex Storage Multi-model
  • 31. ApsaraDB HBase Platform – Cloud Native HBase (KV、Tabular 、Doucument) Solr/ES (Full Text Index) Hot data on SSD SQL Phoenix Graph JanusGraph Time Series OpenTSDB Geospatial GeoMesa Spark Cold data on HDD and use EC like OSS Warm data on SSD&HDD Remote Read/Write use RDMA and 25G network
  • 32. 32 ApsaraDB HBase Platform Advantage Item ApsaraDB HBase (ALiyun Product) https://cn.aliyun.com/product/hbase Apache HBase (Sofeware) Basic High availability 99.9% ~ 99.99% N/A Data reliability 99.999999999% N/A Online Ability Multi-master clustering Multi-master clustering,Multi-AZ/Regon NO GC FGC NO,YGC 5ms GC 20s~100s,YGC 100ms+ Reduce Cost Storage Cost Cut by 50%+ on share cloud disk,Total 3 Copy Maybe on Cloud Disk,Total 9 Copy Support Cold Storage Support OSS,Cut by 70% at less read NO Multi-model DB Multi-model DB KV,Tabular,SQL,Graph,Time Series,Geospatial Full Text index, Search KV,Tabular Enterprise Characteristics Disaster recovery Backup and Restore NO,maybe3.0 Security user/password,ACL Kerberos,ACL Analytics Spark on HBase ,More optimization Spark on HBase Version upgrade Automatic upgrade N/A Self-driven Database control system 15min Create a DB/Monitor Online add storage and node/Elastic Power in future N/A Diagnostic Big request ,Big Table merge,Hot Region …… NO
  • 33.