SlideShare a Scribd company logo
1 of 40
hosted by
HTAP DB—System : ApsaraDB
HBase Phoenix and Spark
Yun Zhang & Wei Li
August 17,2018
hosted by
Content
01
02
Phoenix Over ApsaraDB
HBase
Spark & ApsaraDB
HBase/Phoenix
hosted by
Phoenix Over ApsaraDB HBase1
hosted by
Content
1.1
1.2
1.4
1.3
Architecture
Use Cases
Best Practice
Challenges & Improvements
hosted by
Phoenix Over ApsaraDB HBase1.1
hosted by
Phoenix-As-A-Service
● Orientations
○ Provides OLTP and Operational analytics over ApsaraDB HBASE
● Targets
○ Make HBASE easier to use
■ JDBC API/SQL
○ Other functions
■ Secondary Index
■ Transaction
■ Multi tenancy
■ …
Phoenix-as-a-service over ApsaraDB HBase
hosted by
Phoenix Architecture
Thick Phoenix Client Architecture
Upgrades client is very painful as a cloud service!
hosted by
Phoenix Architecture
Thin Phoenix Client Architecture
Lower maintenance cost as a cloud service!
hosted by
Use Cases1.2
hosted by
Use Case 1
LOT Scenario
● Data
○ Big table(Spatial Temporal Data) 100 million+
○ Small table(User Information) less than 1 million
● Functional Requirements
○ Hash join(big table join small table)
○ Staled table (avoid hot spotting)
○ Secondary index
● Other Requirements
○ Latency less than 2 seconds (100 vertices of polygon)
○ Scale out
hosted by
Use Case 1
Architecture
hosted by
Use Case 1
Query
hosted by
Use Case 2
Internet Company Scenario
● Data
○ 350+ million/per day
○ 500G+/per day(uncompress)
● Functional Requirements
○ Staled table (avoid hot spotting)
○ Secondary index(multidimensional analytics)
● Other Requirements
○ Latency less than 200 Millisecond
○ 6+ index tables
○ Scale out
hosted by
Use Case 2
Architecture
hosted by
Best Practices1.3
hosted by
Best Practices
Table Properties
1. Recommend to set UPDATE_CACHE_FREQUENCY when
create table (120000ms as a default value)
2. Used pre-splitting keys is better than slated table.(Range scan
is limited when use slate buckets)
3. Pre-splitting region for index table(if your data tables are salted
table, index tables will inherit this property).
4. SALT_BUCKETS is not equal split keys(pre-splitting region)!!!
hosted by
Best Practices
Query Hint
1. Use USE_SORT_MERGE_JOIN when join bigger tables.
2. Use NO_CACHE will avoid caching any HBase blocks loaded,
which can reduce GC overhead and get better performance. it is
used export data query, such as UPSERT…SELECT clause.
3. Use SMALL will save an RPC call on the scan, Which can reduce
network overhead. it is used hit the small amount of data of query.
hosted by
Best Practices
Composite Key
Where Conditions Status
A=x and B=x and C=x Best
A=x and B=x Better
A=x OK
B=x and C=x Not recommended
C=x Dangerous
Data table composite key Index table composite key
CREATE TABLE DATA_TABLE(
A VARCHAR,
B VARCHAR,
C VARCHAR,
D INTEGER,
CONSTRAINT PK PRIMARY KEY(A, B, C))
CREATE INDEX IDX_NAME
ON DATA_TABLE(A, B, C)
hosted by
Best Practices
Other Tips
1. Recommend to use global index on massive data table
2. Reasonable to use Row timestamp that affects visibility of data
3. More index tables depressed write throughput
hosted by
Challenges & Improvements1.4
hosted by
Challenges
● Availability
○ Sometimes index table become unavailable.
● Stability
○ Full scan/complex queries affects cluster’s stability
● Users Complaints
○ queries can’t automatically choose the best index table
○ Using Python client get worse performance.
○ Lack of data transferring tools (data import/export)
○ Scarce monitor metrics
○ ...
hosted by
Improvements
● Stability
○ Phoenix Chaos Monkey test framework
● Availability
○ Support infinite retry policy when writing index failures to
avoid degrade to full scan data table.
● Producibility
○ Recognizes some full scan/complex queries and reject on the
Server(Query Server) side
○ Integrate monitor platform
○ Other new features
■ Alter modify column/rename
■ Reports rate of progress When creating index
hosted by
Spark & ApsaraDB
HBase/Phoenix2
hosted by
Spark &
ApsaraDB HBase
2.1
2.2
2.3
Overview
Scenario
Architecture &
Implementation
hosted by
Overview2.1
hosted by
Overview
HBase/phoenix requirements
1
2
3
4
Bulkload
Users need bulkload large number
of data to hbase/phoenix fastly
Real time etl/load
As data visibility,user need
realtime etl and load to
hbase/phoenix
Analysis
Phoenix not good at
complex analysis
Elastic resource
Phoenix uses hbase coprocessors
to do analysis, but hbase cluster
resource has limitation
Why?
hosted by
Overview
what can spark bring to ApsaraDB HBase
Compute resources
HBase/
Phoenix Spark
analysis bulkload
realtime
load
spark
build index
➢ Analysis:
• spark as a unified analytics engine,support SQL 2003
• Use dag support complex analysis
➢ Bulkload:spark can support multi datasource like jdbc, csv, elasticsearch,
mongo; have elastic resource
➢ Realtime load:struct streaming easy to do etl, and load to hbase
hosted by
Architecture & Implementation2.2
hosted by
Architecture & implementation
Use ApsaraDB HBase and spark construct big data platform
➢ BATCH LAYER : use spark sql/dataset analysis HBase/Phoenix, also
bulkload other datasources to HBase
➢ SPEED LAYER : use struct streaming etl data from kafka, and increment
load into HBase
➢ SERVING LAYER : User query result data from HBase/Phoenix
hosted by
spark sql& HBase
how spark sql analysis HBase/Phoenix?
➢ SQL API:use spark sql analysis hbase,table meta in hive metastore
➢ Performance:
• distributed scan;
• sql optimize like partition pruning、column pruning、predicate pushdown;
• direct reading hifles;
• auto transform to column based storage
hosted by
spark sql& HBase
CREATE TABLE HBaseTest USING org.apache.spark.sql.execution.datasources.hbase
OPTIONS ('catalog'=
'{
"table":{"namespace":"default", "name":”TestTable", "tableCoder":"PrimitiveType"},
"rowkey":"key",
"columns":{
"col0":{"cf":"rowkey", "col":"key", "type":"string"},
"col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
"col2":{"cf":"cf2", "col":"col2", "type":"double"},
"col3":{"cf":"cf3", "col":"col3", "type":"float"},
"col4":{"cf":"cf4", "col":"col4", "type":"int"},
"col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
"col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
"col7":{"cf":"cf7", "col":"col7", "type":"string"},
"col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
}
}’
)
demo:CREATE TABLE
hosted by
spark sql& HBase
• partition pruning:use col0 < ‘row050‘ to perform the needed regions
• predicate pushdown:col2 >’10.0 filter will pushdown to hbase scan
• column pruning: only scan the needed column
Performance:
• Data scale:500208
• Native HBaseRDD: HadoopRDD use TableInputFormat
• Spark SQL:spark hbase datasource
beeline> select count(col2) from HBaseTest where col0 < 'row050‘ and col2
>’10.0
demo and performance
类型 时间 结果
Native HBaseRDD 14.425s 5788
Spark SQL 1.036s 5788
hosted by
Spark struct streaming& HBase
demo
val catalog =
s"""{
|"table":{"namespace":"default", "name":"structStreamingCount", "tableCoder":"PrimitiveType"},
|"rowkey":"key",
|"columns":{
|"value":{"cf":"rowkey", "col":"key", "type":"string"},
|"count":{"cf":"cf1", "col":"count", "type":"int"}
|}
|}""".stripMargin
val lines = spark.XXX.load()
val wordCounts = lines.as[String].flatMap(_.split(" ")).filter($"value"=!="").groupBy("value").count()
val query = wordCounts.
writeStream.
outputMode("update").
format("org.apache.spark.sql.execution.datasources.hbase.HBaseSinkProvider").
option("checkpointLocation", “xxxx").
option("hbasecat", catalog).
start()
hosted by
Scenario2.3
hosted by
Scenario 1
big data online reporting services
➢ Specialty
• Online service
• Dimension table
➢ Case
• Mobile game:Real-time user activity in different
regions
• Business:Different types of goods pv real-time
report
hosted by
Scenario 2
big data complex reporting services
➢ Specialty
• Complex analysis
• Datawarehouse
• Quasi-real time
➢ Case
• Mobile game:Comparison of user activity in different
age groups during the same period
hosted by
Scenario 3
log indexes and query
➢ Specialty
• collect logs in real time
• log indexes and query
➢ Case
• Log service system
hosted by
Scenario 4
time series query and monitoring
➢ Specialty
• Time series data
• Multi metrics
• Time series query
➢ Case
• IOT、service and business monitoring system
hosted by
We are hiring!
➢ If you are interested in the unified online sql analysis
engine
➢ If you are interested in the spark kernel and ecosystem
ApsaraDB HBase
DingDing Community
Yun Zhang (wechat)
Phoenix
Wei Li (wechat)
Spark
hosted by
Thanks

More Related Content

What's hot

HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiMichael Stack
 
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at LianjiaHBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at LianjiaMichael Stack
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...Michael Stack
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsMichael Stack
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine OverviewKunal Gupta
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezDataWorks Summit
 
In Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPHBaseCon
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010Membase
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanMichael Stack
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSandish Kumar H N
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseCloudera, Inc.
 

What's hot (20)

HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at Xiaomi
 
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at LianjiaHBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
 
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
 
In Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAP
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at Meituan
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
 

Similar to HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark

hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Aman Sinha
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-finalMaryann Xue
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloudDmitry Tolpeko
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High PerformanceInderaj (Raj) Bains
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL ServerMark Kromer
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseDataWorks Summit
 
Slide 2 collecting, storing and analyzing big data
Slide 2 collecting, storing and analyzing big dataSlide 2 collecting, storing and analyzing big data
Slide 2 collecting, storing and analyzing big dataTrieu Nguyen
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.Vijaykumar Vangapandu
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesHBaseCon
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Timothy Spann
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Provectus
 

Similar to HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark (20)

hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-final
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL Server
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 
Slide 2 collecting, storing and analyzing big data
Slide 2 collecting, storing and analyzing big dataSlide 2 collecting, storing and analyzing big data
Slide 2 collecting, storing and analyzing big data
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
 

More from Michael Stack

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on CloudMichael Stack
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at PinterestMichael Stack
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., LtdMichael Stack
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at DidiMichael Stack
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...Michael Stack
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at TencentMichael Stack
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...Michael Stack
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...Michael Stack
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index ComponentMichael Stack
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at XiaomiMichael Stack
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBaseMichael Stack
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index SolutionMichael Stack
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent MemoryMichael Stack
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLMichael Stack
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBaseMichael Stack
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...Michael Stack
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteMichael Stack
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesMichael Stack
 

More from Michael Stack (20)

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didi
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomi
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 Keynote
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
 

Recently uploaded

Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresencePC Doctors NET
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyDamar Juniarto
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfOndejSur
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsrahman018755
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfappinfoedgeca
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxChloeMeadows1
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirtrahman018755
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?Linksys Velop Login
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsrahman018755
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.Tortogel
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkklolsDocherty
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebJie Liau
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideVarun Mithran
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appscristianmanaila2
 

Recently uploaded (16)

Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdf
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirts
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdf
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirt
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirts
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's Guide
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of apps
 
GOOGLE Io 2024 At takes center stage.pdf
GOOGLE Io 2024 At takes center stage.pdfGOOGLE Io 2024 At takes center stage.pdf
GOOGLE Io 2024 At takes center stage.pdf
 

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark

  • 1. hosted by HTAP DB—System : ApsaraDB HBase Phoenix and Spark Yun Zhang & Wei Li August 17,2018
  • 2. hosted by Content 01 02 Phoenix Over ApsaraDB HBase Spark & ApsaraDB HBase/Phoenix
  • 3. hosted by Phoenix Over ApsaraDB HBase1
  • 5. hosted by Phoenix Over ApsaraDB HBase1.1
  • 6. hosted by Phoenix-As-A-Service ● Orientations ○ Provides OLTP and Operational analytics over ApsaraDB HBASE ● Targets ○ Make HBASE easier to use ■ JDBC API/SQL ○ Other functions ■ Secondary Index ■ Transaction ■ Multi tenancy ■ … Phoenix-as-a-service over ApsaraDB HBase
  • 7. hosted by Phoenix Architecture Thick Phoenix Client Architecture Upgrades client is very painful as a cloud service!
  • 8. hosted by Phoenix Architecture Thin Phoenix Client Architecture Lower maintenance cost as a cloud service!
  • 10. hosted by Use Case 1 LOT Scenario ● Data ○ Big table(Spatial Temporal Data) 100 million+ ○ Small table(User Information) less than 1 million ● Functional Requirements ○ Hash join(big table join small table) ○ Staled table (avoid hot spotting) ○ Secondary index ● Other Requirements ○ Latency less than 2 seconds (100 vertices of polygon) ○ Scale out
  • 11. hosted by Use Case 1 Architecture
  • 13. hosted by Use Case 2 Internet Company Scenario ● Data ○ 350+ million/per day ○ 500G+/per day(uncompress) ● Functional Requirements ○ Staled table (avoid hot spotting) ○ Secondary index(multidimensional analytics) ● Other Requirements ○ Latency less than 200 Millisecond ○ 6+ index tables ○ Scale out
  • 14. hosted by Use Case 2 Architecture
  • 16. hosted by Best Practices Table Properties 1. Recommend to set UPDATE_CACHE_FREQUENCY when create table (120000ms as a default value) 2. Used pre-splitting keys is better than slated table.(Range scan is limited when use slate buckets) 3. Pre-splitting region for index table(if your data tables are salted table, index tables will inherit this property). 4. SALT_BUCKETS is not equal split keys(pre-splitting region)!!!
  • 17. hosted by Best Practices Query Hint 1. Use USE_SORT_MERGE_JOIN when join bigger tables. 2. Use NO_CACHE will avoid caching any HBase blocks loaded, which can reduce GC overhead and get better performance. it is used export data query, such as UPSERT…SELECT clause. 3. Use SMALL will save an RPC call on the scan, Which can reduce network overhead. it is used hit the small amount of data of query.
  • 18. hosted by Best Practices Composite Key Where Conditions Status A=x and B=x and C=x Best A=x and B=x Better A=x OK B=x and C=x Not recommended C=x Dangerous Data table composite key Index table composite key CREATE TABLE DATA_TABLE( A VARCHAR, B VARCHAR, C VARCHAR, D INTEGER, CONSTRAINT PK PRIMARY KEY(A, B, C)) CREATE INDEX IDX_NAME ON DATA_TABLE(A, B, C)
  • 19. hosted by Best Practices Other Tips 1. Recommend to use global index on massive data table 2. Reasonable to use Row timestamp that affects visibility of data 3. More index tables depressed write throughput
  • 20. hosted by Challenges & Improvements1.4
  • 21. hosted by Challenges ● Availability ○ Sometimes index table become unavailable. ● Stability ○ Full scan/complex queries affects cluster’s stability ● Users Complaints ○ queries can’t automatically choose the best index table ○ Using Python client get worse performance. ○ Lack of data transferring tools (data import/export) ○ Scarce monitor metrics ○ ...
  • 22. hosted by Improvements ● Stability ○ Phoenix Chaos Monkey test framework ● Availability ○ Support infinite retry policy when writing index failures to avoid degrade to full scan data table. ● Producibility ○ Recognizes some full scan/complex queries and reject on the Server(Query Server) side ○ Integrate monitor platform ○ Other new features ■ Alter modify column/rename ■ Reports rate of progress When creating index
  • 23. hosted by Spark & ApsaraDB HBase/Phoenix2
  • 24. hosted by Spark & ApsaraDB HBase 2.1 2.2 2.3 Overview Scenario Architecture & Implementation
  • 26. hosted by Overview HBase/phoenix requirements 1 2 3 4 Bulkload Users need bulkload large number of data to hbase/phoenix fastly Real time etl/load As data visibility,user need realtime etl and load to hbase/phoenix Analysis Phoenix not good at complex analysis Elastic resource Phoenix uses hbase coprocessors to do analysis, but hbase cluster resource has limitation Why?
  • 27. hosted by Overview what can spark bring to ApsaraDB HBase Compute resources HBase/ Phoenix Spark analysis bulkload realtime load spark build index ➢ Analysis: • spark as a unified analytics engine,support SQL 2003 • Use dag support complex analysis ➢ Bulkload:spark can support multi datasource like jdbc, csv, elasticsearch, mongo; have elastic resource ➢ Realtime load:struct streaming easy to do etl, and load to hbase
  • 28. hosted by Architecture & Implementation2.2
  • 29. hosted by Architecture & implementation Use ApsaraDB HBase and spark construct big data platform ➢ BATCH LAYER : use spark sql/dataset analysis HBase/Phoenix, also bulkload other datasources to HBase ➢ SPEED LAYER : use struct streaming etl data from kafka, and increment load into HBase ➢ SERVING LAYER : User query result data from HBase/Phoenix
  • 30. hosted by spark sql& HBase how spark sql analysis HBase/Phoenix? ➢ SQL API:use spark sql analysis hbase,table meta in hive metastore ➢ Performance: • distributed scan; • sql optimize like partition pruning、column pruning、predicate pushdown; • direct reading hifles; • auto transform to column based storage
  • 31. hosted by spark sql& HBase CREATE TABLE HBaseTest USING org.apache.spark.sql.execution.datasources.hbase OPTIONS ('catalog'= '{ "table":{"namespace":"default", "name":”TestTable", "tableCoder":"PrimitiveType"}, "rowkey":"key", "columns":{ "col0":{"cf":"rowkey", "col":"key", "type":"string"}, "col1":{"cf":"cf1", "col":"col1", "type":"boolean"}, "col2":{"cf":"cf2", "col":"col2", "type":"double"}, "col3":{"cf":"cf3", "col":"col3", "type":"float"}, "col4":{"cf":"cf4", "col":"col4", "type":"int"}, "col5":{"cf":"cf5", "col":"col5", "type":"bigint"}, "col6":{"cf":"cf6", "col":"col6", "type":"smallint"}, "col7":{"cf":"cf7", "col":"col7", "type":"string"}, "col8":{"cf":"cf8", "col":"col8", "type":"tinyint"} } }’ ) demo:CREATE TABLE
  • 32. hosted by spark sql& HBase • partition pruning:use col0 < ‘row050‘ to perform the needed regions • predicate pushdown:col2 >’10.0 filter will pushdown to hbase scan • column pruning: only scan the needed column Performance: • Data scale:500208 • Native HBaseRDD: HadoopRDD use TableInputFormat • Spark SQL:spark hbase datasource beeline> select count(col2) from HBaseTest where col0 < 'row050‘ and col2 >’10.0 demo and performance 类型 时间 结果 Native HBaseRDD 14.425s 5788 Spark SQL 1.036s 5788
  • 33. hosted by Spark struct streaming& HBase demo val catalog = s"""{ |"table":{"namespace":"default", "name":"structStreamingCount", "tableCoder":"PrimitiveType"}, |"rowkey":"key", |"columns":{ |"value":{"cf":"rowkey", "col":"key", "type":"string"}, |"count":{"cf":"cf1", "col":"count", "type":"int"} |} |}""".stripMargin val lines = spark.XXX.load() val wordCounts = lines.as[String].flatMap(_.split(" ")).filter($"value"=!="").groupBy("value").count() val query = wordCounts. writeStream. outputMode("update"). format("org.apache.spark.sql.execution.datasources.hbase.HBaseSinkProvider"). option("checkpointLocation", “xxxx"). option("hbasecat", catalog). start()
  • 35. hosted by Scenario 1 big data online reporting services ➢ Specialty • Online service • Dimension table ➢ Case • Mobile game:Real-time user activity in different regions • Business:Different types of goods pv real-time report
  • 36. hosted by Scenario 2 big data complex reporting services ➢ Specialty • Complex analysis • Datawarehouse • Quasi-real time ➢ Case • Mobile game:Comparison of user activity in different age groups during the same period
  • 37. hosted by Scenario 3 log indexes and query ➢ Specialty • collect logs in real time • log indexes and query ➢ Case • Log service system
  • 38. hosted by Scenario 4 time series query and monitoring ➢ Specialty • Time series data • Multi metrics • Time series query ➢ Case • IOT、service and business monitoring system
  • 39. hosted by We are hiring! ➢ If you are interested in the unified online sql analysis engine ➢ If you are interested in the spark kernel and ecosystem ApsaraDB HBase DingDing Community Yun Zhang (wechat) Phoenix Wei Li (wechat) Spark