SlideShare a Scribd company logo
1 of 32
Download to read offline
1
Ecosystems built with HBase and
CloudTable Service at Huawei
Jieshan Bi, Yanhui Zhong
2
Agenda
CTBase: A light weight HBase client for structured data
Tagram: Distributed bitmap index implementation with HBase
CloudTable Service(HBase on Huawei Cloud)
3
CTBase Design Motivation
 Most of our customer scenarios are structured data
 HBase secondary index is a basic requirement
 New application indicated new HBase secondary development
 Simple cross-table join queries are common
 Full text index is also required for some customer scenarios
4
CTBase Features
 Schematized table
 Global secondary index
 Cluster table for simple cross-table join queries
 Online schema changes
 JSON based query DSL
5
Schematized Table
UserTable
Service conceptual user table for
storing user data
Column
User table column:
Each column indicates an
attribute of service data.
Index
Primary index:
Rowkey of table that stored the user data,
indicating the search scenario with the
highest probability
Secondary index:
Saves the information about the index to
the primary index.
Qualifier
HBase column:
Each column indicates a
KeyValue.
contains
contains mapping
Index RowKey
Column 1 Column 2 Column 3
Schematized Tables is better for structured user data storage. A
lot of modern NewSQL databases likes MegaStore, Spanner, F1,
Kudu are designed based on schematized tables.
6
CTBase provide schema definition API. Schema definition includes:
 Table Creation
A user table will be exist as simple or cluster table mode.
 Column Definition
Column is a similar concept with RDBMS. A column has specific type and length limit.
 Qualifier Definition
Column to ColumnFamily:Qualifier mapping. CTBase supports composite column, multiple column
can be stored into one same ColumnFamily:Qualifier.
 Index Definition
An index is either primary or secondary. The major part of index definition is the index rowkey
definition. Some hot columns can also be stored in secondary row.
Schema Manager
7
 Meta Cache
Each client has a schema locally in memory for fast data conversion.
 Meta Backup/Recovery Tool
Schema data can be exported as data file for fast recovery.
 Schema Changes
• Column changes
• Qualifier changes
• Index changes
Some changes are light-weight since they can take advantage of the scheme-less characteristics
of HBase. But some changes may cause the existing data to rebuild.
Schema Manager Cont.
8
HBase Global Secondary Index
NAME ID
Ariya I0000005
Bai I0000006
He I0000004
Lily I0000001
Lina I0000003
Lina I9999999
Lisa I0000008
Wang I0000002
Wang I0000007
……. ………….
Xiao I0000009
ID NAME PROVINCE GENDER PHONE AGE
I0000001 Lily Shandong MALE 13322221111 20
I0000002 Wang Guangdong FEMAIL 13222221111 15
I0000003 Lina Shanxi FEMAIL 13522221111 13
I0000004 He Henan MALE 13333331111 18
I0000005 Ariya Hebei FEMAIL 13344441111 28
I0000006 Bai Hunan MALE 15822221111 30
I0000007 Wang Hubei FEMAIL 15922221111 35
I0000008 Lisa Heilongjiang MALE 15844448888 38
I0000009 Xiao Jilin MALE 13802514000 38
…………. ……. …… ………. ………………….. ….
I9999999 Lina Liaoning MALE 13955225522 70
NAME =‘Lina’
Secondary index is for non-key column based queries.
Global secondary index is better for OLTP-like queries with
small batch results.
Region1
Region2
Region3
Region4
IndexRegionA
IndexRegionB
User Region Index Region
9
HBase Global Secondary Index Cont.
Section Section Section
Index RowKey Format
Suppose table UserInfo includes below 5 columns:
ID, NAME, ADDRESS, PHONE,DATE
Primary key are composed with 3 sections:
Section 1: ID
Section 2: NAME
Section 3: truncate(DATE, 8)
So the primary rowkey is:
Secondary Index Key
IDNAME
Secondary index key for NAME index:
H H
Secondary index key for PHONE index:
ID NAME truncate(DATE, 8)
………….
Primary Key
Section is normally related to one user column, but can also be a
constant or a random number.
truncate(DATE, 8)
H
ID NAME
HH
truncate(DATE, 8)
H
PHONE
NOTE:Sections with are also exist in primary keyH
10
Example: select a.account_id, a.amount, b.account_name, b.account_balance from Transactions a
left join AccountInfo b on a.account_id = b.account_id where a.account_id = “xxxxxxx”
account_id amount time
A0001 $100 12/12/2014 18:00:02
A0001 $1020 10/12/2014 15:30:05
A0001 $89 09/12/2014 13:00:07
A0002 $105 11/12/2014 20:15:00
account_id account_name account_balance
A0001 Andy $100232
A0002 Lily $902323
A0003 Selina $90000
A0004 Anna $102320
A0001 Andy $100232
A0001 $100 12/12/2014 18:00:02
A0001 $1020 10/12/2014 15:30:05
A0001 $89 09/12/2014 13:00:07
A0002 Lily $902323
A0002 $105 11/12/2014 20:15:00
A0002 $129 11/11/2014 18:15:00
Records from different
business-level user
table stored together
Transaction record
AccountInfo record
Pre-Joining with Keys: A better solution for cross-table join in
HBase. Records come from different tables but have some same
primary key columns can be stored adjacent to each other, so the
cross-table join turns into a sequential scan.
Cluster Table
11
Table table = null;
try {
table = conn.getTable(TABLE_NAME);
// Generate RowKey.
String rowKey = record.getId() + SEPERATOR + record.getName();
Put put = new Put(Bytes.toBytes(rowKey));
// Add name.
put.add(FAMILY, Bytes.toBytes("N"), Bytes.toBytes(record.getName()));
// Add phone.
put.add(FAMILY, Bytes.toBytes("P"), Bytes.toBytes(record.getPhone()));
// Add composite columns.
String compositeColumn = record.getAddress() + SEPERATOR
+ record.getAge() + SEPERATOR + record.getGender();
put.add(FAMILY, Bytes.toBytes("Z"), Bytes.toBytes(compositeColumn));
table.put(put);
} catch (IOException e) {
// Handle exception.
} finally {
// ……..
}
ClusterTableInterface table = null;
try {
table = new ClusterTable(conf, CLUSTER_TABLE);
CTRow row = new CTRow();
// Add all columns.
row.addColumn("ID", record.getId());
row.addColumn("NAME", record.getName());
row.addColumn("Address", record.getAddress());
row.addColumn("Phone", record.getPhone());
row.addColumn("Age", record.getAge());
row.addColumn("Gender", record.getGender());
table.put(USER_TABLE, row);
} catch (IOException e) {
// Handle exception.
} finally {
// ………….
}
RowKey/Put/KeyValue are not visible to application directly.
Secondary index row will be auto-generated by CTBase.
HBase Write Vs. ClusterTable Write
12
JSON Based Query DSL
{
table: “TableA",
conditions: [“ID": “23470%", “CarNo": “A1?234",
“Color”: “Yello || Black || White”],
columns: ["ID", “Time", “CarNo", “Color”],
caching: 100
}
 Flexible and powerful query API.
 Support for below operators:
Range Query Operator: >, >=, <, <=
Logic Operator: &&, ||
Fuzzy Query Operator: ?, *, %
 Index name can be specified, or just depend
on imbedded RBO to choose the best index.
 Using exist or customized filters to push
down queries for decreasing query latency.
JSON
Query Executor
JSON Analyzer
Rule Based Optimizer
Query Plan
Result Scanner
Result
13
Bulk Load
Local
Schema
Structured
Data
KeyValue
(User data)
KeyValue
(Index data)
HFile
HFile
 Schema has been defined in advance, including columns, column to qualifier
mappings, index row key format, etc. The only required configuration for bulk load
task is the column orders of the data file.
 Secondary index related HFiles can be generated together in one bulk load task.
14
Future Work For CTBase
1. Better Full-Text index support.
2. Active-Active Clusters Client.
3. Better HFile format for structured data.
15
Agenda
CTBase: A light weight HBase client for structured data
Tagram: Distributed Bitmap index implementation with HBase
CloudTable Service(HBase on Huawei Cloud)
16
 Low-cardinality attributes are popularly used in Personas area, these attributes are
used to describe user/entity typical characteristics, behavior patterns, motivations. E.g.
Attributes for describing buyer personas can help identify where your best customers
spend time on the internet.
 Ad-hoc queries must be supported. Likes:
“How many male customers have age < 30?”
“How many customers have these specific attributes?”
“Which people appeared in Area-A, Area-B and Area-C between 9:00 and 12:00?”
 Solr/Elasticsearch based solutions are not fast enough for low-cardinality attributes
based ad-hoc queries.
Tagram Design Motivation
17
Tagram Introduction
 Distributed bitmap index implementation uses
HBase as backend storage.
 Milliseconds level latency for attribute based ad-
hoc queries.
 Each attribute value is called a Tag. Entity is called
a TagHost. Each Tag relates to an independent
bitmap. Hot tags related bitmaps are memory-
resident.
 A Tag is either static or dynamic. Static tags must
be defined in advance. Dynamic tags have no such
restriction, likes Time-Space related tags.
Condition
GENDER:Male AND MARRIAGE:Married AND AGE:25-30
AND BLOOD_TYPE:A AND CAROWNER
Tagram Client
Query
Execution
TagZone
101111010010...
011001011110...
101001011010...
101111011010...
101010011010...
&
&
&
&
Query
Execution
Conditions
AST Tree
Query
Optimization
Query Plan
TagZone
001111010010...
111001011110...
001101011010...
101001011010...
000010011010...
&
&
&
&
101111010010101...
Each bit represent
whether an Entity
have this attribute
Each attribute value
relates to a Bitmap
Conditions
AST Tree
Query
Optimization
Query Plan
18
TagZone
HBase
Checkpoint Checkpoint
HDFS
Bitmap Container
Bitmap Bitmap Bitmap …
Dynamic Tag Loader
Query Cache
TagHostGroup
TagSource
StaticTag
ChangeLog
DynamicTag
PostingList
DTag
DynamicTag PostingList
Checkpoint
Bitmap Latest Data View
Changes
Base Delta
Service Threads
TagZone
Query CacheService Threads
Tagram Architecture
 TagZone service is initialized by
HBase coprocessor.
 Each TagZone is an independent
bitmap computing unit.
 All the real-time writes and logs
are stored in HBase.
 Use bitmap checkpoint for fast
recovery during service initialization.
Bitmap Container
Bitmap Bitmap Bitmap …
Dynamic Tag Loader
19
Data Model
TagSource
TagHostGroup
TagHostGroup_TAGZONE
M
1
1
1
Inverted index of Tag to TagHosts
TagHost to Tags
TagHostID
(Any Type)
TID
(Integer)
Tags Meta data storage
 TagSource: Meta data storage for static tags, includes
configurations per tag.
 TagHostGroup: Uses TagHostID as key, and store all the
tags as columns.
 TagZone: Inverted index from Tag to TagHost list.
Bitmap related data is also stored in this table. Partitions
are decided during table creation, and can not split in
future.
 Each table is an independent HBase table.
20
Query
Query grammar in BNF:
Query ::= ( Clause )+
Clause ::= ["AND", "OR", "NOT"] ([TagName:]TagValue| "(" Query ")" )
 A Query is a series of Clauses. Each Clause can also be a nested query.
 Supports AND/OR/NOT operators. AND indicates this clause is required, NOT
indicates this clause is prohibited, OR indicates this clause should appear in the
matching results. The default operator is OR is none operator specified.
 Parentheses “(” “)” can be used to improve the priority of a sub-query.
21
Query Example
 Normal Query:
GENDER:Male AND MARRIAGE:Married AND AGE:25-30 AND BLOOD_TYPE:A
 Use parentheses “(” “)” to improve the priority of sub-query:
GENDER:Male AND MARRIAGE:Married AND (AGE:25-30 OR AGE:30-35) AND BLOOD_TYPE:A
 Minimum Number Should Match Query:
At least 2 of below 4 groups of conditions should be satisfied:
(A1 B1 C1 D1 E1 F1 G1 H1) (A2 B2 C2 D2 E2 F2 G2 H2) (A3 B3 C3 D3 E3 F3 G3 H3) (A4 B4 C4 D4 E4 F4 G4 H4)
 Complex query with static and dynamic tags:
GENDER:Male AND MARRIAGE:Married AND AGE:25-30 AND CAROWNER AND $D:DTag1 AND $D:DTag2
22
Evaluation
Bitmap Cardinality In-memory Bytes On-Disk Size Bytes
5,000,000 15426632 10387402
10,000,000 29042504 20370176
50,000,000 140155632 99812920
100,000,000 226915200 198083304
Test results on small cluster:
3 Huawei 2288 Servers(256GB Memory, Intel(R) Xeon(R) CPU E5-2618L v3 @2.30GHZ*2 SATA,4TB*14)
1.5 Billion TagHosts, ~60 static Tags per TagHost.
Query with 10 random tags(Hundreds of thousands satisfied results), count and only return first screen
results. Average query latency: 60ms。
Bitmap in-memory and on-disk size:
NOTE: 1. Bitmap cardinality is the number of bit 1 from the bitmap in binary form.
2. The positions with bit 1 are random integers between 0 and Integer.Max.
3. The distribution of bit 1(In Bitmap binary form) and the range may affect the bitmap size.
23
Future Work For Tagram
1. Multiple TagZone Replica.
2. Async Tagram/HBase Client.
3. Better Bitmap Memory Management.
4. Integration with Graph/Full-Text index.
24
Acknowledgment
• Chaoqiang Zhong (zhongchaoqiang@huawei.com)
• Bene Guo (guoyijun@huawei.com)
• Daicheng Li (lidaicheng@huawei.com)
25
Agenda
CTBase: A light weight HBase client for structured data
Tagram: Distributed Bitmap index implementation with HBase
CloudTable Service(HBase on Huawei Cloud)
26
 Easy Maintenance
 Security
 High Performance
 SLA
 High Availability
 Low Cost
CloudTable Service Features
27
VPC1 HBase VPC2 HBase
RegionServer
HDFS
HMaster HRegion
Memstoe
HFile
…
RegionS
erver
…ZK
…
Tenant VPC
Tenant
VPC
VPC3 HBase
Tenant
VPC
 Isolation by VPC
 Shared Storage
CloudTable Service On Huawei Cloud
28
HBase
Disk Disk Disk
Block
Device
FileSystem
HDFS
HBase HBase
Disk Disk Disk Disk
Distribute Pool(Append only)
HDFS Interface
HBase HBase HBase
• A low-latency IO stack
• Deep Optimization With hardware
FileSystem FileSystem
Block
Device
Block
Device
Native HBase IO Stack
CloudTable IO Stack
CloudTable – IO Optimization
29
Region
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HDFS Data Node
Region
HFile
HFile
HFile
HFile
HFile
HFile
Region Server
Read Write
Compaction Region
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HFile
HDFS Data Node
Region
HFile
HFile
HFile
HFile
HFile
HFile
Region Server
Read
Write
Compaction
CMD:compactionOffload compaction
Smooth Performance
0
5000
10000
15000
20000
25000
1
4
7
10
13
16
19
22
25
28
31
34
37
40
43
46
49
52
55
58
61
64
67
70
73
76
79
82
85
88
91
94
97
100
103
106
109
112
115
118
121
124
127
130
TPS
normal
offload
CloudTable – Offload Compaction
30
HBase
Cluster
Arbitration
Node1
Arbitration Cluster
Arbitration
Node2
Arbitration
Node3
HBase
Cluster
HBase
Cluster
HBase
Cluster
AZ1 AZ2
Sync Replication
Sync Replication
Heartbeat
 Cross AZ Replication
 Write: Strong Consistency
 Read: Timeline Consistency
 99.99% Availability
 99.999999999% Durability
 Auto Failover
CloudTable – High Availability
31
Disk Disk Disk Disk
Distribute Pool(Append only)
HDFS Interface
HBase Solr Other
Services
Disk
 40% resource savings
From:Flash Storage Disaggregation
CloudTable – Low Cost
32
Thank You!
bijieshan@huawei.com zhongyanhui@huawei.com

More Related Content

What's hot

Access presentation
Access presentationAccess presentation
Access presentationDUSPviz
 
Sdn beginners bi
Sdn beginners biSdn beginners bi
Sdn beginners biRam Tomar
 
03 HTML #burningkeyboards
03 HTML #burningkeyboards03 HTML #burningkeyboards
03 HTML #burningkeyboardsDenis Ristic
 
Access 2007-Datasheets 1-Create a table by entering data
Access 2007-Datasheets 1-Create a table by entering dataAccess 2007-Datasheets 1-Create a table by entering data
Access 2007-Datasheets 1-Create a table by entering dataOklahoma Dept. Mental Health
 
Training MS Access 2007
Training MS Access 2007Training MS Access 2007
Training MS Access 2007crespoje
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index SolutionMichael Stack
 
Sas visual analytics Training
Sas visual analytics Training Sas visual analytics Training
Sas visual analytics Training bidwhm
 
MS Office Access Tutorial
MS Office Access TutorialMS Office Access Tutorial
MS Office Access TutorialvirtualMaryam
 
ASP.NET 10 - Data Controls
ASP.NET 10 - Data ControlsASP.NET 10 - Data Controls
ASP.NET 10 - Data ControlsRandy Connolly
 
Scraping, Transforming, and Enriching Bibliographic Data with Google Sheets
Scraping, Transforming, and Enriching Bibliographic Data with Google SheetsScraping, Transforming, and Enriching Bibliographic Data with Google Sheets
Scraping, Transforming, and Enriching Bibliographic Data with Google SheetsMichael P. Williams
 

What's hot (20)

Access presentation
Access presentationAccess presentation
Access presentation
 
Sq lite module8
Sq lite module8Sq lite module8
Sq lite module8
 
Sdn beginners bi
Sdn beginners biSdn beginners bi
Sdn beginners bi
 
03 HTML #burningkeyboards
03 HTML #burningkeyboards03 HTML #burningkeyboards
03 HTML #burningkeyboards
 
Access 2007-Datasheets 1-Create a table by entering data
Access 2007-Datasheets 1-Create a table by entering dataAccess 2007-Datasheets 1-Create a table by entering data
Access 2007-Datasheets 1-Create a table by entering data
 
Training MS Access 2007
Training MS Access 2007Training MS Access 2007
Training MS Access 2007
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
Introduction to Oracle
Introduction to OracleIntroduction to Oracle
Introduction to Oracle
 
Sq lite module2
Sq lite module2Sq lite module2
Sq lite module2
 
Sas visual analytics Training
Sas visual analytics Training Sas visual analytics Training
Sas visual analytics Training
 
Effective Use of Excel
Effective Use of ExcelEffective Use of Excel
Effective Use of Excel
 
Ch 9 S Q L
Ch 9  S Q LCh 9  S Q L
Ch 9 S Q L
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Intro
IntroIntro
Intro
 
MS Office Access Tutorial
MS Office Access TutorialMS Office Access Tutorial
MS Office Access Tutorial
 
ASP.NET 10 - Data Controls
ASP.NET 10 - Data ControlsASP.NET 10 - Data Controls
ASP.NET 10 - Data Controls
 
form view
form viewform view
form view
 
Scraping, Transforming, and Enriching Bibliographic Data with Google Sheets
Scraping, Transforming, and Enriching Bibliographic Data with Google SheetsScraping, Transforming, and Enriching Bibliographic Data with Google Sheets
Scraping, Transforming, and Enriching Bibliographic Data with Google Sheets
 
01 Microsoft Access
01 Microsoft Access01 Microsoft Access
01 Microsoft Access
 
Vba primer
Vba primerVba primer
Vba primer
 

Similar to hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei

MIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome MeasuresMIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome MeasuresSteven Johnson
 
ASP.NET 08 - Data Binding And Representation
ASP.NET 08 - Data Binding And RepresentationASP.NET 08 - Data Binding And Representation
ASP.NET 08 - Data Binding And RepresentationRandy Connolly
 
PostgreSQL as NoSQL
PostgreSQL as NoSQLPostgreSQL as NoSQL
PostgreSQL as NoSQLHimanchali -
 
Premiere Products Exercises
Premiere Products ExercisesPremiere Products Exercises
Premiere Products ExercisesAmy Miller
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase Cynthia Saracco
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLPiotr Pruski
 
Indic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndicThreads
 
Developing Microsoft SQL Server 2012 Databases 70-464 Pass Guarantee
Developing Microsoft SQL Server 2012 Databases 70-464 Pass GuaranteeDeveloping Microsoft SQL Server 2012 Databases 70-464 Pass Guarantee
Developing Microsoft SQL Server 2012 Databases 70-464 Pass GuaranteeSusanMorant
 
RDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsRDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsScyllaDB
 
It 302 computerized accounting (week 2) - sharifah
It 302   computerized accounting (week 2) - sharifahIt 302   computerized accounting (week 2) - sharifah
It 302 computerized accounting (week 2) - sharifahalish sha
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
Data Warehousing with Python
Data Warehousing with PythonData Warehousing with Python
Data Warehousing with PythonMartin Loetzsch
 
xml-150211140504-conversion-gate01 (1).pdf
xml-150211140504-conversion-gate01 (1).pdfxml-150211140504-conversion-gate01 (1).pdf
xml-150211140504-conversion-gate01 (1).pdfssusere05ec21
 
No SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDBNo SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDBKen Cenerelli
 

Similar to hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei (20)

MIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome MeasuresMIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome Measures
 
ASP.NET 08 - Data Binding And Representation
ASP.NET 08 - Data Binding And RepresentationASP.NET 08 - Data Binding And Representation
ASP.NET 08 - Data Binding And Representation
 
Physical Design and Development
Physical Design and DevelopmentPhysical Design and Development
Physical Design and Development
 
MYSQL.ppt
MYSQL.pptMYSQL.ppt
MYSQL.ppt
 
PostgreSQL as NoSQL
PostgreSQL as NoSQLPostgreSQL as NoSQL
PostgreSQL as NoSQL
 
Premiere Products Exercises
Premiere Products ExercisesPremiere Products Exercises
Premiere Products Exercises
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 
Indic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path ahead
 
Developing Microsoft SQL Server 2012 Databases 70-464 Pass Guarantee
Developing Microsoft SQL Server 2012 Databases 70-464 Pass GuaranteeDeveloping Microsoft SQL Server 2012 Databases 70-464 Pass Guarantee
Developing Microsoft SQL Server 2012 Databases 70-464 Pass Guarantee
 
RDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsRDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful Migrations
 
It 302 computerized accounting (week 2) - sharifah
It 302   computerized accounting (week 2) - sharifahIt 302   computerized accounting (week 2) - sharifah
It 302 computerized accounting (week 2) - sharifah
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Data Warehousing with Python
Data Warehousing with PythonData Warehousing with Python
Data Warehousing with Python
 
xml-150211140504-conversion-gate01 (1).pdf
xml-150211140504-conversion-gate01 (1).pdfxml-150211140504-conversion-gate01 (1).pdf
xml-150211140504-conversion-gate01 (1).pdf
 
Database concepts
Database conceptsDatabase concepts
Database concepts
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
No SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDBNo SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDB
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Kroenke Ch2 Solutions Essay
Kroenke Ch2 Solutions EssayKroenke Ch2 Solutions Essay
Kroenke Ch2 Solutions Essay
 

More from HBaseCon

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on KubernetesHBaseCon
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on BeamHBaseCon
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at HuaweiHBaseCon
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in PinterestHBaseCon
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程HBaseCon
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at NeteaseHBaseCon
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践HBaseCon
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台HBaseCon
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comHBaseCon
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architectureHBaseCon
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0HBaseCon
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
 

More from HBaseCon (20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 

Recently uploaded

AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 

Recently uploaded (20)

201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 

hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei

  • 1. 1 Ecosystems built with HBase and CloudTable Service at Huawei Jieshan Bi, Yanhui Zhong
  • 2. 2 Agenda CTBase: A light weight HBase client for structured data Tagram: Distributed bitmap index implementation with HBase CloudTable Service(HBase on Huawei Cloud)
  • 3. 3 CTBase Design Motivation  Most of our customer scenarios are structured data  HBase secondary index is a basic requirement  New application indicated new HBase secondary development  Simple cross-table join queries are common  Full text index is also required for some customer scenarios
  • 4. 4 CTBase Features  Schematized table  Global secondary index  Cluster table for simple cross-table join queries  Online schema changes  JSON based query DSL
  • 5. 5 Schematized Table UserTable Service conceptual user table for storing user data Column User table column: Each column indicates an attribute of service data. Index Primary index: Rowkey of table that stored the user data, indicating the search scenario with the highest probability Secondary index: Saves the information about the index to the primary index. Qualifier HBase column: Each column indicates a KeyValue. contains contains mapping Index RowKey Column 1 Column 2 Column 3 Schematized Tables is better for structured user data storage. A lot of modern NewSQL databases likes MegaStore, Spanner, F1, Kudu are designed based on schematized tables.
  • 6. 6 CTBase provide schema definition API. Schema definition includes:  Table Creation A user table will be exist as simple or cluster table mode.  Column Definition Column is a similar concept with RDBMS. A column has specific type and length limit.  Qualifier Definition Column to ColumnFamily:Qualifier mapping. CTBase supports composite column, multiple column can be stored into one same ColumnFamily:Qualifier.  Index Definition An index is either primary or secondary. The major part of index definition is the index rowkey definition. Some hot columns can also be stored in secondary row. Schema Manager
  • 7. 7  Meta Cache Each client has a schema locally in memory for fast data conversion.  Meta Backup/Recovery Tool Schema data can be exported as data file for fast recovery.  Schema Changes • Column changes • Qualifier changes • Index changes Some changes are light-weight since they can take advantage of the scheme-less characteristics of HBase. But some changes may cause the existing data to rebuild. Schema Manager Cont.
  • 8. 8 HBase Global Secondary Index NAME ID Ariya I0000005 Bai I0000006 He I0000004 Lily I0000001 Lina I0000003 Lina I9999999 Lisa I0000008 Wang I0000002 Wang I0000007 ……. …………. Xiao I0000009 ID NAME PROVINCE GENDER PHONE AGE I0000001 Lily Shandong MALE 13322221111 20 I0000002 Wang Guangdong FEMAIL 13222221111 15 I0000003 Lina Shanxi FEMAIL 13522221111 13 I0000004 He Henan MALE 13333331111 18 I0000005 Ariya Hebei FEMAIL 13344441111 28 I0000006 Bai Hunan MALE 15822221111 30 I0000007 Wang Hubei FEMAIL 15922221111 35 I0000008 Lisa Heilongjiang MALE 15844448888 38 I0000009 Xiao Jilin MALE 13802514000 38 …………. ……. …… ………. ………………….. …. I9999999 Lina Liaoning MALE 13955225522 70 NAME =‘Lina’ Secondary index is for non-key column based queries. Global secondary index is better for OLTP-like queries with small batch results. Region1 Region2 Region3 Region4 IndexRegionA IndexRegionB User Region Index Region
  • 9. 9 HBase Global Secondary Index Cont. Section Section Section Index RowKey Format Suppose table UserInfo includes below 5 columns: ID, NAME, ADDRESS, PHONE,DATE Primary key are composed with 3 sections: Section 1: ID Section 2: NAME Section 3: truncate(DATE, 8) So the primary rowkey is: Secondary Index Key IDNAME Secondary index key for NAME index: H H Secondary index key for PHONE index: ID NAME truncate(DATE, 8) …………. Primary Key Section is normally related to one user column, but can also be a constant or a random number. truncate(DATE, 8) H ID NAME HH truncate(DATE, 8) H PHONE NOTE:Sections with are also exist in primary keyH
  • 10. 10 Example: select a.account_id, a.amount, b.account_name, b.account_balance from Transactions a left join AccountInfo b on a.account_id = b.account_id where a.account_id = “xxxxxxx” account_id amount time A0001 $100 12/12/2014 18:00:02 A0001 $1020 10/12/2014 15:30:05 A0001 $89 09/12/2014 13:00:07 A0002 $105 11/12/2014 20:15:00 account_id account_name account_balance A0001 Andy $100232 A0002 Lily $902323 A0003 Selina $90000 A0004 Anna $102320 A0001 Andy $100232 A0001 $100 12/12/2014 18:00:02 A0001 $1020 10/12/2014 15:30:05 A0001 $89 09/12/2014 13:00:07 A0002 Lily $902323 A0002 $105 11/12/2014 20:15:00 A0002 $129 11/11/2014 18:15:00 Records from different business-level user table stored together Transaction record AccountInfo record Pre-Joining with Keys: A better solution for cross-table join in HBase. Records come from different tables but have some same primary key columns can be stored adjacent to each other, so the cross-table join turns into a sequential scan. Cluster Table
  • 11. 11 Table table = null; try { table = conn.getTable(TABLE_NAME); // Generate RowKey. String rowKey = record.getId() + SEPERATOR + record.getName(); Put put = new Put(Bytes.toBytes(rowKey)); // Add name. put.add(FAMILY, Bytes.toBytes("N"), Bytes.toBytes(record.getName())); // Add phone. put.add(FAMILY, Bytes.toBytes("P"), Bytes.toBytes(record.getPhone())); // Add composite columns. String compositeColumn = record.getAddress() + SEPERATOR + record.getAge() + SEPERATOR + record.getGender(); put.add(FAMILY, Bytes.toBytes("Z"), Bytes.toBytes(compositeColumn)); table.put(put); } catch (IOException e) { // Handle exception. } finally { // …….. } ClusterTableInterface table = null; try { table = new ClusterTable(conf, CLUSTER_TABLE); CTRow row = new CTRow(); // Add all columns. row.addColumn("ID", record.getId()); row.addColumn("NAME", record.getName()); row.addColumn("Address", record.getAddress()); row.addColumn("Phone", record.getPhone()); row.addColumn("Age", record.getAge()); row.addColumn("Gender", record.getGender()); table.put(USER_TABLE, row); } catch (IOException e) { // Handle exception. } finally { // …………. } RowKey/Put/KeyValue are not visible to application directly. Secondary index row will be auto-generated by CTBase. HBase Write Vs. ClusterTable Write
  • 12. 12 JSON Based Query DSL { table: “TableA", conditions: [“ID": “23470%", “CarNo": “A1?234", “Color”: “Yello || Black || White”], columns: ["ID", “Time", “CarNo", “Color”], caching: 100 }  Flexible and powerful query API.  Support for below operators: Range Query Operator: >, >=, <, <= Logic Operator: &&, || Fuzzy Query Operator: ?, *, %  Index name can be specified, or just depend on imbedded RBO to choose the best index.  Using exist or customized filters to push down queries for decreasing query latency. JSON Query Executor JSON Analyzer Rule Based Optimizer Query Plan Result Scanner Result
  • 13. 13 Bulk Load Local Schema Structured Data KeyValue (User data) KeyValue (Index data) HFile HFile  Schema has been defined in advance, including columns, column to qualifier mappings, index row key format, etc. The only required configuration for bulk load task is the column orders of the data file.  Secondary index related HFiles can be generated together in one bulk load task.
  • 14. 14 Future Work For CTBase 1. Better Full-Text index support. 2. Active-Active Clusters Client. 3. Better HFile format for structured data.
  • 15. 15 Agenda CTBase: A light weight HBase client for structured data Tagram: Distributed Bitmap index implementation with HBase CloudTable Service(HBase on Huawei Cloud)
  • 16. 16  Low-cardinality attributes are popularly used in Personas area, these attributes are used to describe user/entity typical characteristics, behavior patterns, motivations. E.g. Attributes for describing buyer personas can help identify where your best customers spend time on the internet.  Ad-hoc queries must be supported. Likes: “How many male customers have age < 30?” “How many customers have these specific attributes?” “Which people appeared in Area-A, Area-B and Area-C between 9:00 and 12:00?”  Solr/Elasticsearch based solutions are not fast enough for low-cardinality attributes based ad-hoc queries. Tagram Design Motivation
  • 17. 17 Tagram Introduction  Distributed bitmap index implementation uses HBase as backend storage.  Milliseconds level latency for attribute based ad- hoc queries.  Each attribute value is called a Tag. Entity is called a TagHost. Each Tag relates to an independent bitmap. Hot tags related bitmaps are memory- resident.  A Tag is either static or dynamic. Static tags must be defined in advance. Dynamic tags have no such restriction, likes Time-Space related tags. Condition GENDER:Male AND MARRIAGE:Married AND AGE:25-30 AND BLOOD_TYPE:A AND CAROWNER Tagram Client Query Execution TagZone 101111010010... 011001011110... 101001011010... 101111011010... 101010011010... & & & & Query Execution Conditions AST Tree Query Optimization Query Plan TagZone 001111010010... 111001011110... 001101011010... 101001011010... 000010011010... & & & & 101111010010101... Each bit represent whether an Entity have this attribute Each attribute value relates to a Bitmap Conditions AST Tree Query Optimization Query Plan
  • 18. 18 TagZone HBase Checkpoint Checkpoint HDFS Bitmap Container Bitmap Bitmap Bitmap … Dynamic Tag Loader Query Cache TagHostGroup TagSource StaticTag ChangeLog DynamicTag PostingList DTag DynamicTag PostingList Checkpoint Bitmap Latest Data View Changes Base Delta Service Threads TagZone Query CacheService Threads Tagram Architecture  TagZone service is initialized by HBase coprocessor.  Each TagZone is an independent bitmap computing unit.  All the real-time writes and logs are stored in HBase.  Use bitmap checkpoint for fast recovery during service initialization. Bitmap Container Bitmap Bitmap Bitmap … Dynamic Tag Loader
  • 19. 19 Data Model TagSource TagHostGroup TagHostGroup_TAGZONE M 1 1 1 Inverted index of Tag to TagHosts TagHost to Tags TagHostID (Any Type) TID (Integer) Tags Meta data storage  TagSource: Meta data storage for static tags, includes configurations per tag.  TagHostGroup: Uses TagHostID as key, and store all the tags as columns.  TagZone: Inverted index from Tag to TagHost list. Bitmap related data is also stored in this table. Partitions are decided during table creation, and can not split in future.  Each table is an independent HBase table.
  • 20. 20 Query Query grammar in BNF: Query ::= ( Clause )+ Clause ::= ["AND", "OR", "NOT"] ([TagName:]TagValue| "(" Query ")" )  A Query is a series of Clauses. Each Clause can also be a nested query.  Supports AND/OR/NOT operators. AND indicates this clause is required, NOT indicates this clause is prohibited, OR indicates this clause should appear in the matching results. The default operator is OR is none operator specified.  Parentheses “(” “)” can be used to improve the priority of a sub-query.
  • 21. 21 Query Example  Normal Query: GENDER:Male AND MARRIAGE:Married AND AGE:25-30 AND BLOOD_TYPE:A  Use parentheses “(” “)” to improve the priority of sub-query: GENDER:Male AND MARRIAGE:Married AND (AGE:25-30 OR AGE:30-35) AND BLOOD_TYPE:A  Minimum Number Should Match Query: At least 2 of below 4 groups of conditions should be satisfied: (A1 B1 C1 D1 E1 F1 G1 H1) (A2 B2 C2 D2 E2 F2 G2 H2) (A3 B3 C3 D3 E3 F3 G3 H3) (A4 B4 C4 D4 E4 F4 G4 H4)  Complex query with static and dynamic tags: GENDER:Male AND MARRIAGE:Married AND AGE:25-30 AND CAROWNER AND $D:DTag1 AND $D:DTag2
  • 22. 22 Evaluation Bitmap Cardinality In-memory Bytes On-Disk Size Bytes 5,000,000 15426632 10387402 10,000,000 29042504 20370176 50,000,000 140155632 99812920 100,000,000 226915200 198083304 Test results on small cluster: 3 Huawei 2288 Servers(256GB Memory, Intel(R) Xeon(R) CPU E5-2618L v3 @2.30GHZ*2 SATA,4TB*14) 1.5 Billion TagHosts, ~60 static Tags per TagHost. Query with 10 random tags(Hundreds of thousands satisfied results), count and only return first screen results. Average query latency: 60ms。 Bitmap in-memory and on-disk size: NOTE: 1. Bitmap cardinality is the number of bit 1 from the bitmap in binary form. 2. The positions with bit 1 are random integers between 0 and Integer.Max. 3. The distribution of bit 1(In Bitmap binary form) and the range may affect the bitmap size.
  • 23. 23 Future Work For Tagram 1. Multiple TagZone Replica. 2. Async Tagram/HBase Client. 3. Better Bitmap Memory Management. 4. Integration with Graph/Full-Text index.
  • 24. 24 Acknowledgment • Chaoqiang Zhong (zhongchaoqiang@huawei.com) • Bene Guo (guoyijun@huawei.com) • Daicheng Li (lidaicheng@huawei.com)
  • 25. 25 Agenda CTBase: A light weight HBase client for structured data Tagram: Distributed Bitmap index implementation with HBase CloudTable Service(HBase on Huawei Cloud)
  • 26. 26  Easy Maintenance  Security  High Performance  SLA  High Availability  Low Cost CloudTable Service Features
  • 27. 27 VPC1 HBase VPC2 HBase RegionServer HDFS HMaster HRegion Memstoe HFile … RegionS erver …ZK … Tenant VPC Tenant VPC VPC3 HBase Tenant VPC  Isolation by VPC  Shared Storage CloudTable Service On Huawei Cloud
  • 28. 28 HBase Disk Disk Disk Block Device FileSystem HDFS HBase HBase Disk Disk Disk Disk Distribute Pool(Append only) HDFS Interface HBase HBase HBase • A low-latency IO stack • Deep Optimization With hardware FileSystem FileSystem Block Device Block Device Native HBase IO Stack CloudTable IO Stack CloudTable – IO Optimization
  • 29. 29 Region HFile HFile HFile HFile HFile HFile HFile HFile HFile HFile HDFS Data Node Region HFile HFile HFile HFile HFile HFile Region Server Read Write Compaction Region HFile HFile HFile HFile HFile HFile HFile HFile HFile HFile HDFS Data Node Region HFile HFile HFile HFile HFile HFile Region Server Read Write Compaction CMD:compactionOffload compaction Smooth Performance 0 5000 10000 15000 20000 25000 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100 103 106 109 112 115 118 121 124 127 130 TPS normal offload CloudTable – Offload Compaction
  • 30. 30 HBase Cluster Arbitration Node1 Arbitration Cluster Arbitration Node2 Arbitration Node3 HBase Cluster HBase Cluster HBase Cluster AZ1 AZ2 Sync Replication Sync Replication Heartbeat  Cross AZ Replication  Write: Strong Consistency  Read: Timeline Consistency  99.99% Availability  99.999999999% Durability  Auto Failover CloudTable – High Availability
  • 31. 31 Disk Disk Disk Disk Distribute Pool(Append only) HDFS Interface HBase Solr Other Services Disk  40% resource savings From:Flash Storage Disaggregation CloudTable – Low Cost