HBase Usage and Practice at Hulu
张虔熙 (Qianxi Zhang) @ Hulu
qianxi.zhang@hulu.com
About Hulu
About me
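• 张虔熙 (Qianxi Zhang)
  ✓ Software engineer, Big Data Platform team @ Hulu
  ✓ Focused on distributed computing and storage
  ✓ Enthusiastic about contributing code to open-source communities
  ✓ qianxi.zhang@hulu.com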
Agenda
• Overview
• Audience Platform (user profiling system)
• Auto Balance InputFormat and Snapshot
• Online Bill Storage (order information storage system)
• Replication, RPC Queue and Replica
Overview
• HBase version: 1.2.0
• Hadoop nodes: 1000+
• HBase nodes: 200+
• HBase tables: 200+
• HBase data size: 700 TB
• Clusters: 4
Scenarios
• Audience Platform (user profiling system)
• Log Storage (log storage system)
• Online Bill Storage (order information storage system)
• OpenTSDB
Audience Platform (user profiling system)
• User profile: a tag-based user model abstracted from user behavior
• Data
  ✓ Profile (basic attributes)
  ✓ User behavior
  ✓ Third-party data
  ✓ Label (tags)
Audience Platform (user profiling system)
• Data characteristics
  ✓ Sparse (10^6 qualifiers)
  ✓ Multi-version (user behavior)
• Purpose
  ✓ Marketing decisions
  ✓ Personalized recommendation
  ✓ Advertising
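Sparse, multi-version data maps naturally onto HBase's logical model: a sorted map of row → column (family:qualifier) → timestamp → value in which only cells that were actually written are stored. The Java sketch below is a toy in-memory model of that idea, not HBase code; the class SparseProfileModel and the column names (profile:gender, behavior:last_watched, label:sports_fan) are hypothetical.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (not HBase code): row -> column ("family:qualifier") -> timestamp -> value.
// Only written cells are stored, so missing qualifiers cost nothing (sparsity),
// and each column keeps up to maxVersions timestamped values (multi-version).
public class SparseProfileModel {
    private final Map<String, Map<String, NavigableMap<Long, String>>> rows = new TreeMap<>();
    private final int maxVersions;

    public SparseProfileModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    public void put(String row, String column, long ts, String value) {
        // Versions are ordered newest-first, the order in which HBase returns them.
        NavigableMap<Long, String> versions = rows
                .computeIfAbsent(row, r -> new TreeMap<>())
                .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()));
        versions.put(ts, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version
        }
    }

    private NavigableMap<Long, String> versionsOf(String row, String column) {
        Map<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    /** Newest value of a cell, or null if it was never written. */
    public String getLatest(String row, String column) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Number of versions currently kept for a cell. */
    public int versionCount(String row, String column) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        SparseProfileModel table = new SparseProfileModel(3);
        table.put("user42", "profile:gender", 1L, "F");
        for (long ts = 1; ts <= 5; ts++) {
            table.put("user42", "behavior:last_watched", ts, "show-" + ts);
        }
        System.out.println(table.getLatest("user42", "behavior:last_watched"));   // show-5
        System.out.println(table.versionCount("user42", "behavior:last_watched")); // 3
        System.out.println(table.getLatest("user42", "label:sports_fan"));        // null
    }
}
```

The sketch trims extra versions on every put for simplicity; real HBase enforces a column family's version limit lazily, at read and compaction time.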
Audience Platform (user profiling system)
[Figure: architecture diagram; components: Kafka, HDFS, DB, HBase, Service, Cache, Spark Streaming, Bulk Load, Spark, MapReduce]
Audience Platform (user profiling system)
• Key technologies
  ✓ Auto balance InputFormat
  ✓ Snapshot
Region Size Distribution
Application Performance
• Problem
  ✓ Task execution time in MapReduce and Spark is positively correlated with Region size
  ✓ Task execution time varies widely across tasks
• Solution
  ✓ Enable TableInputFormat auto-balance (hbase.mapreduce.input.autobalance)
  ✓ Split large Regions and merge small Regions when generating InputFormat splits
• Bug
  ✓ HBASE-15357 (wrong split/middle key)
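The auto-balance idea can be illustrated with a simplified Java model that works on region sizes only: regions much larger than the average are split in two, and runs of small neighbouring regions are merged, so each task reads a similar amount of data. This is a sketch under stated assumptions, not the actual TableInputFormatBase algorithm; rebalance and its skewRatio threshold are hypothetical, and halving a size stands in for splitting a region at its middle row key.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of auto-balanced input splits (not the actual TableInputFormatBase code):
// each region is represented only by its size, and each returned entry is the amount of
// data one MapReduce/Spark task would read.
public class AutoBalanceSketch {

    static List<Long> rebalance(List<Long> regionSizes, double skewRatio) {
        long total = 0;
        for (long s : regionSizes) total += s;
        double average = (double) total / regionSizes.size();
        double largeThreshold = skewRatio * average;

        List<Long> splits = new ArrayList<>();
        int i = 0;
        while (i < regionSizes.size()) {
            long size = regionSizes.get(i);
            if (size > largeThreshold) {
                // Large region: split at its middle key, modelled here as two halves.
                splits.add(size / 2);
                splits.add(size - size / 2);
                i++;
            } else {
                // Small region: merge with following small neighbours while the sum stays <= average.
                long merged = size;
                int j = i + 1;
                while (j < regionSizes.size()
                        && regionSizes.get(j) <= largeThreshold
                        && merged + regionSizes.get(j) <= average) {
                    merged += regionSizes.get(j);
                    j++;
                }
                splits.add(merged);
                i = j;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical region sizes in MB: three tiny regions, one huge one, two medium ones.
        List<Long> sizes = List.of(10L, 10L, 20L, 400L, 30L, 50L);
        System.out.println(rebalance(sizes, 3.0)); // [40, 200, 200, 80]
    }
}
```

With the sample sizes, six very uneven regions become four splits of 40, 200, 200 and 80 MB. Splitting each large region only once keeps the sketch short; a region many times the average would still yield an oversized task. Because a real split point is a row key, a wrong middle-key computation (the subject of HBASE-15357) can skew the resulting splits.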
Snapshot
• Snapshot
  ✓ Table metadata
  ✓ HFile links
• Why Snapshot?
  ✓ Performance
  ✓ A consistent view of the data at a specific point in time
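A snapshot is cheap because of what it contains: table metadata plus links to existing HFiles. The Java toy model below (not HBase code; Table, Snapshot and filesToRetain are hypothetical names) copies only file names when the snapshot is taken, then shows that HFiles later rewritten by a compaction must be retained so the snapshot keeps its point-in-time view. HBase handles this by moving such files to an archive directory instead of deleting them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (not HBase code): a snapshot records table metadata and links to the HFiles
// that exist at snapshot time, without copying any file contents.
public class SnapshotSketch {

    public static final class Table {
        final String meta;                             // stands in for the table descriptor
        final List<String> hfiles = new ArrayList<>(); // names of the table's live HFiles
        Table(String meta) { this.meta = meta; }
    }

    public static final class Snapshot {
        final String meta;
        final List<String> hfileLinks;                 // references only: cost grows with file count, not bytes
        Snapshot(Table table) {
            this.meta = table.meta;
            this.hfileLinks = List.copyOf(table.hfiles);
        }
    }

    /** HFiles the table no longer uses but the snapshot still needs, so they must be kept. */
    static Set<String> filesToRetain(Snapshot snapshot, Table table) {
        Set<String> retained = new HashSet<>(snapshot.hfileLinks);
        retained.removeAll(table.hfiles);
        return retained;
    }

    public static void main(String[] args) {
        Table table = new Table("profile: VERSIONS=3");
        table.hfiles.add("hfile-1");
        table.hfiles.add("hfile-2");
        Snapshot snapshot = new Snapshot(table);

        // A later compaction rewrites hfile-1 and hfile-2 into hfile-3.
        table.hfiles.clear();
        table.hfiles.add("hfile-3");

        System.out.println(snapshot.hfileLinks);                   // [hfile-1, hfile-2]
        System.out.println(filesToRetain(snapshot, table).size()); // 2
    }
}
```

Snapshots also help performance: MapReduce and Spark jobs can read a snapshot's HFiles directly from HDFS (for example with TableSnapshotInputFormat) instead of scanning through the RegionServers.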
Snapshot
• Problem
  ✓ Create one snapshot per application?
  ✓ How can applications share snapshots?
• Snapshot Service
  ✓ Manages the snapshot lifecycle
  ✓ Assigns a suitable snapshot to each application
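One possible design for such a service, sketched in Java under these assumptions: an application states how stale its data may be (maxAgeMs); the service hands out the newest existing snapshot that is fresh enough and creates a new one only when none is; reference counts decide when an unused snapshot can be deleted. SnapshotServiceSketch, acquire, release and cleanup are hypothetical, and the slide does not describe Hulu's actual assignment policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical snapshot service: applications that tolerate data up to maxAgeMs old share one
// snapshot instead of each creating its own; reference counts drive cleanup.
public class SnapshotServiceSketch {
    static final class Entry {
        final String name;
        final long createdAt;
        int refCount;
        Entry(String name, long createdAt) { this.name = name; this.createdAt = createdAt; }
    }

    private final Map<String, List<Entry>> byTable = new HashMap<>();
    private int created = 0;

    /** Returns a snapshot of the table no older than maxAgeMs, creating one only if necessary. */
    public synchronized String acquire(String table, long maxAgeMs, long nowMs) {
        List<Entry> entries = byTable.computeIfAbsent(table, t -> new ArrayList<>());
        Entry best = null;
        for (Entry e : entries) {
            if (nowMs - e.createdAt <= maxAgeMs && (best == null || e.createdAt > best.createdAt)) {
                best = e;
            }
        }
        if (best == null) {
            // A real service would take the HBase snapshot here.
            best = new Entry(table + "-snap-" + (++created), nowMs);
            entries.add(best);
        }
        best.refCount++;
        return best.name;
    }

    public synchronized void release(String table, String name) {
        for (Entry e : byTable.getOrDefault(table, List.of())) {
            if (e.name.equals(name) && e.refCount > 0) e.refCount--;
        }
    }

    /** Deletes snapshots nobody uses that are older than retentionMs; returns their names. */
    public synchronized List<String> cleanup(long retentionMs, long nowMs) {
        List<String> deleted = new ArrayList<>();
        for (List<Entry> entries : byTable.values()) {
            entries.removeIf(e -> {
                boolean expired = e.refCount == 0 && nowMs - e.createdAt > retentionMs;
                if (expired) deleted.add(e.name);
                return expired;
            });
        }
        return deleted;
    }

    public static void main(String[] args) {
        SnapshotServiceSketch svc = new SnapshotServiceSketch();
        String a = svc.acquire("profile", 60_000, 0);      // creates profile-snap-1
        String b = svc.acquire("profile", 60_000, 30_000); // reuses it (30 s old)
        String c = svc.acquire("profile", 60_000, 90_000); // too old: creates profile-snap-2
        System.out.println(a + " " + b + " " + c);
        svc.release("profile", a);
        svc.release("profile", b);
        System.out.println(svc.cleanup(60_000, 120_000)); // [profile-snap-1]
    }
}
```

The staleness bound is the knob that trades freshness against the number of snapshots: the looser it is, the more applications share one snapshot.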
Online Bill Storage (order information storage system)
• Characteristics
  ✓ Bill (order) information
  ✓ Online service
  ✓ Write-heavy, read-light
  ✓ Read latency < 1 s
Online Bill Storage (order information storage system)
• Key technologies
  ✓ Replication
  ✓ RPC Queue
  ✓ Replica
Replication
• Two datacenters, master-master replication
[Figure: Cluster A and Cluster B, each serving both writes and reads]
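Master-master replication needs a guard against edits bouncing between the two clusters forever. HBase records, for each WAL edit, the IDs of the clusters that have already consumed it, and does not ship the edit to a peer in that set. The Java toy model below illustrates only that idea (here replication is synchronous and in-memory); MasterMasterSketch, Cluster and Edit are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Toy model of loop prevention in master-master replication: each edit carries the IDs of the
// clusters that have applied it, and a cluster never forwards an edit to a peer in that set.
public class MasterMasterSketch {
    static final class Edit {
        final String row;
        final Set<UUID> clusterIds = new HashSet<>();
        Edit(String row) { this.row = row; }
    }

    static final class Cluster {
        final UUID id = UUID.randomUUID();
        final String name;
        Cluster peer;
        int applied = 0;
        Cluster(String name) { this.name = name; }

        /** Applies an edit locally, then forwards it to the peer unless the peer has seen it. */
        void apply(Edit e) {
            applied++;
            e.clusterIds.add(id);
            if (peer != null && !e.clusterIds.contains(peer.id)) {
                peer.apply(e); // in HBase this hop is asynchronous, via the replication source
            }
        }
    }

    public static void main(String[] args) {
        Cluster a = new Cluster("A");
        Cluster b = new Cluster("B");
        a.peer = b;
        b.peer = a;
        a.apply(new Edit("row1")); // client writes to A
        b.apply(new Edit("row2")); // client writes to B
        System.out.println(a.applied + " " + b.applied); // 2 2: each edit applied once per cluster
    }
}
```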
Replication
• Problem
  ✓ The replication table/CF configuration is parsed incorrectly when the table name includes a namespace
  ✓ The original design did not take namespaces into account
  ✓ Table and column family were separated by “:”, e.g. “userTable:family1”
  ✓ But “:” also separates the namespace from the table name, e.g. “namespace1:userTable:family1”
• Solution
  ✓ HBASE-11386, HBASE-11393 (use Protobuf instead of a string)
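The ambiguity is easy to reproduce. The Java sketch below uses a naive parser that splits an entry on ':' and reads the first two fields; naiveParse and TableCf are hypothetical, and TableCf is only a plain-Java stand-in for a structured (for example Protobuf) representation, not the schema used by HBASE-11386/HBASE-11393.

```java
// Why splitting "table:family" on ':' breaks once table names carry a namespace.
public class TableCfParseSketch {

    /** Naive parser: text before the first ':' is the table, the next field is the family. */
    static String[] naiveParse(String entry) {
        String[] parts = entry.split(":");
        return new String[] {parts[0], parts.length > 1 ? parts[1] : null};
    }

    /** Hypothetical structured form: namespace, table and family stay in separate fields. */
    static final class TableCf {
        final String namespace;
        final String table;
        final String family;
        TableCf(String namespace, String table, String family) {
            this.namespace = namespace;
            this.table = table;
            this.family = family;
        }
        String fullTableName() { return namespace + ":" + table; }
    }

    public static void main(String[] args) {
        String[] ok = naiveParse("userTable:family1");
        System.out.println(ok[0] + " / " + ok[1]);         // userTable / family1
        String[] broken = naiveParse("namespace1:userTable:family1");
        System.out.println(broken[0] + " / " + broken[1]); // namespace1 / userTable (wrong)

        TableCf cf = new TableCf("namespace1", "userTable", "family1");
        System.out.println(cf.fullTableName() + " / " + cf.family); // namespace1:userTable / family1
    }
}
```

Keeping namespace, table and family in separate fields removes the need for a separator that cannot appear in any of them.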
Replication
• Problem
  ✓ Some data could not be replicated
  ✓ The PeerClusterZnode of a removed peer under a RegionServer may never be deleted
  ✓ If a RegionServer crashes, the other RegionServers cannot take over its remaining replication work because the method “copyQueuesFromRSUsingMulti” fails
• Solution
  ✓ HBASE-16135, HBASE-14476
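The crash case hinges on an atomic hand-over: when a RegionServer dies, the survivors race to claim its replication queues, and exactly one of them must win. HBase 1.x performs this step as a ZooKeeper multi-operation (as the name copyQueuesFromRSUsingMulti suggests). In the Java toy model below, ConcurrentHashMap.remove stands in for that atomic step; QueueFailoverSketch and claimQueues are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of replication-queue failover: the claim on a dead server's queues is atomic,
// so exactly one survivor takes over its remaining WALs.
public class QueueFailoverSketch {
    // RegionServer name -> WAL files still waiting to be replicated
    final Map<String, List<String>> queues = new ConcurrentHashMap<>();
    // Survivor name -> WAL files it took over
    final Map<String, List<String>> claimed = new ConcurrentHashMap<>();

    /** Returns true if the survivor won the dead server's remaining WALs. */
    boolean claimQueues(String deadServer, String survivor) {
        List<String> wals = queues.remove(deadServer); // atomic: only one caller gets non-null
        if (wals == null) return false;                // another survivor already claimed them
        claimed.computeIfAbsent(survivor, s -> new CopyOnWriteArrayList<>()).addAll(wals);
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        QueueFailoverSketch zk = new QueueFailoverSketch();
        zk.queues.put("rs1", List.of("wal.1", "wal.2"));
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (String rs : List.of("rs2", "rs3", "rs4")) {
            pool.submit(() -> zk.claimQueues("rs1", rs));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(zk.claimed.size()); // 1: exactly one survivor owns rs1's WALs
    }
}
```

If the atomic step fails outright, as the slide describes for copyQueuesFromRSUsingMulti, no survivor ends up owning the queues and that replication work is stranded.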
RPC Queue
• Improve performance
  ✓ Multiple RPC queues
  ✓ HBASE-11355
• More
  ✓ Controlling Queue Delay (CoDel)
  ✓ HBASE-15136
[Figure: separate Write Queue, Get Queue and Scan Queue]
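The Write/Get/Scan split can be sketched as a dispatcher that routes each call to its own queue, so long scans do not delay short gets or writes queued behind them. The Java toy model below is not HBase's RPC scheduler; handlersPerQueue is a hypothetical helper that loosely mirrors the ratios HBase exposes as hbase.ipc.server.callqueue.read.ratio and hbase.ipc.server.callqueue.scan.ratio (in HBase these ratios divide the call queues). CoDel is not modelled.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model (not the HBase RPC scheduler): one queue per call type, each drained by its
// own handler threads.
public class RpcQueueSketch {
    enum CallType { WRITE, GET, SCAN }

    final BlockingQueue<String> writeQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> getQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> scanQueue = new LinkedBlockingQueue<>();

    /** Routes a call to the queue that matches its type. */
    void dispatch(CallType type, String call) {
        switch (type) {
            case WRITE: writeQueue.add(call); break;
            case GET:   getQueue.add(call);   break;
            case SCAN:  scanQueue.add(call);  break;
            default:    throw new IllegalArgumentException("unknown call type: " + type);
        }
    }

    /**
     * Divides a handler budget: readRatio of it serves reads (the rest writes), and scanRatio
     * of the read share serves scans (the rest gets). Returns {write, get, scan}.
     */
    static int[] handlersPerQueue(int totalHandlers, double readRatio, double scanRatio) {
        int read = (int) Math.round(totalHandlers * readRatio);
        int scan = (int) Math.round(read * scanRatio);
        return new int[] {totalHandlers - read, read - scan, scan};
    }

    public static void main(String[] args) {
        RpcQueueSketch server = new RpcQueueSketch();
        server.dispatch(CallType.SCAN, "scan-1");
        server.dispatch(CallType.GET, "get-1");
        server.dispatch(CallType.WRITE, "put-1");
        server.dispatch(CallType.GET, "get-2");
        System.out.println(server.getQueue); // [get-1, get-2]

        int[] h = handlersPerQueue(20, 0.6, 0.5);
        System.out.println("write=" + h[0] + " get=" + h[1] + " scan=" + h[2]); // write=8 get=6 scan=6
    }
}
```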
Replica
• Problem
  ✓ When a RegionServer crashes, its regions are unavailable for a period of time
• Solution
  ✓ Region replicas
  ✓ A region can have more than one replica
  ✓ One primary replica accepts both writes and reads
  ✓ Multiple secondary replicas accept reads only
  ✓ HBASE-10070
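The roles of primary and secondary replicas, and why a secondary can serve stale data, fit in a small Java toy model. Replica 0 is the primary, matching HBase's replica numbering; Replica and refreshFrom are hypothetical, and refreshFrom stands in for how a secondary catches up (in HBase, by periodically refreshing store files or, with HBASE-11568, through asynchronous WAL replication).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of region replicas: replica 0 (primary) accepts reads and writes; replicas 1..n
// (secondaries) are read-only and see new data only after they catch up.
public class RegionReplicaSketch {
    static final class Replica {
        final int replicaId;
        final Map<String, String> data = new HashMap<>();
        Replica(int replicaId) { this.replicaId = replicaId; }

        boolean isPrimary() { return replicaId == 0; }

        void put(String row, String value) {
            if (!isPrimary()) {
                throw new UnsupportedOperationException("secondary replica " + replicaId + " is read-only");
            }
            data.put(row, value);
        }

        String get(String row) { return data.get(row); }

        /** Secondary catches up with the primary (modelled as a full copy). */
        void refreshFrom(Replica primary) {
            data.clear();
            data.putAll(primary.data);
        }
    }

    public static void main(String[] args) {
        Replica primary = new Replica(0);
        Replica secondary = new Replica(1);
        primary.put("order-1", "paid");
        System.out.println(secondary.get("order-1")); // null: not caught up yet (stale)
        secondary.refreshFrom(primary);
        System.out.println(secondary.get("order-1")); // paid
        try {
            secondary.put("order-2", "new");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // secondary replica 1 is read-only
        }
    }
}
```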
Replica
[Figure: the HBase Client sends reads and writes to the primary Region and reads only to the secondary Region, each hosted on its own RegionServer; the WAL and HFiles (HFile-1, HFile-2) are stored on HDFS]
Replica
• Client strategy
  ✓ Query the primary region first
  ✓ If no result arrives within 10 ms, also send the query to the secondary replicas
  ✓ Use the first response and cancel the others
• Problem
  ✓ The data in a secondary replica may be stale
• More
  ✓ HBASE-11568 (asynchronous WAL replication to secondary replicas)
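The client strategy above is a hedged request. The Java sketch below models it with CompletableFuture; hedgedGet and slow are hypothetical helpers, and the suppliers stand in for RPCs to the primary and secondary replicas. In the real HBase client this behavior is requested per operation with Consistency.TIMELINE, and Result.isStale() tells the caller whether a secondary answered.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Toy model of a timeline-consistent read: ask the primary; if it has not answered within
// delayMs, also ask the secondaries; return whichever answer arrives first.
public class HedgedReadSketch {
    static <T> T hedgedGet(Supplier<T> primary, List<Supplier<T>> secondaries,
                           long delayMs, long timeoutMs, Executor pool) throws Exception {
        CompletableFuture<T> result = new CompletableFuture<>();
        List<CompletableFuture<T>> attempts = new CopyOnWriteArrayList<>();

        CompletableFuture<T> p = CompletableFuture.supplyAsync(primary, pool);
        attempts.add(p);
        p.thenAccept(result::complete);

        Executor afterDelay = CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS, pool);
        CompletableFuture<Void> hedge = CompletableFuture.runAsync(() -> {
            if (result.isDone()) return; // primary answered in time: no extra load on secondaries
            for (Supplier<T> s : secondaries) {
                CompletableFuture<T> f = CompletableFuture.supplyAsync(s, pool);
                attempts.add(f);
                f.thenAccept(result::complete); // later complete() calls are no-ops
            }
        }, afterDelay);

        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS); // first answer wins
        } finally {
            // Best-effort cleanup: cancel() marks futures done but does not interrupt
            // suppliers that are already running.
            hedge.cancel(false);
            for (CompletableFuture<T> f : attempts) f.cancel(false);
        }
    }

    static Supplier<String> slow(String answer, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return answer;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        // Slow primary: the secondary, contacted after 10 ms, answers first.
        System.out.println(hedgedGet(slow("primary", 500), List.of(slow("secondary", 0)), 10, 2_000, pool));
        // Fast primary: its answer is used and the secondaries are never contacted.
        System.out.println(hedgedGet(slow("primary", 0), List.of(slow("secondary", 500)), 10, 2_000, pool));
        pool.shutdownNow();
    }
}
```

The first call prints "secondary" and the second prints "primary". The short delay keeps load on the secondaries low in the common case, at the price that a hedged answer may be stale.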
Future
• Multi-tenancy (HBASE-10994)
• Strong schema
• High availability
Reference
• https://issues.apache.org/jira/browse/HBASE-15357
• https://issues.apache.org/jira/browse/HBASE-11386
• https://issues.apache.org/jira/browse/HBASE-11393
• https://issues.apache.org/jira/browse/HBASE-16135
• https://issues.apache.org/jira/browse/HBASE-14476
• https://issues.apache.org/jira/browse/HBASE-15136
• https://issues.apache.org/jira/browse/HBASE-10070
• https://issues.apache.org/jira/browse/HBASE-11568
• https://issues.apache.org/jira/browse/HBASE-10994
Thank you
qianxi.zhang@hulu.com

HBaseCon Asia 2017: HBase Usage and Practice at Hulu