HDFS Optimization for HBase at Xiaomi
01
•  Multiple replicas
•  Namenode & Datanode

r = replication factor, k = faulty datanodes, N = total datanodes

readSLA availability = 1 − k/N

writeSLA availability = 1 − Σ(i=1..r) C(k, i) × C(N−k, r−i) / C(N, r)

availability = Min(namenodeAvailability, writeSLA availability, readSLA availability)
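The formulas above can be sketched directly in code (helper names are mine, not from the slides):

```python
from math import comb

def read_sla_availability(n_total, k_faulty):
    # A read succeeds as long as the chosen replica is not on a faulty node.
    return 1 - k_faulty / n_total

def write_sla_availability(n_total, k_faulty, r_replication):
    # A write pipeline of r datanodes fails if it contains any faulty node:
    # sum over i = 1..r of C(k, i) * C(N - k, r - i) / C(N, r).
    failed = sum(comb(k_faulty, i) * comb(n_total - k_faulty, r_replication - i)
                 for i in range(1, r_replication + 1))
    return 1 - failed / comb(n_total, r_replication)

def availability(namenode_availability, n_total, k_faulty, r_replication):
    # Overall availability is bounded by the weakest of the three.
    return min(namenode_availability,
               write_sla_availability(n_total, k_faulty, r_replication),
               read_sla_availability(n_total, k_faulty))
```

For example, with N=100, k=2, r=3 the write SLA availability is about 0.94 while the read SLA availability is 0.98, so writes dominate the overall figure.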
[Diagram: monitoring architecture. A Master reads the cluster configs and dispatches Workers; a Worker per Namenode/DataNode collects metrics and reports them to Falcon (HDFSMonitorCluster).]
02
Architecture

[Diagram: short-circuit read shared memory. The DFSClient holds DfsClientShm segments and the Datanode holds the matching RegisteredShm segments; each segment contains slots mapped to blocks. Over a Unix domain socket the client:]
•  Allocates Shm
•  Requests slot & FDs
•  Releases Slot
•  Releases Shm
•  Datanode Full GC caused by RegisteredShm objects
•  3000+ QPS allocation vs. only 1000+ QPS release
•  YCSB get: ~20% QPS increase
•  Preallocate the Shm
•  Benchmark: 1000 files, 60 threads, seek read
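The preallocation idea can be sketched as a fixed slot pool handed out and returned under a lock (a minimal sketch with hypothetical names; the real Datanode code differs):

```python
import threading

class SlotPool:
    """Preallocated pool of shm slots: allocation reuses existing slots
    instead of creating (and later garbage-collecting) new segments."""

    def __init__(self, capacity):
        self._free = list(range(capacity))  # all slots created up front
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            # Reuse a free slot; None signals the pool is exhausted.
            return self._free.pop() if self._free else None

    def release(self, slot):
        with self._lock:
            self._free.append(slot)
```

Because slots are reused, an allocate/release imbalance produces back-pressure (allocate returns None) instead of an ever-growing population of RegisteredShm garbage driving Full GC.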
•  Listen drop on the SSD cluster causes 3s delays
15:56:14.506610 IP x.x.x.x.62393 > y.y.y.y.29402: Flags [S], seq 167786998, win 14600, options [mss 1460,sackOK,TS val
1590620938 ecr 0,nop,wscale 7], length 0<<<--------timeout on first try
15:56:17.506172 IP x.x.x.x.62393 > y.y.y.y.29402: Flags [S], seq 167786998, win 14600, options [mss 1460,sackOK,TS val
1590623938 ecr 0,nop,wscale 7], length 0<<<--------retry
15:56:17.506211 IP y.y.y.y.29402 > x.x.x.x.62393: Flags [S.], seq 4109047318, ack 167786999, win 14480, options [mss
1460,sackOK,TS val 1589839920 ecr 1590623938,nop,wscale 7], length 0
•  HDFS-9669
•  After changing the backlog to 128, 3s delays dropped to ~1/10 on the HBase SSD cluster
•  somaxconn=128, default Datanode backlog=50
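The backlog is just the listen-queue length requested by the server socket; a minimal illustration (not Datanode code):

```python
import socket

# The kernel caps the pending-connection queue at min(backlog, somaxconn).
# Java's ServerSocket defaults to a backlog of 50; once the queue is full,
# new SYNs are dropped and the client retransmits after ~3s, which is
# exactly the delay visible in the tcpdump above.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(128)  # a larger backlog (cf. HDFS-9669) absorbs connection bursts
```

Raising the application backlog only helps if net.core.somaxconn is at least as large, since the kernel takes the minimum of the two.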
•  Peer cache bucket adjustment
•  Connection/socket timeouts of the DFSClient & Datanode:
   dfs.client.socket-timeout
   dfs.datanode.socket.write.timeout
•  Reduce the timeouts to 15s
•  To avoid pipeline timeouts, upgrade the DFSClient first
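In hdfs-site.xml terms this looks roughly like the fragment below (values in milliseconds; the defaults are considerably longer, 60s for reads and 8 minutes for writes, if I recall the stock configuration correctly):

```xml
<!-- Reduce read/write socket timeouts to 15s so a dead datanode is
     abandoned quickly. Upgrade DFSClients first to avoid pipeline
     timeouts, as noted above. -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>15000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>15000</value>
</property>
```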
03
•  60-second timeout if a datanode dies
•  Dead nodes are not shared across DFSInputStreams
•  A "dead" node may not actually be dead, hence the DeadNodeDetector

[Diagram: DeadNodeDetector. Each DFSInputStream keeps its own local dead nodes; the DFSClient maintains a global dead-node set shared by all streams, plus suspicious and live node sets, probing the DataNodes to move nodes between them.]
•  Node state machine

[State diagram: Init -(Open)-> Live; Live -(Read Failure)-> Suspicious; Suspicious -(RPC Success)-> Live; Suspicious -(RPC Failure)-> Dead; Dead -(RPC Failure)-> Dead; Read Success keeps a node Live; Close -> Removed]
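That state machine can be sketched as a transition table (transitions reconstructed from the slide; names are hypothetical, not the actual HDFS class):

```python
from enum import Enum, auto

class NodeState(Enum):
    INIT = auto()
    LIVE = auto()
    SUSPICIOUS = auto()
    DEAD = auto()
    REMOVED = auto()

# (state, event) -> next state; unknown pairs keep the current state,
# so e.g. a read success leaves a Live node Live.
TRANSITIONS = {
    (NodeState.INIT, "open"): NodeState.LIVE,
    (NodeState.LIVE, "read_failure"): NodeState.SUSPICIOUS,
    (NodeState.SUSPICIOUS, "rpc_success"): NodeState.LIVE,
    (NodeState.SUSPICIOUS, "rpc_failure"): NodeState.DEAD,
    (NodeState.DEAD, "rpc_failure"): NodeState.DEAD,
    (NodeState.DEAD, "close"): NodeState.REMOVED,
}

def step(state, event):
    return TRANSITIONS.get((state, event), state)
```

Keeping the probe results in one table makes the detector's behavior easy to audit: a node only reaches Dead through a failed RPC probe, never from a single read failure.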
Summary
•  Keep data on the local host as much as possible and reduce the overhead of local reads.
•  Make sure a response is returned to HBase as soon as possible, even a failed one.
•  Minor GC in both HBase and HDFS affects latency; try to ease the HDFS-side GC on the client.
Thanks

HBaseConAsia2018 Track1-7: HDFS optimizations for HBase at Xiaomi
