HBase backups and performance on MapR
Slides from HBase User Group about HBase backups and performance on MapR distribution for Apache Hadoop.

    Presentation Transcript

    • HBase on MapR
      Lohit VijayaRenu, MapR Technologies, Inc.
      HBase Contributor Day at Yahoo!, June 30, 2011
    • Who am I?
      Lohit VijayaRenu, Software Engineer at MapR Technologies (lohit@maprtech.com)
      MapR
      Combines the best of the Hadoop community's contributions with significant, internally financed infrastructure development to provide a complete distribution for Apache Hadoop (www.mapr.com)
    • HBase on MapR
      Backups using Snapshots
      Performance on MapR
      Highly available MapR
      MapR Control System
    • HBase Backups
      "We're trying to come up with right strategy for backing up HBase tables ...Currently, we're employing exports (writing onto HDFS of another cluster directly), but is taking too long (~5 hours to export ~5GB of data)..." Manoj Murumkar
      "...Recently I encountered a problem about data loss of HBase. So it comes to the question that how to backup HBase data to recover table records...What about copy the directory of HBase to another directory in HDFS?... " Liu Xianglong
      Source: hbase-user group
      Available options
      • Export/Import
      • CopyTable
      • Distcp
      • Backup from Mozilla
      • Cluster Replication
      • Table Snapshots 
      Source: http://blog.sematext.com/2011/03/11/hbase-backup-options/
    • MapR Snapshots
      • Entire /hbase can be snapshotted while HBase is running
      • Snapshots are consistent
      • Saves space by sharing blocks
      • Lightning fast
      • No performance loss when writing to the original data
      • Scheduled or on-demand
      • REST API for creation and deletion of snapshots
      [Diagram: HBase reads and writes /hbase on MapR; the snapshots /hbase/.snapshot/Snapshot20110630, /hbase/.snapshot/Snapshot20110629 and /hbase/.snapshot/Snapshot3 share data blocks A, B, C and D. MapR uses redirect-on-write for snapshots, so a modified block is written to a new location (C’) while earlier snapshots keep referencing the original block C.]
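The redirect-on-write idea in the diagram can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not MapR's implementation: a "file" is a list of block ids, a snapshot copies only those id lists (metadata), and a write stores new data in a fresh block and repoints the live file, so older snapshots keep seeing the original block. The `Volume` class and its methods are hypothetical names chosen for illustration; restoring is modeled as pointing the live view back at a snapshot's blocks.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Volume:
    """Hypothetical toy volume: blocks are stored once and shared by reference."""
    blocks: Dict[int, bytes] = field(default_factory=dict)        # block id -> data
    live: Dict[str, List[int]] = field(default_factory=dict)      # path -> block ids
    snapshots: Dict[str, Dict[str, List[int]]] = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def _new_block(self, data: bytes) -> int:
        block_id = next(self._ids)
        self.blocks[block_id] = data
        return block_id

    def write(self, path: str, index: int, data: bytes) -> None:
        """Redirect-on-write: new data goes to a new block; old blocks are never overwritten."""
        ids = self.live.setdefault(path, [])
        new_id = self._new_block(data)
        if index == len(ids):
            ids.append(new_id)       # extend the file with a new block
        else:
            ids[index] = new_id      # repoint the live file; snapshots still reference the old block

    def snapshot(self, name: str) -> None:
        """Copy only the block-id lists (metadata); no data blocks are copied."""
        self.snapshots[name] = {p: list(ids) for p, ids in self.live.items()}

    def read(self, path: str, snapshot: Optional[str] = None) -> bytes:
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return b"".join(self.blocks[i] for i in table[path])

    def restore(self, name: str) -> None:
        """Point the live view back at the blocks referenced by a snapshot."""
        self.live = {p: list(ids) for p, ids in self.snapshots[name].items()}


vol = Volume()
for i, data in enumerate([b"A", b"B", b"C", b"D"]):
    vol.write("/hbase/t1", i, data)
vol.snapshot("Snapshot3")
vol.write("/hbase/t1", 2, b"C'")              # C' lands in a new block
print(vol.read("/hbase/t1"))                  # b"ABC'D"
print(vol.read("/hbase/t1", "Snapshot3"))     # b'ABCD'
print(len(vol.blocks))                        # 5: A, B, C, C' and D each stored once
```

Because the snapshot holds references rather than copies, the extra space it costs is proportional to the blocks changed after it was taken, which is what "saves space by sharing blocks" refers to.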
    • MapR Snapshots
      HBase table in DFS
      Take snapshot on running HBase
      Restore from snapshot
    • MapR Control System
      Snapshot information
      Snapshot Schedules
      All UI operations have REST APIs
      More info at www.mapr.com
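The slide does not say how snapshot schedules are defined. As a purely hypothetical illustration, assuming a schedule names each snapshot after its date (as in Snapshot20110630) and pairs that with a retention period, the bookkeeping could look like the sketch below. `PREFIX`, `scheduled_name` and `expired_snapshots` are invented names, not part of the MapR Control System or its REST API.

```python
from datetime import date, datetime, timedelta

PREFIX = "Snapshot"  # hypothetical naming convention, modeled on Snapshot20110630


def scheduled_name(day: date) -> str:
    """Name a scheduled snapshot after the day it was taken."""
    return PREFIX + day.strftime("%Y%m%d")


def expired_snapshots(names, today: date, retain_days: int):
    """Return dated snapshot names that fall outside a hypothetical retention window.

    Names without a YYYYMMDD suffix (for example an on-demand "Snapshot3")
    are never expired by this policy.
    """
    cutoff = today - timedelta(days=retain_days)
    expired = []
    for name in names:
        suffix = name[len(PREFIX):]
        if not name.startswith(PREFIX) or len(suffix) != 8:
            continue
        try:
            taken = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # not a dated, scheduled snapshot
        if taken < cutoff:
            expired.append(name)
    return expired


names = ["Snapshot20110630", "Snapshot20110629", "Snapshot3", "Snapshot20110601"]
print(scheduled_name(date(2011, 6, 30)))                 # Snapshot20110630
print(expired_snapshots(names, date(2011, 6, 30), 7))    # ['Snapshot20110601']
```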
    • MapR Mirroring
      • A mirror is a physical copy of the data
      • Consistent, point-in-time data replication to a different cluster
      • Only differential deltas are transferred on each update
      • Compressed and checksummed
      • Scheduled or on-demand
      • REST API to set up, start, and stop mirroring
      [Diagram: mirroring from the production cluster to a backup cluster across the WAN (Datacenter 1 and Datacenter 2).]
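The mirroring bullets (differential deltas, compression, checksums) can be sketched as follows. This is an illustrative model only, assuming each point-in-time image is a dict from block id to bytes; it uses Python's `zlib` for compression and CRC32 checksums and is not MapR's wire protocol. Deleted blocks are ignored to keep the sketch short.

```python
import zlib


def diff_blocks(previous, current):
    """Block ids that are new or changed between two point-in-time images."""
    return sorted(i for i, data in current.items() if previous.get(i) != data)


def pack_delta(previous, current):
    """Compress each changed block and attach a CRC32 of its uncompressed data."""
    delta = []
    for i in diff_blocks(previous, current):
        data = current[i]
        delta.append((i, zlib.crc32(data), zlib.compress(data)))
    return delta


def apply_delta(mirror, delta):
    """On the destination cluster: decompress, verify the checksum, then apply."""
    for i, checksum, payload in delta:
        data = zlib.decompress(payload)
        if zlib.crc32(data) != checksum:
            raise ValueError(f"checksum mismatch on block {i}")
        mirror[i] = data


previous = {0: b"A" * 4096, 1: b"B" * 4096}
current = {0: b"A" * 4096, 1: b"b" * 4096, 2: b"C" * 4096}
delta = pack_delta(previous, current)   # only blocks 1 and 2 are shipped
mirror = dict(previous)                 # the backup holds the previous image
apply_delta(mirror, delta)
print(mirror == current)                # True
```

Sending only changed blocks keeps each update proportional to the data modified since the last mirror, rather than to the size of the whole volume.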
    • HBase performance
      "...Initially, when the table was empty I was getting around 300 inserts per second with 50 writing threads. Then, when the region split and a second server was added the rate suddenly jumped to 3000 inserts/sec per server, so ~6000 for the two servers..." Eran Kutner
      "...My scenario is similar, we need under 10k rows, 10-20 columns and which can have thousands of version with value not greater than 300 bytes...Can we get 40-50k records/sec insertion speed in HBase??..." Gaurav Vashishth
      Source: hbase-user group
      • YCSB modified to use ZooKeeper for a coordinated start across clients
      • HMaster and RegionServer running on MapR
      • YCSB Client running on RS nodes
      [Figure: YCSB setup. A YCSB client runs on each RegionServer (RS) node, coordinated through ZooKeeper, with the HMaster and RegionServers running on MapR.]
      https://github.com/lohitvijayarenu/YCSB
      • YCSB Clients doing inserts from all cluster nodes.
      • Throughput rates were similar from all nodes
      • All operations in the cluster completed at around the same time.
      [Chart: YCSB operations from nodes]
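The point of the coordinated start is that no client begins its timed run until every client is ready, so per-node throughput is measured over the same window. A minimal sketch, assuming an in-process `threading.Barrier` as a stand-in for the ZooKeeper barrier used in the modified YCSB (across machines, a ZooKeeper barrier recipe would play this role); the loop body is a placeholder, not a real HBase insert.

```python
import threading
import time


def client(name, barrier, n_ops, results, lock):
    barrier.wait()                 # block until every client has arrived
    start = time.monotonic()
    done = 0
    for _ in range(n_ops):
        done += 1                  # placeholder for one insert against HBase
    end = time.monotonic()
    with lock:
        results[name] = (start, end, done)


def coordinated_run(n_clients, n_ops):
    barrier = threading.Barrier(n_clients)
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=client, args=(f"ycsb-{i}", barrier, n_ops, results, lock))
        for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = coordinated_run(4, 100_000)
starts = [s for s, _, _ in results.values()]
print(f"start skew: {max(starts) - min(starts):.6f} s")
```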
    • Insert performance
      Dataset: 1B rows
      Row size: 1K
      10 RegionServers, each with 11 × 2 TB 7200 RPM disks
      8 cores, 24 GB RAM, 2 Gbps network
      3× replication, no compression
      [Chart: Insert (one node), operations over time in seconds]
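The insert setup implies a rough data volume. The arithmetic below assumes "1K" means 1,024 bytes per row and decimal units for the 2 TB drives, and it ignores HBase storage overhead (keys, HFile indexes, write-ahead logs), so the real on-disk footprint would be larger.

```python
ROWS = 1_000_000_000
ROW_BYTES = 1024            # assumption: "1K" read as 1,024 bytes
REPLICATION = 3
REGION_SERVERS = 10
DISKS_PER_RS = 11
DISK_BYTES = 2 * 10**12     # 2 TB drives, decimal units

raw = ROWS * ROW_BYTES
on_disk = raw * REPLICATION                           # no compression
capacity = REGION_SERVERS * DISKS_PER_RS * DISK_BYTES

print(f"raw data:       {raw / 1e12:.2f} TB")         # 1.02 TB
print(f"3x replicated:  {on_disk / 1e12:.2f} TB")     # 3.07 TB
print(f"cluster disks:  {capacity / 1e12:.0f} TB, {on_disk / capacity:.1%} used")  # 220 TB, 1.4% used
```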
    • Read performance
      Dataset: 0.9B rows
      Row size: 1K
      9 RegionServers, each with 5 × 500 GB 7200 RPM disks
      8 cores, 24 GB RAM, 2 Gbps network
      [Chart: Read (one node), operations over time in seconds]
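The insert and read charts plot operations against seconds. One way to produce such a series from a benchmark client is to bucket each operation's completion timestamp into one-second bins. This sketch assumes timestamps in seconds, all at or after the run's start time; `ops_per_second` is a hypothetical helper, not part of YCSB.

```python
from collections import Counter


def ops_per_second(completion_times, start):
    """Turn operation completion timestamps into an ops/second series.

    Entry s counts operations that finished in [start + s, start + s + 1).
    """
    buckets = Counter(int(t - start) for t in completion_times)
    if not buckets:
        return []
    return [buckets[s] for s in range(max(buckets) + 1)]   # empty seconds count as 0


print(ops_per_second([0.1, 0.5, 1.2, 2.9, 2.95], start=0.0))   # [2, 1, 2]
```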
    • HBase High Availability
      "...In HBase 0.90 I have seen that it has a fault tolerant behavior of triggering lease recovery and closing the file when the writer dies in the middle. Yet does hbase have any workaround/recovery when NameNode is restarted in the middle of the file write(possibly the HLog file , after some syncs)???..." Gokulakannan M 
      Source: hbase-user group
    • MapR High Availability
      No single point of failure
      Distributed NameNode
      Automatic and transparent failover
      Better performance
      Replicated and persisted to disk
      Fully distributed and highly scalable
      Real-time HBase on MapR
      [Diagram: HBase reads and writes to MapR, where NameNode (NN) functionality is spread across the cluster's nodes, with no single point of failure.]
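The quoted concern is what happens to an in-flight write (such as an HLog) when a single NameNode restarts. The slide's answer is metadata that is replicated across nodes with automatic failover. The toy below only illustrates that general idea, assuming each metadata entry is written to every reachable replica and reads fall through to the next replica when one is down; it is not MapR's actual protocol, has no resynchronization for replicas that come back, and the path used is a hypothetical example.

```python
class ReplicaDown(Exception):
    pass


class MetadataReplica:
    """One copy of some file-system metadata, held by a single node."""

    def __init__(self, node):
        self.node = node
        self.up = True
        self.entries = {}

    def put(self, key, value):
        if not self.up:
            raise ReplicaDown(self.node)
        self.entries[key] = value

    def get(self, key):
        if not self.up:
            raise ReplicaDown(self.node)
        return self.entries[key]


class ReplicatedMetadata:
    """Write to every reachable replica; read from the first one that answers."""

    def __init__(self, nodes):
        self.replicas = [MetadataReplica(n) for n in nodes]

    def put(self, key, value):
        written = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                written += 1
            except ReplicaDown:
                continue            # a down replica misses this update (no resync in this toy)
        if written == 0:
            raise RuntimeError("no metadata replica available")

    def get(self, key):
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ReplicaDown:
                continue            # transparent failover to the next replica
        raise RuntimeError("no metadata replica available")


meta = ReplicatedMetadata(["node1", "node2", "node3"])
meta.put("/hbase/.logs/hlog.1", "blocks 17,18")   # hypothetical key and value
meta.replicas[0].up = False                       # the first metadata holder fails
print(meta.get("/hbase/.logs/hlog.1"))            # blocks 17,18
```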
    • MapR Heatmap™
      Intuitive
      Insightful
      Comprehensive
      One node or thousands
      More at www.mapr.com
    • Credits
      Michael Stack and Ryan Rawson for their valuable feedback.
      Brian Cooper and Adam Silberstein for their help with YCSB
      Active and helpful HBase community
      More Information
      • http://www.mapr.com
      • http://mapr.com/only-with-mapr.html
      • Follow us @mapr
      • Download and try from www.mapr.com