HBase backups and performance on MapR


Slides from HBase User Group about HBase backups and performance on MapR distribution for Apache Hadoop.


HBase on MapR
Lohit VijayaRenu, MapR Technologies, Inc.
HBase contributor day at Yahoo, June 30, 2011

Who am I?
Lohit VijayaRenu, Software Engineer at MapR Technologies (lohit@maprtech.com)
MapR combines the best of the Hadoop community contributions with significant internally financed infrastructure development to provide a complete distribution for Apache Hadoop (www.mapr.com).

HBase on MapR
- Backups using Snapshots
- Performance on MapR
- Highly available MapR
- MapR Control System

HBase Backups
"We're trying to come up with the right strategy for backing up HBase tables ... Currently, we're employing exports (writing onto HDFS of another cluster directly), but it is taking too long (~5 hours to export ~5GB of data)..." - Manoj Murumkar
"...Recently I encountered a problem about data loss in HBase. So the question is how to back up HBase data to recover table records... What about copying the HBase directory to another directory in HDFS?..." - Liu Xianglong
Source: hbase-user mailing list

Available options:
- Export/Import
- CopyTable
- DistCp
- Backup from Mozilla
- Cluster Replication
- Table Snapshots
Source: http://blog.sematext.com/2011/03/11/hbase-backup-options/

MapR Snapshots
- The entire /hbase directory can be snapshotted while HBase is running
- Snapshots are consistent
- Saves space by sharing blocks
- Lightning fast
- Zero performance loss on writes to the original
- Scheduled or on-demand
- REST API for creation and deletion of snapshots

Snapshots appear alongside the live data:
/hbase
/hbase/.snapshot/Snapshot20110630
/hbase/.snapshot/Snapshot20110629
/hbase/.snapshot/Snapshot3

(Diagram: MapR redirect-on-write for snapshots. Data blocks A, B, C, D; a write after a snapshot produces a new block C' while snapshots Snapshot3, Snapshot20110629, and Snapshot20110630 keep referencing the unchanged blocks.)

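The redirect-on-write idea above can be sketched in a few lines of Python: a snapshot is just a frozen copy of the block index, and a later write installs a new block instead of mutating the old one, so the snapshot and the live volume share every unchanged block. The `Volume` class and its methods are illustrative only, not MapR's actual implementation.

```python
# Illustrative sketch of redirect-on-write snapshots (not MapR's actual code).
# A volume maps block indices to block contents; a snapshot freezes that
# mapping, and later writes allocate new blocks rather than mutating old ones.

class Volume:
    def __init__(self, blocks):
        self.blocks = dict(enumerate(blocks))  # index -> block content
        self.snapshots = {}

    def snapshot(self, name):
        # Metadata-only copy: shares all block contents with the live volume
        self.snapshots[name] = dict(self.blocks)

    def write(self, index, data):
        # Redirect-on-write: install a new block; snapshots keep the old one
        self.blocks[index] = data

vol = Volume(["A", "B", "C", "D"])
vol.snapshot("Snapshot20110630")
vol.write(2, "C'")
print(vol.blocks[2])                          # C'
print(vol.snapshots["Snapshot20110630"][2])   # C
```

This is why taking a snapshot is fast and writing to the original costs nothing extra: neither operation copies block data, only references.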
MapR Snapshots (demo)
- HBase table in DFS
- Take a snapshot on a running HBase
- Restore from snapshot

MapR Control System
- Snapshot information
- Snapshot schedules
- All UI operations have REST APIs
More info at www.mapr.com

MapR Mirroring
- A mirror is a physical copy of data
- Consistent, point-in-time data replication to a different cluster
- Only differential deltas are transferred on update
- Compressed and check-summed
- Scheduled or on-demand
- REST API to set up, start, and stop a mirror

(Diagram: Production cluster in Datacenter 1 mirrored over the WAN to a Backup cluster in Datacenter 2.)

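The differential, check-summed update described above can be sketched as follows. This is a toy model under my own assumptions, not MapR's wire protocol: blocks are compared by checksum, only the changed ones are compressed, shipped, and verified on the far side.

```python
# Toy sketch of a differential mirror update (not MapR's actual protocol).
import hashlib
import zlib

def checksums(blocks):
    """Map block index -> SHA-256 digest of its content."""
    return {i: hashlib.sha256(b).hexdigest() for i, b in blocks.items()}

def delta(source, mirror):
    """Indices of blocks that are new or changed on the source."""
    src, dst = checksums(source), checksums(mirror)
    return {i for i in src if dst.get(i) != src[i]}

def mirror_update(source, mirror):
    """Ship only the delta, compressed, verifying checksums on arrival."""
    changed = delta(source, mirror)
    for i in changed:
        payload = zlib.compress(source[i])            # compressed on the wire
        block = zlib.decompress(payload)
        assert hashlib.sha256(block).hexdigest() == checksums(source)[i]
        mirror[i] = block
    return changed

source = {0: b"A", 1: b"B", 2: b"C'"}
mirror = {0: b"A", 1: b"B", 2: b"C"}
print(sorted(mirror_update(source, mirror)))  # [2] - only one block crosses the WAN
```

The point of the sketch: after the first full copy, the cost of keeping a mirror current scales with the size of the changes, not the size of the volume.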
HBase performance
"...Initially, when the table was empty I was getting around 300 inserts per second with 50 writing threads. Then, when the region split and a second server was added, the rate suddenly jumped to 3000 inserts/sec per server, so ~6000 for the two servers..." - Eran Kutner
"...My scenario is similar: we need under 10k rows, 10-20 columns, which can have thousands of versions with values not greater than 300 bytes... Can we get 40-50k records/sec insertion speed in HBase?..." - Gaurav Vashishth
Source: hbase-user mailing list

YCSB setup
- Modified YCSB to use ZooKeeper for a coordinated start
- HMaster and RegionServers running on MapR
- YCSB clients running on the RegionServer nodes

(Diagram: ZooKeeper coordinating YCSB clients on four RegionServer nodes plus a Master, all on MapR.)
https://github.com/lohitvijayarenu/YCSB

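The coordinated start matters because throughput numbers are only comparable if every client's measured run covers the same wall-clock window. In the actual setup the rendezvous was done through ZooKeeper so clients on different machines could synchronize; the single-process sketch below uses a plain thread barrier to show the same idea, and is not taken from the modified YCSB code.

```python
# Sketch of a coordinated start for benchmark clients (barrier idea only;
# the real setup used ZooKeeper to synchronize clients across machines).
import threading
import time

NUM_CLIENTS = 4
start_barrier = threading.Barrier(NUM_CLIENTS)
start_times = []

def ycsb_client(client_id):
    # Setup takes a variable amount of time per client (connects, warm-up...)
    time.sleep(0.01 * client_id)
    start_barrier.wait()  # no client starts the measured run until all arrive
    start_times.append(time.monotonic())

threads = [threading.Thread(target=ycsb_client, args=(i,))
           for i in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Despite uneven setup times, all measured runs begin together
print(max(start_times) - min(start_times) < 0.5)
```

Without the barrier, the slowest client would start its run while the fastest had the cluster to itself, inflating early throughput.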
YCSB operations from nodes
- YCSB clients doing inserts from all cluster nodes
- Throughput rates were similar from all nodes
- All operations in the cluster completed around the same time

Insert performance
- Dataset: 1B rows, row size 1KB
- 10 RegionServers, 11 x 2TB disks @ 7200 RPM each
- 8 cores, 24GB RAM, 2Gbps per node
- 3x replication, no compression

(Chart: insert operations per second over time, one node.)

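As a sanity check on the scale of this test, the slide's parameters imply roughly the following sizes (my arithmetic, ignoring HBase's on-disk overhead, so these are approximations):

```python
# Back-of-the-envelope sizing from the slide's test parameters.
rows = 1_000_000_000
row_size = 1_000          # ~1 KB per row
replication = 3
region_servers = 10
disks_per_node = 11
disk_size_tb = 2

raw_tb = rows * row_size / 1e12                    # ~1 TB of logical data
stored_tb = raw_tb * replication                   # ~3 TB with 3x replication
per_node_tb = stored_tb / region_servers           # ~0.3 TB per RegionServer
capacity_tb = region_servers * disks_per_node * disk_size_tb  # 220 TB raw

print(raw_tb, stored_tb, per_node_tb, capacity_tb)  # 1.0 3.0 0.3 220
```

So the billion-row dataset occupies only a small fraction of the cluster's 220 TB of raw disk, spread across 110 spindles.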
Read performance
- Dataset: 0.9B rows, row size 1KB
- 9 RegionServers, 5 x 500GB disks @ 7200 RPM each
- 8 cores, 24GB RAM, 2Gbps per node

(Chart: read operations per second over time, one node.)

HBase High Availability
"...In HBase 0.90 I have seen that it has fault-tolerant behavior, triggering lease recovery and closing the file when the writer dies in the middle. Yet does HBase have any workaround/recovery when the NameNode is restarted in the middle of a file write (possibly the HLog file, after some syncs)?..." - Gokulakannan M
Source: hbase-user mailing list

MapR High Availability
- No single point of failure
- Distributed NameNode
- Automatic and transparent failover
- Better performance
- Replicated and persisted to disk
- Fully distributed and highly scalable
- Real-time HBase on MapR

(Diagram: HBase reads and writes against a MapR cluster with the NameNode function distributed across all nodes, so there is no single point of failure.)

MapR Heatmap™
- Intuitive
- Insightful
- Comprehensive
- One node or thousands
More at www.mapr.com

Credits
- Michael Stack and Ryan Rawson for their valuable feedback
- Brian Cooper and Adam Silberstein for their help with YCSB
- The active and helpful HBase community

More information
- http://www.mapr.com
- http://mapr.com/only-with-mapr.html
- Follow us @mapr
- Download and try from www.mapr.com