HBase backups and performance on MapR

Slides from the HBase User Group about HBase backups and performance on the MapR distribution for Apache Hadoop.

Transcript

  • 1. HBase on MapR
    Lohit VijayaRenu, MapR Technologies, Inc.
    HBase contributor day at Yahoo, June 30, 2011
  • 2. Who am I?
    Lohit VijayaRenu, Software Engineer at MapR Technologies (lohit@maprtech.com)
    MapR
    Combines the best of the Hadoop community contributions with significant internally financed infrastructure development to provide a complete distribution for Apache Hadoop (www.mapr.com)
  • 3. HBase on MapR
    Backups using Snapshots
    Performance on MapR
    Highly available MapR
    MapR Control System
  • 4. HBase Backups
    "We're trying to come up with right strategy for backing up HBase tables ... Currently, we're employing exports (writing onto HDFS of another cluster directly), but is taking too long (~5 hours to export ~5GB of data)..." Manoj Murumkar
    "...Recently I encountered a problem about data loss of HBase. So it comes to the question that how to backup HBase data to recover table records... What about copy the directory of HBase to another directory in HDFS?..." Liu Xianglong
    Source: hbase-user group
    Available options
    • Export/Import
    • CopyTable
    • Distcp
    • Backup from Mozilla
    • Cluster Replication
    • Table Snapshots
    Source: http://blog.sematext.com/2011/03/11/hbase-backup-options/
  • 10. MapR Snapshots
    • Entire /hbase can be snapshotted while HBase is running
    • Snapshots are consistent
    • Saves space by sharing blocks
    • Lightning fast
    • Zero performance loss on writing to original
    • Scheduled, or on-demand
    • REST API for creation and deletion of snapshots
    [Diagram: HBase reads and writes /hbase on MapR. Snapshots appear under /hbase/.snapshot/ (Snapshot20110630, Snapshot20110629, Snapshot3). Labels: redirect on write for snapshot; data blocks A, B, C, C’, D; Snapshot 3, Snapshot 20110629, Snapshot 20110630.]
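The redirect-on-write idea in the diagram can be illustrated with a toy model. This is only a sketch of the general technique, not MapR's implementation: the `Volume` class, its methods, and the block contents are all hypothetical. A snapshot copies just the small name table; a later write goes to a fresh block (C’ in the diagram), so unchanged blocks stay shared between snapshots.

```python
class Volume:
    """Toy volume: a name table points at immutable data blocks."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes
        self.live = {}        # name -> block id (the writable view)
        self.snapshots = {}   # snapshot name -> frozen copy of the name table
        self._next_id = 0

    def write(self, name, data):
        # Redirect on write: never overwrite a block in place. New data goes
        # to a fresh block and only the live table is pointed at it.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id

    def snapshot(self, snap_name):
        # Copies only the name table, not the data blocks themselves.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]


vol = Volume()
for name in "ABCD":
    vol.write(name, f"{name}-v1".encode())
vol.snapshot("Snapshot20110629")
vol.write("C", b"C-v2")            # plays the role of C' in the diagram
vol.snapshot("Snapshot20110630")

old = vol.snapshots["Snapshot20110629"]
new = vol.snapshots["Snapshot20110630"]
shared = [n for n in old if old[n] == new[n]]   # blocks both snapshots point at
```

In this model the older snapshot still reads the original C while the live view reads the new one, and only five blocks exist in total for two snapshots of four files.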
  • 17. MapR Snapshots
    HBase table in DFS
    Take snapshot on running HBase
    Restore from snapshot
  • 18. MapR Control System
    Snapshot information
    Snapshot Schedules
    All UI operations have REST APIs
    More info at www.mapr.com
  • 19. MapR Mirroring
    • A mirror is a physical copy of the data
    • Consistent, point-in-time data replication to a different cluster
    • Differential deltas are updated
    • Compressed and check-summed
    • Scheduled or on-demand
    • REST API to set up, start and stop a mirror
    [Diagram: mirroring between a Production cluster and a Backup cluster across Datacenter 1 and Datacenter 2 over a WAN.]
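The mirroring bullets (differential deltas, compressed and check-summed) can be sketched in a few lines. This is an illustration of the general idea only, under assumed names: `build_delta`, `apply_delta`, and the dictionaries standing in for the two clusters are hypothetical, and nothing here reflects MapR's actual wire format. Deletions are not handled.

```python
import hashlib
import zlib


def build_delta(source, mirror_digests):
    """Return {name: (compressed bytes, sha256 hex)} for blocks the mirror lacks or has stale."""
    delta = {}
    for name, data in source.items():
        digest = hashlib.sha256(data).hexdigest()
        if mirror_digests.get(name) != digest:
            delta[name] = (zlib.compress(data), digest)
    return delta


def apply_delta(mirror, delta):
    """Decompress each block, verify its checksum, then store it on the mirror."""
    for name, (payload, digest) in delta.items():
        data = zlib.decompress(payload)
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"checksum mismatch for {name}")
        mirror[name] = data


# A point-in-time view of production and the current state of the backup.
production = {"A": b"a" * 100, "B": b"b" * 100, "C": b"c-new" * 20}
backup = {"A": b"a" * 100, "B": b"b" * 100, "C": b"c-old" * 20}

digests = {n: hashlib.sha256(d).hexdigest() for n, d in backup.items()}
delta = build_delta(production, digests)   # only block C has changed
apply_delta(backup, delta)
```

Only the changed block crosses the link, which is the point of sending deltas rather than a full copy over a WAN.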
  • 25. HBase performance
    "...Initially, when the table was empty I was getting around 300 inserts per second with 50 writing threads. Then, when the region split and a second server was added the rate suddenly jumped to 3000 inserts/sec per server, so ~6000 for the two servers..." Eran Kutner
    "...My scenario is similar, we need under 10k rows, 10-20 columns and which can have thousands of version with value not greater than 300 bytes... Can we get 40-50k records/sec insertion speed in HBase??..." Gaurav Vashishth
    Source: hbase-user group
  • 26. YCSB setup
    • Modified YCSB to use ZooKeeper for a coordinated start
    • HMaster and RegionServers running on MapR
    • YCSB clients running on the RS nodes
    [Diagram: ZooKeeper coordinating four YCSB clients, each on a RegionServer (RS) node, with the Master, all on MapR.]
    https://github.com/lohitvijayarenu/YCSB
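A coordinated start means no client issues operations until every client is ready. The sketch below shows that barrier pattern with Python threads; `threading.Barrier` is used here as a local stand-in for a ZooKeeper-based barrier, and `run_clients` and its parameters are hypothetical names, not part of YCSB or of the modified fork linked above.

```python
import threading
import time


def run_clients(num_clients, ops_per_client):
    barrier = threading.Barrier(num_clients)   # stand-in for a ZooKeeper barrier
    ready_times = [None] * num_clients
    start_times = [None] * num_clients
    completed = [0] * num_clients

    def client(idx):
        time.sleep(0.01 * idx)                 # clients become ready at different times
        ready_times[idx] = time.monotonic()
        barrier.wait()                         # block until every client has checked in
        start_times[idx] = time.monotonic()
        for _ in range(ops_per_client):
            completed[idx] += 1                # placeholder for one insert or read

    threads = [threading.Thread(target=client, args=(i,)) for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ready_times, start_times, completed


ready_times, start_times, completed = run_clients(num_clients=4, ops_per_client=1000)
```

Because no client starts before the last one is ready, per-node throughput numbers cover the same time window and can be compared directly.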
  • 29. YCSB operations from nodes
    • YCSB clients doing inserts from all cluster nodes
    • Throughput rates were similar from all nodes
    • All operations in the cluster completed at around the same time
  • 32. Insert performance
    Dataset: 1B rows
    Row size: 1K
    10 RS, 11 × 2TB disks @ 7200 RPM
    8 cores, 24GB RAM, 2Gbps network
    Replication factor 3, no compression
    [Chart: Insert (one node); axes: Ops and Seconds]
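For a sense of scale, the insert test's parameters imply the data volume below. This is back-of-the-envelope arithmetic only: it assumes "1K" means 1024 bytes, an even spread across RegionServers, and it ignores HBase's per-cell key and metadata overhead.

```python
ROWS = 1_000_000_000      # "Dataset: 1B rows"
ROW_BYTES = 1024          # "Row size: 1K", assumed here to mean 1024 bytes
REPLICATION = 3           # replication factor from the slide
REGION_SERVERS = 10       # "10 RS"

logical_bytes = ROWS * ROW_BYTES
raw_bytes = logical_bytes * REPLICATION         # no compression, per the slide
per_server_bytes = raw_bytes / REGION_SERVERS   # assumes an even spread

TB = 10 ** 12
print(f"logical: {logical_bytes / TB:.2f} TB")
print(f"on disk: {raw_bytes / TB:.2f} TB")
print(f"per RS:  {per_server_bytes / TB:.2f} TB")
```

That is roughly 1 TB of logical data and roughly 3 TB written to disk across the cluster once replication is counted.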
  • 33. Read performance
    Dataset: 0.9B rows
    Row size: 1K
    9 RS, 5 × 500GB disks @ 7200 RPM
    8 cores, 24GB RAM, 2Gbps network
    [Chart: Read (one node); axes: Ops and Seconds]
  • 34. HBase High Availability
    "...In HBase 0.90 I have seen that it has a fault tolerant behavior of triggering lease recovery and closing the file when the writer dies in the middle. Yet does hbase have any workaround/recovery when NameNode is restarted in the middle of the file write (possibly the HLog file, after some syncs)???..." Gokulakannan M
    Source: hbase-user group
  • 35. MapR High Availability
    No single point of failure
    Distributed NameNode
    Automatic and transparent failover
    Better performance
    Replicated and persisted to disk
    Fully distributed and highly scalable
    Real time HBase on MapR
    [Diagram: HBase reads and writes to MapR (No Single Point of Failure); six nodes, each shown with an NN label.]
  • 36. MapR Heatmap™
    Intuitive
    Insightful
    Comprehensive
    One node or thousands
    More at www.mapr.com
  • 37. Credits
    Michael Stack and Ryan Rawson for their valuable feedback.
    Brian Cooper and Adam Silberstein for their help with YCSB
    Active and helpful HBase community
    More Information
    • http://www.mapr.com
    • http://mapr.com/only-with-mapr.html
    • Follow us @mapr
    • Download and try from www.mapr.com