Transcript

  • 1. Running HBase with the MapR Distribution
    Tomer Shiran, Director of Product Management, MapR Technologies
    7/23/2012 ©MapR Technologies
  • 2. Agenda
    • The HBase volume
    • HBase backups with snapshots
    • Mirroring
    • Tuning memory settings
    • Architecting applications with many objects
  • 3. MapR
    • Complete Hadoop distribution
      • Makes it easy to deploy HBase
      • MapR 1.2 includes HBase 0.90.4 + 15 patches
    • Seeing huge growth in HBase adoption
      • Thanks to everyone in this room!
    • MapR expands the market for HBase
      • Enterprises require HA, data protection and disaster recovery
      • MapR makes it easier to run HBase in production
        ▪ One minute to set up hourly snapshots
        ▪ One minute to set up cross-datacenter mirroring
        ▪ No need to worry about the NameNode
  • 4. Volumes – easy data management
    • MapR makes data management easier with volumes
    • Volumes are directories with management policies
      • Replication, snapshots, mirroring, data placement control, quotas, usage tracking, …
    • Each user/project directory should be a volume
      • 100K volumes are not a problem
  • 5. The HBase volume
    • All HBase data should be in one volume
      • HBase WALs are per RegionServer, so per-table volumes can't be created
    • A volume for HBase data is created on installation
      • Name: hbase.volume
      • Mount path: /hbase
    • Replication optimized for low latency
      • Star replication beats chain replication for HBase
    • For bulk load, create the HFiles in the HBase volume (/hbase)

      # cd /mapr/default/hbase
      # ls -la
      total 7
      drwxrwxrwx 13 root root 12 2012-01-16 11:44 .
      drwxrwxrwx  6 root root  7 2012-01-13 16:08 ..
      drwxrwxrwx  3 root root  1 2012-01-15 11:30 AdImpressions
      -rwxrwxrwx  1 root root  3 2011-12-16 13:03 hbase.version
      drwxrwxrwx  5 root root  3 2012-01-12 15:28 .logs
      drwxrwxrwx  3 root root  1 2011-12-16 13:03 .META.
      drwxrwxrwx  2 root root  0 2012-01-13 14:29 .oldlogs
      drwxrwxrwx  3 root root  1 2011-12-16 13:03 -ROOT-
      drwxrwxrwx  3 root root  1 2012-01-16 11:44 Users

      Reminder: a MapR cluster can be mounted via NFS, so cd and ls just work.
      All WALs are in .logs, not in the user table directories (AdImpressions, Users).
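    The listing above mixes HBase system directories (.logs, .oldlogs, .META., -ROOT-) with user table directories (AdImpressions, Users). A small helper in that spirit, for scripting against the NFS mount (a sketch for illustration, not a MapR tool):

```python
# Split an /hbase directory listing into system entries and user tables,
# following the layout shown on slide 5. SYSTEM_ENTRIES is an assumption
# based on that listing, not an exhaustive list for every HBase version.
SYSTEM_ENTRIES = {".logs", ".oldlogs", ".META.", "-ROOT-", "hbase.version"}

def user_tables(entries):
    """Return only the user table directories from an /hbase listing."""
    return sorted(e for e in entries
                  if e not in SYSTEM_ENTRIES and not e.startswith("."))

listing = ["AdImpressions", "hbase.version", ".logs", ".META.",
           ".oldlogs", "-ROOT-", "Users"]
print(user_tables(listing))   # → ['AdImpressions', 'Users']
```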
  • 6. HBase backups with snapshots
    • Why snapshots?
      • Consistent – HFiles and HLogs at the same point in time
      • No downtime – snapshot a live HBase cluster with no performance impact
      • No data duplication – takes seconds to snapshot petabytes
      • Short RPOs – snapshot hourly or more frequently
    • Access HBase snapshots in /hbase/.snapshot:

      # cd .snapshot
      # pwd
      /mapr/default/hbase/.snapshot
      # ls -la
      total 3
      drwxr-xr-x 5 root root 3 Jan 16 16:02 .
      drwxrwxrwx 7 root root 6 Jan 16 11:46 ..
      drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.14-02-02
      drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.15-02-02
      drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.16-02-02
      # ls -a 2012-01-16.16-02-02
      .  ..  AdImpressions  hbase.version  .logs  .META.  .oldlogs  -ROOT-
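    A restore script typically wants the newest snapshot. Since the directory names in the listing above encode the snapshot timestamp, picking the latest one is a matter of parsing the name (a sketch; the name format is taken from the listing on slide 6):

```python
from datetime import datetime

# Snapshot directories are named after their timestamp,
# e.g. "2012-01-16.16-02-02" as seen under /hbase/.snapshot.
SNAPSHOT_FORMAT = "%Y-%m-%d.%H-%M-%S"

def latest_snapshot(names):
    """Return the most recent snapshot directory name, or None."""
    parsed = []
    for name in names:
        try:
            parsed.append((datetime.strptime(name, SNAPSHOT_FORMAT), name))
        except ValueError:
            continue  # skip entries that are not snapshot directories
    return max(parsed)[1] if parsed else None

names = ["2012-01-16.14-02-02", "2012-01-16.15-02-02", "2012-01-16.16-02-02"]
print(latest_snapshot(names))  # → 2012-01-16.16-02-02
```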
  • 7. Manage your schedules
  • 8. Choose a snapshot schedule for HBase
    • Use this GUI dialog, or the CLI or REST API
    • Choose a snapshot schedule for this volume
  • 9. Mirroring
    • Mirror to…
      • Research cluster
      • Failover (DR) cluster
      • Remote backup cluster
      • Same cluster!
      • …
    • Fast (and easy): differential (deltas), compressed
    • Safe: consistent (snapshot), checksummed
    • Flexible: scheduled or on-demand; intranet, WAN or sneakernet
  • 10. Mirroring the HBase volume
    • Create a new volume on the destination cluster
    • Choose the Remote Mirroring Volume type
    • Choose the source cluster and volume (mapr.hbase)
    • Choose a mirroring schedule
  • 11. Mirroring vs. HBase master/slave replication
    • Block level
      • No need to run HBase on the sink cluster
      • Only the latest update to a block needs to be sent
        ▪ With master/slave, every operation is sent
    • MapR mirroring is practically stateless
      • Each sink cluster keeps one integer – a serial number
        ▪ When asking for the next update, the sink provides the most recently seen serial number
      • The master cluster does not keep any state
        ▪ No resources consumed on the master cluster
      • No ZooKeeper involved
      • Master/slave replication is challenging when it gets out of sync
    • One system for mirroring both HBase and files/directories
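    The serial-number idea on slide 11 can be sketched as a toy pull protocol (an illustration of the concept, not MapR's actual wire protocol): the sink remembers a single integer, and each pull asks the source for everything newer than that number, receiving only the latest version of each block.

```python
# Toy model of serial-number-driven mirroring. The sink's entire state
# is one integer; the source keeps no per-sink state at all.

class Source:
    def __init__(self):
        self.serial = 0
        self.blocks = {}              # block_id -> (serial, data)

    def write(self, block_id, data):
        self.serial += 1
        # Overwriting a block replaces its serial: only the latest
        # version of each block is ever shipped to a sink.
        self.blocks[block_id] = (self.serial, data)

    def updates_since(self, last_seen):
        return [(s, b, d) for b, (s, d) in self.blocks.items() if s > last_seen]

class Sink:
    def __init__(self):
        self.last_seen = 0            # the only state the sink keeps
        self.blocks = {}

    def pull(self, source):
        for serial, block_id, data in sorted(source.updates_since(self.last_seen)):
            self.blocks[block_id] = data
            self.last_seen = serial

src, dst = Source(), Sink()
src.write("b1", "v1")
src.write("b2", "v1")
src.write("b1", "v2")   # two writes to b1, but only "v2" will be mirrored
dst.pull(src)
print(dst.blocks)       # only the latest version of b1 arrived
print(dst.last_seen)    # → 3
```

    With HBase master/slave replication, all three writes would be replayed on the slave; here the intermediate b1 version is never sent.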
  • 12. Warden
    • Warden runs on each server
      • /etc/init.d/mapr-warden start
    • Warden starts/manages services on the node
    • Warden decides how much memory to give each service, based on settings in warden.conf

      # cat /opt/mapr/conf/warden.conf
      …
      service.command.hbregion.heapsize.percent=25
      service.command.hbregion.heapsize.max=4000
      service.command.hbregion.heapsize.min=1000
      service.command.mfs.heapsize.percent=20
      service.command.mfs.heapsize.min=512
      …
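    One plausible reading of the percent/min/max settings above (an assumption for illustration, not Warden's documented algorithm): take the configured percentage of the node's RAM, then clamp the result between the min and max values.

```python
# Sketch of a percent-then-clamp heap calculation for the warden.conf
# settings shown on slide 12. Values are in MB.

def heapsize_mb(total_mb, percent, lo, hi=None):
    heap = total_mb * percent / 100
    heap = max(heap, lo)          # never below heapsize.min
    if hi is not None:
        heap = min(heap, hi)      # never above heapsize.max, if set
    return int(heap)

# hbregion on a 32 GB node: 25% of 32768 MB = 8192 MB, capped at 4000 MB
print(heapsize_mb(32768, 25, 1000, 4000))   # → 4000
# mfs on the same node: 20% of 32768 MB, no max configured
print(heapsize_mb(32768, 20, 512))          # → 6553
```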
  • 13. Tuning memory settings
    • The defaults are suitable in most cases
    • Guidelines:
      • Don't exceed 100–200 regions per server
      • Don't give the RegionServer more than 16 GB RAM
        ▪ Garbage collection might kill you
      • Give spare memory to the FileServer
        ▪ Written in C/C++ (unlike the HDFS DataNode)
        ▪ Advanced caching and prefetching
      • Don't enable the TaskTracker unless you need it
        ▪ Otherwise Warden will reserve memory for tasks
        ▪ If the TaskTracker is not enabled and mfs.heapsize.max is not in warden.conf, Warden assigns spare memory to the FileServer
  • 14. Architecting applications with many objects
    • MapR supports up to 1 trillion files (small files OK)
      • Fully distributed metadata
        ▪ No NameNode or block reports
      • Extremely fast random I/O (10–1000x compared to HDFS)
      • With HDFS Federation and the upcoming HA NameNode you would need 20K NameNodes and an HA NetApp :-)
    • Keep smaller objects in HBase and larger objects (> 100KB) in MapR storage services
      • Metadata (IDs, attributes, etc.) → HBase
      • Content (messages, attachments, etc.) → MapR storage services
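    The split suggested on slide 14 can be sketched as a size-thresholded put: small objects are stored inline in HBase, large ones go to files on MapR storage with only a pointer kept in HBase. In this sketch a dict stands in for the HBase table and a local temp directory stands in for the NFS-mounted volume (both stand-ins are assumptions for illustration):

```python
import os
import tempfile

THRESHOLD = 100 * 1024                # 100 KB, per slide 14

hbase_table = {}                      # stand-in for an HBase table
storage_dir = tempfile.mkdtemp()      # stand-in for an NFS-mounted volume

def put(object_id, content: bytes):
    if len(content) <= THRESHOLD:
        hbase_table[object_id] = content          # small: store inline
    else:
        path = os.path.join(storage_dir, object_id)
        with open(path, "wb") as f:               # large: store as a file,
            f.write(content)                      # keep only a pointer
        hbase_table[object_id] = path.encode()

put("msg1", b"hello")                 # 5 bytes -> inline
put("att1", b"x" * 200_000)           # 200 KB -> file, pointer inline
print(hbase_table["msg1"])            # → b'hello'
print(hbase_table["att1"].decode().startswith(storage_dir))  # → True
```

    In a real deployment the dict becomes an HBase table holding IDs and attributes, and the temp directory becomes a path on the cluster's NFS mount.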
  • 15. Three ways to access the files
    • NFS
      • Mount the cluster over NFS
      • NFS HA ensures availability – MapR assigns and manages virtual IPs
      • No client library; works with any language

        $ mount -o … mycluster:/mapr /mapr
        $ python
        >>> with open("/mapr/mycluster/images/asdfghjkl", "w") as f:
        ...     f.write(…)

    • Java – Hadoop FileSystem API

        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(…);
        out.write(…);

    • C/C++ – native libhdfs library (MapR 1.2+)
      • Same API (header file) as libhdfs, but no Java involved

        hdfsFS fs = hdfsConnect(...);
        hdfsFile f = hdfsOpenFile(fs, ...);
        hdfsWrite(fs, f, ...);
  • 16. Questions?