HBase with MapR
Presentation Transcript

Running HBase with the MapR Distribution
Tomer Shiran, Director of Product Management, MapR Technologies
7/23/2012 ©MapR Technologies
Agenda
• The HBase volume
• HBase backups with snapshots
• Mirroring
• Tuning memory settings
• Architecting applications with many objects
MapR
• Complete Hadoop distribution
  • Makes it easy to deploy HBase
  • MapR 1.2 includes HBase 0.90.4 + 15 patches
• Seeing huge growth in HBase adoption
  • Thanks to everyone in this room!
• MapR expands the market for HBase
  • Enterprises require HA, data protection and disaster recovery
  • MapR makes it easier to run HBase in production
    • One minute to set up hourly snapshots
    • One minute to set up cross-datacenter mirroring
    • No need to worry about the NameNode
Volumes: easy data management
• MapR makes data management easier with volumes
• Volumes are directories with management policies
  • Replication, snapshots, mirroring, data placement control, quotas, usage tracking, …
• Each user/project directory should be a volume
  • 100K volumes are not a problem
The HBase volume
• All HBase data should be in one volume
  • HBase WALs are per RegionServer, so per-table volumes can't be created
• A volume for HBase data is created on installation
  • Name: hbase.volume
  • Mount path: /hbase
• Replication optimized for low latency
  • Star replication beats chain replication for HBase
• For bulk load, create the HFiles in the HBase volume (/hbase)

Reminder: a MapR cluster can be mounted via NFS, so cd and ls just work:

    # cd /mapr/default/hbase
    # ls -la
    total 7
    drwxrwxrwx 13 root root 12 2012-01-16 11:44 .
    drwxrwxrwx  6 root root  7 2012-01-13 16:08 ..
    drwxrwxrwx  3 root root  1 2012-01-15 11:30 AdImpressions
    -rwxrwxrwx  1 root root  3 2011-12-16 13:03 hbase.version
    drwxrwxrwx  5 root root  3 2012-01-12 15:28 .logs
    drwxrwxrwx  3 root root  1 2011-12-16 13:03 .META.
    drwxrwxrwx  2 root root  0 2012-01-13 14:29 .oldlogs
    drwxrwxrwx  3 root root  1 2011-12-16 13:03 -ROOT-
    drwxrwxrwx  3 root root  1 2012-01-16 11:44 Users

All WALs are in .logs, not in the user table directories (AdImpressions, Users).
HBase backups with snapshots
• Why snapshots?
  • Consistent: HFiles and HLogs at the same point in time
  • No downtime: snapshot a live HBase cluster with no performance impact
  • No data duplication: takes seconds to snapshot petabytes
  • Short RPOs: snapshot hourly or more frequently
• Access HBase snapshots in /hbase/.snapshot:

    # cd .snapshot
    # pwd
    /mapr/default/hbase/.snapshot
    # ls -la
    total 3
    drwxr-xr-x 5 root root 3 Jan 16 16:02 .
    drwxrwxrwx 7 root root 6 Jan 16 11:46 ..
    drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.14-02-02
    drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.15-02-02
    drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.16-02-02
    # ls -a 2012-01-16.16-02-02
    .  ..  AdImpressions  hbase.version  .logs  .META.  .oldlogs  -ROOT-
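The timestamped directory names in a .snapshot listing like the one above make it easy to find the newest snapshot programmatically, e.g. for a backup script. A minimal Python sketch, assuming the schedule-generated "%Y-%m-%d.%H-%M-%S" naming shown in the listing; the function name is illustrative, not part of MapR:

```python
from datetime import datetime

def latest_snapshot(names):
    """Return the most recent snapshot directory name, or None.

    Schedule-created snapshot directories are named with a timestamp
    such as '2012-01-16.16-02-02'; entries that do not match that
    pattern (e.g. '.' and '..') are skipped.
    """
    parsed = []
    for name in names:
        try:
            parsed.append((datetime.strptime(name, "%Y-%m-%d.%H-%M-%S"), name))
        except ValueError:
            continue  # not a timestamped snapshot directory
    return max(parsed)[1] if parsed else None

snaps = ["2012-01-16.14-02-02", "2012-01-16.15-02-02", "2012-01-16.16-02-02"]
print(latest_snapshot(snaps))  # 2012-01-16.16-02-02
```

In practice the list would come from `os.listdir("/mapr/default/hbase/.snapshot")` over the NFS mount.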
Manage your schedules
Choose a snapshot schedule for HBase
• Use this GUI dialog, or the CLI or REST API
• Choose a snapshot schedule for this volume
Mirroring
Mirror to…
• Research cluster
• Failover (DR) cluster
• Remote backup cluster
• Same cluster!
• …

Fast (and easy)
• Differential (deltas)
• Compressed

Safe
• Consistent (snapshot)
• Checksummed

Flexible
• Scheduled or on-demand
• Intranet, WAN or Sneakernet
Mirroring the HBase volume
• Create a new volume on the destination cluster
• Choose the Remote Mirror Volume type
• Choose the source cluster and volume (mapr.hbase)
• Choose a mirroring schedule
Mirroring vs. HBase master/slave replication
• Block level
  • No need to run HBase on the sink cluster
  • Only the latest update to a block needs to be sent
    • With master/slave, every operation is sent
• MapR mirroring is practically stateless
  • Each sink cluster keeps one integer, a serial number
    • When asking for the next update, the sink provides the most recently seen serial number
  • The master cluster does not keep any state
    • No resources consumed on the master cluster
  • No ZooKeeper involved
  • Master/slave replication is challenging when it gets out of sync
• One system for mirroring both HBase and files/directories
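The "one integer of state" protocol described above can be sketched in a few lines of Python. Everything here (class and method names, the in-memory update log) is illustrative, not MapR's actual implementation; the sketch only shows why a single sink-side serial number is enough, and why only the latest update to each block crosses the wire:

```python
class Master:
    """Stateless with respect to sinks: keeps an update log, no per-sink data."""
    def __init__(self):
        self.updates = []  # append-only list of (serial, block_id, data)

    def write_block(self, block_id, data):
        serial = len(self.updates) + 1
        self.updates.append((serial, block_id, data))

    def updates_since(self, serial):
        # Only the latest update per block needs to be sent.
        latest = {}
        for s, block_id, data in self.updates:
            if s > serial:
                latest[block_id] = (s, data)
        return sorted((s, b, d) for b, (s, d) in latest.items())

class Sink:
    def __init__(self):
        self.serial = 0   # the single integer of sink-side state
        self.blocks = {}

    def sync(self, master):
        for s, block_id, data in master.updates_since(self.serial):
            self.blocks[block_id] = data
            self.serial = max(self.serial, s)

m = Master()
m.write_block("b1", "v1")
m.write_block("b2", "v1")
m.write_block("b1", "v2")   # supersedes the first write of b1
sink = Sink()
sink.sync(m)
print(sink.blocks)  # {'b2': 'v1', 'b1': 'v2'} — only the latest b1 is sent
```

Contrast with master/slave replication, where all three operations (including the superseded write of b1) would be shipped and replayed.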
Warden
• Warden runs on each server
  • /etc/init.d/mapr-warden start
• Warden starts and manages the services on the node
• Warden decides how much memory to give each service based on settings in warden.conf:

    # cat /opt/mapr/conf/warden.conf
    …
    service.command.hbregion.heapsize.percent=25
    service.command.hbregion.heapsize.max=4000
    service.command.hbregion.heapsize.min=1000
    service.command.mfs.heapsize.percent=20
    service.command.mfs.heapsize.min=512
    …
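The percent/min/max triples above combine in the obvious way: a percentage of node memory, clamped to the configured bounds. A small Python sketch of that reading; the clamping is how these settings read, but Warden's exact arbitration across services is an assumption here:

```python
def service_heap_mb(total_mb, percent, min_mb, max_mb):
    """Heap for one service: percent of node memory, clamped to [min, max]."""
    return int(min(max(total_mb * percent / 100, min_mb), max_mb))

# RegionServer (hbregion) settings from the excerpt: 25% / min 1000 / max 4000.
# On a 32 GB node, 25% would be 8192 MB, so the 4000 MB cap applies:
print(service_heap_mb(32768, 25, 1000, 4000))  # 4000
# On a 2 GB node, 25% would be 512 MB, so the 1000 MB floor applies:
print(service_heap_mb(2048, 25, 1000, 4000))   # 1000
```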
Tuning memory settings
• The defaults are suitable in most cases
• Guidelines:
  • Don't exceed 100-200 regions per server
  • Don't give the RegionServer more than 16GB of RAM
    • Garbage collection might kill you
  • Give spare memory to the FileServer
    • Written in C/C++ (unlike the HDFS DataNode)
    • Advanced caching and prefetching
  • Don't enable the TaskTracker unless you need it
    • Otherwise Warden will reserve memory for tasks
    • If the TaskTracker is not enabled and mfs.heapsize.max is not in warden.conf, Warden assigns the spare memory to the FileServer
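The guidelines above lend themselves to a simple sanity check. The thresholds come straight from the slide, but the function itself is a hypothetical sketch, not a MapR tool:

```python
def check_regionserver(num_regions, heap_gb, tasktracker_enabled, runs_mapreduce):
    """Return warnings for settings that violate the tuning guidelines."""
    warnings = []
    if num_regions > 200:
        warnings.append("too many regions per server (aim for 100-200)")
    if heap_gb > 16:
        warnings.append("RegionServer heap over 16GB: GC pauses might kill you")
    if tasktracker_enabled and not runs_mapreduce:
        warnings.append("TaskTracker enabled but unused: Warden reserves memory for tasks")
    return warnings

# A node that violates all three guidelines:
print(check_regionserver(num_regions=350, heap_gb=24,
                         tasktracker_enabled=True, runs_mapreduce=False))
```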
Architecting applications with many objects
• MapR supports up to 1 trillion files (small files OK)
  • Fully distributed metadata
    • No NameNode or block reports
  • Extremely fast random I/O (10-1000x compared to HDFS)
  • With HDFS Federation and the upcoming HA NameNode, you would need 20K NameNodes and an HA NetApp :-)
• Keep smaller objects in HBase and larger objects (> 100KB) in MapR storage services
  • Metadata (IDs, attributes, etc.): HBase
  • Content (messages, attachments, etc.): MapR storage services
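The metadata/content split above reduces to a one-line routing rule. A sketch assuming the slide's 100KB threshold; the function name and the shape of the metadata record are illustrative:

```python
SIZE_THRESHOLD = 100 * 1024  # the slide's > 100KB guideline

def place_object(object_id, payload):
    """Decide where an object's content belongs.

    Metadata (IDs, attributes) always goes to HBase; content larger
    than the threshold goes to MapR storage services (e.g. a file
    written over the NFS mount), while smaller content can live in
    HBase alongside its metadata.
    """
    store = "mapr-fs" if len(payload) > SIZE_THRESHOLD else "hbase"
    metadata = {"id": object_id, "size": len(payload), "content_store": store}
    return metadata, store

# A 200KB e-mail attachment goes to the file system; its metadata row
# (and a pointer to the file) would be written to HBase:
meta, store = place_object("msg-1", b"x" * 200 * 1024)
print(store)  # mapr-fs
```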
Three ways to access the files
• NFS
  • Mount the cluster over NFS
  • NFS HA ensures availability: MapR assigns and manages virtual IPs
  • No client library; works with any language

    $ mount -o … mycluster:/mapr /mapr
    $ python
    >>> with open(r'/mapr/mycluster/images/asdfghjkl', 'w') as f:
    ...     f.write(…)

• Java: Hadoop FileSystem API

    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(…);
    out.write(…);

• C/C++: native libhdfs library (MapR 1.2+)
  • Same API (header file) as libhdfs, but no Java involved

    hdfsFS fs = hdfsConnect(...);
    hdfsFile f = hdfsOpenFile(fs, ...);
    hdfsWrite(fs, f, ...);
Questions?