MapR M7: Providing an enterprise quality Apache HBase API

Provides an overview of M7, the first unified data platform for tables and files. Does a deep dive into the MapR architecture, especially containers, and how M7 tables integrate with the rest of the MapR architecture, including volumes, management, and Hadoop.

Describes some of the problems with Apache HBase, and how M7 from MapR solves many of these issues.


  1. 1. 1©MapR Technologies - Confidential M7 Technical Overview M. C. Srivas CTO/Founder, MapR
  2. 2. 2©MapR Technologies - Confidential MapR: Lights Out Data Center Ready • Automated stateful failover • Automated re-replication • Self-healing from HW and SW failures • Load balancing • Rolling upgrades • No lost jobs or data • Five 9's (99.999%) of uptime Reliable Compute Dependable Storage • Business continuity with snapshots and mirrors • Recover to a point in time • End-to-end checksumming • Strong consistency • Built-in compression • Mirror between two sites by RTO policy
  3. 3. 3©MapR Technologies - Confidential MapR does MapReduce (fast) TeraSort Record 1 TB in 54 seconds 1003 nodes MinuteSort Record 1.5 TB in 59 seconds 2103 nodes
  4. 4. 4©MapR Technologies - Confidential MapR does MapReduce (faster) TeraSort Record 1 TB in 54 seconds 1003 nodes MinuteSort Record 1.5 TB in 59 seconds 2103 nodes
  5. 5. 5©MapR Technologies - Confidential NoSQL [word-cloud slide of NoSQL systems: DynamoDB, ZopeDB, Shoal, CloudKit, VertexDB, FlockDB, …]
  6. 6. 6©MapR Technologies - Confidential HBase Table Architecture • Tables are divided into key ranges (regions) • Regions are served by nodes (RegionServers) • Columns are divided into access groups (column families) [diagram: column families CF1–CF5 across regions R1–R4]
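To make the region / column-family model concrete, here is a minimal sketch using the 0.94-era HBase Java client (the API current when this deck was written). The table name, family names, and split keys are hypothetical:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.HColumnDescriptor;
      import org.apache.hadoop.hbase.HTableDescriptor;
      import org.apache.hadoop.hbase.client.HBaseAdmin;
      import org.apache.hadoop.hbase.util.Bytes;

      public class CreateTableSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              HBaseAdmin admin = new HBaseAdmin(conf);

              // One table with several access groups (column families).
              HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical name
              desc.addFamily(new HColumnDescriptor("cf1"));
              desc.addFamily(new HColumnDescriptor("cf2"));
              desc.addFamily(new HColumnDescriptor("cf3"));

              // Pre-split the key space: each split key starts a new region
              // (R1..R4), and each region is served by some RegionServer.
              byte[][] splitKeys = {
                  Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("u")
              };
              admin.createTable(desc, splitKeys);
              admin.close();
          }
      }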
  7. 7. 7©MapR Technologies - Confidential HBase Architecture is Better • Strong consistency model – when a write returns, all readers will see the same value – "eventually consistent" is often "eventually inconsistent" • Scan works – does not broadcast – ring-based NoSQL databases (e.g., Cassandra, Riak) suffer on scans • Scales automatically – splits when regions become too large – uses HDFS to spread data, manage space • Integrated with Hadoop – MapReduce on HBase is straightforward
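As a small illustration of the scan point: with the stock client, a range scan only contacts the RegionServers whose regions intersect the requested key range. A sketch, again with the 0.94-era API and hypothetical names:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HTable;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.ResultScanner;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;

      public class ScanSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              HTable table = new HTable(conf, "mytable"); // hypothetical name

              // Scan the half-open row range [row1000, row2000): only the
              // regions covering this range are consulted -- no cluster-wide
              // broadcast as in ring-based stores.
              Scan scan = new Scan(Bytes.toBytes("row1000"), Bytes.toBytes("row2000"));
              ResultScanner scanner = table.getScanner(scan);
              try {
                  for (Result r : scanner) {
                      System.out.println(Bytes.toString(r.getRow()));
                  }
              } finally {
                  scanner.close();
                  table.close();
              }
          }
      }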
  8. 8. 8©MapR Technologies - Confidential M7 An integrated system for unstructured and structured data
  9. 9. 9©MapR Technologies - Confidential MapR M7 Tables • Binary compatible with Apache HBase – no recompilation needed to access M7 tables – just set CLASSPATH – including the HBase CLI • M7 tables accessed via pathname – openTable("hello") … uses HBase – openTable("/hello") … uses M7 – openTable("/user/srivas/hello") … uses M7
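The openTable() calls above are shorthand; with the standard HBase client the same dispatch is expressed by passing either a plain name or a pathname as the table name. A sketch (names hypothetical):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HTable;

      public class OpenTableSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();

              // Plain name: resolved as an Apache HBase table.
              HTable hbaseTable = new HTable(conf, "hello");

              // Pathname: resolved as an M7 table in the MapR namespace.
              HTable m7Table = new HTable(conf, "/user/srivas/hello");

              hbaseTable.close();
              m7Table.close();
          }
      }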
  10. 10. 10©MapR Technologies - Confidential Binary Compatible • HBase applications work "as is" with M7 – no need to recompile, just set CLASSPATH • Can run M7 and HBase side-by-side on the same cluster – e.g., during a migration – can access both an M7 table and an HBase table in the same program • Use the standard Apache HBase CopyTable tool to copy a table from HBase to M7 or vice versa, viz., % hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/user/srivas/mytable oldtable
  11. 11. 11©MapR Technologies - Confidential Features • Unlimited number of tables – HBase is typically 10-20 tables (max 100) • No compaction • Instant-On – zero recovery time • 8x insert/update performance • 10x random scan performance • 10x faster with flash – special flash support
  12. 12. 12©MapR Technologies - Confidential M7: Remove Layers, Simplify MapR M7
  13. 13. 13©MapR Technologies - Confidential M7 tables in a MapR Cluster • M7 tables integrated into storage – always available on every node – no separate process to start/stop/monitor – zero administration – no tuning parameters … just works • M7 tables work 'as expected' – first copy local to the writing client – snapshots and mirrors – quotas, replication factor, data placement
  14. 14. 14©MapR Technologies - Confidential Unified Namespace for Files and Tables
      $ pwd
      /mapr/default/user/dave
      $ ls
      file1 file2 table1 table2
      $ hbase shell
      hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
      0 row(s) in 0.1570 seconds
      $ ls
      file1 file2 table1 table2 table3
      $ hadoop fs -ls /user/dave
      Found 5 items
      -rw-r--r--   3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1
      -rw-r--r--   3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2
      trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:32 /user/dave/table1
      trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:33 /user/dave/table2
      trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:38 /user/dave/table3
  15. 15. 15©MapR Technologies - Confidential M7 – An Integrated System
  16. 16. 16©MapR Technologies - Confidential Tables for End Users • Users can create and manage their own tables – unlimited # of tables – first copy local • Tables can be created in any directory – tables count towards volume and user quotas • No admin intervention needed – do stuff on the fly, no stop/restart of servers • Automatic data protection and disaster recovery – users can recover from snapshots/mirrors on their own
  17. 17. 17©MapR Technologies - Confidential M7 combines the best of LSM and BTrees • LSM trees reduce insert cost by deferring and batching index changes – if you don't compact often, read performance suffers – if you compact too often, write performance suffers • B-trees are great for reads – but expensive to update in real time Can we combine both ideas? Writes cannot be done better than W = 2.5x: write to the log + write the data somewhere + update metadata
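Written out, the slide's lower bound on the cost of a logical write of size x is as below; the 1 + 1 + 0.5 split across the three terms is my reading of the slide, not stated explicitly in the deck:

      \[
      W \;\geq\; \underbrace{x}_{\text{append to WAL}}
               + \underbrace{x}_{\text{write data somewhere}}
               + \underbrace{0.5\,x}_{\text{update metadata}}
        \;=\; 2.5\,x
      \]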
  18. 18. 18©MapR Technologies - Confidential M7 from MapR • Twisting BTrees – leaves are variable size (8K - 8M or larger) – can stay unbalanced for long periods of time • more inserts will balance it eventually • automatically throttles updates to interior B-tree nodes – M7 inserts "close to" where the data is supposed to go • Reads – uses the B-tree structure to get "close" very fast • very high branching with key-prefix compression – utilizes a separate lower-level index to find it exactly • updated "in place" • bloom filters for gets, range maps for scans • Overhead – a 1K record read will transfer about 32K from disk in logN seeks
  19. 19. 19©MapR Technologies - Confidential M7 Comparative Analysis with Apache HBase, LevelDB and a BTree
  20. 20. 20©MapR Technologies - Confidential Apache HBase HFile Structure 64KB blocks are compressed; an index into the compressed blocks is created as a B-tree. Key-value pairs are laid out in increasing order. Each cell is an individual key + value – a row repeats the key for each column.
  21. 21. 21©MapR Technologies - Confidential HBase Region Operation • Typical region size is a few GB, sometimes even 10G or 20G • A RegionServer holds data in memory until full, then writes a new HFile – the logical view of the database is constructed by layering these files, with the latest on top [diagram: key range represented by this region, HFiles layered newest to oldest]
  22. 22. 22©MapR Technologies - Confidential HBase Read Amplification • When a get/scan comes in, all the files have to be examined – schema-less, so where is the column? – done in memory; does not change what's on disk • bloom filters do not help in scans With 7 files, a 1K-record get() takes about 30 seeks, 7 block decompressions, and a total data transfer of about 130K from HDFS.
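One plausible decomposition of those totals (my assumption; the deck gives only the aggregate figures): roughly four seeks per HFile (trailer, block index, bloom filter, data block) and one compressed ~64KB block, around 19KB on disk, read per file:

      \[
      7 \text{ files} \times 4 \text{ seeks/file} \approx 30 \text{ seeks},
      \qquad
      7 \text{ files} \times 19\,\text{KB/file} \approx 130\,\text{KB transferred}
      \]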
  23. 23. 23©MapR Technologies - Confidential HBase Write Amplification • To reduce the read amplification, HBase merges the HFiles periodically – a process called compaction – runs automatically when there are too many files – usually turned off due to I/O storms – and kicked off manually on weekends Compaction reads all files and merges them into a single HFile.
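For reference, the manual kick-off can be done through the 0.94-era admin API (table name hypothetical); the call is asynchronous and merges all HFiles in each store into one:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HBaseAdmin;

      public class CompactSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              HBaseAdmin admin = new HBaseAdmin(conf);
              // Request a major compaction of every region of the table.
              // This is what operators trigger by hand (e.g., on weekends)
              // when automatic compaction is disabled to avoid I/O storms.
              admin.majorCompact("mytable"); // hypothetical table name
              admin.close();
          }
      }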
  24. 24. 24©MapR Technologies - Confidential HBase Compaction Analysis • Assume 10G per region, write 10% per day, grow 10% per week – 1G of writes per day – after 7 days, 7 files of 1G and 1 file of 10G • Compaction – total reads: 17G (= 7 x 1G + 1 x 10G) – total writes: 25G (= 7G WAL + 7G flush + 11G write to new HFile) • 500 regions – read 8.5T, write 12.5T – a major outage on the node – and skipping compaction only makes reads worse • Best practice: serve < 500GB per node (50 regions)
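The arithmetic behind the slide's totals, per region and then scaled to 500 regions:

      \[
      \text{reads} = 7 \times 1\,\text{GB} + 1 \times 10\,\text{GB} = 17\,\text{GB},
      \qquad
      \text{writes} = \underbrace{7\,\text{GB}}_{\text{WAL}}
                    + \underbrace{7\,\text{GB}}_{\text{flushes}}
                    + \underbrace{11\,\text{GB}}_{\text{new HFile}} = 25\,\text{GB}
      \]
      \[
      500 \text{ regions} \times 17\,\text{GB} = 8.5\,\text{TB read},
      \qquad
      500 \text{ regions} \times 25\,\text{GB} = 12.5\,\text{TB written}
      \]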
  25. 25. 25©MapR Technologies - Confidential LevelDB • Tiered, logarithmic increase – L1: 2 x 1M files – L2: 10 x 1M – L3: 100 x 1M – L4: 1,000 x 1M, etc. • Compaction overhead – avoids I/O storms (I/O done in smaller increments of ~10M) – but significantly more bandwidth than HBase • Read overhead is still high – 10-15 seeks, perhaps more if the lowest level is very large – 40K - 60K read from disk to retrieve a 1K record
  26. 26. 26©MapR Technologies - Confidential BTree Analysis • Read finds data directly; proven to be fastest – interior nodes only hold keys – very large branching factor – values only at leaves – thus caches work well – R = logN seeks, if no caching – a 1K record read will transfer about logN blocks from disk • Writes are slow on inserts – inserted into the correct place right away – otherwise a read will not find it – requires the B-tree to be continuously rebalanced – causes extreme random I/O in the insert path – W = 2.5x + logN seeks if no caching
  27. 27. Let’s look at some Performance Numbers for proof
  28. 28. 29©MapR Technologies - Confidential M7 vs. CDH: 50-50 Mix (Reads)
  29. 29. 30©MapR Technologies - Confidential M7 vs. CDH: 50-50 load (read latency)
  30. 30. 31©MapR Technologies - Confidential M7 vs. CDH: 50-50 Mix (Updates)
  31. 31. 32©MapR Technologies - Confidential M7 vs. CDH: 50-50 mix (update latency)
  32. 32. 33©MapR Technologies - Confidential MapR M7 Accelerates HBase Applications
      HDD cluster — CPU: 2 x Intel Xeon E5645 2.40GHz (12 cores); RAM: 48GB; Disk: 12 x 3TB (7200 RPM); Record size: 1KB; Data size: 2TB; OS: CentOS 6.2 (Final)
      Benchmark               MapR 3.0.1 (M7)   CDH 4.3.0 (HBase)   MapR Increase
      50% read, 50% update    8000              1695                5.5x
      95% read, 5% update     3716              602                 6x
      Reads                   5520              764                 7.2x
      Scans (50 rows)         1080              156                 6.9x

      SSD cluster — CPU: 2 x Intel Xeon E5620 2.40GHz (8 cores); RAM: 24GB; Disk: 1 x 1.2TB Fusion-io ioDrive2; Record size: 1KB; Data size: 600GB; OS: CentOS 6.3 (Final)
      Benchmark               MapR 3.0.1 (M7)   CDH 4.3.0 (HBase)   MapR Increase
      50% read, 50% update    21328             2547                8.4x
      95% read, 5% update     13455             2660                5x
      Reads                   18206             1605                11.3x
      Scans (50 rows)         1298              116                 11.2x

      MapR speedup with HDDs: 5x-7x; MapR speedup with SSD: 5x-11.3x
  33. 33. 34©MapR Technologies - Confidential M7: Fileservers Serve Regions • A region lives entirely inside a container – does not coordinate through ZooKeeper • Containers support distributed transactions – with replication built in • The only coordination in the system is for splits – between the region-map and the data-container – already solved this problem for files and their chunks
  34. 34. 35©MapR Technologies - Confidential Server Reboot • Full container-reports are tiny – CLDB needs 2GB DRAM for a 1000-node cluster • Volumes come online very fast – each volume independent of others – as soon as the min-repl # of containers is ready
  35. 35. 36©MapR Technologies - Confidential Server Reboot • Full container-reports are tiny – CLDB needs 2GB DRAM for a 1000-node cluster • Volumes come online very fast – each volume independent of others – as soon as the min-repl # of containers is ready – does not wait for the whole cluster (e.g., HDFS waits for 99.9% of blocks reporting)
  36. 36. 37©MapR Technologies - Confidential Server Reboot • Full container-reports are tiny – CLDB needs 2GB DRAM for a 1000-node cluster • Volumes come online very fast – each volume independent of others – as soon as the min-repl # of containers is ready – does not wait for the whole cluster (e.g., HDFS waits for 99.9% of blocks reporting) • 1000-node cluster restart < 5 mins
  37. 37. 38©MapR Technologies - Confidential M7 provides Instant Recovery • 0-40 microWALs per region – idle WALs go to zero quickly, so most are empty – region is up before all microWALs are recovered – recovers regions in the background, in parallel – when a key is accessed, that microWAL is recovered inline – 1000-10000x faster recovery
  38. 38. 39©MapR Technologies - Confidential M7 provides Instant Recovery • 0-40 microWALs per region – idle WALs go to zero quickly, so most are empty – region is up before all microWALs are recovered – recovers regions in the background, in parallel – when a key is accessed, that microWAL is recovered inline – 1000-10000x faster recovery • Why doesn't HBase do this? – M7 leverages unique MapR-FS capabilities; it is not impacted by HDFS limitations – no limit to the # of files on disk – no limit to the # of open files – the I/O path translates random writes to sequential writes on disk
  39. 39. 40©MapR Technologies - Confidential MapR M7 Accelerates HBase Applications
      HDD cluster — CPU: 2 x Intel Xeon E5645 2.40GHz (12 cores); RAM: 48GB; Disk: 12 x 3TB (7200 RPM); Record size: 1KB; Data size: 2TB; OS: CentOS 6.2 (Final)
      Benchmark               MapR 3.0.1 (M7)   CDH 4.3.0 (HBase)   MapR Increase
      50% read, 50% update    8000              1695                5.5x
      95% read, 5% update     3716              602                 6x
      Reads                   5520              764                 7.2x
      Scans (50 rows)         1080              156                 6.9x

      SSD cluster — CPU: 2 x Intel Xeon E5620 2.40GHz (8 cores); RAM: 24GB; Disk: 1 x 1.2TB Fusion-io ioDrive2; Record size: 1KB; Data size: 600GB; OS: CentOS 6.3 (Final)
      Benchmark               MapR 3.0.1 (M7)   CDH 4.3.0 (HBase)   MapR Increase
      50% read, 50% update    21328             2547                8.4x
      95% read, 5% update     13455             2660                5x
      Reads                   18206             1605                11.3x
      Scans (50 rows)         1298              116                 11.2x

      MapR speedup with HDDs: 5x-7x; MapR speedup with SSD: 5x-11.3x
