Hadoop/HBase POC v1 Review

 A framework for Hadoop/HBase POC
POC
• Proof of Concept, usually in competition with
  another product.
• Given use case:
  – Performance: critical path (speed); most
    benchmarks measure read performance, shard for write
    performance
  – Cost: H/W + administrative cost
  – Look at HBase+Hadoop vs. MongoDB
HBase
• Transactional store; 70k messages/sec at
  1.5 kB/message. >1 Gb Ethernet speeds.
• What is HBase?; sources
Cloudera HBase Training Materials
• Exercises:
  http://async.pbworks.com/w/file/55596671/HBase_exercise_instructions.pdf
• Training Slides:
  http://async.pbworks.com/w/file/54915308/Cloudera_hbase_training.pdf
• Training VM: 2 GB, hosted somewhere else.
System Design on Working Components
HDFS vs. HBase
• Replication and a distributed FS. Think NFS, not just replicas.
  Metadata lives at a central NameNode, a single point of failure;
  secondary NN as hot backup. Failure and recovery protocol
  testing is not part of the POC.
• Blocks: larger is better. Blocks are replicated, not cells.
• HDFS is write-once; it was modified to support appending to a file for HBase.
• MapR is HDFS compatible:
   –   fast adoption w/HBase; snapshots
   –   cross-data-center mirroring, consistent mirroring
   –   star replication vs. chain replication
   –   FileServer vs. TaskTracker, Warden vs. NN; no single point of failure
RegionServer + DataNode on the same machine
HBase Memory (book)
HBase Disks (book)
• No RAID on slaves; RAID on the master is OK. Use IOPS.
HBase Networking (book)
Transactional Write Perf.
• Factor network, multiple clients, and any disk
  seeks out of the test program.
• Create test packets in memory only.
• Write perf is a function of instance
  memory and packet size.
HBase Write Path
Run on Amazon AWS first
• INSTANCES:
  – SMALL INSTANCE: 1.7GB
  – LARGE INSTANCE: 7.5GB
  – HIMEM XLARGE: 17GB, 34GB, 68GB
  – SSD DRIVES!!
Write performance: 300k messages of 1500-byte synthetic data
[Chart: packets/sec (0-3500) vs. AWS instance memory: 1.7, 7.5, 17, 34, 68 GB]
Dell Notes:
•   MapR says 16 GB / Cloudera says 24 GB;
•   plot heap size instead.
•   Dell: is this slowing down performance?
•   Take out a DIMM?
•   Reproduce results first?
HBase write perf, 1M byte/s
• http://www.slideshare.net/jdcryans/performance-bof12,
  100k-40k/second with 10-byte packets
Write test code
• No network, no disk accesses. Run on the local
  node (minimal sketch below).
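For reference, a minimal sketch of what such a local write test might look like against the HBase Java client. The table name perftest, column family CF, and column payload are placeholder names, not from the POC; the POC's actual test program is not shown in the deck.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;
  import java.util.Random;

  public class WritePerfTest {
      public static void main(String[] args) throws Exception {
          int packetSize = 1500;     // bytes per synthetic packet
          int numPackets = 300_000;  // rows to write

          // Synthetic packet built once, in memory -- no disk reads in the test path.
          byte[] payload = new byte[packetSize];
          new Random(42).nextBytes(payload);

          Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
          try (Connection conn = ConnectionFactory.createConnection(conf);
               Table table = conn.getTable(TableName.valueOf("perftest"))) {
              byte[] cf = Bytes.toBytes("CF");
              byte[] col = Bytes.toBytes("payload");

              long start = System.currentTimeMillis();
              for (int i = 0; i < numPackets; i++) {
                  Put put = new Put(Bytes.toBytes(String.format("row%09d", i)));
                  put.addColumn(cf, col, payload);
                  table.put(put);                // one RPC per put in this naive loop
              }
              long elapsed = System.currentTimeMillis() - start;
              System.out.printf("%d puts in %d ms = %.0f p/s%n",
                      numPackets, elapsed, numPackets * 1000.0 / elapsed);
          }
      }
  }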
HBase AWS Packet Size, 16-1500 bytes
• http://async.pbworks.com/w/file/55320973/AWSHBasePerf16_1500bytepacket.xlsx
HBase Write Perf, 1500-byte packets
• Single thread, single node. Should be >>
  with more threads or an async client.
• 16 byte: 11235 p/s
• 40 byte: 8064 p/s
• 80 byte: 5263 p/s
• 1500 byte: 3311 p/s
• 8 GB heap, big regions (optimizations in file
  names), etc. 12-20 optimizations tried; 4 make a
  difference (region-size sketch below).
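The "big regions" tuning can be applied per table by raising the region split threshold. A sketch with the HBase 2.x client API (the client of the POC's era would use HTableDescriptor.setMaxFileSize instead); table and family names are placeholders.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.TableDescriptor;
  import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

  public class CreateBigRegionTable {
      public static void main(String[] args) throws Exception {
          try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
               Admin admin = conn.getAdmin()) {
              // Per-table override of hbase.hregion.max.filesize: writes stay in
              // fewer, larger regions, so fewer splits happen during the benchmark.
              TableDescriptor td = TableDescriptorBuilder
                      .newBuilder(TableName.valueOf("perftest"))
                      .setColumnFamily(ColumnFamilyDescriptorBuilder.of("CF"))
                      .setMaxFileSize(10L * 1024 * 1024 * 1024)   // 10 GB regions
                      .build();
              admin.createTable(td);
          }
      }
  }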
AWS Reduce #RPC
• Batch mode: 1000 inserts = 1000 RPCs; reduced to
  1 RPC with batching: 3610 p/s (5.4 MB/s, passes
  error check, m2.2xlarge instance); see the
  batched-put sketch below. Note: mongo
[Chart: batched write rate over the run (Series1)]
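A minimal sketch of the batching idea with the Table API (the deck does not show its batch code; table and column names are placeholders): buffering Puts client-side collapses many per-row RPCs into one client call, which the client then groups by region server.

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BatchedInsert {
      // Instead of one RPC per insert, collect the Puts and send them together.
      static void insertBatch(Table table, byte[] payload, int batchSize) throws Exception {
          List<Put> batch = new ArrayList<>(batchSize);
          for (int i = 0; i < batchSize; i++) {
              Put put = new Put(Bytes.toBytes(String.format("row%09d", i)));
              put.addColumn(Bytes.toBytes("CF"), Bytes.toBytes("payload"), payload);
              batch.add(put);
          }
          table.put(batch);   // single client call; Puts are grouped per region server
      }
  }

A BufferedMutator (or the write buffer on older HTable clients) achieves the same effect transparently.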
Dell H/W perf. (default config) worse: 2262 p/s vs. 3311 p/s (AWS)
http://async.pbworks.com/w/file/55225682/graphdell1500bytepacket8gb.txt
[Chart: Dell, WAL off: 2262 -> 2688 p/s (+18.5%)]
[Chart: Dell, WAL disabled, big heap, big regions (needs more time): 2262 -> >3557 p/s, a 57% increase]
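The WAL-off runs presumably skip the write-ahead log per mutation. A sketch with the current client API (older clients used put.setWriteToWAL(false)); table and column names are placeholders.

  import org.apache.hadoop.hbase.client.Durability;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class NoWalWrite {
      // Write one row without the write-ahead log. Faster, but the edit is lost if
      // the region server dies before its MemStore is flushed -- acceptable for a
      // benchmark, risky for production data.
      static void putWithoutWal(Table table, byte[] rowKey, byte[] payload) throws Exception {
          Put put = new Put(rowKey);
          put.addColumn(Bytes.toBytes("CF"), Bytes.toBytes("payload"), payload);
          put.setDurability(Durability.SKIP_WAL);
          table.put(put);
      }
  }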
AWS SSD (3267 p/s) vs. EBS (4042 p/s), no compaction. Red = m2large. Maybe AWS is using SSDs?
[Chart: write rate over the run, Series1 vs. Series2]
AWS (3500-4k packets/sec) vs. DELL
• AWS: 3-4k p/s with the default configuration, no optimization.
• Dell (3557 p/s) slower than AWS (3610 p/s on an optimized
  m2.2xlarge, 4240 p/s on m2large).
• Faster h/w instances in AWS make a difference.
  Lesson (4210 p/s): controlling regions and
  compactions has an impact on performance; fast IO.
  Spend time on this later.
• User error w/the Dell h/w somewhere. Can't be that slow!
• Could run a benchmark on m2.2xlarge over a 24h period
  to see variability in perf. Not worth the time investment.
Dell Tuning
• Ext3 vs. ext4: 5% diff in benchmarks; no diff in p/s
  performance.
• RAID levels? JBOD not available.
• Maybe the m2.2xlarge high-perf AWS drives
  are SSDs? Seems odd given the pricing structure.
• noatime, 256k block sizes.
• Goal: 4k p/s?
Bulk Load (worth the time investment?)
• Quick error check.
• Take an existing table, export it, bulk load it.
  Command line; very rough.
• Should redo with a Java program. WAL off is an
  approximation.
Write Clients for NoSQL
• HBase, Mongo, and Cassandra have threads
  behind them; you need a threaded or async client
  to get full performance (sketch after this list).
• Needs more time; higher priority than dist
  mode, and needed in dist mode.
• Lock timeout behavior; insert 1 row.
• Need a threaded or async client. Most get the
  threaded design wrong?
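A sketch of what a threaded write client might look like: one Table handle per worker over a shared Connection. The thread count, row counts, and all names are placeholders, not the POC's actual tool.

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ThreadedWriter {
      public static void main(String[] args) throws Exception {
          int threads = 8;
          int rowsPerThread = 100_000;
          byte[] payload = new byte[1500];

          try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
              ExecutorService pool = Executors.newFixedThreadPool(threads);
              for (int t = 0; t < threads; t++) {
                  final int id = t;
                  pool.submit(() -> {
                      // Each worker gets its own lightweight Table handle; the shared
                      // Connection multiplexes the underlying RPCs.
                      try (Table table = conn.getTable(TableName.valueOf("perftest"))) {
                          for (int i = 0; i < rowsPerThread; i++) {
                              Put put = new Put(Bytes.toBytes(String.format("t%02d-%09d", id, i)));
                              put.addColumn(Bytes.toBytes("CF"), Bytes.toBytes("payload"), payload);
                              table.put(put);
                          }
                      } catch (Exception e) {
                          e.printStackTrace();
                      }
                  });
              }
              pool.shutdown();
              pool.awaitTermination(1, TimeUnit.HOURS);
          }
      }
  }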
Write Load Tool (multiple clients)
• 300k rows, single thread, single client: 14430
  ms, 2079 p/s; about right.
• 300k rows, 3 threads: 22804 ms
• M/R, 30 mappers: 24289 ms
• M/R is better when you need to do combining or
  processing of the input data. The M/R vs. threads
  comparison is about right. Threads should
  increase performance... OK, writing my own.
Application Level Perf
• Not transactional.
• Simulate a reporting store: writes concurrent
  with web-page reads.
• Compare with SQL Server and MongoDB, which have
  column indexes.
• You may not need column indexes if the schema is
  designed correctly. ESN is not the key; will need
  consecutive keys to split into balanced regions.
Web GUI
• Demo: web page & writes into the DB. Test MS SQL
  Server packets/sec using the same setup.
• Do a LIKE '%asdf%' with no matching data to see if there
  is a timeout.
Read Performance
• Index search through the web page with the writer running is fast: 50-100 ms,
  <10-20 ms if in cache.
• Don't do full table scans, e.g. count 'tablename' in the hbase shell
    – equivalent to COUNT(*) FROM table
• Pig/Hive are faster on top of HBase because they store metadata.
• Full table scan:
    – 10 rows: 18 ms
    – 100 rows: 11-166 ms
    – 1000 rows: 638 ms
    – 10k rows: 4.3 s
    – 100k rows: 38 s (not printed)
• Use filters for search: exact match, regex, substring, more.
Read Path/SCAN/Filters
SingleColumnValueFilter
• Search for a specific
  value: constant, regex, prefix. Did not try the
  others.
• Same queries as before, searching for specific
  values, testing 100k-1M rows.
• Without filters, use an iterator to hold the result set and
  iterate through each result, testing each result
  value (like DB drivers). A filter reduces the result-set
  size from all rows to only the rows that meet the
  condition.
Column Value
• Filter filter = new SingleColumnValueFilter(Bytes.toBytes("CF"),
  Bytes.toBytes("Key5"), CompareOp.EQUAL, Bytes.toBytes("bob"));
• Filter f = new SingleColumnValueFilter(Bytes.toBytes("CF"),
  Bytes.toBytes("COLUMN"), CompareOp.EQUAL,
  new RegexStringComparator("z*"));
• 565 ms for 200k rows with a 115-row result set returned
  (printed); small result sets are faster.
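Putting the pieces together, a sketch of a full scan with the equality filter above; the filter runs server-side, so only matching rows come back over the wire. Table and column names are the same placeholder names used above (CompareOp is the era-appropriate enum; newer clients use CompareOperator).

  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FilteredScan {
      // Return only the rows whose CF:Key5 column equals "bob".
      static void scanForBob(Table table) throws Exception {
          SingleColumnValueFilter filter = new SingleColumnValueFilter(
                  Bytes.toBytes("CF"), Bytes.toBytes("Key5"),
                  CompareOp.EQUAL, Bytes.toBytes("bob"));
          filter.setFilterIfMissing(true);   // skip rows that lack the column entirely

          Scan scan = new Scan();
          scan.setFilter(filter);
          try (ResultScanner scanner = table.getScanner(scan)) {
              for (Result r : scanner) {
                  System.out.println(Bytes.toString(r.getRow()));
              }
          }
      }
  }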
Column Value Searches
• 100k-row table
  – Returning 0.1% of results (10): 5 s
  – Returning 1% of results (100): 11.29 s
• 1M-row table
  – 1% of results (10k): 212 s
  – 0.1% of results (1k): 204.057 s
Compose row key w/values, or index tables
• Add a second table where the row keys are
  composed partially of the values (sketch below).
• Secondary-table consistency: don't need it for a
  reporting system? Consistent on inserts or bulk
  import.
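A sketch of the index-table idea: every insert also writes a row to a second table whose key starts with the indexed value, so value lookups become prefix scans instead of filtered full scans. The two puts are not atomic, which is the consistency question raised above. All names are placeholders.

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class IndexedWrite {
      static void putWithIndex(Table mainTable, Table indexTable,
                               byte[] rowKey, String key5Value, byte[] payload) throws Exception {
          // Main-table row, keyed as before.
          Put main = new Put(rowKey);
          main.addColumn(Bytes.toBytes("CF"), Bytes.toBytes("Key5"), Bytes.toBytes(key5Value));
          main.addColumn(Bytes.toBytes("CF"), Bytes.toBytes("payload"), payload);
          mainTable.put(main);

          // Index row key = value + delimiter + original row key; the cell points back.
          byte[] indexKey = Bytes.toBytes(key5Value + "|" + Bytes.toString(rowKey));
          Put index = new Put(indexKey);
          index.addColumn(Bytes.toBytes("CF"), Bytes.toBytes("ref"), rowKey);
          indexTable.put(index);
      }
  }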
Build Environment
• Ready for CI (Jenkins).
• Ubuntu-specific process for changing
  code: make all, make deb, make apt, then
  install using apt-get install hadoop* hbase*.
• Need to start over with yum for CentOS.
• Demo.
• Also ready for the command line w/o GUI:
  hbase org.apache.hadoop.hbase.PerfEval xx xx
Distributed mode
• Set up the build environment.
• Distributed-mode setup. ZooKeeper error
  message:
• Disable IPv6? Debugging.
Docs:
• Bigtop / updated version of CDH
• Installation:
• Build docs: Ubuntu/deb; big change to RPMs;
  takes time to document and debug. Can do
  both; takes time.
• Distributed mode:
• NXServer/NXClient:
• Screen:
