Director Software Engineering at Rocket Internet SE
Feb. 27, 2012
HBase: Introduction to column-oriented databases
Big Data draws more attention every day, and new storage paradigms follow. This presentation gives a quick introduction to HBase, a column-oriented database used by Facebook and other large companies to store high volumes of data and extract knowledge from them.
3. intro
“A Bigtable [HBase] is a sparse,
distributed, persistent
multidimensional sorted map”
http://research.google.com/archive/bigtable.html
Friday, February 24, 2012
4. intro > data model
{ <-- table
  // ...
  "aaaaa" : { <-- row
    "A" : { <-- column family
      "foo" : { <-- column (qualifier)
        15: "y", <-- timestamp, value
        4: "m"
      },
      "bar" : {...}
    },
    "B" : {
      "" : {...}
    }
  },
  "aaaab" : {
    "A" : {
      "foo" : {...},
      "bar" : {...},
      "joe" : {...}
    },
    "B" : {
      "" : {...}
    }
  },
  // ...
}
5. intro > data model
(Table, RowKey, Family, Column, Timestamp) → Value
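The mapping above can be pictured as nested dictionaries. The following is a minimal Python sketch of a single table, assuming an in-memory toy model; `get_cell` is a hypothetical helper written for illustration and is not part of any HBase client API. It returns the newest version of a cell unless a timestamp is given.

```python
from typing import Dict, Optional

# Toy in-memory model of one table (not the HBase client API):
# row key -> column family -> qualifier -> timestamp -> value
Table = Dict[str, Dict[str, Dict[str, Dict[int, str]]]]

table: Table = {
    "aaaaa": {
        "A": {"foo": {15: "y", 4: "m"}},
    },
}


def get_cell(table: Table, row: str, family: str, qualifier: str,
             timestamp: Optional[int] = None) -> Optional[str]:
    """Return the value stored at `timestamp`, or the newest version if omitted."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None  # sparse map: absent cells simply do not exist
    if timestamp is None:
        timestamp = max(versions)  # newest version wins by default
    return versions.get(timestamp)


latest = get_cell(table, "aaaaa", "A", "foo")
older = get_cell(table, "aaaaa", "A", "foo", timestamp=4)
missing = get_cell(table, "aaaaa", "B", "")
```

Here `latest` is the value at timestamp 15, `older` is the value at timestamp 4, and `missing` is `None` because the sketch stores no cell for that coordinate.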
6. intro > hadoop stack
• hadoop HDFS (or not)
• hadoop MapReduce
• hadoop ZooKeeper
• hadoop HBase
• hadoop Hue, Whirr, etc...
8. key design > read/write model
• random reads (get)
• sequential reads (scan)
• partial key scans
• writes (put = update)
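The four access patterns listed above all follow from rows being kept sorted by key. The sketch below imitates them in plain Python with the standard `bisect` module. `SortedTable` and its method names are hypothetical and only mirror the slide's vocabulary; the real client API differs.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys, always sorted
        self._rows = {}  # row key -> value

    def put(self, key, value):
        # A put inserts a new row or overwrites an existing one (put = update).
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        # Random read: direct lookup by the full row key.
        return self._rows.get(key)

    def scan(self, start="", stop=None):
        # Sequential read over the key range [start, stop), in key order.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (stop is None or self._keys[i] < stop):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1

    def prefix_scan(self, prefix):
        # Partial key scan: every row whose key starts with `prefix`.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


t = SortedTable()
t.put("user1-msg001", "hi")
t.put("user1-msg002", "yo")
t.put("user2-msg001", "hey")
t.put("user1-msg001", "hello")  # overwrites the first row

one_row = t.get("user1-msg001")
user1_rows = list(t.prefix_scan("user1-"))
key_range = list(t.scan("user1-msg002", "user2-msg002"))
```

Because the keys are sorted, a prefix scan only has to find the first matching key and read forward until the prefix stops matching, which is why the composition of the row key matters so much in the slides that follow.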
9. key design > storage model
http://ofps.oreilly.com/titles/9781449396107/advanced.html
10. key design > strategies
• tall-narrow vs flat-wide
• partial key scans
• pagination
• time series
• salting
• field swap
• randomization
• secondary indexes
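Several of the strategies above come down to how the row key string is built. The Python sketch below shows one possible reading of salting, randomization, field swap and a newest-first time series key. The function names, the bucket count and the key layouts are assumptions chosen for illustration, not a prescribed HBase scheme.

```python
import hashlib
import zlib

NUM_BUCKETS = 4      # hypothetical number of salt buckets
MAX_TS = 2**63 - 1   # largest signed 64-bit value, used to reverse timestamps


def salted_key(key, buckets=NUM_BUCKETS):
    # Salting: prefix the key with a deterministic bucket number so that
    # sequential keys are spread over several key ranges.
    bucket = zlib.crc32(key.encode("utf-8")) % buckets
    return f"{bucket}-{key}"


def randomized_key(key):
    # Randomization: replace the key with its hash. Writes spread evenly,
    # but the original sort order (and range scans over it) is lost.
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def swapped_key(timestamp, metric):
    # Field swap: put another field ahead of the timestamp so that rows
    # written at the same moment do not all land next to each other.
    return f"{metric}-{timestamp:013d}"


def newest_first_key(entity, timestamp):
    # Time series: store MAX_TS - timestamp so the newest row sorts first.
    return f"{entity}-{MAX_TS - timestamp:019d}"


example_salted = salted_key("20120224-event")
example_swapped = swapped_key(1330000000000, "cpu")
newer = newest_first_key("u1", 200)
older = newest_first_key("u1", 100)
```

With salting, a reader has to query every bucket and merge the results, so the write-side gain is paid for on reads. The zero padding in the last two functions keeps string order equal to numeric order.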
11. key design > example
12. development
• installation modes
  • standalone, pseudo-distributed, distributed
• JRuby console
• access
  • Java/JRuby API (more features)
  • entry points: REST, Thrift, Avro, Protocol Buffers
  • there are several other libraries
13. cons
• complex config and maintenance
• hot regions
• no secondary index built-in
• no transactions built-in
• complex schema design
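To make the missing secondary index concrete: a common workaround is for the application to maintain a second table keyed by the indexed value. The sketch below models both tables as Python dictionaries. The table names, the `email|row key` layout and both helper functions are hypothetical.

```python
users = {}           # main table: row key -> {column: value}
users_by_email = {}  # index table: "email|row key" -> row key


def put_user(row_key, email):
    users[row_key] = {"email": email}
    # Second, separate write. Nothing ties it to the first one, so a failure
    # in between would leave the index out of date (no built-in transactions).
    users_by_email[f"{email}|{row_key}"] = row_key


def find_by_email(email):
    # Lookup by the indexed field is a prefix scan over the index table.
    prefix = f"{email}|"
    return [rk for key, rk in sorted(users_by_email.items())
            if key.startswith(prefix)]


put_user("u1", "a@x.org")
put_user("u2", "b@x.org")
put_user("u3", "a@x.org")
matches = find_by_email("a@x.org")
```

The two writes in `put_user` are where the "no secondary index" and "no transactions" bullets meet: keeping the index consistent is left to the application.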
14. pros
• distributed
• scalable (auto-sharding)
• built on Hadoop stack
• handles Big Data
• high performance for write and read
• no SPOF
• fault tolerant, no data loss
• active community
15. Abril ID login box redesign
http://engineering.abril.com.br/
http://abr.io/hbase-intro
https://pinboard.in/u:lfcipriani/t:hbase/
http://hbase.apache.org/
http://shop.oreilly.com/product/0636920014348.do