Big Data is getting more attention each day, and new storage paradigms are following. This presentation gives a quick introduction to HBase, a column-oriented database used by Facebook and other large players to store high volumes of data and extract knowledge from them.
Director Software Engineering at Rocket Internet SE
3.
intro
“A Bigtable [here: HBase] is a sparse, distributed, persistent multidimensional sorted map”
http://research.google.com/archive/bigtable.html
Friday, February 24, 2012
4.
intro > data model
{                          <-- table
  // ...
  "aaaaa" : {              <-- row
    "A" : {                <-- column family
      "foo" : {            <-- column (qualifier)
        15: "y",           <-- timestamp, value
        4: "m"
      },
      "bar" : {...}
    },
    "B" : {
      "" : {...}
    }
  },
  "aaaab" : {
    "A" : {
      "foo" : {...},
      "bar" : {...},
      "joe" : {...}
    },
    "B" : {
      "" : {...}
    }
  },
  // ...
}
5.
intro > data model
(Table, RowKey, Family, Column, Timestamp) → Value
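The coordinate mapping above can be sketched as a nested dictionary lookup. This is a toy model for illustration only, not the HBase client API: the `table` contents and the `read_cell` helper are hypothetical names. It assumes that a read without a timestamp returns the newest version of the cell.

```python
from typing import Dict, Optional

# Hypothetical toy model: row key -> family -> qualifier -> {timestamp: value}
Table = Dict[str, Dict[str, Dict[str, Dict[int, str]]]]

table: Table = {
    "aaaaa": {
        "A": {"foo": {15: "y", 4: "m"}},
    },
}


def read_cell(
    table: Table,
    row: str,
    family: str,
    qualifier: str,
    timestamp: Optional[int] = None,
) -> Optional[str]:
    """Resolve (row, family, qualifier, timestamp) to a value, or None."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    if timestamp is None:
        # Assumption: no timestamp means "newest version".
        return versions[max(versions)]
    # With a timestamp, look up exactly that version.
    return versions.get(timestamp)
```

Here `read_cell(table, "aaaaa", "A", "foo")` resolves to the version stored at timestamp 15, while passing `timestamp=4` selects the older version.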
6.
intro > hadoop stack
• hadoop HDFS (or not)
• hadoop MapReduce
• hadoop ZooKeeper
• hadoop HBase
• hadoop Hue, Whirr, etc...
7.
architecture
8.
key design > read/write model
• random reads (get)
• sequential reads (scan)
• partial key scans
• writes (put = update)
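The four access patterns above all follow from rows being kept sorted by row key. A minimal in-memory sketch, assuming string row keys and ignoring column families and versions; `ToySortedTable` and its methods are hypothetical names, not the HBase API:

```python
import bisect


class ToySortedTable:
    """Hypothetical in-memory stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> value

    def put(self, key, value):
        # A put on an existing key overwrites it, so put doubles as update
        # (cell versions are ignored in this toy).
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        # Random read: direct lookup of a single row key.
        return self._rows.get(key)

    def scan(self, start="", stop=None):
        # Sequential read over the half-open key range [start, stop).
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (stop is None or self._keys[i] < stop):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1

    def prefix_scan(self, prefix):
        # Partial key scan: every row whose key starts with the prefix.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1
```

Because keys sharing a prefix sit next to each other, a partial key scan is just a range scan that starts at the prefix and stops at the first key that no longer matches it.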
9.
key design > storage model
http://ofps.oreilly.com/titles/9781449396107/advanced.html
10.
key design > strategies
• tall-narrow vs flat-wide
• partial key scans
• pagination
• time series
• salting
• field swap
• randomization
• secondary indexes
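Several of the strategies above come down to how the row key string is built. The sketch below illustrates salting, randomization, field swap and a reversed timestamp for time series. The function names, the `NUM_BUCKETS` value, the `-` separator and the zero-padded key layout are all assumptions made for illustration; the slides do not prescribe them.

```python
import hashlib

NUM_BUCKETS = 4      # assumed number of salt buckets
MAX_TS = 2**63 - 1   # assumed upper bound for timestamps (19 decimal digits)


def salted_key(key: str, buckets: int = NUM_BUCKETS) -> str:
    """Salting: prefix a deterministic bucket so sequential keys spread out."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{key}"


def randomized_key(key: str) -> str:
    """Randomization: replace the key with its hash (scan order is lost)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def swapped_key(user_id: str, timestamp: int) -> str:
    """Field swap: put the more selective field in front of the timestamp."""
    return f"{user_id}-{timestamp:019d}"


def reverse_timestamp_key(user_id: str, timestamp: int) -> str:
    """Time series: store MAX_TS - timestamp so newer rows sort first."""
    return f"{user_id}-{MAX_TS - timestamp:019d}"
```

The trade-off is visible in the key shapes: salted and swapped keys still allow prefix scans within a bucket or a user, while a fully randomized key spreads writes evenly but only supports point lookups.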
11.
key design > example
12.
development
• installation modes
  • standalone, pseudo-distributed, distributed
• JRuby console (HBase shell)
• access
  • Java/JRuby API (more features)
  • entry points: REST, Thrift, Avro, Protocol Buffers
  • there are several other client libraries
13.
cons
• complex config and maintenance
• hot regions
• no secondary index built-in
• no transactions built-in
• complex schema design
14.
pros
• distributed
• scalable (auto-sharding)
• built on Hadoop stack
• handles Big Data
• high performance for write and read
• no SPOF
• fault tolerant, no data loss
• active community
15.
http://engineering.abril.com.br/
http://abr.io/hbase-intro
https://pinboard.in/u:lfcipriani/t:hbase/
http://hbase.apache.org/
http://shop.oreilly.com/product/0636920014348.do