HBase
Tame your BigData

Andrzej Grzesik
Lunar Logic Polska
me: present
    past
Questions?	
Ask them right away!
So
HBase

open-source
high-performance
BigTable
fast
distributed
NoSQL datastore
scalable
built upon Hadoop
fault tolerant

Cool and fun to work with!
Who uses HBase?
Beware!	
 Lots of text
Hadoop stack

"By my count — and it's very possible I'm missing someone —
Hadoop-based startups have raised $104.5 million since May.
The same set of companies has raised $159.7 million since 2009,
when Cloudera closed its first round.

By comparison, the handful of popular NoSQL database vendors,
often lumped into the big data category as well, and similar to
Hadoop in their focus on unstructured data, have announced just
more than $90 million in funding overall."

via http://gigaom.com/cloud/with-40m-for-cloudera-how-much-is-hadoop-worth/
Some theory
architecture

[diagram: HBase runs on top of ZooKeeper and the Hadoop layer (HDFS and MapReduce), spread across many server nodes]
Related projects:
•  Chukwa
   o  Log analysis tool
•  Hive
   o  Or, if Hive is slow:
•  Pig
   o  High-level data manipulation language
   o  Don't write all MapReduce jobs by hand!
Brewer's CAP theorem

[diagram: the CAP triangle (Consistency, Availability, Partition tolerance): pick 2.
 RDBMS sits on the Consistency + Availability edge,
 HBase on the Consistency + Partition tolerance edge,
 CouchDB on the Availability + Partition tolerance edge]
Data organisation

[diagram: rows are sorted by rowkey and split into regions; Region 1 holds Rowkey 1 ... Rowkey n, Region 2 starts at Rowkey n+1]
Data organisation

[diagram: a region contains several column families; each column family groups columns, e.g. col1, col2, col3 in one family and col1, col2 in another]
Data organisation

[diagram: within a region, a cell is addressed by column key and timestamp; column1 holds versions v1@t1, v1@t2, v1@t3, column2 holds v1@t1 and v1@t2, column3 holds v1@t1]
Let's see some code?
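The live code from the talk is not preserved in the deck, so here is a minimal sketch of the addressing model from the previous slides: a value lives at rowkey + column family + column qualifier (+ timestamp). It uses the classic HTable-based client API (newer clients use ConnectionFactory and Table instead); the "users" table and "cf" family are assumptions for illustration, not anything from the talk.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseHello {
        public static void main(String[] args) throws Exception {
            // Picks up hbase-site.xml from the classpath (local or remote cluster).
            Configuration conf = HBaseConfiguration.create();
            // A "users" table with a column family "cf" is assumed to exist.
            HTable table = new HTable(conf, "users");

            // Write: the value is addressed by rowkey + family + qualifier (+ timestamp).
            Put put = new Put(Bytes.toBytes("user-42"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Andrzej"));
            table.put(put);

            // Read the same cell back; by default the newest version is returned.
            Get get = new Get(Bytes.toBytes("user-42"));
            get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));

            table.close();
        }
    }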
Integration testing?

Start a cluster locally
        or
Use a remote one

(see the mini-cluster sketch below)
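For the local option, HBase's test jar ships HBaseTestingUtility, which runs ZooKeeper, HDFS and HBase inside one JVM. A minimal sketch, assuming the older byte[]-based createTable signature; the table and family names are invented. Using a remote cluster instead is mostly a matter of pointing hbase.zookeeper.quorum in the client configuration at it.

    import org.apache.hadoop.hbase.HBaseTestingUtility;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MiniClusterSketch {
        public static void main(String[] args) throws Exception {
            // Starts ZooKeeper, HDFS and HBase in-process, just for the test run.
            HBaseTestingUtility util = new HBaseTestingUtility();
            util.startMiniCluster();
            try {
                HTable table = util.createTable(Bytes.toBytes("test"), Bytes.toBytes("cf"));
                Put put = new Put(Bytes.toBytes("row1"));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
                table.put(put);
                // ... assertions against the table go here ...
                table.close();
            } finally {
                util.shutdownMiniCluster();
            }
        }
    }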
How to start hacking?
Grab Hadoop
     http://hadoop.apache.org/

and HBase
     http://hbase.apache.org/

Spend an eon learning more than you wanted about plumbing
How to start hacking?
Better (faster) way:

Grab a VM/packages from [vendor shown on slide]
Pro tip
Don't run HBase on Windows or face problems

It's doable
(http://hbase.apache.org/docs/r0.20.6/cygwin.html)
but VMs are faster!
How to start hacking?
Situation will improve, since
modes	
Develop with
•  local mode
   o  single instance, single JVM

Then
•  Pseudo-distributed
   o  multiple instances, single machine

For production
•  Distributed mode
   o  many nodes
One more
Befriend some admins; you will need them
Use cases?
Example from X
•  Customer-provided user data
•  Schema varies between customers
   o  schema definitions kept in an RDBMS
•  Data itself in HBase
   (see the sketch below)
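A sketch of how the pattern above might look, with invented table/family names ("user_data", "attrs"): since HBase creates column qualifiers on write, each customer's attributes simply become dynamic columns in one family, while the per-customer schema description stays in the RDBMS.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VaryingSchemaSketch {
        private static final byte[] ATTRS = Bytes.toBytes("attrs");

        // Each customer sends a different set of attributes; every attribute
        // becomes a column qualifier in the "attrs" family of one row.
        public static void storeUser(HTable table, String userId,
                                     Map<String, String> attributes) throws Exception {
            Put put = new Put(Bytes.toBytes(userId));
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                put.add(ATTRS, Bytes.toBytes(e.getKey()), Bytes.toBytes(e.getValue()));
            }
            table.put(put);
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "user_data");
            Map<String, String> attrs = new HashMap<String, String>();
            attrs.put("email", "a@example.com");
            attrs.put("shoe_size", "42");
            storeUser(table, "customerA:user-1", attrs);
            table.close();
        }
    }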
Example from Facebook
HBase drives Facebook Messages

•  Key: UserId
•  Column: Word
•  Version: MessageId
   (see the sketch below)

For more details, see:
http://www.infoq.com/presentations/HBase-at-Facebook
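A hedged sketch of the layout listed above (row = user id, column qualifier = word, cell version = message id), as described in the linked talk; the table and family names ("msg_index", "idx") are placeholders of mine, not Facebook's.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MessageIndexSketch {
        private static final byte[] IDX = Bytes.toBytes("idx");

        // Index one message: every word becomes a column, and the message id is
        // used as the cell timestamp (version), so one row per user holds that
        // user's whole inverted index.
        public static void indexMessage(HTable table, long userId, long messageId,
                                        String[] words) throws Exception {
            Put put = new Put(Bytes.toBytes(userId));
            for (String word : words) {
                put.add(IDX, Bytes.toBytes(word), messageId, Bytes.toBytes(""));
            }
            table.put(put);
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "msg_index");
            indexMessage(table, 42L, 1001L, new String[] {"hello", "world"});
            table.close();
        }
    }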
When to use HBase?
•  Lots of key/value data
•  Need good scalability
•  Need good query times with random access
   (see the scan sketch below)
•  Data analytics
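Random access and range queries both ride on the sorted rowkey; a minimal sketch of a rowkey-range scan, reusing the assumed "users" table and "cf" family from the earlier example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RangeScanSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "users");

            // Scan a contiguous slice of the keyspace: [user-100, user-200).
            Scan scan = new Scan(Bytes.toBytes("user-100"), Bytes.toBytes("user-200"));
            scan.addFamily(Bytes.toBytes("cf"));

            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            } finally {
                scanner.close();
            }
            table.close();
        }
    }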
What is HBase poor at?
•  transactions
•  queries relying on (secondary) indexes
•  security
T(h)ank you!
Useful
Brewer's CAP theorem
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf

Google BigTable
http://labs.google.com/papers/bigtable-osdi06.pdf

DZone Refcardz
http://refcardz.dzone.com/refcardz/getting-started-apache-hadoop
http://refcardz.dzone.com/refcardz/deploying-hadoop
