HBase jdd
© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Presentation Transcript

    • HBase: Tame your BigData. Andrzej Grzesik, LunarLogicPolska
    • Me: present, past
    • Questions? Ask them right away!
    • So
    • HBase: open-source, high-performance, modeled on Google’s BigTable; a fast, distributed, scalable NoSQL datastore built upon Hadoop; fault tolerant. Cool and fun to work with!
    • Who uses HBase?
    • Beware! Lots of text
    • Hadoop stack: “By my count, and it’s very possible I’m missing someone, Hadoop-based startups have raised $104.5 million since May. The same set of companies has raised $159.7 million since 2009 when Cloudera closed its first round. By comparison, the handful of popular NoSQL database vendors, often lumped into the big data category as well, and similar to Hadoop in their focus on unstructured data, have announced just more than $90 million in funding overall.” (via http://gigaom.com/cloud/with-40m-for-cloudera-how-much-is-hadoop-worth/)
    • Some theory
    • Architecture (diagram): HBase, ZooKeeper, MapReduce (m/r) and HDFS as parts of the Hadoop stack, running on many server nodes.
    • Related projects:
        • Chukwa
            o Log analysis tool
        • Hive
            o Or, if Hive is slow:
        • Pig
            o High-level data manipulation language
            o Don’t write all MapReduce jobs by hand!
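    • Brewer’s CAP theorem (diagram): Consistency, Availability, Partition tolerance; pick 2. On the diagram: RDBMS (consistency + availability), HBase (consistency + partition tolerance), CouchDB (availability + partition tolerance).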
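    • Data organisation (diagram): rows are sorted by rowkey and split into regions; Region 1 holds Rowkey 1 … Rowkey n, Region 2 starts at Rowkey n+1.
    • Data organisation (diagram): a region stores its data in column families, each with its own columns (e.g. col1, col2, col3 in one family; col1, col2 in another).
    • Data organisation (diagram): within a region, each column key (column1, column2, column3) holds values versioned by timestamp (v1@t1, v1@t2, v1@t3).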
    • Let’s see some code?
    • Integration testing? Start a cluster locally? Use a remote one?
    • How to start hacking? Grab Hadoop (http://hadoop.apache.org/) and HBase (http://hbase.apache.org/). Spend an eon learning more than you wanted about plumbing.
    • How to start hacking? Better (faster) way: grab a VM/packages from
    • Pro tip: don’t run HBase on Windows, or face problems. It’s doable (http://hbase.apache.org/docs/r0.20.6/cygwin.html), but VMs are faster!
    • How to start hacking? Situation will improve, since
    • Modes
        • Develop with local mode
            o single instance, single JVM
        • Then pseudo-distributed mode
            o multiple instances, single machine
        • For production, distributed mode
            o many nodes
    • One more: befriend some admins, you will need them.
    • Use cases?
    • Example from X
        • Customer-provided user data
        • Schema varying between customers
            o kept in an RDBMS
        • Data in HBase
    • Example from Facebook: HBase drives Facebook Messages
        • Key: UserId
        • Column: Word
        • Version: MessageId
        • See http://www.infoq.com/presentations/HBase-at-Facebook for more details
    • When to use HBase?
        • Lots of key/value data
        • Need good scalability
        • Need good query times with random access
        • Data analytics
    • What is HBase poor at?
        • Transactions
        • Relying on indexes
        • Security
    • T(h)ank you!
    • Useful links
        • Brewer’s CAP theorem: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf
        • Google BigTable: http://labs.google.com/papers/bigtable-osdi06.pdf
        • DZone Refcardz: http://refcardz.dzone.com/refcardz/getting-started-apache-hadoop and http://refcardz.dzone.com/refcardz/deploying-hadoop
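The three "Data organisation" slides describe rows sorted by rowkey and split into regions, column families grouping columns, and cell values versioned by timestamp. As a rough mental model (not HBase code), that layout can be sketched in plain Java with nested sorted maps: rowkey → column family → column → timestamp → value. The class, its methods and the single region split point `"m"` are hypothetical and chosen only for illustration; a real cluster splits and assigns regions itself.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Illustrative in-memory model of the layout on the "Data organisation" slides:
 * rowkey -> column family -> column -> timestamp -> value.
 * Hypothetical names throughout; this is NOT the HBase client API.
 */
public class HBaseLayoutSketch {

    // Region start key -> region name. The single split point "m" is an
    // assumption for illustration: rowkeys sorting below "m" go to Region 1.
    private final NavigableMap<String, String> regions = new TreeMap<>();

    // Rows sorted by rowkey; families and columns sorted by name.
    private final NavigableMap<String,
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    public HBaseLayoutSketch() {
        regions.put("", "Region 1");
        regions.put("m", "Region 2");
    }

    /** A row lives in the region with the greatest start key <= its rowkey. */
    public String regionFor(String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    /** Stores one cell version; versions are kept newest-first. */
    public void put(String rowKey, String family, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest value written at or before asOf, or null if there is none. */
    public String get(String rowKey, String family, String column, long asOf) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(rowKey);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null) return null;
        // Versions are ordered newest-first, so ceilingEntry(asOf) is the
        // first (i.e. newest) version whose timestamp is <= asOf.
        Map.Entry<Long, String> entry = versions.ceilingEntry(asOf);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        HBaseLayoutSketch table = new HBaseLayoutSketch();
        table.put("alice", "info", "email", 1L, "old@example.com");
        table.put("alice", "info", "email", 3L, "new@example.com");
        table.put("zoe", "info", "email", 2L, "zoe@example.com");

        System.out.println(table.regionFor("alice"));                            // Region 1
        System.out.println(table.regionFor("zoe"));                              // Region 2
        System.out.println(table.get("alice", "info", "email", Long.MAX_VALUE)); // new@example.com
        System.out.println(table.get("alice", "info", "email", 2L));             // old@example.com
    }
}
```

Keeping versions newest-first mirrors the way HBase returns the most recent version of a cell by default; reading "as of" an older timestamp just skips the newer entries.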
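The "Example from Facebook" slide lists a schema of Key: UserId, Column: Word, Version: MessageId. One way to read it is as a per-user inverted index: every message adds its ID as a new version of the column for each word it contains, so the versions of one column are that user's messages containing the word. The sketch below models this idea in memory; the tokenizer, the class and its method names are hypothetical and are not taken from Facebook's system or from the HBase API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of the schema: row = userId, column = word,
 * versions = message IDs (largest ID first, like newest-first HBase versions).
 */
public class MessageIndexSketch {

    // userId -> word -> message IDs containing that word
    private final Map<Long, Map<String, NavigableSet<Long>>> index = new HashMap<>();

    /** Adds messageId as a "version" of every word column in the user's row. */
    public void indexMessage(long userId, long messageId, String text) {
        Map<String, NavigableSet<Long>> row = index.computeIfAbsent(userId, k -> new HashMap<>());
        // Naive tokenizer (an assumption): lower-case, split on non-word characters.
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) continue;
            row.computeIfAbsent(word, k -> new TreeSet<Long>(Comparator.<Long>reverseOrder()))
               .add(messageId);
        }
    }

    /** Reads all versions of one column: the user's message IDs containing the word. */
    public List<Long> search(long userId, String word) {
        Map<String, NavigableSet<Long>> row = index.get(userId);
        if (row == null) return new ArrayList<>();
        NavigableSet<Long> ids = row.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        MessageIndexSketch idx = new MessageIndexSketch();
        idx.indexMessage(42L, 100L, "Lunch at noon?");
        idx.indexMessage(42L, 101L, "Noon works, see you at lunch");
        idx.indexMessage(7L, 102L, "Lunch tomorrow");
        System.out.println(idx.search(42L, "lunch")); // [101, 100]
        System.out.println(idx.search(7L, "noon"));   // []
    }
}
```

In real HBase the number of versions retained per column family is configurable, so a schema like this relies on keeping enough versions; the sketch simply keeps all of them.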