Hbase jdd

Published in: Technology

  1. 1. HBase: Tame your BigData. Andrzej Grzesik, LunarLogicPolska
  2. 2. me: present, past
  3. 3. Questions? Ask them right away!
  4. 4. So
  5. 5. HBase: open-source, high-performance, BigTable, fast, distributed, NoSQL datastore, scalable, built upon Hadoop, fault tolerant. Cool and fun to work with!
  6. 6. Who uses HBase?
  7. 7. Beware! Lots of text
  8. 8. Hadoop stack: "By my count (and it's very possible I'm missing someone) Hadoop-based startups have raised $104.5 million since May. The same set of companies has raised $159.7 million since 2009, when Cloudera closed its first round. By comparison, the handful of popular NoSQL database vendors, often lumped into the big data category as well, and similar to Hadoop in their focus on unstructured data, have announced just more than $90 million in funding overall." via (http://gigaom.com/cloud/with-40m-for-cloudera-how-much-is-hadoop-worth/)
  9. 9. Some theory
  10. 10. Architecture (diagram labels): HBase, Zookeeper; m/r, hdfs; hadoop; servers: node, node, node
  11. 11. Related projects: • Chukwa (log analysis tool) • Hive (or, if Hive is slow:) • Pig (high-level data manipulation language; don't write all MapReduce jobs by hand!)
  12. 12. Brewer's CAP theorem: Availability, Consistency, Partition Tolerance: pick 2. Diagram labels: HBase, RDBMS, CouchDB
  13. 13. Data organisation: Region 1: Rowkey 1 … Rowkey n; Region 2: Rowkey n+1 …
  14. 14. Data organisation: a Region contains column families; e.g. one column family with col1, col2, col3 and another with col1, col2
  15. 15. Data organisation: ColumnKey, Region; columns: column1, column2, column3; Timestamp: v1@t1, v1@t1, v1@t1, v1@t2, v1@t2, v1@t3
  16. 16. Let's see some code?
  17. 17. Integration testing? Start a cluster locally? Use a remote one?
  18. 18. How to start hacking? Grab Hadoop (http://hadoop.apache.org/) and HBase (http://hbase.apache.org/). Spend an eon learning more than you wanted about plumbing.
  19. 19. How to start hacking? Better (faster) way: grab a VM/packages from
  20. 20. Pro tip: Don't run HBase on Windows, or face problems. It's doable (http://hbase.apache.org/docs/r0.20.6/cygwin.html), but VMs are faster!
  21. 21. How to start hacking? Situation will improve, since
  22. 22. Modes: Develop with • local mode (single instance, single JVM). Then • pseudo-distributed mode (multiple instances, single machine). For production • distributed mode (many nodes)
  23. 23. One more: befriend some admins, you will need them
  24. 24. Use cases?
  25. 25. Example from X: • Customer-provided user data • Schema varying between customers (kept in RDBMS) • Data in HBase
  26. 26. Example from Facebook: HBase drives Facebook messages • Key: UserId • Column: Word • Version: MessageId. See (http://www.infoq.com/presentations/HBase-at-Facebook) for more details
  27. 27. When to use HBase? • Lots of key/value data • Need good scalability • Need good query times with random access • Data analytics
  28. 28. What is HBase poor at? • transactions • relying on indexes • security
  29. 29. T(h)ank you!
  30. 30. Useful links: • Brewer's CAP theorem: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf • Google BigTable: http://labs.google.com/papers/bigtable-osdi06.pdf • DZone Refcardz: http://refcardz.dzone.com/refcardz/getting-started-apache-hadoop and http://refcardz.dzone.com/refcardz/deploying-hadoop
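
The "Data organisation" slides (13 to 15) describe sorted rowkeys split into regions, columns grouped into column families, and several timestamped versions per cell. The sketch below is a toy in-memory model of that layout in plain Java, meant only to make the nesting concrete. It is not the HBase client API: the class `ToyTable`, its methods, and the sample keys and values are all hypothetical names made up for illustration.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of slides 13-15. Hypothetical names; NOT the HBase client API. */
class ToyTable {
    // rowkey -> "family:column" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();
    // first rowkey served by a region -> region name (regions own sorted rowkey ranges)
    private final NavigableMap<String, String> regionStarts = new TreeMap<>();

    ToyTable() {
        regionStarts.put("", "region-1"); // a single region initially covers every rowkey
    }

    /** Declares that rowkeys >= startKey belong to a new region. */
    void splitAt(String startKey, String regionName) {
        regionStarts.put(startKey, regionName);
    }

    /** Returns the region whose start key is the greatest one <= rowKey. */
    String regionFor(String rowKey) {
        return regionStarts.floorEntry(rowKey).getValue();
    }

    /** Stores one version of a cell; older versions are kept alongside it. */
    void put(String rowKey, String family, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(family + ":" + column,
                    k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Latest value of a cell, or null if the cell does not exist. */
    String get(String rowKey, String family, String column) {
        return get(rowKey, family, column, Long.MAX_VALUE);
    }

    /** Newest value with timestamp <= asOf, or null if there is none. */
    String get(String rowKey, String family, String column, long asOf) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null) return null;
        NavigableMap<Long, String> versions = row.get(family + ":" + column);
        if (versions == null) return null;
        // The map is sorted newest-first, so ceilingEntry finds the newest version not after asOf.
        Map.Entry<Long, String> entry = versions.ceilingEntry(asOf);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.splitAt("m", "region-2");                       // rowkeys from "m" onward
        table.put("alice", "info", "email", 1L, "v1");
        table.put("alice", "info", "email", 2L, "v2");
        System.out.println(table.regionFor("alice"));          // region-1
        System.out.println(table.get("alice", "info", "email"));      // v2
        System.out.println(table.get("alice", "info", "email", 1L));  // v1
    }
}
```

Keeping rowkeys in a sorted map is what makes the region lookup a single `floorEntry` call, and keeping versions newest-first makes "latest value" the cheapest read. A real cluster adds persistence, distribution and much more; this sketch only illustrates the shape of the data.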
