Breaking with Relational DBMS and Dating with HBase

Session on HBase at the IndicThreads Conference on Java, Dec 2010: http://j10.indicthreads.com/

Published in: Technology

  1. Breaking with Relational DBMS and Dating with HBase (Gaurav Kohli, Xebia)
  2. About me: Gaurav Kohli, gaurav.in@gmail.com, Consultant, Xebia IT Architects
  3. - Why are we here?
     - Something about RDBMS
     - Limitations of RDBMS
     - Why HBase or any NoSQL solution
     - Overview of HBase
     - Specific use cases
     - Paradigm shift in schema design
     - Architecture of HBase
     - HBase interface: Java API, Thrift
     - Conclusion
  4. Databases
  5. Relational Databases have a lot of ...
  6. - Data sets are growing into petabytes
     - RDBMS don't scale inherently
       - Scale up / scale out (load balancing + replication)
     - Hard to shard / partition
     - High read and write throughput at the same time is not possible
       - Transactional / analytical databases
     - Specialized hardware ... is very expensive
       - Oracle clustering
  7. Diagram: Master, Replication, Slave
  8. Diagram: Master (writes), Slave nodes (reads)
     - MySQL master becomes a problem
     - All slaves must have the same write capacity as the master
     - Single point of failure, no easy failover
  9. Diagram: Master, Master, Replication, Slave
  12. - 2006.11: Google releases paper on BigTable
      - 2007.2: Initial HBase prototype created as Hadoop contrib
      - 2007.10: First usable HBase
      - 2008.1: Hadoop becomes an Apache top-level project and HBase becomes a subproject
      - 2010.5: HBase becomes an Apache top-level project
      - 2010.6: HBase 0.20.5 released
      - 2010.10: HBase 0.89.20100924, third developer release
  13. - Distributed
        - Uses HDFS for storage
      - Column-oriented
      - Multi-dimensional
        - Versions
      - High availability
      - High performance
      - Storage system
  14. HBase is
      - Not a SQL database
        - No joins, no query engine, no datatypes, no SQL
      - No schema
      - Denormalized data
      - Wide and sparsely populated data structure (key-value)
      - No DBA needed
  15. - Bigness
        - Big data, big number of users, big number of computers
      - Massive write performance
        - Facebook needs to handle 135 billion messages a month
        - Twitter stores 7 TB of data per day
      - Fast key-value access
      - Write availability
      - No single point of failure
  16. Specific use cases
      - Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
      - Real-time inserts, updates, and queries
      - Fraud detection by comparing transactions to known patterns in real time
      - Analytics: use MapReduce, Hive, or Pig to perform analytical queries
  17. - Column-oriented database
      - Tables are sorted by row key
      - Table schema only defines column families
        - A column family can have any number of columns
      - Each cell value has a timestamp
  20. Sorted Map
      SortedMap(
        RowKey, List(
          SortedMap(
            Column, List(
              Value, Timestamp
            )
          )
        )
      )
  21. A BIG SORTED MAP
      - Row key + column key + timestamp => value
      - Sorted by row key and column key
      - A column key is column family ("info") plus column qualifier/name
      - Timestamp is a long value
      Student table:
        Row Key   Column Key   Timestamp       Value
        1         info:name    1273516197868   Gaurav
        1         info:age     1273871824184   28
        1         info:age     1273871823022   34
        1         info:sex     1273746281432   Male
        2         info:name    1273863723227   Harsh
        3         info:name    1273822456433   Raman
      The two info:age entries for row 1 are 2 versions of the same cell.
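The "big sorted map" of slides 20 and 21 can be sketched in plain Java with nested `TreeMap`s. This is only an in-memory illustration, not the HBase client API: the class `SortedMapTable` and its methods are hypothetical names, and timestamps are sorted newest first on the assumption that a read returns the most recent version.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory model of the nested sorted map on the slide; not the HBase client API. */
final class SortedMapTable {
    // row key -> column key -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell under (row, column, timestamp). */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell does not exist. */
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Returns how many versions of a cell are stored. */
    int versionCount(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return 0;
        }
        return columns.get(column).size();
    }

    /** Returns all row keys in sorted order. */
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }

    /** Loads the Student table rows shown on slide 21. */
    static SortedMapTable studentExample() {
        SortedMapTable table = new SortedMapTable();
        table.put("1", "info:name", 1273516197868L, "Gaurav");
        table.put("1", "info:age", 1273871824184L, "28");
        table.put("1", "info:age", 1273871823022L, "34");
        table.put("1", "info:sex", 1273746281432L, "Male");
        table.put("2", "info:name", 1273863723227L, "Harsh");
        table.put("3", "info:name", 1273822456433L, "Raman");
        return table;
    }
}
```

With the slide's data, `getLatest("1", "info:age")` picks the value with the larger timestamp, and both versions of that cell remain stored.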
  22. Example of a Student and Subject
      - Student Table: id (PK), name, age, sex
      - Subject Table: id (PK), title, introduction, teacher_id
      - Student-Subject Table (m:n relation): student_id, subject_id, type
  23. RDBMS: Example of a Student and Subject
      Student table:
        key   name     age   sex
        1     Gaurav   28    Male
      Subject table:
        id   title   introduction    teacher_id
        1    Hbase   Hbase is cool   10
      Student-Subject table:
        student_id   subject_id   type
        1            1            elective
  24. HBase: Student-Subject schema
      Student table:
        Row Key      Column family   Column Keys
        student_id   info            name, age, sex
        student_id   subjects        Subject ids as qualifier (key)
      Subject table:
        Row Key      Column family   Column Keys
        subject_id   info            title, introduction, teacher_id
        subject_id   students        Student ids as qualifier (key)
  25. HBase: Student-Subject schema
      Student table:
        key   info               subjects
        1     info:name=Gaurav   subjects:1="elective"
              info:age=28        subjects:2="main"
              info:sex=Male
      Subject table:
        key   info                              students
        1     info:title=Hbase                  students:1
              info:introduction=Hbase is cool   students:2
              info:teacher_id=10
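The denormalized schema of slides 24 and 25 replaces the RDBMS join table with two writes, one per table, using the related id as the column qualifier. Below is a minimal in-memory sketch of that idea. The class `StudentSubjectStore` and its methods are hypothetical, the second student's enrollment type is made up for illustration, and a real application would issue these writes through an HBase client rather than Java maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory model of the two denormalized tables; not the HBase client API. */
final class StudentSubjectStore {
    // Each table: row key -> "family:qualifier" -> value
    private final NavigableMap<String, NavigableMap<String, String>> studentTable = new TreeMap<>();
    private final NavigableMap<String, NavigableMap<String, String>> subjectTable = new TreeMap<>();

    /** Records the m:n relation in both tables, since there is no join at read time. */
    void enroll(String studentId, String subjectId, String type) {
        studentTable.computeIfAbsent(studentId, k -> new TreeMap<>())
                    .put("subjects:" + subjectId, type);
        subjectTable.computeIfAbsent(subjectId, k -> new TreeMap<>())
                    .put("students:" + studentId, "");
    }

    /** Returns subject id -> enrollment type for one student, read from a single row. */
    NavigableMap<String, String> subjectsOf(String studentId) {
        NavigableMap<String, String> result = new TreeMap<>();
        NavigableMap<String, String> row = studentTable.get(studentId);
        if (row == null) {
            return result;
        }
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (cell.getKey().startsWith("subjects:")) {
                result.put(cell.getKey().substring("subjects:".length()), cell.getValue());
            }
        }
        return result;
    }

    /** Returns the ids of all students enrolled in a subject, read from a single row. */
    List<String> studentsOf(String subjectId) {
        List<String> result = new ArrayList<>();
        NavigableMap<String, String> row = subjectTable.get(subjectId);
        if (row == null) {
            return result;
        }
        for (String column : row.keySet()) {
            if (column.startsWith("students:")) {
                result.add(column.substring("students:".length()));
            }
        }
        return result;
    }
}
```

The trade-off is that every enrollment costs two writes, while each direction of the relation becomes a single-row read.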
  26. Column family attributes
        Attribute     Possible Values          Default
        COMPRESSION   NONE, GZ, LZO            NONE
        VERSIONS      1+                       3
        TTL           1-2147483647 (seconds)   2147483647
        BLOCKSIZE     1 byte - 2 GB            64k
        IN_MEMORY     true, false              false
        BLOCKCACHE    true, false              true
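To make the VERSIONS and TTL attributes of slide 26 concrete, here is a small sketch of what they mean for a single cell. It is a simplified model under stated assumptions: the class `VersionedCell` is hypothetical, and it trims old versions eagerly on every write, whereas a real store may defer such cleanup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Simplified model of VERSIONS and TTL for one cell; not HBase's implementation. */
final class VersionedCell {
    private final int maxVersions;  // VERSIONS attribute (default on the slide: 3)
    private final long ttlSeconds;  // TTL attribute in seconds (default on the slide: 2147483647)
    // timestamp in milliseconds (newest first) -> value
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    VersionedCell(int maxVersions, long ttlSeconds) {
        this.maxVersions = maxVersions;
        this.ttlSeconds = ttlSeconds;
    }

    /** Adds a version and drops the oldest ones beyond maxVersions. */
    void put(long timestampMillis, String value) {
        versions.put(timestampMillis, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry is the oldest timestamp
        }
    }

    /** Returns the values that have not outlived the TTL at time nowMillis, newest first. */
    List<String> read(long nowMillis) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<Long, String> version : versions.entrySet()) {
            long ageMillis = nowMillis - version.getKey();
            if (ageMillis <= ttlSeconds * 1000L) {
                live.add(version.getValue());
            }
        }
        return live;
    }
}
```

With the defaults from the table, a fourth write to the same cell pushes out the oldest of the three retained versions, and the TTL is large enough that nothing expires in practice.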
  27. - Region: contiguous set of lexicographically sorted rows
        - hbase.hregion.max.filesize (default: 256 MB)
      - Regions are hosted by RegionServers
      - Each table is partitioned into regions
  28. Regions and ...
      - Region: row1 to row200
      - Region: row201 to row500
      - New row arrives
  29. Regions and ...
      - Region: row1 to row200
      - Region: row201 to row350
      - Region: row351 to row501
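Slides 27 to 29 describe a table as a set of regions, each covering a contiguous range of sorted row keys, with a region splitting in two once it grows too large. The sketch below models that with a sorted map from region start key to the rows it holds. It is a simplification under stated assumptions: the class `RegionMap` is hypothetical, and a row-count limit stands in for the file-size threshold `hbase.hregion.max.filesize`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of regions and splitting; not HBase's implementation. */
final class RegionMap {
    private final int maxRowsPerRegion; // assumption: stands in for hbase.hregion.max.filesize
    // region start key -> sorted row keys in that region; "" is the first region's start key
    private final NavigableMap<String, NavigableSet<String>> regions = new TreeMap<>();

    RegionMap(int maxRowsPerRegion) {
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.put("", new TreeSet<>());
    }

    /** Finds the region for a row: the one with the greatest start key <= rowKey. */
    String regionStartKeyFor(String rowKey) {
        return regions.floorKey(rowKey);
    }

    /** Adds a row to its region and splits the region if it exceeds the limit. */
    void put(String rowKey) {
        Map.Entry<String, NavigableSet<String>> region = regions.floorEntry(rowKey);
        NavigableSet<String> rows = region.getValue();
        rows.add(rowKey);
        if (rows.size() > maxRowsPerRegion) {
            split(rows);
        }
    }

    /** Moves the upper half of a region's rows into a new region starting at the middle key. */
    private void split(NavigableSet<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        String midKey = sorted.get(sorted.size() / 2);
        NavigableSet<String> upperHalf = new TreeSet<>(rows.tailSet(midKey, true));
        rows.tailSet(midKey, true).clear(); // the view removes these rows from the old region
        regions.put(midKey, upperHalf);
    }

    /** Returns the start keys of all regions in sorted order. */
    List<String> startKeys() {
        return new ArrayList<>(regions.keySet());
    }
}
```

Because row keys sort lexicographically, `row1000` would land before `row200`; fixed-width or otherwise carefully designed keys avoid that surprise.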
  30. - Master
      - ZooKeeper
      - RegionServers
      - HDFS
      - MapReduce
  32. HBase interface: Java API, Thrift...
  33. HBase interface: Java API, Thrift...
      - Java
      - Thrift (Ruby, PHP, Python, Perl, C++...)
      - REST
      - Groovy DSL
      - MapReduce
      - HBase Shell
  34. HBase interface: Java API, Thrift...
      - Java
        - Get
        - Put
        - Delete
        - Scan
        - IncrementColumnValue
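The semantics of the operations listed on slide 34 can be illustrated without a cluster. The sketch below is an in-memory stand-in, not the HBase client: `MiniTable` and its method signatures are hypothetical, cells hold a single string version, and counters are stored as decimal strings for readability. The scan treats the start row as inclusive and the stop row as exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for Get, Put, Delete, Scan and counter increments; not the HBase client. */
final class MiniTable {
    // row key -> column key -> value (single version, for brevity)
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Put: writes one cell. */
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    /** Get: reads one cell by row key, or returns null if it is absent. */
    String get(String row, String column) {
        NavigableMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    /** Delete: removes a whole row. */
    void delete(String row) {
        rows.remove(row);
    }

    /** Scan: returns row keys with startRow <= key < stopRow, in sorted order. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    /** Increment: adds amount to a counter cell (missing cell counts as 0), returns the new value. */
    long incrementColumnValue(String row, String column, long amount) {
        String current = get(row, column);
        long updated = (current == null ? 0L : Long.parseLong(current)) + amount;
        put(row, column, Long.toString(updated));
        return updated;
    }
}
```

Get, Put and Delete address a single row by key, while Scan relies on the sorted row order to read a contiguous key range.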
  36. HBase v/s RDBMS
      - Not a replacement
      - Solves only a small subset (~5%)
  37. - Where SQL makes life easy
        - Joining
        - Secondary indexing
        - Referential integrity (updates)
        - ACID
      - Where HBase makes life easy
        - Dataset scale
        - Read/write scale
        - Replication
        - Batch analysis
  40. - HBase Apache (http://hbase.apache.org/)
      - HBase Wiki (wiki.apache.org/hadoop/Hbase)
      - HBase blog (blog.hbase.org)
      - Images from Google Search
      - http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
      - http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
