Breaking with relational DBMS and dating with Hbase [5th IndicThreads.com Conference On Java, Pune, India]


Session Presented at 5th IndicThreads.com Conference On Java held on 10-11 December 2010 in Pune, India
WEB: http://J10.IndicThreads.com

------------
Hbase is an open-source, non-relational, distributed, sparse, column-oriented data-store modeled after Google’s BigTable and is written in Java.
In this presentation we will talk about how to migrate an RDBMS-based Java application to an Hbase-based application. We will discuss the following points:
• Hbase schema design (a paradigm shift from the way we think about data storage right now) compared to RDBMS-based schema design.
• The challenges faced while porting the application to Hbase.
• Introduction to HBql to query the data from Hbase.
• Monitoring the example application on Hbase (through the exposed JMX APIs) and the machines' performance with Ganglia.
• Discussion of the Thrift interface and how we can use the REST interface to integrate Hbase with non-Java applications.
• Cluster replication and what is coming in the next major release of Hbase, 0.90.
• We will end the session with a demo of the ported application.
Takeaways for the Audience
1. When is Hbase appropriate and when not?
2. Hbase architecture and schema design
3. RDBMS vs Hbase
4. Interfacing Hbase with applications using Thrift or REST
5. Hbase cluster and Replication
6. Hbase monitoring

Published in: Technology, Travel
  • Every row has a row key. Rows are stored sorted by row key. A table may have one or more column families; it is common to have a small number of column families, and they should rarely change. A column family can have any number of columns. Each cell has a timestamp, and each cell can have multiple versions.
  • Transcript of "Breaking with relational DBMS and dating with Hbase [5th IndicThreads.com Conference On Java, Pune, India]"

    Breaking with Relational DBMS and Dating with Hbase
      Gaurav Kohli, Xebia
    About me
      Gaurav Kohli [email_address]
      Consultant, Xebia IT Architects
    Agenda
      • Why are we here?
      • Something about RDBMS
      • Limitations of RDBMS
      • Why Hbase or any NoSQL solution
      • Overview of Hbase
      • Specific use cases
      • Paradigm shift in schema design
      • Architecture of Hbase
      • Hbase Interface – Java API, Thrift
      • Conclusion
    Relational Databases
      Relational databases have a lot of limitations
    Limitations
      • Data sets are growing into petabytes
      • RDBMSs don't scale inherently
          • Scale up / scale out (load balancing + replication)
      • Hard to shard / partition
      • High read throughput and high write throughput are not both possible
          • Transactional / analytical databases
      • Specialized hardware is very expensive
          • Oracle clustering
    Scaling Out: Master-Slave Replication
      (diagram labels: Master, Slave, Replication)
    Scaling Out: Master with Many Slaves
      (diagram labels: Master, Reads, Writes, Slave nodes)
      • MySQL master becomes a problem
      • All slaves must have the same write capacity as the master
      • Single point of failure, no easy failover
    Dual Master
      (diagram labels: Master, Master, Slave, Replication)
    NoSQL
    Background
      • 2006.11: Google releases paper on BigTable
      • 2007.2: Initial HBase prototype created as Hadoop contrib
      • 2007.10: First usable HBase
      • 2008.1: Hadoop becomes an Apache top-level project and HBase becomes a subproject
      • 2010.5~: HBase becomes an Apache top-level project
      • 2010.6: HBase 0.20.5 released
      • 2010.10: HBase 0.89.2010092 – third developer release
    Hbase
      • Distributed
          • uses HDFS for storage
      • Column-oriented
      • Multi-dimensional
          • versions
      • High-availability
      • High-performance
      • Storage system
    Hbase is Not
      • A SQL database
          • No joins, no query engine, no datatypes, no SQL
      • No schema
      • Denormalized data
      • Wide and sparsely populated data structure (key-value)
      • No DBA needed
    Use Case
      • Bigness
          • Big data, big number of users, big number of computers
      • Massive write performance
          • Facebook needs to handle 135 billion messages a month
          • Twitter stores 7 TB of data per day
      • Fast key-value access
      • Write availability
      • No single point of failure
    Specific Use Case
      • Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
      • Real-time inserts, updates, and queries.
      • Fraud detection by comparing transactions to known patterns in real time.
      • Analytics: use MapReduce, Hive, or Pig to perform analytical queries.
    Storage Model
      • Column-oriented database
      • Tables are sorted by row key
      • Table schema only defines column families
          • A column family can have any number of columns
      • Each cell value has a timestamp
    Storage Model
      (two diagram slides; only the title survives in the transcript)
    Storage Model
      SortedMap(
        RowKey, List(
          SortedMap(
            Column, List(
              Value, Timestamp
            )
          )
        )
      )
      SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
    Schema Design
      • A big sorted map
      • Row key + column key + timestamp => value

      Student table

      Row Key   Column Key   Timestamp       Value
      1         info:name    1273516197868   Gaurav
      1         info:age     1273871824184   28
      1         info:age     1273871823022   34
      1         info:sex     1273746281432   Male
      2         info:name    1273863723227   Harsh
      3         info:name    1273822456433   Raman

      Annotations on the table:
      • Sorted by row key and column key
      • In a column key such as info:name, "info" is the column family and "name" is the column qualifier/name
      • The timestamp is a long value
      • The two info:age entries of row 1 are 2 versions of the same cell
    Schema Design: example of a Student and Subject (RDBMS)
      Student table:          id (PK), name, age, sex
      Subject table:          id (PK), title, introduction, teacher_id
      Student-Subject table:  student_id, subject_id, type (m:n relationship)
    Schema Design: example of a Student and Subject (RDBMS, three tables)
      Student table
      key   name     age   sex
      1     Gaurav   28    Male

      Subject table
      id   title   introduction    teacher_id
      1    Hbase   Hbase is cool   10

      Student-Subject table
      student_id   subject_id   type
      1            1            elective
    Schema Design: Student-Subject schema in Hbase (only two tables)
      Student table
      Row Key      Column Family   Column Keys
      student_id   info            name, age, sex
      student_id   subjects        subject IDs as qualifiers (keys)

      Subject table
      Row Key      Column Family   Column Keys
      subject_id   info            title, introduction, teacher_id
      subject_id   students        student IDs as qualifiers (keys)
    Schema Design: Student-Subject schema in Hbase (only two tables, with data)
      Student table
      key   info                                           subjects
      1     info:name=Gaurav, info:age=28, info:sex=Male   subjects:1="elective", subjects:2="main"

      Subject table
      key   info                                                                students
      1     info:title=Hbase, info:introduction=Hbase is cool, info:teacher_id=10   students:1, students:2
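To make the two-table design concrete, here is a rough Python sketch that models each table as a dictionary of rows. It is not Hbase client code: the helper names `enroll` and `qualifiers` are invented for this example, and the type given to student 2 is a made-up value, since the slide only shows that `students:2` exists.

```python
# Each table: row key -> {column key: value}; column keys are "family:qualifier".
student_table = {
    "1": {"info:name": "Gaurav", "info:age": "28", "info:sex": "Male"},
}
subject_table = {
    "1": {"info:title": "Hbase", "info:introduction": "Hbase is cool", "info:teacher_id": "10"},
}


def enroll(student_id, subject_id, kind):
    """Record the m:n link in both tables instead of in a third join table."""
    student_table.setdefault(student_id, {})["subjects:" + subject_id] = kind
    subject_table.setdefault(subject_id, {})["students:" + student_id] = ""


def qualifiers(row, family):
    """Return {qualifier: value} for every column of one column family in a row."""
    prefix = family + ":"
    return {key[len(prefix):]: value for key, value in row.items() if key.startswith(prefix)}


enroll("1", "1", "elective")
enroll("1", "2", "main")
enroll("2", "1", "main")  # hypothetical enrollment type for student 2

subjects_of_student_1 = qualifiers(student_table["1"], "subjects")
students_of_subject_1 = sorted(qualifiers(subject_table["1"], "students"))
```

Both lookups read a single row by key, so no join is needed. The price is that every enrollment has to be written twice, once in each table.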
    Column family attributes
      Attribute     Possible Values          Default
      COMPRESSION   NONE, GZ, LZO            NONE
      VERSIONS      1+                       3
      TTL           1-2147483647 (seconds)   2147483647
      BLOCKSIZE     1 byte – 2 GB            64k
      IN_MEMORY     true, false              false
      BLOCKCACHE    true, false              true
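As a rough illustration of what VERSIONS and TTL mean for a single cell, the Python sketch below filters a cell's versions the way those two attributes suggest. It is a simplification under stated assumptions: the function name `visible_versions` is invented, timestamps are assumed to be in milliseconds, and the exact expiry boundary in Hbase may differ.

```python
def visible_versions(versions, now_ms, max_versions=3, ttl_seconds=2147483647):
    """Return (timestamp, value) pairs that are still visible, newest first.

    versions: {timestamp in ms: value} for one cell.
    Defaults mirror the VERSIONS and TTL defaults from the attribute table.
    """
    cutoff_ms = now_ms - ttl_seconds * 1000
    # Assumption: a version counts as expired once it is older than the TTL.
    live = [(ts, value) for ts, value in versions.items() if ts > cutoff_ms]
    live.sort(reverse=True)
    return live[:max_versions]


cell = {1000: "a", 2000: "b", 3000: "c", 4000: "d"}
with_defaults = visible_versions(cell, now_ms=5000)
with_short_ttl = visible_versions(cell, now_ms=5000, ttl_seconds=2)
```

With the defaults only the three newest versions come back. With a two-second TTL every version older than that is dropped as well.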
    Regions
      • Region: contiguous set of lexicographically sorted rows
          • hbase.hregion.max.filesize (default: 256 MB)
      • Regions are hosted by Region Servers
      • Each table is partitioned into regions
    Regions and Splitting
      (diagram labels: row1, row200, row201, row500, new row)
    Regions and Splitting
      (diagram labels: row1, row200, row201, row350, row351, row501)
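The splitting idea can be sketched in a few lines of Python. This is a toy model, not how Hbase is implemented: the class `ToyTable` is invented for illustration, and it splits on a row count, whereas Hbase splits when a region's files exceed hbase.hregion.max.filesize. Row keys are zero-padded because rows are sorted lexicographically.

```python
import bisect


class ToyTable:
    """Toy table whose regions are contiguous, sorted runs of row keys."""

    def __init__(self, max_rows_per_region):
        # Assumption: row count stands in for the file-size threshold.
        self.max_rows = max_rows_per_region
        self.regions = [[]]  # regions ordered by their first row key

    def _locate(self, row_key):
        """Index of the last region whose first key is <= row_key (or 0)."""
        start_keys = [region[0] for region in self.regions if region]
        return max(bisect.bisect_right(start_keys, row_key) - 1, 0)

    def put(self, row_key):
        index = self._locate(row_key)
        region = self.regions[index]
        position = bisect.bisect_left(region, row_key)
        if position < len(region) and region[position] == row_key:
            return  # row already present
        region.insert(position, row_key)
        if len(region) > self.max_rows:
            # Split the oversized region into two halves at the middle row.
            middle = len(region) // 2
            self.regions[index:index + 1] = [region[:middle], region[middle:]]


table = ToyTable(max_rows_per_region=4)
for number in range(1, 6):
    table.put(f"row{number:03d}")
regions_after_split = table.regions
```

The fifth insert pushes the only region over its limit, so it is cut in the middle and the table ends up with two regions that together still cover the rows in sorted order.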
    Architecture
      • Master
      • Zookeeper
      • RegionServers
      • HDFS
      • MapReduce
    Architecture
      (diagram slide; only the title survives in the transcript)
    Tools – Java API, Thrift...
    Tools – Java API, Thrift...
      • Java
      • Thrift (Ruby, PHP, Python, Perl, C++...)
      • REST
      • Groovy DSL
      • MapReduce
      • Hbase Shell
    Tools – Java API, Thrift...
      • Java
          • Get
          • Put
          • Delete
          • Scan
          • incrementColumnValue
    Conclusion
      • Hbase vs. RDBMS
          • Not a replacement
          • Solves only a small subset (~5%)
    Conclusion
      • Where SQL makes life easy
          • Joining
          • Secondary indexing
          • Referential integrity (updates)
          • ACID
      • Where Hbase makes life easy
          • Dataset scale
          • Read/write scale
          • Replication
          • Batch analysis
    References & Credit
      • Hbase Apache (http://hbase.apache.org/)
      • Hbase Wiki (wiki.apache.org/hadoop/Hbase)
      • Hbase blog (blog.hbase.org)
      • Images from Google Search
      • http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
      • http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html