Breaking with relational DBMS and dating with Hbase [5th IndicThreads.com Conference On Java, Pune, India]
Session Presented at 5th IndicThreads.com Conference On Java held on 10-11 December 2010 in Pune, India
WEB: http://J10.IndicThreads.com

------------
HBase is an open-source, non-relational, distributed, sparse, column-oriented data store modeled after Google’s BigTable and written in Java.
In this presentation we will talk about how to migrate an RDBMS-based Java application to an HBase-based application. We will discuss the following points:
• HBase schema design (a paradigm shift from the way we think about data storage right now) compared to RDBMS-based schema design.
• The challenges faced while porting the application to HBase.
• Introduction to HBql for querying data from HBase.
• Monitoring an example application for HBase (through the exposed JMX APIs) and the machines' performance with Ganglia.
• Discussion of the Thrift interface and how the REST interface can be used to integrate HBase with non-Java applications.
• Cluster replication and what is coming in the next major release of HBase, 0.90.
• We will end the session with a demo of the ported application.
Takeaways for the Audience
1. When is HBase appropriate and when not?
2. HBase architecture and schema design
3. RDBMS vs HBase
4. Interfacing HBase with applications using Thrift or REST
5. HBase cluster and replication
6. HBase monitoring

Usage Rights

CC Attribution-NoDerivs License

  • Every row has a row key.
  • Rows are stored sorted by row key.
  • A table may have one or more column families.
  • It is common to have a small number of column families, and they should rarely change.
  • A column family can have any number of columns.
  • Each cell has a timestamp.
  • Each cell can have multiple versions.

Presentation Transcript

  • Breaking with Relational DBMS and Dating with Hbase
    Gaurav Kohli, Xebia
  • About me
    Gaurav Kohli [email_address]
    Consultant, Xebia IT Architects
    • Why are we here?
    • Something about RDBMS
    • Limitations of RDBMS
    • Why HBase or any NoSQL solution
    • Overview of HBase
    • Specific use cases
    • Paradigm shift in schema design
    • Architecture of HBase
    • HBase interfaces – Java API, Thrift
    • Conclusion
    Agenda
  • Relational Databases
  • Relational databases have a lot of limitations
    • Data sets are growing into petabytes
    • RDBMSs don't scale inherently
      • Scale up / scale out (load balancing + replication)
    • Hard to shard / partition
    • High read and high write throughput are not possible at the same time
      • Transactional / analytical databases
    • Specialized hardware is very expensive
      • Oracle clustering
    Limitations
  • Scaling Out: Master-Slave Replication (diagram)
    • MySQL master becomes a problem
    • All Slaves must have the same write capacity as master
    • Single point of failure, no easy failover
    Scaling Out: one master, many slaves (diagram labels: Master, Reads, Writes, Slave nodes)
  • Dual Master: Master-Master-Slave Replication (diagram)
  • NoSQL
  •  
    • 2006.11
      • Google releases paper on BigTable
    • 2007.2
      • Initial HBase prototype created as Hadoop contrib.
    • 2007.10
      • First usable HBase
    • 2008.1
      • Hadoop becomes an Apache top-level project and HBase becomes a subproject
    • 2010.5~
      • Hbase becomes Apache top-level project
    • 2010.6
      • HBase 0.20.5 released.
    • 2010.10
      • HBase 0.89.2010092 – third developer release
    Background
    • Distributed
      • uses HDFS for storage
    • Column-Oriented
    • Multi-Dimensional
      • versions
    • High-Availability
    • High-Performance
    • Storage System
    Hbase
    • A SQL database
      • No joins, no query engine, no datatypes, no SQL
    • No Schema
    • Denormalized data
    • Wide and sparsely populated data structure(key-value)
    • No DBA needed
    Hbase is Not
    • Bigness
      • Big data, big number of users, big number of computers
    • Massive write performance
      • Facebook needs to handle 135 billion messages a month
      • Twitter stores 7 TB data per day
    • Fast key-value access
    • Write availability
    • No Single point of failure
    Use Case
    • Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
    • Real-time inserts, updates, and queries.
    • Fraud detection by comparing transactions to known patterns in real-time.
    • Analytics - Use MapReduce, Hive, or Pig to perform analytical queries
    Specific Use Case
    • Column-oriented database
    • Tables are sorted by row key
    • Table schema only defines Column families
      • column family can have any number of columns
    • Each cell value has a timestamp
    Storage Model
  • Storage Model
  • Storage Model
    • SortedMap(
        RowKey, List(
          SortedMap(
            Column, List(
              Value, Timestamp
            )
          )
        )
      )
    Storage Model
    • A BIG SORTED MAP
    • Row Key + Column Key + Timestamp => Value
    Student table
    Row Key   Column Key   Timestamp       Value
    1         info:name    1273516197868   Gaurav
    1         info:age     1273871824184   28
    1         info:age     1273871823022   34
    1         info:sex     1273746281432   Male
    2         info:name    1273863723227   Harsh
    3         info:name    1273822456433   Raman
    Notes: a column key is the column family ("info") plus the column qualifier/name; the timestamp is a long value; entries are sorted by row key and column key; info:age in row 1 has 2 versions.
    Schema Design
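The "big sorted map" above can be sketched in a few lines of plain Python. This is only a toy in-memory model, assuming nothing about the real HBase client API; the class name `SortedCellStore` and its methods are hypothetical, and the data is the student table from the slide.

```python
from collections import defaultdict


class SortedCellStore:
    """Toy model: (row key, column key, timestamp) -> value."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        versions = self._rows.get(row, {}).get(column, {})
        return sorted(versions.items(), reverse=True)[:max_versions]

    def cells(self):
        """Yield cells ordered by row key, column key, then newest timestamp."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                newest_first = sorted(self._rows[row][column].items(), reverse=True)
                for timestamp, value in newest_first:
                    yield row, column, timestamp, value


store = SortedCellStore()
store.put("1", "info:name", 1273516197868, "Gaurav")
store.put("1", "info:age", 1273871824184, "28")
store.put("1", "info:age", 1273871823022, "34")
store.put("1", "info:sex", 1273746281432, "Male")
store.put("2", "info:name", 1273863723227, "Harsh")
store.put("3", "info:name", 1273822456433, "Raman")

latest_age = store.get("1", "info:age")                   # newest version only
both_ages = store.get("1", "info:age", max_versions=2)    # both stored versions
```

A read without a version count returns only the newest cell, which is why the two `info:age` entries can coexist in the same row without conflict.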
    • Example of a Student and Subject
    Student Table: id (PK), name, age, sex
    • Example of a Student and Subject
    Subject Table: id (PK), title, introduction, teacher_id
    Student-Subject Table: student_id, subject_id, type (m:n relationship)
    Schema Design
    • Example of a Student and Subject
    RDBMS: three tables
    Student table
    key   name     age   sex
    1     Gaurav   28    Male
    Subject table
    id   title   introduction    teacher_id
    1    Hbase   Hbase is cool   10
    Student-Subject table
    student_id   subject_id   type
    1            1            elective
    Schema Design
  • Hbase
    • Student-Subject schema - Hbase (only two tables)
    Student table
    Row Key      Column family   Column Keys
    student_id   info            name, age, sex
    student_id   subjects        subject ids as qualifier (key)
    Subject table
    Row Key      Column family   Column Keys
    subject_id   info            title, introduction, teacher_id
    subject_id   students        student ids as qualifier (key)
    Schema Design
  • Hbase
    • Student-Subject schema - Hbase (only two tables)
    Student table
    key   info                                           subjects
    1     info:name=Gaurav, info:age=28, info:sex=Male   subjects:1="elective", subjects:2="main"
    Subject table
    key   info                                                                  students
    1     info:title=Hbase, info:introduction=Hbase is cool, info:teacher_id=10   students:1, students:2
    Schema Design
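To make the shift from three relational tables to two HBase-style tables concrete, here is a small sketch in plain Python. It assumes rows are simple dictionaries and uses the sample values from the slides; the function name `denormalize` is hypothetical, and storing an empty string under `students:<id>` is an assumption, since the slides show those qualifiers without a value.

```python
# Relational rows, taken from the three-table example.
students = [{"id": "1", "name": "Gaurav", "age": "28", "sex": "Male"}]
subjects = [{"id": "1", "title": "Hbase", "introduction": "Hbase is cool",
             "teacher_id": "10"}]
student_subject = [{"student_id": "1", "subject_id": "1", "type": "elective"}]


def denormalize(students, subjects, student_subject):
    """Fold the join table into both entity tables as extra columns."""
    # Every non-key attribute becomes a column in the "info" family.
    student_table = {
        s["id"]: {f"info:{k}": v for k, v in s.items() if k != "id"}
        for s in students
    }
    subject_table = {
        s["id"]: {f"info:{k}": v for k, v in s.items() if k != "id"}
        for s in subjects
    }
    # Each join row becomes one qualifier on each side of the relationship.
    for link in student_subject:
        student_id, subject_id = link["student_id"], link["subject_id"]
        student_table[student_id][f"subjects:{subject_id}"] = link["type"]
        subject_table[subject_id][f"students:{student_id}"] = ""  # assumed empty value
    return student_table, subject_table


student_table, subject_table = denormalize(students, subjects, student_subject)
```

The relationship is now stored twice, once per direction, so either side can be read with a single row lookup instead of a join.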
  • Column family attributes
    Attribute     Possible Values          Default
    COMPRESSION   NONE, GZ, LZO            NONE
    VERSIONS      1+                       3
    TTL           1-2147483647 (seconds)   2147483647
    BLOCKSIZE     1 byte – 2 GB            64k
    IN_MEMORY     true, false              false
    BLOCKCACHE    true, false              true
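As a rough illustration of how the VERSIONS and TTL attributes interact, the sketch below filters the stored versions of a single cell. It is a simplified model and not HBase's actual implementation: the function `visible_versions` is hypothetical, timestamps are assumed to be in milliseconds, and the exact boundary handling is an assumption. The defaults mirror the table above.

```python
def visible_versions(cells, now_ms, max_versions=3, ttl_seconds=2147483647):
    """Return the (timestamp_ms, value) pairs a read could still see, newest first.

    cells: list of (timestamp_ms, value) pairs for one row and column.
    """
    cutoff_ms = now_ms - ttl_seconds * 1000
    # Drop versions older than the TTL, then keep only the newest max_versions.
    live = [(ts, value) for ts, value in cells if ts > cutoff_ms]
    live.sort(reverse=True)
    return live[:max_versions]


cells = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]

with_defaults = visible_versions(cells, now_ms=5000)
with_short_ttl = visible_versions(cells, now_ms=5000, ttl_seconds=2)
```

With the defaults, the oldest of the four versions is dropped by the VERSIONS limit; with a two-second TTL, only the newest version is young enough to survive.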
    • Region: Contiguous set of lexicographically sorted rows
      • hbase.hregion.max.filesize (default: 256 MB)
    • Region hosted by Region Servers
    • Each Table is partitioned into Regions
    Regions
  • Regions and Splitting (diagram labels: row1, row200, row201, row500, new row)
  • Regions and Splitting (diagram labels: row1, row200, row201, row350, row351, row501)
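The region and splitting slides can be approximated with a toy model in plain Python. This is a sketch under loud assumptions: a region here is just a sorted list of row keys, the split threshold counts rows rather than bytes (HBase uses the file size limit named above), and the class `RegionedTable` is hypothetical.

```python
import bisect


class RegionedTable:
    """Toy table partitioned into contiguous, lexicographically sorted regions."""

    def __init__(self, max_rows_per_region):
        self.max_rows = max_rows_per_region  # stand-in for the max file size
        self.regions = [[]]                  # regions are kept in key order

    def _region_index(self, row_key):
        # A key belongs to the last region whose first key is <= row_key.
        start_keys = [region[0] for region in self.regions[1:]]
        return bisect.bisect_right(start_keys, row_key)

    def insert(self, row_key):
        index = self._region_index(row_key)
        region = self.regions[index]
        position = bisect.bisect_left(region, row_key)
        if position < len(region) and region[position] == row_key:
            return  # row already present
        region.insert(position, row_key)
        if len(region) > self.max_rows:
            # Split at the middle key into two contiguous regions.
            middle = len(region) // 2
            self.regions[index:index + 1] = [region[:middle], region[middle:]]


table = RegionedTable(max_rows_per_region=4)
for n in range(1, 7):
    table.insert(f"row{n:03d}")
```

Row keys are zero-padded here because sorting is lexicographic, so an unpadded key such as `row10` would sort before `row2`.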
    • Master
    • Zookeeper
    • RegionServers
    • HDFS
    • MapReduce
    Architecture
  • Architecture
  • Tools – Java API, Thrift...
  • Tools – Java API, Thrift...
    • Java
    • Thrift (Ruby, PHP, Python, Perl, C++...)
    • REST
    • Groovy DSL
    • MapReduce
    • HBase Shell
  • Tools – Java API, Thrift...
    • Java
      • Get
      • Put
      • Delete
      • Scan
      • incrementColumnValue
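The semantics of these five operations can be sketched with a small in-memory model. The code below is plain Python and deliberately does not use the HBase Java client; `ToyTable` and its method names are hypothetical stand-ins. The scan treats the start row as inclusive and the stop row as exclusive, and the increment returns the new counter value.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table keyed by sorted row keys."""

    def __init__(self):
        self._keys = []  # row keys, kept sorted
        self._rows = {}  # row key -> {column: value}

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        return dict(self._rows.get(row, {}))

    def delete(self, row):
        if row in self._rows:
            del self._rows[row]
            self._keys.remove(row)

    def scan(self, start_row="", stop_row=None):
        """Return rows with start_row <= key < stop_row, in key order."""
        low = bisect.bisect_left(self._keys, start_row)
        high = (len(self._keys) if stop_row is None
                else bisect.bisect_left(self._keys, stop_row))
        return [(key, dict(self._rows[key])) for key in self._keys[low:high]]

    def increment_column_value(self, row, column, amount=1):
        """Add amount to a counter column and return the new value."""
        new_value = self._rows.get(row, {}).get(column, 0) + amount
        self.put(row, column, new_value)
        return new_value


table = ToyTable()
table.put("student-1", "info:name", "Gaurav")
table.put("student-2", "info:name", "Harsh")
table.put("student-3", "info:name", "Raman")

first_two = table.scan("student-1", "student-3")
first_count = table.increment_column_value("student-1", "stats:logins")
second_count = table.increment_column_value("student-1", "stats:logins")
table.delete("student-2")
```

Because the rows are kept sorted by key, a scan is just a contiguous slice, which is the property that makes row key design so important for read patterns.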
  •  
    • HBase vs. RDBMS
      • Not a replacement
      • Solves only a small subset (~5%)
    Conclusion
    • Where SQL makes life easy
      • Joining
      • Secondary Indexing
      • Referential Integrity (updates)
      • ACID
    • Where Hbase makes life easy
      • Dataset scale
      • Read/Write scale
      • Replication
      • Batch analysis
    Conclusion
    • Hbase Apache (http://hbase.apache.org/)
    • Hbase Wiki (wiki.apache.org/hadoop/Hbase)
    • Hbase blog (blog.hbase.org)
    • Images from Google Search
    • http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
    • http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
    References & Credit