Introduction to HBase
Transcript

  • 1. Introduction to HBase
      • Byeongweon Moon / REDDUCK
      • byeongweon.moon@reddduck.com
  • 2. HBase Key Points
      • Clustered, commodity(-ish) hardware
      • Mostly schema-less
      • Dynamic distribution
      • Spread writes out over the cluster
  • 3. HBase Distributed database modeled on Bigtable  Bigtable : A Distributed Storage System for Structured Data by Chang et al. Runs on top of Hadoop Core Layers on HDFS for storage Native connections to MapReduce Distributed, High Availability, High Performance, Strong Consistency
  • 4. HBase (cont.) Column-oriented store  Wide table costs only the data stored  NULLs in row are ‘free’  Good compression: columns of similar type  Column name is arbitrary Rows stored in sorted order Can random read and write Goal of billions of rows X millions of cells  Petabytes of data across thousands of servers
  • 5. Column Oriented Storage
  • 6. !HBase
      • "NoSQL" database
          • No joins
          • No sophisticated query engine
          • No transactions (sort of)
          • No column typing
          • No SQL, no ODBC/JDBC, etc.
      • Not a replacement for an RDBMS
      • Matching Impedance
  • 7. Why HBase?
      • Datasets are reaching petabytes
      • Traditional databases are expensive to scale and difficult to distribute
      • Commodity hardware is cheap and powerful
      • Need for random access as well as batch processing (Hadoop alone does not offer random access)
  • 8. Tables
      • A table is split into roughly equal-sized "regions"
      • Each region is a contiguous range of keys
      • Regions split as they grow, dynamically adjusting to your data set
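
The region idea on this slide can be sketched in a few lines of Python. This is only a toy model, not HBase code: the class name `ToyRegionedTable`, the `max_region_rows` threshold, and the choice of the middle key as the split point are all assumptions made for illustration. Each region is identified by its start key, and a row belongs to the last region whose start key is less than or equal to the row key.

```python
import bisect


class ToyRegionedTable:
    """Toy table split into regions, each covering a contiguous key range."""

    def __init__(self, max_region_rows=4):
        # Hypothetical split threshold (rows per region), not an HBase setting.
        self.max_region_rows = max_region_rows
        self.start_keys = [b""]  # sorted start key of each region
        self.regions = [{}]      # one dict of row key -> value per region

    def _region_index(self, row_key):
        # The owning region is the last one whose start key <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, value):
        i = self._region_index(row_key)
        self.regions[i][row_key] = value
        if len(self.regions[i]) > self.max_region_rows:
            self._split(i)

    def get(self, row_key):
        return self.regions[self._region_index(row_key)].get(row_key)

    def _split(self, i):
        # Split at the middle key so both halves stay contiguous ranges.
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.regions[i].items() if k < mid}
        right = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i] = left
        self.start_keys.insert(i + 1, mid)
        self.regions.insert(i + 1, right)


table = ToyRegionedTable(max_region_rows=4)
for n in range(10):
    table.put(b"row%02d" % n, n)
```

After the ten writes the toy table has grown from one region to several, and lookups still land in the right region because the start keys stay sorted.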
  • 9. Table (cont.)
      • Tables are sorted by row
      • Table schema defines column families
          • Families consist of any number of columns
          • Columns consist of any number of versions
          • Everything except the table name is byte[]
      • (Table, Row, Family:Column, Timestamp) -> Value
  • 10. Table (cont.)
      • As a data structure:
      • SortedMap( RowKey, List( SortedMap( Column, List( Value, Timestamp ) ) ) )
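
To make the nested-map view concrete, here is a minimal Python sketch of the (Row, Family:Column, Timestamp) -> Value mapping. It is an illustration under stated assumptions, not the HBase client API: `ToyTable` and its `put`, `get`, and `scan` methods are hypothetical names, plain dicts stand in for the sorted maps, and sorting is done at read time.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of: row key -> "family:column" -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        # Every write adds a new version keyed by its timestamp.
        self._rows[row][column][ts] = value

    def get(self, row, column, ts=None):
        # Return the newest version, or the newest one at or before ts.
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start=b"", stop=None):
        # Rows come back in sorted key order, as on the slide.
        for row in sorted(self._rows):
            if row >= start and (stop is None or row < stop):
                yield row


t = ToyTable()
t.put(b"row2", b"info:name", b"old", ts=1)
t.put(b"row2", b"info:name", b"new", ts=2)
t.put(b"row1", b"info:name", b"first", ts=1)
```

A column that was never written simply has no entry, which is the sense in which an absent cell costs nothing in this model.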
  • 11. HBase Open Source Stack
      • ZooKeeper: Small Data Coordination Service
      • HBase: Database Storage Engine
      • HDFS: Distributed File System
      • Hadoop: Asynchronous Map-Reduce Jobs
  • 12. Server Architecture
      • Similar to HDFS
          • Master == Namenode
          • Regionserver == Datanode
      • Often run these alongside each other!
      • Difference: HBase stores state in HDFS
      • HDFS provides robust data storage across machines, insulating against failure
      • Master and Regionserver are fairly stateless and machine independent
  • 13. Region Assignment
      • Each region from every table is assigned to a Regionserver
      • Master duties:
          • Responsible for assignment and handling regionserver problems (if any!)
          • When machines fail, move regions
          • When regions split, move regions to balance
          • Could move regions to respond to load
          • Can run multiple backup masters
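
The assignment duties above can be illustrated with a small Python sketch. It is a simplification under assumptions, not how the HBase master is implemented: `ToyMaster` is a hypothetical class, "least loaded" is measured only by region count, and ties are broken by server name so the result is deterministic.

```python
class ToyMaster:
    """Toy master that tracks which regionserver holds which regions."""

    def __init__(self, servers):
        self.assignment = {s: set() for s in servers}

    def _least_loaded(self):
        # Fewest regions wins; ties are broken by server name.
        return min(self.assignment, key=lambda s: (len(self.assignment[s]), s))

    def assign(self, region):
        server = self._least_loaded()
        self.assignment[server].add(region)
        return server

    def handle_failure(self, dead_server):
        # When a machine fails, move its regions to the surviving servers.
        orphaned = self.assignment.pop(dead_server)
        for region in sorted(orphaned):
            self.assign(region)


master = ToyMaster(["rs1", "rs2", "rs3"])
for region in ["r1", "r2", "r3", "r4", "r5", "r6"]:
    master.assign(region)
master.handle_failure("rs3")
```

Only the bookkeeping changes hands here; on the slides' model the data itself stays in HDFS, which is why moving a region does not mean copying its contents.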
  • 14. Master
      • The master does NOT
          • Handle any write request (not a DB master!)
          • Handle location-finding requests
          • Take part in the read/write path
      • Generally does very little most of the time
  • 15. Distributed Coordination
      • ZooKeeper is used to manage master election and server availability
      • Set up as a cluster, it provides distributed coordination primitives
      • An excellent tool for building cluster management systems
  • 16. HBase Architecture
      • http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
  • 17. How data is actually stored
  • 18. Write-Ahead Log
      • http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
  • 19. HLog
  • 20. Demo
  • 21. HBase - Roadmap
      • HBase 0.92.0
          • Coprocessors
          • Distributed Log Splitting
          • Running Tasks in UI
          • Performance Improvements
      • HBase 0.94.0
          • Security
          • Secondary Indexes
          • Search Integration
          • HFile v2
  • 22. Reference
      • http://ofps.oreilly.com/titles/9781449396107/index.html
      • http://hbase.apache.org/book.html#quickstart
      • http://www.larsgeorge.com/2010/02/fosdem-2010-nosql-talk.html