20090713 Hbase Schema Design Case Studies
Upcoming SlideShare
Loading in...5
×
 

20090713 Hbase Schema Design Case Studies

on

  • 46,559 views

I collected and organized some cases about how to design hbase table schema, in contrast to classical RDBMS. Any suggestions are welcomed!

I collected and organized some cases about how to design hbase table schema, in contrast to classical RDBMS. Any suggestions are welcomed!

Statistics

Views

Total Views
46,559
Views on SlideShare
40,990
Embed Views
5,569

Actions

Likes
72
Downloads
1,386
Comments
8

29 Embeds 5,569

http://nosql.mypopescu.com 4622
http://a.prokhorenko.us 375
http://www.slideshare.net 253
http://traackit.blogspot.sg 125
http://www.e-presentations.us 76
http://traackit.blogspot.com 42
http://traackit.blogspot.in 15
http://paper.li 9
http://webcache.googleusercontent.com 8
http://atalks.prokhorenko.us 6
http://traackit.blogspot.ru 6
http://web.archive.org 4
http://traackit.blogspot.kr 4
http://lj-toys.com 3
http://www.mefeedia.com 3
http://translate.googleusercontent.com 3
http://coral.corp.apple.com 2
http://l.lj-toys.com 2
http://a0.twimg.com 1
http://traackit.blogspot.mx 1
http://www.docshut.com 1
http://traackit.blogspot.co.il 1
http://buildingtrust.pbworks.com 1
http://traackit.blogspot.fr 1
http://trunk.ly 1
http://static.slidesharecdn.com 1
http://smallerror-java.blogspot.com 1
http://traackit.blogspot.ca 1
http://confluence.corp.apple.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • good
    Are you sure you want to
    Your message goes here
    Processing…
  • 几种表的设计模式
    Are you sure you want to
    Your message goes here
    Processing…
  • good one!
    Are you sure you want to
    Your message goes here
    Processing…
  • Good
    Are you sure you want to
    Your message goes here
    Processing…
  • Would also recommend putting a few points as to why you would use a reverse timestamp on recording user actions: In hbase scan is lexicographically incrementing, so if you wish to query recent actions, you need to store the data in an order fashion so that the scan operation is in reality descending (inverse timestamp).
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

20090713 Hbase Schema Design Case Studies 20090713 Hbase Schema Design Case Studies Presentation Transcript

  • HBase schema design case studies Organized by Evan/Qingyan Liu qingyan123 (AT) gmail.com 2009.7.13
  • The Tao is ... De-normalization
  • Case 1: locations ● China ● Beijing ● Shanghai ● Guangzhou ● Shandong – Jinan – Qingdao ● Sichuan – Chengdu
  • In RDBMS loc_id PK loc_name parent_id child_id 1 China 2,3,4,5 2 Beijing 1 3 Shanghai 1 4 Guangzhou 1 5 Shandong 1 7,8 6 Sichuan 1 9 7 Jinan 1,5 8 Qingdao 1,5 9 Chengdu 1,6
  • In HBase row column families name: parent: child: <loc_id> parent:<loc_id> child:<loc_id> 1 China child:1=state child:2=state child:3=state child:4=state child:5=state child:6=state 5 Shangdong parent:1=nation child:7=city child:8=city 8 Qingdao parent:1=nation parent:5=state
  • Case 2: student-course ● Student ● 1 S ~ many C ● Course ● 1 C ~ many S
  • In RDBMS Students Courses id PK SCs id PK name student_id title sex course_id introduction age type teacher_id
  • In HBase row column families info: course: <student_id> info:name course:<course_id>=type info:sex info:age row column families info: student: <course_id> info:title student:<student_id>=type info:introduction info:teacher_id
  • Case 3: user-action ● users performs actions now and then ● store every events ● query recent events of a user
  • In RDBMS Actions id PK user_id IDX name time ● For fast SELECT id, user_id, name, time FROM Action WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create index on user_id. However, indices will greatly decrease insert speed for index-rebuild.
  • In HBase row column families name: <user><Long.MAX_VALUE - System.currentTimeMillis()> <event id>
  • Case 4: user-friends ● 1 user has 1+ friends ● will lookup all friends of a user
  • In RDBMS Users Friendships id IDX user_id IDX name friend_id sex type age ● SELECT * FROM friendships WHERE user_id='XXX';
  • In HBase row column families info: friend: <user_id> info:name friend:<user_id>=type info:sex info:age ● actually, it is a graph can be represented by a sparse matrix. ● then you can use M/R to find sth interesting. e.g. the shortest path from user A to user B.
  • Case 5: access log ● each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc ● will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
  • In RDBMS Accesslog time ip IDX domain url referer browser_cookie IDX login_id IDX
  • In HBase row column families http: user <time><INC_COUNTER> http:ip user:browser_ http:domain cookie http:url user:login_id http:referer INC_COUNTER is used to distinguish the adjacent same time values.