HBase schema design
    case studies

    Organized by Evan/Qingyan Liu
     qingyan123 (AT) gmail.com
              2009....
The Tao is ...




De-normalization
Case 1: locations
●
    China
    ●
        Beijing
    ●
        Shanghai
    ●
        Guangzhou
    ●
        Shandong
...
In RDBMS
loc_id PK   loc_name      parent_id   child_id
1           China                     2,3,4,5
2           Beijing ...
In HBase
row                      column families
           name:         parent:           child:
<loc_id>              ...
Case 2: student-course
●
    Student
    ●
        1 S ~ many C
●
    Course
    ●
        1 C ~ many S
In RDBMS


Students                Courses
id PK      SCs          id PK
name       student_id   title
sex        course_i...
In HBase
row                           column families
               info:               course:
<student_id>   info:name...
Case 3: user-action
●
    users performs actions now and then
    ●
        store every events
    ●
        query recent ...
In RDBMS
                      Actions
                      id PK
                      user_id IDX
                     ...
In HBase
row                                   column families
                              name:
<user><Long.MAX_VALUE -...
Case 4: user-friends
●
    1 user has 1+ friends
●
    will lookup all friends of a user
In RDBMS

        Users
                           Friendships
        id IDX
                           user_id IDX
     ...
In HBase

row                           column families
                 info:            friend:
<user_id>        info:na...
Case 5: access log
●
    each log line contains time, ip, domain, url,
    referer, browser_cookie, login_id, etc
●
    wi...
In RDBMS

Accesslog
time
ip IDX
domain
url
referer
browser_cookie IDX
login_id IDX
In HBase

row                                                  column families
                                          h...
Upcoming SlideShare
Loading in...5
×

20090713 Hbase Schema Design Case Studies

41,525

Published on

I collected and organized some cases about how to design hbase table schema, in contrast to classical RDBMS. Any suggestions are welcomed!

Published in: Technology
9 Comments
79 Likes
Statistics
Notes
No Downloads
Views
Total Views
41,525
On Slideshare
0
From Embeds
0
Number of Embeds
14
Actions
Shares
0
Downloads
1,505
Comments
9
Likes
79
Embeds 0
No embeds

No notes for slide

20090713 Hbase Schema Design Case Studies

  1. 1. HBase schema design case studies Organized by Evan/Qingyan Liu qingyan123 (AT) gmail.com 2009.7.13
  2. 2. The Tao is ... De-normalization
  3. 3. Case 1: locations ● China ● Beijing ● Shanghai ● Guangzhou ● Shandong – Jinan – Qingdao ● Sichuan – Chengdu
  4. 4. In RDBMS loc_id PK loc_name parent_id child_id 1 China 2,3,4,5 2 Beijing 1 3 Shanghai 1 4 Guangzhou 1 5 Shandong 1 7,8 6 Sichuan 1 9 7 Jinan 1,5 8 Qingdao 1,5 9 Chengdu 1,6
  5. 5. In HBase row column families name: parent: child: <loc_id> parent:<loc_id> child:<loc_id> 1 China child:1=state child:2=state child:3=state child:4=state child:5=state child:6=state 5 Shangdong parent:1=nation child:7=city child:8=city 8 Qingdao parent:1=nation parent:5=state
  6. 6. Case 2: student-course ● Student ● 1 S ~ many C ● Course ● 1 C ~ many S
  7. 7. In RDBMS Students Courses id PK SCs id PK name student_id title sex course_id introduction age type teacher_id
  8. 8. In HBase row column families info: course: <student_id> info:name course:<course_id>=type info:sex info:age row column families info: student: <course_id> info:title student:<student_id>=type info:introduction info:teacher_id
  9. 9. Case 3: user-action ● users performs actions now and then ● store every events ● query recent events of a user
  10. 10. In RDBMS Actions id PK user_id IDX name time ● For fast SELECT id, user_id, name, time FROM Action WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create index on user_id. However, indices will greatly decrease insert speed for index-rebuild.
  11. 11. In HBase row column families name: <user><Long.MAX_VALUE - System.currentTimeMillis()> <event id>
  12. 12. Case 4: user-friends ● 1 user has 1+ friends ● will lookup all friends of a user
  13. 13. In RDBMS Users Friendships id IDX user_id IDX name friend_id sex type age ● SELECT * FROM friendships WHERE user_id='XXX';
  14. 14. In HBase row column families info: friend: <user_id> info:name friend:<user_id>=type info:sex info:age ● actually, it is a graph can be represented by a sparse matrix. ● then you can use M/R to find sth interesting. e.g. the shortest path from user A to user B.
  15. 15. Case 5: access log ● each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc ● will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
  16. 16. In RDBMS Accesslog time ip IDX domain url referer browser_cookie IDX login_id IDX
  17. 17. In HBase row column families http: user <time><INC_COUNTER> http:ip user:browser_ http:domain cookie http:url user:login_id http:referer INC_COUNTER is used to distinguish the adjacent same time values.
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×