20090713 Hbase Schema Design Case Studies

42,752
-1

Published on

I collected and organized some cases about how to design hbase table schema, in contrast to classical RDBMS. Any suggestions are welcomed!

Published in: Technology
9 Comments
80 Likes
Statistics
Notes
No Downloads
Views
Total Views
42,752
On Slideshare
0
From Embeds
0
Number of Embeds
14
Actions
Shares
0
Downloads
1,533
Comments
9
Likes
80
Embeds 0
No embeds

No notes for slide

20090713 Hbase Schema Design Case Studies

  1. 1. HBase schema design case studies Organized by Evan/Qingyan Liu qingyan123 (AT) gmail.com 2009.7.13
  2. 2. The Tao is ... De-normalization
  3. 3. Case 1: locations ● China ● Beijing ● Shanghai ● Guangzhou ● Shandong – Jinan – Qingdao ● Sichuan – Chengdu
  4. 4. In RDBMS loc_id PK loc_name parent_id child_id 1 China 2,3,4,5 2 Beijing 1 3 Shanghai 1 4 Guangzhou 1 5 Shandong 1 7,8 6 Sichuan 1 9 7 Jinan 1,5 8 Qingdao 1,5 9 Chengdu 1,6
  5. 5. In HBase row column families name: parent: child: <loc_id> parent:<loc_id> child:<loc_id> 1 China child:1=state child:2=state child:3=state child:4=state child:5=state child:6=state 5 Shangdong parent:1=nation child:7=city child:8=city 8 Qingdao parent:1=nation parent:5=state
  6. 6. Case 2: student-course ● Student ● 1 S ~ many C ● Course ● 1 C ~ many S
  7. 7. In RDBMS Students Courses id PK SCs id PK name student_id title sex course_id introduction age type teacher_id
  8. 8. In HBase row column families info: course: <student_id> info:name course:<course_id>=type info:sex info:age row column families info: student: <course_id> info:title student:<student_id>=type info:introduction info:teacher_id
  9. 9. Case 3: user-action ● users performs actions now and then ● store every events ● query recent events of a user
  10. 10. In RDBMS Actions id PK user_id IDX name time ● For fast SELECT id, user_id, name, time FROM Action WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create index on user_id. However, indices will greatly decrease insert speed for index-rebuild.
  11. 11. In HBase row column families name: <user><Long.MAX_VALUE - System.currentTimeMillis()> <event id>
  12. 12. Case 4: user-friends ● 1 user has 1+ friends ● will lookup all friends of a user
  13. 13. In RDBMS Users Friendships id IDX user_id IDX name friend_id sex type age ● SELECT * FROM friendships WHERE user_id='XXX';
  14. 14. In HBase row column families info: friend: <user_id> info:name friend:<user_id>=type info:sex info:age ● actually, it is a graph can be represented by a sparse matrix. ● then you can use M/R to find sth interesting. e.g. the shortest path from user A to user B.
  15. 15. Case 5: access log ● each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc ● will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
  16. 16. In RDBMS Accesslog time ip IDX domain url referer browser_cookie IDX login_id IDX
  17. 17. In HBase row column families http: user <time><INC_COUNTER> http:ip user:browser_ http:domain cookie http:url user:login_id http:referer INC_COUNTER is used to distinguish the adjacent same time values.
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×