Your SlideShare is downloading. ×
0
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

20090713 Hbase Schema Design Case Studies

41,066

Published on

I collected and organized some cases about how to design hbase table schema, in contrast to classical RDBMS. Any suggestions are welcomed!

I collected and organized some cases about how to design hbase table schema, in contrast to classical RDBMS. Any suggestions are welcomed!

Published in: Technology
8 Comments
79 Likes
Statistics
Notes
  • good
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • 几种表的设计模式
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • good one!
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Good
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Would also recommend putting a few points as to why you would use a reverse timestamp on recording user actions: In hbase scan is lexicographically incrementing, so if you wish to query recent actions, you need to store the data in an order fashion so that the scan operation is in reality descending (inverse timestamp).
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
41,066
On Slideshare
0
From Embeds
0
Number of Embeds
14
Actions
Shares
0
Downloads
1,490
Comments
8
Likes
79
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. HBase schema design case studies Organized by Evan/Qingyan Liu qingyan123 (AT) gmail.com 2009.7.13
  • 2. The Tao is ... De-normalization
  • 3. Case 1: locations ● China ● Beijing ● Shanghai ● Guangzhou ● Shandong – Jinan – Qingdao ● Sichuan – Chengdu
  • 4. In RDBMS loc_id PK loc_name parent_id child_id 1 China 2,3,4,5 2 Beijing 1 3 Shanghai 1 4 Guangzhou 1 5 Shandong 1 7,8 6 Sichuan 1 9 7 Jinan 1,5 8 Qingdao 1,5 9 Chengdu 1,6
  • 5. In HBase row column families name: parent: child: <loc_id> parent:<loc_id> child:<loc_id> 1 China child:1=state child:2=state child:3=state child:4=state child:5=state child:6=state 5 Shangdong parent:1=nation child:7=city child:8=city 8 Qingdao parent:1=nation parent:5=state
  • 6. Case 2: student-course ● Student ● 1 S ~ many C ● Course ● 1 C ~ many S
  • 7. In RDBMS Students Courses id PK SCs id PK name student_id title sex course_id introduction age type teacher_id
  • 8. In HBase row column families info: course: <student_id> info:name course:<course_id>=type info:sex info:age row column families info: student: <course_id> info:title student:<student_id>=type info:introduction info:teacher_id
  • 9. Case 3: user-action ● users performs actions now and then ● store every events ● query recent events of a user
  • 10. In RDBMS Actions id PK user_id IDX name time ● For fast SELECT id, user_id, name, time FROM Action WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create index on user_id. However, indices will greatly decrease insert speed for index-rebuild.
  • 11. In HBase row column families name: <user><Long.MAX_VALUE - System.currentTimeMillis()> <event id>
  • 12. Case 4: user-friends ● 1 user has 1+ friends ● will lookup all friends of a user
  • 13. In RDBMS Users Friendships id IDX user_id IDX name friend_id sex type age ● SELECT * FROM friendships WHERE user_id='XXX';
  • 14. In HBase row column families info: friend: <user_id> info:name friend:<user_id>=type info:sex info:age ● actually, it is a graph can be represented by a sparse matrix. ● then you can use M/R to find sth interesting. e.g. the shortest path from user A to user B.
  • 15. Case 5: access log ● each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc ● will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
  • 16. In RDBMS Accesslog time ip IDX domain url referer browser_cookie IDX login_id IDX
  • 17. In HBase row column families http: user <time><INC_COUNTER> http:ip user:browser_ http:domain cookie http:url user:login_id http:referer INC_COUNTER is used to distinguish the adjacent same time values.

×