20090713 Hbase Schema Design Case Studies

    20090713 Hbase Schema Design Case Studies - Presentation Transcript

    1. HBase schema design case studies
       Organized by Evan/Qingyan Liu, qingyan123 (AT) gmail.com, 2009.7.13
    2. The Tao is ... De-normalization
    3. Case 1: locations
       ● China
       ● Beijing
       ● Shanghai
       ● Guangzhou
       ● Shandong
         – Jinan
         – Qingdao
       ● Sichuan
         – Chengdu
    4. In RDBMS
       loc_id (PK)  loc_name   parent_id  child_id
       1            China                 2,3,4,5,6
       2            Beijing    1
       3            Shanghai   1
       4            Guangzhou  1
       5            Shandong   1          7,8
       6            Sichuan    1          9
       7            Jinan      1,5
       8            Qingdao    1,5
       9            Chengdu    1,6
    5. In HBase
       row       column families
                 name:     parent:                          child:
       <loc_id>            parent:<loc_id>                  child:<loc_id>
       1         China                                      child:2=state, child:3=state, child:4=state, child:5=state, child:6=state
       5         Shandong  parent:1=nation                  child:7=city, child:8=city
       8         Qingdao   parent:1=nation, parent:5=state
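
A minimal sketch of how the de-normalized location rows on this slide answer hierarchy questions with a single row read and no join. It models the table as plain Python dictionaries rather than calling an HBase client, the helper names `columns_in_family`, `ancestors`, and `children` are hypothetical, and only the rows for China, Shandong, and Qingdao are included (with one child column for China).

```python
# Plain-Python model of the locations table: row key -> {"family:qualifier": value}.
# Illustration only, not the HBase client API.
locations = {
    "1": {"name:": "China", "child:5": "state"},
    "5": {"name:": "Shandong", "parent:1": "nation",
          "child:7": "city", "child:8": "city"},
    "8": {"name:": "Qingdao", "parent:1": "nation", "parent:5": "state"},
}


def columns_in_family(row, family):
    """Return {qualifier: value} for every column of one family in a row."""
    prefix = family + ":"
    return {col[len(prefix):]: val for col, val in row.items()
            if col.startswith(prefix)}


def ancestors(loc_id):
    """All ancestors of a location, read from its own row (hypothetical helper)."""
    return columns_in_family(locations[loc_id], "parent")


def children(loc_id):
    """Direct children of a location, read from its own row (hypothetical helper)."""
    return columns_in_family(locations[loc_id], "child")


print(ancestors("8"))  # every ancestor of Qingdao, with its level
print(children("5"))   # the cities directly under Shandong
```

Because every ancestor is copied into the `parent:` family, finding the nation of a city does not require walking up the tree row by row. The cost is that a change in the hierarchy has to be written to several rows.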
    6. Case 2: student-course
       ● Student
       ● 1 S ~ many C
       ● Course
       ● 1 C ~ many S
    7. In RDBMS
       Students   SCs          Courses
       id (PK)    student_id   id (PK)
       name       course_id    title
       sex        type         introduction
       age                     teacher_id
    8. In HBase
       row           column families
                     info:      course:
       <student_id>  info:name  course:<course_id>=type
                     info:sex
                     info:age

       row           column families
                     info:              student:
       <course_id>   info:title         student:<student_id>=type
                     info:introduction
                     info:teacher_id
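
A minimal sketch of what the two-table layout on this slide implies for writes: one enrollment is stored twice, once in the student row and once in the course row, so that either direction is a single row read. The tables are modeled as plain Python dictionaries, not through an HBase client; the helper names and the sample ids and types (`s1`, `c1`, `elective`, and so on) are hypothetical.

```python
# Plain-Python model of the two tables: row key -> {"family:qualifier": value}.
# Illustration only, not the HBase client API.
students = {}  # rows keyed by <student_id>
courses = {}   # rows keyed by <course_id>


def enroll(student_id, course_id, kind):
    """Write the same relationship into both tables (hypothetical helper)."""
    students.setdefault(student_id, {})["course:" + course_id] = kind
    courses.setdefault(course_id, {})["student:" + student_id] = kind


def courses_of(student_id):
    """Course ids of one student, taken from that student's row only."""
    row = students.get(student_id, {})
    return sorted(c.split(":", 1)[1] for c in row if c.startswith("course:"))


def students_of(course_id):
    """Student ids of one course, taken from that course's row only."""
    row = courses.get(course_id, {})
    return sorted(c.split(":", 1)[1] for c in row if c.startswith("student:"))


enroll("s1", "c1", "elective")
enroll("s1", "c2", "required")
enroll("s2", "c1", "required")

print(courses_of("s1"))   # ['c1', 'c2']
print(students_of("c1"))  # ['s1', 's2']
```

The join table SCs from the RDBMS design disappears; its content lives in the `course:` and `student:` column families instead.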
    9. Case 3: user-action
       ● users perform actions now and then
       ● store every event
       ● query the recent events of a user
    10. In RDBMS
        Actions
        id (PK)
        user_id (IDX)
        name
        time
        ● For a fast SELECT id, user_id, name, time FROM Actions WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create an index on user_id. However, indexes greatly decrease insert speed, because the index has to be rebuilt on inserts.
    11. In HBase
        row                                                  column families
                                                             name:
        <user><Long.MAX_VALUE - System.currentTimeMillis()>  <event id>
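
A minimal sketch of why the row key subtracts the current time from `Long.MAX_VALUE`: HBase returns rows in ascending key order, so a reversed timestamp puts the newest event of a user first. The table is modeled as a plain Python dictionary, not through an HBase client. The zero-padded decimal encoding of the reversed timestamp, the helper names, and the sample users and event ids are assumptions of this sketch, and it assumes no user name is a prefix of another.

```python
JAVA_LONG_MAX = 2**63 - 1  # value of Java's Long.MAX_VALUE


def action_row_key(user, millis):
    """Build <user><Long.MAX_VALUE - millis> as a sortable string.

    The reversed timestamp is zero-padded to 19 digits so that string
    order matches numeric order (an assumption of this sketch).
    """
    return "%s%019d" % (user, JAVA_LONG_MAX - millis)


def recent_actions(table, user, limit):
    """Walk keys in ascending order and keep the first `limit` rows of one user."""
    keys = sorted(k for k in table if k.startswith(user))
    return [table[k] for k in keys[:limit]]


table = {}
for millis, event_id in [(1000, "e1"), (2000, "e2"), (3000, "e3")]:
    table[action_row_key("alice", millis)] = event_id
table[action_row_key("bob", 2500)] = "e9"

print(recent_actions(table, "alice", 2))  # ['e3', 'e2']
```

The equivalent of ORDER BY time DESC LIMIT 10 then needs no secondary index: it is a scan that starts at the user's prefix and stops after ten rows.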
    12. Case 4: user-friends
        ● 1 user has 1+ friends
        ● will look up all friends of a user
    13. In RDBMS
        Users      Friendships
        id (IDX)   user_id (IDX)
        name       friend_id
        sex        type
        age
        ● SELECT * FROM friendships WHERE user_id='XXX';
    14. In HBase
        row         column families
                    info:      friend:
        <user_id>   info:name  friend:<user_id>=type
                    info:sex
                    info:age
        ● Actually, it is a graph, which can be represented by a sparse matrix.
        ● Then you can use M/R to find something interesting, e.g. the shortest path from user A to user B.
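
A minimal sketch of the shortest-path idea mentioned on this slide, using a breadth-first search over the `friend:` columns. It runs on a single machine over a plain Python dictionary and is not a MapReduce job; the slides do not describe how the M/R version is organized. The sample users, the friendship types, and the helper names are hypothetical.

```python
from collections import deque

# Plain-Python model of the user table: each row lists its friends as
# "friend:<user_id>" columns, i.e. one row of a sparse adjacency matrix.
users = {
    "A": {"info:name": "A", "friend:C": "classmate", "friend:E": "colleague"},
    "B": {"info:name": "B", "friend:D": "classmate"},
    "C": {"info:name": "C", "friend:A": "classmate", "friend:D": "colleague"},
    "D": {"info:name": "D", "friend:C": "colleague", "friend:B": "classmate"},
    "E": {"info:name": "E", "friend:A": "colleague"},
}


def friends_of(user_id):
    """Friend ids of one user, read from that user's row only."""
    row = users.get(user_id, {})
    return [col.split(":", 1)[1] for col in row if col.startswith("friend:")]


def shortest_path(start, goal):
    """Breadth-first search; returns the list of users on a shortest path."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if user == goal:
            path = []
            while user is not None:
                path.append(user)
                user = previous[user]
            return path[::-1]
        for friend in friends_of(user):
            if friend not in previous:
                previous[friend] = user
                queue.append(friend)
    return None  # no connection between the two users


print(shortest_path("A", "B"))  # ['A', 'C', 'D', 'B']
```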
    15. Case 5: access log
        ● each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc.
        ● will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
    16. In RDBMS
        Accesslog
        time
        ip (IDX)
        domain
        url
        referer
        browser_cookie (IDX)
        login_id (IDX)
    17. In HBase
        row                   column families
                              http:         user:
        <time><INC_COUNTER>   http:ip       user:browser_cookie
                              http:domain   user:login_id
                              http:url
                              http:referer
        INC_COUNTER is used to distinguish adjacent rows that have the same time value.
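
A minimal sketch of the `<time><INC_COUNTER>` row key and of how a time-ordered key turns "analyze the last 5 minutes" into a range scan. The table is a plain Python dictionary, not an HBase table. The `YYYYMMDDHHMMSS` time format, the 6-digit counter width, the helper names, and the sample log lines are assumptions of this sketch.

```python
import itertools
from bisect import bisect_left

_inc_counter = itertools.count()  # stands in for INC_COUNTER


def log_row_key(timestamp):
    """Build <time><INC_COUNTER>; timestamp is a 'YYYYMMDDHHMMSS' string."""
    return "%s%06d" % (timestamp, next(_inc_counter))


def scan(table, start_time, stop_time):
    """Return rows with start_time <= time < stop_time, in key order."""
    keys = sorted(table)
    lo = bisect_left(keys, start_time)
    hi = bisect_left(keys, stop_time)
    return [table[k] for k in keys[lo:hi]]


table = {}
log_lines = [
    ("20090713120000", {"http:ip": "10.0.0.1", "http:url": "/a"}),
    ("20090713120000", {"http:ip": "10.0.0.2", "http:url": "/b"}),  # same second
    ("20090713120700", {"http:ip": "10.0.0.3", "http:url": "/c"}),
]
for timestamp, columns in log_lines:
    table[log_row_key(timestamp)] = columns

# Two lines share a timestamp, yet both survive thanks to the counter.
print(len(table))  # 3

# A 5-minute window is a contiguous key range.
window = scan(table, "20090713120000", "20090713120500")
print([row["http:ip"] for row in window])  # ['10.0.0.1', '10.0.0.2']
```

The hourly, daily, weekly, and monthly analyses use the same scan with wider start and stop times.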

    + hmistyhmisty, 4 months ago
