Facebook Data Team Presentation(2009 12 11) V0.2
Presentation Transcript

  • RCFile (columnar storage on Hive)
    Yongqiang He
    12/09/09
  • Agenda
  • Why Columnar Storage
    Better compression
    Lightweight compression
    • RLE (run-length encoding)
    • Bitmap
    • etc.
    CPU, memory, storage
    Columnar operators
    Cache-conscious (MonetDB)
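
The two lightweight schemes named on this slide can be sketched as follows. This is a minimal illustration only: the function names and the sample column are assumptions made for the example, not code from Hive or any of the systems mentioned.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


def bitmap_encode(values):
    """Build one bit list per distinct value; bit i is 1 when row i holds it."""
    bitmaps = {}
    for i, value in enumerate(values):
        bitmaps.setdefault(value, [0] * len(values))[i] = 1
    return bitmaps


# Hypothetical low-cardinality column, the case where both schemes pay off.
column = ["US", "US", "US", "CN", "CN", "US"]
runs = rle_encode(column)
bitmaps = bitmap_encode(column)
```

Both schemes work best when a column holds few distinct values or long runs, which is more likely once values of one column are stored together rather than interleaved with other columns.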
  • Columnar Stores
    Traditional Columnar Store
    • C-Store/Vertica
    • MonetDB
    • etc.
    Columnar Store on Hadoop/Cloud
    • Zebra (Y!’s effort in Pig)
    • RCFile (Hive)
  • Category
    Pure Columnar
    • MonetDB (in-memory, very fast)
    Columnar Group (Projection)
    • C-Store/Vertica (FlexStore?)
    • Zebra (Y!’s effort in Pig)
    Row Columnar (PAX)
    • RCFile (Hive)
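
The three categories differ in how the same rows are laid out. The sketch below is a toy in-memory model under assumed names (`pure_columnar`, `column_groups`, `row_columnar`) and made-up rows; it does not reproduce the on-disk format of any of the systems listed.

```python
# Hypothetical table with three columns.
rows = [
    ("a", 1, "x"),
    ("b", 2, "y"),
    ("c", 3, "z"),
    ("d", 4, "w"),
]


def pure_columnar(rows):
    """Pure columnar: one independent sequence per column for the whole table."""
    return [list(col) for col in zip(*rows)]


def column_groups(rows, groups):
    """Column groups: each group keeps its own columns together, row by row."""
    return [[tuple(row[i] for i in group) for row in rows] for group in groups]


def row_columnar(rows, rows_per_group):
    """Row columnar (PAX style): split into row groups, columnar inside each."""
    return [
        pure_columnar(rows[start:start + rows_per_group])
        for start in range(0, len(rows), rows_per_group)
    ]


columnar = pure_columnar(rows)
grouped = column_groups(rows, [(0, 1), (2,)])
pax = row_columnar(rows, rows_per_group=2)
```

In the row-columnar layout every value of a given row lives in the same row group, so a reader that has one group in hand never needs data from elsewhere to rebuild a row.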
  • Row Construction
    Why needed?
    • Column data is stored separately and may be sorted in a different order.
    Join needed
    • MonetDB (in-memory, very fast): BAT
    • C-Store/Vertica (uses more projections to avoid it, ?)
    No join needed (a join in the cloud is EXPENSIVE)
    • Zebra (Y!’s effort in Pig)
    • RCFile (Hive)
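
The difference between the two row-construction cases can be illustrated with a small sketch. The helper names and data are hypothetical; the join variant assumes each column carries explicit row ids, which is one common way to model independently sorted columns and is not taken from the slides.

```python
def rows_by_position(columns):
    """Columns share one row order, so row i is the i-th value of each column."""
    return list(zip(*columns))


def rows_by_join(columns_with_ids):
    """Each column is a list of (row_id, value) in its own order; join on row_id."""
    width = len(columns_with_ids)
    rows = {}
    for col_index, column in enumerate(columns_with_ids):
        for row_id, value in column:
            rows.setdefault(row_id, [None] * width)[col_index] = value
    return [tuple(rows[row_id]) for row_id in sorted(rows)]


names = ["ABCD", "DEFG", "Hive"]
counts = [30, 10, 20]

# Case 1: columns kept in the same order, rows rebuilt by position alone.
aligned = rows_by_position([names, counts])

# Case 2: the counts column is sorted on its own, so row ids must be joined.
names_with_ids = list(enumerate(names))
counts_sorted = sorted(enumerate(counts), key=lambda pair: pair[1])
joined = rows_by_join([names_with_ids, counts_sorted])
```

Positional reconstruction is a single pass over aligned columns, while the join has to match ids across columns, which is the step that becomes costly when the columns live on different machines.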
  • Data Sort Property
    Can data be re-sorted in any way after it is loaded?
    • No: good for row construction.
    • Yes: operators can work on sorted data, but row construction becomes complex.
  • RCFile
    Storage Layout
    [Figure: a row group made of a MetaData section (built-in RLE, compressed) followed by the column data (column compressed, lazy decompress; all data, no metadata).]
    Example rows:
    • ‘ABCD’, 1234, ‘haha’
    • ‘DEFG’, 3456, ‘ha’
    • ‘Hadoop’, 01, ‘’
    • ‘Hive’, 01, ‘waa’
    The same rows stored column by column:
    • ‘ABCD’, ‘DEFG’, ‘Hadoop’, ‘Hive’
    • 1234, 3456, 01, 01
    • ‘haha’, ‘ha’, ‘’, ‘waa’
    Works with Column Pruning
    • Only touch (read and decompress) the needed columns
    • Lazy decompression
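
Column pruning and lazy decompression can be sketched with a toy row group. This is not the RCFile implementation: the `RowGroup` class, its methods, the use of `zlib`, storing every value as a string, and keeping per-value lengths as the metadata are all assumptions made for the illustration.

```python
import zlib


class RowGroup:
    """Toy row group: each column is compressed on its own (hypothetical)."""

    def __init__(self, rows):
        columns = list(zip(*rows))
        encoded = [[str(v).encode("utf-8") for v in col] for col in columns]
        # Assumed metadata: the byte length of every value, per column.
        self.lengths = [[len(b) for b in col] for col in encoded]
        # Data section: concatenated values only, one compressed blob per column.
        self.compressed = [zlib.compress(b"".join(col)) for col in encoded]
        self.decompress_count = 0
        self._cache = {}

    def column(self, index):
        """Decompress a column only on first use, then split it by length."""
        if index not in self._cache:
            raw = zlib.decompress(self.compressed[index])
            self.decompress_count += 1
            values, pos = [], 0
            for length in self.lengths[index]:
                values.append(raw[pos:pos + length].decode("utf-8"))
                pos += length
            self._cache[index] = values
        return self._cache[index]

    def select(self, project, where_col, predicate):
        """Evaluate the filter first; skip projected columns if nothing matches."""
        matches = [i for i, v in enumerate(self.column(where_col)) if predicate(v)]
        if not matches:
            return []  # projected columns are never decompressed
        cols = [self.column(c) for c in project]
        return [tuple(col[i] for col in cols) for i in matches]


rows = [
    ("ABCD", "1234", "haha"),
    ("DEFG", "3456", "ha"),
    ("Hadoop", "01", ""),
    ("Hive", "01", "waa"),
]

group = RowGroup(rows)
result = group.select(project=[0], where_col=1, predicate=lambda v: v == "01")

skipped = RowGroup(rows)
empty = skipped.select(project=[0, 2], where_col=1, predicate=lambda v: v == "9999")
```

In the first query only columns 1 and 0 are decompressed and column 2 is never touched. In the second, the filter column matches nothing, so the projected columns stay compressed for the whole row group.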
  • Acknowledgement
    Namit Jain
    Zheng Shao
    Joydeep Sen Sarma
    Ning Zhang
    Prasad Chakka
    Dhruba Borthakur
    Suresh Antony
    Ashish Thusoo