Facebook Data Team Presentation(2009 12 11) V0.2
  1.
  2. RCFile (columnar storage on Hive)
     Yongqiang He
     12/09/09
  3. Agenda
  4. Why Columnar Storage
     • Better compression
     • Lightweight compression: RLE, bitmap, etc.
     • CPU, memory, storage
     • Columnar operators: cache-conscious (MonetDB)
  5. Columnar Storage Systems
     • Traditional columnar stores: C-Store/Vertica, MonetDB, etc.
     • Columnar stores on Hadoop/cloud: Zebra (Yahoo!'s effort in Pig), RCFile (Hive)
  6. Category
     • Pure columnar: MonetDB (in-memory, very fast)
     • Column group (projection): C-Store/Vertica (FlexStore?), Zebra (Yahoo!'s effort in Pig)
     • Row columnar (PAX): RCFile (Hive)
  7. Row Construction
     • Why is it needed? Column data is stored separately and may be sorted in different orders.
     • Needs a join: MonetDB (in-memory, very fast; BAT), C-Store/Vertica (use more projections to avoid, ?)
     • No join needed (a join in the cloud is EXPENSIVE): Zebra (Yahoo!'s effort in Pig), RCFile (Hive)
  8. Data Sort Property
     Can data be sorted in any way after it is loaded?
     • No: good for row construction.
     • Yes: operators can work on sorted data, but row construction becomes complex.
  9. RCFile: Storage Layout
     [Diagram: storage layout with a MetaData section (built-in RLE) and a Compressed section (column compressed, lazy decompress; all data, no metadata). Numeric labels shown in the diagram: 4 5 4; 4 4 4 4 4; 2 0 3 4.]
     Example rows:
     • 'ABCD', 1234, 'haha'
     • 'DEFG', 3456, 'ha'
     • 'Hadoop', 01, ''
     • 'Hive', 01, 'waa'
     Stored column by column:
     • 'ABCD', 'DEFG', 'Hadoop', 'Hive'
     • 1234, 3456, 01, 01
     • 'haha', 'ha', '', 'waa'
     Works with column pruning:
     • Only touch (read and decompress) needed columns
     • Lazy decompress
  10. Acknowledgement
     • Namit Jain
     • Zheng Shao
     • Joydeep Sen Sarma
     • Ning Zhang
     • Prasad Chakka
     • Dhruba Borthakur
     • Suresh Antony
     • Ashish Thusoo
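
The storage layout on slide 9 can be illustrated with a small sketch. This is not the RCFile on-disk format or Hive's implementation; it is a minimal in-memory model, assuming zlib as the column compressor and cell lengths as the only metadata. All names here (rle_encode, write_row_group, LazyColumn, read_row_group) are hypothetical. It shows the three ideas the slide lists: metadata kept separately and run-length encoded, each column compressed on its own, and readers that touch only the needed columns and decompress them lazily.

```python
import zlib


def rle_encode(values):
    """Run-length encode a list into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def write_row_group(rows):
    """Pivot rows into columns and compress each column independently."""
    group = []
    for col in zip(*rows):
        cells = [str(v).encode("utf-8") for v in col]
        group.append({
            # Metadata: cell lengths, run-length encoded.
            "lengths": rle_encode([len(c) for c in cells]),
            # Data: concatenated cell bytes only, no metadata mixed in.
            "data": zlib.compress(b"".join(cells)),
        })
    return group


class LazyColumn:
    """Holds a compressed column; decompresses on first access only."""

    def __init__(self, stored):
        self._stored = stored
        self._cells = None
        self.decompressed = False

    def cells(self):
        if self._cells is None:
            raw = zlib.decompress(self._stored["data"])
            self.decompressed = True
            out, pos = [], 0
            for length, run in self._stored["lengths"]:
                for _ in range(run):
                    out.append(raw[pos:pos + length].decode("utf-8"))
                    pos += length
            self._cells = out
        return self._cells


def read_row_group(group, needed):
    """Column pruning: wrap only the columns the query asks for."""
    return {i: LazyColumn(group[i]) for i in needed}


# Example rows from the slide; values are stored and read back as strings
# (an assumption of this sketch).
rows = [
    ("ABCD", 1234, "haha"),
    ("DEFG", 3456, "ha"),
    ("Hadoop", "01", ""),
    ("Hive", "01", "waa"),
]
group = write_row_group(rows)
cols = read_row_group(group, needed=[0, 2])  # column 1 is never touched
print(cols[0].cells())       # decompresses column 0 only
print(cols[2].decompressed)  # column 2 is selected but not yet decompressed
```

In this sketch a query that filters on column 0 and only sometimes projects column 2 pays the decompression cost for column 2 only when cells() is first called on it, which is the effect "lazy decompress" describes.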
