ORC File Introduction

I present the Optimized Row Columnar (ORC) file format for Apache Hive.

Published in: Technology

  1. ORC Files, Owen O’Malley, owen@hortonworks.com, December 2012. © Hortonworks Inc. 2012
  2. Top Level
  3. File Structure
  4. Stripe Structure
  5. File Layout
  6. Integer Column Serialization
  7. String Column Serialization
  8. Compression
  9. Projection and Predicate Filtering
  10. Example File Sizes
  11. Final notes
  12. Comparison

      Feature                        RC File   Trevni   ORC File
      Hive Type Model                N         N        Y
      Separate complex columns       N         Y        Y
      Splits found quickly           N         Y        Y
      Default column group size      4MB       64MB*    250MB
      Files per a bucket             1         >1       1
      Store min, max, sum, count     N         N        Y
      Versioned metadata             N         Y        Y
      Run length data encoding       N         N        Y
      Store strings in dictionary    N         N        Y
      Store row count                N         Y        Y
      Skip compressed blocks         N         N        Y
      Store internal indexes         N         N        Y
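
The comparison lists "Store min, max, sum, count" and "Skip compressed blocks" as ORC features, and slide 9 is titled "Projection and Predicate Filtering". The transcript does not show how these fit together, so the following is only a minimal Python sketch of the general idea, not ORC's actual on-disk format or reader API: if each block of a column carries its own min and max, a reader can skip any block whose range cannot satisfy a filter. The names `BlockStats`, `summarize`, and `blocks_to_read`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockStats:
    """Summary statistics kept alongside one block of integer values."""
    minimum: int
    maximum: int
    total: int
    count: int


def summarize(values: List[int]) -> BlockStats:
    """Compute min, max, sum, and count for a non-empty block."""
    return BlockStats(min(values), max(values), sum(values), len(values))


def blocks_to_read(stats: List[BlockStats], low: int, high: int) -> List[int]:
    """Return indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [i for i, s in enumerate(stats)
            if s.maximum >= low and s.minimum <= high]


# Hypothetical data: three blocks of one integer column.
blocks = [[1, 5, 9], [20, 25, 31], [40, 44, 48]]
stats = [summarize(b) for b in blocks]

# A filter such as 22 <= x <= 42 can only match blocks 1 and 2,
# so block 0 never needs to be decompressed or scanned.
selected = blocks_to_read(stats, 22, 42)
print(selected)
```

In this sketch the statistics are tiny compared with the data they describe, which is why checking them before reading a block is cheap; the stored sum and count could likewise answer simple aggregates without touching the values.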
