ORC File Introduction

I present the Optimized Row Columnar (ORC) file format for Apache Hive.

Published in: Technology

  1. ORC Files. Owen O’Malley, owen@hortonworks.com, December 2012. © Hortonworks Inc. 2012
  2. Top Level
  3. File Structure
  4. Stripe Structure
  5. File Layout
  6. Integer Column Serialization
  7. String Column Serialization
  8. Compression
  9. Projection and Predicate Filtering
  10. Example File Sizes
  11. Final notes
  12. Comparison

      Feature                       RC File   Trevni   ORC File
      Hive Type Model               N         N        Y
      Separate complex columns      N         Y        Y
      Splits found quickly          N         Y        Y
      Default column group size     4MB       64MB*    250MB
      Files per a bucket            1         >1       1
      Store min, max, sum, count    N         N        Y
      Versioned metadata            N         Y        Y
      Run length data encoding      N         N        Y
      Store strings in dictionary   N         N        Y
      Store row count               N         Y        Y
      Skip compressed blocks        N         N        Y
      Store internal indexes        N         N        Y
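
The comparison lists stored min, max, sum, and count values among the ORC features, and slide 9 covers predicate filtering. The transcript does not say how the two connect, so the following is only a minimal sketch of one common way such statistics can be used: groups of rows whose value range cannot match a predicate are skipped without being read. The names GroupStats, build_stats, and groups_to_read, along with the fixed group size, are hypothetical and are not taken from the slides or from the ORC reader API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupStats:
    """Hypothetical per-group statistics for one integer column."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_stats(values: List[int], group_size: int) -> List[GroupStats]:
    """Split a column into fixed-size groups of rows and summarize each one."""
    stats = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        stats.append(GroupStats(min(group), max(group), sum(group), len(group)))
    return stats


def groups_to_read(stats: List[GroupStats], low: int, high: int) -> List[int]:
    """Return the indexes of groups whose [min, max] range overlaps [low, high].

    Any group outside that range cannot contain a matching value,
    so a reader could skip it entirely.
    """
    return [
        i for i, s in enumerate(stats)
        if s.maximum >= low and s.minimum <= high
    ]


# Illustrative data only: twelve values stored in three groups of four.
column = [1, 3, 5, 7, 20, 22, 24, 26, 40, 41, 42, 43]
stats = build_stats(column, group_size=4)

# Predicate: value BETWEEN 21 AND 25.
selected = groups_to_read(stats, low=21, high=25)
print(selected)
```

In this sketch only the statistics are consulted to decide which groups to read. The filter is conservative: a selected group may still contain rows that fail the predicate, so the rows that are read must be checked individually.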
