44.
Yes, we’re hiring!
info@svds.com
THANK YOU
Stephen O’Sullivan
stephen@svds.com
@steveos
Demo code is here:
github.com/silicon-valley-data-science/hadoopsummit-2015
Editor's Notes
Description
You have your Hadoop cluster, and you are ready to fill it up with data, but wait: Which format should you use to store your data? Should you store it in Plain Text, Sequence File, Avro, or Parquet? (And should you compress it?) This talk will take a closer look at some of the trade-offs, and will cover the How, Why, and When of choosing one format over another.
Do not support block compression
Once they are compressed, they are no longer splittable, which increases the cost of reading them
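The splittability point can be illustrated with a small sketch. It assumes newline-delimited text records and whole-file gzip compression; the sample records and the helper names `read_split_plain` and `read_split_gzip` are hypothetical and are not taken from the demo code. A reader dropped into the middle of the plain text can skip to the next newline and carry on, while a reader dropped into the middle of the gzip stream has no header or decoder state to start from.

```python
import gzip
import zlib

# Hypothetical sample data: 10,000 newline-delimited text records.
records = [f"{i},user{i},click" for i in range(10_000)]
raw = ("\n".join(records) + "\n").encode("utf-8")
compressed = gzip.compress(raw)


def read_split_plain(data: bytes, start: int) -> list[str]:
    """Read whole records from an arbitrary byte offset of uncompressed text."""
    if start > 0:
        # Skip the partial record; it belongs to the previous split.
        start = data.index(b"\n", start - 1) + 1
    return data[start:].decode("utf-8").splitlines()


def read_split_gzip(data: bytes, start: int):
    """Try to decompress from an arbitrary byte offset of a gzip stream."""
    try:
        return gzip.decompress(data[start:])
    except (OSError, EOFError, zlib.error):
        # No gzip header and no decoder state are available at this offset.
        return None


plain_records = read_split_plain(raw, len(raw) // 2)
gzip_result = read_split_gzip(compressed, len(compressed) // 2)

print("records readable from the middle of the plain text:", len(plain_records))
print("gzip readable from the middle of the stream:", gzip_result is not None)
```

In practice this means a single task has to read a file compressed this way from the beginning, instead of several tasks each taking one split.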
Each data file contains the values for a set of rows
Within a data file, the values from each column are organized so that they are adjacent, enabling good compression of those values
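A toy comparison shows why keeping a column's values adjacent helps compression. It is only a sketch: it uses general-purpose zlib on comma-separated text, not the encodings a real columnar file format applies, and the table (an id plus two low-cardinality columns) is made-up sample data.

```python
import zlib

# Hypothetical table: (id, country, status) with low-cardinality columns.
countries = ["US", "DE", "JP"]
statuses = ["ok", "ok", "ok", "error", "ok", "retry", "ok"]
rows = [
    (i, countries[i % len(countries)], statuses[i % len(statuses)])
    for i in range(5000)
]

# Row-oriented layout: all values of a row are stored together.
row_major = "\n".join(f"{i},{c},{s}" for i, c, s in rows).encode("utf-8")

# Column-oriented layout: all values of a column are stored together.
columns = list(zip(*rows))
col_major = "\n".join(
    ",".join(str(v) for v in col) for col in columns
).encode("utf-8")

row_size = len(zlib.compress(row_major))
col_size = len(zlib.compress(col_major))

print("uncompressed bytes (row, column):", len(row_major), len(col_major))
print("compressed bytes   (row, column):", row_size, col_size)
```

Both layouts hold exactly the same values and the same number of bytes before compression; only the ordering differs, so any difference in compressed size comes from the layout alone.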
No results for query 1 (a count with no conditions). This is because Stinger has metadata about the amount of data in the table (only when it's an internal table).