2. Why Parquet
Columnar storage format
Can store nested Data
Efficiency in file size and query performance
Nested field can be read independently of other fields
Many data processing understand avro format (Hive, Spark, Pig and
MapReduce,etc)
3. Data Model
boolean int32 int64 int96 float double binary fixed_len
_byte_ar
ray
UTF8 ENUM DECIMAL(pr
ecision,scale
DATE LIST MAP
4. Record parquet
message Nom {
(required,optional,repeated) type nom ;
required int32 age ;
required binary nom (UTF8)
}
{"name": "right", "type":
"string", "order":
"descending"}
Avro
Parquet
5. Parquet File Format
Header Block .. Block Footer
Magic number
Using parquet
Magic number
,Schema,Encoding
method,Block position,..
Columns
Pages
128MB