Hive and Hadoop in Data-Infra
Yongqiang He
12/09/09
Agenda
RCFile (columnar storage on Hive)
  • Open source effort
  • Effect
    • Saves up to 30% of storage space, more than 20% on average
    • Reduces IO, CPU cost, and memory use
  • What else can we save?
  • A user outside Facebook reports that reads are 8 times faster
  • Now being deployed to the Facebook Hadoop Hive cluster
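To make the IO claim concrete, here is a minimal sketch of why a columnar layout lets a query read less data. It is a toy illustration only and does not reproduce RCFile's actual on-disk format; the table, the column names, and the helpers `row_layout`, `column_layout`, and `read_column` are all hypothetical.

```python
# Hypothetical toy table with columns (user_id, country, clicks).
rows = [(i, "US" if i % 3 else "CA", i % 5) for i in range(1000)]


def row_layout(rows):
    """Serialize row by row: values of different columns are interleaved."""
    return "\n".join(",".join(map(str, r)) for r in rows).encode()


def column_layout(rows):
    """Serialize column by column: each column's values are stored together."""
    return [",".join(map(str, col)).encode() for col in zip(*rows)]


def read_column(chunks, index, convert=int):
    """Decode a single column chunk without touching the other columns."""
    return [convert(v) for v in chunks[index].decode().split(",")]


row_bytes = row_layout(rows)
col_chunks = column_layout(rows)

# A query that only needs `clicks` reads one chunk in the columnar layout,
# but has to scan every byte of every row in the row layout.
clicks = read_column(col_chunks, 2)
bytes_read_columnar = len(col_chunks[2])
bytes_read_row = len(row_bytes)
```

In this sketch the columnar read touches only the `clicks` chunk, so `bytes_read_columnar` is a fraction of `bytes_read_row`. Storing similar values together also tends to help compression, although the sketch does not measure that.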
Harness Sort/bucket property
  • Data is grouped, and sometimes sorted
  • But this property is not used right now
Why is it useful?
Okay, CPU and Memory
  • Effect
    • An optimization cuts CPU cost in half
Group-by operator in Hive (group-by is used everywhere at Facebook)
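As an illustration of how a group-by could exploit data that is already grouped or sorted by key, the sketch below compares a hash-based aggregation with a streaming one. This is an assumption about the general technique, not necessarily the optimization the slides refer to, and it is not Hive's implementation; the function names and sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def hash_group_count(records):
    """General case: keep one hash-table entry per distinct key in memory."""
    counts = {}
    for key, _value in records:
        counts[key] = counts.get(key, 0) + 1
    return counts


def sorted_group_count(records):
    """Assumes records arrive already grouped by key.

    Only the current group's running count is held at any time, so no hash
    table is needed and a result is emitted as soon as the key changes.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        yield key, sum(1 for _ in group)


# Hypothetical input that is already grouped by its first field.
records = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]

streamed = dict(sorted_group_count(records))
hashed = hash_group_count(records)
```

Both functions produce the same counts on grouped input. The streaming version avoids hashing every row and keeps memory constant per group, which is the kind of CPU and memory saving the sort/bucket property makes possible.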

Facebook Intern Presentation V0.1
