Hive and Hadoop in Data-Infra, Yongqiang He, 12/09/09
Facebook Intern Presentation V0.1

Published in: Technology, Spiritual

Transcript of "Facebook Intern Presentation V0.1"

  1. Hive and Hadoop in Data-Infra
     Yongqiang He, 12/09/09
  2. Agenda
  3. RCFile (columnar storage on Hive)
       • Open source effort
       • Effect
           • Saves up to 30% of storage space, more than 20% on average
           • Reduces I/O, CPU cost, and memory use
           • What else can we save?
       • A user outside fb reports that reads are 8 times faster
       • Deployment to the Facebook Hadoop Hive cluster is under way
  4. Harness the sort/bucket property
       • Data is grouped, and sometimes sorted
           • But this property is not used right now
           • Why is it useful?
           • Okay, CPU and memory
       • Effect
           • An optimization cuts CPU cost in half
           • Group-by operator in Hive (group by is used everywhere at fb)
           • Also eases memory use
           • Deployed to the Facebook Hadoop Hive cluster now
  5. Future work
       • Join skew optimization
           • One of the most common problems at fb
       • Indexing
           • Why is indexing useful?
           • What's our plan?
       • Data sharing
           • Why is this useful?
       • [Diagram: four pieces labeled Piece 1 to Piece 4, with the annotations "huge" and "small"; caption: "Break this and do it in multiple machines in parallel"]
  6. Acknowledgement
       • Namit Jain
       • Zheng Shao
       • Joydeep Sen Sarma
       • Ning Zhang
       • Prasad Chakka
       • Dhruba Borthakur
       • Suresh Antony
       • Ashish Thusoo
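
The sort/bucket slide says that data which is already grouped or sorted can make group by cheaper in CPU and memory, but it does not show how. The sketch below is one way to picture the idea in plain Python; it is not Hive's implementation. The function names and the sample rows are hypothetical. A hash-based aggregation keeps one entry per distinct key in memory, whereas an aggregation over key-sorted input keeps a single running total and emits each group as soon as the key changes.

```python
from itertools import groupby
from operator import itemgetter


def hash_group_sum(rows):
    """Baseline: hash aggregation. Holds one entry per distinct key in memory."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


def streaming_group_sum(sorted_rows):
    """Aggregation over input that is already sorted (or grouped) by key.

    Only one running total exists at a time; a group is emitted as soon
    as the key changes, so no hash table is built.
    """
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Hypothetical sample rows (user_id, clicks), already sorted by user_id.
rows = [(1, 5), (1, 2), (2, 7), (3, 1), (3, 4)]
result = list(streaming_group_sum(rows))
```

Both functions return the same totals for this input. The streaming version is only correct when equal keys are adjacent, which is the property the slide proposes to exploit.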