Facebook Intern Presentation V0.1

Published in: Technology, Spiritual

  1. 1.
  2. 2. Hive and Hadoop in Data-Infra (Yongqiang He, 12/09/09)
  3. 3. Agenda
  4. 4. RCFile (columnar storage on Hive)
       • Open source effort
       • Effect
           • Saves up to 30% of storage space, more than 20% on average
           • Reduces I/O, CPU cost, and memory use
           • What else can we save?
           • A user outside Facebook reports that reads are 8 times faster
       • Being deployed to the Facebook Hadoop/Hive cluster now
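The I/O saving claimed for columnar storage can be illustrated with a toy model. The sketch below is not RCFile's actual on-disk format; the function names, the sample columns, and the use of `zlib` are assumptions made for illustration only. Rows are split into groups, each column inside a group is compressed on its own, and a query that needs one column decompresses only that column's bytes.

```python
import zlib

def write_row_groups(rows, columns, group_size):
    """Split rows into groups; within each group, store every column
    as its own compressed blob (hypothetical layout, not RCFile's)."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        group = {}
        for col in columns:
            # Values are assumed to contain no newline characters.
            values = "\n".join(str(row[col]) for row in chunk)
            group[col] = zlib.compress(values.encode("utf-8"))
        groups.append(group)
    return groups

def read_column(groups, col):
    """Decompress only the requested column.
    Returns (values as strings, compressed bytes read)."""
    values, bytes_read = [], 0
    for group in groups:
        blob = group[col]
        bytes_read += len(blob)
        values.extend(zlib.decompress(blob).decode("utf-8").split("\n"))
    return values, bytes_read

# Made-up sample data with three columns.
rows = [
    {"user_id": i, "country": "US" if i % 2 else "IN", "url": f"/page/{i % 7}"}
    for i in range(1000)
]
columns = ["user_id", "country", "url"]

groups = write_row_groups(rows, columns, group_size=250)
countries, bytes_read = read_column(groups, "country")
total_bytes = sum(len(blob) for group in groups for blob in group.values())
```

Here `bytes_read` covers only the `country` blobs, while `total_bytes` is what a row-oriented reader would have to scan to answer the same query.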
  5. 5. Harness sort/bucket property
       • Data is grouped, and sometimes sorted
           • But this property is not used right now
           • Why is it useful?
           • Okay, CPU and memory
       • Effect
           • An optimization drops CPU costs to half
           • Group-by operator in Hive (group by is used everywhere at Facebook)
           • Also eases memory
           • Deployed to the Facebook Hadoop/Hive cluster now
  6. 6. Future work
       • Join skew optimization
           • One of the most common problems at Facebook
       • Indexing
           • Why is indexing useful?
           • What's our plan?
       • Data sharing
           • Why is this useful?
       • Diagram labels: Piece 1, Piece 2, Piece 3, Piece 4; "huge"; "small"; "Break this and do it on multiple machines in parallel"
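Returning to the sort/bucket slide: the CPU and memory savings it mentions for group by can be illustrated generically. The sketch below is not Hive's group-by operator; the function names and sample rows are hypothetical. It contrasts a hash-based count, which keeps one entry per distinct key in memory, with a streaming count that relies on the input already being grouped by key and therefore holds only one running group at a time.

```python
from itertools import groupby
from operator import itemgetter

def hash_group_count(rows):
    """Works on any input order, but keeps every distinct key in memory."""
    counts = {}
    for key, _value in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts

def sorted_group_count(rows):
    """Assumes rows arrive already grouped by key (for example, sorted).
    Emits each (key, count) as soon as the key changes."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(1 for _ in group)

# Made-up input that is already grouped by its first field.
rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5), ("c", 6)]

streamed = list(sorted_group_count(rows))
hashed = hash_group_count(rows)
```

Both approaches give the same counts on grouped input; the streaming version skips the hash-table lookups and never holds more than the current key.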
  7. 7. Acknowledgement
       • Namit Jain
       • Zheng Shao
       • Joydeep Sen Sarma
       • Ning Zhang
       • Prasad Chakka
       • Dhruba Borthakur
       • Suresh Antony
       • Ashish Thusoo
