Let's Talk Operations! (Hadoop Summit 2014)

These are the introductory slides I used (in some form or another) for the Let's Talk Operations! sessions for the 2014 Hadoop Summits. No video for this one!

Published in: Technology

  1. Let’s Talk Operations! Allen Wittenauer
  2. Twitter: @_a__w_ Email: aw @ apache.org
  3. How many individual grids should I have?
  4. One big grid vs. grid per project
     • One big grid
       • Pros: lower ops overhead; one location for all data
       • Cons: Dev and Prod on one system
     • Grid per project
       • Pros: capacity planning per project
       • Cons: more headcount to maintain; multiple copies of data; data ingress is a mess
  5. (Diagram) Data Center: Production, ETL, Development
  6. (Diagram) ETL, Dev, Prod; Event Feeds, Database Feeds; Base ETL Pull (three times); Post-Processed Data
  7. (Diagram) DC1, DC2: Production, ETL, Development
  8. How do I solve some common distcp issues?
  9. Common distcp issues and some tricks
     • Common issues
       • Version incompatibilities
       • Network bandwidth consumption
     • Some tricks
       • Use WebHDFS
         • All modern versions support it
         • Read and write in both directions
       • Create a separate queue with hard limits
       • Pull from larger, push from smaller
  10. Q&A Allen Wittenauer Twitter: @_a__w_ Email: aw @ apache.org
  11. Bonus Slide!
  12. (Diagram) Disk partitioning
     • root partitioning: 20 GB /, ...; 200 GB task space; (rest) HDFS
     • non-root partitioning: 5 GB swap; 200 GB task space; (rest) HDFS
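
The distcp tricks on slide 9 (read over WebHDFS, run the copy in its own queue, keep bandwidth in check) could be combined roughly as in the sketch below. It is not from the slides. It assumes a Hadoop 2.x DistCp, where the -m and -bandwidth options and the mapreduce.job.queuename property are available; the queue name "distcp", the host names, the port 50070, and the helper build_distcp_command are all hypothetical placeholders. The hard limits on the queue itself would still have to be set in the scheduler configuration, which this sketch does not touch.

```python
from typing import List


def build_distcp_command(
    src_host: str,
    src_path: str,
    dst_uri: str,
    queue: str = "distcp",           # hypothetical dedicated queue name
    max_maps: int = 20,              # cap on simultaneous copy tasks
    bandwidth_mb_per_map: int = 10,  # per-map throttle in MB/s
    webhdfs_port: int = 50070,       # assumed NameNode HTTP port
) -> List[str]:
    """Build an argv list for a throttled distcp that reads over WebHDFS."""
    if not src_path.startswith("/"):
        raise ValueError("src_path must be an absolute path")
    # Reading via webhdfs:// sidesteps RPC version mismatches between grids.
    source = f"webhdfs://{src_host}:{webhdfs_port}{src_path}"
    return [
        "hadoop", "distcp",
        f"-Dmapreduce.job.queuename={queue}",
        "-m", str(max_maps),
        "-bandwidth", str(bandwidth_mb_per_map),
        source,
        dst_uri,
    ]


if __name__ == "__main__":
    cmd = build_distcp_command(
        "nn1.example.com",
        "/data/events",
        "hdfs://nn2.example.com:8020/data/events",
    )
    print(" ".join(cmd))
```

The function only assembles the command line; passing the list to subprocess.run on a gateway host would launch the copy. Worst-case network use is roughly max_maps times bandwidth_mb_per_map, so the two values are best chosen together.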
