
Let's Talk Operations! (Hadoop Summit 2014)

These are the introductory slides I used (in some form or another) for the Let's Talk Operations! sessions at the 2014 Hadoop Summits. No video for this one!

  1. Let’s Talk Operations! Allen Wittenauer
  2. Twitter: @_a__w_ • Email: aw @ apache.org
  3. How many individual grids should I have?
  4. One big grid vs. grid per project
     • One big grid
        • Pros: lower ops overhead; one location for all data
        • Cons: Dev and Prod on one system
     • Grid per project
        • Pros: capacity planning per project
        • Cons: more headcount to maintain; multiple copies of data; data ingress is a mess
  5. [Diagram: one data center containing Production, ETL, and Development]
  6. [Diagram: ETL, Dev, and Prod grids; labels: Event Feeds, Database Feeds, Base ETL Pull (×3), Post-Processed Data]
  7. [Diagram: Production, ETL, and Development split across two data centers, DC1 and DC2]
  8. How do I solve some common distcp issues?
  9. • Common issues
        • Version incompatibilities
        • Network bandwidth consumption
     • Some tricks
        • Use WebHDFS
           • All modern versions support it
           • Read and write in both directions
        • Create a separate queue with hard limits
        • Pull from larger, push from smaller
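You can combine the WebHDFS and separate-queue tricks in one distcp invocation. The Python sketch below shows one way to build that `hadoop distcp` command line. It is not taken from the talk and rests on several assumptions:

- Hadoop 2, where the queue property is `mapreduce.job.queuename`.
- DistCp's `-m` option (maximum simultaneous copies) and `-bandwidth` option (per-map limit, MB/s).
- The hostnames, ports, paths, and the queue name `distcp` are hypothetical.

The hard limits themselves belong in the scheduler configuration. For example, you could define a Capacity Scheduler queue whose `maximum-capacity` equals its `capacity`. This sketch does not generate that configuration.

```python
import shlex
from typing import List, Optional


def build_distcp_argv(
    src: str,
    dst: str,
    queue: str,
    max_maps: Optional[int] = None,
    bandwidth_mb: Optional[int] = None,
) -> List[str]:
    """Build a `hadoop distcp` argv that crosses clusters over WebHDFS
    and runs in a dedicated scheduler queue (Hadoop 2 property name)."""
    # WebHDFS is an HTTP API, so it sidesteps Hadoop RPC version mismatches.
    # It supports both reads and writes, so either endpoint may be remote.
    if not (src.startswith("webhdfs://") or dst.startswith("webhdfs://")):
        raise ValueError("expected at least one webhdfs:// endpoint")
    # Generic -D options go before DistCp's own options.
    argv = ["hadoop", "distcp", f"-Dmapreduce.job.queuename={queue}"]
    if max_maps is not None:
        argv += ["-m", str(max_maps)]  # maximum simultaneous copies
    if bandwidth_mb is not None:
        argv += ["-bandwidth", str(bandwidth_mb)]  # per-map limit, MB/s
    return argv + [src, dst]


if __name__ == "__main__":
    # Hypothetical clusters, paths, and queue name.
    argv = build_distcp_argv(
        "webhdfs://nn.dc1.example.com:50070/data/events",
        "hdfs://nn.dc2.example.com:8020/data/events",
        queue="distcp",
        max_maps=20,
        bandwidth_mb=10,
    )
    print(shlex.join(argv))  # pass argv itself to subprocess.run() to execute
```

The function returns an argv list rather than a shell string. You can hand that list straight to `subprocess.run()` without any quoting concerns. `shlex.join` (Python 3.8+) is used only to display the command.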
  10. Q&A: Allen Wittenauer • Twitter: @_a__w_ • Email: aw @ apache.org
  11. Bonus slide
  12. • Root partitioning: 20 GB /, ..., 200 GB task space, (rest) HDFS
      • Non-root partitioning: 5 GB swap, 200 GB task space, (rest) HDFS
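The bonus slide's layout reduces to simple arithmetic. Fixed-size partitions come first, and HDFS gets whatever remains on each disk. Below is a minimal Python sketch of that arithmetic. It reads the slide as one root disk plus identical non-root disks per node. The following are hypothetical parameters, not values from the talk:

- the disk size;
- the disk count;
- the size of the partitions the slide elides as "...".

```python
from typing import Dict

# Fixed partition sizes in GB, as listed on the slide; HDFS takes the rest.
NON_ROOT_FIXED = {"swap": 5, "task space": 200}


def root_fixed(elided_gb: int) -> Dict[str, int]:
    """Fixed partitions on the root disk.

    elided_gb is a hypothetical stand-in for whatever the slide's "..." covers.
    """
    return {"/": 20, "elided (...)": elided_gb, "task space": 200}


def layout_disk(disk_gb: int, fixed: Dict[str, int]) -> Dict[str, int]:
    """Return every partition size on one disk, with HDFS as the remainder."""
    used = sum(fixed.values())
    if used >= disk_gb:
        raise ValueError(
            f"{used} GB of fixed partitions leaves no room for HDFS on a {disk_gb} GB disk"
        )
    return {**fixed, "HDFS": disk_gb - used}


def node_hdfs_gb(disk_gb: int, n_disks: int, elided_gb: int) -> int:
    """Total HDFS space on a node with one root disk and n_disks - 1 non-root disks."""
    if n_disks < 1:
        raise ValueError("a node needs at least the root disk")
    root = layout_disk(disk_gb, root_fixed(elided_gb))["HDFS"]
    non_root = layout_disk(disk_gb, NON_ROOT_FIXED)["HDFS"]
    return root + (n_disks - 1) * non_root


if __name__ == "__main__":
    # Hypothetical node: twelve 2000 GB disks, 30 GB hidden behind "...".
    print(layout_disk(2000, NON_ROOT_FIXED))  # {'swap': 5, 'task space': 200, 'HDFS': 1795}
    print(node_hdfs_gb(2000, 12, elided_gb=30))  # 21495
```

Task space is a fixed size on every disk. As a result, HDFS rather than task space absorbs any difference in drive size.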
