Let’s Talk Operations!
Allen Wittenauer
Twitter: @_a__w_
Email: aw @ apache.org
Let's Talk Operations! (Hadoop Summit 2014)
These are the introductory slides I used (in some form or another) for the Let's Talk Operations! sessions for the 2014 Hadoop Summits. No video for this one!

Published in: Technology
Transcript of "Let's Talk Operations! (Hadoop Summit 2014)"

  1. Let’s Talk Operations! Allen Wittenauer
  2. Twitter: @_a__w_ Email: aw @ apache.org
  3. How many individual grids should I have?
  4. One big grid vs. grid per project
     One big grid
       • Pros
         • Lower ops overhead
         • One location for all data
       • Cons
         • Dev and Prod on one system
     Grid per project
       • Pros
         • Capacity planning per project
       • Cons
         • More headcount to maintain
         • Multiple copies of data
         • Data ingress is a mess
  5. [Diagram] Data Center: Production, ETL, Development
  6. [Diagram] ETL, Dev, Prod; labels: Base ETL Pull (three times), Event Feeds, Database Feeds, Post-Processed Data
  7. [Diagram] DC1 and DC2: Production, ETL, Development
  8. How do I solve some common distcp issues?
  9. Common distcp issues and some tricks
       • Common issues
         • Version incompatibilities
         • Network bandwidth consumption
       • Some tricks
         • Use WebHDFS
           • All modern versions support it
           • Read and write in both directions
         • Create a separate queue with hard limits
         • Pull from larger, push from smaller
  10. Q&A. Allen Wittenauer. Twitter: @_a__w_ Email: aw @ apache.org
  11. Bonus Slide!
  12. Disk partitioning
       • root partitioning: 20 GB /, ..., 200 GB task space, (rest) HDFS
       • non-root partitioning: 5 GB swap, 200 GB task space, (rest) HDFS
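
The distcp tricks on slide 9 (use WebHDFS, run in a separate queue with limits) can be sketched as a small helper that assembles the command line. This is only an illustration, not something from the slides: the function name, host names, ports, paths, and queue name are hypothetical, and the queue property shown is the Hadoop 2.x name (`mapreduce.job.queuename`). The helper only builds the argument list; it does not run anything.

```python
from typing import List, Optional


def build_distcp_command(
    src: str,
    dst: str,
    queue: str,
    max_maps: int,
    bandwidth_mb_per_map: Optional[int] = None,
) -> List[str]:
    """Assemble a `hadoop distcp` argument list (hypothetical helper)."""
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    cmd = [
        "hadoop", "distcp",
        # Submit the copy job to a dedicated queue (Hadoop 2.x property name).
        f"-Dmapreduce.job.queuename={queue}",
        # Cap the number of simultaneous copy tasks.
        "-m", str(max_maps),
    ]
    if bandwidth_mb_per_map is not None:
        # Per-map bandwidth cap in MB/s, to limit network consumption.
        cmd += ["-bandwidth", str(bandwidth_mb_per_map)]
    cmd += [src, dst]
    return cmd


# Example values are assumptions: a webhdfs:// source sidesteps RPC version
# mismatches between clusters because it goes over HTTP.
example = build_distcp_command(
    src="webhdfs://nn-a.example.com:50070/data/events",
    dst="hdfs://nn-b.example.com:8020/data/events",
    queue="distcp",
    max_maps=20,
    bandwidth_mb_per_map=10,
)
print(" ".join(example))
```

Note that `-m` and `-bandwidth` only limit a single job. The hard limit on the queue itself would be set in the cluster's scheduler configuration, which is not shown here.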