Managing Virtual Sprawl

Managing Virtual Sprawl: Presentation Transcript

  • Managing Virtual Sprawl (How to not let this happen to you) Jeremy Hitchcock,
  • Why care? • What you have • What you want • What you got
  • Managing clouds like managing single systems increases "system" management by 10x-20x
  • Clouds Promise • Greater efficiency • Faster deploys/less management • Little/no capital costs and no step functions
  • Sprawl Eats Potential • Greater efficiency • Faster deploys/less management • Little/no capital costs and no step functions
  • Just not good, yet 15 years 3 years
  • Don’t just change broken light bulbs
  • Wait until it gets dark, then change them all
  • Let’s get started 1. Architectures 2. Pain points 3. Best practices 4. What do we get?
  • 1: Architectures • Architecture changes • Decoupling • Geography/load balancing • Disaster recovery
  • 2004
  • 2007-2008
  • Opera dynamic resource pricing model
  • Decoupling • Apps and infrastructure mirror each other • Years of coupled development • Hard to retrofit, easier to do from start
  • Decoupling • Old: Web, App, DB • New: Processing, Dispatcher, Storage
  • Decoupling is Hard • Logging/debugging • Common scratch • Images and provisioning • Configuration data (run/boot) • Job dispatch (async/sync)
  • Images and provisioning • Add __ new front ends • Publish new code • Even better if that is automatic
  • Configuration Data • Most config data is on each image • Instead, auto populate into source control • Config, image, controller re-architected
  • Job Dispatch (sync) 1. Request for photo 2. Read photo off disc 3. Resize/reformat 4. Log 5. Return photo to user
  • Job Dispatch (async) 1. Request for photo 2. Read photo off disc 3. Resize/reformat 4. Log 5. Return photo to user
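The two dispatch slides above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: `read_photo` and `resize` are hypothetical stand-ins for the real disk read and image processing, and the queue-plus-worker-threads design is an assumption about what the "Dispatcher" box means.

```python
import queue
import threading


def read_photo(name):
    # Hypothetical stand-in for reading the photo off disk.
    return f"raw:{name}"


def resize(photo):
    # Hypothetical stand-in for the resize/reformat step.
    return photo.replace("raw:", "resized:")


def handle_sync(name, log):
    # Synchronous dispatch: the request handler runs every step in order
    # and the caller waits for all of them.
    photo = resize(read_photo(name))
    log.append(name)
    return photo


def worker(jobs, results, log):
    # Asynchronous dispatch: workers pull jobs from a shared queue,
    # so the steps are decoupled from the code that accepted the request.
    while True:
        name = jobs.get()
        if name is None:  # sentinel: no more work for this worker
            break
        results[name] = resize(read_photo(name))
        log.append(name)
        jobs.task_done()


def handle_async(names, n_workers=2):
    jobs, results, log = queue.Queue(), {}, []
    threads = [
        threading.Thread(target=worker, args=(jobs, results, log))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for name in names:
        jobs.put(name)
    jobs.join()  # wait until every queued photo has been processed
    for _ in threads:
        jobs.put(None)  # tell each worker to stop
    for t in threads:
        t.join()
    return results, sorted(log)
```

In the asynchronous version, adding front ends or workers only changes `n_workers`; the request path itself does not need to know how many machines do the resizing.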
  • Geography/load balancing • Data centers do not house eyeballs • Intra/inter-site load balancing • Names to numbers (users think in names) • Between clouds/interoperability?
  • Disaster Recovery • Practice them • Failovers should be automatic • DNS (Quick DNS nit: use short TTLs) • Contingency plans
  • Case Study:
  • Case Study:
  • Case Study:
    ; QUESTION SECTION:
    ; IN A
    ;; ANSWER SECTION:
    86400 IN A
    86400 IN A
  • Case Study:
  • Case Study:
    ; QUESTION SECTION:
    ; IN A
    ;; ANSWER SECTION:
    86400 IN A
    86400 IN A
    GAH!
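The reason the 86400-second TTL in the case study earns a "GAH!" is simple arithmetic: a resolver that cached the old A record just before a failover may keep serving it for up to one full TTL. The sketch below makes that concrete. The function names, the 300-second TTL, and the 60-second detection delay are illustrative assumptions, and real resolvers do not always honor TTLs exactly.

```python
def worst_case_failover_seconds(ttl_seconds, detection_seconds=0):
    # Upper bound on how long some users may still reach the dead address:
    # time to notice the failure plus one full TTL of cached answers.
    return detection_seconds + ttl_seconds


def format_duration(seconds):
    # Render a number of seconds as hours, minutes, and seconds.
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


# TTL from the case study: cached answers can linger for a full day.
print(format_duration(worst_case_failover_seconds(86400)))  # 24h 0m 0s

# Assumed short TTL of 300 s with an assumed 60 s detection delay.
print(format_duration(worst_case_failover_seconds(300, 60)))  # 0h 6m 0s
```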
  • 2: Pain Points • Inventory • Delivery speed • Supply/demand • Configuration • Points of failure
  • “I can ping it but I don’t know where it is!”
  • Inventory • Does it matter? • Not an asset tag but provisioning scripts • Audit bills (operational costs)
  • Delivery Speed • May actually suffer (more pieces, not iron) • Be analytical about what can be slow • Limiting factor of what’s virtualized • Were you looking before?
  • Delivery Speed (graph from Gomez) • Where is the testing from? • Is this load dependent? • Do users notice/care? • Does it matter? • Cost to make it faster? • Savings to make it slower?
  • Supply/demand • Capital investments versus operating costs • Big architecture changes to constant tuning • Sampling time
  • Configuration • Configuration in source control • Has to move to a centralized location • Patches, updates, revision images • Lot of hard work here (no return)
  • Points of Failure • It’s about risk • All in the name, DNS • 99.9% is different from 99.99% • Any page is better than nothing
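The gap between 99.9% and 99.99% is easier to see as allowed downtime. A small sketch, assuming a 365-day year and a hypothetical helper name:

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    # Minutes in the period multiplied by the fraction of time
    # the service is permitted to be unavailable.
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.9% allows roughly 8.8 hours of downtime per year, while 99.99% allows under an hour, which is why the extra nine changes the level of risk being accepted.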
  • 3: Best Practices • App rewrite • Controller (code, monitoring) • Configuration (Chef, Puppet, etc.) • Dev/staging/production (Django/Rails) • Security • Monitoring and verification
  • Dev/Staging/Production • This stuff works, use it • Clouds make this possible • ONLY exception is load testing (big exception) • Nothing is going to work out of the box
  • Security • No “behind the firewall” • Not an afterthought, core feature • Something to test • Two hash encryption (private data) • Centralized management makes security easier (At least double or nothing)
  • Monitoring and Verification • What your user sees • What you monitor • Are they the same? • Test transactions
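One way to read "test transactions" is a synthetic check that performs the same request a user would and verifies the content, not just that the host answers. The sketch below is an assumption about what such a check could look like: `run_test_transaction` and the injected `fetch` callable (returning a status code and body) are hypothetical, so that any HTTP client can be plugged in.

```python
def run_test_transaction(fetch, path, expected_text):
    """Return (ok, reason) for one synthetic user transaction."""
    try:
        status, body = fetch(path)
    except Exception as exc:
        # The request never completed: DNS, network, or server failure.
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        # The server answered, but the user would not see what they expect.
        return False, "page loaded but expected content is missing"
    return True, "ok"


def fake_fetch(path):
    # Stand-in for a real HTTP client; returns a healthy-looking
    # response whose body is not what the user should see.
    return 200, "<html>maintenance page</html>"


print(run_test_transaction(fake_fetch, "/photos/1", "Photo 1"))
# (False, 'page loaded but expected content is missing')
```

A plain uptime check would report this server as healthy; the test transaction reports the failure the user actually experiences.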
  • 4: What do we get? • More choice on availability • Fewer step functions (capacity, cost) • Reduce computational marginal cost
  • Final Remarks • Sprawl eats away at the promised good • Never truly decoupled, apps dictate arch • Management tools still lacking, more homegrown • Make it all automatic, not easy
  • Questions? Jeremy Hitchcock, • [...] offers a suite of DNS, email, domain registration and virtual servers for the home and small business user. • The Dynect Platform provides the enterprise with external managed DNS and traffic management services.