Managing Virtual Sprawl
         (How to not let this happen to you)




Jeremy Hitchcock, jeremy@dyn.com
Why care?


What you have


                 What you want   What you got
Managing clouds like managing single
   systems increases "system"
     management by 10x-20x
Clouds Promise

• Greater efficiency
• Faster deploys/less management
• Little/no capital costs and no step functions
Sprawl Eats Potential

• Greater efficiency
• Faster deploys/less management
• Little/no capital costs and no step functions
Just not good, yet




15 years        3 years
Don’t just change
broken light bulbs
Wait until it gets dark,
then change them all
Let’s get started

1. Architectures
2. Pain points
3. Best practices
4. What do we get?
1: Architectures

• Architecture changes
• Decoupling
• Geography/load balancing
• Disaster recovery
2004
2007-2008
Opera dynamic resource pricing model
Decoupling

• Apps and infrastructure mirror each other
• Years of coupled development
• Hard to retrofit, easier to do from start
Decoupling
Old:
       Web          App        DB



New:
                          Processing

       Dispatcher
                          Storage
Decoupling is Hard

• Logging/debugging
• Common scratch
• Images and provisioning
• Configuration data (run/boot)
• Job dispatch (async/sync)
Images and provisioning
     Add __ new front ends

       Publish New Code



   Even better if that is automatic
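One way to fill in the blank automatically is to derive the number of front ends from measured load. The sketch below is only an illustration: the function name, the per-front-end capacity, and the 25% headroom are assumptions, not figures from the talk.

```python
import math


def front_ends_needed(requests_per_sec, capacity_per_front_end, headroom=0.25):
    # Size the pool for current load plus a safety margin (assumed 25%).
    target = requests_per_sec * (1 + headroom)
    return max(1, math.ceil(target / capacity_per_front_end))


current_front_ends = 4                      # hypothetical running count
needed = front_ends_needed(900, 200)        # hypothetical load and capacity
to_add = max(0, needed - current_front_ends)
```

A controller could run a calculation like this on a schedule and hand `to_add` to whatever provisioning script launches new images.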
Configuration Data

• Most config data is on each image
• Instead, auto populate into source control
• Config, image, controller re-architected
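Once a canonical copy of the configuration lives in source control, each image can be checked against it. A minimal sketch, assuming configs can be represented as plain dictionaries; the node names and settings are hypothetical.

```python
import hashlib
import json


def fingerprint(config):
    # Serialize with sorted keys so equal configs hash identically.
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def find_drift(canonical, nodes):
    # Return the names of nodes whose config differs from the canonical copy.
    want = fingerprint(canonical)
    return sorted(name for name, cfg in nodes.items() if fingerprint(cfg) != want)


# Hypothetical canonical config (as checked into source control).
canonical = {"workers": 4, "log_level": "info"}

# Hypothetical configs read back from running images.
nodes = {
    "web-1": {"log_level": "info", "workers": 4},
    "web-2": {"workers": 8, "log_level": "info"},
}

drifted = find_drift(canonical, nodes)
```

Here `web-2` is reported because someone changed `workers` on the image rather than in source control.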
Job Dispatch (sync)
     Request for photo

     Read photo off disc

      Resize/reformat

            Log

    Return photo to user
Job Dispatch (async)
     1 Request for photo
     2 Read photo off disc
     3 Resize/reformat
     4 Log
     5 Return photo to user
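The slide does not say which steps run asynchronously, so the sketch below takes one possible reading: logging is handed to a queue and a background worker so the request path does not wait on it. All names (`read_photo`, `resize`, `handle_request`) and the in-memory `DISK` dictionary are hypothetical stand-ins.

```python
import queue
import threading

# Hypothetical stand-in for photo storage; the talk reads from disc.
DISK = {"cat.jpg": b"raw-cat-bytes"}

log_queue = queue.Queue()   # log entries are dispatched, not written inline
logged = []                 # what the background logger has written


def read_photo(name):
    return DISK[name]


def resize(data):
    # Placeholder transform; a real worker would call an image library.
    return data[:7]


def log_worker():
    # Drains log entries off the request path until it sees the sentinel.
    while True:
        entry = log_queue.get()
        if entry is None:
            break
        logged.append(entry)


def handle_request(name):
    data = read_photo(name)            # 2. read photo off disc
    small = resize(data)               # 3. resize/reformat
    log_queue.put(("served", name))    # 4. log, without waiting on it
    return small                       # 5. return photo to user


worker = threading.Thread(target=log_worker)
worker.start()
result = handle_request("cat.jpg")     # 1. request for photo
log_queue.put(None)                    # sentinel: tell the logger to stop
worker.join()
```

The same pattern extends to the resize step: put the job on a queue and let a pool of processing workers pick it up, which is where the dispatcher in the decoupled architecture comes in.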
Geography/load balancing

 • Data centers do not house eyeballs
 • Intra/inter-site load balancing
 • Names to numbers (users think names)
 • Between clouds/interoperability?
Disaster Recovery

• Practice them
• Failovers should be automatic
• DNS (Quick DNS nit: use short TTLs)
• Contingency plans
Case Study: Authorize.net

;; QUESTION SECTION:
;secure.authorize.net.           IN   A

;; ANSWER SECTION:
secure.authorize.net.   86400   IN   A   64.94.118.32
secure.authorize.net.   86400   IN   A   64.94.118.33

GAH!
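The records in the case study carry a TTL of 86400 seconds, which is what the "short TTLs" nit is about. The sketch below parses dig-style answer lines and works out how long a resolver may keep serving the old addresses after a failover. The 300-second limit is an assumed threshold for illustration, not a recommendation from the talk.

```python
SHORT_TTL_LIMIT = 300  # assumed threshold, in seconds


def parse_answers(text):
    # Parse dig-style answer lines: name, TTL, class, type, value.
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and dig comment/section lines
        name, ttl, rclass, rtype, value = line.split()
        records.append((name, int(ttl), rtype, value))
    return records


def worst_case_stale_seconds(records):
    # A resolver may keep serving a cached answer for up to the TTL.
    return max(ttl for _, ttl, _, _ in records)


answer = """
;; ANSWER SECTION:
secure.authorize.net. 86400 IN A 64.94.118.32
secure.authorize.net. 86400 IN A 64.94.118.33
"""

records = parse_answers(answer)
stale_hours = worst_case_stale_seconds(records) / 3600
too_long = [value for _, ttl, _, value in records if ttl > SHORT_TTL_LIMIT]
```

With a one-day TTL, changing the A records during an outage can take up to 24 hours to reach every cached resolver.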
2: Pain Points

• Inventory
• Delivery speed
• Supply/demand
• Configuration
• Points of failure
“I can ping it but I don’t know where it is!”
Inventory

• Does it matter?
• Not an asset tag but provisioning scripts
• Audit bills (operational costs)
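If provisioning scripts are the inventory, the bill can be audited against them. A minimal sketch, assuming provisioning records and billing line items are available as simple lists; the instance ids, roles, and costs are made up.

```python
from collections import defaultdict

# Hypothetical provisioning records: (instance_id, role, hourly_cost_usd)
provisioned = [
    ("i-001", "web", 0.10),
    ("i-002", "web", 0.10),
    ("i-003", "db", 0.40),
]

# Hypothetical billing line items: instance ids the provider charged for.
billed = ["i-001", "i-002", "i-003", "i-999"]


def inventory_by_role(records):
    # Group instance ids by the role the provisioning script assigned.
    roles = defaultdict(list)
    for instance_id, role, _ in records:
        roles[role].append(instance_id)
    return dict(roles)


def unaccounted(records, billed_ids):
    # Instances that appear on the bill but not in provisioning records.
    known = {instance_id for instance_id, _, _ in records}
    return sorted(set(billed_ids) - known)


inventory = inventory_by_role(provisioned)
orphans = unaccounted(provisioned, billed)
```

Anything in `orphans` is sprawl: a machine someone is paying for that no script knows about.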
Delivery Speed

• May actually suffer (more pieces, not iron)
• Be analytical about what can be slow
• Limiting factor of what’s virtualized
• Were you looking before?
Delivery Speed
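• Where is the testing from?
• Is this load dependent?
• Do users notice/care?
• Does it matter?
• Cost to make it faster?
• Savings to make it slower?
(Graph from Gomez)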
Supply/demand
• Capital investments versus operating costs
• Big architecture changes to constant tuning
• Sampling time
Configuration

• Configuration in source control
• Has to move to a centralized location
• Patches, updates, revision images
• Lot of hard work here (no return)
Points of Failure

• It’s about risk
• All in the name, DNS
• 99.9% is different from 99.99%
• Any page is better than nothing
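The gap between 99.9% and 99.99% is easier to see as allowed downtime per year. A short worked calculation, assuming a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    # Fraction of the year the service is allowed to be down.
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


three_nines = downtime_hours_per_year(99.9)     # roughly 8.76 hours
four_nines = downtime_hours_per_year(99.99)     # roughly 0.876 hours
four_nines_minutes = four_nines * 60            # roughly 53 minutes
```

One extra nine shrinks the yearly budget from most of a working day to under an hour, which is why the two targets call for different amounts of risk and spend.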
3: Best Practices
• App rewrite
• Controller (code, monitoring)
• Configuration (chef, puppet, etc)
• Dev/staging/production (Django/Rails)
• Security
• Monitoring and verification
Dev/Staging/Production

• This stuff works, use it
• Clouds make this possible
• ONLY exception is load testing (big exception)
• Nothing is going to work out of the box
Security
• No “behind the firewall”
• Not an afterthought, core feature
• Something to test
• Two hash encryption (private data)
• Centralized management makes security easier (At least double or nothing)
Monitoring and Verification




 What your user sees   What you monitor


Are they the same? Test transactions
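A test transaction checks what the user would actually see, not just whether the server answers. A minimal sketch with no real network calls: `fake_fetch` is a hypothetical stand-in for an HTTP client, and the path and expected text are invented.

```python
def run_test_transaction(fetch, path, expected_text):
    # fetch is any callable returning (status_code, body) for a path.
    status, body = fetch(path)
    return {
        "reachable": status is not None,
        "status_ok": status == 200,
        "content_ok": status == 200 and expected_text in body,
    }


def fake_fetch(path):
    # Hypothetical stand-in for an HTTP client: the server answers,
    # but the page shows an error instead of the checkout form.
    return 200, "<html>Database connection failed</html>"


result = run_test_transaction(fake_fetch, "/checkout", "Place order")
```

A ping or status-code monitor would report this service as healthy; only the content check matches what the user sees.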
4: What do we get?

• More choice on availability
• Fewer step functions (capacity, cost)
• Reduce computational marginal cost
Final Remarks

• Sprawl eats away at the promised good
• Never truly decoupled, apps dictate arch
• Management tools still lacking, more homegrown
• Make it all automatic, not easy
Questions?
             Jeremy Hitchcock, jeremy@dyn.com




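DynDNS.com offers a suite of DNS, email, domain registration and virtual servers for the home and small business user.
The Dynect Platform provides the enterprise with external managed DNS and traffic management services.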
http://twitter.com/jhitchco

Managing Virtual Sprawl

  1. Managing Virtual Sprawl (How to not let this happen to you) Jeremy Hitchcock, jeremy@dyn.com
  2. Why care? What you have What you want What you got
  3. Managing clouds like managing single systems increases "system" management by 10x-20x
  4. Clouds Promise • Greater efficiency • Faster deploys/less management • Little/no capital costs and no step functions
  5. Sprawl Eats Potential • Greater efficiency • Faster deploys/less management • Little/no capital costs and no step functions
  6. Just not good, yet 15 years 3 years
  7. Don’t just change broken light bulbs
  8. Wait until it gets dark, then change them all
  9. Let’s get started 1. Architectures 2. Pain points 3. Best practices 4. What do we get?
  10. 1: Architectures • Architecture changes • Decoupling • Geography/load balancing • Disaster recovery
  11. 2004
  12. 2007-2008
  13. Opera dynamic resource pricing model
  14. Decoupling • Apps and infrastructure mirror each other • Years of coupled development • Hard to retrofit, easier to do from start
  15. Decoupling Old: Web App DB New: Processing Dispatcher Storage
  16. Decoupling is Hard • Logging/debugging • Common scratch • Images and provisioning • Configuration data (run/boot) • Job dispatch (async/sync)
  17. Images and provisioning Add __ new front ends Publish New Code Even better if that is automatic
  18. Configuration Data • Most config data is on each image • Instead, auto populate into source control • Config, image, controller re-architected
  19. Job Dispatch (sync) Request for photo Read photo off disc Resize/reformat Log Return photo to user
  20. Job Dispatch (async) 1 Request for photo 2 Read photo off disc 3 Resize/reformat 4 Log 5 Return photo to user
  21. Geography/load balancing • Data centers do not house eyeballs • Intra/inter-site load balancing • Names to numbers (users think names) • Between clouds/interoperability?
  22. Disaster Recovery • Practice them • Failovers should be automatic • DNS (Quick DNS nit: use short TTLs) • Contingency plans
  23. Case Study: Authorize.net
  24. Case Study: Authorize.net
  25. Case Study: Authorize.net ;; QUESTION SECTION: ;secure.authorize.net. IN A ;; ANSWER SECTION: secure.authorize.net. 86400 IN A 64.94.118.32 secure.authorize.net. 86400 IN A 64.94.118.33
  26. Case Study: Authorize.net
  27. Case Study: Authorize.net ;; QUESTION SECTION: ;secure.authorize.net. IN A ;; ANSWER SECTION: secure.authorize.net. 86400 IN A 64.94.118.32 secure.authorize.net. 86400 IN A 64.94.118.33 GAH!
  28. 2: Pain Points • Inventory • Delivery speed • Supply/demand • Configuration • Points of failure
  29. “I can ping it but I don’t know where it is!”
  30. Inventory • Does it matter? • Not an asset tag but provisioning scripts • Audit bills (operational costs)
  31. Delivery Speed • May actually suffer (more pieces, not iron) • Be analytical about what can be slow • Limiting factor of what’s virtualized • Were you looking before?
  32. Delivery Speed • Where is the testing from? • Is this load dependent? • Do users notice/care? • Does it matter? • Cost to make it faster? • Savings to make it slower? (Graph from Gomez)
  33. Supply/demand • Capital investments versus operating costs • Big architecture changes to constant tuning • Sampling time
  34. Configuration • Configuration in source control • Has to move to a centralized location • Patches, updates, revision images • Lot of hard work here (no return)
  35. Points of Failure • It’s about risk • All in the name, DNS • 99.9% is different from 99.99% • Any page is better than nothing
  36. 3: Best Practices • App rewrite • Controller (code, monitoring) • Configuration (chef, puppet, etc) • Dev/staging/production (Django/Rails) • Security • Monitoring and verification
  37. Dev/Staging/Production • This stuff works, use it • Clouds make this possible • ONLY exception is load testing (big exception) • Nothing is going to work out of the box
  38. Security • No “behind the firewall” • Not an afterthought, core feature • Something to test • Two hash encryption (private data) • Centralized management makes security easier (At least double or nothing)
  39. Monitoring and Verification What your user sees What you monitor Are they the same? Test transactions
  40. 4: What do we get? • More choice on availability • Fewer step functions (capacity, cost) • Reduce computational marginal cost
  41. Final Remarks • Sprawl eats away at the promised good • Never truly decoupled, apps dictate arch • Management tools still lacking, more homegrown • Make it all automatic, not easy
  42. Questions? Jeremy Hitchcock, jeremy@dyn.com DynDNS.com offers a suite of DNS, email, domain registration and virtual servers for the home and small business user. The Dynect Platform provides the enterprise with external managed DNS and traffic management services.
