The Cloud is Broken
Those who ignore history are doomed to repeat it
Edgar Román
emroman@pbs.org
March 3rd, 2015
DC Python Meetup
Caveats, Disclaimers, etc.
• These are my opinions
• I am not yet omniscient, so my knowledge of the tools mentioned may be inaccurate
• We’re really talking about Cloud Orchestration
• For moderate to complex environments (my blog doesn’t count)
– Beyond web app / db
Our Architecture – V1
• Web App tier
– Runs code from git repo
• DB Master with slaves
– Schema hopefully managed by DDL in the repo (e.g., Django Migrations)
• Memcache/Redis layer
– Simple and self-configuring
• Celery Queue
– Asynchronous jobs, persistent queue
• Job worker pool
And more…
• Web App tier
– Lives in an Auto Scaling group
– Allows inbound TCP connections on ports 80/443 via a load balancer
• DB Master with slaves
– Only one inbound TCP port allowed
– Defined set of network connections for replication
• Memcache/Redis layer
– Access restricted to the Web App tier only
• Celery Queue
– Web App can queue jobs, workers can pop them
• Job worker pool
– No inbound access at all!
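The tier rules above can be captured as data in the repo instead of as clicks in a console. A minimal sketch, assuming tier names such as "lb" and "internet" and the port numbers 5432 (PostgreSQL), 6379 (Redis) and 5672 (RabbitMQ as the Celery broker), none of which the slides specify. It only checks whether a connection is permitted and does not call any cloud API.

```python
# Hypothetical, provider-neutral description of the V1 network rules.
# Each tier maps to the set of (source tier, TCP port) pairs it accepts.
# Port numbers are assumptions: 5432 PostgreSQL, 6379 Redis, 5672 RabbitMQ.
V1_INBOUND = {
    "lb": {("internet", 80), ("internet", 443)},  # public entry point
    "web": {("lb", 80), ("lb", 443)},             # only reachable via the load balancer
    "db": {("web", 5432), ("db", 5432)},          # one port; db -> db for replication
    "cache": {("web", 6379)},                     # Web App tier only
    "queue": {("web", 5672), ("worker", 5672)},   # web enqueues, workers pop
    "worker": set(),                              # no inbound access at all
}


def is_allowed(rules, src, dst, port):
    """Return True if tier `src` may open a TCP connection to tier `dst` on `port`."""
    return (src, port) in rules.get(dst, set())


print(is_allowed(V1_INBOUND, "lb", "web", 443))         # True
print(is_allowed(V1_INBOUND, "internet", "db", 5432))   # False
print(is_allowed(V1_INBOUND, "worker", "cache", 6379))  # False
```

Because the rules are plain data, they can be reviewed and versioned like code; a real deployment would still need to translate them into security groups or firewall rules.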
Then we evolve
• V2
– Adds Elasticsearch tier
• V3
– Adds nightly Hadoop batch
Add some environments…
• Production, Staging, QA
• Then the devs want a local copy to work on
The challenge
• Production is on V1
• V2 is in QA
• Devs working on V3
And I need to manage them all quickly and easily
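One way to keep three architecture versions and several environments straight is to describe each version as data that extends the previous one, and to pin every environment to a version. A minimal sketch; the tier names and dictionary layout are assumptions, and Staging is left out because the slides do not say which version it runs.

```python
# Hypothetical architecture versions: each later version extends the one before it.
ARCH_VERSIONS = {"v1": ("web", "db", "cache", "queue", "worker")}
ARCH_VERSIONS["v2"] = ARCH_VERSIONS["v1"] + ("elasticsearch",)
ARCH_VERSIONS["v3"] = ARCH_VERSIONS["v2"] + ("hadoop_nightly",)

# Each environment is pinned to exactly one architecture version.
ENVIRONMENTS = {"production": "v1", "qa": "v2", "dev": "v3"}


def tiers_for(env):
    """Return the tiers an environment should be running."""
    return ARCH_VERSIONS[ENVIRONMENTS[env]]


def missing_tiers(current_env, target_env):
    """Tiers that target_env has and current_env lacks, i.e. what a promotion must add."""
    return sorted(set(tiers_for(target_env)) - set(tiers_for(current_env)))


print(missing_tiers("production", "qa"))   # ['elasticsearch']
print(missing_tiers("production", "dev"))  # ['elasticsearch', 'hadoop_nightly']
```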
Philosophy Shift
• Olden days
– Used Visio to track changes to the physical hardware
• Now
– Use tools to track multiple environments or tiers in the cloud
• Why not
– Create the entire architecture as needed, preconfigured, and on-demand
If you create a single virtual entity in a cloud without a script, it is like writing a Perl script on a server somewhere without telling anyone
We’ve learned so much from software development. Why can’t we use this knowledge for cloud orchestration and management?
Modules / Decomposition
Versioning
Code Reuse / DRY
Abstraction
Compilation / Build Workflow
Modules / Decomposition
• We know from software:
– Grouping makes sense
– Helps organize logical sets of things
• What we have in cloud management:
– The default view of Chef management consoles is a flat list of nodes
– The vast majority of tutorials and examples put all hosts in a single network
– AWS EC2, Chef, and Ansible support optional grouping by tagging
• Conclusion: Poor holistic support
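Tags can at least be folded back into a hierarchy. A minimal sketch, assuming each node carries optional `env` and `tier` tags (node names and tag keys are hypothetical). It nests a flat node list into environment and tier groups and surfaces untagged nodes instead of hiding them.

```python
from collections import defaultdict

# Hypothetical flat node list, similar to a management console's default view.
NODES = [
    {"name": "web-1", "tags": {"env": "production", "tier": "web"}},
    {"name": "web-2", "tags": {"env": "production", "tier": "web"}},
    {"name": "db-master", "tags": {"env": "production", "tier": "db"}},
    {"name": "es-1", "tags": {"env": "qa", "tier": "elasticsearch"}},
    {"name": "mystery-box", "tags": {}},  # tagging is optional, so some nodes slip through
]


def group_nodes(nodes):
    """Nest a flat list of nodes into {env: {tier: [node names]}} using their tags."""
    tree = defaultdict(lambda: defaultdict(list))
    for node in nodes:
        tags = node.get("tags", {})
        tree[tags.get("env", "untagged")][tags.get("tier", "untagged")].append(node["name"])
    # Convert the nested defaultdicts to plain dicts for display.
    return {env: dict(tiers) for env, tiers in tree.items()}


print(group_nodes(NODES))
```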
Versioning
• We know from software:
– Versioning is critical for tracking features and bugs
– Allows recovery from errors, mistakes, and disasters
– Versioning is important not just at the file level, but for the whole project
• What we have in cloud management:
– Ansible and Chef only version individual playbooks/cookbooks, not projects/environments/collections
– Restoring a known state for a cloud project is a manual process
• Conclusion: Poor holistic support
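Project-level versioning can be approximated by pinning every component of an environment in one manifest and treating a hash of that manifest as the project version. A minimal sketch; the manifest fields, component versions, and the in-memory `HISTORY` store are hypothetical, and a real setup would more likely commit the manifest to git.

```python
import copy
import hashlib
import json

# Hypothetical manifest pinning everything one environment runs.
MANIFEST = {
    "architecture": "v2",
    "cookbooks": {"webapp": "1.4.2", "postgresql": "3.4.0"},
    "playbooks": {"elasticsearch": "0.9.1"},
    "app_git_ref": "release-2015-03-03",
}

HISTORY = {}  # project version id -> manifest snapshot


def fingerprint(manifest):
    """Stable short hash of the whole manifest; key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def record(manifest):
    """Snapshot a manifest and return its project-level version id."""
    version_id = fingerprint(manifest)
    HISTORY[version_id] = copy.deepcopy(manifest)
    return version_id


def restore(version_id):
    """Return the exact manifest recorded under version_id."""
    return copy.deepcopy(HISTORY[version_id])


known_good = record(MANIFEST)
upgraded = dict(MANIFEST, cookbooks={"webapp": "1.5.0", "postgresql": "3.4.0"})
record(upgraded)
print(restore(known_good) == MANIFEST)  # True
```

Restoring a known state then becomes "deploy the manifest for this id" rather than a manual reconstruction.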
Code Reuse / DRY
• We know from software:
– Repeating yourself causes bloat and often errors when refactoring / updating code
– Updates in normalized code are easier and better understood
• What we have in cloud management:
– Minimal support in Ansible/Chef/CloudFormation for extra variables per class of server
– Global variables for credentials
– Generally you need to cut and paste extra variables into multiple places
• Conclusion: We’re getting there
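Layered variables are one way to avoid the cut-and-paste: each value is written once at the most general level it applies to, and more specific levels override it. A minimal sketch using Python's `collections.ChainMap`; the layer names, keys, and values are hypothetical, and credentials are deliberately left out.

```python
from collections import ChainMap

# Hypothetical variable layers, from most general to most specific.
GLOBAL = {"region": "us-east-1", "instance_type": "m3.medium", "debug": False}
SERVER_CLASS = {
    "web": {"app_processes": 4},
    "db": {"instance_type": "r3.large"},
}
ENVIRONMENT = {
    "production": {},
    "qa": {"debug": True, "instance_type": "t2.small"},
}


def resolve(env, server_class):
    """Merge layers: environment overrides server class, which overrides global."""
    layers = ChainMap(ENVIRONMENT.get(env, {}), SERVER_CLASS.get(server_class, {}), GLOBAL)
    return dict(layers)


print(resolve("production", "db")["instance_type"])  # r3.large
print(resolve("qa", "db")["instance_type"])          # t2.small
print(resolve("qa", "web")["app_processes"])         # 4
```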
Abstraction
• What we know from software:
– Abstractions like file I/O allow the same code to run on multiple platforms
• What we have in cloud management:
– Most tools support multiple clouds (AWS, Rackspace, etc.)
– OpenStack is the closest analogy to a cloud abstraction
• Conclusion: Very Promising
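The file I/O analogy translates to a provider-neutral interface that orchestration code depends on, with one driver per cloud behind it. A minimal sketch using `abc.ABC`; `CloudProvider`, `FakeCloud`, and `build_tier` are hypothetical names, and `FakeCloud` is an in-memory stand-in rather than a real AWS or Rackspace driver.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical provider-neutral interface; real drivers would wrap each cloud's SDK."""

    @abstractmethod
    def create_network(self, name: str, cidr: str) -> str:
        """Create an isolated network and return its ID."""

    @abstractmethod
    def create_server(self, name: str, size: str) -> str:
        """Create a server and return its ID."""


class FakeCloud(CloudProvider):
    """In-memory stand-in that records what would have been created."""

    def __init__(self, label: str):
        self.label = label
        self.resources = []  # (kind, name, detail) in creation order

    def _add(self, kind: str, name: str, detail: str) -> str:
        self.resources.append((kind, name, detail))
        return f"{self.label}-{kind}-{len(self.resources)}"

    def create_network(self, name: str, cidr: str) -> str:
        return self._add("network", name, cidr)

    def create_server(self, name: str, size: str) -> str:
        return self._add("server", name, size)


def build_tier(cloud: CloudProvider, tier: str, count: int, size: str, cidr: str):
    """Provider-agnostic orchestration code: it only knows the interface."""
    net_id = cloud.create_network(f"{tier}-net", cidr)
    server_ids = [cloud.create_server(f"{tier}-{i}", size) for i in range(count)]
    return net_id, server_ids


print(build_tier(FakeCloud("aws"), "web", 2, "medium", "10.0.1.0/24"))
# ('aws-network-1', ['aws-server-2', 'aws-server-3'])
```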
Compilation / Workflow
• What we know from software:
– Compilation of code enables easy transport and packaging
– Enables DRY capabilities
• What we have in cloud management:
– Workflows are generally supported, but not necessarily holistically or with versioning of the workflows themselves
• Conclusion: Not Bad
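The compile step can be pictured as turning an architecture version plus an environment into a concrete, ordered plan that can be stored and shipped like a build artifact. A minimal sketch, assuming Python 3.9+ for `graphlib.TopologicalSorter`; the dependency graph and the plan format are hypothetical.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical V1 dependency graph: each tier maps to what must exist before it.
V1_DEPENDS_ON = {
    "network": set(),
    "db": {"network"},
    "cache": {"network"},
    "queue": {"network"},
    "web": {"db", "cache", "queue"},
    "worker": {"queue"},
    "lb": {"web"},
}


def compile_plan(depends_on, version, env):
    """'Compile' a versioned architecture into an ordered plan for one environment."""
    # static_order() yields each tier after its dependencies; it raises CycleError on cycles.
    steps = list(TopologicalSorter(depends_on).static_order())
    return {"version": version, "environment": env, "steps": steps}


plan = compile_plan(V1_DEPENDS_ON, "v1", "production")
artifact = json.dumps(plan)  # the plan can be versioned and shipped like a build artifact
```

Deriving the order from declared dependencies means a dependency cycle fails at compile time rather than halfway through a deployment.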
So… we should extend our tools…
• to deal with not just servers, but networks and other entities (abstraction)
• to manage collections of these entities (modules)
• to manage versioning of these collections (versioning)
• to allow configuration of these versioned collections per environment (DRY)
• to allow deployment (workflow) of these versioned collections, with their configurations, to specific environments
Keep an eye on…
• Apache CloudStack
– http://cloudstack.apache.org/
• Cloudify
– http://getcloudify.org/
Questions?
Oh yeah, we’re hiring…

Editor's Notes

  • #9 (“If you create a single virtual entity…”): The software analogy is that we write code and then compile it. But do we ever edit the compiled binary? Manually editing your cloud infrastructure is the same thing.
  • #13 (Versioning): Starting to see a little of this with Chef’s environments.