Hybrid cloud is becoming a necessity for many organizations. But building and managing an environment that effectively leverages the strengths of both public and private clouds can be a greater challenge than anticipated. One of the most critical elements of a hybrid cloud scenario is a management solution that handles the cloud application lifecycle effectively. This presentation focuses on how organizations can manage their hybrid environments to ensure they achieve cloud computing success.
3. The era of delayed gratification
is over
The Internet allows innovations
to be delivered as a constant flow
that incorporates user needs.
We live in a fast-paced world
10. Right tools for the right job
Focus on what matters
Outsource everything else
11. Why is it important?
Picking the right system – accumulates less
technical debt
Every project has different needs – What matters is
higher level business goals
Vendor relationships may exist – It’s time to forget
them
12. Evaluating Cloud Platforms
Criteria
• Data Management
• How and where will the data be stored?
• Who can access the data and who owns it?
• Security
• Terms of Service
• Support
• Privacy Policy
• Service Level Agreements (be careful about this one)
• Ethics
• Disclaimers
• Breakup penalty
• Price, Billing and Accounting
• Technical Capabilities
• Data and application architecture
• APIs and data transformations
• Performance
• Geographies
13. Step 2
Plan for Failure
Complexity increases, defects accumulate
No single component can guarantee 100% uptime
Failure Happens
And not JUST in the public cloud
14. Test for Failure
The best defense against major unexpected
failures is to fail often
Tools:
• Simian Army - All those damn Monkeys
• Game Day
Increase resilience through large scale fault
injection across critical systems
How:
Start Small
Learn Lessons
Build Confidence
Full scale live exercises
Build resiliency into coding practices
15. Design for Failure
Redundancy, Fault-Tolerance and Graceful Degradation
Enables a system to continue operating properly in the event of the failure of some
of its components.
Circuit Breaker
Protects clients from slow or broken services.
Protects services from demand in excess of
capacity.
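A minimal sketch of the circuit breaker idea, for illustration only. The class name, the thresholds (`max_failures`, `reset_after`) and the injectable `clock` are assumptions, not taken from the talk or from any particular library. After a run of consecutive failures the breaker "opens" and rejects calls immediately, which spares clients a slow timeout and spares the struggling service extra load; after a cool-down it lets one trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical client function, and treat `CircuitOpenError` as the cue to serve a fallback.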
Feature Flags
Restrict features to certain environments, while still using the same code base
on all servers.
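One way to picture feature flags is a lookup from feature name to the environments where it is switched on, so the same code base ships everywhere. The flag names, the environment names and the `APP_ENV` variable below are hypothetical, chosen only for this sketch.

```python
import os

# Hypothetical flag table: feature name -> environments where it is enabled.
FLAGS = {
    "new_checkout": {"dev", "staging"},
    "cost_dashboard": {"dev", "staging", "production"},
}


def is_enabled(feature, environment=None):
    """Return True if the feature is switched on for the given environment."""
    # APP_ENV is an assumed variable name; unknown features are treated as off.
    env = environment or os.environ.get("APP_ENV", "production")
    return env in FLAGS.get(feature, set())
```

Application code then branches on `is_enabled("new_checkout")` instead of maintaining a separate build per environment.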
17. What to automate?
Plan
Develop
Deploy
Operate
Optimize
Create and configure
lightweight, reproducible,
and portable
development
environments
Trigger builds, tests,
manage features in
real time
Monitor
applications, track
costs
Manage
resources, scale
up/down rapidly
on-demand
23. There is a new attack in town …
Bring the service down not by stopping the
service but by making it extremely
expensive to run.
Botnets can make seemingly legitimate requests for service to
generate an economic denial of sustainability (EDoS) -- where the
dynamism of the infrastructure allows scaling of service beyond the
economic means of the vendor to pay their cloud-based service bills.
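One possible mitigation, sketched here as an assumption rather than something the talk prescribes, is to cap automatic scale-up by the budget that remains. The function name, parameters and flat hourly pricing model are all hypothetical.

```python
def allowed_instances(requested, running, hourly_rate, spent, monthly_budget, hours_left):
    """Cap a scale-up request so projected spend stays within the budget."""
    remaining = monthly_budget - spent
    if remaining <= 0 or hours_left <= 0:
        return running  # no headroom: hold at the current size
    # Instances that could run for the rest of the period on the remaining budget.
    affordable = int(remaining // (hourly_rate * hours_left))
    # Never scale below what is already running; that decision is left to a human.
    return max(running, min(requested, affordable))
```

A cap like this trades availability for solvency, so in practice it would be paired with an alert when the limit is hit.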
24. Measuring costs
Subscription Billing
(manage online
subscription services)
IT Accounting,
Charge-back, Show-
back (charging-back
variable IT costs. A
foundation for providing
basic IT cost transparency.)
IT Finance and
Technology Business
Management
(A more strategic role to
manage and forecast
costs, evaluate overall
value, and assist in
IT/business decision-
making)
Aria
Monexa
Zuora
Cloudability
CloudRows
Cloudyn
Costnomics
Newvem
Nicus Software
Pace Applied
Technology
uptimeCloud
Apptio
BMC
Claritia
CloudCruiser
Comsci
Cube Billing
26. Challenges with User Management
APPS APPS APPS
Users belong to one
or more groups or
departments which
may interact with
one another
causing a human
scale &
coordination
problem
Apps created by the
teams can run in
one or more clouds.
Each cloud has its
own
authentication, keys,
certificates causing
operations sprawl
APPS
27. Leverage cloud security brokers
Use cloud security
broker solutions
without exposing
internal services to
manage access to
clouds, cloud
resources or keys
ENTERPRISE DIRECTORY
DEPLOYMENT
New or
existing user
Removed
User
CLOUD MANAGEMENT SOLN
• Users
• Groups
• Access Rights
• Keys
Add / Sync Remove
29. If you do nothing else
Hire smart people to figure things
out
You cannot automate everything –
YMMV
Get them to talk to each other
Communication is key
30. The 7 Steps
1 Choose your path wisely
2 Plan for failure
3 Automate all the things
4 Be data-driven
5 Design and operate with costs in mind
6 Security is not an after-thought
7 Invest in your people and culture
32. Enterprise Scenario – with Enstratius
Single point of control for implementation of governance policies
Directory drives
access &
authentication
Full self service
within approved
governance
framework
Complete,
persistent audit
trail
Budget controls
Security policy
compliance
Editor's Notes
We live in a fast-paced world where new capabilities that satisfy some user need can be delivered quickly. The user has come to expect the continual delivery of functionality and operability. If we are not set up to be agile, somebody else is, and we can quickly fall behind.
Hence the application lifecycle has transitioned from serial big-bang releases to smaller releases with continuous delivery, to accommodate constantly changing business needs.
We are creating an environment of corporate anxiety and raised blood pressure. Developers start to take shortcuts with respect to testing and documentation, and when you use the code in production, it doesn’t work as expected.
We could end up dealing with stress by drinking heavily or, if you are like me, eating chocolates.
I am going to talk about some of the steps we can take to manage cloud application lifecycle better. These are things we have tried ourselves or with our customers.
There are a lot of options available in the market today: public, private, hybrid, IaaS, PaaS, SaaS, and a variety of vendors satisfying each need.
What is key as the foundation is choosing the right platform that solves the business need. Focusing our efforts on what matters helps us develop a competitive advantage.
Plug-in based architectures are excellent examples of contextual abstraction. The plug-in API provides a plethora of data structures and other useful context that developers inherit from or summon via existing methods. But to use the API, a developer must understand what that context provides, and that understanding is sometimes expensive, as with Eclipse and IntelliJ.
Composable systems tend to consist of finer-grained parts that are expected to be wired together in specific ways, e.g. parsing a file using a higher generation language vs. shell scripts.
Composable build tools scale (in time, complexity, and usefulness) better than contextual ones. Contextual tools like Ant and Maven allow extension via a plug-in API, making extensions the original authors envisioned easy. However, trying to extend them in ways not designed into the API ranges in difficulty from hard to impossible.
http://gigaom.com/2013/02/16/devops-complexity-and-anti-fragility-in-it-context-and-composition/
Examples:
Developer community – Eclipse vs. command line
Operations community – Chef and Puppet vs. RBA (connection vs. plug-in model)
This page lists some of the things to keep in mind when evaluating cloud platforms. The SLA in particular is a tricky one because it is the SLA of a single component. When you tie the various components together, the SLA of the system is lower, and you will have downtime even with five nines.
http://www.datacenterknowledge.com/archives/2010/06/01/how-to-evaluate-cloud-computing-providers/
http://www.dummies.com/how-to/content/how-to-choose-the-right-cloud-computing-service-pr.html
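The point about component SLAs can be made concrete with a little arithmetic. This sketch assumes the components sit in series (every one must be up for the system to be up) and fail independently; real systems may violate both assumptions, and the example figures are illustrative.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def system_availability(component_availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(component_availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime in a 365-day year."""
    return (1 - availability) * MINUTES_PER_YEAR


# Three components at 99.9% each give a system below 99.9%.
chain = system_availability([0.999, 0.999, 0.999])
print(f"system availability: {chain:.6f}")
print(f"downtime at five nines: {downtime_minutes_per_year(0.99999):.2f} min/year")
```

Even a single five-nines component still permits a little over five minutes of downtime a year, and each additional dependency in the chain lowers the system figure further.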
We have all been victims of Murphy’s law: anything that can go wrong will go wrong. So we have to be prepared and plan for failure to avoid major losses to our business.
The best defense against unexpected failures is to fail often. Game Day and the Monkeys from the Simian Army are about discovering and learning from failures more quickly and proactively. As we learn more about our applications, we can work resilience engineering practices into our application architectures.
The Simian Army: Chaos Monkey, Latency Monkey, Conformity Monkey, Doctor Monkey, Janitor Monkey, Security Monkey, 10-18 Monkey, Chaos Gorilla, Chaos Kong
Automated failover architectures, circuit-breaker patterns, and other resilience engineering practices make availability more continuous, both by preventing failures and by enabling faster healing.
E.g. Netflix: “If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.”
It starts with design and with things you can do from ops. Infrastructure: how the components communicate with each other about failure.
http://en.wikipedia.org/wiki/Circuit_breaker_design_pattern
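The Netflix example of degrading rather than failing can be sketched in a few lines. This is an illustration of the idea only, not Netflix's implementation; the function and parameter names are hypothetical.

```python
def recommendations_for(user_id, personalized, popular_titles):
    """Return personalized picks, or popular titles if that service fails."""
    try:
        return personalized(user_id)
    except Exception:
        # Degrade the quality of the response, but still respond.
        return popular_titles
```

Here `personalized` stands for whatever client calls the recommendations service, and `popular_titles` is a precomputed list that does not depend on that service being up.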
What are you automating? What things? How do you automate, and with which tools?
Git – Code repository
Jenkins – Kick off tests
Vagrant – Build our dev environment
Maven – Build
fpm – Create RPMs and .deb files
Grails – Web framework based on the Groovy language
Artifactory – Repository manager
Chef – Installs
Logstash – Collect logs from different servers
Graphite – Visualization
Nagios – Monitoring
Riemann – Monitoring
New Relic – Monitoring
Boundary – Monitoring
Aria – Subscription billing
Qualys – Vulnerability scans
Twilio – SMS
OSSEC – IDS
Making decisions based on data is the next important step in cloud application management.
We have used Logstash to consolidate logs in our environment.
And we visualize the data using Graphite. Other tools: Splunk, Cacti, RRDtool, Netflix dashboard, Etsy dashboard
Outsourcing to the cloud to reduce CapEx and move to an OpEx model means that, while the availability, confidentiality and integrity of your service and assets are solid, your sustainability and survivability are threatened.
http://www.behind-the-enemy-lines.com/2012/04/google-attack-how-i-self-attacked.html
http://rationalsecurity.typepad.com/blog/2008/11/cloud-computing-security-from-ddos-distributed-denial-of-service-to-edos-economic-denial-of-sustaina.html
Chris Hoff in his blog talked about economic denial of sustainability (EDoS) -- where the dynamism of the infrastructure allows scaling of service beyond the economic means of the vendor to pay their cloud-based service bills.
There are several tools in the market that help track costs.
http://fountnhead.blogspot.com/2012/05/follow-its-money-survey-of-it-financial.html
Once you have visibility into costs, you can start building applications with costs in mind: what server size do you really need? How much data should you transfer? When should you transfer data?
PA Consulting Group recently worked with a public-sector client to deliver a large-scale Google App Engine implementation which needed to query large data sets calculated from source data at speed. The client had to choose between executing complex run-time queries and paying for processing power, or pre-computing large data sets and so paying for storage. Either approach was valid, but which would be more cost-effective?
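The compute-versus-storage question above comes down to comparing two monthly bills. The sketch below is a back-of-the-envelope comparison with hypothetical parameters and prices; it is not the method PA Consulting used, and it ignores the cost of running the pre-computation job itself.

```python
def cheaper_option(queries_per_month, cpu_seconds_per_query, price_per_cpu_second,
                   precomputed_gb, price_per_gb_month):
    """Compare monthly cost of run-time queries against stored pre-computed data."""
    compute_cost = queries_per_month * cpu_seconds_per_query * price_per_cpu_second
    storage_cost = precomputed_gb * price_per_gb_month
    choice = "precompute" if storage_cost < compute_cost else "query at run time"
    return choice, compute_cost, storage_cost
```

With made-up figures, a heavily queried data set favors pre-computing, while the same data set queried rarely favors paying for processing on demand.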
You can’t talk about cloud applications and not talk about security. While there are several aspects to security, I am going to talk about user management.
The challenge that enterprises face today, and that is different from web-scale companies, is the need to manage several applications created by several teams as opposed to one or a few applications.
Users belong to one or more groups or departments which may interact with one another, causing a human scale & coordination problem.
Apps created by the teams can run in one or more clouds.
Each cloud has its own authentication, keys and certificates, causing operations sprawl.
Problems: Compliance, Coordination
Leverage on-premise directory services to manage users and authentication. Users can log into the cloud security broker (Enstratius) with their AD credentials and then, through that, use resources on the cloud without ever having to hold credentials on the outside cloud directly. Once the VMs are up and running, you can create user access on those VMs that is unique to the user inside Enstratius, essentially brokering your local AD authentication into VMs on the cloud without ever exposing your authentication resource (AD in this case) to anything outside your firewall.
Cloud security is a shared responsibility, and it is important for users to recognize what remains in their domain of responsibility. The CSA has an excellent set of assessment guidelines and controls for cloud security.
Challenge: Managing cloud as a one-off and forgetting to update correctly
Users who change jobs or leave are not fully synced or removed
Solution: Synchronize/delegate authentication with LDAP/AD
Retains a single point of control over users and authentication
Guest VMs do not talk directly to your LDAP/AD infrastructure
Users removed from LDAP are automatically removed from the appropriate VMs
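The add/sync/remove behaviour described in these notes is, at its core, a set reconciliation between the directory and the accounts present on each VM. The sketch below shows only that core step, with hypothetical names; it does not call any real LDAP/AD or Enstratius API.

```python
def plan_sync(directory_users, vm_users):
    """Return (to_add, to_remove) so that VM accounts match the directory."""
    directory_users, vm_users = set(directory_users), set(vm_users)
    to_add = sorted(directory_users - vm_users)      # in the directory, not on the VM
    to_remove = sorted(vm_users - directory_users)   # left the directory, still on the VM
    return to_add, to_remove
```

Because the plan is computed by the broker, the guest VMs never need to query the directory themselves.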
Enstratius’s mission is to address these types of problems.