Cloud Computing - A Practical View

    Mandeep Dhami
http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html
Overview
•   The Context
     – A specific project scenario


•   Why Cloud Computing?
     – Economic drivers
     – Flexibility and agility
     – New capabilities


•   Why not Cloud Computing?
     – Regulatory constraints
     – Operational concerns
     – Technical issues


•   And the Practical “Middle Way”!
     – Services evaluated
     – Proposed engagement
The Context
•   Cloud computing can mean different things to different people

•   In this talk we evaluate the trade-offs in the context of the following hypothetical scenario:
     – You work on a Medicare/Medicaid eligibility system
     – Field workers use a web-based tool to input case details and to check status
     – The web server is implemented using Java/WebSphere on a Windows Server
     – The backend eligibility sub-system is implemented using COBOL on an IBM mainframe
     – You are tasked with evaluating a cloud-based solution for the web tool
     http://www.nature.com/ki/journal/v62/n5/fig_tab/4493262f1.html
Many Layers of the Cloud
Some Initial Design Constraints
•   Type of cloud service required - IaaS or Private Cloud
     – Since it is a custom software application, SaaS is not an option
     – Since the platform is also very custom (for libraries and versions) and has some
       non-standard libraries (say WebSphere v6.5, DB2 v9.1, JCA for CICS, etc.),
       PaaS is not an option either.
     – IaaS might be feasible as we own the software stack in that model
     – Private cloud can always be used, as we will own the cloud in that model!


•   Type of connectivity required – “VPN to VM”
     – We will need a secure, encrypted connection to the backend system for the web
       application to get/update case status. Conceptually this is like a VPN from the
       VM to the backend.
     – Any IaaS solution that does not provide a secure connection from the server VM to
       the internal LAN cannot be used
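
The constraints above can be read as a small decision rule. The following is a minimal sketch of that rule, not a selection tool: the `Workload` fields, the `feasible_models` function and the `iaas_offers_vpn` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    custom_application: bool    # we wrote the application ourselves
    custom_platform: bool       # pinned or non-standard middleware versions
    needs_vpn_to_backend: bool  # the VM must reach the internal LAN securely


def feasible_models(w: Workload, iaas_offers_vpn: bool) -> List[str]:
    """Return the service models that survive the initial design constraints."""
    models = []
    # SaaS only fits when the application itself is off-the-shelf.
    if not w.custom_application:
        models.append("SaaS")
    # PaaS only fits when the platform (libraries, versions) is standard.
    if not w.custom_platform:
        models.append("PaaS")
    # IaaS fits when we own the stack and the provider can give us "VPN to VM".
    if iaas_offers_vpn or not w.needs_vpn_to_backend:
        models.append("IaaS")
    # A private cloud is always possible, since we own the cloud.
    models.append("Private cloud")
    return models


if __name__ == "__main__":
    web_tool = Workload(custom_application=True, custom_platform=True,
                        needs_vpn_to_backend=True)
    print(feasible_models(web_tool, iaas_offers_vpn=True))
```

For the scenario in this talk, SaaS and PaaS drop out immediately, and IaaS stays on the list only for providers that offer the secure connection.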
Why Cloud Computing?

   To cloud or not to cloud, that is the question …
http://geekandpoke.typepad.com/geekandpoke/2009/11/simply-explained-project-risk-update.html
Economic Drivers
•   Pay as you go
     – No upfront cost to acquire server/network hardware
     – Only pay for dev and test systems during dev and test phases
     – No upfront cost to “try” new features like Web Firewalls

•   Lower support costs
     – The team does not have to manage hardware, network or storage for the production system
     – No need to hire expensive consultants for non-core (infrastructure related) activities

•   Deterministic Project Costing
     – More transparency regarding infrastructure costs
     – Less risk from last minute capital cost requests related to production usage
     – Not encumbered by internal transfer accounting!

•   Lower hardware costs
     – Typical server utilization is low, pay only for what you use
     – Typical network utilization is low (routers, firewall, etc), pay only for what you use
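
The "pay only for what you use" argument comes down to simple arithmetic: an owned server costs the same at any utilization, while a rented one scales with hours used. The sketch below shows that arithmetic. All prices in it are hypothetical figures chosen for illustration, not quotes from any provider, and the function names are assumptions of this example.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def owned_monthly_cost(purchase_price: float, lifetime_months: int,
                       monthly_ops: float) -> float:
    """Amortized hardware cost plus operations cost; paid at any utilization."""
    return purchase_price / lifetime_months + monthly_ops


def cloud_monthly_cost(hourly_rate: float, utilization: float) -> float:
    """Pay-as-you-go cost when the server runs for `utilization` (0..1) of the month."""
    return hourly_rate * HOURS_PER_MONTH * utilization


def break_even_utilization(owned_monthly: float, hourly_rate: float) -> float:
    """Utilization above which owning becomes cheaper than renting."""
    return owned_monthly / (hourly_rate * HOURS_PER_MONTH)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    owned = owned_monthly_cost(purchase_price=3600.0, lifetime_months=36,
                               monthly_ops=46.0)
    rate = 0.40
    for utilization in (0.10, 0.50, 0.90):
        cloud = cloud_monthly_cost(rate, utilization)
        print(f"utilization {utilization:.0%}: owned {owned:.2f}, cloud {cloud:.2f}")
    print(f"break-even utilization: {break_even_utilization(owned, rate):.0%}")
```

Under these assumed numbers, renting wins at low utilization and owning wins at high utilization, which is why measuring your real utilization comes before any cost claim.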
Flexibility and Agility
•   Rapid Scaling
     – Start small, scale as required based on production performance measurements
     – Respond faster to customer demand for capacity
     – Respond faster to features that require more compute/storage resources

•   Dynamic Provisioning
     – Spin up more test-beds as required. Keep test execution moving even as developers are
       debugging on an existing setup
     – Spin up systems to do load testing as required. Pay only for the time used to do the tests

•   Dynamic Infrastructure
     – Enable infrastructure changes with mouse clicks
     – Increase server pool for batch processing as required – meet any batch window (at some cost)
     – Developers can prototype “at production scale” and capacity

•   More Choice
     – Change infrastructure vendors for better SLA or price without impacting/altering the application
     – Do “Beta test” for a few case workers on a small system, roll out new code incrementally
     – Roll back to a previous image, as a fallback option
New Capabilities
•   Next Gen architectures
     – Enable disaster recovery by using a service provider with multiple physical locations
     – “Try” new features like memcached, CDNs, etc. without new investment in hardware or
       infrastructure expertise

•   Accelerate innovation
     – Shift from supporting the infrastructure to innovating on the application
     – Use cost transparency to innovate processes and reduce waste

•   Advanced infrastructure capabilities
     – Change management for server configuration is centrally managed and encapsulated
     – Self healing, hot backups etc. available
     – APIs available to infrastructure for flow-through automation

•   Green computing
     – Increase server utilization, reduce power usage
     – Use more efficient cooling, reduce power usage
     – Reduce number of servers and reduce waste
Why Not Cloud Computing?

   There be dragons …
First, you sometimes hear some FUD …

 “We will have no liability to you for any unauthorized access or use,
 corruption, deletion, destruction or loss of Your Content or
 Applications”
                                          Customer Agreement, Amazon Web Services




 “Salesforce.com shall not be responsible or liable for the deletion,
 correction, destruction, damage, loss or failure to store any
 customer data”
                                      Master Subscription Agreement, Salesforce.com



       … but this is not really very different from a software EULA
 (So we believe that you can safely ignore this issue, except during contract negotiation)
But there are Real Regulatory Constraints
•   Privacy
     – Since this project handles medical data, HIPAA rules apply
     – If your cloud infrastructure cannot be HIPAA compliant, you cannot use it

•   Forensics and audit
     – If your cloud APIs cannot be audited for forensic investigation, you cannot use them
       for sensitive data
     – If audit data is not cryptographically secure, it lacks adequate controls

•   Governance mandate
     – Just because the application is on the cloud, the governance mandates do not go away!
     – Can you produce reports on usage or controls that are comparable to a system with
       physical security?

•   PKI infrastructure
     – How are private keys stored and managed by the cloud-based VMs?
     – Can you meet FIPS requirements that you currently meet with hardware/physical security
       constraints?
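
One common way to make audit data tamper-evident, as the "Forensics and audit" point asks, is to chain each entry to the hash of the previous one. The sketch below illustrates that idea with Python's standard `hashlib`. It is an illustration only: the record fields and function names are hypothetical, and a real deployment would also need signing and protected storage, which this sketch does not attempt.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the record together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append a record, linking it to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"actor": "admin", "action": "start-vm"})
    append_entry(log, {"actor": "admin", "action": "read-key"})
    print("chain intact:", verify(log))
    log[0]["record"]["action"] = "noop"  # simulate tampering
    print("chain intact after tampering:", verify(log))
```

A question worth asking a provider is whether their audit trail gives you an equivalent guarantee, and whether you can verify it yourself.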
And Real Operational Concerns
•   The Blame game
     – When there is a problem today, it is already painful to get from defect to defect
       ownership …
     – When a problem occurs in the cloud, how do you get from the “conf-call from hell”
       discussing the defect to productive “root cause analysis” and taking defect ownership?

•   Priority management
     – When you have a customer situation, your “tech team” works on it as #1 priority till it
       is resolved …
     – How do you set priority for the cloud vendor’s tech team to fix your specific problem
       among their priorities?

•   SLA “assurance”
     – Can you measure service levels in terms of the metrics used in the SLA in the contract?
     – Do you get reports on “real SLA” or on a synthetic benchmark?
     – Do you get “continuous reporting” of metrics that you can use for trend analysis and
       planning?

•   Vendor lock-in
     – How real is the promise of choice?
     – To resolve the technical or operational issues, are you tying into a proprietary API
       that limits any real choice?
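
To make the SLA “assurance” questions concrete, here is a minimal sketch of measuring availability yourself from periodic probe results, both overall and per window for trend analysis. It assumes the SLA metric is availability measured as the share of successful probes, which may not match the definition in your contract. The function names and the sample data are hypothetical.

```python
from typing import List, Sequence


def availability_percent(samples: Sequence[bool]) -> float:
    """Share of successful probes, as a percentage."""
    if not samples:
        raise ValueError("no samples to measure")
    return 100.0 * sum(samples) / len(samples)


def windowed_availability(samples: Sequence[bool], window: int) -> List[float]:
    """Availability per consecutive window of probes, for trend analysis."""
    return [availability_percent(samples[i:i + window])
            for i in range(0, len(samples), window)]


def meets_sla(samples: Sequence[bool], target_percent: float) -> bool:
    """Compare measured availability with the contracted target."""
    return availability_percent(samples) >= target_percent


if __name__ == "__main__":
    # Hypothetical probe results: True means the probe succeeded.
    probes = [True] * 95 + [False] * 5
    print("overall:", availability_percent(probes))
    print("per window of 50:", windowed_availability(probes, 50))
    print("meets 99.5% target:", meets_sla(probes, 99.5))
```

If the provider reports only a synthetic benchmark, numbers like these from your own probes are what you would bring to the conversation.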
And Very Real Technical Issues
•   Visibility
     – Clear system boundary with adequate instrumentation
     – Tools to view infrastructure usage by your application

•   Security
     – Encrypted VPN from “Server VM to the Backend network”
     – SSO integration for admin/API usage
     – “Safe sharing” of shared resources (like network, swap, crash dump, etc).

•   Diagnostics
     – On demand capture of data, traffic and performance statistics
     – Flow-through integration with automation/tools
     – Automated data capture (black box) before the VM image is lost.

•   Network Services
     – No good model for application level network services (like firewall, load balancer, etc)
     – We can use x86 VMs as virtual appliances, but they lack the hardware acceleration of
       typical network devices
The Practical “Middle Way”

   In Buddhism, the “Middle Way” is the Nirvana-bound path of
   moderation - away from the extremes of sensual indulgence and
   self-mortification and toward the practice of wisdom, morality and
   mental cultivation.
                                                   From http://en.wikipedia.org/wiki/Middle_way
From http://dilbert.com/strips/comic/2009-11-18




… No I really did not mean that!
Cloud Services Evaluation for This Specific Project
NOTE: This is a sample evaluation. Your results will differ based on the assumptions
that you make about the project and about the services themselves


Service     Product                       Notes                                  Regulatory    Operational   Technical
Provider                                                                         Constraints   Concerns*     Issues

Amazon      EC2                           Solid performer, lots of 3rd party
                                          support
Rackspace   Mosso                         Solid performer, good enterprise
                                          support
Savvis      Virtualization in the Cloud   Closest to a private cloud (VMware),
                                          very good enterprise support
AppNexus    AppNexus Cloud                Not clear how it will handle issues
                                          specific to government or HIPAA
                                          compliance

* Assuming appropriate relationship and contract/penalties
Engagement Proposed for This Specific Project

•   First qualify the service provider’s offering for regulatory issues
     –   HIPAA
     –   PCI (if you accept credit cards for fees)
     –   FIPS (for PKI)
     –   Etc


•   Then qualify your relationship with the service provider so that you can handle
    operational issues around “blame game”, priority management etc.

•   Then qualify the network, the virtual servers, and the storage for security, visibility,
    manageability, diagnostics, etc. In particular, qualify the secure VPN to your virtual
    servers (like Amazon’s VPC)

•   Then move development and testing of the next major upgrade to the cloud service provider.
    Do a beta rollout first, and then scale incrementally as you build confidence.

•   With dev & test success behind you, use it as a model to transition the production
    servers (for the web application) to the cloud.

•   Always, incremental build-up based on success of the previous step!
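
The proposed engagement is a gated sequence: each step starts only after the previous one has succeeded. A minimal sketch of that gating follows. The step names mirror the bullets above, while the check functions are hypothetical placeholders for whatever evidence your project accepts.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_engagement(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that does not pass."""
    completed = []
    for name, check in steps:
        if not check():
            break  # do not build on a step that has not succeeded
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder checks; in practice each would be a real review or sign-off.
    steps = [
        ("regulatory qualification", lambda: True),
        ("provider relationship", lambda: True),
        ("network/server/storage qualification", lambda: False),
        ("dev and test in the cloud", lambda: True),
        ("production transition", lambda: True),
    ]
    print(run_engagement(steps))
```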
Cloud Computing Conf 1209
