1. CLOUD COMPUTING
FROM AN ACADEMIC PERSPECTIVE
Dani Adhipta
JTETI, UGM
Cloud Computing Seminar
Gadjah Mada University
Yogyakarta, 12 October 2012
2. Cloud Computing
• Controversial definition
Richard Stallman: "It's stupidity. It's worse than stupidity: it's a marketing hype campaign." "Careless computing", since the data is held by a third party.
12/10/2012 UC-UGM 2
Old ideas from the 1960s
• John McCarthy: “computation delivered as a public utility in the same
way as water and power.”
• J. C. R. Licklider: “the intergalactic computer network”
Dormant for 35 years, then restarted in 1999-2002 with Salesforce.com and Amazon Web Services (AWS) serving applications, followed by the on-demand Amazon Elastic Compute Cloud (EC2) service (2006) and browser-based Google Apps (2009).
3. Overview
• Cloud computing: applications, system software, and hardware
delivered as services over the Internet.
• Service oriented architecture + virtualization + utility computing
• Software as a Service (SaaS), Infrastructure as a Service
(IaaS), Platform as a Service (PaaS), Everything as a Service
(XaaS)
• Web services, data centers
• Public cloud, private cloud, community/hybrid cloud
• Major public cloud service providers: Amazon, Google,
Salesforce; IBM, Microsoft, HP, VMware
• 5th generation of computing
• A (re)newed paradigm
4. Benefits
• Cloud is green (not grey)
• Centralized on the Internet, not localized
• Faster access (supposedly)
• Provisioned by a third party: pay as you go, only as needed
• Relieves users of infrastructure management/deployment and other technological issues
• Scaling up (or down) when desired
• Infrastructure is ready immediately
• Pros: agility, cost, device/location independence, multi-tenancy,
reliability, scalability, maintenance, metering, failsafe, highly
available, load-balancing
5. Outstanding Issues
• Service level agreements (SLA) – Assurances for uptime, legal
protection, and security?
• Uptime and reliability – In comparison to locally hosted and managed
resources?
• Cost and affordability – Cost model over time? Personnel and
technology resources vs local solution?
• Legal and organizational issues – Which related issues must be considered? Customer (or student/patron) data? Platform and connection security?
• Staff knowledge – Does the migration match staff knowledge and competency? Is everything understood?
• Possible vendor lock-in?
• Future outlook?
• Cons: privacy, security, compliance, availability, performance,
sustainability
7. Categories
• Industrial Clouds
• Can be either a (i) public cloud, or (ii) private cloud
• Private clouds are accessible only to company employees or organization
members
• Public clouds provide service to commercial customers:
• Amazon S3 (Simple Storage Service): store arbitrary datasets, pay per
GB-month stored
• Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary images,
pay per CPU hour used
• Google App Engine: develop applications within its App Engine framework, upload data (which is then imported into its format), and run them
• Academic Clouds
• Allow researchers to innovate, deploy, and experiment
• Google-IBM Cloud (U. Washington): run apps programmed atop Hadoop
• Cloud Computing Testbed (CCT @ UIUC): first cloud testbed to support
systems research. OpenCirrus: first federated cloud testbed.
http://opencirrus.org
8. Academic Institutions
• Saving costs as much as possible
• Only utilize cloud when needed (elastic)
• Saving space (local resources)
• Not-so-mission-critical purposes
• Security is still important
• Performance is not the primary concern
• Easy to use and manage
• Use in-house human resources
• Continually training personnel
• As a learning tool for students
• Research vision and mission
• Experiments are the common activities
• Caters to innovation and advancement
The bottom line is clear: the Cloud must be more beneficial than owning the infrastructure.
9. Cloud Costs and Properties Example
• Amazon EC2 Computing Economics
• Reduces the time required to obtain and boot new server
instances to minutes or even seconds
• Quickly scales capacity, both up and down, as your
computing requirements change
• Changes the economics of computing:
• Pay only for capacity that you actually use
• No start-up, monthly, or fixed costs
• $0.10 per CPU hour
• $0.20 per GB transferred across the Internet
• No cost to transfer data between Amazon S3 and Amazon
EC2
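A back-of-the-envelope sketch of the pay-as-you-go arithmetic, using only the example prices listed above. The helper `estimate_cost` and the workload figures are hypothetical, and current EC2 prices differ from these 2012 figures.

```python
# Example prices from the slide above (2012-era, illustrative only).
PRICE_PER_CPU_HOUR = 0.10    # USD per instance (CPU) hour
PRICE_PER_GB_NET = 0.20      # USD per GB transferred across the Internet
PRICE_PER_GB_S3_EC2 = 0.00   # no charge between Amazon S3 and Amazon EC2


def estimate_cost(instances: int, hours: float, gb_internet: float,
                  gb_s3_ec2: float = 0.0) -> float:
    """Hypothetical helper: pay only for capacity used, with no fixed costs."""
    compute = instances * hours * PRICE_PER_CPU_HOUR
    transfer = gb_internet * PRICE_PER_GB_NET + gb_s3_ec2 * PRICE_PER_GB_S3_EC2
    return round(compute + transfer, 2)


# One server running for a whole day: 24 h x $0.10.
print(estimate_cost(instances=1, hours=24, gb_internet=0))  # 2.4
# A hypothetical lab burst: 10 VMs for 40 h, 50 GB served to users and
# 200 GB read from S3 (free), i.e. $40 compute + $10 transfer.
print(estimate_cost(instances=10, hours=40, gb_internet=50, gb_s3_ec2=200))  # 50.0
```

When the experiment ends and the VMs are shut down, the compute term drops to zero, which is the "no fixed costs" property in practice.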
10. Basic Requirements
• Scalability: meets future needs
• Reliability: 24/7 availability, 99.9% uptime in SLA is
common
• Performance: low latency network and processing power
• Openness: future innovation and new applications
• Failsafe, Highly Available (HA), Load-balancing
• Multitenancy (Resource Pooling): computing resources
are pooled to serve multiple cloud users
• Self-provisioning of resources: provision computing
capabilities without the intervention of the service provider
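To make the "99.9% uptime" figure concrete, the downtime it still permits can be computed directly. A minimal sketch; the 30-day month and 365-day year are assumptions, since real SLAs define their own measurement windows and exclusions.

```python
def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Maximum downtime (in minutes) that an uptime guarantee permits over a period."""
    return period_hours * 60 * (1 - uptime_percent / 100)


MONTH_HOURS = 30 * 24   # assumed 30-day month
YEAR_HOURS = 365 * 24   # assumed non-leap year

for sla in (99.0, 99.9, 99.99):
    per_month = allowed_downtime_minutes(sla, MONTH_HOURS)
    per_year_h = allowed_downtime_minutes(sla, YEAR_HOURS) / 60
    print(f"{sla}% uptime: {per_month:.1f} min/month, {per_year_h:.2f} h/year")
```

For the common 99.9% guarantee this works out to about 43.2 minutes per month, or 8.76 hours per year.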
11. The Goal
• Provide a cost-effective computing environment for
research in academic institutions
• The goal is to improve students' knowledge of distributed
and parallel computing practices and better prepare them
for increasingly popular large-scale computing that takes
place in the "real world," such as search engines, social
networking sites, and scientific computational needs.
• To develop different skills, such as managing contracts,
overseeing integration between in-house and outsourced
services, and mastering a different model of IT budgets.
• Arguably more secure than on-campus solutions, given
the infrastructure complexity at the institutional level
13. Cloud Deployment Models
• Public clouds
• Private clouds
• Hybrid clouds
• Automated, on-demand elasticity accommodates spill-over (overflow) from the private to the public cloud in a hybrid private-public setup.
• Transparently grow the cluster’s capacity using an external cloud
provider
US National Institute of Standards and Technology (NIST)
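The spill-over idea can be sketched as a placement rule: fill the private cloud first and rent public-cloud VMs only for the overflow. This is a toy model under stated assumptions (one job per VM slot, no real provider API); the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HybridCluster:
    """Toy model of cloud bursting: fixed private slots plus elastic public VMs.

    All names are hypothetical; nothing here calls a real provider API.
    """
    private_capacity: int   # local VM slots (fixed, already paid for)
    private_used: int = 0
    public_used: int = 0    # VMs currently rented from an external provider

    def submit(self) -> str:
        """Place one job, preferring the private cloud and spilling over when it is full."""
        if self.private_used < self.private_capacity:
            self.private_used += 1
            return "private"
        self.public_used += 1  # transparently grow capacity with the public provider
        return "public"

    def finish(self, placement: str) -> None:
        """Release the slot a finished job used; freed public VMs stop incurring charges."""
        if placement == "public":
            self.public_used -= 1
        else:
            self.private_used -= 1


cluster = HybridCluster(private_capacity=3)
print([cluster.submit() for _ in range(5)])  # ['private', 'private', 'private', 'public', 'public']
cluster.finish("public")
print(cluster.public_used)  # 1
```

Preferring private slots reflects the cost argument made earlier: local capacity is already paid for, while public VMs are billed by the hour.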
14. Scientific Apps Example
• Loosely-coupled parallel applications
• Many domains: astronomy, biology, earth science.
• Potentially very large: 10K tasks common, >1M not uncommon
• Potentially data-intensive: 10 GB common, >1 TB not uncommon
• Data communicated via files
• Shared storage system, or network transfers
Resource Usage of the Three Workflow Applications
Workflow Specifications
(Space Telescope Science Institute, 2012)
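A loosely-coupled workflow can be sketched as many independent tasks that exchange data only through files. In this minimal sketch a temporary directory stands in for the shared storage system and a thread pool stands in for cloud VMs; both substitutions, and the toy summing task, are illustrative assumptions.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path


def task(workdir: Path, i: int) -> Path:
    """One independent task: compute a partial result and write it to its own file."""
    out = workdir / f"part-{i}.txt"
    out.write_text(str(sum(range(i * 1000, (i + 1) * 1000))))
    return out


def merge(files) -> int:
    """Final workflow step: read every intermediate file back from storage."""
    return sum(int(f.read_text()) for f in files)


def run_workflow(n_tasks: int, workers: int = 4) -> int:
    with tempfile.TemporaryDirectory() as d:
        workdir = Path(d)  # stands in for the shared storage system
        with ThreadPoolExecutor(max_workers=workers) as pool:
            files = list(pool.map(partial(task, workdir), range(n_tasks)))
        return merge(files)  # read before the directory is deleted


print(run_workflow(10))  # 49995000, i.e. sum(range(10000))
```

Because tasks never talk to each other directly, they can be spread over as many VMs as the budget allows; the cost is that all intermediate data must pass through storage or network transfers.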
15. Open-Source Cloud
• Linux Distro for Cloud Deployment
• Ubuntu Cloud Server
• Proxmox
• OpenStack
• JoliCloud
• PeppermintOS
• CloudLinux
• etc.
• Cloud Middleware:
Eucalyptus, OpenStack, Nimbus, OpenNebula, etc.
• Virtual Machine:
Xen, KVM, VirtualBox, etc.
• PaaS
WSO2 Stratos (http://wso2.com/cloud/stratos/), Apache, etc.
16. Amazon’s EC2
• Automatically distribute incoming application traffic across
multiple Amazon EC2 instances.
• Can detect health of EC2 instances and route traffic
accordingly.
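• Elastic Load Balancing supports CloudWatch metrics.
• Elastic IP address: neither static nor dynamic but elastic; an IP address reserved for your account that can be remapped to another instance. It stays allocated until you release it, whereas an instance's default public IP disappears once the server is terminated.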
• Drawbacks
• Non-persistent storage
• Persistent Machine Images
• Random IP Addresses
17. Elastic IP Address
• Use the public DNS name
• Have offsite DNS/load-balancing
• Use a CNAME e.g. www.walkjogrun.net
• Points to public DNS name
• Avoid a bare domain without a CNAME, e.g. http://walkjogrun.net
• Reduce domain TTL
• Use reverse-proxy
18. Cloud VM Example
Amazon EC2 Linux based VM
• 1.7 GHz x86 processor
• 1.75 GB of RAM
• 160 GB of local disk
• 250 Mb/s of network bandwidth
• $0.10 per hour per machine + bandwidth
In short
• Spin up a basic server for about $2.40 per day ($0.10 per hour), plus bandwidth
• Install what is necessary
• Use the server
• Optionally save a snapshot
• Shut it down
19. Automatic Scaling and Load-Balancing (1)
• WSO2 Elastic Load-Balancer (ELB)
• Web Services
• Java Platform-as-a-Service (PaaS)
• Fail-over support
• Enterprise grade
• 100% open source
• Amazon EC2 and CloudWatch
• Built as part of EC2 services
• CloudFront service
• IaaS
• Provides monitoring for AWS cloud resources.
• Custom metrics support.
• Alarms that take automated action when a metric crosses a specified threshold.
• Visualization of metrics as graphs and statistical tables.
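The alarm behaviour described above can be modelled as a threshold rule evaluated over consecutive data points. This is a simplified sketch, not the CloudWatch API: the state names mirror CloudWatch's, but the class name, the evaluation logic, and the callback used as the "automated action" are assumptions chosen for illustration.

```python
from collections import deque


class ThresholdAlarm:
    """Simplified metric alarm (hypothetical, not the CloudWatch API).

    Goes to ALARM when the metric is above `threshold` for `evaluation_periods`
    consecutive data points, and back to OK once a data point is at or below it.
    """

    def __init__(self, threshold: float, evaluation_periods: int, action):
        self.threshold = threshold
        self.window = deque(maxlen=evaluation_periods)  # last N breach flags
        self.state = "INSUFFICIENT_DATA"
        self.action = action  # callback, e.g. "add one instance"

    def record(self, value: float) -> str:
        self.window.append(value > self.threshold)
        if len(self.window) < self.window.maxlen:
            new_state = "INSUFFICIENT_DATA"
        elif all(self.window):
            new_state = "ALARM"
        else:
            new_state = "OK"
        # Trigger the automated action only on the transition into ALARM.
        if new_state == "ALARM" and self.state != "ALARM":
            self.action()
        self.state = new_state
        return new_state


scale_out_calls = []
alarm = ThresholdAlarm(threshold=70.0, evaluation_periods=3,
                       action=lambda: scale_out_calls.append("add instance"))
print([alarm.record(cpu) for cpu in (50, 80, 85, 90, 95, 60)])
# ['INSUFFICIENT_DATA', 'INSUFFICIENT_DATA', 'OK', 'ALARM', 'ALARM', 'OK']
print(scale_out_calls)  # ['add instance']: fired once, on entering ALARM
```

Requiring several consecutive breaches keeps a single CPU spike from launching a new instance.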
20. Automatic Scaling and Load-Balancing (2)
• Load-Balancer as a Service (LBaaS)
• RESTful API for hardware and software LB
• OpenStack tenants’ add-on
• Standalone
• Autoscaling currently available only through external support
• Cisco’s Application Control Engine (ACE)
• Hardware-based LB
• High Availability Proxy (HAProxy)
• Event-driven, single-process model
• Specific to TCP and HTTP-based applications
• Layer 7 (application) processing
21. Load-Balancer Properties
• “Sticky” sessions – persistence of stateful protocols to the same server, e.g. via HTTP cookies (session affinity)
• Dynamically adding/removing VMs to/from the LB – autoscaling the resource pool and gracefully terminating VMs when no longer needed
• Health monitoring and high availability – able to stop directing traffic to unresponsive servers
Other desired properties
• SSL offload/acceleration – allows the back-end application to only implement
HTTP and not bear the CPU load of SSL encryption/decryption.
• L7 traffic shaping - performs complex routing decisions based on layer-7 (application-level) protocol parameters
• DoS attack protection - provides a layer of protection against DoS attacks by
both low-level and high-level means, from TCP SYN cookies for protection
against the “SYN flood”-type attacks, up to aggregating access statistics per
IP address or subnet and denying access to suspiciously active ones, etc.
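The first three properties (sticky sessions, adding VMs dynamically, and health-based routing) can be combined in a small model. This is a minimal sketch of the ideas, not how WSO2 ELB, LBaaS, or HAProxy implement them; the class and method names are hypothetical, and it assumes the session ID has already been read from an HTTP cookie.

```python
class StickyLoadBalancer:
    """Toy load balancer with session affinity and health-based routing (hypothetical)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self.sessions = {}   # session ID -> backend (the "sticky" table)
        self._counter = 0    # round-robin position

    def add_backend(self, backend):
        """Autoscaling hook: a newly booted VM joins the pool."""
        self.backends.append(backend)
        self.healthy.add(backend)

    def mark_down(self, backend):
        """Health-monitor hook: stop directing traffic to an unresponsive server."""
        self.healthy.discard(backend)

    def _next_healthy(self):
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        backend = candidates[self._counter % len(candidates)]
        self._counter += 1
        return backend

    def route(self, session_id=None):
        """Pick a backend, keeping each session on the same server while it is healthy."""
        backend = self.sessions.get(session_id)
        if backend is None or backend not in self.healthy:
            backend = self._next_healthy()  # new session, or fail over to another server
            if session_id is not None:
                self.sessions[session_id] = backend
        return backend


lb = StickyLoadBalancer(["vm1", "vm2"])
print(lb.route("alice"), lb.route("bob"), lb.route("alice"))  # vm1 vm2 vm1
lb.mark_down("vm1")
print(lb.route("alice"))  # vm2: the session fails over
```

Failing over breaks affinity for that one session, so any state held only on the dead server is lost; this is why session state is often kept in shared storage as well.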
22. Elastic Cloud for High Performance
Computing (HPC)
On-Demand Autoscaler
• Nimbus
• Provides an infrastructure-as-a-service cloud to its clients via WSRF-based or Amazon EC2 WSDL web service APIs.
• Nimbus supports the Xen or KVM hypervisors and the virtual machine schedulers PBS and SGE. It allows deployment of self-configured virtual clusters via contextualization. It is configurable with respect to scheduling, networking leases, and usage accounting.
• OpenNebula
• Manages heterogeneous distributed data center infrastructures to build private, public and hybrid IaaS (Infrastructure as a Service) clouds.
• Orchestrates storage, network, virtualization, monitoring, and security
technologies to deploy multi-tier services (e.g. compute clusters) as virtual
machines on distributed infrastructures, combining both data center
resources and remote cloud resources, according to allocation policies.
23. Fronting Elastic Cloud
• Service-aware Load-Balancing
Port Forwarding to Virtual Ports
http://wso2.org/library/tutorials/2012/08/fronting-wso2-application-server-cluster-wso2-elastic-load-balancer
24. Elasticity Autoscaling Example
• WSO2 Carbon keeps track of the number of messages in flight to
each Service cluster, and decides whether to scale the system up or
down.
• WSO2 Carbon will start new Service member instances, and once those members successfully boot up, they will join the relevant Service cluster. Then the load balancer will start forwarding requests to the new members as well.
• The autoscaling algorithm in WSO2 ELB was developed by Afkham Azeez and is called the "Request in-flight based autoscaling" algorithm.
http://nirmalfdo.blogspot.com/2012_07_01_archive.html
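The in-flight idea can be illustrated with a simple sizing rule. This is not the WSO2 algorithm: the per-member limit, the one-step scaling, and the function name are hypothetical choices made only to show how a count of requests in flight can drive scale-up and scale-down decisions.

```python
import math


def desired_members(in_flight_samples, current_members, max_in_flight_per_member=50,
                    min_members=1, max_members=10):
    """Hypothetical in-flight based scaling rule for one Service cluster.

    Averages recent samples of requests in flight, works out how many members
    would keep each one at or below `max_in_flight_per_member`, and moves the
    cluster one member at a time toward that size.
    """
    average = sum(in_flight_samples) / len(in_flight_samples)
    needed = math.ceil(average / max_in_flight_per_member)
    if needed > current_members:
        return min(current_members + 1, max_members)  # scale up by one member
    if needed < current_members:
        return max(current_members - 1, min_members)  # scale down by one member
    return current_members


print(desired_members([120, 160, 140], current_members=2))  # 3 (average 140 needs 3 members)
print(desired_members([10, 20, 30], current_members=3))     # 2 (load fell, shrink gradually)
print(desired_members([90, 100, 110], current_members=2))   # 2 (no change)
```

Moving one member at a time is a conservative choice: since a new instance needs tens of seconds to boot before it joins the cluster, jumping straight to the computed size on every sample could overshoot.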
25. VM Instances Startup Time
• Startup time must be considered as overhead
• For example:
• On Amazon EC2 and the open-source Eucalyptus platform, launching VMs takes on average less than 20 seconds for 1 new instance and less than 25 seconds for 8 new instances.
• For most applications, this level of elasticity is sufficient to
scale up in times of peak demand.
27. Raw Virtualization Overhead
• For compute-intensive applications, the overhead introduced by virtualization is consistently around 8%
• For I/O-intensive applications, the virtualization overhead can be around 60%
R.S. Montero, R. Moreno-Vozmediano, I.M. Llorente, Journal of Parallel and Distributed Computing, 2010
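To see what these two figures mean for a mixed job, a rough additive model can be used. This is a sketch under a simplifying assumption not made in the cited study: the native runtime splits cleanly into a compute part and an I/O part, each slowed by its own overhead (8% and 60% by default).

```python
def virtualized_runtime(native_seconds: float, io_fraction: float,
                        cpu_overhead: float = 0.08, io_overhead: float = 0.60) -> float:
    """Rough estimate of a job's runtime inside a VM (assumed additive model).

    The native runtime is split into compute time and I/O time (io_fraction of
    the total), and each part is slowed down by its own overhead.
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    compute = native_seconds * (1 - io_fraction)
    io = native_seconds * io_fraction
    return compute * (1 + cpu_overhead) + io * (1 + io_overhead)


for share in (0.0, 0.25, 1.0):
    print(f"I/O share {share:.0%}: {virtualized_runtime(100, share):.0f} s")
```

For a 100-second native job this gives about 108 s when purely compute-bound, 121 s with a 25% I/O share, and 160 s when purely I/O-bound.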
28. Security Issues in Cloud Computing
• Policies: issues related to the policies that have been put in place to determine the legal boundaries of cloud computing services
• User privacy: issues related to the privacy of user information
• Data: issues related to the data stored in the cloud
• Cloud management: issues related to cloud management procedures
• Connection: issues related to the connection between the user and the cloud
Security issues and approaches for cloud computing architectures, Hassan Alghamdi, 2011
29. Data and Policies
Data
• Integrity: Can anyone else modify my data when it is in the cloud?
• Availability: Will my data be available 24/7?
• Location: Where is my data?
Policies
• SLA: Service Level Agreement. Definition of service? Penalties?
• Ownership and disclosure: Who owns the data and who has the rights to disclose it?
• Data lock-in: Will you be able to use another service from another provider without any problem?
30. User Privacy, Connection Issues and Management Issues
User privacy
• Confidentiality: Can anyone else see my personal data when it is in the cloud?
• Authorization: How does the cloud ensure that the users who access the data are the ones who are allowed to access it?
• Trust: Can I trust the provider?
Connection issues
• Malicious attacks: as technology improves and its technicalities get more complex, the number of attacks grows.
• SQL injections: a technique used by hackers to gain complete access to the database.
• Port scanning: the hacker monitors the traffic or a port in order to send legitimate messages so that he can get into the system.
Management issues
• Auditing: re-evaluation of policies, practices and data.
• Disaster recovery: What procedures do the user and the provider have to take in case a service fails?
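SQL injection, listed above as a connection issue, is easiest to see side by side with its standard defence, parameterized queries. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the table, data, and function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-key"), ("bob", "b-key")])


def lookup_unsafe(name):
    # Vulnerable: user input is pasted directly into the SQL text.
    return conn.execute(f"SELECT secret FROM users WHERE name = '{name}'").fetchall()


def lookup_safe(name):
    # Parameterized query: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


attack = "x' OR '1'='1"
print(lookup_unsafe(attack))  # every user's secret leaks
print(lookup_safe(attack))    # []
```

The attack string turns the unsafe query into `WHERE name = 'x' OR '1'='1'`, which is true for every row; the parameterized version simply looks for a user literally named `x' OR '1'='1`.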
31. Web-Scale for Mobile Example: Gumiyo.com
Services used
• Amazon S3
• Amazon EC2
Estimated savings: $650,000
32. Cloud in the News
• Cloud computing is still in its formative years, but is expected to grow up quickly.
• More revenue will be generated from Cloud computing.
• 45% of companies plan, as a priority, to implement mobile enterprise apps that run in the Cloud by the end of 2012.
• Security updates are extremely important in the current Cloud environment. E.g. Facebook some time ago announced new security updates and admitted that hackers hit it 600,000 times a day.
• Security group: defines the firewall rules, i.e. what can talk to your instances (e.g. HTTP, but not SFTP).
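A security group is essentially an allow-list evaluated with a default-deny policy. The sketch below models only that idea; the class and function names and the address ranges are hypothetical, and real EC2 security groups also support port ranges, other groups as sources, and stateful return traffic.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    """One allow rule: protocol, destination port, and the source range allowed in."""
    protocol: str  # "tcp" or "udp"
    port: int
    source: str    # CIDR block, e.g. "0.0.0.0/0" for anywhere


def is_allowed(rules, protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: traffic is admitted only if at least one rule matches it."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and addr in ipaddress.ip_network(rule.source)
        for rule in rules
    )


# Hypothetical web-server group: HTTP from anywhere, SSH/SFTP (port 22) only from an admin subnet.
web_group = [
    InboundRule("tcp", 80, "0.0.0.0/0"),
    InboundRule("tcp", 22, "10.0.0.0/8"),
]
print(is_allowed(web_group, "tcp", 80, "203.0.113.7"))  # True
print(is_allowed(web_group, "tcp", 22, "203.0.113.7"))  # False
```

Because nothing is open unless a rule says so, forgetting a rule fails safe (a service is unreachable) rather than exposing it.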
33. Cloud Outstanding Issues
SLA related issue
• Cloud computing providers should take more responsibility for the data stored in their databases. For example, from an Amazon contract:
“7.2: We will have no liability to you for any unauthorized access or
use, corruption, deletion, destruction or loss of any of Your Content or
Applications.” (Silverstone)
Integration issue
• Customers find it difficult to integrate their on-premise services with
cloud-based services.
Security issue (again)
• Security issues vary significantly with the type of cloud computing
model being utilized.
Deployment issue
• Current cloud computing deployments are not suitable for all use
cases.
34. Comments
• The benefits of cloud computing will not be realized if businesses are
not convinced that it is secure.
• Trust is at the centre of success and providers have to prove
themselves worthy of that trust if hosted services are going to work.
• There are no fundamental obstacles to making a cloud-computing environment
as secure as the vast majority of in-house IT environments. Many of
the current obstacles can be overcome immediately with well
understood technologies such as encrypted storage, Virtual Local
Area Networks, and network middleboxes (e.g. firewalls, packet
filters).
• Like other utility services, such as electricity, cloud computing can be
secured. To make the cloud secure, security must be built into every
aspect of the cloud starting from its foundation stage.
35. Summary
• Desired Autoscaling (on-demand) properties
• Services at PaaS or IaaS
• Compatible with commercially available Cloud
• Support session affinity (sticky SSL)
• Preferably works at Layer 7 (application) via API
• Random IP address may need further evaluation (depends on DNS server)
• Fast VM instances startup
• Monitors CPU, memory and bandwidth utilization, and SLAs
• Current Cloud in general
• Self-provisioning
• Cheap
• Scalable
• High computation and storage capacity
• Fault tolerant
• Good tool for research and experimentation (what universities do best)
• Still a work in progress
• Security assurances
• Service level guarantee
36. References
• Chapter 9 of Course Book: Cloud Computing Bible, 2011, Wiley
Publishing Inc.
• http://aws.amazon.com
• An elasticity model for High Throughput Computing clusters, R.S.
Montero, R. Moreno-Vozmediano, I.M. Llorente, Journal of Parallel and
Distributed Computing, Available online 24 May 2010
• Build a scalable architecture with Amazon's EC2, Adam Howitt, 2010
• http://www.mirantis.com/blog/load-balancing-for-openstack-proposing-lbaas/
• WSO2 Elastic Load Balancer on StratosLive,
http://wso2.com/products/elastic-load-balancer/
• http://wso2.org/library/tutorials/2012/08/fronting-wso2-application-server-cluster-wso2-elastic-load-balancer
• Security issues and approaches for cloud computing architectures,
Hassan Alghamdi, 2011
Editor's Notes
Oftentimes, ‘leave it to the pros’ works.
HAProxy implements an event-driven, single-process model which enables support for a very high number of simultaneous connections at very high speeds. Multi-process or multi-threaded models can rarely cope with thousands of connections because of memory limits, system scheduler limits, and lock contention everywhere. Event-driven models do not have these problems because implementing all the tasks in user-space allows finer resource and time management. The downside is that those programs generally don't scale well on multi-processor systems. That's the reason why they must be optimized to get the most work done from every CPU cycle.