3. Cloud Computing (Cont.)
A variety of services available over the Internet that
deliver compute functionality on the service
provider’s infrastructure
Umbrella term
Computing as a utility
Pay as you go model
Source: www.free-power-point-templates.com/articles/best-cloud-computing-powerpoint-templates/
4. Cloud Computing Characteristics
Massive scale
Rapid elasticity
Resource pool
Virtualization
On demand
Resilient computing
Broad network access
Service orientation
Geographic distribution
Homogeneity
[Figure: Virtualized Stack, with layers labeled Hardware, Hypervisor, OS, and App]
6. Cloud Computing – Pros & Cons (Cont.)
Pros
Reduced cost
5.7 times reduction in storage costs
7.1 times reduction in administrative costs
7.3 times reduction in networking costs
No upfront investment
Better performance
Rapid scalability
Access to latest version
Global distribution
Device independent
More secure than having your own server rack
Source – Green Cloud Computing by Dr. Rajkumar Buyya
7. Cloud Computing – Pros & Cons (Cont.)
Cons
Need high-bandwidth links
Less control & security concerns
Low performance
Web-based applications aren’t the fastest
Interoperability
Deployment-specific software
8. Cloud Computing – Levels
Cloud Computing =
Software as a Service
+ Platform as a Service
+ Infrastructure as a Service
+ Data as a Service
9. Software as a Service (SaaS)
Examples
Google Apps, Office 365, Salesforce.com (CRM)
Pros
Availability
When & where you need them
Cost reduction
No upfront costs
Access to the latest version
Cons
Lack of control
Lower customizability
10. Platform as a Service (PaaS)
Examples
Google App Engine, Windows Azure, Heroku
Pros
Rapid development
Better control
Cost reduction
Access to latest version
Cons
Relatively lower customizability
11. Infrastructure as a Service (IaaS)
Examples
Amazon, Rackspace, Akamai, SLT
Pros
Better control
High customizability
Cons
Administration overhead
High upfront cost if the application is built using
commercial software/OS
14. Design Factors for WSC (Warehouse-Scale Computers)
Cost-performance
Small savings add up
Energy efficiency
Affects power distribution & cooling
Work per joule
Operational costs count
Power consumption is a primary constraint when
designing a system
Dependability via redundancy
Many low-cost components
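As a rough illustration of why redundancy across many low-cost components can still yield a dependable service, consider a simplified model. It assumes (this is not stated on the slide) that each of n replicas fails independently with probability p over some interval, and that the service is down only when every replica is down:

```latex
P(\text{service down}) = p^{n},
\qquad
\text{availability} = 1 - p^{n}
```

With hypothetical values p = 0.01 and n = 3, the chance that all replicas are down at once is 0.01^3 = 10^{-6}, compared with 10^{-2} for a single component. Real failures are often correlated (shared power, network, or software bugs), so this is an optimistic bound rather than a prediction.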
15. Design Factors (Cont.)
Network I/O
Interactive & batch processing workloads
Web search – interactive
Web indexing – batch
Finding ample computational parallelism isn’t a concern
Most jobs are totally independent (“request-level
parallelism”)
Scale – Its opportunities & problems
Can afford to build customized systems, as WSCs
require volume purchases
Frequent failures
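The request-level parallelism mentioned above can be sketched in a few lines of Python. This is a minimal single-machine sketch, not how a WSC is actually built: `handle_request` and `serve` are hypothetical names, and a thread pool stands in for a fleet of servers. The point is only that requests share no state, so they can be handed to any worker.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(query: str) -> str:
    """Hypothetical handler: the answer depends only on this one request."""
    return f"results for {query!r}"


def serve(queries: list[str], workers: int = 4) -> list[str]:
    # Requests are independent, so no coordination between workers is needed.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, queries))


if __name__ == "__main__":
    for line in serve(["cloud", "mapreduce", "kvm"]):
        print(line)
```

Because no request waits on another, adding workers (or machines) raises throughput without changes to the handler itself.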
16. Programming Models & Workloads
Batch processing framework – MapReduce
Map
Applies a programmer-supplied function to each logical input record
Runs on thousands of computers
Provides new set of (key, value) pairs as intermediate values
Reduce
Collapses values using another function
Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
17. Divide & Conquer
[Figure: “Work” is partitioned into w1, w2, w3; each “worker” produces a partial result r1, r2, r3; the partial results are combined into the “Result”]
Source – “What is Cloud Computing? (and an intro to parallel/distributed
processing)” by Jimmy Lin, The iSchool, University of Maryland
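The partition / worker / combine pattern from this slide can be sketched as follows. This is an illustrative example only: summing integers is an arbitrary choice of "work", the function names are hypothetical, and a local thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work: list[int], parts: int) -> list[list[int]]:
    """Split the work into at most `parts` chunks (w1, w2, ...)."""
    size = max(1, -(-len(work) // parts))  # ceiling division
    return [work[i:i + size] for i in range(0, len(work), size)]


def worker(chunk: list[int]) -> int:
    """Each worker turns its chunk into a partial result (r1, r2, ...)."""
    return sum(chunk)


def combine(partials: list[int]) -> int:
    """Merge the partial results into the final result."""
    return sum(partials)


def divide_and_conquer(work: list[int], parts: int = 3) -> int:
    chunks = partition(work, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(worker, chunks))
    return combine(partials)


if __name__ == "__main__":
    print(divide_and_conquer(list(range(1, 11))))
```

The pattern works here because addition is associative, so the partial sums can be combined in any grouping; not every problem splits this cleanly.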
18. Map-Reduce (Contd.)
Map-reduce support is provided by a function
like the following
Y map-reduce(mapfn, reducefn, List<X>)
The map-reduce implementation takes a list of inputs
(List<X>) & does the following:
Apply the map function to each entry in the list, which
emits (key, value) pairs
Collect the results, group them by key, & then pass each group
to the reduce function as an array
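The two steps above can be sketched as a small, single-process Python function. This is a minimal sketch under stated assumptions, not a real framework: `map_reduce` mirrors the slide's signature, but here the return type Y is assumed to be a dict from key to reduced value, and nothing is distributed. The inverted-index helpers `index_map` and `index_reduce` are hypothetical names used only to exercise it.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

X = TypeVar("X")
K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
Y = TypeVar("Y")


def map_reduce(
    mapfn: Callable[[X], Iterable[tuple[K, V]]],
    reducefn: Callable[[K, list[V]], Y],
    inputs: Iterable[X],
) -> dict[K, Y]:
    # Step 1: apply the map function to each entry and collect the
    # emitted (key, value) pairs, grouped by key.
    groups: dict[K, list[V]] = defaultdict(list)
    for item in inputs:
        for key, value in mapfn(item):
            groups[key].append(value)
    # Step 2: pass each key and its list of values to the reduce function.
    return {key: reducefn(key, values) for key, values in groups.items()}


def index_map(doc: tuple[str, str]) -> list[tuple[str, str]]:
    """Emit (term, doc_id) for every term in the document."""
    doc_id, text = doc
    return [(term, doc_id) for term in text.split()]


def index_reduce(term: str, doc_ids: list[str]) -> list[str]:
    """Collapse the doc ids for one term into a sorted, de-duplicated list."""
    return sorted(set(doc_ids))


if __name__ == "__main__":
    docs = [("d1", "cloud computing"), ("d2", "cloud storage")]
    print(map_reduce(index_map, index_reduce, docs))
    # {'cloud': ['d1', 'd2'], 'computing': ['d1'], 'storage': ['d2']}
```

In a real framework the grouping step is a distributed shuffle across machines; the in-memory dict here only shows the data flow.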
20. Applications of Map-Reduce
Frequency distribution of word occurrences
Building inverted index of a search engine
Sorting
Stitching imagery
Google Maps
Data clustering
Data analytics & business intelligence
21. Map-Reduce for Word Counting
Source: http://xiaochongzhang.me/blog/?p=338
How to do this for a large dataset using a distributed system?
22. Example – Word Count
Map(docId, text):
    for all terms t in text
        emit(t, 1);
Reduce(t, values[]):
    int sum = 0;
    for all values v
        sum += v;
    emit(t, sum);
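The pseudocode above can be turned into a runnable, single-machine Python sketch. The names `map_words`, `reduce_counts`, and `word_count` are hypothetical, `yield` plays the role of `emit`, and an in-memory dict stands in for the framework's group-by-key step. Splitting on whitespace is an assumption about what counts as a "term".

```python
from collections import defaultdict
from typing import Iterator


def map_words(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit (term, 1) for every term in the document."""
    for term in text.split():
        yield term, 1


def reduce_counts(term: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum all the 1s emitted for a single term."""
    total = 0
    for v in values:
        total += v
    return term, total


def word_count(docs: dict[str, str]) -> dict[str, int]:
    # Group the emitted values by term; a real framework does this
    # between the map and reduce phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, one in map_words(doc_id, text):
            grouped[term].append(one)
    return dict(reduce_counts(term, values) for term, values in grouped.items())


if __name__ == "__main__":
    print(word_count({"d1": "a b a", "d2": "b c"}))
    # {'a': 2, 'b': 2, 'c': 1}
```

Note that `docId` is unused by the map function, just as in the pseudocode; it is kept only to match the (key, value) input shape.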
23. In Class Activity
1. Identify missing card(s)
2. Card sorting
3. Card sorting with 2 rounds
Inspired by Marcio Silva's “The MapReduce Card Game” at
http://blog.marciosilva.com/2012/10/the-mapreduce-card-game.html
24. Why Map-Reduce?
Implementing the same pattern in a distributed
system isn’t that easy
Need to worry about communication, failures,
initialization, etc.
MapReduce frameworks take care of all of those
You write the map & reduce functions & call the
framework
It forces you to think in parallel at design time
It gives you a higher level of abstraction to think in
It’s very generic & covers a lot of use cases
See http://wiki.apache.org/hadoop/PoweredBy
33. Xen vs. KVM
Source: http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen/
34. Challenges
Getting large volumes of data in/out
Bandwidth aggregation
Lack of, or lower, QoS
SLAs are too simplistic
Deployment times range from tens of seconds to
minutes
Distributed cloud
Lack of control
Security, privacy, & ownership concerns
Policy issues
Editor's Notes
DaaS examples - Urban Mapping, a geography data service, AWS data (Genome data, US Census, corpus of web crawl data)
S3 - Simple Storage Service
EC2 - Elastic Compute Cloud
KVM - Kernel-based Virtual Machine
QEMU - Quick Emulator
KVM requires a processor with hardware virtualization extensions