An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)

An introduction to cloud computing given at an IEEE-sponsored event on August 6, 2009 in Philadelphia.


  1. An Overview of Cloud Computing: My Other Computer is a Data Center. Robert Grossman, Open Data Group & University of Illinois at Chicago. IEEE New Technologies Conference, August 6, 2009.
  2. Part 1. What is a Cloud?
  3. What is a Cloud? Software as a Service.
  4. What Else is a Cloud? Platform as a Service.
  5. Is Anything Else a Cloud? Infrastructure as a Service.
  6. Are There Other Types of Clouds? Large Data Cloud Services (example: ad targeting).
  7. One Definition
     • Clouds provide on-demand resources or services over a network, often the Internet, with the scale and reliability of a data center.
     • No standard definition.
     • Cloud architectures are not new.
     • What is new:
       – Scale
       – Ease of use
       – Pricing model
  8. Scale is new.
  9. Elastic, Usage Based Pricing Is New
     • 1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
     • Elastic, usage based pricing turns capex into opex.
     • Clouds can be used to manage surges in computing needs.
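The pricing claim on this slide is simple arithmetic, and a minimal sketch makes it concrete. It assumes a flat price per machine-hour with no minimums or volume discounts; the rate `RATE_CENTS` and the function name `usage_cost_cents` are hypothetical and do not come from the talk or from any provider's price list.

```python
def usage_cost_cents(machines: int, hours: int, rate_cents: int) -> int:
    """Cost under usage-based pricing: pay only for machine-hours consumed."""
    return machines * hours * rate_cents


# Hypothetical flat rate, in cents per machine-hour (an assumption for illustration).
RATE_CENTS = 10

# The two configurations from the slide: same machine-hours, very different wall-clock time.
one_machine = usage_cost_cents(machines=1, hours=120, rate_cents=RATE_CENTS)
many_machines = usage_cost_cents(machines=120, hours=1, rate_cents=RATE_CENTS)

print(one_machine == many_machines)
```

Under that flat-rate assumption the bill depends only on total machine-hours, which is why a surge can be absorbed by renting many machines briefly rather than buying hardware sized for the peak.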
  10. Simplicity Offered By the Cloud is New
      • (images) + … and you have a computer ready to work.
      • A new programmer can develop a program to process a container full of data with less than a day of training using MapReduce.
  11. Part 2. Varieties of Clouds
  12. Varieties of Clouds
      • Architectural Model
        – On-demand computing instances vs large data cloud services
      • Payment Model
        – Elastic, usage based pricing, lease/own, …
      • Management Model
        – Private vs Public; Single vs Multiple Tenant; …
      • Programming Model
        – Queue Service, MPI, MapReduce, Distributed UDF
      • Diagram axes: computing instances vs large data cloud services; private internal vs public external; elastic, usage-based pricing or not. All combinations occur.
  13. Architectural Models: How Do You Fill a Data Center?
      • Large data cloud services (shared by many apps):
        – Cloud Storage Services
        – Cloud Compute Services (MapReduce & Generalizations)
        – Cloud Data Services (BigTable, etc.)
        – Quasi-relational Data Services
      • On-demand computing instances (App, App, App, …)
  14. Payment Models
      • Buying racks, containers and data centers
      • Leasing racks, containers and data centers
      • Utility based computing (pay as you go)
        – Moves capex to opex
        – Handles surge requirements (use 1000 servers for 1 hour vs 1 server for 1000 hours)
  15. Management Models
      • Public, private and hybrid models
      • Single tenant vs multiple tenant (shared vs non-shared hardware)
      • Owned vs leased
      • Manage yourself vs outsource management
      • All combinations are possible
  16. Programming Models
      • Amazon’s Simple Queue Service
      • MPI, sockets, FIFO
      • MapReduce
      • Distributed UDF
      • DryadLINQ
      • Azure services
      • The slide arranges these under two headings: on-demand computing instances and large data cloud services.
  17. Part 3. Cloud Computing Industry
      • “Cloud computing has become the center of investment and innovation.” Nicholas Carr, 2009 IDC Directions
      • Cloud computing is approaching the top of the Gartner hype cycle.
  18. IaaS, PaaS and SaaS Point of View
      • Infrastructure as a Service (IaaS)
        – PRODUCT: Compute power, storage and networking infrastructure over the Internet, provided as a virtual machine image
        – USERS: Developers
      • Platform as a Service (PaaS)
        – PRODUCT: Storage, compute and other services to simplify application development, especially of web applications
        – USERS: Application developers
      • Software as a Service (SaaS)
        – PRODUCT: Finished application available on demand to the end user
        – USERS: Software consumers
  19. Building Data Centers
      • Sun’s Modular Data Center (MD)
      • Formerly Project Blackbox
      • Containers used by Google, Microsoft & others
      • Data center consists of 10-60+ containers.
  20. Data Center Operating Systems
      • Data center services include: VM management services, business continuity services, security services, power management services, etc.
      • Diagram: a workstation runs VM 1 … VM 5; a Data Center Operating System manages VM 1 … VM 50,000.
  21. Berkeley View of Cloud Computing
      • Layers: Data Centers; Providers of Cloud Services; Consumers of Cloud Services / Providers of Software as a Service; Consumers of Software as a Service.
      • The Berkeley Report on cloud computing divides the industry into these layers and concentrates on public clouds.
  22. Transition Taking Place
      • A handful of players are building multiple data centers a year and improving with each one.
      • This includes Google, Microsoft, Yahoo, …
      • A data center today costs $200 M – $400+ M.
      • The Berkeley RAD Report points out an analogy with the semiconductor industry: companies stopped building their own fabs and started leasing fabs from others as the cost of a fab approached $1B.
  23. Mindmeister Map of Cloud Computing
      • Dupont’s Mindmeister Map divides the industry:
        – IaaS, PaaS, Management, Community
      • http://www.mindmeister.com/maps/show_public/15936058
  24. Part 4. Virtualization
  25. Virtualization
      • Virtualization separates logical infrastructure from the underlying physical resources to decrease the time needed to make changes, improve flexibility, improve utilization and reduce costs.
      • Example: server virtualization. Use one physical server to support multiple logical virtual machines (VMs), which are sometimes called logical partitions.
      • Technology pioneered by IBM in the 1960s to better utilize mainframes.
  26. Idea Dates Back to the 1960s
      • Diagram: an IBM mainframe runs IBM VM/370, which hosts guest operating systems (CMS, MVS, CMS), each running an app.
      • Native (Full) Virtualization. Examples: VMware ESX.
  27. Two Types of Virtualization
      • Using the hypervisor, each guest OS sees its own independent copy of the CPU, memory, IO, etc.
      • Native (Full) Virtualization: physical hardware, hypervisor, unmodified guest OSes (Guest OS 1, Guest OS 2), apps. Examples: VMware ESX.
      • Para Virtualization: physical hardware, hypervisor, modified guest OSes (Guest OS 1, Guest OS 2), apps. Examples: Xen.
  28. Four Key Properties
      1. Partitioning: run multiple VMs on one physical server; one VM doesn’t know about the others.
      2. Isolation: security isolation is at the hardware level.
      3. Encapsulation: the entire state of the machine can be copied to files and moved around.
      4. Hardware abstraction: provision and migrate a VM to another server.
  29. Managing Virtual Machines
      • Provision VM
      • Schedule VM
      • Monitor VM
      • Self-service portal for VM
  30. Part 5. Large Data Clouds
  31. The Google Data Stack
      • The Google File System (2003)
      • MapReduce: Simplified Data Processing… (2004)
      • BigTable: A Distributed Storage System… (2006)
  32. Map-Reduce Example
      • Input is a file with one document per record.
      • User specifies the map function:
        – key = document URL
        – value = terms that the document contains
      • Example: map applied to (“doc cdickens”, “it was the best of times”) emits (“it”, 1), (“was”, 1), (“the”, 1), (“best”, 1), …
  33. Example (cont’d)
      • The MapReduce library gathers together all pairs with the same key value (shuffle/sort phase).
      • The user-defined reduce function combines all the values associated with the same key.
      • Example: key = “it”, values = 1, 1; key = “was”, values = 1, 1; key = “best”, values = 1; key = “worst”, values = 1. Reduce emits (“it”, 2), (“was”, 2), (“best”, 1), (“worst”, 1).
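The map, shuffle/sort and reduce phases described on the two slides above can be sketched in a few lines of single-process Python. This is an illustration of the data flow only, not Google's or Hadoop's API: the names `map_fn`, `shuffle`, `reduce_fn` and `word_count` are hypothetical, and the input text is assumed to continue with "it was the worst of times" so that it matches the reduce output shown on the slide.

```python
from collections import defaultdict


def map_fn(url, text):
    """Map: emit (term, 1) for every term in the document."""
    for term in text.split():
        yield term, 1


def shuffle(pairs):
    """Shuffle/sort: gather all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(term, counts):
    """Reduce: combine all values associated with one key."""
    return term, sum(counts)


def word_count(documents):
    """Run map over every record, then shuffle, then reduce each key group."""
    pairs = [pair for url, text in documents for pair in map_fn(url, text)]
    groups = shuffle(pairs)
    return dict(reduce_fn(term, counts) for term, counts in sorted(groups.items()))


# One record per document: (document URL, document text). The text is an assumed example.
docs = [("doc cdickens", "it was the best of times it was the worst of times")]
counts = word_count(docs)
print(counts)
```

In a real large data cloud the map and reduce calls run in parallel on the nodes that hold the data, and the shuffle moves intermediate pairs across the network; the sketch keeps only the logical structure.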
  34. Generalization: Apply User Defined Functions (UDF) to Files in Storage Cloud
      • Diagram: the map/shuffle stage and the reduce stage are each replaced by a UDF.
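One way to read this generalization is that MapReduce becomes a special case of applying arbitrary user-defined functions to files, with each UDF deciding where its output goes. The sketch below illustrates that reading under stated assumptions: `apply_udf`, the bucket convention, and both UDFs are hypothetical names invented here, and none of this is the actual Sector/Sphere API.

```python
from collections import defaultdict

NUM_BUCKETS = 2  # assumed number of output buckets for the first pass


def apply_udf(files, udf):
    """Apply `udf` to the records of each file; the UDF yields (bucket_id, output) pairs."""
    buckets = defaultdict(list)
    for name in sorted(files):
        for bucket_id, output in udf(files[name]):
            buckets[bucket_id].append(output)
    return dict(buckets)


def map_shuffle_udf(records):
    """First UDF: split lines into words and route each word to a deterministic bucket."""
    for line in records:
        for word in line.split():
            yield sum(map(ord, word)) % NUM_BUCKETS, word


def reduce_udf(records):
    """Second UDF: count the words within one bucket and emit (word, count) pairs."""
    counts = defaultdict(int)
    for word in records:
        counts[word] += 1
    for word, n in sorted(counts.items()):
        yield 0, (word, n)


# Two hypothetical files in the storage cloud, each a list of records.
files = {
    "part-0": ["it was the best of times"],
    "part-1": ["it was the worst of times"],
}

# Pass 1 plays the role of map/shuffle; its buckets become the input files of pass 2.
intermediate = apply_udf(files, map_shuffle_udf)
result = apply_udf({str(k): v for k, v in intermediate.items()}, reduce_udf)
totals = dict(result[0])
print(totals)
```

Because identical words always land in the same bucket, the second pass sees every occurrence of a word together, which is the guarantee the shuffle phase provides in MapReduce.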
  35. Google’s Layered Cloud Services (Google’s Stack)
      • Applications
      • Table Services: Google’s BigTable
      • Compute Services: Google’s MapReduce
      • Storage Services: Google File System (GFS)
  36. Hadoop’s Layered Cloud Services (Hadoop’s Stack)
      • Applications
      • Table Services
      • Compute Services: Hadoop’s MapReduce
      • Storage Services: Hadoop Distributed File System (HDFS)
  37. Sector’s Layered Cloud Services (Sector’s Stack)
      • Applications
      • Table Services
      • Compute Services: Sphere’s UDF
      • Storage Services: Sector’s Distributed File System (SDFS)
      • Routing & Transport Services: UDP-based Data Transport Protocol (UDT)
  38. Hadoop & Sector

      |                   | Hadoop                  | Sector                   |
      | Storage Cloud     | Block-based file system | File-based               |
      | Programming Model | MapReduce               | UDF & MapReduce          |
      | Protocol          | TCP                     | UDP-based protocol (UDT) |
      | Replication       | At time of writing      | Periodically             |
      | Security          | Not yet                 | HIPAA capable            |
      | Language          | Java                    | C++                      |
  39. MalStone Benchmark
      • Benchmark developed by the Open Cloud Consortium for clouds supporting data intensive computing.
      • Code to generate the required synthetic data is available from code.google.com/p/malgen
      • Stylized analytic computation that is easy to implement in MapReduce and its generalizations.
  40. MalStone B
      • Diagram: sites and entities over time, with time windows dk-2, dk-1, dk.
  41. MalStone B Benchmark

      | System                   | MalStone B |
      | Hadoop v0.18.3           | 799 min    |
      | Hadoop Streaming v0.18.3 | 142 min    |
      | Sector v1.19             | 44 min     |

      | # Nodes                  | 20 nodes   |
      | # Records                | 10 Billion |
      | Size of Dataset          | 1 TB       |
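The running times on this slide are easier to compare as ratios. The short sketch below computes each system's time relative to the fastest entry, using only the numbers reported on the slide; the variable names are introduced here for illustration, and the ratios say nothing about hardware or configuration details that the slide does not give.

```python
# MalStone B running times in minutes, as reported on the slide.
minutes = {
    "Hadoop v0.18.3": 799,
    "Hadoop Streaming v0.18.3": 142,
    "Sector v1.19": 44,
}

# Use the fastest reported time as the baseline.
baseline = min(minutes.values())

# Ratio of each running time to the baseline, rounded to one decimal place.
ratios = {name: round(t / baseline, 1) for name, t in minutes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the fastest reported time")
```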
  42. Trading Functionality for Scalability

      |                   | Databases                                                                                        | Data Clouds                                                           |
      | Scalability       | 100’s TB                                                                                         | 100’s PB                                                              |
      | Functionality     | Full SQL-based queries, including joins                                                          | Optimized access to sorted tables (tables with single keys)           |
      | Optimized         | Databases are optimized for safe writes                                                          | Clouds are optimized for efficient reads                              |
      | Consistency model | ACID (Atomicity, Consistency, Isolation & Durability): the database is always consistent         | Eventual consistency: updates eventually propagate through the system |
      | Parallelism       | Difficult because of ACID model; shared nothing is possible (Graywolf)                           | Basic design incorporates parallelism over commodity components       |
      | Scale             | Racks                                                                                            | Data center                                                           |
  43. Not Everyone Agrees
      • David J. DeWitt and Michael Stonebraker, MapReduce: A Major Step Backwards, Database Column, January 17, 2008
  44. Part 6. Standards Efforts
      • Photo: change of gauge at Ussuriisk (near Vladivostok) at the Chinese–Russian border.
      • Train gauge in China is 1435 mm.
      • Train gauge in Russia is 1520 mm.
      • How can a cloud application move from one cloud storage service to another?
  45. Standards Efforts for Clouds
      • Cloud Computing Interoperability Forum (CCIF)
      • Open Cloud Consortium (OCC)
      • Open Grid Forum (OGF)
      • Distributed Management Task Force (DMTF)
      • Storage Networking Industry Association (SNIA)
      • Plus several others…
  46. www.opencloudconsortium.org
      1. Supports the development of standards.
      2. Supports reference implementations for cloud computing, preferably open source.
      3. Manages a testbed for cloud computing called the Open Cloud Testbed.
      4. Supports the development of benchmarks.
      5. Sponsors workshops and other events related to cloud computing.
  47. Activities Currently Focused Around Five Use Cases
      1. Moving an existing cloud application from Cloud 1 to Cloud 2 without changing the application (migrate / port).
      2. Providing surge capacity for an application on Cloud 1 using any of the Clouds 2, 3, … without changing the application (surge / burst).
  48. Large Data Cloud Use Cases
      3. Moving a large data cloud application from one large data cloud storage service to another.
      4. Moving a large data cloud application from one large data cloud compute service to another.
  49. Inter-Cloud Use Case
      5. Inter-cloud communication between two HIPAA compliant clouds (Cloud 1 and Cloud 2).
  50. OCC Welcomes New Members
      • Companies and organizations are welcome to join the Open Cloud Consortium (OCC): www.opencloudconsortium.org/membership.html
      • Join one of our working groups:
        – Large Data Clouds Working Group
        – Standard Cloud Performance Measurement (SCPM) Working Group
        – Information Sharing & Security Working Group
  51. For More Information
      • Contact information: Robert Grossman, rlg@opendatagroup.com, blog.rgrossman.com
      • Web sites:
        – www.opendatagroup.com
        – www.ncdm.uic.edu
        – www.opencloudconsortium.org
