Introduction to Cloud Computing

  • 1. Introduction to Cloud Computing
      Marin Dimitrov (technology watch #3), Apr 2010
  • 2. Contents
      • Introduction
      • Cloud Computing platforms
      • Programming for the Cloud
      • Semantic Web on the Cloud
      Cloud Computing Apr 2010 #2
  • 3. Contents: Part I, Introduction
  • 4. Cloud Computing - NIST definition
      • “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
      • Delivery models
          – IaaS (Infrastructure as a Service): the consumer uses "fundamental resources" such as processing power, storage, networking components or middleware. The consumer can control the operating system, storage, applications and possibly networking
          – PaaS (Platform as a Service): the consumer uses a hosting environment for their applications and has control over the applications (and some control over the hosting environment), but does not control the infrastructure on which they are running
          – SaaS (Software as a Service): the consumer uses an application, but does not control the infrastructure on which it is running (OS, hardware)
  • 5. XaaS spectrum – Google, Amazon, Microsoft (services listed from the SaaS end of the spectrum, through PaaS, down to IaaS)
      • Amazon: Elastic Map Reduce, SimpleDB, Relational Database Service, Flexible Payment Service, EC2, Simple Queue Service, Simple Notification Service, Elastic Block Storage, S3 / RRS, CloudWatch / Auto Scaling, Elastic Load Balancer
      • Google: Gmail, Google apps, App Engine, BigTable / MegaStore, Google Storage
      • Microsoft: SQL Azure, Blob storage, Azure Computing, Queues, Load Balancer
  • 6. Cloud Computing - Essential characteristics (NIST)
      • Rapid elasticity: the ability to scale resources both up and down as needed. To the consumer, the Cloud appears to be infinite, and the consumer can purchase as much or as little computing power as they need
      • Measured service: aspects of the Cloud service are controlled and monitored by the Cloud provider. This is crucial for billing, access control, resource optimization & capacity planning
      • On-demand self service: a consumer can use cloud services as needed without any human interaction with the cloud provider
      • Ubiquitous network access: the Cloud provider’s capabilities are available over the network and can be accessed through standard mechanisms
      • Resource pooling: allows a Cloud provider to serve its consumers via a multi-tenant model; resources are (re)assigned according to consumer demand
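Rapid elasticity is usually realised with a scaling rule that turns observed load into an instance count. The sketch below is a hypothetical rule, not the API of any provider's auto-scaling service: it assumes each instance handles a fixed number of requests per second, and the names `desired_instances`, `capacity_per_instance_rps`, `min_instances` and `max_instances` are illustrative.

```python
import math


def desired_instances(load_rps: float, capacity_per_instance_rps: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Hypothetical scaling rule: enough instances to carry the current load,
    clamped to [min_instances, max_instances]."""
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity_per_instance_rps must be positive")
    needed = math.ceil(load_rps / capacity_per_instance_rps)
    return max(min_instances, min(max_instances, needed))


# Load rising and falling over a day (made-up requests/second).
for load in [40, 250, 900, 5000, 120]:
    print(load, "->", desired_instances(load, capacity_per_instance_rps=100))
```

Scaling down matters as much as scaling up: when the load drops back to 120 req/s the rule keeps only two instances, which is what makes paying per hour worthwhile. The upper bound (20 here) stands in for a budget or account limit.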
  • 7. Cloud Computing - deployment models (NIST)
      • Public cloud
          – Infrastructure owned by an organisation that sells cloud services to third parties
          – E.g. Amazon Web Services, Google AppEngine, Windows Azure
      • Private cloud
          – Internal infrastructure for a single organisation (on or off premise)
          – E.g. VMware vCloud, IBM Cloudburst, Microsoft Hyper-V
      • Community cloud
          – Infrastructure shared by several organisations, targeting a specific community
          – E.g. OpenCirrus (HP, Intel, Yahoo, KIT, CMU, …)
      • Hybrid cloud
          – Composition of the above
          – E.g. AWS Virtual Private Cloud
  • 8. Cloud computing – business drivers
      1. Business agility
          – Faster time to market
              • No major upfront commitment & investment in infrastructure
          – Scalability & elasticity
              • Instant on-demand provisioning
              • Shifting the risk of over-/under-provisioning to the cloud provider
      2. Focus
          – Outsource non-core tasks to the cloud provider
      3. Pay-as-you-go
          – Speed up new project launching & rollout (start small, add resources when needed)
          – No need for complex planning ahead
          – Turn fixed costs (CapEx) into variable costs (OpEx)
  • 9. Some cloud use cases
      • Overflow buffer
          – Provision in-house capacity for the average load rather than the peak, and absorb peaks in the cloud
      • Seasonal business
          – E.g. Walmart has a 4:1 peak-to-average ratio (source?)
      • Small startups: time to market
          – Less upfront investment, more focus on core competencies
      • Experimental playground
          – Roll out experimental projects without major equipment purchases
      • Speedup of large-scale batch operations
          – 1,000 servers for 1 hour cost the same as 1 server for 1,000 hours
          – More cost-efficient computing (off-peak tariffs & time zones)
      • Unforeseeable events
          – E.g. sudden traffic spikes to web sites (volcanoes, anyone?)
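The batch-speedup claim is simple arithmetic. A minimal sketch, assuming a perfectly parallel job, the S instance's on-demand rate of $0.085/h from the EC2 price table (slide 14), and billing per started instance-hour (an assumption for illustration); `batch_job` is a hypothetical helper.

```python
import math

HOURLY_RATE = 0.085  # $/h, EC2 "S" instance, on-demand (price table, Apr 2010)


def batch_job(total_server_hours: float, servers: int) -> tuple[float, float]:
    """Return (wall-clock hours, cost in $) for a perfectly parallel job.

    Assumption: every instance is billed for each started hour.
    """
    wall_clock_hours = total_server_hours / servers
    billed_hours_per_server = math.ceil(wall_clock_hours)
    cost = servers * billed_hours_per_server * HOURLY_RATE
    return wall_clock_hours, round(cost, 2)


print(batch_job(1000, 1))     # one server, 1,000 hours
print(batch_job(1000, 1000))  # 1,000 servers, 1 hour
print(batch_job(1000, 3000))  # over-parallelised: partial hours billed in full
```

Both of the first two runs cost $85, but the 1,000-server run finishes in one hour instead of roughly six weeks. Under the per-started-hour assumption the rule also has a limit: 3,000 servers finish in 20 minutes yet are each billed a full hour, tripling the bill.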
  • 10. Cloud-able applications
      • Typical characteristics
          – Not mission-critical
          – Need >99% uptime
          – Low bandwidth needs, tolerance for higher latency
          – Relaxed security requirements
          – Few integration points
          – E.g.
              • Batch operations (speedup at the same price!)
              • One-time large-scale processing
      • Barriers to cloud migration
          – Security & trust
          – Lack of SLA
          – Lack of standardization (vendor lock-in)
  • 11. Cloud Computing – pros & cons [figure, (C) Dion Hinchcliffe]
  • 12. Contents: Part II, Cloud Computing Platforms (AWS, Google AppEngine, Windows Azure)
  • 13. XaaS spectrum – Google, Amazon, Microsoft (again; services listed from the SaaS end of the spectrum, through PaaS, down to IaaS)
      • Amazon: Elastic Map Reduce, SimpleDB, Relational Database Service, Flexible Payment Service, EC2, Simple Queue Service, Simple Notification Service, Elastic Block Storage, S3 / RRS, CloudWatch / Auto Scaling, Elastic Load Balancer, Virtual Private Cloud
      • Google: Gmail, Google apps, App Engine, BigTable / MegaStore, Google Storage
      • Microsoft: SQL Azure, Blob storage, Azure Computing, Queues, Load Balancer
  • 14. Amazon Web Services
      • http://aws.amazon.com/
      • Xen VMs; 1 ECU (EC2 Compute Unit) = 1.2 GHz AMD Opteron; US/EU prices

      Instance       RAM GB  ECU (cores)  HDD GB  bit  $/h on-demand  $/h spot  $/h reserved
      S                 1.7  1 (1)           160   32          0.085      0.03          0.03
      L                 7.5  4 (2)           850   64           0.34      0.13          0.12
      XL                 15  8 (4)          1690   64           0.68      0.24          0.24
      High-mem XL      17.1  6.5 (2)         420   64           0.50      0.18          0.17
      High-mem 2XL     34.2  13 (4)          850   64           1.20      0.43          0.42
      High-mem 4XL     68.4  26 (8)         1690   64           2.40      0.82          0.84
      High-CPU M        1.7  5 (2)           350   32           0.17      0.06          0.06
      High-CPU XL         7  20 (8)         1690   64           0.68      0.24          0.24
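To compare the three pricing columns of the EC2 price table, a small sketch that multiplies each hourly rate by a 30-day month (720 h). Assumptions: the instance runs around the clock, spot prices stay at the quoted snapshot (real spot prices fluctuate), and the reserved column is taken as the hourly rate only, without any one-time reservation fee.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 h, instance running around the clock

# (on-demand, spot, reserved) in $/h, copied from the EC2 price table (Apr 2010)
PRICES = {
    "S": (0.085, 0.03, 0.03),
    "L": (0.34, 0.13, 0.12),
    "XL": (0.68, 0.24, 0.24),
    "High-mem XL": (0.50, 0.18, 0.17),
    "High-mem 2XL": (1.20, 0.43, 0.42),
    "High-mem 4XL": (2.40, 0.82, 0.84),
    "High-CPU M": (0.17, 0.06, 0.06),
    "High-CPU XL": (0.68, 0.24, 0.24),
}


def monthly_cost(instance: str) -> dict[str, float]:
    """Monthly cost of one always-on instance under each pricing model."""
    on_demand, spot, reserved = PRICES[instance]
    rates = {"on-demand": on_demand, "spot": spot, "reserved": reserved}
    # Reserved: hourly rate only; any one-time reservation fee is NOT included.
    return {model: round(rate * HOURS_PER_MONTH, 2) for model, rate in rates.items()}


for name in PRICES:
    print(f"{name:13} {monthly_cost(name)}")
```

For the S instance this gives $61.20 on-demand versus $21.60 at the quoted spot or reserved hourly rate.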
  • 15. Amazon Web Services (2)
      • Simple Storage Service (S3)
          – Eventually consistent blob storage (SLA available)
          – Max 5 GB per object, REST + SOAP API
          – Storage $0.15/GB/month, transfer $0.15/GB, $0.10 per 100K API calls
      • Elastic Compute Cloud (EC2)
          – Xen VMs, Amazon Machine Images (AMI), no SLA
      • Elastic Block Storage (EBS)
          – Volumes of up to 1 TB, attached to EC2 instances as devices
          – Raw/unformatted block devices (create your own filesystem on top)
          – Replicated
          – $0.10/GB/month, $0.10 per 1 million I/O operations (as counted by iostat)
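A rough monthly bill for S3 and EBS follows directly from the rates on this slide. The helpers below are hypothetical, ignore tiered pricing and differences between request types, and the usage figures are made up for illustration.

```python
S3_STORAGE_PER_GB_MONTH = 0.15
S3_TRANSFER_PER_GB = 0.15
S3_PER_100K_CALLS = 0.10
EBS_STORAGE_PER_GB_MONTH = 0.10
EBS_PER_MILLION_IO = 0.10


def s3_monthly(stored_gb: float, transfer_gb: float, api_calls: int) -> float:
    """Flat-rate S3 estimate: storage + transfer + API calls (no tiers)."""
    cost = (stored_gb * S3_STORAGE_PER_GB_MONTH
            + transfer_gb * S3_TRANSFER_PER_GB
            + api_calls / 100_000 * S3_PER_100K_CALLS)
    return round(cost, 2)


def ebs_monthly(provisioned_gb: float, io_ops: int) -> float:
    """EBS estimate: provisioned volume size + I/O operations."""
    cost = (provisioned_gb * EBS_STORAGE_PER_GB_MONTH
            + io_ops / 1_000_000 * EBS_PER_MILLION_IO)
    return round(cost, 2)


print(s3_monthly(stored_gb=100, transfer_gb=50, api_calls=1_000_000))
print(ebs_monthly(provisioned_gb=200, io_ops=50_000_000))
```

For example, 100 GB stored, 50 GB transferred and 1 million API calls come to $15 + $7.50 + $1.00 = $23.50; a 200 GB EBS volume with 50 million I/O operations costs $20 + $5 = $25. Note that EBS is charged for the provisioned size, not the bytes actually written.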
  • 16. Amazon Web Services (3)
      • Simple Queue Service (SQS)
          – Persistent, reliable, secure, distributed queue (no SLA)
          – Max message size 8 KB; messages are automatically deleted after 4 days
          – Duplicate and out-of-order delivery may occur
          – Price: $0.15/GB transfer, $0.10 per 100K API calls
      • Simple Notification Service (SNS)
          – Reliable, secure & scalable pub/sub service (no SLA)
          – Protocols: HTTP, e-mail, SQS
          – Price: $0.15/GB transfer, $0.06 per 100K API calls; per 100K notifications: $0.06 (HTTP), $2.00 (e-mail), free (SQS)
      • SimpleDB
          – Distributed column store (built on Erlang)
          – Consistent or eventually consistent reads, flexible schema
          – $0.14 per machine hour consumed, $0.15/GB transfer, $0.25/GB/month storage
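Because SQS may deliver a message more than once and out of order, consumers are typically written to be idempotent. A minimal sketch, assuming each logical message carries a producer-assigned id that stays the same across redeliveries. `flaky_delivery` is a hypothetical in-memory stand-in for the queue, not the SQS API, and the 8 KB check mirrors the limit quoted on the slide.

```python
import random
from dataclasses import dataclass

MAX_MESSAGE_BYTES = 8 * 1024  # 8 KB message-size limit quoted on the slide


@dataclass(frozen=True)
class Message:
    message_id: str  # producer-assigned id, stable across redeliveries
    body: str


def make_message(message_id: str, body: str) -> Message:
    """Reject bodies that exceed the queue's size limit before sending."""
    if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError(f"message {message_id} exceeds {MAX_MESSAGE_BYTES} bytes")
    return Message(message_id, body)


def flaky_delivery(messages: list[Message], seed: int = 0) -> list[Message]:
    """Hypothetical queue stand-in (not the SQS API): every message is
    delivered at least once, some twice, in shuffled order."""
    rng = random.Random(seed)
    delivered = messages + [m for m in messages if rng.random() < 0.3]
    rng.shuffle(delivered)
    return delivered


class IdempotentConsumer:
    """Applies each logical message exactly once, whatever the delivery order."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.totals: dict[str, int] = {}

    def handle(self, msg: Message) -> bool:
        if msg.message_id in self.seen_ids:
            return False  # duplicate delivery: already applied, skip it
        self.seen_ids.add(msg.message_id)
        account, amount = msg.body.split(":")
        # Addition is commutative, so out-of-order delivery leaves totals unchanged.
        self.totals[account] = self.totals.get(account, 0) + int(amount)
        return True


msgs = [make_message(f"m{i}", f"acct{i % 2}:10") for i in range(10)]
consumer = IdempotentConsumer()
for m in flaky_delivery(msgs, seed=42):
    consumer.handle(m)
print(consumer.totals)
```

The seen-id set grows without bound here; a real consumer would persist it and expire entries, for instance after the 4-day retention window.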
  • 17. Amazon Web Services (4)
      • Relational Database Service (RDS)
          – MySQL (no SLA)
          – Automated backup and scaling
          – $0.11 to $3.10 per hour (depending on instance type), $0.10/GB/month storage, $0.10 per million I/O operations, $0.15/GB transfer
      • Elastic MapReduce
          – Based on Hadoop
          – Price: EC2 instance price + premium ($0.01 to $0.42/hour)
      • CloudWatch, Auto Scaling, Elastic Load Balancer
          – Monitoring, auto scaling & load balancing for EC2
      • Virtual Private Cloud
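Elastic MapReduce runs Hadoop jobs, but the programming model itself can be shown without Hadoop. The plain-Python word count below mimics the map, shuffle and reduce phases in a single process; it illustrates the model only and is not the Hadoop or EMR API.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (the framework does this between phases)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["the cloud", "The elastic cloud"]))
```

On a cluster, map calls run in parallel on input splits and reduce calls run in parallel per key, so only the shuffle moves data between machines.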
  • 18. Google AppEngine
      • http://code.google.com/appengine/
      • Features
          – Custom JVM (lots of limitations)
          – Servlet container, JSP
          – Datastore based on BigTable (column store, consistent, C+P in CAP terms)
          – JDO/JPA
          – Google infrastructure services: URL fetch, mail
          – Memcache (in-memory distributed key/value cache)
          – Task queues & scheduler
          – Development: local dev server, Eclipse plugins, administration
      • Pricing
          – Traffic $0.10/GB in ($0.12/GB out); CPU $0.10/h; storage $0.15/GB/month; e-mail $1 per 10K
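Memcache is typically used in a cache-aside pattern in front of the datastore. The sketch below uses a hypothetical in-process `TtlCache` as a stand-in (not the AppEngine Memcache API) and a plain dict as the "datastore"; the 60-second TTL is an arbitrary choice, and a fake clock keeps the example deterministic.

```python
import time
from collections.abc import Callable, Hashable
from typing import Any


class TtlCache:
    """Hypothetical stand-in for a key/value cache with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[Hashable, tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: Hashable, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_profile(user_id: str, cache: TtlCache, datastore: dict, ttl_seconds: float = 60.0) -> Any:
    """Cache-aside read: try the cache first, fall back to the datastore."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # hit: no datastore read
    value = datastore[user_id]  # miss: slow path
    cache.set(key, value, ttl_seconds)
    return value


now = [0.0]  # fake clock
cache = TtlCache(clock=lambda: now[0])
db = {"u1": "Alice"}
print(get_profile("u1", cache, db))  # miss: reads db
db["u1"] = "Alicia"
print(get_profile("u1", cache, db))  # hit: still the cached value
now[0] = 61.0
print(get_profile("u1", cache, db))  # expired: reads db again
```

The second call returns the stale value until the TTL expires, which is the usual trade-off of cache-aside. Writers can shorten that window by deleting the cache key when they update the datastore.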
  • 19. Google AppEngine (2) [figure, (C) Dan Sanderson / O’Reilly]
  • 20. Google AppEngine (3)
      • Restrictions
          – Applications run in a restricted JVM sandbox
              • No threads, no system calls, limited reflection
          – No sub-process forking
          – Connections
              • Outbound: only URL fetch & mail
              • Inbound: only HTTP(S)
          – No filesystem writes (limited read access); use the datastore instead
          – Limits
              • Request duration: 30 sec
              • Request/response size: 10 MB (datastore request/response: 1 MB)
              • File size: 10 MB; number of files: 3,000
              • Datastore: entity size 1 MB, property values 1,000, entities per batch 500
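Limits such as 500 entities per batch mean that bulk writes have to be chunked on the client side. A minimal sketch of the chunking step; each slice would then be handed to one batch put call, which is not shown because the exact call depends on the API used (JDO/JPA or low-level).

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")
MAX_ENTITIES_PER_BATCH = 500  # datastore limit quoted on the slide


def batches(items: Sequence[T], size: int = MAX_ENTITIES_PER_BATCH) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


entities = [{"kind": "Person", "id": i} for i in range(1234)]
print([len(batch) for batch in batches(entities)])  # one batch put per slice
```

With the 30-second request limit, a very large import would also need to be spread across several requests (for example via task queues) rather than looping over all batches in one request.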
  • 21. Google AppEngine (4)
      • Datastore
          – Based on BigTable, distributed column store
              • Entities and multi-valued properties
              • Entities have a unique key & a type (kind)
              • Flexible schema
          – Transactional, consistent
          – JDO/JPA interface
          – Example query: select from Person where lastName = … && height < … order by height desc
      • Queries
          – JDOQL: entity kind + property value restrictions + sort order
          – Cursors can be specified (query range)
          – The query result set is materialised in a predefined index
              • Query execution only fetches data from the existing index
              • Queries with the same kind + property restriction operators (but different values) + the same sort order share the same index
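Index-only query execution can be sketched with a sorted list: entries are ordered by `lastName` ascending, then `height` descending, so an equality filter on `lastName` plus an inequality on `height` is a single contiguous range found by binary search. This is a hypothetical in-memory model of the idea, not how BigTable stores indexes; the `PersonIndex` class and its data are illustrative.

```python
import bisect
import math


class PersonIndex:
    """Hypothetical in-memory model of one predefined index for kind Person,
    ordered by lastName ascending, then height descending."""

    def __init__(self, people: list[tuple[str, str, int]]) -> None:
        # people: (entity_key, last_name, height). Height is negated so that
        # ascending tuple order means descending height.
        entries = sorted((last, -height, key) for key, last, height in people)
        self._index_keys = [(last, neg_height) for last, neg_height, _ in entries]
        self._entity_keys = [key for _, _, key in entries]

    def query(self, last_name: str, height_below: int) -> list[str]:
        """lastName == last_name && height < height_below, order by height desc.
        Only the index is touched: two binary searches and one slice."""
        # Skip smaller lastNames and entries with height >= height_below.
        start = bisect.bisect_right(self._index_keys, (last_name, -height_below))
        # Stop after the last entry for this lastName.
        end = bisect.bisect_right(self._index_keys, (last_name, math.inf))
        return self._entity_keys[start:end]


index = PersonIndex([("p1", "Smith", 180), ("p2", "Smith", 170),
                     ("p3", "Smith", 160), ("p4", "Jones", 175)])
print(index.query("Smith", 175))
print(index.query("Jones", 200))
```

Both queries reuse the same index with different filter values, which is the index-sharing rule on the slide; a query sorting by height ascending, or filtering on another property, would need a different index.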
  • 22. Windows Azure
      • http://www.microsoft.com/windowsazure/
      • Components
          – Windows Azure
              • Fabric: management & monitoring of cloud services (Hyper-V)
              • Compute: hosted applications (.NET, C++, Java, …)
              • Storage: blob storage, tables, queues (REST interface)
          – SQL Azure
              • Cloud-based MS SQL Server
          – AppFabric
              • Infrastructure services, service registry
              • Access control
      • Pricing
          – CPU $0.12/h; storage $0.15/GB/month; transfer $0.10/GB in ($0.15/GB out); storage transactions $1 per 1 million
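A monthly estimate for a small Windows Azure deployment, using the rates on this slide. Assumptions: the two transfer prices are inbound ($0.10/GB) and outbound ($0.15/GB), compute is billed per instance-hour, and the workload figures are made up; `azure_monthly` is a hypothetical helper.

```python
COMPUTE_PER_INSTANCE_HOUR = 0.12
STORAGE_PER_GB_MONTH = 0.15
PER_MILLION_STORAGE_TRANSACTIONS = 1.00
TRANSFER_IN_PER_GB = 0.10   # assumption: $0.10 is the inbound rate
TRANSFER_OUT_PER_GB = 0.15  # assumption: $0.15 is the outbound rate


def azure_monthly(instances: int, hours: float, storage_gb: float,
                  transactions: int, gb_in: float, gb_out: float) -> float:
    """Sum of compute, storage, storage-transaction and transfer charges."""
    compute = instances * hours * COMPUTE_PER_INSTANCE_HOUR
    storage = storage_gb * STORAGE_PER_GB_MONTH
    txn = transactions / 1_000_000 * PER_MILLION_STORAGE_TRANSACTIONS
    transfer = gb_in * TRANSFER_IN_PER_GB + gb_out * TRANSFER_OUT_PER_GB
    return round(compute + storage + txn + transfer, 2)


# Two always-on instances, 10 GB stored, 1M storage transactions, 5 GB in, 20 GB out
print(azure_monthly(2, 720, 10, 1_000_000, 5, 20))
```

Compute dominates: $172.80 of the $178.80 total comes from keeping two instances running all month, which is why the uptime-based CPU pricing matters for mostly idle applications.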
  • 23. Windows Azure (2) [figure, (C) David Chappell]
  • 24. Contents: Part III, Programming for the Cloud (Tools & APIs)
  • 25. Programming for the Cloud
      • Amazon
          – REST API
          – AWS Java SDK (http://aws.amazon.com/sdkforjava/)
          – AWS Toolkit for Eclipse (http://aws.amazon.com/eclipse)
          – Typica (http://code.google.com/p/typica/)
          – JetS3t, S3 only (http://jets3t.s3.amazonaws.com/index.html)
      • Google AppEngine
          – AppEngine SDK (dev server, admin tools, Eclipse plugins)
          – Datastore: JDO, JPA, low-level Java API
          – Memcache: JCache + low-level Java API
          – URL fetch: java.net + low-level Java API
          – Mail: javax.mail + low-level Java API
          – Task queue, blob store, accounts: low-level APIs
  • 26. Programming for the Cloud (2)
      • jClouds
          – http://code.google.com/p/jclouds/
          – Cloud interoperability framework (AWS, Google AppEngine*, Windows Azure, GoGrid)
          – Mostly storage-oriented functionality
      • Eucalyptus
          – http://www.eucalyptus.com/
          – Open source private cloud infrastructure
          – AWS compatible (EC2, EBS, S3)
          – Cross-hypervisor support
      [figure, (C) Eucalyptus Inc.]
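Frameworks such as jClouds let application code target one storage interface and swap providers underneath. The sketch below shows the shape of that idea with a hypothetical `BlobStore` interface and an in-memory implementation; it is not the jclouds API, and a real adapter for S3 or Azure blob storage would have to be written against each provider's own SDK.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical provider-neutral blob storage interface (not the jclouds API)."""

    @abstractmethod
    def put(self, container: str, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, container: str, name: str) -> bytes: ...

    @abstractmethod
    def list_names(self, container: str) -> list[str]: ...


class InMemoryBlobStore(BlobStore):
    """Local stand-in; a deployment would plug in a provider-specific adapter."""

    def __init__(self) -> None:
        self._containers: dict[str, dict[str, bytes]] = {}

    def put(self, container: str, name: str, data: bytes) -> None:
        self._containers.setdefault(container, {})[name] = bytes(data)

    def get(self, container: str, name: str) -> bytes:
        return self._containers[container][name]

    def list_names(self, container: str) -> list[str]:
        return sorted(self._containers.get(container, {}))


def archive_report(store: BlobStore, report: str) -> str:
    """Application code: depends only on the BlobStore interface."""
    name = f"report-{len(store.list_names('reports')) + 1}.txt"
    store.put("reports", name, report.encode("utf-8"))
    return name


store = InMemoryBlobStore()
name = archive_report(store, "Q1 figures")
print(name, store.get("reports", name))
```

Only `archive_report` is application logic; moving to another provider means adding one more `BlobStore` subclass, which is the lock-in mitigation such frameworks aim for.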
  • 27. Don’t forget…
      • Deploying on EC2 requires minimal to no modifications of existing software
      • EC2 has some big machines: up to 68.4 GB RAM / 8 CPU cores (High-mem 4XL)
      • 1,000 servers for 1 hr cost the same as 1 server for 1,000 hrs
      • Data traffic into and out of the Cloud can be expensive
      • Storage is relatively cheap
      • Internal cloud traffic is free (AWS), e.g. accessing other applications/datasets on the Cloud
      • CPU pricing: uptime (EC2) vs. computing cycles (AppEngine)
      • EC2 spot instances (off-peak hours) are very, very cheap!
  • 28. Contents: Part IV, Semantic Web on the Cloud
  • 29. Semantic Web on the Cloud
      • Public Data Sets on AWS
          – Many datasets hosted for free by Amazon
              • Freebase, UniGene, US Census, …
          – New data sets can be submitted too (after approval)
          – The full LOD cloud is still not available (due to licensing issues)
      • SaaS
          – Virtuoso (AWS hosted), OpenCalais, …
      • “Semantic Cloud” initiatives (cloud interoperability & data integration)
          – E.g. fluidOps: management & provisioning of semantic applications (SaaS) and data sources (DaaS) on the Cloud
              • Semantic Web apps as virtual appliances on the Cloud
              • LOD data sources as virtual resources on the Cloud (“self-service” paradigm)
  • 30. Unified Cloud Computing
      • http://code.google.com/p/unifiedcloud/
      • Uses RDF for cloud data interoperability
  • 31. Useful and useless links
      • http://groups.google.com/group/cloud-computing
      • “An Essential Guide to Possibilities and Risks of Cloud Computing”
      • “Talking To Your CFO About Cloud Computing”
      • Nick Carr @ Atmosphere’2009
      • Introducing the Windows Azure platform
  • 32. Q&A: Questions?