Introduction to Cloud Computing

            Marin Dimitrov
         (technology watch #3)


               Apr 2010
Contents

• Introduction
• Cloud Computing platforms
• Programming for the Cloud
• Semantic Web on the Cloud




         ...
Contents



       Part I

Introduction




   Cloud Computing   Apr 2010   #3
Cloud Computing - NIST definition

• “Cloud computing is a model for enabling ubiquitous, convenient, on-
  demand network...
XaaS spectrum – Google, Amazon, Microsoft


        • Elastic Map Reduce              • Gmail
SaaS                        ...
Cloud Computing - Essential characteristics
                        (NIST)
• Rapid elasticity – the ability to scale resou...
Cloud Computing - deployment models (NIST)

• Public cloud
   – Infrastructure owned by some organisation but sold to 3rd ...
Cloud computing – business drivers

1. Business agility
   –   Faster time to market
       •   No major upfront commitmen...
Some cloud use cases

• Overflow buffer
   – Provision for the average load rather than over-provisioning for peak loads
• Seaso...
Cloud-able applications

• Typical characteristics
   –   Non mission critical
   –   Need >99% uptime
   –   Low bandwidt...
Cloud Computing – pros & cons




                            (C) Dion Hinchcliffe



          Cloud Computing           ...
Contents



                Part II

     Cloud Computing
         Platforms
AWS, Google AppEngine, Windows Azure



     ...
XaaS spectrum – Google, Amazon, Microsoft
                 (again)

        • Elastic Map Reduce              • Gmail
SaaS...
Amazon Web Services

• http://aws.amazon.com/
• Xen VMs, 1 ECU = 1.2GHz AMD Opteron, US/EU prices
EC2 instance   RAM    CU...
Amazon Web Services (2)

• Simple Storage Service (S3)
   – Eventually consistent blob storage (SLA available)
   – Max 5G...
Amazon Web Services (3)

• Simple Queue Service
   –   Persistent, reliable, secure, distributed queue (no SLA)
   –   Mes...
Amazon Web Services (4)

• Relational Database Service
   – MySQL (no SLA)
   – Automated backup and scaling
   – $0.11 to...
Google AppEngine

• http://code.google.com/appengine/
• Features
   –   custom JVM (lots of limitations)
   –   servlet co...
Google AppEngine (2)




                        (C) Dan Sanderson / O’Reilly




      Cloud Computing                   ...
Google AppEngine (3)

• Restrictions
   – Applications run in a restricted JVM sandbox
       • No threads, no System call...
Google AppEngine (4)

• Datastore
   – Based on BigTable, distributed column-store
       • Entities and multi-valued prop...
Windows Azure

• http://www.microsoft.com/windowsazure/
• Components
   – Windows Azure
      • Fabric – management & moni...
Windows Azure (2)




                      (C) David Chappell



    Cloud Computing        Apr 2010       #23
Contents



          Part III

Programming for the
       Cloud
      Tools & APIs



       Cloud Computing   Apr 2010  ...
Programming for the Cloud

• Amazon
   –   REST API
   –   AWS Java SDK (http://aws.amazon.com/sdkforjava/)
   –   AWS Too...
Programming for the Cloud (2)

• jClouds
   – http://code.google.com/p/jclouds/
   – Cloud interoperability framework (AWS...
Don’t forget…

• Deploying on EC2 requires minimal to no modifications of
  existing software
• EC2 has some big machines:...
Contents



         Part IV

Semantic Web on the
       Cloud


      Cloud Computing   Apr 2010   #28
Semantic Web on the Cloud

• Public Data Sets on AWS
   – A lot of datasets hosted for free by Amazon
       • Freebase, U...
Unified Cloud Computing

• http://code.google.com/p/unifiedcloud/
• Uses RDF for cloud data interoperability




         ...
Useful and useless links

• http://groups.google.com/group/cloud-computing
• “An Essential Guide to Possibilities and Risk...
Q&A




Questions?




  Cloud Computing   Apr 2010   #32

Introduction to Cloud Computing

  1. Introduction to Cloud Computing, Marin Dimitrov (technology watch #3), Apr 2010
  2. Contents
     • Introduction
     • Cloud Computing platforms
     • Programming for the Cloud
     • Semantic Web on the Cloud
  3. Contents: Part I, Introduction
  4. Cloud Computing - NIST definition
     • “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
     • Delivery models
        – IaaS (Infrastructure as a Service) - the consumer uses "fundamental resources" such as processing power, storage, networking components or middleware. The consumer can control the operating system, storage, applications and possibly networking
        – PaaS (Platform as a Service) - the consumer uses a hosting environment for their applications and has control over the applications (and some control over the hosting environment), but does not control the infrastructure on which they are running
        – SaaS (Software as a Service) - the consumer uses an application, but does not control the infrastructure on which it's running (OS, hardware)
  5. XaaS spectrum – Google, Amazon, Microsoft • Elastic Map Reduce • Gmail SaaS • Google apps • SimpleDB • App Engine • SQL Azure PaaS • Relational Database Service • BigTable / MegaStore • Flexible Payment Service • EC2 • Google Storage • Blob storage IaaS • Simple Queue Service • Azure Computing • Simple Notification Service • Queues • Elastic Block Storage • Load Balancer • S3 / RRS • CloudWatch / Auto Scaling • Elastic Load Balancer
  6. Cloud Computing - Essential characteristics (NIST)
     • Rapid elasticity – the ability to scale resources both up and down as needed. To the consumer, the Cloud appears to be infinite, and the consumer can purchase as much / little computing power as they need
     • Measured service – aspects of the Cloud service are controlled and monitored by the Cloud provider. This is crucial for billing, access control, resource optimization & capacity planning
     • On-demand self service – a consumer can use cloud services as needed without any human interaction with the cloud provider
     • Ubiquitous network access – the Cloud provider’s capabilities are available over the network and can be accessed through standard mechanisms
     • Resource Pooling – allows a Cloud provider to serve its consumers via a multi-tenant model - resources are (re)assigned according to consumer demand
  7. Cloud Computing - deployment models (NIST)
     • Public cloud
        – Infrastructure owned by some organisation but sold to 3rd parties
        – E.g. Amazon Web Services, Google AppEngine, Windows Azure
     • Private cloud
        – Internal infrastructure for a single organisation (on or off-premise)
        – E.g. VMware vCloud, IBM Cloudburst, Microsoft Hyper-V
     • Community cloud
        – Infrastructure shared by several organisations, targeting a specific community
        – E.g. OpenCirrus (HP, Intel, Yahoo, KIT, CMU, …)
     • Hybrid cloud
        – Composition of the above
        – E.g. AWS Virtual Private Cloud
  8. Cloud computing – business drivers
     1. Business agility
        – Faster time to market
           • No major upfront commitment & investment in infrastructure
        – Scalability & elasticity
           • Instant on-demand provisioning
           • Shifting the risk of over-/under-provisioning to the cloud provider
     2. Focus
        – Outsource non-core tasks to the cloud provider
     3. Pay-as-you-go
        – Speed up new project launching & rollout (start small, add resources when needed)
        – No need for complex planning ahead
        – Turn fixed costs (CapEx) into variable costs (OpEx)
  9. Some cloud use cases
     • Overflow buffer
        – Provision for the average load rather than over-provisioning for peak loads
     • Seasonal business
        – E.g. Walmart has 4:1 peak-to-average ratio (source?)
     • Small startups time-to-market
        – Less upfront investment, more focus on core competencies
     • Experimental playground
        – Rollout experimental projects without major equipment purchases
     • Speedup of large scale batch operations
        – 1000 servers for 1 hour cost the same as 1 server for 1000 hours
        – More cost-efficient computing (off-peak tariffs & time zones)
     • Unforeseeable events
        – E.g. sudden traffic spikes to web sites (volcanoes, anyone?)
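The batch speedup claim on the slide above is plain arithmetic, and a minimal sketch can make it concrete. The function name and the perfectly parallel job are assumptions for illustration; the $0.085/h price is the small on-demand EC2 instance price quoted later in this deck.

```python
def batch_cost(servers: int, hours_per_server: float, price_per_hour: float) -> float:
    """Total cost when `servers` instances each run for `hours_per_server` hours."""
    return servers * hours_per_server * price_per_hour


# Price of a small on-demand EC2 instance as quoted in this deck (Apr 2010).
SMALL_ON_DEMAND = 0.085

# Assumption: the job splits perfectly, so 1000 server-hours of work can be
# spread over any number of machines without overhead.
wide = batch_cost(servers=1000, hours_per_server=1, price_per_hour=SMALL_ON_DEMAND)
narrow = batch_cost(servers=1, hours_per_server=1000, price_per_hour=SMALL_ON_DEMAND)

print(f"1000 servers x 1 h: ${wide:.2f}")
print(f"1 server x 1000 h:  ${narrow:.2f}")
```

The cost is the same either way; only the wall-clock time changes. Real jobs rarely parallelise perfectly, so the wide run would usually cost somewhat more than this sketch suggests.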
  10. Cloud-able applications
     • Typical characteristics
        – Non mission critical
        – Need >99% uptime
        – Low bandwidth / higher latency tolerance
        – Relaxed security requirements
        – Few integration points
        – E.g.
           • Batch operations (speedup at the same price!)
           • One-time large scale processing
     • Barriers to cloud migration
        – Security & trust
        – Lack of SLA
        – Lack of standardization (vendor lock-in)
  11. Cloud Computing – pros & cons (C) Dion Hinchcliffe
  12. Contents: Part II, Cloud Computing Platforms (AWS, Google AppEngine, Windows Azure)
  13. XaaS spectrum – Google, Amazon, Microsoft (again) • Elastic Map Reduce • Gmail SaaS • Google apps • SimpleDB • App Engine • SQL Azure PaaS • Relational Database Service • BigTable / MegaStore • Flexible Payment Service • EC2 • Google Storage • Blob storage IaaS • Simple Queue Service • Azure Computing • Simple Notification Service • Queues • Elastic Block Storage • Load Balancer • S3 / RRS • CloudWatch / Auto Scaling • Elastic Load Balancer • Virtual Private Cloud
  14. Amazon Web Services
     • http://aws.amazon.com/
     • Xen VMs, 1 ECU = 1.2GHz AMD Opteron, US/EU prices

     EC2 instance   RAM (GB)   CU* (cores)   HDD (GB)   bit   $/h on demand   $/h spot   $/h reserved
     S              1.7        1 (1)         160        32    0.085           0.03       0.03
     L              7.5        4 (2)         850        64    0.34            0.13       0.12
     XL             15         8 (4)         1690       64    0.68            0.24       0.24
     High-mem XL    17.1       6.5 (2)       420        64    0.50            0.18       0.17
     High-mem 2XL   34.2       13 (4)        850        64    1.20            0.43       0.42
     High-mem 4XL   68.4       26 (8)        1690       64    2.40            0.82       0.84
     High-CPU M     1.7        5 (2)         350        32    0.17            0.06       0.06
     High-CPU XL    7          20 (8)        1690       64    0.68            0.24       0.24
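One way to read the EC2 price list above is as cost per compute unit rather than cost per instance. The sketch below does that for a few rows; the prices and compute units are copied from the slide (Apr 2010), while the dictionary layout and function name are assumptions made for illustration.

```python
# (on-demand, spot, reserved) prices in $/h, copied from the slide (Apr 2010).
PRICES = {
    "S": (0.085, 0.03, 0.03),
    "L": (0.34, 0.13, 0.12),
    "XL": (0.68, 0.24, 0.24),
    "High-CPU M": (0.17, 0.06, 0.06),
    "High-CPU XL": (0.68, 0.24, 0.24),
}

# EC2 compute units (the "CU" column) for the same instance types.
COMPUTE_UNITS = {"S": 1, "L": 4, "XL": 8, "High-CPU M": 5, "High-CPU XL": 20}

ON_DEMAND, SPOT, RESERVED = 0, 1, 2


def cost_per_cu_hour(instance: str, tier: int = ON_DEMAND) -> float:
    """Hourly price divided by compute units for one instance type."""
    return PRICES[instance][tier] / COMPUTE_UNITS[instance]


for name in PRICES:
    print(f"{name:12s} {cost_per_cu_hour(name):.4f} $/CU/h on demand")
```

On these figures the High-CPU types are the cheapest per compute unit, which matters for CPU-bound batch work. The sketch ignores RAM and disk, and it ignores any fees not listed on the slide.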
  15. Amazon Web Services (2)
     • Simple Storage Service (S3)
        – Eventually consistent blob storage (SLA available)
        – Max 5GB per object, REST+SOAP API
        – Storage $0.15/GB/mo, transfer $0.15/GB, $0.10 per 100K API calls
     • Elastic Compute Cloud (EC2)
        – Xen VM, Amazon Machine Image (AMI), no SLA
     • Elastic Block Storage (EBS)
        – Up to 1TB storage to be used by EC2 instances (attached devices)
        – Raw/unformatted block devices (create your own filesystem on top)
        – Replicated
        – $0.10/GB/mo, $0.10 per 1 million I/O ops (iostat)
  16. Amazon Web Services (3)
     • Simple Queue Service
        – Persistent, reliable, secure, distributed queue (no SLA)
        – Message size 8KB, autodelete 4 days
        – Duplicate and out-of-order delivery may occur
        – Price: $0.15/GB transfer, $0.10 per 100K API calls
     • Simple Notification Service
        – Reliable, secure & scalable pub/sub service (no SLA)
        – Protocols: HTTP, e-mail, SQS
        – Price: $0.15/GB transfer, $0.06 per 100K API calls, price per 100K notifications: $0.06 (HTTP), $2.00 (e-mail), free (SQS)
     • SimpleDB
        – Distributed column store (built on Erlang)
        – Consistent or eventually consistent reads, flexible schema
        – $0.14/hour consumed, $0.15/GB transfer, $0.25/GB/mo storage
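Because the queue described above may deliver a message twice or out of order, consumers are usually written to be idempotent. The following is a minimal in-memory sketch of that idea, not SQS client code: the message shape (a dict with "id" and "body" keys) and the class name are hypothetical, and a real consumer would keep the set of seen ids in durable storage.

```python
from typing import Any, Callable, Dict, Iterable, Set


class IdempotentConsumer:
    """Runs a handler at most once per message id (hypothetical message shape)."""

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # assumption: ids fit in memory

    def consume(self, messages: Iterable[Dict[str, Any]]) -> int:
        """Process a batch and return how many messages were actually handled."""
        handled = 0
        for message in messages:
            if message["id"] in self._seen:
                continue  # duplicate delivery, skip it
            self._handler(message["body"])
            self._seen.add(message["id"])
            handled += 1
        return handled


results = []
consumer = IdempotentConsumer(results.append)
handled_count = consumer.consume([
    {"id": "a", "body": 1},
    {"id": "b", "body": 2},
    {"id": "a", "body": 1},  # simulated duplicate delivery
])
```

Deduplication only covers the duplicate case. To tolerate out-of-order delivery as well, the handler itself should not depend on arrival order.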
  17. Amazon Web Services (4)
     • Relational Database Service
        – MySQL (no SLA)
        – Automated backup and scaling
        – $0.11 to $3.10 per hour (instance type), $0.10/GB/mo storage, $0.10 per million I/O ops, $0.15/GB transfer
     • Elastic MapReduce
        – Based on Hadoop
        – Price: EC2 instance price + premium ($0.01 - $0.42/hour)
     • CloudWatch, Auto Scaling, Elastic Load Balancer
        – Monitoring, auto scaling & load balancing for EC2
     • Virtual Private Cloud
  18. Google AppEngine
     • http://code.google.com/appengine/
     • Features
        – Custom JVM (lots of limitations)
        – Servlet container, JSP
        – Datastore based on BigTable (column store, consistent; C+P in CAP terms)
        – JDO/JPA
        – Google infrastructure services: URL fetch, mail
        – Memcache (in-memory distributed key/value cache)
        – Task queues & scheduler
        – Development: local dev server, Eclipse plugins, administration
     • Pricing
        – traffic/GB $0.10 ($0.12); CPU/h $0.10; storage/GB/mo $0.15; e-mail $1 per 10K
  19. Google AppEngine (2) (C) Dan Sanderson / O’Reilly
  20. Google AppEngine (3)
     • Restrictions
        – Applications run in a restricted JVM sandbox
           • No threads, no System calls, limited reflection
        – No sub-process forking
        – Connections
           • Outbound – only URL fetch & mail
           • Inbound – only HTTP(S)
        – No filesystem writes (limited read access), use datastore instead
        – Limits
           • Request duration – 30 sec
           • Request/response size – 10 MB (datastore request/response – 1MB)
           • File size – 10 MB, number of files – 3,000
           • Datastore: entity size – 1 MB, property values – 1000, entities per batch – 500
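The "entities per batch – 500" limit above means bulk writes have to be split into chunks before they are sent. The helper below is a generic sketch of that chunking step; it does not call any AppEngine API, and the function name is an assumption.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Batch limit quoted on the slide (Apr 2010).
MAX_ENTITIES_PER_BATCH = 500


def batches(items: Sequence[T], size: int = MAX_ENTITIES_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


entities = list(range(1200))  # stand-ins for datastore entities
chunk_sizes = [len(chunk) for chunk in batches(entities)]
print(chunk_sizes)
```

Each chunk would then be passed to whatever batch write call the application uses. The 1 MB datastore request limit on the same slide is a separate constraint that this sketch does not check.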
  21. Google AppEngine (4)
     • Datastore
        – Based on BigTable, distributed column-store
           • Entities and multi-valued properties
           • Entities have unique key & a type (kind)
           • Flexible schema
        – Transactional, consistent
        – JDO/JPA interface
        – Example query: Select from Person where lastName = … && height < … order by height desc
     • Queries
        – JDOQL: entity kind + property value restrictions + sort order
        – Cursors can be specified (query range)
        – Query resultset is materialised in a predefined index
           • Query execution only fetches data from the existing index
           • Queries with same kind + property restriction operator (but different value filler) + same sort order share the same index
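The index-sharing rule above can be pictured as reducing each query to its "shape": kind, filtered properties with their operators, and sort order, with the filter values left out. The sketch below models only that rule. The `Query` type and `index_signature` function are hypothetical and say nothing about how the Datastore represents indexes internally.

```python
from typing import Any, NamedTuple, Tuple


class Query(NamedTuple):
    """Hypothetical query description: kind, filters and sort order."""
    kind: str
    filters: Tuple[Tuple[str, str, Any], ...]  # (property, operator, value)
    order: Tuple[Tuple[str, str], ...]         # (property, direction)


def index_signature(query: Query) -> tuple:
    """Shape of a query with the filter values dropped."""
    operators = tuple((prop, op) for prop, op, _value in query.filters)
    return (query.kind, operators, query.order)


# Same kind, operators and sort order, but different values.
q1 = Query("Person", (("lastName", "=", "Smith"), ("height", "<", 180)), (("height", "desc"),))
q2 = Query("Person", (("lastName", "=", "Jones"), ("height", "<", 170)), (("height", "desc"),))
# Same filters as q1, but a different sort order.
q3 = Query("Person", (("lastName", "=", "Smith"), ("height", "<", 180)), (("height", "asc"),))

same_index = index_signature(q1) == index_signature(q2)
different_index = index_signature(q1) != index_signature(q3)
```

Under the slide's rule, q1 and q2 would be served by one index, while q3 would need another because its sort order differs.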
  22. Windows Azure
     • http://www.microsoft.com/windowsazure/
     • Components
        – Windows Azure
           • Fabric – management & monitoring of cloud services (Hyper-V)
           • Compute – hosted applications (.net, c++, java, …)
           • Storage – blob storage, tables, queues (REST interface)
        – SQL Azure
           • Cloud based MS SQL Server
        – AppFabric
           • Infrastructure services, Service registry
           • Access control
     • Pricing
        – CPU/h $0.12; storage $0.15/GB/mo, transfer $0.10 ($0.15), storage transactions – $1 per 1 million
  23. Windows Azure (2) (C) David Chappell
  24. Contents: Part III, Programming for the Cloud (Tools & APIs)
  25. Programming for the Cloud
     • Amazon
        – REST API
        – AWS Java SDK (http://aws.amazon.com/sdkforjava/)
        – AWS Toolkit for Eclipse (http://aws.amazon.com/eclipse)
        – Typica (http://code.google.com/p/typica/)
        – JetS3t (S3 only) http://jets3t.s3.amazonaws.com/index.html
     • Google AppEngine
        – AppEngine SDK (dev server, admin tools, Eclipse plugins)
        – Datastore: JDO, JPA, low-level Java API
        – Memcache: JCache + low-level Java API
        – URL fetch: java.net + low-level Java API
        – Mail: JavaMail (javax.mail) + low-level Java API
        – Task queue, blob store, accounts: low-level APIs
  26. Programming for the Cloud (2)
     • jClouds
        – http://code.google.com/p/jclouds/
        – Cloud interoperability framework (AWS, Google AppEngine*, Windows Azure, GoGrid)
        – Mostly storage oriented functionality
     • Eucalyptus
        – http://www.eucalyptus.com/
        – Open source private cloud infrastructure
        – AWS compatible (EC2, EBS, S3)
        – Cross-hypervisor support
     (C) Eucalyptus Inc.
  27. Don’t forget…
     • Deploying on EC2 requires minimal to no modifications of existing software
     • EC2 has some big machines: 70GB RAM / 8 CPU cores
     • 1,000 servers for 1hr cost the same as 1 server for 1,000hrs
     • Data traffic (in/out) of the Cloud can be expensive
     • Storage relatively cheap
     • Internal cloud traffic is free (AWS), e.g. accessing other applications/datasets on the Cloud
     • CPU price: uptime (EC2) vs. computing cycles (AppEngine)
     • EC2 spot instances (off-peak hours) are very, very cheap!
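The "uptime vs. computing cycles" point above is easiest to see with a mostly idle application. The sketch below compares the two billing models using prices quoted earlier in this deck ($0.085/h for a small on-demand EC2 instance, $0.10 per CPU hour on AppEngine). The 5% utilisation figure and the function names are assumptions, and traffic, storage and any free quotas are left out.

```python
HOURS_PER_MONTH = 720  # assumption: a 30-day month


def ec2_uptime_cost(hours_up: float, price_per_hour: float = 0.085) -> float:
    """EC2 bills for every hour the instance is up, busy or idle."""
    return hours_up * price_per_hour


def appengine_cpu_cost(cpu_hours_used: float, price_per_cpu_hour: float = 0.10) -> float:
    """AppEngine bills only for the CPU time actually consumed."""
    return cpu_hours_used * price_per_cpu_hour


utilisation = 0.05  # assumption: the app is busy 5% of the time
ec2 = ec2_uptime_cost(HOURS_PER_MONTH)
gae = appengine_cpu_cost(HOURS_PER_MONTH * utilisation)

print(f"EC2 (always on):         ${ec2:.2f}/month")
print(f"AppEngine (5% CPU used): ${gae:.2f}/month")
```

At low utilisation the per-cycle model is far cheaper. As utilisation approaches 100% the comparison flips on these prices, since $0.10 per CPU hour exceeds $0.085 per instance hour.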
  28. Contents: Part IV, Semantic Web on the Cloud
  29. Semantic Web on the Cloud
     • Public Data Sets on AWS
        – A lot of datasets hosted for free by Amazon
           • Freebase, UniGene, US Census, …
        – New data sets can be submitted too (after approval)
        – Full LOD cloud still not available (due to licensing issues)
     • SaaS
        – Virtuoso (AWS hosted), OpenCalais, …
     • “Semantic Cloud” initiatives (cloud interoperability & data integration)
        – E.g. fluidOps - Management & provisioning of semantic applications (SaaS) and datasources (DaaS) on the Cloud
           • Semantic Web apps as virtual appliances on the Cloud
           • LOD data sources as virtual resources on the Cloud (“Self-service” paradigm)
  30. Unified Cloud Computing
     • http://code.google.com/p/unifiedcloud/
     • Uses RDF for cloud data interoperability
  31. Useful and useless links
     • http://groups.google.com/group/cloud-computing
     • “An Essential Guide to Possibilities and Risks of Cloud Computing”
     • “Talking To Your CFO About Cloud Computing”
     • Nick Carr @ Atmosphere’2009
     • Introducing the Windows Azure platform
  32. Q&A: Questions?