Cloud Computing: The Hard Problems Never Go Away

Talk by Doug Tidwell of IBM at ZendCon 2009

Published in: Technology

  • 1. Cloud Computing: The Hard Problems Never Go Away
      By Doug Tidwell, Cloud Computing Evangelist, IBM
  • 2. Agenda
      • A few (a very few) words about the cloud
      • The hard problems (Boo!)
      • Vendor lock-in and the Simple Cloud API
      • Next steps / Resources
  • 3. A few words about the cloud
  • 4. Before the cloud
      • If you wanted to start an enterprise, you needed an IT shop.
      • Massive costs in hardware, software, power, administrative staff
      • Prohibitive cost to entry
  • 5. What if...
      • You could have unlimited computing resources?
          • All the processing power you want
          • All the data storage you want
          • Data mining whenever you want
      • Cloud computing will be the biggest change to our industry since the rise of the Internet.
  • 6. The cloud
      • Cloud computing ... is a style of computing where IT-related capabilities are provided ‘as a service,’ allowing users to access technology-enabled services ‘in the cloud’ without knowledge of, expertise with or control over the technology infrastructure that supports them. (From Wikipedia)
      • Everybody has a slightly different idea of what cloud computing really is.
  • 7. The cloud is here to stay
      • Extremely stupid idea of assuming the entire planet just can not be bothered with their own data (nor the security thereof). As always there will be some who think they 'need' this. I hope this whole cloud [stuff] just goes away. Logistically speaking it will never be anything but a waste of money.
          Posted by, who apparently just can not be bothered with their own email (nor the security thereof).
  • 8. Cloud characteristics
      • Rapid elasticity
      • Measured service
      • On-demand self-service
      • Ubiquitous network access
      • Location-independent resource pooling
      Source: NIST Working Definition of Cloud Computing
  • 9. Cloud services
      • There are four basic things people are doing in the cloud:
          • The application in the sky
          • The hard drive in the sky
          • The database in the sky
          • The machine in the sky
  • 10. The hard problems never die
      No matter how badly we want them to
  • 11. The hard problems never die
      • Whenever the industry embraces a technology, there’s always the exhilaration of doing something new.
      • Unfortunately, we still have to deal with the hard problems like security, maintenance, lifecycle management, and so forth.
          • We still have to build flexible applications that respond to changes in the business environment.
          • We still have to share data and business processes with our partners.
          • We’re not going back to building silos just because the cloud is here.
      • SOA isn’t going away just because the cloud is here.
  • 12. Cloud computing requirements
      • Management and governance
      • Transactions and concurrency
      • Identity
      • Federated identity
      • Security
      • Location awareness
      • Avoiding vendor lock-in
          • Common APIs for cloud storage, cloud databases, cloud middleware
      • Common VM image format
  • 13. Cloud computing requirements
      • Data and application federation
      • SLAs
      • Lifecycle management
      • Open client
      • Metering and monitoring
      • Deployment
  • 14. Governance is hard
      • Similarities between SOA and the cloud:
          • Both started bottom-up
          • Both started with massive hype
          • Neither works without governance
      • You need an architect, a blueprint, and an executive who is both enlightened and powerful.
      • SOA and cloud computing aren’t simply coding issues.
  • 15. Transactions and concurrency
      • Many enterprise applications require transactions and concurrency.
          • There needs to be only one copy of the data
          • Any changes to the data have to be synchronized
      • This is very difficult to scale.
  • 16. Cloud storage
      • Most cloud storage systems, such as Amazon’s S3, are designed as distributed, redundant systems.
          • Your data is stored on more than one disk in more than one place.
          • If one part of the system goes down, the rest of the system keeps going.
          • “There should never be a single point of failure” is a stated design goal.
      • You can’t think of cloud storage as just another hard drive.
  • 17. Cloud storage is not just a hard drive
      • Once you create an object, it can’t be modified.
          • You can delete it or replace it, but you can’t modify it. (In S3, you can’t even move it.)
      • It takes time for changes to make their way throughout the system (propagation latency).
          • If you just put an object into S3, you can’t be sure that it will be available right away.
          • If you get an object from S3, you can’t be sure that it’s the latest version.
  • 18. Cloud storage is not just a hard drive
      • Read and write requests will fail occasionally.
          • Your application should handle that gracefully.
          • Trying the request a second time usually does the trick.
          • Delete requests sometimes fail as well.
  • 19. Cloud storage
      • All of this is by design.
          • These design decisions mean that S3 is extremely scalable and reliable.
          • But these design decisions also mean that S3 doesn’t work like another hard drive.
      • The right answer, as always, is to understand your choices, understand your needs and pick the technology that works best for you.
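The advice on the storage slides (requests fail occasionally, and a second try usually works) can be sketched as a small retry wrapper. This is a minimal illustration, not anything from the talk: `withRetries`, its parameters, and the simulated `$flakyGet` request are all hypothetical names, and a real application would wrap its actual put, get, or delete call instead.

```php
<?php
// Hypothetical helper: retry an operation a few times before giving up.
// $operation is any callable that throws an Exception on failure.
function withRetries(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 100)
{
    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            return $operation();
        } catch (Exception $e) {
            $lastError = $e;
            if ($attempt < $maxAttempts) {
                // Wait a little longer after each failed attempt.
                usleep($baseDelayMs * 1000 * $attempt);
            }
        }
    }
    throw new RuntimeException("Operation failed after $maxAttempts attempts", 0, $lastError);
}

// Simulated flaky request standing in for a real storage call:
// it fails on the first call and succeeds on the second.
$calls = 0;
$flakyGet = function () use (&$calls) {
    $calls++;
    if ($calls < 2) {
        throw new Exception('transient storage error');
    }
    return 'object contents';
};

$result = withRetries($flakyGet, 3, 1);
echo $result, "\n";
```

Retrying only addresses transient failures. It does not address propagation latency: a successful read may still return an older version of the object.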
  • 20. Cloud databases
      • Cloud databases have similar design points.
          • Datasets are distributed for reliability
          • Some cloud databases support schemas, some don’t
          • Some cloud databases support joins, most don’t
          • Some cloud databases are relational, almost all aren’t
          • Some cloud databases are transactional, some aren’t
  • 21. “Database” ≠ RDBMS
      • Amazon’s SimpleDB is built around domains.
          • Each domain has some number of items; each item has some number of attribute / value pairs.
          • No schema support
          • No queries across domains
      • These design decisions make SimpleDB fast, scalable and, well, simple.
          • But our previous discussions of propagation latency and I/O errors apply here, too.
  • 22. A wee quiz
      • Is this a good idea? count()
      • How about this? avg()
      • Or one of these? sum(), min(), max()
  • 23. A wee quiz
      • The answer: It depends.
          • SimpleDB automatically indexes your datasets, so the count() operation is efficient.
          • But everything is a string, so avg(), sum(), min(), and max() aren’t supported.
          • Even where a cloud database does support them, they often aren’t efficient. If records have to be retrieved from multiple servers in multiple data centers just to calculate an average, that can take a long time.
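When every attribute is a string and the database offers no avg(), the calculation has to happen in the application. A minimal sketch of that idea follows; the `$items` array and the `averageAttribute` function are hypothetical stand-ins for whatever a query actually returns, not SimpleDB calls.

```php
<?php
// Hypothetical items as they might come back from a string-only store:
// every attribute value is a string, including the numeric ones.
$items = array(
    array('name' => 'widget', 'price' => '10.50'),
    array('name' => 'gadget', 'price' => '4.25'),
    array('name' => 'gizmo',  'price' => '15.25'),
);

// Average one attribute on the client, skipping items where the
// attribute is missing or is not a numeric string.
function averageAttribute(array $items, string $attribute): ?float
{
    $sum = 0.0;
    $count = 0;
    foreach ($items as $item) {
        if (isset($item[$attribute]) && is_numeric($item[$attribute])) {
            $sum += (float) $item[$attribute];
            $count++;
        }
    }
    return $count > 0 ? $sum / $count : null;
}

$average = averageAttribute($items, 'price');
```

The cost the slide warns about is still there: every matching item has to be fetched before the average can be computed.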
  • 24. Database design
      • Many of us have learned over the years how to normalize an SQL database.
          • But in the cloud, if the dataset is scattered across multiple machines in multiple data centers and you can’t do joins across tables, you have to do things differently.
      • A denormalized database contains redundant data and often does calculations at write time, not at read time.
          • A denormalized database is different from a poorly designed database that was never normalized to begin with.
  • 25. Denormalizing your database
      • The goal of a traditional, normalized relational database is to put each piece of data into the system once and only once.
          • Building an application to use the data in a different way doesn’t require changes to the database. (Hopefully.)
      • With a denormalized database, the goal is to copy pieces of data to minimize queries and processing power.
          • Building an application to use the data in a different way might require changes to the database.
          • Updating data might require making the same change in multiple places.
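The trade-off described on the two denormalization slides can be made concrete with a small sketch. It assumes a made-up orders and customers example held in plain PHP arrays; `addOrder`, `renameCustomer`, and the field names are hypothetical and stand in for writes to a cloud database.

```php
<?php
// Hypothetical in-memory "store"; in practice these would be items in a cloud database.
$customers = array(
    'c1' => array('name' => 'Ada', 'orderCount' => 0, 'orderTotal' => 0.0),
);
$orders = array();

function addOrder(array &$orders, array &$customers, string $orderId, string $customerId, float $amount): void
{
    // Redundant copy of the customer name, so reading an order needs no join.
    $orders[$orderId] = array(
        'customerId'   => $customerId,
        'customerName' => $customers[$customerId]['name'],
        'amount'       => $amount,
    );
    // Aggregates are maintained at write time instead of calculated at read time.
    $customers[$customerId]['orderCount'] += 1;
    $customers[$customerId]['orderTotal'] += $amount;
}

function renameCustomer(array &$orders, array &$customers, string $customerId, string $newName): void
{
    $customers[$customerId]['name'] = $newName;
    // The cost of redundancy: the same change has to be made in every copy.
    foreach ($orders as $id => $order) {
        if ($order['customerId'] === $customerId) {
            $orders[$id]['customerName'] = $newName;
        }
    }
}

addOrder($orders, $customers, 'o1', 'c1', 20.0);
addOrder($orders, $customers, 'o2', 'c1', 5.0);
renameCustomer($orders, $customers, 'c1', 'Ada L.');
```

Reads become single lookups (an order already carries the customer name, and a customer already carries its totals), while each write touches more than one place.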
  • 26. Reliability
      • If I have less physical control over my infrastructure, it’s vital that the cloud be as reliable as possible.
          • Some providers have SLAs that guarantee uptime and responsiveness.
          • Some providers deliver private clouds via hosted or colocated data centers.
      • Reliability isn’t a new problem, but the cloud gives us someone else to blame.
  • 27. Security
      • If I’m storing my data elsewhere, security is as crucial as ever.
      • If I’m running my applications elsewhere, security is as crucial as ever.
          • Some cloud providers offer greater security, access to their facilities, customized backup and recovery procedures, data destruction, etc.
      • Cloud computing doesn’t introduce any new security issues.
          • It doesn’t make them easier to solve, but it doesn’t create any new ones.
      • You can reuse much of your existing security infrastructure.
  • 28. Identity
      • Identity management, particularly federated identity management, is crucial for cloud computing.
          • Technologies such as LDAP, OpenID, OAuth, etc. can be useful here.
      • You can reuse much of your existing authentication infrastructure.
  • 29. Regulatory issues
      • Many enterprises can’t use a public cloud at all.
          • Laws prohibit certain types of data from being stored off-premises.
      • The private cloud is an important architectural pattern being used in many enterprises.
      • Location awareness is a key concern for many (potential) cloud users.
      • Data retention and data destruction are also important.
  • 30. Vendor lock-in
  • 31. Vendor lock-in
      • If there’s a new technology, any talented programmer will want to use it.
          • Maybe the shiny new thing is appropriate for what we’re doing.
          • Maybe not.
          • We’re probably going to use it anyway.
      • The challenge is to walk the line between using the newest, coolest thing and avoiding vendor lock-in.
  • 32. The Simple Cloud API
      • A joint effort of Zend, GoGrid, IBM, Microsoft, Nirvanix and Rackspace
          • But you can add your own libraries to support other cloud providers.
      • The goal: Make it possible to write portable, interoperable code that works with multiple cloud vendors.
      • An article on the Simple Cloud API was published on the developerWorks Open Source zone today.
          • There’s also a Simple Cloud podcast at dW.
  • 33. The Simple Cloud API
      • Covers three areas:
          • File storage (S3, Nirvanix, Azure Blob Storage, Rackspace Cloud Files)
          • Document storage (SimpleDB, Azure Table Storage)
          • Simple queues (SQS, Azure Queue Storage)
      • Uses the Factory and Adapter design patterns
          • A configuration file tells the Factory object which adapter to create.
  • 34. Vendor-specific APIs
      • Listing all the items in a Nirvanix directory:
          $auth = array('username' => 'your-username',
                        'password' => 'your-password',
                        'appKey'   => 'your-appkey');
          $nirvanix = new Zend_Service_Nirvanix($auth);
          $imfs = $nirvanix->getService('IMFS');
          $args = array('folderPath' => '/dougtidwell',
                        'pageNumber' => 1,
                        'pageSize'   => 5);
          $stuff = $imfs->ListFolder($args);
  • 35. Vendor-specific APIs
      • Listing all the items in an S3 bucket:
          $s3 = new Zend_Service_Amazon_S3($accessKey, $secretKey);
          $stuff = $s3->getObjectsByBucket($bucketName);
  • 36. The Simple Cloud API
      • Listing all the items in a {Nirvanix directory | S3 bucket}:
          $credentials = new Zend_Config_Ini($configFile);
          $stuff = Zend_Cloud_Storage_Factory::getAdapter($credentials)->listItems();
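The Factory and Adapter arrangement behind the portable listing can be sketched in a few lines. This is an illustration of the two patterns only, under the assumption of made-up names: `StorageAdapter`, `VendorAStorage`, `VendorBStorage`, and `StorageFactory` are hypothetical and are not classes from the Simple Cloud API or Zend Framework, and the adapters return canned data instead of calling a vendor.

```php
<?php
// All names here are hypothetical; this is not the Simple Cloud API itself.
interface StorageAdapter
{
    public function listItems(): array;
}

// Each adapter wraps one vendor's API behind the common interface.
class VendorAStorage implements StorageAdapter
{
    private $options;

    public function __construct(array $options)
    {
        $this->options = $options;
    }

    public function listItems(): array
    {
        // A real adapter would call vendor A's "list bucket" operation here.
        return array('a-item-1', 'a-item-2');
    }
}

class VendorBStorage implements StorageAdapter
{
    private $options;

    public function __construct(array $options)
    {
        $this->options = $options;
    }

    public function listItems(): array
    {
        // A real adapter would call vendor B's "list folder" operation here.
        return array('b-item-1');
    }
}

class StorageFactory
{
    // The configuration names the adapter class; calling code never does.
    public static function getAdapter(array $config): StorageAdapter
    {
        $class = $config['adapter'] ?? '';
        if (!class_exists($class) || !in_array('StorageAdapter', class_implements($class), true)) {
            throw new InvalidArgumentException("Unknown storage adapter: $class");
        }
        return new $class($config);
    }
}

// Stands in for the parsed configuration file.
$config = array('adapter' => 'VendorBStorage', 'username' => 'your-username');
$items = StorageFactory::getAdapter($config)->listItems();
```

Switching vendors then means changing the `adapter` entry in the configuration; the line that calls `listItems()` stays the same.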
  • 37. Where we go from here
      It’s still early
  • 38. Issues with the Internet
      • “It’s not secure.”
      • “I don’t want to lose control of my infrastructure.”
      • “I don’t know how reliable it is.”
      • “I don’t know if my partners are going to use it.”
      • All of these were important, legitimate issues.
          • With VPNs and other technology, the industry solved these problems.
  • 39. Issues with the cloud
      • “It’s not secure.”
      • “I don’t want to lose control of my infrastructure.”
      • “I don’t know how reliable it is.”
      • “I don’t know if my partners are going to use it.”
      • All of these are important, legitimate issues.
          • We’ve got some work to do, but the massive economic incentives mean someone will find a way to solve these problems.
  • 40. The hype, one more time
      • Cloud computing will be the biggest change to our industry since the rise of the Internet.
  • 41. Resources
      Useful stuff
  • 42. Other cloud sessions
      • Cloud Computing with VMWare, Akhil Sahil
          11:15 – 12:15 Wednesday
      • PHP and the Cloud, Ivo Jansch
          2:45 – 3:45 Wednesday
      • PHP and Platform Independence in the Cloud, Wil Sinclair
          9:15 – 10:15 Thursday
  • 43. Cloud Computing with the Zend Framework
      • A series of articles from the IBM developerWorks Open Source zone:
          • Using Amazon S3 with Zend, September 22
          • Using Amazon EC2 with Zend, October 13
  • 44. Principles of openness
      1. Cloud providers must work together to ensure that the challenges to cloud adoption are addressed through open collaboration and the appropriate use of standards.
      2. Cloud providers must not use their market position to lock customers into their particular platforms and limit their choice of providers.
      3. Cloud providers must use and adopt existing standards wherever appropriate. The IT industry has invested heavily in existing standards and standards organizations; there is no need to duplicate or reinvent them.
  • 45. Principles of openness
      4. When new standards (or adjustments to existing standards) are needed, we must be judicious and pragmatic to avoid creating too many standards. We must ensure that standards promote innovation and do not inhibit it.
      5. Any community effort around the open cloud should be driven by customer needs, not merely the technical needs of cloud providers, and should be tested or verified against real customer requirements.
      6. Cloud computing standards organizations, advocacy groups, and communities should work together and stay coordinated, making sure that efforts do not conflict or overlap.
  • 46. The principles in action
      • The Cloud Computing Use Cases Google group has a white paper of common use cases.
      • The identified use cases will be used as input to various standards efforts.
      • Join us at cloud-computing-use-cases.
      • Version 1 of the paper is available.
  • 47. Developer requirements
      • Scalable Database
      • Centralized Logging
      • Scalable Session Management
      • Identity Management
      • Robust Storage
      • Raw Compute / Job Processing
      • Messaging – Pub-Sub
      • Service Discovery
      • Messaging – Point-to-Point
      • SLAs
      • Mail
      • Caching
      What else?
  • 48. Thanks!
      By Doug Tidwell, Cloud Computing Evangelist, IBM