WTIA Cloud Computing Series - Part I: The Fundamentals


Presented by: Aaron Kimball.



  1. Dealing with data: enterprise systems in the cloud. Aaron Kimball, Founding Engineer, Cloudera Inc. 2-5-09
  2. Cloud computing: scalable applications
  3. Cloud computing: broader than any one app. Cloud computing is a method to address scalability and availability concerns for enterprise applications.
  4. The take-away: cloud computing represents a new approach to scalability problems. Reusable infrastructure components are available to help your organization build rapidly and scale gracefully.
  5. Outline: introduction; more data than you’ve ever seen before; processing large data volumes; hosting large-scale applications; an evolving ecosystem of components.
  6. Data volumes are growing. Amount of data one computer can store: 10,000 GB. Amount of data one computer can process at a time: 32 GB. Amount of data processed by Google per month (in 2007): 400,000,000 GB.
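The scale gap in these figures is easy to check with back-of-the-envelope arithmetic (the numbers are the slide's own 2007-era estimates):

```python
# 2007-era figures from the slide above.
storage_per_machine_gb = 10_000     # what one computer can store
ram_per_machine_gb = 32             # what one computer can process at a time
google_monthly_gb = 400_000_000     # data Google processed per month (2007)

# Just *storing* one month of that data would take this many machines:
machines_to_store = google_monthly_gb // storage_per_machine_gb
print(machines_to_store)  # 40000
```

Forty thousand machines just for storage, before any processing happens, is why single-machine tools stop being an option at this scale.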
  7. Where does data come from? Watching your users (clicks on your web site, pages viewed, items purchased…); simulations and scientific/experimental data (genome sequences, medical imaging, wireless sensor grids…); user-provided content (billions of Flickr images, YouTube videos, blog posts…); your infrastructure itself (10,000 computers reporting their status every second…); existing databases (product catalogs, historical sales data, surveys…).
  8. Large-scale data processing lessons: you can generate vastly more data than you can process with conventional tools; no relational database handles petabytes gracefully; data processing must involve many machines working in parallel.
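The "many machines in parallel" lesson is what the map/reduce programming model captures. A minimal sketch in Python (a toy word count; Hadoop would run the map step on whichever node stores each chunk, while here the chunks are processed one after another to show the model, not the parallelism):

```python
from collections import Counter

def count_words(chunk):
    """Map step: count words within one chunk, touching only local data."""
    return Counter(chunk.split())

def merge(partials):
    """Reduce step: combine per-chunk partial results into one answer."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# Two stand-ins for file splits; in Hadoop each would live on (and be
# processed by) a different node.
chunks = ["big data big", "data cloud data"]
word_counts = merge(map(count_words, chunks))
```

Because each map step reads only its own chunk and the merge is associative, the same program scales from two chunks on one laptop to thousands of chunks on a cluster.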
  9. Hadoop: an active storage platform. A community-driven, commercially supported, extensible system based on techniques developed by Google. It separates the problem of extracting information from large data sets from that of performing reliable computation, combining a scalable, reliable compute framework with self-healing, high-bandwidth storage.
  10. Putting it together: active storage. Data is automatically distributed to nodes at load time; load balancing is implicitly managed by Hadoop.
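One way to picture "distributed to nodes at load time" is hash-based placement. The sketch below is a toy stand-in, not HDFS's actual placement policy (which also considers replication and rack locality); the node names are invented for illustration:

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]

def place_block(block_id, nodes=NODES):
    """Pick a node for a data block by hashing its id, so blocks
    spread evenly across the cluster without central bookkeeping."""
    digest = hashlib.md5(block_id.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Spreading one file's blocks across the cluster at load time:
placement = {f"file.dat-block-{i}": place_block(f"file.dat-block-{i}")
             for i in range(8)}
```

The point of the sketch is that placement is deterministic and automatic: neither the user nor the application chooses where data lands.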
  11. Automatic parallel processing. Data elements are processed locally, in parallel; reliable computation is implicitly managed by Hadoop.
  12. Distributed data, single volume. Output data is written to local disks and forms a single user-accessible volume: a high-level abstraction for engineers and analysts.
  13. A self-healing system. Loss of nodes causes an automatic data rebalance; recovery is managed by Hadoop.
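The self-healing behavior comes from keeping multiple replicas of every block and re-copying when one is lost. A toy model of that re-replication (HDFS performs the real thing automatically; the node and block names here are invented):

```python
REPLICATION = 3  # target copies per block (HDFS's default)

def rereplicate(block_locations, dead_node, all_nodes):
    """Restore the replica count after a node failure.

    block_locations maps block id -> set of nodes holding a copy.
    Copies lost with dead_node are re-created on surviving nodes."""
    alive = [n for n in all_nodes if n != dead_node]
    for nodes in block_locations.values():
        nodes.discard(dead_node)              # that replica is gone
        for candidate in alive:               # copy to survivors...
            if len(nodes) >= REPLICATION:     # ...until back at target
                break
            nodes.add(candidate)
    return block_locations

cluster = ["n1", "n2", "n3", "n4"]
locations = {"block-A": {"n1", "n2", "n3"},
             "block-B": {"n2", "n3", "n4"}}
rereplicate(locations, "n1", cluster)  # n1 dies; block-A heals itself
```

No operator intervenes: detecting the under-replicated block and restoring the target count is the storage layer's job.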
  14. Existing large-scale systems…
  15. …are augmented by Hadoop.
  16. Hosting infrastructure. Managed cloud platforms provide hardware resources for rent: think cycles and bytes, not months and machines. They provide on-demand, low-level infrastructure for hosting applications.
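"Cycles and bytes, not months and machines" means the bill is metered usage rather than owned hardware. A sketch of that cost model with hypothetical rates (the prices below are invented for illustration, not any provider's actual pricing):

```python
# Hypothetical pay-per-use rates -- illustrative only.
PRICE_PER_MACHINE_HOUR = 0.10   # dollars per machine-hour of compute
PRICE_PER_GB_MONTH = 0.15       # dollars per GB stored for a month

def monthly_bill(machines, hours, storage_gb):
    """Renting cycles and bytes: pay for hours used and bytes stored,
    not for owning and housing the machines."""
    return (machines * hours * PRICE_PER_MACHINE_HOUR
            + storage_gb * PRICE_PER_GB_MONTH)

# 20 machines for a 10-hour batch job, plus 500 GB kept for the month:
bill = monthly_bill(20, 10, 500)  # 20*10*0.10 + 500*0.15 = 95.0
```

The appeal for spiky workloads is that the 20 machines cost nothing during the rest of the month, when they would otherwise sit idle.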
  17. An evolving ecosystem
  18. An evolving ecosystem (diagram; original graphic not recoverable)
  19. Conclusions. Cloud computing makes resources available on demand, from raw hardware up to fully configured applications. The range of resources available is increasing, with new tools aimed at different levels of the hardware/software stack. These tools allow you to rapidly integrate disparate components of your infrastructure and handle vastly more data than before.
  20. (c) 2008 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. Iceberg image by Wikipedia user Calyponte.