Exploring cloud for data warehousing



Cloud computing is creating a new era for IT by providing a set of services that appear to have infinite capacity, immediate deployment and high availability at trivial cost. These are all appealing to someone running a data warehouse when data volume, use and cost are growing at a rapid rate.
Today most organizations look at cloud as a way to lower data center and IT costs. While cost reduction is a real benefit, there is more value in the increased scalability, speed to procure (and give up) resources, and ease of delivery in cloud environments.
Database workloads are particularly challenging in the cloud. Cloud deployments beyond a moderate scale favor shared-nothing database architectures designed to run transparently in a multi-node environment. We are still in an early period of standardization and design of software to run in the cloud, and not all workloads are suitable for deployment on a collection of small virtualized servers today. Business intelligence and analytic database workloads fall into this category, which makes it important to analyze how well they fit public and private cloud options.
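As a rough illustration of what "shared-nothing" means in practice, the sketch below hash-partitions rows across independent nodes, lets each node aggregate only its own partition, and then merges the partial results. The node count, the row layout and all function names are assumptions made for this example; they do not describe any particular database product.

```python
import zlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Map a distribution key to a node using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes):
    """Assign each (customer, amount) row to exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        nodes[node_for(customer, num_nodes)].append((customer, amount))
    return nodes


def local_totals(node_rows):
    """Aggregate using only the rows stored on one node."""
    totals = defaultdict(int)
    for customer, amount in node_rows:
        totals[customer] += amount
    return dict(totals)


def cluster_totals(nodes):
    """Merge per-node results. Keys never overlap across nodes because
    every customer hashes to a single node."""
    merged = {}
    for node_rows in nodes:
        merged.update(local_totals(node_rows))
    return merged


# Hypothetical sample data, for illustration only.
ROWS = [("acme", 100), ("globex", 250), ("acme", 50),
        ("initech", 75), ("globex", 25)]
NODES = partition(ROWS, num_nodes=3)
TOTALS = cluster_totals(NODES)
```

Because the grouping key is also the distribution key, no node needs another node's data to finish its part of the query. Queries that group or join on a different key would require moving rows between nodes, which is where network and storage performance in a cloud environment start to matter.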

Published in: Technology, Business


  1. Exploring Cloud Computing Options for Data Warehousing. July 26, 2012. Mark Madsen, @markmadsen, www.ThirdNature.net
  2. Cloud Computing: "…a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf) What people see: a seemingly infinite resource to apply to performance problems on short notice and at low cost.
  3. Generators: Expensive Product
  4. Generators: Commodity Product
  5. Generators as a Service: Electricity
  6. The Natural Process of Commoditization (Simon Wardley, A Lifecycle Approach to Cloud Computing)
  7. Managing Hardware Resources: Systems are sized for the peak workload, with the expectation that it will fluctuate. [Chart: demand and capacity, resources over time]
  8. Idle resources = low utilization = money wasted. [Chart: demand and capacity, resources over time, with idle resources marked]
  9. Not enough resource is (much) worse than too much. [Chart: demand and capacity, resources over time]
  10. Maintaining capacity just above the peak as workloads increase is the art of capacity planning. One problem is the large step when upgrading to more resources, which equates to a large capital cost. [Chart: demand and capacity, resources over time]
  11. Great performance after an upgrade, bad performance at year-end before the next upgrade. A steady decline can be worse for user perception than constant mediocre performance. [Chart: demand and capacity, resources over time, with idle resources marked]
  12. What everyone would like: elastic capacity. Pay for the resources you use when you use them, not up front for the entire system that supplies them. Just like electricity. [Chart: capacity tracking demand, resources over time]
  13. Five Key Cloud Characteristics
      1. On-demand self-service
      2. Network accessibility
      3. Resource pooling
      4. Measured service
      5. Elasticity
  14. Cloud Architecture: Started with virtual machines. Lots of servers, lots of virtual nodes. But in public clouds:
      • Storage can be, and often is, separated
      • VMs don't run across nodes
      • Great for OLTP, not so much for BI
      • Implies new software architectures
  15. Database Architecture and the Cloud: Virtualizing on a single server makes no sense for a database that needs the full resources. If your server hardware environment looks like this [diagram], then it's probably good for lightweight transaction processing, simple storage and retrieval, and procedural computations on data. If you want to use it for a data warehouse, you need:
      • A shared-nothing database
      • A proper storage architecture
      • Dynamic licensing
  16. Three Models of Deployment
      1. Public cloud
      2. Leased / hosted private cloud
      3. Private cloud
  17. Benefits and Rationale: Why did you / are you considering a move to the cloud? Two primary reasons:
      ▪ Cost reduction
      ▪ Reduced time to value
      Source: IBM global survey of IT and line-of-business decision makers
  18. Unexpected Benefits
      Speed to deploy:
      ▪ Opex vs. capex means faster approvals and less planning
      ▪ Provisioning on demand means the ability to do all those small projects that needed resources and staff to set up
      Performance management:
      ▪ Resource-oriented fixes done in minutes
      ▪ Instead of static resources and fluctuations in performance, set static SLAs and fluctuate the resources
      Administration:
      ▪ No more hardware or operating system upgrades to deal with
  19. Public Cloud Challenges
      1. Multi-tenant servers and unpredictable I/O performance
      2. Legal problems:
         ▪ Data co-mingling in multi-tenant databases
         ▪ Data locality and national laws
      3. Cloud compatibility for data integration and data management tools (environment, data movement)
      4. Security requirements
      When these are a concern, private clouds may be the better option today.
  20. What are manager preferences? [Chart: share of managers with a public cloud preference, a private cloud preference, or a preference not to use cloud, shown for "Data mining, text mining, or other analytics" and for "Data warehouses or data marts". Values as extracted: 9%, 21%, 52%, 44%, 39%, 35%.] Source: IBM global survey of IT and line-of-business decision makers
  21. Comparison of Models
  22. New and growing use cases drive the need to expand. The use cases are now interactive applications, lower latency data, complex analytics and rapidly growing data volumes.
  23. Image Attributions: Thanks to the people who supplied the images used in this presentation:
      • Commoditization diagram: from A Lifecycle Approach to Cloud Computing, © Simon Wardley
      • tesla coil train: http://www.flickr.com/photos/winterhalter/27364687
      • Amazon Virtual Private Cloud diagram: © Amazon, Inc.
      • caged_tower_melbourne.jpg: http://www.flickr.com/photos/vermininc/2227512763
  24. About the Presenter: Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institution. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
  25. About Third Nature: Third Nature is a research and consulting firm focused on new and emerging technology and practices in business intelligence, analytics and performance management. If your question is related to BI, analytics, information strategy and data, then you're at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
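
The capacity argument on slides 7 through 12 can be made concrete with a little arithmetic. The sketch below compares a system sized for peak demand with elastic capacity that follows demand period by period. The monthly demand figures, the unit price, the 10% headroom and the function names are all invented for illustration, and the comparison assumes the same price per unit in both models, which real pricing may not offer.

```python
def fixed_capacity_cost(demand, unit_price):
    """Size the system for the peak and pay for it in every period."""
    peak = max(demand)
    return peak * unit_price * len(demand)


def elastic_capacity_cost(demand, unit_price, headroom_pct=10):
    """Pay each period for demand plus headroom, rounded up to whole units.

    Integer arithmetic avoids floating-point surprises when rounding up.
    """
    total_units = 0
    for d in demand:
        total_units += (d * (100 + headroom_pct) + 99) // 100  # ceiling division
    return total_units * unit_price


def utilization(demand, capacity):
    """Average fraction of a fixed capacity that is actually used."""
    return sum(demand) / (capacity * len(demand))


# Hypothetical monthly demand in resource units, peaking at year-end.
DEMAND = [40, 45, 50, 60, 55, 50, 45, 50, 60, 70, 80, 100]
UNIT_PRICE = 10  # hypothetical cost per resource unit per month

FIXED = fixed_capacity_cost(DEMAND, UNIT_PRICE)
ELASTIC = elastic_capacity_cost(DEMAND, UNIT_PRICE)
UTILIZATION = utilization(DEMAND, max(DEMAND))
```

With these made-up numbers the peak-sized system costs 12000 for the year at about 59% average utilization, while the elastic model costs 7770. The gap is the idle capacity shown on slide 8; it narrows as demand becomes flatter or as the per-unit price of elastic resources rises.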