Exploring Cloud Computing Options for Data Warehousing
July 26, 2012
Mark Madsen
@markmadsen
www.ThirdNature.net
Cloud Computing
"…a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
What people see: seemingly infinite resources to apply to performance problems on short notice and at low cost
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Generators: Expensive Product
Generators: Commodity Product
Generators as a Service: Electricity
The Natural Process of Commoditization
Simon Wardley, A Lifecycle Approach to Cloud Computing
Managing Hardware Resources
Systems are sized for the peak workload, with the 
expectation that it will fluctuate.
[Chart: demand vs. capacity (resources over time)]
Idle resources = low utilization = money wasted
[Chart: demand vs. capacity over time, with the gap labeled "Idle resources"]
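To put numbers on "idle resources = low utilization", here is a minimal sketch that sizes a system for its peak hour plus a headroom margin, then reports average utilization and idle capacity. The hourly demand figures, the 10% headroom, and the function name are hypothetical, chosen only for illustration.

```python
# Hypothetical hourly demand in arbitrary resource units (e.g., cores in use).
demand = [20, 25, 30, 60, 95, 100, 70, 40, 30, 25, 20, 15]


def fixed_capacity_utilization(demand, headroom_pct):
    """Size capacity for the peak plus headroom_pct percent.

    Returns (capacity, average utilization, idle unit-hours).
    """
    capacity = max(demand) * (100 + headroom_pct) // 100
    used = sum(demand)                    # unit-hours actually consumed
    provisioned = capacity * len(demand)  # unit-hours paid for
    return capacity, used / provisioned, provisioned - used


capacity, utilization, idle = fixed_capacity_utilization(demand, headroom_pct=10)
print(f"capacity={capacity} units, utilization={utilization:.0%}, idle={idle} unit-hours")
```

With this demand profile the peak-sized system averages about 40% utilization, so most of the capacity that was bought sits idle most of the time.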
Too little capacity is (much) worse than too much.
[Chart: demand vs. capacity over time]
Maintaining capacity just above the peak as 
workloads increase is the art of capacity planning.
One problem is the large step in capacity at each upgrade, which means a large capital cost.
[Chart: demand vs. capacity over time, with capacity added in large steps]
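The step pattern can be simulated with a small planning loop: capacity is bought only in large fixed increments, and only when growing demand would leave less than a minimum margin. This is a minimal sketch; the demand series, step size, headroom threshold, and function name are all hypothetical.

```python
def plan_step_upgrades(demand, initial_capacity, step, min_headroom):
    """Return capacity per period and the number of upgrades.

    Capacity is added only in whole `step` increments, and only when
    demand would leave less than `min_headroom` units spare.
    """
    capacity = initial_capacity
    timeline = []
    upgrades = 0
    for d in demand:
        while capacity - d < min_headroom:
            capacity += step  # one large capital purchase
            upgrades += 1
        timeline.append(capacity)
    return timeline, upgrades


# Hypothetical quarterly peak demand growing steadily.
quarterly_demand = [40, 50, 60, 70, 80, 90, 100, 110]
timeline, upgrades = plan_step_upgrades(quarterly_demand, initial_capacity=80,
                                        step=80, min_headroom=10)
headroom = [c - d for c, d in zip(timeline, quarterly_demand)]
print(timeline)  # [80, 80, 80, 80, 160, 160, 160, 160]
print(headroom)  # [40, 30, 20, 10, 80, 70, 60, 50]
```

The headroom series shows the pattern on these slides: plenty of spare capacity right after the upgrade, then a steady squeeze until the next purchase.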
Great performance after an upgrade, bad 
performance at year‐end before the next upgrade.
A steady decline can be worse for user perception 
than constant mediocre performance.
[Chart: demand vs. capacity over time, with idle capacity marked]
What everyone would like: elastic capacity
Pay for the resources you use when you use them, 
not up front for the entire system that supplies them. 
Just like electricity.
[Chart: demand vs. capacity over time]
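A rough way to compare paying for peak capacity with paying per use: under fixed provisioning you pay for the peak every hour, while under elastic provisioning you pay only for the units used, usually at a higher unit price. The prices below (in cents per unit-hour) and the 2x elastic premium are hypothetical assumptions, not real cloud rates.

```python
# Hypothetical hourly demand in resource units.
demand = [20, 25, 30, 60, 95, 100, 70, 40, 30, 25, 20, 15]


def fixed_cost(demand, price_per_unit_hour):
    # Capacity sized to the peak is paid for every hour, used or not.
    return max(demand) * len(demand) * price_per_unit_hour


def elastic_cost(demand, price_per_unit_hour):
    # Only the units actually used in each hour are billed.
    return sum(demand) * price_per_unit_hour


fixed = fixed_cost(demand, price_per_unit_hour=10)      # owned / reserved
elastic = elastic_cost(demand, price_per_unit_hour=20)  # assumed 2x premium
avg_utilization = sum(demand) / (max(demand) * len(demand))
print(fixed, elastic, round(avg_utilization, 2))
```

Elastic comes out ahead whenever the average utilization of the peak-sized system is below the ratio of the fixed price to the elastic price (here 10/20 = 50%). With steadier demand, the comparison can flip.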
Five Key Cloud Characteristics
1. On‐demand self‐service
2. Network accessibility
3. Resource pooling
4. Measured service
5. Elasticity
Cloud Architecture
Started with virtual machines
Lots of servers, lots of virtual 
nodes. But in public clouds:
• Storage can be, and often is, separated from compute
• A VM doesn’t run across multiple nodes
• Great for OLTP, not so much for BI
• Implies new software architectures
Database Architecture and the Cloud
Virtualizing on a single server makes no sense for a database that needs the server's full resources.
If your server hardware 
environment looks like this:
then it’s probably good for 
lightweight transaction 
processing, simple storage and 
retrieval, procedural 
computations on data.
If you want to use it for a data 
warehouse, you need:
• A shared‐nothing database
• A proper storage architecture
• Dynamic licensing
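The shared-nothing requirement can be illustrated with a toy example: rows are spread across nodes by hashing a key, each node aggregates only its own rows, and a coordinator merges the partial results. This is a generic sketch of hash partitioning with hypothetical data and function names, not how any particular database implements it.

```python
from collections import defaultdict


def partition(rows, num_nodes):
    """Place each (customer_id, amount) row on exactly one node.

    A simple modulo hash on the integer key stands in for a real
    distribution function.
    """
    nodes = [[] for _ in range(num_nodes)]
    for customer_id, amount in rows:
        nodes[customer_id % num_nodes].append((customer_id, amount))
    return nodes


def local_totals(node_rows):
    # Runs on one node, touching only that node's rows.
    totals = defaultdict(int)
    for customer_id, amount in node_rows:
        totals[customer_id] += amount
    return dict(totals)


def distributed_totals(nodes):
    # Coordinator: merge partial results. Every row for a given customer
    # lives on one node, so the partial results never share a key.
    result = {}
    for node_rows in nodes:
        result.update(local_totals(node_rows))
    return result


rows = [(1, 100), (2, 50), (1, 25), (3, 75), (4, 10), (2, 5)]
nodes = partition(rows, num_nodes=3)
print(distributed_totals(nodes))
```

The simple merge works only because the grouping column is also the distribution key; grouping on any other column would require summing overlapping partial results (or redistributing rows) at the coordinator.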
Three Models of Deployment
1. Public cloud
2. Leased / hosted private cloud
3. Private cloud
Benefits and Rationale
Why did you move, or why are you considering a move, to the cloud?
Two primary reasons:
▪ Cost reduction
▪ Reduced time to value
IBM global survey of IT and line-of-business decision makers
Unexpected Benefits
Speed to deploy:
▪ opex vs capex means faster approvals and 
less planning
▪ On-demand provisioning makes it possible to do all 
those small projects that previously needed resources 
and staff to set up
Performance management:
▪ Resource‐oriented fixes done in minutes
▪ Instead of static resources and fluctuations 
in performance, set static SLAs and fluctuate 
the resources
Administration:
▪ No more hardware or operating system 
upgrades to deal with
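"Set static SLAs and fluctuate the resources" can be read as a simple sizing rule: measure how much load one node can handle while still meeting the SLA, then provision enough nodes for the current load. The per-node capacity of 500 queries per hour, the load figures, and the function name are hypothetical; a production autoscaler would likely also smooth the load signal and allow for provisioning delay.

```python
import math


def nodes_needed(queries_per_hour, per_node_capacity, min_nodes=1):
    """Smallest node count that keeps per-node load within the SLA limit.

    per_node_capacity is the (assumed, measured) query rate one node can
    sustain while still meeting the response-time SLA.
    """
    return max(min_nodes, math.ceil(queries_per_hour / per_node_capacity))


# Hypothetical hourly query load over a working day.
hourly_load = [200, 450, 1200, 3000, 800]
plan = [nodes_needed(q, per_node_capacity=500) for q in hourly_load]
print(plan)  # [1, 1, 3, 6, 2]
```

The SLA stays fixed; the node count is what moves with demand.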
Public Cloud Challenges
1. Multi‐tenant servers and 
unpredictable I/O performance
2. Legal problems:
▪ Data co‐mingling in multi‐tenant 
databases
▪ Data locality and national laws
3. Cloud compatibility for data 
integration and data management 
tools (environment, data movement)
4. Security requirements
When these are a concern, private clouds 
may be the better option today.
What are manager preferences?
[Chart: preference for public cloud, private cloud, or no cloud, for (a) data mining, text mining, or other analytics and (b) data warehouses or data marts; plotted values: 9%, 21%, 52%, 44%, 39%, 35%]
IBM global survey of IT and line-of-business decision makers
Comparison of Models
New and growing use cases drive the need to expand
The use cases are now interactive applications, lower latency 
data, complex analytics and rapidly growing data volumes.
Image Attributions
Thanks to the people who supplied the images used in this presentation:
Commoditization diagram - from A Lifecycle Approach to Cloud Computing, © Simon Wardley
tesla coil train - http://www.flickr.com/photos/winterhalter/27364687
Amazon Virtual Private Cloud diagram - © Amazon, Inc.
caged_tower_melbourne.jpg - http://www.flickr.com/photos/vermininc/2227512763
About the Presenter
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institution. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
About Third Nature
Third Nature is a research and consulting firm focused on new and
emerging technology and practices in business intelligence, analytics and
performance management. If your question is related to BI, analytics,
information strategy and data, then you’re at the right place.
Our goal is to help companies take advantage of information-driven
management practices and applications. We offer education, consulting
and research services to support business and IT organizations as well as
technology vendors.
We fill the gap between what the industry analyst firms cover and what IT
needs. We specialize in product and technology analysis, so we look at
emerging technologies and markets, evaluating technology and how it is
applied rather than vendor market positions.
