Surveys show a growing demand for more up-to-date data in our BI environments. Meeting that demand means moving from strict reliance on nightly batch ETL to other methods. What is often ignored is how this affects the data warehouse: the shift introduces new technology and methods, so the warehouse must support new types of workloads.
• Methods and tools for processing up-to-date data
• New requirements for your data warehouse database or platform
• What to look for as you address these requirements
1. How Real Time Data Requirements Change the Data Warehouse Environment
Mark Madsen – September 17, 2008
www.ThirdNature.net
Attribution-NonCommercial-No Derivative
http://creativecommons.org/licenses/by-nc-nd/3.0/us/
2. Outline
What’s real-time about?
Impacts on the data warehouse architecture
Delivering data to users
Extracting the data
Storing the data
Operations
Getting started
Third Nature, January 2008 Mark Madsen Slide 2
3. Speeding Up the Data Warehouse
Why?
Faster reaction time
Reduced decision time
New process capabilities
4. Which Decisions Benefit?
                 Strategic                               Operational
Decision time    flexible, long cycle                    constrained, short cycle
Decision scope   broad, organizational                   narrow, departmental or process
Decision model   Complex                                 Simple
Data latency     High, history is core to decisions      Low, recent data is core to decisions
Data scope       Many sources, many types, aggregated    Few sources, structured, detailed
Most real time needs will be driven by operational decision making, not strategic decisions.
5. Strategy, Decisions and Data Latency
Goal: Increase share of low to mid market customers
Strategy: Reduce cost of products sold; Improve promotional performance
Tactics: Efficient sourcing; Consolidate suppliers; Improve delivery compliance; Decrease out of stocks; Catch out of stocks before they occur
BI Needs: Reports & spreadsheets; Dashboards, alerts & scorecards; Real time alerts & embedded analytics
6. What People Are Doing Today
Monthly Weekly Daily Multiple times per day On demand
2002 32% 34% 69% 15% 6%
2004 27% 29% 65% 30% 19%
2006 3 24 44 29
Sources: TDWI, Gartner
At the same time, data volumes are rising for most data
warehouses at 50% to 100% per year.
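To make the growth figure concrete, here is a minimal sketch of what 50% and 100% annual growth imply when compounded over a few years. The 1 TB starting size, the three-year horizon, and the function name `projected_volume` are illustrative assumptions, not figures from the surveys.

```python
def projected_volume(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting volume by a fixed annual growth rate."""
    return start_tb * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical 1 TB warehouse, projected at the two growth rates on the slide.
    for rate in (0.5, 1.0):
        sizes = [projected_volume(1.0, rate, year) for year in range(4)]
        print(f"{rate:.0%} growth per year: {sizes}")
```

Even at the lower rate the warehouse more than triples in three years, which is worth keeping in mind when load windows are already being squeezed by lower latency requirements.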
7. BI Efforts Involving Real Time Data Access
Terms you may hear from the
BI market that imply real time:
Operational BI
Embedded analytics
Decision automation
Complex event processing
Event-driven BI
Process-driven BI
They are all similar in
requiring some level of low
latency data access.
8. Impacts on the DW Architecture
Adding current data to the system requires effort at all three layers.
Data Consumers: Databases, Dashboards, OLAP, Productivity, BAM/BPM, Reporting, Analytics, Applications
Delivery
DW Platforms: Warehouse Database, Mart, ODS, Content Store
ETL, EDR, EII
Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications
9. One Architecture or Two?
In-line with process:
• Real time data flows separately from the warehouse data
• May include a low-latency data store in the real time environment
• This model may be needed for extremely low latency data
• More applicable for event-driven BI
(Diagram labels: Process, RT BI, Batch DW, BI)
Out of band:
• Data to the consumer first flows through the DW
• Unified architecture for both low and high latency data
• More applicable for on-demand BI
(Diagram labels: Process, DW, BI & RT BI)
10. User Interface: Two BI Usage Models
Demand driven
• Users ask for current data
• Most BI tools work this way
• Harder to adapt these tools to
event-driven models
Event driven
• System takes action based on
data, e.g. alerts, rule engines
• May not have (or need) an end
user interface
• Need understanding of decision
& action process for this model
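As a rough illustration of the event-driven model, the sketch below evaluates rules against incoming events and emits alerts with no end user interface involved. Everything in it is hypothetical: the `InventoryEvent` fields, the `Rule` type, and the "fewer than two days of cover" threshold are assumptions chosen to echo the out-of-stock example earlier in the deck, not part of any BI product.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class InventoryEvent:
    sku: str
    on_hand: int          # units currently in stock
    daily_demand: float   # recent average units sold per day


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[InventoryEvent], bool]


def evaluate(events: Iterable[InventoryEvent], rules: List[Rule]) -> List[str]:
    """Return one alert string for every (event, rule) pair whose condition holds."""
    alerts = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                alerts.append(f"{rule.name}: {event.sku}")
    return alerts


# Hypothetical rule: flag items with less than two days of stock left.
LOW_COVER = Rule(
    name="low days of cover",
    condition=lambda e: e.daily_demand > 0 and e.on_hand / e.daily_demand < 2.0,
)

if __name__ == "__main__":
    feed = [InventoryEvent("A", 10, 10.0), InventoryEvent("B", 100, 10.0)]
    for alert in evaluate(feed, [LOW_COVER]):
        print(alert)
```

Writing even a toy rule like this forces the question the slide raises: who or what acts on the alert, and how quickly.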
11. BI Tools Need New Capabilities
Embedding BI within
applications
• UI embedding
• Full embedding
Event-based integration
Feeding BI data to
applications: services, not
SQL, may be desired
Custom UI code may be
preferable to a BI tool
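One way to read "services, not SQL" is that applications call a named function and never see the query behind it. The sketch below assumes a hypothetical `stock` table and uses an in-memory SQLite database from the Python standard library purely as a stand-in for the warehouse; the function names are illustrative.

```python
import sqlite3
from typing import List


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the warehouse with a tiny stock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, on_hand INTEGER)")
    conn.executemany("INSERT INTO stock VALUES (?, ?)", [("A", 3), ("B", 40)])
    return conn


def low_stock_skus(conn: sqlite3.Connection, threshold: int) -> List[str]:
    """Service-style call: the caller passes a threshold and gets SKUs back."""
    rows = conn.execute(
        "SELECT sku FROM stock WHERE on_hand < ? ORDER BY sku", (threshold,)
    )
    return [sku for (sku,) in rows]


if __name__ == "__main__":
    print(low_stock_skus(open_demo_db(), threshold=10))
```

In practice the function would sit behind whatever service interface the application team uses; the point is only that the SQL stays on the warehouse side of the boundary.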
12. The Data Integration Layer
• Integration is the most complex
element of adding real time data.
• Inline vs. out of band, demand vs.
event-driven BI usage create
different DI requirements.
• You may not have exactly the
same metrics, attributes or data
extract logic.
• Don’t count on replacing the ETL
batch; more likely you are
augmenting it.
• You probably need to add new DI
technologies to your portfolio.
• Batch performance design isn’t
like real time design.
13. Speeding Up Data Integration Methods
Single batch
Frequent batch
Mini-batch
Continuous load
Streaming
Latency scale: Hourly+ (single batch) to Immediate (streaming)
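The methods on this scale differ mainly in how many records are handled per load. A minimal sketch of that idea follows, using a count-based batch size as a simplifying assumption (real mini-batch jobs are more often bounded by time than by record count); the `mini_batches` helper is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def mini_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size records from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    changes = range(5)  # stand-in for a feed of changed source rows
    for batch in mini_batches(changes, batch_size=2):
        print("loading", batch)
```

With a very large `batch_size` this behaves like a single batch load, and with `batch_size=1` every record is handed over as it arrives, which is closer to continuous load. The per-record overhead that is negligible in the first case tends to dominate in the second.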
14. The Platform Layer: Data and Database
• Schemas will need changes.
• You don’t need to convert the
entire database to a real time
schema.
• One schema or two?
• Event-driven BI creates
different query patterns and
workloads.
• Configuration and tuning may
be different than what you are
used to with traditional BI.
• Application developers want
services or ORMs, not SQL.
15. Different Platform Workloads
Three workloads: Data loading + Normal BI + Real time BI = complications
Data Consumers: Databases, Dashboards, OLAP, Productivity, BAM/BPM, Reporting, Analytics, Applications
Delivery
DW Platforms: Warehouse Database, Mart, ODS, Content Store
ETL, EDR, EII
Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications
16. Development, Maintenance & Operations
• Real time decisions on real
time data mean data
quality plays a larger role,
and it’s harder to address.
• Warehouse availability
becomes much more
important to the business,
and it isn’t just the
database – it’s everything.
• Performance and meeting
strict BI SLAs will rise in
importance since you are
now tied in to business
operations.
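A latency SLA is easier to enforce when it is written down as a check. The sketch below assumes the warehouse records a load timestamp per table and that the SLA is expressed as a maximum age; the function name `is_within_sla` and the 15-minute figure are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_within_sla(
    last_loaded_at: datetime,
    max_latency: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the most recent load is no older than the allowed latency."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - last_loaded_at <= max_latency


if __name__ == "__main__":
    # Hypothetical SLA: data must be no more than 15 minutes old.
    loaded = datetime.now(timezone.utc) - timedelta(minutes=20)
    print(is_within_sla(loaded, timedelta(minutes=15)))
```

A check like this covers only the database end of the chain; since availability here means everything, not just the database, similar checks would be needed on the extract and delivery sides.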
17. A Prescription for Getting Started
1. Start with a decision
process
2. Define data needs for the
process
3. Ensure that data is
available at the right
latency
4. Determine appropriate
data integration
technologies.
5. Design and initiate
upstream work
6. Build
19. CC Image Attributions
Thanks to the people who supplied the creative commons licensed images used in this presentation:
• Divers - http://flickr.com/photos/raveller/
• Fast dog - http://flickr.com/photos/marinacvinhal/379111290/
• Febo - http://flickr.com/photos/igor/419425754/
• Subway - http://flickr.com/photos/neilsphotoalbum/504517855/
• Cadillac ranch - http://flickr.com/photos/whatknot/179655095/
20. About the Presenter
Mark Madsen is president of Third
Nature, a technology research and
consulting firm focused on business
intelligence, data integration and
data management. Mark is an
award-winning author, architect and
CTO whose work has been featured
in numerous industry publications.
Over the past ten years Mark
received awards for his work from
the American Productivity & Quality
Center, TDWI, and the Smithsonian
Institution. He is an international
speaker, a contributing editor at
Intelligent Enterprise, and manages
the open source channel at the
Business Intelligence Network. For
more information or to contact Mark,
visit http://ThirdNature.net.