How Real Time Data Changes the Data Warehouse
Surveys show a growing demand for more up-to-date data in our BI environments. Meeting this demand requires moving from a strict reliance on nightly batch-style ETL to other methods. What is often ignored is how this affects the data warehouse. The shift introduces new technology and methods, which means the warehouse must support new types of workloads. Topics covered:
• Methods and tools for processing up-to-date data
• New requirements for your data warehouse database or platform
• What to look for as you address these requirements


How Real Time Data Changes the Data Warehouse Presentation Transcript

  • 1. How Real Time Data Requirements Change the Data Warehouse Environment Mark Madsen – September 17, 2008 www.ThirdNature.net Attribution-NonCommercial-No Derivative http://creativecommons.org/licenses/by-nc-nd/3.0/us/
  • 2. Outline
    • What’s real-time about?
    • Impacts on the data warehouse architecture
    • Delivering data to users
    • Extracting the data
    • Storing the data
    • Operations
    • Getting started
    Third Nature, January 2008 Mark Madsen Slide 2
  • 3. Speeding Up the Data Warehouse
    Why?
    • Faster reaction time
    • Reduced decision time
    • New process capabilities
  • 4. Which Decisions Benefit?
    • Decision time: Strategic = flexible, long cycle; Operational = constrained, short cycle
    • Decision scope: Strategic = broad, organizational; Operational = narrow, departmental or process
    • Decision model: Strategic = complex; Operational = simple
    • Data latency: Strategic = high, history is core to decisions; Operational = low, recent data is core to decisions
    • Data scope: Strategic = many sources, many types, aggregated; Operational = few sources, structured, detailed
    Most real time needs will be driven by operational decision making, not strategic decisions.
  • 5. Strategy, Decisions and Data Latency
    • Goal: Increase share of low to mid market customers
    • Strategy: Reduce cost of products sold; Improve promotional performance
    • Tactics: Efficient sourcing; Consolidate suppliers; Decrease out of stocks; Improve delivery compliance; Catch out of stocks before they occur
    • BI needs: Reports & spreadsheets; Dashboards & scorecards; Alerts; Real time alerts; Embedded analytics
  • 6. What People Are Doing Today
    [Chart; sources: TDWI, Gartner]
    • 2002: Monthly 32%, Weekly 34%, Daily 69%, Multiple times per day 15%, On demand 6%
    • 2004: Monthly 27%, Weekly 29%, Daily 65%, Multiple times per day 30%, On demand 19%
    • 2006 (four values in the source): 3, 24, 44, 29
    At the same time, data volumes are rising for most data warehouses at 50% to 100% per year.
  • 7. BI Efforts Involving Real Time Data Access
    Terms you may hear from the BI market that imply real time:
    • Operational BI
    • Embedded analytics
    • Decision automation
    • Complex event processing
    • Event-driven BI
    • Process-driven BI
    They are all similar in requiring some level of low latency data access.
  • 8. Impacts on the DW Architecture
    Adding current data to the system requires effort at all three layers.
    [Architecture diagram]
    • Data consumers (delivery): Databases, Dashboards, OLAP, Productivity, BAM/BPM, Reporting, Analytics, Applications
    • DW platforms: Warehouse Database, Mart, ODS, Content Store
    • Data integration: ETL, EDR, EII
    • Source environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications
  • 9. One Architecture or Two?
    In-line with process:
    • Real time data flows separately from the warehouse data
    • May include a low-latency data store in the real time environment
    • This model may be needed for extremely low latency data
    • More applicable for event-driven usage
    [Diagram labels: Process, RT BI, Batch DW, BI]
    Out of band:
    • Data to the consumer first flows through the DW
    • Unified architecture for both low and high latency data
    • More applicable for on-demand usage
    [Diagram labels: Process, DW, BI & RT BI]
  • 10. User Interface: Two BI Usage Models
    Demand driven
    • Users ask for current data
    • Most BI tools work this way
    • Harder to adapt these tools to event-driven models
    Event driven
    • System takes action based on data, e.g. alerts, rule engines
    • May not have (or need) an end user interface
    • Need understanding of decision & action process for this model
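
The two usage models on slide 10 can be contrasted in a minimal Python sketch. Everything named here (`StockEvent`, `Inventory`, `low_stock_rule`, the threshold of 5) is a hypothetical illustration, not something taken from the presentation. The store answers demand-driven queries when a user asks, while a subscribed rule acts on each event with no user interface involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class StockEvent:
    """Hypothetical event: the latest on-hand quantity for one SKU."""
    sku: str
    on_hand: int


class Inventory:
    """Hypothetical low-latency store that serves both usage models."""

    def __init__(self) -> None:
        self._on_hand: Dict[str, int] = {}
        self._rules: List[Callable[[StockEvent], None]] = []

    def subscribe(self, rule: Callable[[StockEvent], None]) -> None:
        # Event driven: register an action to run on every incoming event.
        self._rules.append(rule)

    def apply(self, event: StockEvent) -> None:
        # Store the new value, then let every rule react to it.
        self._on_hand[event.sku] = event.on_hand
        for rule in self._rules:
            rule(event)

    def query(self, sku: str) -> Optional[int]:
        # Demand driven: nothing happens until a user asks.
        return self._on_hand.get(sku)


alerts: List[str] = []


def low_stock_rule(event: StockEvent, threshold: int = 5) -> None:
    # Assumed rule: raise an alert when stock drops below the threshold.
    if event.on_hand < threshold:
        alerts.append(f"low stock: {event.sku} ({event.on_hand} left)")


inventory = Inventory()
inventory.subscribe(low_stock_rule)
inventory.apply(StockEvent("A-100", 12))  # above threshold, no alert
inventory.apply(StockEvent("A-100", 3))   # below threshold, alert raised
```

After the second event, `alerts` holds one message even though nobody queried the store, which is the point of the event-driven model: the decision and action have to be defined up front.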
  • 11. BI Tools Need New Capabilities
    • Embedding BI within applications: UI embedding, full embedding
    • Event-based integration
    • Feeding BI data to applications: services, not SQL, may be desired
    • Custom UI code may be preferable to a BI tool
  • 12. The Data Integration Layer
    • Integration is the most complex element of adding real time data.
    • Inline vs. out of band, demand vs. event-driven BI usage create different DI requirements.
    • You may not have exactly the same metrics, attributes or data extract logic.
    • Don’t count on replacing the ETL batch; more likely you are augmenting it.
    • You probably need to add new DI technologies to your portfolio.
    • Batch performance design isn’t like real time design.
  • 13. Speeding Up Data Integration
    Methods, ordered from hourly or longer latency to immediate:
    • Single batch
    • Frequent batch
    • Mini-batch
    • Continuous load
    • Streaming
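
As a rough illustration of the mini-batch step on slide 13, here is a minimal Python sketch. The class `MiniBatchLoader`, its parameters, and the `load` callback are assumptions made for this example and do not come from the presentation or any specific tool. Rows are buffered and handed to the loader once the buffer reaches a row limit or an age limit.

```python
import time
from typing import Callable, List, Optional


class MiniBatchLoader:
    """Hypothetical buffer that loads rows in small, frequent batches."""

    def __init__(
        self,
        load: Callable[[List[dict]], None],
        max_rows: int = 500,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load              # assumed callback that writes one batch
        self._max_rows = max_rows
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[dict] = []
        self._oldest: Optional[float] = None

    def add(self, row: dict) -> None:
        if not self._buffer:
            self._oldest = self._clock()  # arrival time of the oldest buffered row
        self._buffer.append(row)
        too_big = len(self._buffer) >= self._max_rows
        too_old = self._clock() - self._oldest >= self._max_age
        # Age is only checked when a row arrives; a real loader would
        # also need a timer to flush an idle buffer.
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._load(self._buffer)
            self._buffer = []
            self._oldest = None


loaded: List[List[dict]] = []
loader = MiniBatchLoader(load=loaded.append, max_rows=3)
for i in range(7):
    loader.add({"order_id": i})
loader.flush()  # push the final partial batch
```

With `max_rows=3`, the seven rows arrive in the target as batches of three, three, and one. Setting `max_rows=1` moves this toward the continuous end of the spectrum, while a very large limit with a scheduled `flush()` behaves like a frequent batch.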
  • 14. The Platform Layer: Data and Database
    • Schemas will need changes.
    • You don’t need to convert the entire database to a real time schema.
    • One schema or two?
    • Event-driven BI creates different query patterns and workloads.
    • Configuration and tuning may be different than what you are used to with traditional BI.
    • Application developers want services or ORMs, not SQL.
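
One possible reading of "you don't need to convert the entire database" is to keep the batch-loaded history as it is and add a small table for current data, with a view presenting both to queries. The Python sketch below uses the standard library `sqlite3` module to show that idea. The table and view names (`sales_history`, `sales_today`, `sales_all`) and the roll-over step are assumptions for illustration, not a design taken from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical batch-loaded history table (unchanged by the real time work)
    CREATE TABLE sales_history (sale_id INTEGER PRIMARY KEY, sku TEXT, amount REAL);
    -- Hypothetical small table that receives low latency loads
    CREATE TABLE sales_today (sale_id INTEGER PRIMARY KEY, sku TEXT, amount REAL);
    -- Queries that need current data read the combined view
    CREATE VIEW sales_all AS
        SELECT sale_id, sku, amount FROM sales_history
        UNION ALL
        SELECT sale_id, sku, amount FROM sales_today;
    """
)

conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [(1, "A-100", 10.0), (2, "B-200", 5.0)],
)
conn.execute("INSERT INTO sales_today VALUES (?, ?, ?)", (3, "A-100", 2.5))


def total_for(sku: str) -> float:
    """Sum sales for one SKU across history and current data."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales_all WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0]


def roll_over() -> None:
    """Move current rows into history in one transaction (e.g. nightly)."""
    with conn:
        conn.execute("INSERT INTO sales_history SELECT * FROM sales_today")
        conn.execute("DELETE FROM sales_today")


before = total_for("A-100")
roll_over()
after = total_for("A-100")
```

The totals before and after the roll-over match, so reports reading the view do not change when rows move from the current table to history. Whether one schema or two is the better choice depends on the workload questions raised in the following slides.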
  • 15. Different Platform Workloads
    Three workloads: Data loading + Normal BI + Real time BI = complications
    [Architecture diagram]
    • Data consumers (delivery): Databases, Dashboards, OLAP, Productivity, BAM/BPM, Reporting, Analytics, Applications
    • DW platforms: Warehouse Database, Mart, ODS, Content Store
    • Data integration: ETL, EDR, EII
    • Source environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications
  • 16. Development, Maintenance & Operations
    • Real time decisions on real time data mean data quality plays a larger role, and it’s harder to address.
    • Warehouse availability becomes much more important to the business, and it isn’t just the database; it’s everything.
    • Performance and meeting strict BI SLAs will rise in importance since you are now tied in to business operations.
  • 17. A Prescription for Getting Started
    1. Start with a decision process
    2. Define data needs for the process
    3. Ensure that data is available at the right latency
    4. Determine appropriate data integration technologies
    5. Design and initiate upstream work
    6. Build
  • 18. Thanks
  • 19. CC Image Attributions
    Thanks to the people who supplied the Creative Commons licensed images used in this presentation:
    • Divers - http://flickr.com/photos/raveller/
    • Fast dog - http://flickr.com/photos/marinacvinhal/379111290/
    • Febo - http://flickr.com/photos/igor/419425754/
    • Subway - http://flickr.com/photos/neilsphotoalbum/504517855/
    • Cadillac ranch - http://flickr.com/photos/whatknot/179655095/
  • 20. About the Presenter
    Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.