Data Warehouse Architecture Best Practices
 

  • Evan Silver: past: Oracle, 6+ years: tech support, sales, education, consulting (implementation side: design, architect, technical leads). Present: Bianix, since May 2000. Why BI for me: no commitment to one technology, front and back room, got to see some of the world.
  • A. Many have heard various people claim that tools are the solution for data warehousing problems. Oracle: Oracle8i's data warehousing functionality, Designer 2000 (reverse engineering), Express/Discoverer, Warehouse Builder. Sagent: Sagent Data Loader, Sagent Data Access Server. Informatica: PowerCenter, PowerMart. B. Ralph Kimball opened his course in Seattle by stating that he was there not to talk about technology (there is a time and place in the project for it), but rather to talk about data warehousing.
  • DW users typically consider the higher slices of the pyramid, although the rules are changing on both the business and technology sides: some business users up above now want to look at the detail below, and technology allows this rather than offering only pre-aggregated data in the DW.
  • A. Business value: 1. The way businesses are run today is changing. 2. Annual reports were adequate in the past. 3. Businesses are insisting on broader, more detailed data and more meaningful views: understanding their customers' behaviour (CRM), e.g. British Airways going down to ticket lines, or individual button clicks on web sites (clickstream). 4. Past: volumes; today: profitability (manufacturing: where can I cut costs?). B. Business requirements: 1. We become translators, learning about our customers' needs and wants; we become listeners (listening skills are a prerequisite). C. End user acceptance: 1. Success for BI professionals is derived from DW usage. 2. An effective way to ensure usage is simplicity. Former clients had big plans for moving from mainframe/green-screen reporting to DW analytics, but the DW was not simple enough or there was not enough training. Iocc/Celestica/BMO:
  • 1. End users come hard-wired with the basic business logic; they know what they would like to see. 2. This is an opportunity to let end users become the warehouse designer for a while: let them design what is needed. 3. Let them give you their wish list. Top-line report (Kimball): represents the undiscounted value of the products shipped to the customer (revenue). Bottom-line report (Kimball): represents the money left over after discounts, allowances, and costs (profit). These reports exist on the profit and loss statement (P&L).
  • A. Our data warehouse design shouldn't have built-in preferences for the business questions that happen to be asked this month, forcing us to change our data schema next month to answer a new business question. B. Take human-size proportions when tackling the enterprise solution: we want separate subject areas implemented, with each data mart having its own atomic data, rather than a galactic design. The collection of data marts will function together, add up to an enterprise data warehouse, and be able to "drill across" to assemble integrated views of the enterprise. C. We want to be able to add new numerical measurements and new dimensions to our data environment without modifying our database schemas, and we don't want to duplicate numerical measurements (the largest data). D. We don't want to interrupt usage of the DW to add new areas.
  • Simplicity: means that it works "like it is supposed to". Complexity: end users won't use something that is complex. Intuitive/obvious: use templates that can be invoked by the click of a button. Visible/memorable: you can turn your back on the screen, attend a meeting, and describe what you've seen on the screen. Note: understand the needs of users and classify those needs into groups/responsibilities: executives get a dashboard (simple); analysts get slice-and-dice reporting (advanced).
  • Technology: data warehousing is on a steep learning curve, and query response times are dropping very rapidly as we learn how to use indexes, aggregations, and new query technology (DBMSs designed specifically for DW, e.g. 9i, Red Brick, and other analytical servers). Not technology: the data warehouse manager also needs to re-evaluate the solution if response times are not nearly instantaneous (business rules change, and revisiting the design is a good thing). The Transition to Production phase of a DW project is where we start tuning query response times. However, as we all know, there is no such thing as an "acceptable" response time measured in hours or even minutes.
  • why are the 2 worlds so different: - users are different - data content is different - data structures are different - hardware/software is different - administration is different
  • Because there is no redundancy in the data, a transaction that changes data only needs to touch one place at the database level. This has resulted in huge improvements in transaction processing since the early 1980s.
  • A. All tables look symmetric; there is no way to tell which is the largest or most important table, which table contains numerical data, or which holds static or near-static descriptors of objects. B. People have a hard time visualizing the design and keeping it in their heads. C. There are a large number of possible connection paths between tables (foreign keys).
  • A. the environment is more predictable which allows the database and tools to make strong assumptions about the data to aid in performance B. constrain the dimensions then attach the big table.
  • A. ODS: 1. Business-user visible (a queryable source of data for users). 2. The ODS has a split personality: a. It provides a real-time, operational role. Inmon: a subject-oriented, integrated (across different operational systems/subject areas), volatile, current-value data store containing corporate detailed data. E.g. operational reporting from various areas; banks integrate current balances and recent history (the past couple of months) for loans, checking, and savings accounts under one customer number (a one-to-one customer relationship for customer service). Design: flat tables with no relationships defined, holding current operational, transaction-level data. Problem: a single system can't be structured to meet both operational and DSS needs with good performance. b. Reporting or DSS capabilities. Changes to the ODS: no longer volatile, but rather added to (history). Kimball believes that with today's hardware/software technologies we can relocate the ODS to the front edge of the DW system (data mining tools). Design: star schema. B. OLAP: OLAP vendors' data technology is non-relational and is almost always based on an explicit multidimensional cube of data (MDDBs). OLAP vendors' designs are very similar to the data designs adopted for data warehousing, but on a smaller scale, more like a data mart.
  • One of the most widely debated issues in data warehousing is how to plan the warehouse construction. The monolithic approach (build the EDW all at once) is too daunting; "15-minute" data marts in a box defeat the overall goal of an EDW. What can we do? Take an incremental approach: bite-size pieces, all tied together through a common architecture. There are other ways to build an enterprise DW, even one sanctioned by Oracle, but the incremental DWM has been proven through many successful projects.
  • The Data Warehouse Bus Architecture is the key to Kimball's enterprise DW architecture vision. It provides the glue that holds the incremental data marts together and ensures the whole is more than the sum of the parts. The Oracle DWM (incremental) and Kimball's EDW vision tell us what to do (strategy, definition, analysis, design, build, transition to production, discovery); the Data Warehouse Bus Architecture shows us how to do it.
  • In the architecture phase, before implementing specific data marts, the goals are to produce a master suite of conformed dimensions and to standardize the definitions of facts. Most conformed dimensions will naturally be defined at the most granular level possible; e.g. the grain of a customer dimension will be the individual customer (surrogate keys). Take the pledge of using these master conformed dimensions: this is as much a political decision as a technical one, and must be supported at the highest executive levels. We need conformed facts when we drill across data marts in a single report; e.g. when reporting revenue and sales for a particular region in Sagent, revenue and sales have to mean the same thing in both data marts and be reportable along the same time periods and geographies.
  • A conformed dimension is a dimension that means the same thing with every possible fact table to which it can be joined.
  • The term "federated data warehouse" is one that has been coined by middleware vendors. A better term might be "federated database", as there are good reasons to think that a "federated data warehouse" is not a data warehouse at all. Does anyone know any such reasons? It is a bunch of databases (from different vendors) joined together for querying, e.g. the EDA/SQL and I-Way (transparent gateway) products from Information Builders.

Data Warehouse Architecture Best Practices: Presentation Transcript

  • December 5, 2005 Speaker: R. Michael Pickering President, Cohesion Systems Consulting Inc. Data Warehouse Architecture Best Practices
  • Agenda
    • Introductions
    • Business Intelligence Background
    • Architecture Best Practices
    • Questions & Answers
  • Introductions Data Warehouse Architecture Best Practices
  • Presenter Biography
    • R. Michael Pickering
      • President and Chief Architect,
      • Cohesion Systems Consulting Inc.
        • previously, Managing Consultant, BI&W, Oracle Consulting (Canada)
        • before that, Red Brick Systems, Inc.
      • over 8 years DW experience
        • Manulife Reinsurance, Bell Canada, USDA, Kraft Foods, LCBO, Telecom Argentina, Nortel Networks, Procter & Gamble, Bayer, Syncrude, OMoHLTC…
      • Mr. Pickering has had DW articles published in The Handbook of Data Management
  • Cohesion Systems Consulting
    • Provides DW and BI services, specializing in:
      • Architecture & Implementation Consulting
      • Project Management
      • Databases, Appliances & Emerging Technology
      • Training & Mentoring
    • Since inception in 2000, clients have included Enbridge, CIBC, The Bank of New York, Loyalty Management Group, Canada Post Borderfree, Katz Group
  • Audience Survey
    • By a show of hands, please indicate your experience with:
      • normalization
      • dimensional modeling
      • operational data store
      • data consolidation
      • Extract Transform Load (ETL)
      • metadata architecture
      • DW appliances
  • Business Intelligence Background Data Warehouse Architecture Best Practices
  • What is Business Intelligence?
    • A Data Warehouse is usually one component of an overall business intelligence solution
    • IT people may be tempted to think in terms of products and technologies BUT...
  • Overarching Goal
    • The overarching goal of business intelligence is to provide the information necessary to MANAGE a business
    • This means providing information in support of management decision making, which is why BI is also called “Decision Support”
  • BI is about “Data Abstraction”
        • wisdom
        • knowledge
        • information
        • data
    • audience for a data warehouse typically considers higher slices of data abstraction pyramid
    • lowest level of pyramid is too detailed & unwieldy
  • It’s Not Technology
    • Business Intelligence is about delivering business value
      • provide tangible benefit by answering important questions that can help the business to achieve its strategic focus
        • Improving profitability
          • Who are our five most profitable clients?
          • What are our least profitable products?
        • Reducing cost
          • Who are our lowest cost suppliers?
          • Which materials incur highest spoilage costs?
        • Improving customer satisfaction
          • What factors may lead to lost customers?
  • Business of BI
    • In some cases, legislation such as Sarbanes-Oxley or Basel II makes some kind of BI fundamental to doing business
    • Many leading companies use BI to achieve competitive advantage
      • E.g. Walmart, Dell, Amazon.com, Kraft, American Express, etc…
  • Data Warehouse Architecture
    • architecture is about delivering an elegant solution that meets the solution requirements
      • this means really understanding the problem
    • DW architecture is part art, part science
  • Good Architecture
    • ‘It’s not easy to describe a good design, but I’ll know it when I see it’
  • BI Architecture Requirements
    • must recognize change as a constant
    • take incremental development approach
    • existing applications must continue to work
    • need to allow more data and new types of data to be added
  • End User Acceptance
    • understandability
      • understandability is in the eyes of the beholder
      • want to hide the complexity
      • try to make it:
        • intuitive, obvious
        • visible, memorable
  • End User Acceptance
    • performance
      • don’t want to interrupt the thinking process
      • provide one click, instantaneous access
      • warehouse must be available, “production” system
  • Architecture Best Practices Data Warehouse Architecture Best Practices
  • High Level Architecture
    • remember the different “worlds”
      • on-line transaction processing (OLTP)
      • business intelligence systems (BIS)
    • users are different
    • data content is different
    • data structures are different
    • architecture & methodology must be different
  • Two Different Worlds
    • On-Line Transaction Processing
      • Entity Relationship Data Model
        • created in the 1970’s to address performance issues with early database implementations
        • normalized to most efficiently get data in
        • divides the data into many discrete entities
        • many relationships between these entities
        • this approach was documented by C.J. Date in An Introduction to Database Systems
  • Two Different Worlds
    • Business Intelligence Systems
      • Dimensional Data Model
        • also called star schema
        • designed to easily get information out
        • fewer relationships than ERD, the only table with multiple joins connecting to other tables is the central table
        • developed in 1960’s by data service providers, formalized by Ralph Kimball in The Data Warehouse Toolkit
  • Entity Relationship Disadvantages
    • all tables look the same
    • people can’t visualize/remember diagrams
    • software can’t navigate as schema becomes too complex
    • business processes mixed together
    • many artificial keys created
  • Dimensional Model Advantages
    • simplicity
    • humans can navigate and remember
    • software can navigate deterministically
    • business process explicitly separated (Data Mart)
    • not so many keys (keys = # of attendant tables)
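A minimal sketch of the star-schema idea described above: a central fact table joined to small dimension tables by surrogate keys, queried by constraining the dimensions and then aggregating the facts. All table contents and column names here are invented for illustration.

```python
# Dimension tables: small, descriptive, keyed by surrogate keys.
date_dim = {1: {"month": "2005-11"}, 2: {"month": "2005-12"}}
product_dim = {10: {"name": "Widget"}, 11: {"name": "Gadget"}}

# Central fact table: grain is one row per product per day.
sales_fact = [
    {"date_key": 1, "product_key": 10, "revenue": 100.0},
    {"date_key": 2, "product_key": 10, "revenue": 150.0},
    {"date_key": 2, "product_key": 11, "revenue": 75.0},
]

def revenue_by_month(facts):
    """Constrain/look up the date dimension, then aggregate the fact table."""
    totals = {}
    for row in facts:
        month = date_dim[row["date_key"]]["month"]
        totals[month] = totals.get(month, 0.0) + row["revenue"]
    return totals

print(revenue_by_month(sales_fact))  # -> {'2005-11': 100.0, '2005-12': 225.0}
```

The only table with multiple joins is the central fact table, which is what makes the schema easy for both people and software to navigate.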
  • Best Practice #1
    • Use a data model that is optimized for information retrieval
      • dimensional model
      • denormalized
      • hybrid approach
  • Data Acquisition Processes
    • Extract Transform Load (ETL)
      • the process of unloading or copying data from the source systems, transforming it into the format and data model required in the BI environment, and loading it to the DW
      • also, a software development tool for building ETL processes (an ETL tool)
      • many production DWs use COBOL or other general-purpose programming languages to implement ETL
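The three ETL stages described above can be sketched in a few lines; the source rows, column names, and in-memory "warehouse" target below are hypothetical stand-ins for real systems.

```python
# Hypothetical source data, as it might arrive from an operational system.
source_rows = [
    {"cust": " Alice ", "amt": "100.50"},
    {"cust": "Bob", "amt": "20"},
]

def extract():
    # In practice: unload or copy rows from the source system.
    return list(source_rows)

def transform(rows):
    # Reshape into the format and data model required by the DW.
    return [{"customer": r["cust"].strip(), "amount": float(r["amt"])}
            for r in rows]

def load(rows, target):
    # In practice: bulk-load into the warehouse tables.
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

An ETL tool generates or manages equivalent logic; the structure (extract, then transform, then load) is the same either way.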
  • Data Quality Assurance
    • data cleansing
      • the process of validating and enriching the data as it is published to the DW
      • also, a software development tool for building data cleansing processes (a data cleansing tool)
      • many production DWs have only very rudimentary data quality assurance processes
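As a sketch of the validation step described above, the function below diverts rows that fail simple checks into a reject set instead of publishing them to the DW. The rules and field names are invented examples, not a real tool's API.

```python
def validate(row):
    """Return a list of data-quality errors for one candidate row."""
    errors = []
    if not row.get("customer"):
        errors.append("missing customer")
    if row.get("amount", 0) < 0:
        errors.append("negative amount")
    return errors

def cleanse(rows):
    """Split rows into accepted (publishable) and rejected (with reasons)."""
    accepted, rejected = [], []
    for row in rows:
        errs = validate(row)
        if errs:
            rejected.append({"row": row, "errors": errs})
        else:
            accepted.append(row)
    return accepted, rejected
```

Keeping the rejected rows with their error reasons, rather than silently dropping them, is what lets a team measure and improve data quality over time.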
  • Data Acquisition & Cleansing
    • getting data loaded efficiently and correctly is critical to the success of your DW
      • implementation of data acquisition & cleansing processes represents from 50 to 80% of effort on typical DW projects
      • inaccurate data content can be ‘the kiss of death’ for user acceptance
  • Best Practice #2
    • Carefully design the data acquisition and cleansing processes for your DW
      • Ensure the data is processed efficiently and accurately
      • Consider acquiring ETL and Data Cleansing tools
      • Use them well!
  • Data Model
    • Already discussed the benefits of a dimensional model
    • No matter whether dimensional modeling or any other design approach is used, the data model must be documented
  • Documenting the Data Model
    • The best practice is to use some kind of data modeling tool
      • CA ERwin
      • Sybase PowerDesigner
      • Oracle Designer
      • IBM Rational Rose
      • Etc.
    • Different tools support different modeling notations, but they are more or less equivalent anyway
    • Most tools allow sharing of their metadata with an ETL tool
  • Data Model Standards
    • data model standards appropriate for the environment and tools chosen in your data warehouse should be adopted
    • considerations should be given to data access tool(s) and integration with overall enterprise standards
    • standards must be documented and enforced within the DW team
      • someone must ‘own’ the data model
    • to ensure a quality data model, all changes should be reviewed thru some formal process
  • Data Model Metadata
    • Business definitions should be recorded for every field (unless they are technical fields only)
    • Domain of data should be recorded
    • Sample values should be included
    • As more metadata is populated into the modeling tool it becomes increasingly important to be able to share this data across ETL and Data Access tools
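The field-level metadata the slide calls for (business definition, domain, sample values) can be pictured as a simple structure like the one below; the field names and descriptions are invented, and a modeling tool would store the same information in its own repository.

```python
# Hypothetical field-level metadata for two warehouse columns.
field_metadata = {
    "customer_key": {
        "definition": "Surrogate key for the conformed customer dimension",
        "domain": "positive integer, assigned by the ETL process",
        "samples": [1001, 1002],
    },
    "revenue": {
        "definition": "Undiscounted value of products shipped (top line)",
        "domain": "decimal currency amount",
        "samples": [100.50, 75.00],
    },
}

def missing_definitions(meta):
    """List fields whose business definition has not been recorded."""
    return [field for field, m in meta.items() if not m.get("definition")]
```

A check like `missing_definitions` is the kind of rule a team can enforce before a model change is approved.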
  • Metadata Architecture
    • The strategy for sharing data model and other metadata should be formalized and documented
    • Metadata management tools should be considered & the overall metadata architecture should be carefully planned
  • Best Practice #3
    • Design a metadata architecture that allows sharing of metadata between components of your DW
      • consider metadata standards such as OMG’s Common Warehouse Metamodel (CWM)
  • Alternative Architecture Approaches
    • Bill Inmon: “Corporate Information Factory”
    • Hub and Spoke philosophy
    • “JBOC” – just a bunch of cubes
    • Let it evolve naturally
  • What We Want (Architectural Principle)
    • In most cases, business and IT agree that the data warehouse should provide a ‘single version of the truth’
    • Any approach that can result in disparate data marts or cubes is undesirable
    • This is known as data silos or…
  • Enterprise DW Architecture
    • how to design an enterprise data warehouse and ensure a ‘single version of the truth’?
    • according to Kimball:
      • start with an overall data architecture phase
      • use “Data Warehouse Bus” design to integrate multiple data marts
      • use incremental approach by building one data mart at a time
  • Data Warehouse Bus Architecture
    • named for the bus in a computer
      • standard interface that allows you to plug in cdrom, disk drive, etc.
      • these peripherals work together smoothly
    • provides framework for data marts to fit together
    • allows separate data marts to be implemented by different groups, even at different times
  • Data Mart Definition
    • data mart is a complete subset of the overall data warehouse
      • a single business process OR
      • a group of related business processes
    • think of a data mart as a collection of related fact tables sharing conformed dimensions, aka a ‘fact constellation’
  • Designing The DW Bus
    • determine which dimensions will be shared across multiple data marts
    • conform the shared dimensions
        • produce a master suite of shared dimensions
    • determine which facts will be shared across data marts
    • conform the facts
        • standardize the definitions of facts
  • Dimension Granularity
    • conformed dimensions will usually be granular
      • makes it easy to integrate with various base level fact tables
      • easy to extend fact table by adding new facts
      • no need to drop or reload fact tables, and no keys have to be changed
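The granularity point above can be illustrated as follows: because the date dimension is kept at daily grain, a new base-level fact table plugs into the same dimension as the existing facts, with no keys changed and nothing reloaded. All data below is made up.

```python
# A daily-grain conformed date dimension.
date_dim = {20051205: {"day": "2005-12-05", "month": "2005-12"}}

# Existing fact table.
sales_fact = [{"date_key": 20051205, "revenue": 500.0}]

# A new fact table added later reuses the same granular dimension keys.
shipment_fact = [{"date_key": 20051205, "units_shipped": 42}]

def join_to_date(fact_rows):
    """Attach the date dimension's attributes to each fact row."""
    return [dict(row, **date_dim[row["date_key"]]) for row in fact_rows]
```

Had the dimension been pre-aggregated to, say, monthly grain, the daily shipment facts could not have joined to it without redesign.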
  • Conforming Dimensions
    • by adhering to standards, the separate data marts can be plugged together
      • e.g. customer, product, time
    • they can even share data usefully, for example in a drill across report
    • ensures reports or queries from different data marts share the same context
  • Conforming Dimensions (cont’d)
    • accomplish this by adding any dimension attribute(s) needed in any data mart(s) to the standard dimension definition
      • attributes not needed everywhere can always be ignored
    • typically harder to determine how to load conformed dimensions than to design them initially
      • need a single integrated ETL process
      • what is the SOR for each attribute?
      • how do we deal with attributes for which there is more than one possible SOR?
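A drill-across report of the kind described above can be sketched as: query each data mart separately, then merge the result sets on conformed dimension attributes (here, month). The mart names and figures are invented.

```python
# Separate per-mart query results, keyed by the conformed month attribute.
sales_mart = {"2005-11": 100.0, "2005-12": 225.0}   # revenue by month
returns_mart = {"2005-12": 25.0}                     # returns by month

def drill_across(*marts):
    """Merge per-mart results on the shared (conformed) dimension value."""
    months = sorted(set().union(*(m.keys() for m in marts)))
    return {m: tuple(mart.get(m, 0.0) for mart in marts) for m in months}

report = drill_across(sales_mart, returns_mart)
# Each report row shares the same month context only because the
# dimension is conformed across both marts.
```

If the two marts had defined "month" differently (fiscal vs. calendar, say), this merge would silently produce wrong numbers, which is exactly why conformance must come first.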
  • Conforming Facts
    • in an enterprise, some metrics may not have the same generally accepted definition across all business units
    • conforming facts is generally a bigger design challenge than conforming dimensions
      • why?
  • Conforming Facts - Benefits
    • ensures the constituent data marts can as clearly as possible represent fact data expressed on the same basis using consistent definitions
    • ensures reports or queries from different data marts share consistent content
    • success of an Enterprise DW hinges on successfully conformed facts
      • any perceived inconsistencies in fact definitions across data marts will generally be considered to be a DW bug or data problem by users
      • if users don’t have full confidence in data quality they may stop using the DW
  • Data Consolidation
    • a current trend in BI/DW is ‘data consolidation’
    • from a software vendor perspective, it is tempting to simplify this:
      • ‘we can keep all the tables for all your disparate applications in one physical database’
  • Data Integration
    • To truly achieve ‘a single version of the truth’, must do more than simply consolidating application databases
    • Must integrate data models and establish common terms of reference
  • Best Practice #4
    • Take an approach that consolidates data into ‘a single version of the truth’
      • Data Warehouse Bus
        • conformed dimensions & facts
      • OR?
  • Operational Data Store (ODS)
    • a single point of integration for disparate operational systems
    • contains integrated data at the most detailed level (transactional)
    • may be loaded in ‘near real time’ or periodically
    • can be used for centralized operational reporting
  • Role of an ODS in DW Architecture
    • In the case where an ODS is a necessary component of the overall DW, it should be carefully integrated into the overall architecture
    • Can also be used for:
      • Staging area
      • Master/reference data management
      • Etc…
  • ODS Data Model
    • Not clear if any design approach for an ODS data model has emerged as a best practice
      • normalized
      • dimensional
      • denormalized/hybrid
      • any suggestions?
  • Best Practice #5
    • Consider implementing an ODS only when information retrieval requirements are near the bottom of the data abstraction pyramid and/or when there are multiple operational sources that need to be accessed
      • Must ensure that the data model is integrated, not just consolidated
      • May consider 3NF data model
      • Avoid at all costs a ‘data dumping ground’
  • Capacity Planning
    • DW workloads are typically very demanding, especially for I/O capacity
    • Successful implementations tend to grow very quickly, both in number of users and data volume
    • Rules of thumb do exist for sizing the hardware platform to provide adequate initial performance
      • typically based on the estimated ‘raw’ data size of the proposed database, e.g. 100-150 GB per modern CPU
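The rule of thumb above reduces to simple arithmetic; the sketch below uses the midpoint of the 100-150 GB per CPU range and is purely illustrative, not a substitute for a real capacity plan.

```python
import math

def cpus_needed(raw_gb, gb_per_cpu=125):
    """Back-of-envelope CPU count from estimated raw data size,
    using roughly 100-150 GB of raw data per modern CPU."""
    return max(1, math.ceil(raw_gb / gb_per_cpu))

print(cpus_needed(600))  # 600 GB raw at 125 GB/CPU -> 5 CPUs
```

A real plan would also account for user concurrency, growth, and I/O bandwidth, which this one-line estimate ignores.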
  • SMP Server Scale Up
    • Scaling performance within a single SMP server is referred to as ‘scale up’
    • Database benchmarks suggest Windows scalability is near that of Linux
    • IBM claims near-linear scalability for Linux (on commodity hardware) up to about 4 processors
      • Probably not cost effective to scale up Linux much beyond 4 processors
    • IBM claims near-linear scalability for AIX on POWER5 up to about 8 processors
  • Scale Out
    • There is an increasing trend in IT to ‘scale out’ processing capacity by deploying many small, commodity servers rather than a single large SMP system
    • This strategy tends to work well for relatively simple applications such as network or web servers
    • For very complex workloads such as a data warehouse, this strategy is much more difficult to effectively implement
      • Especially so for the database server itself
  • Scale Up vs. Scale Out
    • To obtain the total number of processors required for the estimated DW workload, must plan either to scale up or scale out
    • Both options are viable but, all other things being equal, scaling up is less disruptive to end users and requires less work to implement
      • scaling up can offer lower hardware investment, if practical
      • however, network bandwidth or latency issues can limit effectiveness of parallelism
  • Best Practice #6
    • Create a capacity plan for your BI application & monitor it carefully
    • Consider future additional performance demands
      • Establish standard performance benchmark queries and regularly run them
      • Implement capacity monitoring tools
      • Build scalability into your architecture
      • May need to allow for scaling both up and out!
  • Open Source Affordability
    • Another emerging trend in IT generally is to utilize Open Source software running on commodity hardware
      • this is expected to offer lower total cost of ownership
      • certainly, GNU/Linux and other Open Source initiatives do provide very good functionality and quality for minimal cost
    • This trend also applies to BI & DW:
      • most traditional RDBMSs are now supported on Linux
      • however, open source RDBMSs lag behind on providing good performance for DW queries
  • DW Appliances
    • DW appliances, consisting of packaged solutions providing all required software and hardware, are beginning to offer very promising price/performance
    • production experience is limited so far, so this is not yet a ‘best practice’
  • Q & A Data Warehouse Architecture Best Practices
  • The Modern Art of Data Abstraction – Cohesion Systems Consulting Inc.