DWH life cycle

  • Data Warehouse Life Cycle
  • Data Warehouse Defined
    “A data warehouse is a collection of corporate information, derived directly from operational systems and some external data sources. Its specific purpose is to support business decisions, not business operations.”
  • Characteristics of a DW
      - Subject-oriented data: collects all data for a subject, from different sources
      - Read-only requests: loaded during off-hours, read-only during day hours
      - Interactive features, ad-hoc query: flexible design to handle spontaneous user queries
      - Pre-aggregated data: to improve runtime performance
      - Highly denormalized data structures: dimension tables with redundant columns
  • Components of a Data Warehouse
      - Source systems
      - Data staging area: storage (flat files, RDBMS), processing, no user query services
      - DWH servers: Data Mart 1, Data Mart 2 (dimensional, conforming to the DW bus)
      - End-user data access: query tools, report writers, mining tools
  • Data Modeling
  • Data Modeling
      - What is a data model? A data model is an abstraction of some aspect of the real world (a system).
      - Why a data model?
          - It helps to visualise the business.
          - A model is a means of communication.
          - Models help elicit and document requirements.
          - Models reduce the cost of change.
          - The model is the core of the DW architecture on which the DW will be implemented.
  • What do we want to do with the data?
    The model depends on the kind of data analysis we want to do. Different data analysis techniques:
      - Query and reporting: display query results
      - Multidimensional analysis: analyse data content by looking at it from different perspectives
      - Data mining: discover patterns and clustering attributes in data
  • Impact of Data Analysis Techniques on Data Modeling
      - Query and reporting
          - Normalized data model
          - Select associated data elements, summarize and group by category, present results
          - Direct table scan
          - An ER model, normalized or denormalized as appropriate, fits this kind of analysis
  • Impact of Data Analysis Techniques on Data Modeling
      - Multidimensional analysis
          - Fast and easy access to data
          - Any number of analysis dimensions in any combination
          - An ER model would mean many joins
          - A dimensional model is appropriate
  • Levels of Modeling
      - Conceptual modeling: describe data requirements from a business point of view, without technical details
      - Logical modeling: refine conceptual models; data-structure oriented, platform independent
      - Physical modeling: detailed specification of what is physically implemented using a specific technology
  • Conceptual Model
      - A conceptual model shows data through business eyes:
          - all entities that have business meaning
          - important relationships
          - a few significant attributes in the entities
          - a few identifiers or candidate keys
  • Logical Model
      - Replaces many-to-many relationships with associative entities.
      - Defines a full population of entity attributes.
      - May use non-physical entities for domains and sub-types.
      - Establishes entity identifiers.
      - Has no specifics for any RDBMS or configuration.
  • Physical Model
      - A physical data model may include:
          - referential integrity
          - indexes
          - views
          - alternate keys and other constraints
          - tablespaces and physical storage objects
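To make the physical-model items concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is an assumption chosen only because it needs no server; the table, column, index and view names are hypothetical, and tablespaces are left out because SQLite has no equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE dim_store (
    store_key   INTEGER PRIMARY KEY,                 -- surrogate primary key
    store_code  TEXT NOT NULL UNIQUE,                -- alternate key
    region      TEXT NOT NULL
);
CREATE TABLE sales_fact (
    store_key   INTEGER NOT NULL REFERENCES dim_store(store_key),  -- referential integrity
    day_key     INTEGER NOT NULL,
    amount      REAL NOT NULL CHECK (amount >= 0),                  -- other constraint
    PRIMARY KEY (store_key, day_key)
);
CREATE INDEX idx_sales_day ON sales_fact(day_key);                  -- index
CREATE VIEW sales_by_region AS                                       -- view
    SELECT s.region, SUM(f.amount) AS total
    FROM sales_fact f JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.region;
""")

conn.execute("INSERT INTO dim_store VALUES (1, 'S001', 'North')")
conn.execute("INSERT INTO sales_fact VALUES (1, 20240101, 150.0)")

# A fact row that points at a non-existent store is rejected by the FK constraint.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 20240101, 10.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(fk_rejected)                                             # True
print(conn.execute("SELECT * FROM sales_by_region").fetchall())  # [('North', 150.0)]
```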
  • What needs to be modeled during a data warehouse project?
      - Staging area: YES! (multiple data models may be required)
      - ODS: YES!
      - Data warehouse / data mart: YES!
  • Data Modeling - Techniques
      - Modeling techniques:
          - E-R modeling
          - Dimensional modeling
  • Implementation and Modeling Styles
      - Modeling versus implementation
          - Modeling: describe what should be built, to non-technical folks
          - Implementation: describe what is actually built, to technical folks
  • Implementation and Modeling Styles
      - Relational modeling
          - Use for implementation
          - Difficult for non-technical folks to understand
      - Dimensional modeling
          - Use for modeling during the analysis and design phases
          - Can be implemented using other modeling styles, e.g. object-oriented, relational
  • Limitations of E-R Modeling
      - Poor performance for analytical queries
      - E-R models tend to be very complex and difficult to navigate
  • Dimensional ModelingDimensional Modeling  Dimensional modeling uses three basicDimensional modeling uses three basic concepts : measures, facts, dimensions.concepts : measures, facts, dimensions.  Is powerful in representing the requirementsIs powerful in representing the requirements of the business user in the context ofof the business user in the context of database tables.database tables.  Focuses on numeric data, such as valuesFocuses on numeric data, such as values counts, weights, balances and occurences.counts, weights, balances and occurences.
  • Dimensional Modeling
      - Must identify:
          - the business process to be supported
          - the grain (level of detail)
          - the dimensions
          - the facts
  • Conventions Used in Dimensional Modeling
      - Facts
      - Measures (variables)
      - Dimensions
      - Dimension members
      - Dimension hierarchies
  • FactsFacts  A fact is a collection of related data items,A fact is a collection of related data items, consisting of measures and context data.consisting of measures and context data.  Each fact typically represents a business item,Each fact typically represents a business item, a business transaction, or an event that can bea business transaction, or an event that can be used in analyzing the business or businessused in analyzing the business or business process.process.  Facts are measured, “continuously valued”,Facts are measured, “continuously valued”, rapidly changing information. Can berapidly changing information. Can be calculated and/or derived.calculated and/or derived.
  • Fact Table
      - A table used to store business information (measures) that can be used in mathematical equations:
          - quantities
          - percentages
          - prices
  • DimensionsDimensions  A dimension is a collection of members orA dimension is a collection of members or units of the same type of views.units of the same type of views.  Dimensions determine the contextualDimensions determine the contextual background for the facts.background for the facts.  Dimensions represent the way businessDimensions represent the way business people talk about the data resulting from apeople talk about the data resulting from a business process, e.g., who, what, when,business process, e.g., who, what, when, where, why, howwhere, why, how
  • Dimension Table
      - A table used to store qualitative data about fact records:
          - who
          - what
          - when
          - where
          - why
  • Dimension data should be
      - verbose, descriptive
      - complete
      - free of misspellings and impossible values
      - indexed
      - equally available
      - documented (metadata explaining the origin and interpretation of each attribute)
  • Dimensional Model
      - Visualise a dimensional model as a CUBE (a hypercube, because there can be more than 3 dimensions).
      - Operations for OLAP:
          - Drill down: move to a more detailed level of data
          - Roll up: move to a more summarized level of data. For both, the navigation path is determined by hierarchies within dimensions.
          - Slice: cut through the cube by fixing one dimension to a single value, so users can focus on a specific perspective
          - Dice: select a sub-cube by restricting two or more dimensions to chosen values (rotating the cube to view the data from another perspective, i.e. changing the dimensions on display, is usually called pivot)
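The OLAP operations above can be sketched over a tiny in-memory cube. Everything here (the FACTS rows, the DIMS tuple, the helper names) is hypothetical. Roll-up and drill-down are shown as grouping by fewer or more levels of the time hierarchy; `dice` follows the common textbook reading of selecting a sub-cube by restricting several dimensions, which is one interpretation rather than the only one in use.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, region, product, sales)
FACTS = [
    ("Q1", "Jan", "North", "Tea",    100),
    ("Q1", "Feb", "North", "Coffee",  80),
    ("Q1", "Mar", "South", "Tea",     60),
    ("Q2", "Apr", "South", "Coffee",  90),
    ("Q2", "May", "North", "Tea",     40),
]
DIMS = ("quarter", "month", "region", "product")

def aggregate(rows, by):
    """Sum sales grouped by the named dimensions (fewer levels = roll up, more = drill down)."""
    idx = [DIMS.index(d) for d in by]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]   # last element is the measure
    return dict(out)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single member."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]

def dice(rows, **members):
    """Dice: keep a sub-cube by restricting several dimensions to sets of members."""
    checks = [(DIMS.index(d), set(v)) for d, v in members.items()]
    return [r for r in rows if all(r[i] in allowed for i, allowed in checks)]

by_quarter = aggregate(FACTS, ["quarter"])           # rolled up to quarter
by_month = aggregate(FACTS, ["quarter", "month"])    # drilled down to month
north = slice_(FACTS, "region", "North")
sub_cube = dice(FACTS, quarter=["Q1"], product=["Tea"])

print(by_quarter)   # {('Q1',): 240, ('Q2',): 130}
```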
  • Drill Down … Roll Up
  • Slice and Dice
  • DimensionsDimensions  Collection of members or units of the sameCollection of members or units of the same type of views.type of views.  determine the contextual background for thedetermine the contextual background for the facts.facts.  the parameters over which we want to performthe parameters over which we want to perform OLAPOLAP (eg.(eg. Time,Time, Location/region,Location/region, Customers)Customers)  MemberMember is a distinct name to determine data item’sis a distinct name to determine data item’s position (eg. Time - Month, quarter)position (eg. Time - Month, quarter)  HierarchyHierarchy arrange members into hierarchies or levelsarrange members into hierarchies or levels
  • HierarchiesHierarchies  Allow for the ‘rollup’ of data to moreAllow for the ‘rollup’ of data to more summarized levels.summarized levels.  TimeTime  dayday  monthmonth  quarterquarter  yearyear
  • Hierarchies
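A minimal sketch of rolling daily figures up the day → month → quarter → year hierarchy, assuming calendar (not fiscal) quarters; the function names and sample figures are hypothetical.

```python
from collections import defaultdict
from datetime import date

def time_path(d):
    """Map a day to its member at each level of the time hierarchy."""
    return {
        "day": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{(d.month - 1) // 3 + 1}",   # calendar quarters assumed
        "year": str(d.year),
    }

def roll_up(daily_sales, level):
    """Sum daily measures up to the requested hierarchy level."""
    totals = defaultdict(float)
    for day, amount in daily_sales.items():
        totals[time_path(day)[level]] += amount
    return dict(totals)

daily = {date(2024, 1, 15): 10.0, date(2024, 2, 3): 5.0, date(2024, 4, 1): 7.5}
print(roll_up(daily, "month"))     # {'2024-01': 10.0, '2024-02': 5.0, '2024-04': 7.5}
print(roll_up(daily, "quarter"))   # {'2024-Q1': 15.0, '2024-Q2': 7.5}
```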
  • AggregatesAggregates  Aggregate Tables are pre-stored summarizedAggregate Tables are pre-stored summarized tables… created at a higher level oftables… created at a higher level of granularity across any or all of thegranularity across any or all of the dimensions.dimensions.  If the existing granularity is Day wise sales,If the existing granularity is Day wise sales, then creating a separate month wise salesthen creating a separate month wise sales table is an example of Aggregate Table.table is an example of Aggregate Table.
  • AggregatesAggregates  The use of such aggregates is the single mostThe use of such aggregates is the single most effective tool the data warehouse designer haseffective tool the data warehouse designer has to improve query performance.to improve query performance.  Usage of Aggregates can increase theUsage of Aggregates can increase the performance of Queries by several times.performance of Queries by several times.
  • MeasuresMeasures  A measure is a numeric attribute of a fact,A measure is a numeric attribute of a fact, representing the performance or behaviour ofrepresenting the performance or behaviour of the business relative to dimensions.the business relative to dimensions.  The actual numbers are called as variables.The actual numbers are called as variables. eg. sales in money, sales volume, quantity supplied, supplyeg. sales in money, sales volume, quantity supplied, supply cost, transaction amountcost, transaction amount  A measure is determined by combinations ofA measure is determined by combinations of the members of the dimensions and is locatedthe members of the dimensions and is located on facts.on facts.
  • The Cube
  • Types of FactsTypes of Facts  AdditiveAdditive  Able to add the facts along all the dimensionsAble to add the facts along all the dimensions  Discrete numerical measures eg. Retail sales in $Discrete numerical measures eg. Retail sales in $  Semi AdditiveSemi Additive  Snapshot, taken at a point in timeSnapshot, taken at a point in time  Measures of IntensityMeasures of Intensity  Not additive along time dimension eg. Account balance,Not additive along time dimension eg. Account balance, Inventory balanceInventory balance  Added and divided by number of time period to get aAdded and divided by number of time period to get a time-averagetime-average
  • Types of FactsTypes of Facts  Non AdditiveNon Additive  Numeric measures that cannot be added across anyNumeric measures that cannot be added across any dimensionsdimensions  Intensity measure averaged across all dimensions eg.Intensity measure averaged across all dimensions eg. Room temperatureRoom temperature  Textual facts - AVOID THEMTextual facts - AVOID THEM
  • Common Structures for Data Marts: Denormalize!
      - Star
          - Single fact table surrounded by denormalized dimension tables
          - The fact table primary key is the composite of the foreign keys (the primary keys of the dimension tables)
          - The fact table contains transaction-type information
          - Many star schemas in a data mart
          - Easily understood by end users; more disk storage required
  • Example of Star Schema
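A minimal star schema in SQLite (via Python's sqlite3), assuming hypothetical date, product and store dimensions with made-up rows. Note the composite primary key on the fact table and the single join from the fact table to each dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: each is one flat table.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: primary key is the composite of the dimension foreign keys.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);

INSERT INTO dim_date    VALUES (1, '2024-01-01', '2024-01', 2024), (2, '2024-02-01', '2024-02', 2024);
INSERT INTO dim_product VALUES (1, 'Green tea', 'Beverages'), (2, 'Biscuits', 'Snacks');
INSERT INTO dim_store   VALUES (1, 'Leeds', 'North'), (2, 'Bristol', 'South');
INSERT INTO fact_sales  VALUES (1, 1, 1, 10, 25.0), (1, 2, 2, 5, 7.5), (2, 1, 2, 4, 10.0);
""")

# Typical star query: one join per dimension, then group by descriptive attributes.
query = """
SELECT p.category, s.region, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_store   s ON s.store_key   = f.store_key
GROUP BY p.category, s.region
ORDER BY p.category, s.region
"""
print(conn.execute(query).fetchall())
# [('Beverages', 'North', 25.0), ('Beverages', 'South', 10.0), ('Snacks', 'South', 7.5)]
```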
  • Common Structures for Data Marts: Denormalize!
      - Snowflake
          - Single fact table surrounded by normalized dimension tables
          - Normalizes dimension tables to save data storage space
          - Used when dimensions become very large
          - Less intuitive; slower performance due to joins
      - You may want to use both approaches, especially if supporting multiple end-user tools.
  • Example of Snowflake Schema
  • Snowflake - Disadvantages
      - Normalization of dimensions makes them difficult for users to understand
      - Decreases query performance because it involves more joins
      - Dimension tables are normally much smaller than fact tables, so space may not be a big enough issue to warrant snowflaking
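For comparison with a star, here is a snowflaked product dimension, again a hypothetical SQLite sketch with made-up rows: category attributes live in their own table, so they are stored once, but every query that groups by category pays for an extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category attributes moved to their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT, department TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales   (product_key INTEGER REFERENCES dim_product(product_key),
                           date_key INTEGER, amount REAL,
                           PRIMARY KEY (product_key, date_key));

INSERT INTO dim_category VALUES (1, 'Beverages', 'Grocery'), (2, 'Snacks', 'Grocery');
INSERT INTO dim_product  VALUES (1, 'Green tea', 1), (2, 'Black tea', 1), (3, 'Biscuits', 2);
INSERT INTO fact_sales   VALUES (1, 1, 25.0), (2, 1, 15.0), (3, 1, 7.5);
""")

# Reaching 'category' now needs two joins (fact -> product -> category) instead of one.
query = """
SELECT c.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_product  p ON p.product_key  = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category
ORDER BY c.category
"""
print(conn.execute(query).fetchall())   # [('Beverages', 40.0), ('Snacks', 7.5)]
```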
  • Keys …
      - Primary keys: uniquely identify a record
      - Foreign keys: the primary key of another table, referenced here
      - Surrogate keys
          - system-generated keys for dimensions
          - the key on its own has no meaning
          - integer key, needs less space
  • More Keys …
      - Smart keys: a primary key built from various attributes of the dimension
          - AVOID THEM!
          - The join to the fact table should be on a single surrogate key
      - Production keys
          - DO NOT USE production-defined attributes as keys
          - The business may reuse or change them; the DW cannot!
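A minimal sketch of surrogate-key assignment. The SurrogateKeyMap class is hypothetical (ETL tools usually keep this mapping in a key-lookup table), and qualifying the production key by source system is an assumption about how identical codes from different sources are kept apart.

```python
from itertools import count

class SurrogateKeyMap:
    """Hypothetical helper: hands out meaningless integer keys for production keys."""

    def __init__(self):
        self._next = count(1)
        self._keys = {}

    def key_for(self, source_system, production_key):
        # Assumption: qualify by source system so the same code from two sources stays distinct.
        natural = (source_system, production_key)
        if natural not in self._keys:
            self._keys[natural] = next(self._next)
        return self._keys[natural]

keys = SurrogateKeyMap()
a = keys.key_for("CRM", "CUST-0042")
b = keys.key_for("Billing", "CUST-0042")   # same code, different source: new key
c = keys.key_for("CRM", "CUST-0042")       # repeat lookup returns the existing key
print(a, b, c)   # 1 2 1
```

Fact rows store only the integer, so if the business later reuses or changes "CUST-0042", the warehouse keys stay stable.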
  • Basic Dimensional Modeling Techniques
      - Slowly changing dimensions
      - Rapidly changing small dimensions
      - Large dimensions
      - Rapidly changing large dimensions
      - Degenerate dimensions
      - Junk dimensions
  • Slowly Changing Dimensions
    A dimension is considered a slowly changing dimension when its attributes remain almost constant over time, requiring relatively minor alterations to represent the evolved state.
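The slide defines slowly changing dimensions but not how to handle the changes. One widely used technique, often called Type 2, keeps history by closing the current row and adding a new version with a fresh surrogate key. The sketch below assumes that technique; the CustomerVersion record, the tracked city attribute and apply_change are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from itertools import count
from typing import List, Optional

@dataclass(frozen=True)
class CustomerVersion:
    surrogate_key: int           # new key for every version of the row
    customer_id: str             # production key from the source system
    city: str                    # attribute whose history we want to keep
    valid_from: date
    valid_to: Optional[date]     # None marks the current version

_next_key = count(1)

def apply_change(rows: List[CustomerVersion], customer_id: str,
                 new_city: str, change_date: date) -> List[CustomerVersion]:
    """Close the current version (if the value changed) and append a new one."""
    current = next((r for r in rows
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.city == new_city:
        return list(rows)        # nothing changed: no new version
    result = [replace(r, valid_to=change_date) if r is current else r for r in rows]
    result.append(CustomerVersion(next(_next_key), customer_id, new_city, change_date, None))
    return result

history = apply_change([], "C42", "Leeds", date(2023, 1, 1))
history = apply_change(history, "C42", "York", date(2024, 6, 1))
for row in history:
    print(row.surrogate_key, row.city, row.valid_from, row.valid_to)
# 1 Leeds 2023-01-01 2024-06-01
# 2 York 2024-06-01 None
```

Facts loaded before June 2024 keep pointing at key 1 (Leeds), later facts at key 2 (York), so history is preserved.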
  • The Time Dimension
      - Columns: Time_key, day_of_week, day_number_in_month, day_number_overall, week_number_in_year, month, quarter, fiscal_period, holiday_flag, weekday_flag, last_day_in_month_flag, season, event
  • Time DimensionTime Dimension  An exclusive Time dimension is requiredAn exclusive Time dimension is required because the SQL date semantics andbecause the SQL date semantics and functions cannot generate several importantfunctions cannot generate several important attributes required for analytical purposes.attributes required for analytical purposes.  Attributes like weekdays, weekends, fiscalAttributes like weekdays, weekends, fiscal period, holidays, season cannot be generatedperiod, holidays, season cannot be generated by SQL statements.by SQL statements.
  • Time DimensionTime Dimension  Moreover SQL date stamps occupy more spaceMoreover SQL date stamps occupy more space largely increasing the size of the fact table.largely increasing the size of the fact table.  Joins on such SQL generated date-stamps areJoins on such SQL generated date-stamps are costly decreasing the query speed significantly.costly decreasing the query speed significantly.
  • Time DimensionTime Dimension  The Day of week(Monday, ...) is useful toThe Day of week(Monday, ...) is useful to create reports comparing for ex. Mondaycreate reports comparing for ex. Monday sales to Friday sales.sales to Friday sales.  The Day number in month is useful forThe Day number in month is useful for comparing measures for the same day in eachcomparing measures for the same day in each month.month.  The last day in month flag is useful forThe last day in month flag is useful for performing payday analysis.performing payday analysis.
  • Time DimensionTime Dimension  The holiday flag and season attributes areThe holiday flag and season attributes are useful for holiday VS non-holiday analysisuseful for holiday VS non-holiday analysis and season business analysis.and season business analysis.  Event attribute is needed to record specialEvent attribute is needed to record special days like strike days, etc..days like strike days, etc..
  • ETVL Overview
  • Introduction
      - Source System 1, Source System 2, Source System 3 → (ETVL) → Staging Area → (ETVL) → Data Warehouse
      - ETVL: Extraction, Transformation, Validation, Load
  • ExtractionExtraction  Source Systems (Multiple Source Systems)Source Systems (Multiple Source Systems)  Flat files, Excel, Legacy Systems, RDBMS etc.Flat files, Excel, Legacy Systems, RDBMS etc.  Frequency of ExtractionFrequency of Extraction  Staging Area (If any? How many?)Staging Area (If any? How many?)  Most Transformations from Source to StagingMost Transformations from Source to Staging  Cleansing and Data QualityCleansing and Data Quality  Data integrity, De-duplication, completeness,Data integrity, De-duplication, completeness, correctnesscorrectness
  • TransformationTransformation  Usage of toolsUsage of tools  Reusability of TransformationsReusability of Transformations  Reusability of MappingsReusability of Mappings  Different toolsDifferent tools  InformaticaInformatica  Warehouse BuilderWarehouse Builder  ETIETI  SagentSagent  PL/SQL scriptsPL/SQL scripts
  • LoadingLoading  Loading FrequencyLoading Frequency  Optimized LoadingOptimized Loading  IndexingIndexing  PartitioningPartitioning  AggregationAggregation  SumSum  AverageAverage  MaxMax  Update StrategyUpdate Strategy  Error HandlingError Handling
  • Synopsis
      - Sources: flat files, Excel, legacy systems, RDBMS, etc.
      - Implement business rules
      - ODBC connectivity
      - Scheduling the ETVL
      - Frequency of extraction
      - Staging area: most transformations from source to staging
  • SynopsisSynopsis  Cleansing and Data QualityCleansing and Data Quality  Data integrity, De-duplication, completeness, correctnessData integrity, De-duplication, completeness, correctness  Rejected RecordsRejected Records  Exception Handling and Error LogException Handling and Error Log  Optimized LoadingOptimized Loading  Re-usabilityRe-usability  Aggregation of dataAggregation of data  Update StrategyUpdate Strategy
  • STAGING AREA - Some Clarity
      - Staging area
          - optional
          - used to cleanse the source data
          - accepts data from different sources
      - A data model is required for the staging area
          - Multiple data models may be required for parking different sources and for the transformed data to be pushed out to the warehouse
  • ODS - Some Clarity
      - Operational Data Store
          - optional
          - granular, detailed-level data
          - may feed the warehouse (e.g. when the warehouse is aggregated)
          - usually a relational model
          - may keep data for a shorter time period than the warehouse
  • A Look at Different DW Architectures
    (Diagram components: operational data, external data, load manager, warehouse manager, query manager, detailed information, summary information, metadata, OLAP.)
  • Data Warehouse Architecture - 2
  • Data Warehouse Architecture - 3
  • Data Warehouse Architecture - 4
  • DW ArchitectureDW Architecture  Architecture Choices depend onArchitecture Choices depend on  Current infrastructureCurrent infrastructure  Business environmentBusiness environment  Desired management and control structureDesired management and control structure  resourcesresources  commitment …..commitment …..  Data Warehouse/data martData Warehouse/data mart
  • DW ArchitectureDW Architecture  Architecture Choices determineArchitecture Choices determine  Where will DW reside?Where will DW reside?  Centrally / locally / distributedCentrally / locally / distributed  Where will it be managed from?Where will it be managed from?  Centrally / independentlyCentrally / independently  3 choices3 choices  GlobalGlobal  IndependentIndependent  InterconnectedInterconnected (or) a combination of these three(or) a combination of these three
  • DW ArchitectureDW Architecture  Global ArchitectureGlobal Architecture  related to scope of data access and storagerelated to scope of data access and storage  does not mean centralizeddoes not mean centralized  can be physically centralized or distributedcan be physically centralized or distributed  enterprise view of dataenterprise view of data  time-consuming & costly to implementtime-consuming & costly to implement
  • Global Architecture
  • DW ArchitectureDW Architecture  Independent ArchitectureIndependent Architecture  stand-alonestand-alone  controlled by a departmentcontrolled by a department  minimal integrationminimal integration  no global viewno global view  very fast to implementvery fast to implement
  • DW ArchitectureDW Architecture  Interconnected ArchitectureInterconnected Architecture  distributeddistributed  integrated and interconnectedintegrated and interconnected  gives a global view of enterprisegives a global view of enterprise  more complexitymore complexity  who manages / controls datawho manages / controls data  another tier in architecture to share common dataanother tier in architecture to share common data between multiple data martsbetween multiple data marts  have a data sharing schema across data martshave a data sharing schema across data marts
  • Independent & Interconnected Architecture
  • Types of Data Warehouse
      - Enterprise data warehouse
      - Data mart
    (Diagram: an enterprise data warehouse with three data marts.)
  • Enterprise Data Warehouse
      - Contains data drawn from multiple operational systems
      - Supports time-series and trend analysis across different business areas
      - Can be used as a transient storage area to clean all data and ensure consistency
      - Can be used to populate data marts
      - Can be used for everyday and strategic decision making
  • Data MartData Mart  Logical subset of enterprise data warehouseLogical subset of enterprise data warehouse  Organized around a single business processOrganized around a single business process  Based on granular dataBased on granular data  May or may not contain aggregatesMay or may not contain aggregates  Object of analytical processing by the endObject of analytical processing by the end user.user.  Less expensive and much smaller than a fullLess expensive and much smaller than a full blown corporate data warehouse.blown corporate data warehouse.
  • Distributed and Centralized Data Warehouses
      - A DW sitting on a single monolithic machine: unrealistic
      - Separate machines, different OS, different DB systems: reality
      - Solution: share a uniform architecture that allows them to be fused coherently
  • Classical Architectures
      - Physical data warehouse
          - Data warehouse --> data marts
          - Data marts --> data warehouse
          - Parallel data warehouse and data marts
  • Physical Data Warehouse: Data Warehouse --> Data Marts
    (Diagram: source data (external data, operational data) --> staging area --> data warehouse --> data marts.)
  • Physical Data Warehouse: Data Marts --> Data Warehouse
    (Diagram: source data (external data, operational data) --> staging area --> data marts --> data warehouse.)
  • Physical Data Warehouse: Parallel Data Warehouse and Data Marts
    (Diagram: source data (external data, operational data) --> staging area --> data warehouse and data marts, loaded in parallel.)
  • DW ImplementationDW Implementation ApproachesApproaches  Top DownTop Down  Bottom-upBottom-up  Combination of bothCombination of both  Choices depend on:Choices depend on:  current infrastructurecurrent infrastructure  resourcesresources  architecturearchitecture  ROIROI  Implementation speedImplementation speed
  • Top-Down Implementation
  • Bottom-Up Implementation
  • DW ImplementationDW Implementation ApproachesApproaches Top DownTop Down  More planning and designMore planning and design initiallyinitially  Involve people fromInvolve people from different work-groups,different work-groups, departmentsdepartments  Data marts may be built laterData marts may be built later from Global DWfrom Global DW  Overall data model to beOverall data model to be decided up-frontdecided up-front Bottom UpBottom Up  Can plan initially withoutCan plan initially without waiting for globalwaiting for global infrastructureinfrastructure  built incrementallybuilt incrementally  can be built before or incan be built before or in parallel with Global DWparallel with Global DW  Less complexity in designLess complexity in design
  • DW ImplementationDW Implementation ApproachesApproaches Top DownTop Down  Consistent data definition andConsistent data definition and enforcement of business rulesenforcement of business rules across enterpriseacross enterprise  High cost, lengthy process,High cost, lengthy process, time consumingtime consuming  Works well when there isWorks well when there is centralized IS departmentcentralized IS department responsible for all H/W andresponsible for all H/W and resourcesresources Bottom UpBottom Up  Data redundancy andData redundancy and inconsistency betweeninconsistency between data marts may occurdata marts may occur  Integration requires greatIntegration requires great planningplanning  Less cost of H/W andLess cost of H/W and other resourcesother resources  Faster pay-backFaster pay-back