MELJUN CORTES Fundamentals of Enterprise Data Management Week 02

Published in: Technology, Business

  1. BAFEDM2: Fundamentals of Enterprise Data Management, Week 02
     © 2013 IBM Corporation. IBM Confidential.
  2. Agenda
     Module 1: Introduction to Data Warehousing (continued)
     • Framework of the Data Warehouse
     • Data Warehouse Options
  3. Module 1: Introduction to Data Warehousing (continued)
     BAFEDM2: Fundamentals of Enterprise Data Management
  4. Framework of the Data Warehouse: OLTP, OLAP, ODS
     [Diagram components: Source 1, Source 2, ..., Source N (OLTP); Extract, Transform, Load (ETL); ODS Data Store; OLAP Data Warehouse; Data Mart 1, Data Mart 2, ..., Data Mart N; Reports]
  5. Framework of the Data Warehouse: OLTP, OLAP, ODS (continued)
     OLTP: On-Line Transaction Processing
     • A system that keeps track of an organization’s daily transactions and updates the warehouse at periodic intervals
     • Involves frequent inserts, updates, and deletes; highly volatile data; and application-specific data
     • A class of systems that facilitate and manage transaction-oriented applications, mainly data entry and retrieval transactions
     • Most of the systems used in day-to-day business are of the OLTP type, such as:
       – Order entry
       – Inventory management
       – Railway reservation system
       – Payroll or production tracking
  6. Framework of the Data Warehouse: OLTP, OLAP, ODS (continued)
     OLAP: On-Line Analytical Processing
     • A technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for further analysis
     • Inserts, updates, and deletes are periodic batch processes; data is non-volatile, integrated, and summarized
     • Enables end-users to perform ad-hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making
     • Typical OLAP applications are:
       – Business reporting for sales, management
       – Budgeting and forecasting
       – Financial reporting
  7. Framework of the Data Warehouse: OLTP, OLAP, ODS (continued)
     ODS: Operational Data Store
     • A subject-oriented, integrated, volatile, current-valued data store containing only corporate detailed data
     • ODS typically:
       – is meant only for operational systems
       – contains current value and near current values
       – contains detailed data
       – is meant for day-to-day decisions and operational activities
  8. Framework of the Data Warehouse: OLTP, OLAP, ODS (continued)

     Characteristic    | OLTP                             | OLAP                                    | ODS
     ------------------|----------------------------------|-----------------------------------------|------------------------------------------------
     Used For          | Day-to-day transaction           | Information management in an enterprise | Operational activities and day-to-day decisions
     Database Size     | Moderate                         | Very large                              | Moderate
     Data Load         | Field by field                   | Batch upload                            | Field by field
     Accessed By       | Operational users                | Executives, managers, and analysts      | Analysts and operational users
     Kind of Data      | Individual records               | Set of records                          | Individual records
     Type of Data      | Transaction                      | Analysis                                | Transaction and analysis
     Methodology       | Operational requirements         | Evolutionary                            | Data driven
     Data Structure    | Detailed                         | Highly summarized                       | Detailed and lightly summarized
     Data Organization | Functional                       | Subject-oriented                        | Subject-oriented
     Data Source       | Homogeneous, application-centric | Heterogeneous                           | Homogeneous
     Data Redundancy   | Not redundant                    | Managed redundancy                      | Redundant to some extent
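The contrast between "individual records, field by field" (OLTP) and "set of records, summarized" (OLAP) can be illustrated with a small sketch. It is only an illustration, assuming a hypothetical `orders` table held in an in-memory SQLite database; the table, column names, and values are not from the slides.

```python
import sqlite3


def build_orders_db():
    """Create a hypothetical orders table and load it one record at a time (OLTP-style)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    # OLTP-style workload: frequent single-row inserts
    for row in [(1, "North", 100.0), (2, "South", 250.0), (3, "North", 50.0)]:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
    # OLTP-style workload: update of one individual record
    conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")
    conn.commit()
    return conn


def sales_by_region(conn):
    """OLAP-style read: summarize a set of records along one dimension (region)."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    print(sales_by_region(build_orders_db()))
```

In practice the analytical query would run against a separate warehouse or data mart loaded in batches, not against the transactional database itself; a single database is used here only to keep the sketch short.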
  9. Framework of the Data Warehouse: Architecture
     [Diagram: four layers, each with its own metadata]
     • SOURCE SYSTEMS: Legacy, Flat Files, Web, Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Supply Chain Management (SCM); Landing Area; Source Metadata
     • STAGING: Extract, Transform, Load (ETL) covering Profile, Extract, Analyze, Cleanse, Transform, Integrate/Consolidate, Quality, Load; Change Data Capture; ETL Metadata
     • DATA STORES: Operational Data Store, Data Warehouse, Data Marts; Technical Metadata
     • ANALYTICS: Information on Demand, OLAP, Data Mining, Reporting, Action, Analysis; Business Metadata
  10. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: SOURCE SYSTEMS, LANDING AREA, Source Metadata]
      Source Systems
      The source systems of a data warehouse can be legacy data sources, ERPs, anything from simple flat files to complex SAP sources, COBOL sources, or other data sources such as RDBMS, AS/400, and web application data. These operational (OLTP) data sources are commonly known as transactional data sources.
      • Gets input from: individual live applications’ data
      • Tasks done here: application-specific transactional, point-in-time data load
      • Sends output to: either landing area or staging area
      Landing Area
      The landing area is a volatile intermediate area for operational data before transformation takes place. It is implemented to insulate the OLTP systems from the developer, to avoid access load on online source system applications, and, in some cases, to abide by federal laws. Source system data is either pushed or pulled into the landing area in a pre-determined format from the respective source systems. This data is then loaded into the staging area. In some scenarios, data can be sourced directly from the source systems into the staging area instead of being routed through the landing area.
      • Gets input from: individual live applications’ data in a pre-determined format
      • Tasks done here: application-specific transactional, point-in-time data load
      • Sends output to: staging area
  11. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: STAGING]
      Staging Area
      The staging area is a place where you hold temporary tables on the data warehouse server. A staging area is needed to hold the data and to perform data cleansing and merging before the data is loaded into the warehouse. Sometimes the staging area is also required to hold a subset of the source for data profiling activities.
      Data quality (information quality) is defined as standardizing and consolidating customer and/or business data. By cleansing, enhancing, merging, and scrubbing the data, and by combining or aggregating related records to avoid duplicate entries, you are able to create a single record view. The staging area can also hold reference and standardization tables.
      • Gets input from: landing area or individual source systems
      • Tasks done here: extraction, cleansing, transformation, integration, and standardization of disparate source systems’ data to generate a complete and conformed record
      • Sends output to: volatile, integrated, point-in-time data moved to either operational data store, data warehouse, or data marts
  12. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: STAGING]
      Change Data Capture
      Change Data Capture (CDC) is a set of software design patterns used to determine the data that has changed in a database so that action can be taken on the changed data. CDC solutions occur mostly in data warehouse environments, since capturing and preserving the state of data across time is one of the core functions of a data warehouse. CDC can take place in the source, in the landing area, or in the staging area.
      Extract, Transform, Load
      • Extract: extracts data from either the landing area or directly from source systems, preferably using ETL tools, or using custom scripts.
      • Transform: transformation involves the following:
        1. Analyze the Data
        2. Profile the Data (optional, required for data quality)
        3. Cleanse the Data
        4. Integrate the Data
        5. Standardize the Data
        6. Data Quality
      • Load: loads the integrated, complete, and conformed system of record into either the operational data store, data warehouse, or data marts.
  13. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: STAGING]
      Change Data Capture
      • Gets input from: source systems’ databases, the ETL metadata repository, or log or journal entries in databases on the source, staging, or landing area
      • Tasks done here: verification of records that have been inserted (new), updated (modified), or deleted (removed)
      • Sends output to: perform necessary updates and deletes on target systems
      Extract, Transform, Load
      • Gets input from: landing area, individual source systems, or staging area
      • Tasks done here: extraction, cleansing, transformation, integration, and standardization of disparate source systems’ data to generate a complete and conformed record; generates ETL metadata
      • Sends output to: integrated, complete, and conformed record moved to either operational data store, data warehouse, or data marts
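The CDC task of identifying inserted, updated, and deleted records can be sketched by comparing two snapshots of a table. This is only one of several CDC patterns (the slide also mentions log and journal entries), and the function names and the dictionary-keyed-by-primary-key representation are assumptions made for illustration.

```python
def capture_changes(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and classify the changes."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


def apply_changes(target, changes):
    """Apply captured changes to a copy of the target so it mirrors the source."""
    result = dict(target)
    result.update(changes["inserted"])
    result.update(changes["updated"])
    for key in changes["deleted"]:
        result.pop(key, None)
    return result


if __name__ == "__main__":
    before = {1: "Alice", 2: "Bob", 3: "Carol"}
    after = {1: "Alice", 2: "Robert", 4: "Dave"}
    changes = capture_changes(before, after)
    print(changes)
    print(apply_changes(before, changes) == after)
```

Snapshot comparison is simple but reads the full table on every run; log-based capture avoids that cost at the price of depending on the database's journal format.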
  14. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: STAGING]
      ETL: Transform
      1. Analyze the Data
      2. Profile the Data
      3. Cleanse the Data
      4. Integrate the Data
      5. Standardize the Data
      6. Data Quality
      Analyze the Data
      This involves analysis of metadata and data values, and detection of differences between defined and inferred properties.
      Profile the Data
      Data profiling is a process to assess current data conditions or to monitor data quality over time. It begins with collecting measurements about the data, and then looking at the results individually and in various combinations to see where anomalies exist.
      Cleanse the Data
      Data cleansing (also referred to as data scrubbing) is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. The term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.
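As a rough illustration of "collecting measurements about the data", the sketch below profiles one column of a record set. The function name `profile_column`, the choice of measurements, and the treatment of empty strings as missing are assumptions, not something the slides prescribe.

```python
from collections import Counter


def profile_column(rows, column):
    """Collect basic measurements for one column of a list of dict records."""
    values = [row.get(column) for row in rows]
    # Assumption: both None and the empty string count as missing
    present = [v for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }


if __name__ == "__main__":
    customers = [
        {"country": "PH"},
        {"country": "PH"},
        {"country": ""},
        {"country": "US"},
        {},
    ]
    print(profile_column(customers, "country"))
```

A high missing count or an unexpectedly large number of distinct values is the kind of anomaly that would then feed the cleansing step.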
  15. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: STAGING]
      ETL: Transform
      1. Analyze the Data
      2. Profile the Data
      3. Cleanse the Data
      4. Integrate the Data
      5. Standardize the Data
      6. Data Quality
      Integrate the Data
      This involves integration and consolidation of data from various source systems to form a single system of record. Essentially, to understand the complete lifecycle of a product means integrating the different records from its different processes into a single system of record.
      Standardize the Data
      Data standardization transforms different input formats into a consolidated output format. It helps in creating single-domain fields and incorporating business and industry standards.
      Data Quality
      Without accurate data, users lose confidence in the data and make improper decisions. Data quality addresses issues like:
      • Business rule violations, e.g., missing data, use of defaults (1, 0, 9999, or ?), data with embedded logic (e.g., item code starts with 1, product code starts with 9)
      • Data integrity violations, e.g., duplicate primary key, one entity having different key identifiers, no reference data, multiple variations of the same value
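A few of the checks listed above (duplicate primary keys, missing or default values, multiple variations of the same value) can be sketched as follows. The marker set, the variant mapping, and all function names are hypothetical choices for illustration; real rules would come from the business metadata.

```python
from collections import Counter

# Hypothetical placeholder values treated as "default" markers.
# 0 and 1 are left out here because they are often legitimate values.
DEFAULT_MARKERS = {"9999", "?"}

# Hypothetical mapping of known variants to one standard value.
COUNTRY_VARIANTS = {"usa": "US", "u.s.": "US", "united states": "US"}


def duplicate_keys(rows, key):
    """Data integrity check: return key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def suspect_values(rows, key, column):
    """Business rule check: return keys of rows whose column is missing or a default marker."""
    flagged = []
    for row in rows:
        value = row.get(column)
        if value is None or str(value).strip() in DEFAULT_MARKERS | {""}:
            flagged.append(row[key])
    return flagged


def standardize(value, variants):
    """Map known variants to one output format; unknown values are only trimmed."""
    cleaned = value.strip()
    return variants.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    items = [
        {"id": 1, "code": "9999", "country": " USA "},
        {"id": 2, "code": "A12", "country": "PH"},
        {"id": 1, "code": None, "country": "United States"},
    ]
    print(duplicate_keys(items, "id"))
    print(suspect_values(items, "id", "code"))
    print([standardize(r["country"], COUNTRY_VARIANTS) for r in items])
```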
  16. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: DATA STORES (Operational Data Store, Data Warehouse, Data Marts), Technical Metadata]
      Operational Data Store
      A subject-oriented, integrated, volatile, current-valued, detailed-only collection of data in support of an organization’s need for up-to-the-second, operational, integrated, collective information
      • Gets input from: staging area
      • Tasks done here: data storage for current period, alters key structures, reformats data, lightly summarizes data, recalculates data, queried by analysts
      • Sends output to: data warehouse and/or data marts
      Audience: Data analysts
      Data Model: Entity-relationship (normalized), detailed and lightly summarized
      Database Size: Moderate
      Data Update: Field by field
      Philosophy: Support day-to-day decisions and operational activities
  17. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: DATA STORES (Operational Data Store, Data Warehouse, Data Marts), Technical Metadata]
      Data Warehouse
      A subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs
      • Gets input from: staging area or operational data stores
      • Tasks done here: data storage for historical period, alters key structures, reformats data, summarizes data, recalculates data
      • Sends output to: data marts
      Audience: Managers and analysts
      Data Model: Dimensional and summarized
      Database Size: Large to very large
      Data Update: Batch, controlled
      Philosophy: Support managing the enterprise
  18. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: DATA STORES (Operational Data Store, Data Warehouse, Data Marts), Technical Metadata]
      Data Mart
      A body of decision-support data for a department that has an architectural foundation of a data warehouse; it can also represent a business process that can proliferate across many departments
      • Gets input from: staging area, operational data stores, or data warehouse
      • Tasks done here: summarization, key allocation, aggregation, de-normalization
      • Sends output to: analytics and business intelligence tools, which use this data for reporting and data mining
      Audience: Executives, managers, and analysts
      Data Model: Dimensional and summarized
      Database Size: Moderate to large
      Data Update: Batch, controlled
      Philosophy: Operational efficiency
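The data mart tasks of de-normalization and aggregation can be sketched with plain dictionaries. The `sales` and `products` structures and the function name `build_sales_mart` are hypothetical; the point is only to show a lookup attribute being attached to each detail record before the records are summarized.

```python
from collections import defaultdict


def build_sales_mart(sales, products):
    """Attach the product category to each sale, then total the amounts per category."""
    totals = defaultdict(float)
    for sale in sales:
        # De-normalization: resolve the product key to its descriptive attribute
        category = products[sale["product_id"]]["category"]
        # Aggregation: summarize detail records by that attribute
        totals[category] += sale["amount"]
    return dict(totals)


if __name__ == "__main__":
    products = {1: {"category": "Toys"}, 2: {"category": "Books"}}
    sales = [
        {"product_id": 1, "amount": 10.0},
        {"product_id": 2, "amount": 5.0},
        {"product_id": 1, "amount": 2.5},
    ]
    print(build_sales_mart(sales, products))
```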
  19. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: ANALYTICS (Information on Demand, OLAP, Data Mining, Reporting, Action, Analysis), Business Metadata]
      Analytics
      Analytics is defined as the extensive use of data, statistical and quantitative analyses, explanatory and predictive modeling, and fact-based management to drive decision making.
      There are three types of analytics:
      • Descriptive analytics provides information about the past state or performance of a business and its environment. It provides regular reports for events that already happened and ad hoc reports to help examine facts about what happened, where, how often, and with how many.
      • Predictive analytics helps predict (based on data and statistical techniques) with confidence what will happen next so that you can make well-informed decisions and improve business outcomes. It uses simulation models to suggest what could happen.
      • Prescriptive analytics recommends high-value alternative actions or decisions given a complex set of targets, limits, and choices. It predicts future outcomes and suggests courses of action to take so that you can benefit from those predictions.
  20. Framework of the Data Warehouse: Architecture (continued)
      [Diagram highlight: Source Metadata, ETL Metadata, Technical Metadata, Business Metadata]
      Metadata
      “Two contractors are assigned the task of building a bridge. One is to start building from the East end and the other is to start building from the West end. Both have to meet in the center and then merge. When they arrived at the center point, one end of the bridge was higher than the other by a few inches. This was because one group of contractors and their engineers used kilograms and meters, while the other used pounds and feet. It caused the parent company losses in the billions. Reason: it wasn’t the data that was faulty; it was the metadata.”
      Metadata is “data about data.” It refers to data that describes a data set in terms of its value, content, quality, and significance.
      It provides insight into data, answering questions like:
      1. What kind of data is it?
      2. Who is the owner of this data?
      3. How was the data created?
      4. What are the attributes and significance of the data created or collected?
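The bridge anecdote suggests why a value should travel with its metadata. The sketch below is a minimal illustration under assumed names (`Measurement`, `to_meters`, `heights_match`); it carries the unit and owner alongside each value and converts to a common unit before comparing.

```python
from dataclasses import dataclass

# Conversion factors to meters; 1 ft is exactly 0.3048 m.
TO_METERS = {"m": 1.0, "ft": 0.3048}


@dataclass(frozen=True)
class Measurement:
    value: float  # the data
    unit: str     # metadata: what kind of data this is
    owner: str    # metadata: who owns or produced it


def to_meters(measurement):
    """Convert a measurement to meters using its unit metadata."""
    if measurement.unit not in TO_METERS:
        raise ValueError(f"unknown unit: {measurement.unit}")
    return measurement.value * TO_METERS[measurement.unit]


def heights_match(a, b, tolerance=0.01):
    """Compare two measurements in a common unit instead of comparing raw values."""
    return abs(to_meters(a) - to_meters(b)) <= tolerance


if __name__ == "__main__":
    east = Measurement(10.0, "m", "East team")
    west = Measurement(32.8084, "ft", "West team")
    print(heights_match(east, west))
```

Comparing the raw numbers 10.0 and 32.8084 would suggest a mismatch, while comparing 10.0 with a raw 10.0 recorded in feet would hide one; only the unit metadata makes either comparison meaningful.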
  21. Framework of the Data Warehouse: Architecture (continued)
      [Diagram: the same architecture overview as slide 9, with SOURCE SYSTEMS, STAGING, DATA STORES, and ANALYTICS and their Source, ETL, Technical, and Business Metadata]
  22. Data Warehouse Options
      There are, perhaps, as many ways to develop data warehouses as there are organizations. Moreover, there are a number of key factors that need to be considered: scope, data redundancy, and type of end-user.
      • Scope. The scope of a data warehouse may be as broad as all the informational data for the entire enterprise from the beginning of time, or it may be as narrow as a personal data warehouse for a single manager for a single year.
      • Data Redundancy. Virtual data warehouses allow end users to get at operational databases directly; they provide the ultimate in flexibility as well as the minimum amount of redundant data that must be loaded and maintained. Central data warehouses are single physical databases that contain all data for a specific functional area, department, division, or enterprise. Distributed data warehouses are those in which certain components are distributed across a number of different physical databases.
      • Type of End-User. End-users can be broadly categorized into three groups: executives and managers, power users (business and financial analysts, engineers), and support users (clerical, administrative).
  23. For the Next Session
      BAFEDM2: Fundamentals of Enterprise Data Management
  24. For the Next Sessions
      Agenda
      • Module 2: Data Warehouse Design Considerations
        – Data Models
        – The Dimensional Model
        – Facts and Dimensions
        – Four-Step Dimensional Design Process
        – Case Study: Retail
