Uploaded on

 

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
1,022
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
12
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Data Warehousing De-Mystified
    • Presented to the Annual
    • Georgia Oracle Users Group
    • Conference
    • on March 20, 2006
    • By Rusty Barnett
  • 2. Data Warehousing De-Mystified
    • What do people mean when they call a grouping of data a “Data Warehouse”?
    • Large Data Volume, Loaded using an ETL Process ( E xtract, T ransform, L oad)
    • Data is Time-sensitive and Subject Oriented
    • Enterprise Data Warehouse, Data Marts, ODS (Operational Data Stores), DSS , EIS
    • Dimensional Model, Third-Normal Form
  • 3. Data Warehousing De-Mystified
    • What do people mean when they refer to “Data Warehousing”?
    • Dimensional Data Modeling
    • ETL processing and administrating
    • Business Intelligence Reporting
    • Database managing and administrating
    • … As it pertains to a Data Warehouse
  • 4. Data Warehousing De-Mystified
    • What is a “Data Warehouse”?
    • "A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process“. Bill Inmon, 1990.
    • "A data warehouse is the queryable presentation resource for an enterprise’s data". Ralph Kimball, page 19, "The Data Warehouse Lifecycle Toolkit“.
  • 5. Data Warehousing De-Mystified
    • Definitions of Data Warehouse on the Web:
    • A data warehouse is a database geared towards the business intelligence requirements of an organization. www.oranz.co.uk/glossary_text.htm
    • An information infrastructure that enables businesses to access and analyze detailed data and trends. www.adobe.com/products/vdp/glossary.html
    • A collection of integrated, subject-oriented databases designed to support the DSS function. The data warehouse contains atomic data and lightly summarized data. it.csumb.edu/departments/data/glossary.html
    • A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. www.pdacortex.com/glossary.htm
  • 6. Data Warehousing De-Mystified
    • Definitions of Data Warehouse on the Web Continued:
    • The Data Warehouse is a central repository of data that provides the MIT community with integrated, up-to-date data from various administrative systems. web.mit.edu/sapr3/docs/webdocs/glossary/glCD.html
    • An information repository from which queries & analysis are made. www.pcai.com/web/glossary/pcai_d_f_glossary.html
    • This vast database stores information like a data repository, but goes a step further, allowing users to access data to perform research-oriented analyses. www.payorid.com/glossary.asp
    • A collection of databases combined with a flexible data extraction system. www.genpromag.com/Glossary~LETTER~D.html
  • 7. Data Warehousing De-Mystified
    • Definitions of Data Warehouse on the Web Continued :
    • Datawarehouse (database) A generic term for a system for storing, retrieving and managing large amounts of any type of data. www2.themanualpage.org/glossary/glo_d.php3
    • The department or entity charged with collecting organization-wide data, verifying its accuracy, and analyzing, managing, and distributing it throughout the organization. In organizations without a data warehouse, each department may collect, analyze, manage, and distribute the data it needs for its operations. www2.uta.edu/ssw/trainasfa/glossary.htm
    • A repository made up of databases of data extracted from a variety of sources, with a view to analysis to reveal additional information. www.gbc.t-online.hu/english/bszotare2.htm
  • 8. Data Warehousing De-Mystified
    • Definitions of Data Warehouse on the Web Continued :
    • Central repository of data extracted from various sources. The current CMSU DW is a ‘read-only’ system with extracted data from SIS+, AFINS, HRS and Enrollment Management. www.cmsu.edu/x18299.xml
    • A collection of data and information from various source systems. www.gov.bc.ca/prem/popt/service_plans/srv_pln/pssg/appen_a.htm
    • A subject-oriented non-volatile collection of data used to support strategic decision making. The warehouse is the central point of data integration for business intelligence. It is the source of data for data marts within an enterprise and delivers a common view of enterprise data. publib.boulder.ibm.com/tividd/td/TEDW/GC32-0744-01/en_US/HTML/insmst342.htm
  • 9. Data Warehousing De-Mystified
    • Definitions of Data Warehouse on the Web Continued :
    • A logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks www.321site.com/greg/courses/mis1/glossary.htm
    • A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favor efficient data analysis and reporting (especially OLAP). Data warehousing is not meant for current, "live" data. en.wikipedia.org/wiki/Data_warehouse
  • 10. Data Warehousing De-Mystified
    • Definitions of Data Warehouse on the Web Continued :
    • A very large repository of data comprising nearly all of a company’s information. www.mosaictec.com/storage/storage_terms.htm
    • A large database capable of storing all the information possessed by a large organization. www.jqjacobs.net/edu/cis105/concepts/CIS105_concepts_13.html
    • A database used for storing historical data, which is used for data analysis. docs.rinet.ru/O8/glossary.htm
    • One or more data stores originating from prime authoritative data sources by an auditable replication process. www.data-core.com/glossary-of-terms.htm
  • 11. Data Warehousing De-Mystified
    • Goal of Data Warehousing:
    • A System that displays Large Volumes of Data as Business INFORMATION!
  • 12. Data Warehousing De-Mystified
    • Terms :
    • Data Warehouse --- The term Data Warehouse was coined by Bill Inmon in 1990
    • Enterprise Data Warehouse --- Accumulation of data from all significant areas of an Enterprise
    • Data Mart --- A subset of a data warehouse, for use by a functional area (department) on a subject
    • Dimensional Modeling --- A design technique that seeks to present the data in a framework that’s intuitive, allows for high-performance access, and resistance to change
    • ERD --- E ntity R elational D iagram
  • 13. Data Warehousing De-Mystified
    • Terms :
    • Schema --- A collection of database objects, including tables, views, indexes, and synonyms
    • Star Schema --- A method of designing tables with a Fact Table in the middle and Dimension Tables around it; a standard technique for designing and building multi-dimensional databases
    • Snowflake Schema --- Star Schema with child Dimensions
    • Star Query --- A join between a Fact table and a number of Dimension tables, where the Dimension tables are not joined to each other, only to the Fact table
  • 14. Data Warehousing De-Mystified
    • Terms :
    • Fact Table --- Primary table in Dimensional Model that is meant to contain measurements of the business, whose most useful columns are numeric and additive
    • Dimension Table --- One of the set of companion tables to a Fact Table, whose columns are usually textual and are used for constraining, grouping, ordering within queries
    • Grain --- The level of the data, like Weekly or Yearly
    • Measures --- The numeric columns in a Fact Table
    • Audit Columns --- Columns specifying change (who,when)
  • 15. Data Warehousing De-Mystified
    • Terms :
    • Conformance --- A Dimension is said to be conformed if it can be used by more than one Fact Table and/or Data Mart and/or subject area, i.e., “shareable”
    • Associative (Intersection) Table --- A table that is used to associate a value from one table to a value in another table in order to reduce many-to-many relationships between tables
    • Staging Tables --- Tables used to “stage” data along the way toward loading data, especially useful in ETL
    • ETL --- E xtract, T ransform, and L oad; also a Verb, meaning to load data into one system from data in another
  • 16. Data Warehousing De-Mystified
    • Data Warehousing Roles:
    • Business Users --- People with knowledge of the business and needs for reporting
    • Report Writers --- People who write reports (Developers); tools for writing reports
    • Data Modelers --- People who create data models from Business Requirements
    • ETL Developers --- People who write / develop ETL processes
    • DBA --- People who manage and support databases
  • 17. Data Warehousing De-Mystified
    • Tips for Data Modelers:
    • Understanding the capabilities of the ETL tool, the RDBMS, and the Reporting tool, is just as important as understanding the needs of the business and the users requirements
    • All Data Warehouse primary keys (PK) should be single-column surrogate keys (sequence)
    • Keep Referential Integrity in the DW
    • Use Associative tables to handle M:M
    • Order columns in tables by their “fill” factor
  • 18. Data Warehousing De-Mystified
    • Tips for ETL Developers:
    • Embrace the use of multiple Staging Tables
    • Primarily INSERT data in each ETL process step, especially into the Staging Tables
    • Invalidate Indexes (other than Primary Keys, Unique Keys, and Foreign Keys) before loading/updating massive amounts of data. After the data is loaded/updated, then rebuild these Indexes
    • Be careful in usage of Oracle ROWNUM
    • Test new features, like 10g’s MERGE enhancements, Pipeline Functions, Grouped Table Outer Joins, Conditional Update/Insert/Delete statements, etc.
  • 19. Data Warehousing De-Mystified
    • Tips for Report Developers:
    • Avoid using Hints in your SQL
    • Consider using Materialized Views for reporting on a common set of pre-processed data
    • Be very careful (in fact, be hesitant) in using normal Views in Data Warehouses
    • Consider using the SQL clauses “intersect”, “minus”, and “union [all]” in set processing
    • “ JOINs are expensive” --- NOT , I/O is expensive
    • Test new features, like 10g’s Connect_by, MView and MERGE enhancements, Pipeline Functions, Grouped Table Outer Joins, Time Series & Interrow Calc’s, etc.
  • 20. Data Warehousing De-Mystified
    • Tips for DBA’s:
    • Use as much of each Data Block as possible (PCTFREE, PCTUSED)
    • Set optimizer_index_cost_adj to lower than 100
    • Use ARCHIVELOG mode & RMAN for backups
    • Collect Statistics in best way for your Oracle Release
    • Test Benefits of Parallel Query and Query Re-write
    • Test new features of RDBMS --- 10g features like BFT, new Flashback, Security, Partitioning, and RMAN features, Improved VLDB support, etc.
  • 21. Data Warehousing De-Mystified
    • Current Debates in Data Warehousing:
    • Fresh Data versus Live Data
    • Dimensional Model versus Third-Normal Form
    • Is a Snowflake schema bad for performance?
    • Time spent to “physicalize” the Data Model?
    • Who is responsible for Data Model?
    • Manage DB via Data Model? (OWB)
    • When to partition and when not to?
    • Acceptance of Level 3 Changing Dimensions
  • 22. Data Warehousing De-Mystified
    • Current Standards DW Tools:
    • Database --- Oracle 9i / 10g
    • ERD Data Modeling --- ERwin (Oracle Designer)
    • ETL --- Informatica (OWB)
    • Report Writing --- Oracle Discoverer, MicroStrategy, Business Objects, Cognos
  • 23. Data Warehousing De-Mystified
    • Questions
    • And
    • Answer
    • Session
  • 24. Data Warehousing De-Mystified
    • The End