
[db tech showcase Tokyo 2015] DATA WAREHOUSE BASICS by William Inmon

Bill Inmon, the “father of the data warehouse”, has written 53 books published in nine languages. Bill’s latest adventure is building a technology known as textual disambiguation, which reads raw narrative text and places it in a conventional database so that it can be analyzed by standard analytical technology, thereby creating business value from Big Data/unstructured data. ComputerWorld named Bill one of the ten most influential people in the history of the computer profession. Bill lives in Castle Rock, Colorado. For more information about textual disambiguation, refer to www.forestrimtech.com.


  1. DATA WAREHOUSE BASICS: a presentation by W H Inmon. Forest Rim Technology. Copyright Inmon Consulting Services, 2008.
  2. The data warehouse, a definition: a subject-oriented, non-volatile, integrated, time-variant collection of data for the support of management’s decisions.
  3. Granular, detailed data and lots of it; data that can be shaped and reshaped; a foundation of reconcilability; a basis for new, unknown analysis.
  4. What a typical record of the data warehouse looks like: key, time, primary data, secondary data.
  5. Key: an identifier; unique or non-unique; often a compound key; may be natural or blind.
  6. Time: time variancy is either continuous (from date/to date) or periodic discrete.
  7. A continuous time span record: Name, Address, Phone, Zip, Email, …, from date, to date.
  8. A sequence of time span records: three records, each holding Name, Address, Phone, Zip, Email, …, from date, to date.
  9. Continuous time span data: no overlap; discontinuity is a possibility; spans run from the beginning of time to the end of time (999000).
  10. Periodic discrete structure: snapshots taken on Jan 1, Feb 1, Mar 1, and Apr 1, each recording Expenses, Revenues, No of employees, Stock price, Price per share, … The notion of taking a snapshot as of some one moment in time.
  11. Periodic discrete structure (the same Jan 1, Feb 1, Mar 1, and Apr 1 snapshots): the structure says nothing about values as of any other date.
  12. Periodic discrete structure: for few variables; for slowly changing variables. Continuous time span data: for many variables; for quickly changing variables.
  13. Primary data: primary data relates directly to the key. Example: key: ssno; primary data: name, date of birth.
  14. Secondary data: secondary data relates directly to the primary data. Example: key: ssno; primary data: name, date of birth; secondary data: address, zip, phone.
  15. The granular data in the data warehouse: serves as a basis for many other forms of DSS; is instantly available; forms a foundation of reconcilability.
  16. Relational structures and star joins: the data warehouse is shaped by the data model; the star join world is shaped by requirements.
  17. Often called multidimensional data; often called atomic data.
  18. The source of data warehouse data is the operational environment. Diagram labels: applications, legacy data, operational data, transactional data, atomic data, data warehouse.
  19. Integration of data in the data warehouse: source encodings of gender (m/f, 1/0, x/y, male/female) are converted to a single encoding, m/f.
  20. Units of measurement need to be integrated: inches, cms, feet, and miles are converted to a single unit of measure, cms.
  21. ETL (extract/transform/load): the integration and conversion of data is the most difficult part of the data warehouse process.
  22. Transformation code can be generated manually or automatically. Automatic generation is always preferred.
  23. The functions performed by the ETL process are not trivial: convert; reformat; add time element; restructure; new key; add default values; change DBMS; change operating system; summarize; break into multiple records; convert key structure; merge records; collect metadata; conform to data model; select data/reject data; add indexes; change encoding; change hardware environments; resequence data; ASCII to EBCDIC, EBCDIC to ASCII; partition data.
  24. ETL processing can be performed in different places: in the host environment or in the source environment.
  25. The data warehouse: at the center of the decision making of the corporation.
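
The continuous time span structure described in slides 7 to 9 (records carrying a from date and a to date, with no overlap but possible discontinuity) can be illustrated with a small sketch. Everything named here (`TimeSpanRecord`, `check_spans`, `as_of`, the sample customer, and the use of inclusive dates with `date.max` standing in for "the end of time") is a hypothetical choice made for illustration, not something taken from the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TimeSpanRecord:
    """One continuous time span record: key, time, then data attributes."""
    key: str          # identifier (hypothetical customer id)
    from_date: date   # first day the values are valid (inclusive)
    to_date: date     # last day the values are valid (inclusive)
    name: str
    address: str


def check_spans(records: List[TimeSpanRecord]) -> Tuple[list, list]:
    """Return (overlaps, gaps) for the records of a single key.

    An overlap violates the "no overlap" rule; a gap is allowed
    ("discontinuity is a possibility") but is reported for inspection.
    """
    ordered = sorted(records, key=lambda r: r.from_date)
    overlaps, gaps = [], []
    if not ordered:
        return overlaps, gaps
    latest = ordered[0]  # record with the latest to_date seen so far
    for cur in ordered[1:]:
        if cur.from_date <= latest.to_date:
            overlaps.append((latest, cur))
        elif (cur.from_date - latest.to_date).days > 1:
            gaps.append((latest.to_date, cur.from_date))
        if cur.to_date > latest.to_date:
            latest = cur
    return overlaps, gaps


def as_of(records: List[TimeSpanRecord], when: date) -> Optional[TimeSpanRecord]:
    """Return the record valid on `when`, or None if `when` falls in a gap."""
    for r in records:
        if r.from_date <= when <= r.to_date:
            return r
    return None


# Sample history for one hypothetical customer; note the gap during 2010.
history = [
    TimeSpanRecord("C1", date(2000, 1, 1), date(2004, 12, 31), "A. Smith", "1 Elm St"),
    TimeSpanRecord("C1", date(2005, 1, 1), date(2009, 12, 31), "A. Smith", "9 Oak Ave"),
    TimeSpanRecord("C1", date(2011, 1, 1), date.max, "A. Smith", "4 Pine Rd"),
]
overlaps, gaps = check_spans(history)
```

Unlike a periodic discrete snapshot, a lookup such as `as_of(history, date(2006, 6, 1))` can answer for any date covered by a span, and returns `None` for a date that falls in a discontinuity.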

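The integration step in slides 19 and 20 (several source encodings of gender reduced to m/f, and several units of measure reduced to cms) could look roughly like the following sketch. The source-system names (`app_a` to `app_d`), the function names, and in particular the decision about which of 1/0 and x/y means male or female are assumptions made for illustration; the slides list the encodings but do not say how each one maps.

```python
# Per-source gender encodings mapped to the warehouse encoding m/f.
# ASSUMPTION: the meaning of 1/0 and x/y is not given in the slides;
# the mapping below is invented for the example.
SOURCE_GENDER_CODES = {
    "app_a": {"m": "m", "f": "f"},
    "app_b": {"1": "m", "0": "f"},
    "app_c": {"x": "m", "y": "f"},
    "app_d": {"male": "m", "female": "f"},
}

# Conversion factors to centimetres (exact by definition of the inch).
CM_PER_UNIT = {
    "cm": 1.0,
    "inch": 2.54,
    "foot": 30.48,
    "mile": 160934.4,
}


def integrate_gender(source: str, value) -> str:
    """Convert one source system's gender code to the warehouse's m/f."""
    code = str(value).strip().lower()
    try:
        return SOURCE_GENDER_CODES[source][code]
    except KeyError:
        # Reject rather than guess: unknown codes should be reviewed.
        raise ValueError(f"unknown gender code {value!r} from {source!r}")


def to_cm(value: float, unit: str) -> float:
    """Convert a length in the given source unit to centimetres."""
    if unit not in CM_PER_UNIT:
        raise ValueError(f"unknown unit of measure {unit!r}")
    return value * CM_PER_UNIT[unit]


def integrate_row(source: str, row: dict) -> dict:
    """Transform one source row into the warehouse's integrated form."""
    return {
        "gender": integrate_gender(source, row["gender"]),
        "length_cm": to_cm(row["length"], row["unit"]),
    }


sample = integrate_row("app_b", {"gender": 1, "length": 2, "unit": "foot"})
```

Keeping the mappings in lookup tables rather than in branching code is one way to make the conversion rules easy to review and extend as new source applications are added.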