A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon:
• Subject Oriented
• Integrated
• Nonvolatile
• Time Variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
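As a rough illustration, the "best customer for this item last year" question can be sketched as a single query against a sales-focused table. The table name sales, its columns, and the sample rows are all hypothetical and chosen only for this example; SQLite stands in for whatever database the warehouse actually uses.

```python
import sqlite3

def best_customer(conn, item, year):
    """Return (customer, total_amount) for the top buyer of `item` in `year`, or None."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM sales
        WHERE item = ? AND strftime('%Y', sale_date) = ?
        GROUP BY customer
        ORDER BY total DESC
        LIMIT 1
        """,
        (item, str(year)),
    ).fetchone()

# Hypothetical sales-subject table with a handful of made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, item TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Acme", "widget", "2023-03-01", 120.0),
        ("Acme", "widget", "2023-07-15", 80.0),
        ("Birch", "widget", "2023-05-20", 150.0),
        ("Birch", "gadget", "2023-06-02", 900.0),   # different item, ignored
        ("Acme", "widget", "2022-11-11", 500.0),    # different year, ignored
    ],
)

result = best_customer(conn, "widget", 2023)
print(result)
```

Because the table is organized around one subject, the question maps to one grouped query rather than a join across several operational systems.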
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
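A minimal sketch of what resolving naming conflicts and unit inconsistencies can look like. The two source systems, their field names (CustomerNo, cust_id, wt_lb), and the target schema are assumptions made up for this example.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound in kilograms

def from_eu_system(record):
    """Hypothetical source A: uses 'CustomerNo' and already reports kilograms."""
    return {
        "customer_id": str(record["CustomerNo"]),
        "weight_kg": round(record["weight_kg"], 3),
    }

def from_us_system(record):
    """Hypothetical source B: uses 'cust_id' and reports pounds."""
    return {
        "customer_id": str(record["cust_id"]),
        "weight_kg": round(record["wt_lb"] * LB_TO_KG, 3),
    }

# Both sources end up in one consistent format: same names, same unit.
integrated = [
    from_eu_system({"CustomerNo": 17, "weight_kg": 2.5}),
    from_us_system({"cust_id": 42, "wt_lb": 10.0}),
]
print(integrated)
```

Each source gets its own small adapter, so the warehouse schema only ever sees customer_id and weight_kg.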
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
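One possible way to enforce this in practice is to reject updates and deletes on loaded rows. The sketch below uses SQLite triggers purely as an illustration; the table fact_sales and the trigger names are hypothetical, and a real warehouse might rely on permissions or load procedures instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact_sales (customer TEXT, sale_date TEXT, amount REAL);

    -- Abort any attempt to modify a row after it has been loaded.
    CREATE TRIGGER fact_sales_no_update BEFORE UPDATE ON fact_sales
    BEGIN
        SELECT RAISE(ABORT, 'fact_sales rows are read-only once loaded');
    END;

    -- Abort any attempt to remove a loaded row.
    CREATE TRIGGER fact_sales_no_delete BEFORE DELETE ON fact_sales
    BEGIN
        SELECT RAISE(ABORT, 'fact_sales rows are read-only once loaded');
    END;
    """
)
conn.execute("INSERT INTO fact_sales VALUES ('Acme', '2023-03-01', 120.0)")

def change_is_blocked(conn, statement):
    """Return True if the database rejects the given modifying statement."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.DatabaseError:
        return True

print(change_is_blocked(conn, "UPDATE fact_sales SET amount = 0"))
print(change_is_blocked(conn, "DELETE FROM fact_sales"))
```

Inserts still succeed, so new history can be appended while existing history stays as it was recorded.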
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
Typically, data flows from one or more online transaction processing (OLTP) databases into a data warehouse on a monthly, weekly, or daily basis. The data is normally processed in a staging file before being added to the data warehouse. Data warehouses commonly range in size from tens of gigabytes to a few terabytes. Usually, the vast majority of the data is stored in a few very large fact tables.
Data Modifications
A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse.
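A minimal sketch of such a scheduled bulk load, assuming a hypothetical staging table staging_sales and fact table fact_sales in SQLite. The cleaning rules (trimming names, dropping rows with no amount) are invented for the example and stand in for whatever transformation a real ETL job performs.

```python
import sqlite3

def run_nightly_load(conn, extracted_rows):
    """Stage the extracted rows, then bulk-append the clean ones to the fact table.

    Returns the total number of rows in fact_sales after the load.
    """
    with conn:  # one transaction: committed on success, rolled back on error
        conn.execute("DELETE FROM staging_sales")  # start from an empty staging area
        conn.executemany(
            "INSERT INTO staging_sales VALUES (?, ?, ?)", extracted_rows
        )
        # Transform and load in bulk with a single set-based statement.
        conn.execute(
            """
            INSERT INTO fact_sales (customer, sale_date, amount)
            SELECT TRIM(customer), sale_date, amount
            FROM staging_sales
            WHERE amount IS NOT NULL
            """
        )
    return conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (customer TEXT, sale_date TEXT, amount REAL)")
conn.execute("CREATE TABLE fact_sales (customer TEXT, sale_date TEXT, amount REAL)")

total = run_nightly_load(
    conn,
    [
        ("  Acme ", "2023-03-01", 120.0),
        ("Birch", "2023-03-01", None),   # incomplete row, filtered out in staging
        ("Cedar", "2023-03-01", 75.5),
    ],
)
print(total)
```

The fact table is only ever written by this job, which matches the idea that end users query the warehouse but do not update it directly.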
Data Warehouse Architecture
Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:
• Data Warehouse Architecture (Basic)
• Data Warehouse Architecture (with a Staging Area)
• Data Warehouse Architecture (with a Staging Area and Data Marts)
Architecture of a Data Warehouse
Data Warehouse Architecture (with a Staging Area)
Architecture of a Data Warehouse with a Staging Area and Data Marts