SOFT COPY OF THE SEMINAR TOPIC ON “DATA WAREHOUSE” SUBMITTED BY: IQxplorer
SEMINAR TOPIC ON DATA WAREHOUSE
What is a Data Warehouse? A data warehouse is a repository of information gathered from multiple sources and stored under a unified schema at a single site. The data warehouse is a relational database organised to hold information in a structure that best supports reporting and analysis.
Characteristics of a Data Warehouse:
The concept of a data warehouse given by Bill Inmon, the father of the data warehouse, is depicted in the figure below:
Architecture: A Data Warehouse Architecture (DWA) is a way of representing the overall structure of data, communication, processing and presentation that exists for end-user computing within the enterprise. The architecture of a data warehouse is as follows:
Load Manager: Data flows into the data warehouse through the load manager. The data is extracted from the operational databases and supplemented by data imported from external sources. The load manager primarily performs an extract, transform, load (ETL) operation.
Query Manager: It provides an interface between the warehouse and its users. It performs tasks such as directing queries to the appropriate tables, monitoring the effectiveness of the indexes and summary data, and scheduling queries.
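The extract, transform, load sequence performed by the load manager can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the `orders` table and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source data; names and values are illustrative only.
operational_rows = [
    {"order_id": 1, "customer": " alice ", "amount": "19.99"},
    {"order_id": 2, "customer": "BOB", "amount": "5.00"},
]
external_rows = [
    {"order_id": 3, "customer": "Carol", "amount": "12.50"},
]


def extract():
    """Extract: gather rows from the operational and external sources."""
    return operational_rows + external_rows


def transform(rows):
    """Transform: trim and standardize names, convert amounts to numbers."""
    return [
        (row["order_id"], row["customer"].strip().title(), float(row["amount"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory stand-in for the warehouse."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn
```

Calling `run_etl()` returns a connection whose `orders` table holds the cleaned rows. A real load manager would read from live systems and run on a schedule or per transaction, which this sketch does not attempt.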
Components of a data warehouse:
The primary components of a data warehouse are:
Data Sources: A data source is any electronic repository of information from which data is passed to the data warehouse, either on a transaction-by-transaction basis (for real-time data warehouses) or on a regular cycle.
Data Transformation: The data transformation layer receives data from the data sources, cleans and standardizes it, and loads it into the data repository.
Data Warehouse: The data warehouse is a relational database organized to hold information in a structure that best supports reporting and analysis.
Reporting: The data in the data warehouse must be available to all the users if the data warehouse is to be useful.
Metadata: Metadata, or "data about data", is used to inform users of the data warehouse about its status and the information held within it.
Operations: Data warehouse operations comprise the processes of loading, manipulating and extracting data from the data warehouse. Operations also cover user management, security, capacity management and related functions.
In addition, the following components also exist in some data warehouses:
Dependent Data Marts: A dependent data mart is a physical database (either on the same hardware as the data warehouse or on a separate hardware platform) that receives all its information from the data warehouse.
Logical Data Marts: A logical data mart is a filtered view of the main data warehouse that does not physically exist as a separate copy of the data.
Operational Data Store: An operational data store (ODS) is an integrated database of operational data. Its sources include legacy systems, and it contains current or near-term data.
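The difference between a logical and a dependent data mart can be illustrated with a small sketch. It assumes an in-memory SQLite database as the warehouse; the `sales` table, the mart names and the sample rows are hypothetical. The logical mart is a view, so it always reflects the warehouse, while the dependent mart is a physical copy that stays as it was until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table with a few sample rows.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Widget", 100.0),
        ("South", "Widget", 80.0),
        ("North", "Gadget", 50.0),
    ],
)

# Logical data mart: a filtered view, no separate copy of the data.
conn.execute(
    "CREATE VIEW north_sales_mart AS "
    "SELECT product, amount FROM sales WHERE region = 'North'"
)

# Dependent data mart: a physical table filled only from the warehouse.
conn.execute(
    "CREATE TABLE south_sales_mart AS "
    "SELECT product, amount FROM sales WHERE region = 'South'"
)

# New data arrives in the warehouse after both marts were created.
conn.execute("INSERT INTO sales VALUES ('North', 'Gizmo', 25.0)")
conn.execute("INSERT INTO sales VALUES ('South', 'Gizmo', 30.0)")

# The view sees the new row; the physical copy does not until it is reloaded.
logical_rows = conn.execute("SELECT COUNT(*) FROM north_sales_mart").fetchone()[0]
dependent_rows = conn.execute("SELECT COUNT(*) FROM south_sales_mart").fetchone()[0]
```

Here `logical_rows` counts the new North row as well, whereas `dependent_rows` still reflects the state at the time the copy was made.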
Design of a data warehouse:
The key considerations involved in the design of a data warehouse are:
Methods of storing data in a data warehouse:
The general principle used in the majority of data warehouses is that data is stored at its most elemental level for use in reporting and information analysis.
There are two primary approaches to organising the data in a data warehouse:
Dimensional approach: Here, information is stored as "facts", which are numeric or text data that capture specific details about a single transaction or event, and "dimensions", which contain reference information that allows each transaction or event to be classified in various ways.
Database normalization: In this style, the data in the data warehouse is stored in third normal form.
The main advantage of the normalized approach is that it is quite straightforward to add new information to the database, while its primary disadvantage is that producing information and reports can be quite slow, since reports typically have to join many tables.
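As a minimal sketch of the dimensional approach, the following uses one fact table and two dimension tables in an in-memory SQLite database. All table names, column names and sample values are assumptions made for illustration; they do not come from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: reference information used to classify each sale.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Facts: one row of numeric data per sale, keyed to the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER,
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 1, 2, 20.0), (1, 2, 1, 10.0), (2, 2, 3, 45.0);
    """
)

# Classify the same facts by product category ...
by_category = conn.execute(
    "SELECT p.category, SUM(f.amount) "
    "FROM fact_sales AS f JOIN dim_product AS p ON p.product_id = f.product_id "
    "GROUP BY p.category ORDER BY p.category"
).fetchall()

# ... and by month, using a different dimension.
by_month = conn.execute(
    "SELECT d.month, SUM(f.amount) "
    "FROM fact_sales AS f JOIN dim_date AS d ON d.date_id = f.date_id "
    "GROUP BY d.month ORDER BY d.month"
).fetchall()
```

Each report needs only one join from the fact table to a single dimension. A fully normalized design would usually spread the same information over more tables, which is one reason reporting against it tends to be slower.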
Advantages of using a data warehouse:
Enhances end-user access to a wide variety of data.
Increases data consistency.
Increases productivity and decreases computing costs.
Combines data from different sources in one place.
Provides an infrastructure that could support changes to data and replication of the changed data back into the operational systems.
Concerns in using a data warehouse:
Extracting, cleaning and loading data could be time-consuming.
Compatibility problems with systems already in place, e.g. transaction processing systems.
Training must be provided to end-users, who may still end up not using the data warehouse.
Security could develop into a serious issue, especially if the data warehouse is web accessible.
Future Developments: Data warehousing is such a new field that it is difficult to estimate which new developments are likely to affect it most. Clearly, the development of parallel database servers with improved query engines is likely to be one of the most important. Parallel servers will make it possible to access huge databases in much less time.
Conclusion : Data Warehousing is not a new phenomenon. All large organizations already have data warehouses, but they are just not managing them. Over the next few years, the growth of data warehousing is going to be enormous with new products and technologies coming out frequently. In order to get the most out of this period, it is going to be important that data warehouse planners and developers have a clear idea of what they are looking for and then choose strategies and methods that will provide them with performance today and flexibility for tomorrow.