A data warehouse is a central repository of enterprise data used for research and decision making. It consolidates data from different operational systems, which allows for improved querying across information sources. When designing a data warehouse, technical considerations include choosing a hardware platform for scalable parallel querying, selecting a database management system, and establishing communication infrastructure between the warehouse and other systems. The hardware platform needs to be scalable to support ongoing expansion of the warehouse as new data sources and user needs emerge.
Technical Considerations for Building an Optimized Data Warehouse
1.
2. Technical considerations
What Is a Data Warehouse?
Definition: A data warehouse is the data repository of
an enterprise. It is generally used for research and
decision support.
By comparison: an OLTP (online transaction processing)
or operational system handles the everyday
running of one aspect of an enterprise.
OLTP systems are usually designed independently of
each other, which makes it difficult for them to share
information.
3. Why Do We Need Data Warehouses?
Consolidation of information resources
Improved query performance
Separation of research and decision support functions from the operational systems
Foundation for data mining, data visualization, advanced reporting, and OLAP tools
4. Building a Data Warehouse
1. Business Considerations (Return on
Investment)
2. Design Considerations
3. Technical Considerations
4. Implementation Considerations
5. Integrated Solutions
6. Benefits of Data Warehousing
5. Technical Considerations
A number of technical issues are to be considered when
designing and implementing a Data Warehouse environment.
1. The hardware platform that houses the data
warehouse, chosen for parallel query scalability
(uniprocessor, multiprocessor, etc.)
2. The DBMS that supports the warehouse database
3. The communication infrastructure that connects the
warehouse, data marts, operational systems, and end users
4. The hardware platform and software to support the
metadata repository
5. The systems management framework that enables
centralized management and administration of the entire
environment.
6. HARDWARE PLATFORMS
Data warehouse implementations are
usually deployed into already existing
environments.
This section looks at the hardware
platform selection from an architectural
viewpoint.
A mainframe system, however, is not as
open and flexible as a contemporary
client/server system, and is not optimized
for ad hoc query processing.
7. In addition, the platform has to be scalable, since the data
warehouse is never finished: new user
requirements, new data sources, and more
historical data are continuously incorporated
into the warehouse.
Often the platform choice is a choice
between a mainframe and a non-MVS (UNIX or
Windows NT) server.
8. BALANCED APPROACH
An important design point when selecting
a scalable computing platform is the right
balance between all computing
components, for example between the number of
processors in a multiprocessor system
and the I/O bandwidth. Remember that a
lack of balance in a system inevitably
results in a bottleneck.
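The balance point can be illustrated with a small sketch. It assumes, purely for illustration, that each processor can scan a fixed number of MB/s and that the I/O subsystem delivers a fixed aggregate bandwidth, so the effective scan rate is the smaller of the two. The function name and all figures are hypothetical, not measurements from any real platform.

```python
def effective_scan_rate(num_cpus, mb_per_sec_per_cpu, io_bandwidth_mb_per_sec):
    """Return (rate, bottleneck) for a simplified two-component system.

    Assumption: throughput is limited by whichever component,
    aggregate CPU capacity or I/O bandwidth, is smaller.
    """
    cpu_capacity = num_cpus * mb_per_sec_per_cpu
    if cpu_capacity <= io_bandwidth_mb_per_sec:
        return cpu_capacity, "cpu"
    return io_bandwidth_mb_per_sec, "io"


if __name__ == "__main__":
    # Hypothetical figures: 50 MB/s per CPU, 300 MB/s of I/O bandwidth.
    for cpus in (2, 4, 8, 16):
        rate, bottleneck = effective_scan_rate(cpus, 50, 300)
        print(f"{cpus:2d} CPUs -> {rate} MB/s (limited by {bottleneck})")
```

Under these assumed numbers, adding processors beyond six buys nothing: the I/O subsystem becomes the bottleneck and the extra CPUs sit idle.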
9. OPTIMAL HARDWARE ARCHITECTURE
FOR PARALLEL QUERY SCALABILITY
An important consideration when selecting a
hardware platform for a data warehouse is
that of scalability.
This architecture-induced data skew is more
severe in low-density asymmetric
connection architectures.
When selecting a hardware platform for a
data warehouse, take into account the fact
that architecture-induced
data skew can overpower even the best data
layout for parallel query.
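Why skew matters for a parallel query can be sketched with a simple model. It assumes, hypothetically, that every node scans only its own partition at the same rate, so the query finishes when the most heavily loaded node finishes. The function name and the row counts are illustrative only.

```python
def parallel_speedup(rows_per_node):
    """Speedup of a parallel scan over a serial scan.

    Assumptions: the list is non-empty, every node scans at the same
    rate, and the query ends when the busiest node is done.
    """
    total_rows = sum(rows_per_node)
    slowest_node_rows = max(rows_per_node)
    return total_rows / slowest_node_rows


if __name__ == "__main__":
    balanced = [250_000, 250_000, 250_000, 250_000]
    skewed = [700_000, 100_000, 100_000, 100_000]
    print("balanced:", parallel_speedup(balanced))
    print("skewed:  ", round(parallel_speedup(skewed), 2))
```

With four nodes and an even distribution the model gives a fourfold speedup. With 70% of the rows landing on one node, the same hardware delivers less than 1.5 times the serial rate, so most of the parallel capacity is wasted.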