The document compares and contrasts OLTP and data warehouse systems. It states that a data warehouse is an enterprise-wide repository of historical data used for information retrieval and decision support. It stores both atomic and aggregated data. In contrast, an OLTP system supports predefined operations and only stores historical data needed for current transactions. The document outlines several key differences between the two systems, such as how data is modified (regular ETL vs individual updates), schema design (denormalized vs normalized), typical operations (complex queries on large data sets vs simple queries on few records), and historical data storage (years of data vs weeks or months).
1. Data Warehouse 10: OLTP vs Data Warehouse
Prof. Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University, Ajmer
2. Concept of a Data Warehouse
• An enterprise-wide, structured repository of
subject-oriented, time-variant, historical data
used for information retrieval and decision
support.
• The data warehouse stores both atomic and
summary data.
3. On-line Analytical Processing
• OLAP is characterized by a relatively low
volume of transactions. Queries are often very
complex and involve aggregations. For OLAP
systems, response time is a key effectiveness
measure.
• OLAP is widely used together with data
mining techniques. An OLAP database stores
aggregated, historical data in
multi-dimensional schemas (usually a star
schema).
4. Workload
• Data warehouses are designed to accommodate ad hoc
queries and data analysis. The analyst might not know the
workload of the data warehouse in advance, so a data warehouse
should be optimized to perform well for a wide variety of
possible query and analytical operations.
• OLTP systems support only predefined operations. The
applications might be specifically tuned or designed to
support only these operations.
5. Data modifications
• A data warehouse is updated on a regular basis
by the ETL process (run nightly or weekly) using
bulk data modification techniques.
• The end users of a data warehouse do not
directly update the data warehouse.
• In OLTP systems, end users routinely issue
individual data modification statements to the
database.
• The OLTP database is always up to date and
reflects the current state of each business
transaction.
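The contrast above can be sketched in a few lines of Python with the standard-library `sqlite3` module. The table and column names (`orders`, `sales_fact`, etc.) are illustrative assumptions, not part of any real system; the "nightly ETL run" is simulated by a single bulk insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP-style table: end users issue individual modification statements.
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
cur.execute("INSERT INTO orders VALUES (1, 'open')")
cur.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")  # single-row update

# Warehouse-style table: refreshed in bulk by a (simulated) nightly ETL run;
# end users never write to it directly.
cur.execute("CREATE TABLE sales_fact (order_id INTEGER, amount REAL)")
nightly_batch = [(1, 19.99), (2, 5.00), (3, 42.50)]  # rows extracted from the OLTP source
cur.executemany("INSERT INTO sales_fact VALUES (?, ?)", nightly_batch)
conn.commit()

oltp_status = cur.execute("SELECT status FROM orders WHERE order_id = 1").fetchone()[0]
fact_rows = cur.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

The key difference is the write pattern: one statement per business event on the OLTP side, one periodic `executemany` batch on the warehouse side.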
6. Schema design
• Data warehouses often use partially
denormalized schemas to optimize query and
analytical performance.
• OLTP systems often use fully normalized
schemas to optimize update/insert/delete
performance, and to guarantee data
consistency.
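A minimal sketch of the two schema styles, again using `sqlite3`. The normalized tables (`customer`, `product`, `sale`) and the deliberately flattened `sales_fact` table are hypothetical examples chosen for illustration; a real star schema would keep separate dimension tables rather than fully flattening.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized OLTP design: each fact is stored in exactly one place,
# which keeps updates cheap and data consistent.
cur.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer,
                       product_id  INTEGER REFERENCES product,
                       amount REAL);
""")

# Denormalized warehouse design: dimension attributes are repeated on
# each row, so analytical queries need fewer (or no) joins.
cur.execute("""
CREATE TABLE sales_fact (sale_id INTEGER, customer_name TEXT,
                         product_category TEXT, amount REAL)
""")
cur.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                [(1, "Asha", "books", 12.0), (2, "Asha", "books", 8.0)])

# The analytical question is answered without any join:
total = cur.execute(
    "SELECT SUM(amount) FROM sales_fact WHERE product_category = 'books'"
).fetchone()[0]
```

The trade-off is the one the slide names: the normalized design optimizes insert/update/delete and consistency, while the denormalized design pays storage redundancy to make reads simpler and faster.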
7. Typical operations
• A typical data warehouse query scans
thousands or millions of rows. For example,
"Find the total sales for all customers last
month."
• A typical OLTP operation accesses only a
handful of records. For example, "Retrieve the
current order for this customer."
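The two example queries from this slide can be run side by side on one toy table. The `sales` table and its sample rows are assumptions made for the sketch; the point is the shape of the queries, not the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-05-03", 10.0),
    (2, "2024-05-17", 25.0),
    (1, "2024-04-09", 99.0),   # outside "last month", excluded by the filter
])

# Warehouse-style query: scans and aggregates many rows
# ("total sales for all customers last month").
last_month_total = cur.execute(
    "SELECT SUM(amount) FROM sales WHERE sale_date LIKE '2024-05%'"
).fetchone()[0]

# OLTP-style operation: touches only a handful of rows
# ("the current order for this customer").
current_order = cur.execute(
    "SELECT amount FROM sales WHERE customer_id = 2"
).fetchone()[0]
```

On real data the first query scans millions of rows while the second hits a single row through an index on `customer_id`, which is why the two workloads are tuned so differently.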
8. Historical data
• Data warehouses usually store many months
or years of data. This is to support historical
analysis and reporting.
• OLTP systems usually store data from only a
few weeks or months. An OLTP system stores
historical data only as needed to successfully
meet the requirements of the current
transaction.
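A short sketch of an OLTP-style retention purge, assuming a hypothetical `txn` table and a 90-day retention window (both invented for the example). In practice the old rows would be archived to the warehouse before deletion, so the warehouse accumulates years of history while the OLTP table stays small.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE txn (txn_id INTEGER, txn_date TEXT)")
cur.executemany("INSERT INTO txn VALUES (?, ?)", [
    (1, "2023-01-15"),   # old history, no longer needed for current work
    (2, "2024-04-20"),
    (3, "2024-05-30"),
])

today = date(2024, 6, 1)  # fixed date so the example is reproducible

# OLTP-style retention: keep only the last ~90 days; ISO date strings
# compare correctly as text, so a plain string comparison works here.
cutoff = (today - timedelta(days=90)).isoformat()
cur.execute("DELETE FROM txn WHERE txn_date < ?", (cutoff,))
remaining = cur.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
```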
9. OLTP vs OLAP: Property-by-Property Comparison
• Source of data. OLTP: operational data; OLTP systems are the
original source of the data. OLAP: consolidated data; OLAP data
comes from the various OLTP databases.
• Purpose of data. OLTP: to control and run fundamental business
tasks. OLAP: to help with planning, problem solving, and decision
support.
• What the data reveals. OLTP: a snapshot of ongoing business
processes. OLAP: multi-dimensional views of various kinds of
business activities.
• Inserts and updates. OLTP: short, fast inserts and updates
initiated by end users. OLAP: periodic long-running batch jobs
refresh the data.
• Queries. OLTP: relatively standardized, simple queries returning
relatively few records. OLAP: often complex queries involving
aggregations.
• Processing speed. OLTP: typically very fast. OLAP: depends on the
amount of data involved; batch data refreshes and complex queries
may take many hours; query speed can be improved by creating
indexes.
• Space requirements. OLTP: can be relatively small if historical
data is archived. OLAP: larger, due to aggregation structures and
historical data; requires more indexes than OLTP.
• Database design. OLTP: highly normalized, with many tables.
OLAP: typically denormalized, with fewer tables; uses star and/or
snowflake schemas.
• Backup and recovery. OLTP: back up religiously; operational data
is critical to running the business, and data loss is likely to
entail significant monetary loss and legal liability. OLAP: instead
of regular backups, some environments may simply reload data from
the OLTP sources as a recovery method.