Data Mining and Warehousing (UCA15E04)
1. Data Mining and Warehousing
(UCA15E04)
Unit 5 – Tuning the Data Warehouse
Prepared by
Dr. K. Puspalatha, Mrs. K. Ponveni, Mrs. J. Shyamala Devi
2. Difficulties in Data Warehouse Tuning
Tuning a data warehouse is difficult for the following reasons:
The data warehouse is dynamic; it never remains constant.
It is very difficult to predict what queries users will submit in the future.
Business requirements change with time.
Users and their profiles keep changing.
The user can switch from one group to another.
The data load on the warehouse also changes with time.
3. Performance Assessment
Objective measures of performance
Average query response time
Scan rates
I/O throughput rates
Time used per day query
Memory usage per process
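As a rough illustration of how some of these measures could be computed, here is a minimal sketch that works from a hypothetical query log. The `QueryRun` record and its field names are assumptions made for this example, not part of any particular warehouse product.

```python
from dataclasses import dataclass


@dataclass
class QueryRun:
    # Hypothetical log record; the field names are assumptions for illustration.
    elapsed_s: float    # wall-clock response time in seconds
    rows_scanned: int   # rows read while answering the query
    bytes_read: int     # bytes read from storage


def summarize(runs):
    """Return average response time, scan rate, and I/O throughput for a set of runs."""
    total_time = sum(r.elapsed_s for r in runs)
    return {
        "avg_response_s": total_time / len(runs),
        "scan_rate_rows_per_s": sum(r.rows_scanned for r in runs) / total_time,
        "io_throughput_bytes_per_s": sum(r.bytes_read for r in runs) / total_time,
    }


runs = [
    QueryRun(elapsed_s=2.0, rows_scanned=1_000_000, bytes_read=80_000_000),
    QueryRun(elapsed_s=6.0, rows_scanned=3_000_000, bytes_read=240_000_000),
]
metrics = summarize(runs)
print(metrics)
```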
4. Performance Assessment (contd.)
Specify the measures in the service level agreement (SLA).
There is no use trying to tune response times if they are already better than those required.
Keep expectations realistic when making a performance assessment.
Make sure users' expectations are feasible.
Use aggregations and views so that the user need not know the complexity of the system.
Remember that a user can write a query you had not tuned for.
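The point about not tuning what already meets the SLA can be sketched as a simple check. This example assumes the SLA is expressed as a target on the 95th-percentile response time; that choice, the function name `needs_tuning`, and the sample timings are all assumptions for illustration.

```python
import math


def needs_tuning(response_times_s, sla_target_s, percentile=0.95):
    """Compare an observed percentile of response times with the SLA target.

    Uses the nearest-rank percentile. Returns (tune?, observed value).
    """
    ordered = sorted(response_times_s)
    rank = max(1, math.ceil(percentile * len(ordered)))
    observed = ordered[rank - 1]
    return observed > sla_target_s, observed


# Hypothetical measured response times, in seconds.
sample_times = [1.2, 0.8, 2.5, 1.9, 3.1, 0.9, 1.4, 2.2, 1.1, 1.7]

print(needs_tuning(sample_times, sla_target_s=5.0))  # already better than required
print(needs_tuning(sample_times, sla_target_s=3.0))  # misses the target
```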
5. TUNING DATA LOAD
Why tune the data load?
Speed up ad hoc and fixed queries
Optimize hardware performance
Increase the efficiency of the loading process
Ensure data is consistent
Avoid duplication of data
Reduce operational cost
Avoid bottlenecks
6. Data flow through the data warehouse
[Figure: data flow diagram. Labels: Data Sources (Oracle, MS Access, DB2), Extraction Utilities, Metadata Extraction, Metadata, Detail Records, Data Warehouse, Warehouse Server]
7. Steps in Tuning
Preallocate space for the table
Allocate sufficient memory
Create DBWR (database writer) processes
Remove any unnecessary triggers and constraints
Remove any indexes on the tables
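The index-related step can be sketched as follows. The example assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the warehouse database; the table and index names are hypothetical. Oracle-specific steps such as space preallocation or DBWR processes are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, store_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_store ON sales (store_id)")

# Hypothetical batch of rows to load.
rows = [(i, i % 50, float(i)) for i in range(10_000)]

# 1. Drop the index so the load does not have to maintain it row by row.
conn.execute("DROP INDEX idx_sales_store")

# 2. Load the whole batch in a single transaction.
with conn:
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# 3. Rebuild the index once, after the load has finished.
conn.execute("CREATE INDEX idx_sales_store ON sales (store_id)")

loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
index_names = [row[1] for row in conn.execute("PRAGMA index_list('sales')")]
print(loaded, index_names)
```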
8. Tuning Data Load involves
Perform consistency and integrity checks
Creating indexes and partitions
Creating business views
Denormalization if appropriate
Aggregation and Summary tables
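A minimal sketch of a summary table and a business view, again assuming SQLite as a stand-in database. The `sales` table, the `daily_store_sales` summary table, and the `v_daily_revenue` view are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, store_id INTEGER, amount REAL);
INSERT INTO sales VALUES
  ('2024-01-01', 1, 100.0),
  ('2024-01-01', 1, 50.0),
  ('2024-01-01', 2, 70.0),
  ('2024-01-02', 1, 30.0);

-- Summary table: aggregated once at load time instead of at every query.
CREATE TABLE daily_store_sales AS
SELECT sale_date, store_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS sale_count
FROM sales
GROUP BY sale_date, store_id;

-- Business view: hides the underlying tables from the user.
CREATE VIEW v_daily_revenue AS
SELECT sale_date, SUM(total_amount) AS revenue
FROM daily_store_sales
GROUP BY sale_date;
""")

daily_revenue = conn.execute(
    "SELECT sale_date, revenue FROM v_daily_revenue ORDER BY sale_date"
).fetchall()
print(daily_revenue)
```

The user queries only `v_daily_revenue`; the summary table behind it can be rebuilt or repartitioned during the load without changing that query.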
9. Tuning Queries
Fixed queries - Clearly defined and well understood
Ad hoc queries - Unpredictable in quantity and frequency
10. QUERY PERFORMANCE
Unexpectedly long-running queries can be caused by:
Slow network connection
Slow running queries
Lack of useful statistics
Out of date statistics
Lack of useful indexes
Lack of useful data striping
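Two of these causes, missing indexes and missing statistics, can be illustrated with a small sketch. It assumes SQLite, whose `EXPLAIN QUERY PLAN` and `ANALYZE` statements stand in for the equivalent tools of a production warehouse; the `orders` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
with conn:
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 100, float(i)) for i in range(1000)],
    )

query = "SELECT * FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)   # no useful index yet: expect a full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("ANALYZE")     # gather fresh statistics for the optimizer

plan_after = plan(query)    # the optimizer can now search through the index
stat_rows = conn.execute("SELECT COUNT(*) FROM sqlite_stat1").fetchone()[0]

print(plan_before)
print(plan_after)
```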
11. Fixed Queries
Fixed queries are well defined. Examples of fixed queries:
• Regular reports
• Canned queries
• Common aggregations
Tuning fixed queries in a data warehouse is the same as in an RDBMS. The only difference is that the amount of data to be queried may be different.
It is good to store the most successful execution plan. Comparing against the stored plan helps to spot changing data size and data skew, since these cause the execution plan to change.
Little more can be done on the fact table, but when dealing with dimension tables or aggregations, the usual collection of SQL tweaks, storage mechanisms, and access methods can be used to tune these queries.
12. Ad Hoc Queries
To understand ad hoc queries, it is important to know the ad hoc users of the data warehouse. For each user or group of users, you need to know the following:
The number of users in the group
Whether they use ad hoc queries at regular intervals of time
Whether they use ad hoc queries frequently
Whether they use ad hoc queries occasionally at unknown intervals.
The maximum size of query they tend to run
The average size of query they tend to run
Whether they require drill-down access to the base data
The elapsed login time per day
The peak time of daily usage
The number of queries they run per peak hour
13. How to Tune Ad Hoc Queries?
Understand user profiles: frequency and quantity of queries
Track the different queries run against each aggregation table, and how often
Identify frequently used indexes
This will help in:
Growth predictions
Capacity planning
Deciding whether an index/aggregation should be used or deleted
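A rough sketch of how such tracking could feed these decisions, assuming a hypothetical log that records the user group, the hour of day, and the aggregation table (if any) each ad hoc query used. All names and values here are invented for illustration.

```python
from collections import Counter

# Hypothetical ad hoc query log: (user_group, hour_of_day, aggregation_table_or_None).
query_log = [
    ("marketing", 10, "agg_sales_by_region"),
    ("marketing", 10, "agg_sales_by_region"),
    ("finance", 10, None),
    ("finance", 14, "agg_sales_by_month"),
    ("marketing", 15, "agg_sales_by_region"),
]

# Aggregation tables that currently exist in the (hypothetical) warehouse.
existing_aggregates = {"agg_sales_by_region", "agg_sales_by_month", "agg_sales_by_product"}

# Peak usage: which hour sees the most ad hoc queries (input to capacity planning).
queries_per_hour = Counter(hour for _, hour, _ in query_log)
peak_hour, peak_count = queries_per_hour.most_common(1)[0]

# How often each aggregation table is actually hit.
aggregate_hits = Counter(agg for _, _, agg in query_log if agg is not None)

# Aggregations that no logged query used: candidates for deletion.
unused_aggregates = sorted(existing_aggregates - set(aggregate_hits))

print(peak_hour, peak_count, dict(aggregate_hits), unused_aggregates)
```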
14. Query for U !!!
SELECT name, roll_no FROM BCA_rank WHERE cgpa > 8 AND cgpa <= 10
TUNE
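One possible answer, sketched under assumptions: the roll number column is taken to be named `roll_no`, SQLite stands in for the actual database, and the sample rows are invented. The idea is to give the range predicate on `cgpa` an index to search, here a covering index so the query never touches the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BCA_rank (name TEXT, roll_no INTEGER, cgpa REAL)")
with conn:
    conn.executemany(
        "INSERT INTO BCA_rank VALUES (?, ?, ?)",
        [("Asha", 1, 9.1), ("Bala", 2, 7.5), ("Chitra", 3, 8.0), ("Deepa", 4, 10.0)],
    )

# Covering index: cgpa first for the range search, then the selected columns.
conn.execute("CREATE INDEX idx_bca_rank_cgpa ON BCA_rank (cgpa, name, roll_no)")

query = "SELECT name, roll_no FROM BCA_rank WHERE cgpa > 8 AND cgpa <= 10"

# Inspect the plan to confirm the index is used for the range predicate.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
results = sorted(conn.execute(query).fetchall())

print(plan)
print(results)
```

If the table is known never to hold a `cgpa` above 10, the upper bound adds nothing and could be dropped; that depends on a constraint the slide does not state.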