Data mining and warehousing qb
QUESTION BANK
SUBJECT : DATA WAREHOUSING AND MINING
UNIT 1:
2 MARKS:
1. What are the key features of a data warehouse?
2. Define data warehouse.
3. What are operational databases?
4. Define OLTP.
5. Define OLAP.
6. How is a database design represented in OLTP systems?
7. How is a database design represented in OLAP systems?
8. Differentiate the features of operational database and data warehouse.
9. Write short notes on the multidimensional data model.
10. Define data cube.
11. What are facts?
12. What are dimensions?
13. Define dimension table.
14. Define fact table.
15. What is a lattice of cuboids?
16. What is an apex cuboid?
17. List out the components of a star schema.
18. What is a snowflake schema?
19. List out the components of a fact constellation schema.
20. Point out the major difference between the star schema and the snowflake schema.
21. Which is more popular in data warehouse design, the star schema model or the snowflake schema model?
22. Define concept hierarchy.
23. Define total order.
24. Define partial order.
25. Define schema hierarchy.
26. List out the OLAP operations in the multidimensional data model.
27. What is the roll-up operation?
28. What is the drill-down operation?
29. What is the slice operation?
30. What is the dice operation?
31. What is the pivot operation?
32. List out the views in the design of a data warehouse.
33. What are the methods for developing large software systems?
34. How is the operation performed in the waterfall method?
35. How is the operation performed in the spiral method?
36. List out the steps of the data warehouse design process.
37. Define ROLAP.
38. Define MOLAP.
39. Define HOLAP.
40. What is an enterprise warehouse?
41. What is a data mart?
42. What are dependent and independent data marts?
43. What is a virtual warehouse?
44. Define indexing.
45. What are the types of indexing?
46. Differentiate partial and full materialization.
16 MARKS:
1. Discuss the components of a data warehouse.
2. List out the differences between OLTP and OLAP.
3. Discuss the various schematic representations in multidimensional data model.
4. Explain the OLAP operations in the multidimensional data model.
5. Explain the design and construction of a data warehouse.
6. Explain the three-tier data warehouse architecture.
7. Explain the process of data warehouse implementation.
UNIT 2:
2 MARKS:
1. What is data pre-processing?
2. Define data cleaning.
3. List out the methods to fill the missing values.
4. What is data smoothing?
5. Define data integration.
6. Define data transformation.
7. List out the methods used for data normalization.
8. Differentiate min-max and z-score normalization.
9. What do you mean by correlation analysis?
10. What are the methods used for data reduction?
11. Define DWT.
12. What is principal component analysis?
13. Differentiate linear and multiple regression.
14. What is a histogram?
15. Define sampling.
16. Define clustering.
17. What is visualization?
18. Define DMQL.
19. How is task-relevant data specified using DMQL?
20. List out the coupling schemes used by a data mining system.
21. Differentiate semi-tight and tight coupling.
22. What is concept description?
23. Differentiate descriptive and predictive data mining.
24. What is AOI?
25. When is attribute removal performed?
26. What is attribute generalization?
27. What are the different types of class comparison?
28. What is attribute relevance analysis?
29. Differentiate quartiles and outliers.
30. Define box plot.
16 MARKS:
1. Explain the various data cleaning and integration processes.
2. Explain the data mining task primitives.
3. Explain dimensionality reduction and numerosity reduction techniques.
4. Explain discretization and concept hierarchy generation.
5. Discuss the various data transformation techniques.
6. Explain DMQL.
7. Write short notes on data mining system architecture.
8. Write short notes on concept description.
9. Explain the statistical measures used in large databases.
10. Explain attribute-oriented induction and its implementation.
11. Write short notes on attribute relevance analysis.
UNIT 3:
2 MARKS:
1. What is association rule mining?
2. Define support.
3. Define Confidence.
4. How are association rules mined from large databases?
5. What is the classification of association rules based on various criteria?
6. What is the Apriori algorithm?
7. List the techniques for improving the efficiency of the Apriori algorithm.
8. What is partitioning?
9. Define transaction reduction.
10. How does sampling help to improve the efficiency of the Apriori algorithm?
11. List the drawbacks of the Apriori algorithm.
12. What is an FP-tree?
13. What is an iceberg query, and how is it used to improve market basket analysis?
14. List the different approaches to mining multilevel association rules.
15. What is controlled level-cross filtering?
16. Define level passage threshold.
16 MARKS:
1) Explain mining association rules in large databases.
2) Explain the Apriori algorithm with an example.
3) Explain the FP-growth algorithm with an example.
4) Discuss mining multilevel association rules and the different approaches used for it.
5) A database has four transactions. Let min_sup = 60% and min_conf = 80%.
TID   DATE      ITEMS_BOUGHT
T100  10/15/99  {K,A,D,B}
T200  10/15/99  {D,A,C,E,B}
T300  10/19/99  {C,A,B,E}
T400  10/22/99  {B,A,D}
i) Find all frequent itemsets using the Apriori algorithm.
ii) Find all frequent itemsets using the FP-growth algorithm.
6) A database has five transactions. Let min_sup = 60% and min_conf = 75%.
TID   ITEMS_BOUGHT
T100  {B,C,E,J}
T200  {B,C,J}
T300  {B,M,Y}
T400  {B,J,M}
T500  {C,J,M}
i) Find all frequent itemsets using the Apriori algorithm and FP-growth.
ii) List all of the strong association rules.
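As a rough cross-check for problems 5) and 6) above, the level-wise Apriori procedure can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference solution: the names apriori, strong_rules, db5 and db6 are hypothetical and introduced only for illustration, min_sup and min_conf are passed as whole percentages, and only Apriori is shown (no FP-tree is built, so the FP-growth parts still need to be worked by hand).

```python
from itertools import combinations


def apriori(transactions, min_sup_pct):
    """Return {itemset: support count} for every frequent itemset."""
    n = len(transactions)
    # Smallest count c with c / n >= min_sup_pct / 100 (integer ceiling).
    min_count = (n * min_sup_pct + 99) // 100

    def count(candidates):
        # Scan the database once and keep candidates that meet min_count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k for c, k in counts.items() if k >= min_count}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 2
    while level:
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


def strong_rules(frequent, min_conf_pct):
    """Return (lhs, rhs) pairs whose confidence is at least min_conf_pct."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence = sup(itemset) / sup(lhs), compared in integers.
                if sup * 100 >= min_conf_pct * frequent[lhs]:
                    rules.append((lhs, itemset - lhs))
    return rules


# Transactions from problems 5) and 6); each character is one item.
db5 = [set("KADB"), set("DACEB"), set("CABE"), set("BAD")]
db6 = [set("BCEJ"), set("BCJ"), set("BMY"), set("BJM"), set("CJM")]

for name, db, sup, conf in (("5)", db5, 60, 80), ("6)", db6, 60, 75)):
    freq = apriori(db, sup)
    print(name, sorted("".join(sorted(s)) for s in freq))
    for lhs, rhs in strong_rules(freq, conf):
        print("   ", "".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

Percentages are kept as integers so that thresholds such as 60% of 4 transactions are rounded up to a count of 3 without floating-point surprises.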
UNIT 4:
2 MARKS:
1. Define classification and prediction.
2. Describe the two common approaches to tree pruning.
3. What are Bayesian Classifiers?
4. What is a “decision tree”?
5. Where are decision trees mainly used?
6. How will you solve a classification problem using decision trees?
7. What is ID3?
8. What is decision tree pruning?
9. List some of the attribute selection measures used in decision tree induction.
10. What is Bayes' theorem?
11. What is a Bayesian belief network?
12. What is a k-nearest neighbor classifier?
13. Where is case-based reasoning used?
14. Differentiate eager learners and lazy learners.
15. Define least squares.
16. List some of the available prediction methods.
17. What is classifier accuracy?
18. What is the purpose of using a confusion matrix?
19. Define cluster analysis.
20. List out the types of data used in cluster analysis.
21. Differentiate the data matrix and the dissimilarity matrix.
22. Define Manhattan distance.
23. List the categories of clustering methods.
24. What are density-based and grid-based methods?
25. Differentiate the agglomerative and divisive approaches.
26. What are the disadvantages of the K-means algorithm?
27. List the initial inputs given to the K-medoids algorithm.
28. Define outliers.
29. Differentiate statistical-based and distance-based outlier detection.
30. What is the purpose of using a smoothing factor?
16 MARKS:
1. Explain the decision tree induction algorithm.
2. Explain the various attribute selection methods used in decision tree induction.
3. Explain the various methods used for prediction.
4. Explain cluster accuracy.
5. Explain Bayesian classification.
6. Write short notes on outlier analysis.
7. Write short notes on cluster partitioning methods.
8. Explain the K-means algorithm.
9. Explain the K-medoids algorithm.
10. What is cluster analysis? Explain the types of data used in cluster analysis.
11. Explain the various methods used for outlier detection.
12. Classify the given training samples using ID3 algorithm. Apply the same to construct a decision tree for the data given below. (Note: Use information gain as attribute selection measure.)
SIZE   COLOR   SHAPE  CLASS
Small  Yellow  Round  A
Big    Yellow  Round  A
Big    Red     Round  A
Small  Red     Round  A
Small  Black   Round  B
Big    Black   Cube   B
Big    Yellow  Cube   B
Big    Black   Round  B
Small  Yellow  Cube   B
13. The following table shows the mid-term and final exam grades obtained for students in a database course.
X (MIDTERM EXAM)  Y (FINAL EXAM)
72                84
50                63
81                71
74                78
94                90
86                75
59                49
83                79
65                77
33                52
88                74
81                90
Predict the final exam grade of a student who received 86 on the midterm exam.
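One way to approach problem 13 is the method of least squares for a straight line y = w0 + w1 x. The sketch below is illustrative only: fit_line is a hypothetical helper name, and the (X, Y) pairing assumes the table is read column-wise in the order the grades are listed, which is an assumption about the original layout.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Grades as listed in problem 13 (pairing is an assumption, see above).
midterm = [72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81]
final = [84, 63, 71, 78, 90, 75, 49, 79, 77, 52, 74, 90]

w0, w1 = fit_line(midterm, final)
prediction = w0 + w1 * 86
print(f"y = {w0:.3f} + {w1:.3f} x")
print(f"Predicted final exam grade for a midterm grade of 86: {prediction:.1f}")
```

The same two formulas (slope from the deviations about the means, intercept from the means) are what a hand calculation would use, so the printed coefficients can be compared against a worked answer.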
UNIT 5:
2 MARKS:
1. Define spatial databases.
2. Define transactional databases.
3. What is a temporal database?
4. Mention the types of dimensions in a spatial data cube.
5. Name some data mining applications.
6. What are the contributions of data mining to DNA analysis?
7. Name some examples of data mining in the retail industry.
8. Explain multimedia data mining.
9. What does web mining mean?
10. Define text mining.
11. How is the quality of text retrieval assessed?
12. List out the methods used for information retrieval.
13. What is web usage mining?
14. Differentiate time-series and sequence databases.
15. List the kinds of associations that can be mined from multimedia data.
16 MARKS:
1. Describe the applications and trends in data mining in detail.
2. Explain how data mining is used in the banking industry.
3. Explain how data mining is used in health care analysis.
4. Explain data mining applications for the telecommunication industry.
5. Explain data mining applications for the retail industry.
6. Explain data mining applications for financial data analysis.
7. Explain data mining applications for biomedical and DNA data analysis.
8. Explain spatial data mining in detail.
9. Explain text mining in detail.
10. Explain mining multimedia databases in detail.
11. Explain mining the WWW in detail.
12. Explain mining time-series and sequence data in detail.