This document contains a question bank for a data warehousing and mining course. It includes questions ranging from 2 to 16 marks covering topics such as data warehousing concepts, OLAP operations, data pre-processing, association rule mining, classification, clustering, and applications of data mining. The questions cover key concepts, algorithms, and processes within each topic area. Examples are provided for applying algorithms such as Apriori, FP-Growth, ID3, k-means and k-medoids to sample datasets.
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentations for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer-developed notes that break down lecture and study material in a way that they can understand.
# Students can earn better grades, save time and study effectively.
Our Vision & Mission – Simplifying Students Life
Our Belief – “The great breakthrough in your life comes when you realize it, that you can learn anything you need to learn; to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
QUESTION BANK
SUBJECT: DATA WAREHOUSING AND MINING
UNIT 1:
2 MARKS
1. What are the key features of a data warehouse?
2. Define data warehouse.
3. What are operational databases?
4. Define OLTP.
5. Define OLAP.
6. How is a database design represented in OLTP systems?
7. How is a database design represented in OLAP systems?
8. Differentiate the features of an operational database and a data warehouse.
9. Write short notes on the multidimensional data model.
10. Define data cube.
11. What are facts?
12. What are dimensions?
13. Define dimension table.
14. Define fact table.
15. What is a lattice of cuboids?
16. What is an apex cuboid?
17. List out the components of a star schema.
18. What is a snowflake schema?
19. List out the components of a fact constellation schema.
20. Point out the major difference between the star schema and the snowflake schema.
21. Which is more popular in data warehouse design, the star schema model or the snowflake schema model?
22. Define concept hierarchy.
23. Define total order.
24. Define partial order.
25. Define schema hierarchy.
26. List out the OLAP operations in the multidimensional data model.
27. What is the roll-up operation?
28. What is the drill-down operation?
29. What is the slice operation?
30. What is the dice operation?
31. What is the pivot operation?
32. List out the views in the design of a data warehouse.
33. What are the methods for developing large software systems?
34. How is the operation performed in the waterfall method?
35. How is the operation performed in the spiral method?
36. List out the steps of the data warehouse design process.
37. Define ROLAP.
38. Define MOLAP.
39. Define HOLAP.
40. What is an enterprise warehouse?
41. What is a data mart?
42. What are dependent and independent data marts?
43. What is a virtual warehouse?
44. Define indexing.
45. What are the types of indexing?
46. Differentiate partial and full materialization.
16 MARKS:
1. Discuss the components of a data warehouse.
2. List out the differences between OLTP and OLAP.
3. Discuss the various schema representations in the multidimensional data model.
4. Explain the OLAP operations in the multidimensional data model.
5. Explain the design and construction of a data warehouse.
6. Explain the three-tier data warehouse architecture.
7. Explain the process of data warehouse implementation.
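As a study aid for the OLAP questions in this unit, the sketch below illustrates roll-up, slice and dice on a tiny in-memory fact table. Everything in it is an assumption made for illustration: the fact rows, the city-to-country concept hierarchy, and the function names `roll_up`, `slice_cube` and `dice_cube` are hypothetical and do not come from the question bank. A real OLAP server would work on materialized cuboids rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact table: (city, quarter, item, units_sold).
FACTS = [
    ("Chennai", "Q1", "phone", 10),
    ("Chennai", "Q2", "phone", 12),
    ("Mumbai", "Q1", "phone", 7),
    ("Mumbai", "Q1", "laptop", 3),
    ("Toronto", "Q1", "laptop", 5),
    ("Toronto", "Q2", "phone", 4),
]

# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Chennai": "India", "Mumbai": "India", "Toronto": "Canada"}


def roll_up(facts):
    """Roll-up: climb the location hierarchy from city to country and re-aggregate."""
    totals = defaultdict(int)
    for city, quarter, item, units in facts:
        totals[(CITY_TO_COUNTRY[city], quarter, item)] += units
    return dict(totals)


def slice_cube(facts, quarter):
    """Slice: select on a single dimension (here, one quarter)."""
    return [f for f in facts if f[1] == quarter]


def dice_cube(facts, cities, quarters):
    """Dice: select on two or more dimensions at once."""
    return [f for f in facts if f[0] in cities and f[1] in quarters]


print(roll_up(FACTS))
print(slice_cube(FACTS, "Q1"))
print(dice_cube(FACTS, {"Chennai", "Mumbai"}, {"Q1"}))
```

Drill-down is the reverse of `roll_up` (moving from country back to city), and pivot only rotates the axes used for presentation, so neither needs a separate aggregation step in this sketch.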
UNIT 2:
2 MARKS
1. What is data pre-processing?
2. Define data cleaning.
3. List out the methods to fill in missing values.
4. What is data smoothing?
5. Define data integration.
6. Define data transformation.
7. List out the methods used for data normalization.
8. Differentiate min-max and z-score normalization.
9. What do you mean by correlation analysis?
10. What are the methods used for data reduction?
11. Define DWT.
12. What is principal component analysis?
13. Differentiate linear and multiple regression.
14. What is a histogram?
15. Define sampling.
16. Define clustering.
17. What is visualization?
18. Define DMQL.
19. How is task-relevant data specified using DMQL?
20. List out the coupling schemes used by a data mining system.
21. Differentiate semi-tight and tight coupling.
22. What is concept description?
23. Differentiate descriptive and predictive data mining.
24. What is AOI?
25. When should attribute removal be performed?
26. What is attribute generalization?
27. What are the different types of class comparison?
28. What is attribute relevance analysis?
29. Differentiate quartiles and outliers.
30. Define box plot.
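Questions 7 and 8 above concern min-max and z-score normalization. The following is a minimal sketch of both, assuming the usual textbook definitions: min-max maps a value v to (v - min) / (max - min), rescaled to a target range, and z-score maps v to (v - mean) / standard deviation. The sample values and the choice of the population standard deviation (`statistics.pstdev`) are assumptions made for illustration; some texts use the sample standard deviation instead.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: centre on the mean, scale by the standard deviation.

    Assumption: the population standard deviation is used.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical attribute values, chosen only to make the arithmetic easy to follow.
data = [200, 300, 400, 600, 1000]
print(min_max(data))
print([round(z, 3) for z in z_score(data)])
```

Min-max preserves the relative spacing of the original values but depends on the observed minimum and maximum, so a single outlier compresses everything else; z-score is less sensitive to that and is useful when the true minimum and maximum are unknown.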
16 MARKS:
1. Explain the various data cleaning and integration processes.
2. Explain the data mining primitive tasks.
3. Explain dimensionality reduction and numerosity reduction techniques.
4. Explain discretization and concept hierarchy generation.
5. Discuss the various data transformation techniques.
6. Explain DMQL.
7. Write short notes on data mining system architecture.
8. Write short notes on concept description.
9. Explain the statistical measures used in large databases.
10. Explain attribute-oriented induction and its implementation.
11. Write short notes on attribute relevance analysis.
UNIT 3:
2 MARKS
1. What is association rule mining?
2. Define support.
3. Define confidence.
4. How are association rules mined from large databases?
5. What is the classification of association rules based on various criteria?
6. What is the Apriori algorithm?
7. List the techniques for improving the efficiency of the Apriori algorithm.
8. What is partitioning?
9. Define transaction reduction.
10. How does sampling help to improve the efficiency of the Apriori algorithm?
11. List the drawbacks of the Apriori algorithm.
12. What is an FP-tree?
13. What is an iceberg query, and how is it used to improve market basket analysis?
14. List the different approaches to mining multilevel association rules.
15. What is controlled level-cross filtering?
16. Define level passage threshold.
16 MARKS:
1) Explain the mining of association rules in large databases.
2) Explain the Apriori algorithm with an example.
3) Explain the FP-growth algorithm with an example.
4) Discuss the mining of multi-level association rules and the different approaches used for it.
5) A database has four transactions. Let min_sup = 60% and min_conf = 80%.
TID     DATE        ITEMS_BOUGHT
T100    10/15/99    {K,A,D,B}
T200    10/15/99    {D,A,C,E,B}
T300    10/19/99    {C,A,B,E}
T400    10/22/99    {B,A,D}
i) Find all frequent itemsets using the Apriori algorithm.
ii) Find all frequent itemsets using the FP-growth algorithm.
6) A database has five transactions. Let min_sup = 60% and min_conf = 75%.
TID     ITEMS_BOUGHT
T100    {B,C,E,J}
T200    {B,C,J}
T300    {B,M,Y}
T400    {B,J,M}
T500    {C,J,M}
i) Find all frequent itemsets using the Apriori algorithm and FP-growth.
ii) List all of the strong association rules.
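For checking hand-worked answers to questions 5 and 6, the sketch below implements a small level-wise Apriori (join, prune, count) and a strong-rule generator. It is an illustration, not a reference solution: the function names `apriori` and `strong_rules` are hypothetical, thresholds are passed as whole percentages, and support is compared with integer arithmetic to avoid floating-point edge cases at exactly 60% or 75%.

```python
from itertools import combinations


def apriori(transactions, min_sup_pct):
    """Return {itemset: support_count} for every frequent itemset."""
    n = len(transactions)

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    def frequent(counts):
        # count / n >= min_sup_pct / 100, written without division.
        return {c: k for c, k in counts.items() if k * 100 >= min_sup_pct * n}

    items = sorted({i for t in transactions for i in t})
    level = frequent(count([frozenset([i]) for i in items]))
    result = dict(level)
    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
        level = frequent(count(candidates))
        result.update(level)
        k += 1
    return result


def strong_rules(freq, min_conf_pct):
    """Return (lhs, rhs) pairs whose confidence sup(lhs U rhs) / sup(lhs) meets the threshold."""
    rules = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                if sup * 100 >= min_conf_pct * freq[lhs]:
                    rules.append((lhs, itemset - lhs))
    return rules


# Transactions copied from questions 5 and 6.
db5 = [set("KADB"), set("DACEB"), set("CABE"), set("BAD")]
db6 = [set("BCEJ"), set("BCJ"), set("BMY"), set("BJM"), set("CJM")]

for itemset, sup in sorted(apriori(db5, 60).items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), sup)

for lhs, rhs in strong_rules(apriori(db6, 75 and 60), 75):
    print(sorted(lhs), "->", sorted(rhs))
```

This covers only the Apriori half of each question. FP-growth should yield the same frequent itemsets, since it differs in how they are found (a prefix tree instead of candidate generation), not in what counts as frequent.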
UNIT 4:
2 MARKS
1. Define classification and prediction.
2. Describe the two common approaches to tree pruning.
3. What are Bayesian classifiers?
4. What is a "decision tree"?
5. Where are decision trees mainly used?
6. How will you solve a classification problem using decision trees?
7. What is ID3?
8. What is decision tree pruning?
9. List some of the attribute selection measures used in decision tree induction.
10. What is Bayes' theorem?
11. What is a Bayesian belief network?
12. What is a k-nearest neighbor classifier?
13. Where is case-based reasoning used?
14. Differentiate eager learner and lazy learner.
15. Define least squares.
16. List some available prediction methods.
17. What is classifier accuracy?
18. What is the purpose of using a confusion matrix?
19. Define cluster analysis.
20. List out the types of data used in cluster analysis.
21. Differentiate data matrix and dissimilarity matrix.
22. Define Manhattan distance.
23. List the categorization of clustering methods.
24. What are density-based and grid-based methods?
25. Differentiate agglomerative and divisive approaches.
26. What are the disadvantages of the K-means algorithm?
27. List the initial inputs given to the K-medoids algorithm.
28. Define outliers.
29. Differentiate statistical-based and distance-based outlier detection.
30. What is the purpose of using a smoothing factor?
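Questions 21 and 22 above ask about the dissimilarity matrix and Manhattan distance. The sketch below shows how the two relate, assuming objects described by numeric attributes: Manhattan distance sums the absolute attribute differences, Euclidean distance is included for contrast, and the dissimilarity matrix holds the pairwise distances. The three points and the function names are hypothetical and chosen only for illustration.

```python
import math


def manhattan(p, q):
    """Manhattan (city-block) distance: sum of absolute attribute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dissimilarity_matrix(points, distance):
    """Object-by-object matrix of pairwise distances (symmetric, zero diagonal)."""
    return [[distance(p, q) for q in points] for p in points]


# Hypothetical data matrix: three objects, two numeric attributes each.
points = [(1, 2), (3, 5), (4, 1)]

for row in dissimilarity_matrix(points, manhattan):
    print(row)
```

The data matrix (`points`) is object-by-attribute, while the dissimilarity matrix is object-by-object, which is why clustering methods that only need pairwise distances can work from the latter alone.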
16 MARKS:
1. Explain the decision tree induction algorithm.
2. Explain the various attribute selection methods used in decision tree induction.
3. Explain the various methods used for prediction.
4. Explain cluster accuracy.
5. Explain Bayesian classification.
6. Write short notes on outlier analysis.
7. Write short notes on cluster partitioning methods.
8. Explain the K-means algorithm.
9. Explain the K-medoids algorithm.
10. What is cluster analysis? Explain the types of data used in cluster analysis.
11. Explain the various methods used for outlier detection.
12. Classify the given training samples using the ID3 algorithm. Apply the same to construct a decision tree for the data given below. (Note: Use information gain as the attribute selection measure.)
SIZE     COLOR     SHAPE    CLASS
Small    Yellow    Round    A
Big      Yellow    Round    A
Big      Red       Round    A
Small    Red       Round    A
Small    Black     Round    B
Big      Black     Cube     B
Big      Yellow    Cube     B
Big      Black     Round    B
Small    Yellow    Cube     B
13. The following table shows the mid-term and final exam grades obtained for students in a database course.
X (MIDTERM EXAM)    Y (FINAL EXAM)
72                  84
50                  63
81                  71
74                  78
94                  90
86                  75
59                  49
83                  79
65                  77
33                  52
88                  74
81                  90
Predict the final exam grade of a student who received 86 on the midterm exam.
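The two worked problems above (questions 12 and 13) can be checked with a short script. The sketch below is a study aid under stated assumptions: the training rows and grade pairs are read column by column from the tables in those questions, so the row pairing is an assumption; the first half computes the information gain of each attribute to pick the ID3 root, and the second half fits a least-squares line y = w0 + w1 * x and evaluates it at x = 86. Function and variable names are hypothetical.

```python
import math
from collections import Counter

# Question 12 rows: (SIZE, COLOR, SHAPE, CLASS). Pairing assumed from column order.
ROWS = [
    ("Small", "Yellow", "Round", "A"), ("Big", "Yellow", "Round", "A"),
    ("Big", "Red", "Round", "A"), ("Small", "Red", "Round", "A"),
    ("Small", "Black", "Round", "B"), ("Big", "Black", "Cube", "B"),
    ("Big", "Yellow", "Cube", "B"), ("Big", "Black", "Round", "B"),
    ("Small", "Yellow", "Cube", "B"),
]
ATTRIBUTES = ["SIZE", "COLOR", "SHAPE"]


def entropy(labels):
    """Expected information (in bits) needed to classify a sample."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, col):
    """Entropy of the class labels minus the weighted entropy after splitting on col."""
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
root = max(gains, key=gains.get)
print({name: round(g, 4) for name, g in gains.items()}, "root:", root)

# Question 13 pairs: (midterm x, final y). Pairing assumed from column order.
GRADES = [(72, 84), (50, 63), (81, 71), (74, 78), (94, 90), (86, 75),
          (59, 49), (83, 79), (65, 77), (33, 52), (88, 74), (81, 90)]


def fit_line(pairs):
    """Least-squares estimates (slope w1, intercept w0) for y = w0 + w1 * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(GRADES)
print("predicted final grade for x = 86:", round(intercept + slope * 86, 1))
```

ID3 would continue by recursing on each branch of the chosen root with the remaining attributes; the script only selects the first split.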
UNIT 5:
2 MARKS
1. Define spatial databases.
2. Define transactional databases.
3. What is a temporal database?
4. Mention the types of dimensions in a spatial data cube.
5. Name some data mining applications.
6. What are the contributions of data mining to DNA analysis?
7. Name some examples of data mining in the retail industry.
8. Explain multimedia data mining.
9. What does web mining mean?
10. Define text mining.
11. How is the quality of text retrieval assessed?
12. List out the methods used for information retrieval.
13. What is web usage mining?
14. Differentiate time-series and sequence databases.
15. List the kinds of associations that can be mined from multimedia data.
16 MARKS:
1. Describe the applications and trends in data mining in detail.
2. Explain how data mining is used in the banking industry.
3. Explain how data mining is used in health care analysis.
4. Explain data mining applications for the telecommunication industry.
5. Explain data mining applications for the retail industry.
6. Explain data mining applications for financial data analysis.
7. Explain data mining applications for biomedical and DNA data analysis.
8. Explain spatial data mining in detail.
9. Explain text mining in detail.
10. Explain the mining of multimedia databases in detail.
11. Explain the mining of the WWW in detail.
12. Explain the mining of time-series and sequence data in detail.