This document contains four sets (Set No. 1 to 4) of a supplementary examination paper on data warehousing and data mining, each with 8 questions. The questions cover topics such as data mining processes and algorithms, data warehouse architectures, schema definitions, data integration issues, data cleaning techniques, clustering methods, decision trees, and association rule mining. Students were instructed to answer any 5 of the 8 questions in a set.
1. www.jntuworld.com www.jwjobs.net
Code No. M0522 R07 Set No.1
IV B.Tech I Semester Supplementary Examinations, February, 2012
DATA WAREHOUSING AND DATA MINING
(Computer Science and Engineering)
Time: 3 hours Max. Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
*****
1. a) What is data mining? Briefly explain the Knowledge discovery process.
b) Explain the three-tier data warehouse architecture.
2. a) With an example, describe any two schema (star/snowflake/fact constellation)
definitions using DMQL statements.
b) What is data integration? Discuss the issues to be considered for data integration.
3. a) Briefly describe data generalization, summarization and analytical characterization.
b) What are association and correlation? With an example, describe classification and
prediction.
4. What is constraint-based mining? Describe in detail the possible constraints in
high-level declarative DMQL and user interface.
5. a) What is backpropagation? Describe the backpropagation algorithm.
b) Discuss multidimensional association rule mining from relational databases.
6. a) Describe how the major clustering methods are categorized.
b) What is hierarchical clustering? Describe any one hierarchical clustering algorithm.
7. a) What is text mining? Describe the basic measures for text retrieval.
b) Briefly describe document cluster analysis.
8. What is conceptual clustering? Describe COBWEB in detail.
Code No. M0522 R07 Set No.2
IV B.Tech I Semester Supplementary Examinations, February, 2012
DATA WAREHOUSING AND DATA MINING
(Computer Science and Engineering)
Time: 3 hours Max. Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
*****
1. a) What is a concept hierarchy? Describe the OLAP operations in the Multidimensional
data model.
b) What are association and correlation? With an example, describe classification and
prediction.
2. a) What is data cleaning? Describe the approaches to fill missing values.
b) What is noisy data? Explain the binning methods for data smoothing.
3. a) Draw and explain the architecture of a typical data mining system.
b) Briefly discuss the functional components of a GUI-based data mining system.
4. a) Discuss the concepts involved in designing a GUI based on DMQL.
b) What is a data warehouse? Differentiate between operational database systems and
data warehouses.
5. a) What is a decision tree? With an example, briefly describe the algorithm for generating
a decision tree.
b) What is a concept hierarchy? Describe the OLAP operations in the Multidimensional
data model.
6. a) What is cluster analysis? Describe the dissimilarity measures for interval-scaled
variables and binary variables.
b) What are Bayesian classifiers? With an example, describe how to predict a class
label using naive Bayesian classification.
7. a) What is the misclassification rate of a classifier? Describe the sensitivity and specificity
measures of a classifier.
b) Describe constraint-based association mining.
8. What is spatial data mining? What is a spatial data cube, and what are the three
dimensions in a spatial data cube?
Code No. M0522 R07 Set No.3
IV B.Tech I Semester Supplementary Examinations, February, 2012
DATA WAREHOUSING AND DATA MINING
(Computer Science and Engineering)
Time: 3 hours Max. Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
*****
1. a) What is a transactional database? Describe any five advanced database systems.
b) Draw and explain the architecture of a typical data mining system.
2. a) What is data normalization? Explain any two normalization methods.
b) What is data dispersion? Describe the common measures for data dispersion.
3. With examples, describe in detail the available techniques for concept hierarchy
generation for categorical data.
4. a) What is Analytical characterization? What is the need to perform attribute relevance
analysis?
b) Describe the procedure for mining class comparisons.
5. a) What is an attribute selection measure? Briefly describe the attribute selection measures
for decision tree induction.
b) Briefly outline the major steps of decision tree classification.
6. a) Describe the dissimilarity measures for interval-scaled variables and binary variables.
b) Describe how the major clustering methods are categorized.
7. a) What is multimedia data mining? What kind of associations can be mined in
multimedia data?
b) What is grid-based clustering? Describe any one grid-based clustering algorithm.
8. What is the WWW? What is meant by an authoritative web page? Describe web usage mining.
Code No. M0522 R07 Set No.4
IV B.Tech I Semester Supplementary Examinations, February, 2012
DATA WAREHOUSING AND DATA MINING
(Computer Science and Engineering)
Time: 3 hours Max. Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
*****
1. a) Describe the three challenges to data mining regarding data mining methodology
and user interaction issues.
b) With an example, describe snowflake and fact constellations.
2. a) Briefly describe various forms of data pre-processing.
b) What is a measure? How are measures computed? Describe the organization of
measures.
3. a) What is a concept hierarchy? Briefly describe the OLAP operations in the
Multidimensional data model.
b) Briefly describe the primitives for specifying a data mining task.
4. a) What are quantitative association rules? What is ARCS? Discuss the steps involved.
b) What is a transactional database? With an example, explain multilevel association rule
mining.
5. a) Describe the criteria used to evaluate classification and prediction methods.
b) With an example, explain classification by decision tree induction.
6. a) What is density-based clustering? Describe the DBSCAN clustering algorithm.
b) What is a partitioning method? Describe any one partition-based clustering algorithm.
7. a) What is an informational data store? Briefly describe the characteristics of informational
data.
b) What is association rule mining? Briefly describe the criteria for classifying
association rules.
8. What is multimedia data mining? How can similarity search be performed on multimedia
data? Describe the contents of a multimedia data cube.