Data volumes grow day by day, and most data is stored in a database after manual transformations and derivations. Data-intensive applications let scientists study and understand the behaviour of complex systems. In a data-intensive application, a scientific model consumes raw data products, collected from various sources (physical, geological, environmental, chemical, biological, etc.), to produce new data products. Given a generated output, it is important to be able to trace an output data product back to its source values when that output has an unexpected value. Data provenance helps scientists investigate the origin of an unexpected value. In this paper our aim is to find the reason behind an unexpected value in a database using query inversion, and we propose hypotheses for constructing inverse queries for complex aggregation functions and multi-relation operations (joins, set operations).
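As a minimal illustration of the inversion idea (a hypothetical sqlite3 sketch, not the paper's proposed algorithm): when an aggregated output looks unexpected, an inverse query recovers the source tuples that produced it.

```python
import sqlite3

# Hypothetical 'readings' table of raw data products per site.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (site TEXT, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?)",
                [("A", 2.1), ("A", 950.0), ("B", 3.3), ("B", 2.9)])

# Forward query: aggregate per site.
for site, total in con.execute(
        "SELECT site, SUM(value) FROM readings GROUP BY site"):
    print(site, total)          # site A has a suspiciously large SUM

# Inverse query for the unexpected group: recover its source tuples
# so the outlying input (950.0) can be identified.
rows = con.execute(
    "SELECT rowid, value FROM readings WHERE site = ?", ("A",)).fetchall()
print(rows)
```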
Enhancement techniques for data warehouse staging area
Poor performance can turn a successful data warehousing project into a failure. Consequently, several attempts have been made by various researchers to deal with the problem of scheduling the Extract-Transform-Load (ETL) process. In this paper we therefore present several approaches for enhancing the data warehousing Extract, Transform and Load stages. We focus on enhancing the performance of the extract and transform phases by proposing two algorithms that reduce the time needed in each phase by exploiting the hidden semantic information in the data. Using the semantic information, a large volume of useless data can be pruned at an early design stage. We also address the problem of scheduling the execution of the ETL activities, with the goal of minimizing ETL execution time. We explore this area by selecting three scheduling techniques for ETL. Finally, we experimentally show their behavior in terms of execution time in the sales domain, to understand the impact of implementing each of them and to choose the one leading to the greatest performance enhancement.
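To make the scheduling objective concrete, a minimal sketch (not the paper's algorithms) comparing two classic policies on hypothetical ETL activities:

```python
# Compare FIFO and shortest-job-first total flow time for a set of
# independent ETL activities with known durations on a single worker.
from itertools import accumulate

durations = {"extract_orders": 40, "transform_dims": 10,
             "load_facts": 25, "transform_facts": 15}  # hypothetical minutes

def total_flow_time(order):
    # Sum of completion times when activities run back to back.
    return sum(accumulate(durations[a] for a in order))

fifo = list(durations)                      # arrival order
sjf = sorted(durations, key=durations.get)  # shortest job first
print("FIFO:", total_flow_time(fifo))
print("SJF :", total_flow_time(sjf))        # SJF minimizes total flow time
```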
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we treat the clustering task as cataloguing the search results. By catalogue we mean a structured label list that helps the user understand the labels and search results. Cluster labelling is crucial because meaningless or confusing labels may mislead users into checking the wrong clusters for the query and wasting time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced. Emphasis is placed on producing comprehensible and accurate cluster labels in addition to discovering document clusters. We also present a new metric that we employ to assess the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. We perform the experiments using the publicly available datasets Ambient and ODP-239.
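For contrast, a generic labelling baseline (not the method proposed in the paper) tags each cluster with its highest-scoring TF-IDF terms:

```python
# Label each cluster of search-result snippets with its top TF-IDF terms,
# so labels reflect cluster content.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

docs = ["apple iphone release", "android phone update",
        "jaguar speed in the wild", "lions and jaguars hunt"]  # toy snippets
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

terms = np.array(vec.get_feature_names_out())
for c in set(labels):
    centroid = X[labels == c].mean(axis=0).A1     # mean TF-IDF per term
    top = terms[np.argsort(centroid)[::-1][:3]]
    print(f"cluster {c}: label = {', '.join(top)}")
```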
One of the most important problems in modern finance is finding efficient ways to summarize and visualize stock market data, so as to give individuals or institutions useful information about market behavior for investment decisions. Investment is one of the fundamental pillars of the national economy, so at present many investors look for criteria to compare stocks and select the best ones, and choose strategies that maximize the earning value of the investment process. The enormous amount of valuable data generated by the stock market has attracted researchers to explore this problem domain using different methodologies, and research in data mining has gained attention due to the importance of its applications and the ever-increasing generation of information. Data mining tools such as association rules, rule induction methods and the Apriori algorithm are used to find associations between different scripts of the stock market, and much research and development has addressed the reasons for fluctuations of the Indian stock exchange. Nowadays, however, two factors, gold prices and US dollar prices, dominate the Indian stock market; statistical correlation is used to find the correlation between gold prices, dollar prices and the BSE index, and this supports the activities of stock operators, brokers, investors and jobbers, which are based on forecasting the fluctuations of index share prices, gold prices, dollar prices and customer transactions. Hence the researcher has taken these problems as a topic for research.
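The correlation computation itself is simple; here is a minimal sketch with hypothetical daily series (not real market data):

```python
# Pairwise Pearson correlation between gold, USD and the BSE index.
import pandas as pd

df = pd.DataFrame({
    "gold": [1800, 1795, 1810, 1825, 1815],      # hypothetical prices
    "usd":  [74.2, 74.5, 74.1, 73.9, 74.0],      # INR per USD
    "bse":  [58000, 57800, 58200, 58500, 58350],
})
print(df.corr(method="pearson"))   # correlation matrix of the three series
```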
With the rapid development of Geographic Information Systems (GISs) and their applications, more and more geographical databases have been developed by different vendors. However, data integration and access remain a big problem for the development of GIS applications, as no interoperability exists among different spatial databases. In this paper we propose a unified approach for spatial data querying. The paper describes a framework for integrating information from repositories containing different vector data set formats and repositories containing raster datasets. The presented approach converts different vector data formats into a single unified format (File Geo-Database “GDB”). In addition, we employ “metadata” to support a wide range of users’ queries and retrieve relevant geographic information from heterogeneous and distributed repositories. This enhances both query processing and performance.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
Many applications of automatic document classification require learning accurately with little training data. The semi-supervised classification technique uses labeled and unlabeled data for training. This technique has been shown to be effective in some cases; however, the use of unlabeled data is not always beneficial.
On the other hand, the emergence of web technologies has enabled the collaborative development of ontologies. In this paper, we propose the use of ontologies to improve the accuracy and efficiency of semi-supervised document classification.
We use support vector machines, one of the most effective algorithms that have been studied for text. Our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying our algorithm to three different datasets. Our experiments show an accuracy increase of 4% on average, and up to 20%, in comparison with the traditional semi-supervised model.
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES
Keyword search in relational databases allows users to search for information without knowing the database schema or using Structured Query Language (SQL). In this paper, we address the problem of generating and evaluating candidate networks. In candidate network generation, overhead is caused by the growing number of joining tuple sets as the size of the minimal candidate network increases. To reduce this overhead, we propose candidate network generation algorithms that generate a minimum number of joining tuple sets according to the maximum number of tuple sets. We first generate a set of joining tuple sets, the candidate networks (CNs). It is difficult to obtain an optimal query processing plan when generating a large number of joins, so we also develop a dynamic CN evaluation algorithm (D_CNEval) to generate connected tuple trees (CTTs) while reducing the size of intermediate join results. The performance evaluation of the proposed algorithms is conducted on the IMDB and DBLP datasets and compared with existing algorithms.
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
With the emergence of XML as the de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective techniques for querying XML is a major and current concern of the XML database community. Most studies carried out to help solve this problem are oriented towards the evaluation of so-called exact queries which, unfortunately, are likely (especially in the case of semi-structured documents) to yield overabundant results (in the case of vague queries) or empty results (in the case of very precise queries). From the observation that users who issue requests are not necessarily interested in all possible solutions, but rather in those that are closest to their needs, an important field of research has opened up on the evaluation of preference queries. In this paper, we propose an approach for the evaluation of such queries in the case where the preferences concern the structure of the document. The solution investigated revolves around an evaluation plan in three phases: rewriting, evaluation and merge. The rewriting phase makes it possible to obtain, from a partitioning-transformation operation on the initial query, a hierarchical set of preference path queries which are holistically evaluated in the second phase by an instrumented version of the TwigStack algorithm. The merge phase synthesizes the best results.
The D-basis Algorithm for Association Rules of High Confidence
We develop a new approach for distributed computation of association rules of high confidence on the attributes/columns of a binary table. It is derived from the D-basis algorithm developed by K. Adaricheva and J. B. Nation (Theoretical Computer Science, 2017), which is run multiple times on sub-tables of a given binary table, obtained by removing one or more rows. The sets of rules retrieved in these runs are then aggregated. This allows us to obtain a basis of association rules of high confidence, which can be used for ranking all attributes of the table with respect to a given fixed attribute. This paper focuses on some algorithmic details and the technical implementation of the new algorithm. Results are given for tests performed on random, synthetic and real data.
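A minimal sketch of the row-removal-and-aggregate scheme (a toy confidence miner stands in for the actual D-basis computation):

```python
# Mine high-confidence rules a -> b on each sub-table obtained by deleting
# one row of a binary table, then aggregate the rule sets across runs.
import numpy as np

table = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1],
                  [1, 0, 1]])          # rows = objects, columns = attributes

def rules(t, min_conf=0.9):
    found = set()
    for a in range(t.shape[1]):
        for b in range(t.shape[1]):
            if a == b or t[:, a].sum() == 0:
                continue
            conf = (t[:, a] & t[:, b]).sum() / t[:, a].sum()
            if conf >= min_conf:
                found.add((a, b))
    return found

aggregated = set()
for i in range(table.shape[0]):        # one run per removed row
    sub = np.delete(table, i, axis=0)
    aggregated |= rules(sub)
print(sorted(aggregated))
```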
A Framework for Visualizing Association Mining Results
Association mining is one of the most used data mining techniques due to its interpretable and actionable results. In this study we propose a framework to visualize association mining results, specifically frequent itemsets and association rules, as graphs. We demonstrate the applicability and usefulness of our approach through a Market Basket Analysis (MBA) case study in which we visually explore the data mining results for a supermarket data set. In this case study we derive several interesting insights regarding the relationships among the items and suggest how they can be used as a basis for decision making in retailing.
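A minimal sketch of the rule-to-graph mapping with hypothetical MBA rules (networkx and matplotlib assumed):

```python
# Draw association rules as a directed graph; edge label = confidence.
import networkx as nx
import matplotlib.pyplot as plt

rules = [("bread", "butter", 0.8), ("butter", "jam", 0.6),
         ("beer", "chips", 0.7)]            # hypothetical rules

G = nx.DiGraph()
for antecedent, consequent, conf in rules:
    G.add_edge(antecedent, consequent, weight=conf)

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=1500)
nx.draw_networkx_edge_labels(G, pos, edge_labels={
    (u, v): f"{d['weight']:.1f}" for u, v, d in G.edges(data=True)})
plt.show()
```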
Application Of Extreme Value Theory To Bursts Prediction
Bursts and extreme events in quantities such as connection durations, file sizes, throughput, etc. may produce undesirable consequences in computer networks, deterioration of the quality of service being a major one. Predicting these extreme events and bursts is important: it helps in reserving the right resources for a better quality of service. We applied extreme value theory (EVT) to predict bursts in network traffic, and took a deeper look into the application of EVT using EVT-based exploratory data analysis. We found that traffic naturally divides into two categories, internal and external traffic. The internal traffic follows a generalized extreme value (GEV) model with a negative shape parameter, which is the Weibull distribution. The external traffic follows a GEV with a positive shape parameter, which is the Fréchet distribution. These findings are of great value to the quality of service in data networks, especially when included in service level agreements as traffic descriptor parameters.
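A minimal sketch of fitting a GEV model to block maxima with SciPy (synthetic data; note that SciPy's shape parameter `c` is the negative of the conventional GEV shape, so `c > 0` is the Weibull type and `c < 0` the Fréchet type):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Hypothetical block maxima, e.g. per-minute peak throughput values.
maxima = rng.gumbel(loc=100.0, scale=15.0, size=500)

c, loc, scale = genextreme.fit(maxima)
print(f"shape c={c:.3f}, loc={loc:.1f}, scale={scale:.1f}")
# Probability of a burst exceeding a threshold under the fitted model:
print("P(X > 160) =", genextreme.sf(160.0, c, loc=loc, scale=scale))
```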
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)
Any abnormal activity can be assumed to be an anomalous intrusion. In the literature, several techniques and algorithms have been discussed for anomaly detection, and in most cases the true positive and false positive parameters have been used to compare their performance. However, depending upon the application, a wrong true positive or a wrong false positive may have severe detrimental effects. This necessitates the inclusion of cost-sensitive parameters in the performance evaluation. Moreover, the most common testing dataset, KDD-CUP-99, has a huge volume of data, which in turn requires a certain amount of pre-processing. Our work in this paper starts by establishing the necessity of cost-sensitive analysis with some real-life examples. After discussing KDD-CUP-99, an approach is proposed for feature elimination and then feature selection, to directly reduce the features to the more relevant ones and indirectly reduce the size of KDD-CUP-99. From the reported literature, general methods for anomaly detection are selected which perform best for different types of attacks. These classifiers are combined to form an ensemble. A cost-opportunistic technique is suggested to allocate relative weights to the classifier ensemble for generating the final result. A cost-sensitivity analysis of the true positive and false positive results is performed, and a method is proposed to select the elements of the cost-sensitivity matrix for further improving the results and achieving overall better performance. The impact on the performance trade-off of incorporating cost sensitivity is discussed.
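As an illustration of weighting an ensemble by cost-sensitive performance (a simplified sketch, not the paper's exact cost-opportunistic technique):

```python
# Combine per-attack-type classifiers with weights proportional to their
# cost-weighted utility (TP rate rewarded, FP rate penalized).
import numpy as np

# Hypothetical scores: rows = classifiers, cols = (TP rate, FP rate)
perf = np.array([[0.95, 0.10],
                 [0.88, 0.02],
                 [0.91, 0.05]])
cost_tp, cost_fp = 1.0, 5.0         # false alarms cost 5x a detection's value

utility = cost_tp * perf[:, 0] - cost_fp * perf[:, 1]
weights = utility / utility.sum()   # relative weights for the ensemble
print("weights:", np.round(weights, 3))

votes = np.array([1, 0, 1])         # hypothetical per-classifier decisions
print("ensemble says attack:", weights @ votes > 0.5)
```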
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN...
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets is a crucial step in analysing structured data and in finding association relationships between items, and it serves as an elementary foundation for supervised learning, which encompasses classifier and feature extraction methods. Applying this algorithm is crucial for understanding the behaviour of structured data. Most structured data in the scientific domain is voluminous, and processing such data requires state-of-the-art computing machines. Setting up such an infrastructure is expensive, so a distributed environment such as a cluster is employed to tackle such scenarios. The Apache Hadoop distribution is one of the cluster frameworks for distributed environments that helps by distributing voluminous data across a number of nodes in the cluster. This paper focuses on a map/reduce design and implementation of the Apriori algorithm for structured data analysis.
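One map/reduce pass of Apriori can be sketched as follows (plain Python simulating the mappers and the reducer; a Hadoop job would express the same logic):

```python
# Mappers count candidate k-itemsets in their data chunk; the reducer sums
# the counts and applies the minimum-support threshold.
from itertools import combinations
from collections import Counter

chunks = [
    [{"a", "b", "c"}, {"a", "b"}],   # chunk on node 1
    [{"b", "c"}, {"a", "b", "c"}],   # chunk on node 2
]

def mapper(chunk, k=2):
    # Emit (itemset, 1) for every size-k candidate in a transaction.
    for txn in chunk:
        for cand in combinations(sorted(txn), k):
            yield cand, 1

def reducer(pairs, min_support=2):
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {i: c for i, c in counts.items() if c >= min_support}

emitted = [kv for chunk in chunks for kv in mapper(chunk)]
print(reducer(emitted))   # frequent 2-itemsets across all chunks
```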
An Improved Differential Evolution Algorithm for Data Stream Clustering
A few algorithms have been developed by researchers for clustering data streams. Most of these algorithms require that the number of clusters (K) be fixed by the user based on the input data, and kept fixed throughout the clustering process. Stream clustering therefore faces difficulties in choosing K. In this paper, we propose an efficient approach for data stream clustering by adopting an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is a fast, robust and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We compare our proposed method with a genetic algorithm and identify it as the more proficient optimization algorithm. The performance of our proposed technique is assessed, yielding an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30% and an F-measure of 88.60%.
Scalable frequent itemset mining using heterogeneous computing par apriori a...
Association rule mining is one of the dominant tasks of data mining, concerned with finding frequent itemsets in large volumes of data in order to produce summarized models of mined rules. These models are extended to generate association rules in various applications such as e-commerce, bio-informatics, associations between image contents and non-image features, analysis of the effectiveness of sales in the retail industry, etc. With ever-growing databases, the major challenge is mining frequent itemsets in a very short period of time; as data grows, the time taken to process it should remain almost constant. Since high-performance computing offers many processors and many cores, consistent runtime performance for association rule mining on such very large databases can be achieved; we must therefore rely on high-performance parallel and/or distributed computing. In our literature survey, we studied sequential Apriori algorithms and identified the fundamental problems in the sequential and parallel environments. We propose ParApriori, a parallel algorithm for GPGPUs, and we also analyse the results of our GPU parallel algorithm. We find that the proposed algorithm improves computing time and offers consistent performance over increasing load. The empirical analysis of the algorithm also verifies its efficiency and scalability over a series of datasets experimented on a many-core GPU platform.
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
The Science Information Network (SINET) is a Japanese academic backbone network for more than 800 universities and research institutions. The characteristic of SINET traffic is that it is enormous and highly variable. In this paper, we present a task-decomposition based anomaly detection for the massive and high-volatility session data of SINET. Three main features are discussed: task scheduling, traffic discrimination, and histogramming. We adopt a task-decomposition based dynamic scheduling method to handle SINET's massive session data stream. In the experiment, we analysed SINET traffic from 2/27 to 3/8 and detected some anomalies using LSTM-based time-series data processing.
Simplified Data Processing On Large Cluster
A computer cluster consists of a set of loosely or tightly connected computers that work together so that in many respects they can be viewed as a single system. They are connected through a fast local area network and are deployed to improve performance over that of a single computer. On the web, large amounts of data are stored, processed and retrieved within a few milliseconds. Doing so with a single computer is a very difficult task, so we require a cluster of machines to perform it.
Using a cluster for processing data is not enough on its own, however; we need a technique that can perform this task easily and efficiently. The MapReduce programming model is used for this type of processing. In this model, users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
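The model described above can be sketched in a few lines (the shuffle/grouping that a MapReduce runtime performs is simulated here in plain Python):

```python
# Word count expressed as a map function and a reduce function.
from collections import defaultdict

def map_fn(_, line):
    for word in line.split():
        yield word, 1                 # intermediate key/value pairs

def reduce_fn(word, counts):
    yield word, sum(counts)           # merge values per key

lines = {1: "the quick brown fox", 2: "the lazy dog the end"}
groups = defaultdict(list)
for key, line in lines.items():
    for w, c in map_fn(key, line):
        groups[w].append(c)           # shuffle phase: group by key
for w in groups:
    print(next(reduce_fn(w, groups[w])))
```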
Vision Based Deep Web data Extraction on Nested Query Result Records
Enhancing Keyword Query Results Over Database for Improving User Satisfaction
Storing data in relational databases to support keyword queries is widely increasing, but search results often do not give effective answers to keyword queries, which makes the approach inflexible from the user's perspective. It would be helpful to recognize such queries, which produce results with low ranking. Here we use query performance prediction to determine the effectiveness of a search performed in response to a query, and the features of such hard queries are studied by taking into account the contents of the database and the result list. One related problem in databases is the presence of missing data, which can be handled by imputation. Here an inTeractive Retrieving-Inferring data imputation method (TRIP) is used, which performs retrieving and inferring alternately to fill in the missing attribute values in the database. By considering both the prediction of hard queries and imputation over the database, we can get better keyword search results.
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information. Machine learning techniques have proven useful for software bug prediction. In this paper, a comparative performance analysis of different machine learning techniques is explored for software bug prediction on publicly available data sets. The results show that most of the machine learning methods performed well on the software bug datasets.
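A minimal sketch of such a comparison (synthetic stand-in data, not the public bug datasets used in the paper):

```python
# Cross-validated comparison of three common learners on an imbalanced
# binary problem, as bug data typically is.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=20, weights=[0.8],
                           random_state=0)

for name, clf in [("NaiveBayes", GaussianNB()),
                  ("DecisionTree", DecisionTreeClassifier(random_state=0)),
                  ("LogisticReg", LogisticRegression(max_iter=1000))]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    print(f"{name:12s} mean F1 = {scores.mean():.3f}")
```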
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
Web usage mining is the process of extracting interesting patterns from Web usage log files. It is a subfield of data mining that uses various data mining techniques to produce association rules from transaction data. Most of the time transactions are boolean, whereas Web usage data consists of quantitative values. To handle these real-world quantitative data, we use a fuzzy data mining algorithm for extracting association rules from quantitative Web log files. To generate fuzzy association rules, we first design a membership function, which is used to transform quantitative values into fuzzy terms. Experiments are carried out with different support and confidence thresholds, and the experimental results show the performance of the algorithm with varied supports and confidences.
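A minimal sketch of the fuzzification step, assuming hypothetical triangular membership functions over page-view duration:

```python
# Map a quantitative value (seconds spent on a page) to fuzzy terms.
def triangular(x, a, b, c):
    # Triangle peaking at b, zero outside [a, c].
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

terms = {"short": (0, 0, 60), "medium": (30, 90, 150), "long": (120, 240, 240)}

def fuzzify(seconds):
    return {t: round(triangular(seconds, *p), 2) for t, p in terms.items()}

print(fuzzify(75))   # {'short': 0.0, 'medium': 0.75, 'long': 0.0}
```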
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
The progressive development of Synthetic Aperture Radar (SAR) systems has diversified the exploitation of the images generated by these systems in different geoscience applications. The detection and monitoring of surface deformations produced by various phenomena has benefited from this evolution and has been realized with interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelation of the interferometric pairs used strongly limits the precision of the results of these techniques. In this context, we propose in this work a methodological approach for surface deformation detection and analysis with differential interferograms, to show the limits of this technique with respect to noise quality and level. The detectability model is generated from the deformation signatures, by simulating a linear fault merged into image pairs from the ERS1/ERS2 sensors acquired over a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFICATION
A novel trajectory-guided, concatenative approach for synthesizing high-quality, real-sample rendered video is proposed. The automated lip-reading seeks, in a library of real image samples, the image sequence closest to the trajectory predicted by an HMM for the given video data. The object trajectory is obtained by projecting the face patterns into a KDA feature space. The approach identifies the speaker's face by synthesizing the identity surface of a subject's face from a small set of patterns that sparsely cover the view sphere. A KDA algorithm is used to discriminate the lip-reading images; afterwards, the fundamental lip feature vector is reduced to a low dimension using the 2D-DCT. The dimensionality of the mouth-area set is then reduced with PCA to obtain the eigen-lips approach proposed by [33]. The subjective performance results of the cost function under the automatic lip-reading model did not illustrate the superior performance of the method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
Universities offer software engineering capstone courses to simulate a real-world working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of this paper is to report on our experience in moving from the Waterfall process to an Agile process in conducting the software engineering capstone project. We present the capstone course designs for both the Waterfall-driven and Agile-driven methodologies, highlighting the structure, deliverables and assessment plans. To evaluate the improvement, we conducted a survey in two different sections taught by two different instructors to evaluate students' experience in moving from the traditional Waterfall model to an Agile-like process. Twenty-eight students filled in the survey, which consisted of eight multiple-choice questions and an open-ended question to collect feedback from students. The survey results show that students were able to gain hands-on experience that simulates a real-world working environment. The results also show that the Agile approach helped students achieve an overall better design and avoid mistakes they had made in the initial design completed in the first phase of the capstone project. In addition, they were able to assess their team's capabilities and training needs, and thus learn the required technologies earlier, which is reflected in the final product quality.
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
Using social media in education provides learners with an informal way of communicating. Informal communication tends to remove barriers and hence promotes student engagement. This paper presents our experience in using three different social media technologies in teaching a software project management course. We conducted surveys at the end of every semester to evaluate students' satisfaction and engagement. Results show that using social media enhances students' engagement and satisfaction; however, familiarity with the tool is an important factor in student satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
Using a computer to answer questions has been a human dream since the beginning of the digital era. Question-answering systems are referred to as intelligent systems that can provide responses to the questions asked by a user, based on facts or rules stored in a knowledge base, and can generate answers to questions asked in natural language. One of the first main ideas of fuzzy logic was to work on the problem of computer understanding of natural language. This survey paper therefore provides an overview of what question answering is, its system architecture, and its possible relationship with fuzzy logic, as well as previous related research with respect to the approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, alone or combined with fuzzy logic, and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
Human beings generate different speech waveforms when speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words, which facilitates the preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses a dynamic programming technique for global alignment and shortest-distance measurement. The DPW algorithm can be used to enhance the pronunciation dictionaries of well-known languages like English or to build pronunciation dictionaries for lesser-known, sparse languages. The precision measurement experiments show 88.9% accuracy.
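A minimal sketch of the dynamic-programming global alignment underlying such a distance (a plain edit distance over phone sequences; the paper's cost model may differ):

```python
# Edit distance between two phone sequences via dynamic programming.
def phone_distance(p, q):
    m, n = len(p), len(q)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if p[i - 1] == q[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[m][n]

# Two pronunciations of "tomato" as phone sequences:
print(phone_distance(["t", "ah", "m", "ey", "t", "ow"],
                     ["t", "ah", "m", "aa", "t", "ow"]))   # -> 1
```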
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
In education, the use of electronic (E) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. The system assesses answers to subjective questions by finding a matching ratio for the keywords in the instructor's and student's answers. The matching ratio is computed based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and a case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessment.
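A minimal sketch of the matching-ratio idea (exact keyword matching only; the semantic-similarity part of the system is omitted):

```python
# Score a student answer by the fraction of instructor keywords it covers.
def matching_ratio(instructor_keywords, student_answer):
    answer_words = set(student_answer.lower().split())
    hits = sum(1 for k in instructor_keywords if k.lower() in answer_words)
    return hits / len(instructor_keywords)

keys = ["normalization", "redundancy", "anomalies"]
ans = "Normalization reduces redundancy and update anomalies in tables."
print(round(matching_ratio(keys, ans), 2))   # -> 1.0
```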
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
African Buffalo Optimization (ABO) is one of the most recent swarm intelligence based metaheuristics. The ABO algorithm is inspired by the buffalo's behavior and lifestyle. Unfortunately, the standard ABO algorithm is defined only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO), they use the sigmoid function and a probability model to generate binary solutions. In the second version (called LBABO), they use logical operators to manipulate the binary solutions. Computational results on two knapsack problem (KP and MKP) instances show the effectiveness of the proposed algorithms and their ability to achieve good and promising solutions.
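The SBABO binarization step can be sketched as follows (a generic sigmoid transfer function, assuming the usual probability-model sampling):

```python
# Squash a continuous buffalo position through a sigmoid and sample a
# binary solution from the resulting probabilities.
import numpy as np

rng = np.random.default_rng(1)
position = np.array([-2.0, 0.5, 3.0, -0.1])   # continuous ABO position

prob = 1.0 / (1.0 + np.exp(-position))        # sigmoid transfer function
binary = (rng.random(position.size) < prob).astype(int)
print(prob.round(2), binary)                  # e.g. items picked for a knapsack
```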
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure and ensure a persistent presence on a compromised host. Amongst the various DDNS techniques, the Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGAs using frequency analysis of the character distribution and weighted scores of the domain names. The approach's feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmically-generated domain names. Findings from this study show that domain names made up of English characters "a-z" achieving a weighted score of < 45 are often associated with DGAs. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
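A minimal sketch of frequency-based scoring (the letter weights and the x10 scaling here are illustrative assumptions, not the paper's exact weights):

```python
# Score a domain by how typical its character distribution is of English
# text; a low score suggests algorithmic generation.
ENGLISH_FREQ = {  # approximate letter frequencies (%) in English text
    'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7, 's': 6.3,
    'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8, 'u': 2.8, 'm': 2.4,
    'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0, 'p': 1.9, 'b': 1.5, 'v': 1.0,
    'k': 0.8, 'j': 0.15, 'x': 0.15, 'q': 0.10, 'z': 0.07}

def weighted_score(domain):
    letters = [ch for ch in domain.lower() if ch in ENGLISH_FREQ]
    if not letters:
        return 0.0
    # x10 scaling is an arbitrary choice for illustration.
    return sum(ENGLISH_FREQ[ch] for ch in letters) / len(letters) * 10

for d in ["google", "xjw3qzkpv"]:
    print(d, round(weighted_score(d), 1))   # low score flags likely DGA
```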
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
The amount of piracy in streaming digital content in general, and in the music industry in particular, poses a real challenge to digital content owners. This paper presents a DRM solution for monetizing, tracking and controlling online streaming content across platforms for IP-enabled devices. The paper benefits from the current advances in blockchain and cryptocurrencies. Specifically, it presents a Global Music Asset Assurance (GoMAA) digital currency and the iMediaStreams blockchain to enable secure dissemination and tracking of the streamed content. The proposed solution gives the data owner the ability to control the flow of information even after it has been released, by creating a secure, self-installed, cross-platform reader located in the digital content file header. The proposed system provides content owners with options to manage their digital information (audio, video, speech, etc.), including tracking of the most consumed segments, once it is released. The system benefits from token distribution between the content owner (music bands), the content distributor (online radio stations) and the content consumers (fans) on the system blockchain.
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
This paper discusses the importance of verb suffix mapping in a discourse translation system. In discourse translation, the crucial step is anaphora resolution and generation. In anaphora resolution, cohesion links such as pronouns are identified between portions of text. These binders make the text cohesive by referring to nouns appearing in the previous sentences or to nouns appearing in sentences after them. In machine translation systems, to convert source language sentences into meaningful target language sentences, the verb suffixes should be changed according to the cohesion links identified. This step of the translation process is emphasized in the present paper. Specifically, the discussion covers how the verbs change according to the subjects and anaphors. To explain the concept, English is used as the source language (SL) and the Indian language Telugu as the target language (TL).
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
In this paper, based on the definition of the conformable fractional derivative, the functional variable method (FVM) is proposed to seek exact traveling wave solutions of two higher-dimensional space-time fractional KdV-type equations in mathematical physics, namely the (3+1)-dimensional space-time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-dimensional space-time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony (GZK-BBM) equation. Some new solutions are obtained and depicted. These solutions, which include kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave solutions, have many potential applications in mathematical physics and engineering. The simplicity and reliability of the proposed method are verified.
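For reference, the standard definition the method builds on, plus the usual traveling-wave substitution used with conformable derivatives (a sketch of the common conventions; the paper's specific equations are not reproduced here):

```latex
% Conformable fractional derivative of order \alpha \in (0,1]:
T_\alpha(f)(t) = \lim_{\varepsilon \to 0}
  \frac{f\!\left(t + \varepsilon\, t^{1-\alpha}\right) - f(t)}{\varepsilon},
\qquad t > 0 .

% Traveling-wave variable typically used with the FVM for
% space-time fractional equations:
\xi = k\,\frac{x^{\alpha}}{\alpha} - c\,\frac{t^{\alpha}}{\alpha},
\qquad u(x,t) = U(\xi),
% which reduces the fractional PDE to an ODE in U(\xi).
```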
AUTOMATED PENETRATION TESTING: AN OVERVIEW
The use of information technology resources is rapidly increasing in organizations, businesses, and even governments, which has given rise to various attacks and vulnerabilities in the field. All of this makes it essential to frequently perform a penetration test (PT) on an environment, to see what an attacker can gain and what the environment's current vulnerabilities are. This paper reviews some of the automated penetration testing techniques and presents their enhancements over the traditional manual approaches. To the best of our knowledge, it is the first study that takes into consideration both the concept of penetration testing and the standards in the area. This research tackles the comparison between manual and automated penetration testing and the main tools used in penetration testing. Additionally, it compares some methodologies used to build an automated penetration testing platform.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
Since the mid-1990s, the study of functional connectivity using fMRI (fcMRI) has drawn increasing attention from neuroscientists and computer scientists, since it opens a new window for exploring the functional network of the human brain with relatively high resolution. The BOLD technique provides an almost accurate picture of the brain's state. Past research shows that neurological diseases damage brain network interactions, protein-protein interactions and gene-gene interactions, and a number of neurological research papers also analyse the relationships among the damaged parts. Computational methods, especially machine learning techniques, can perform such classifications. In this paper we used the OASIS fMRI dataset of patients affected by Alzheimer's disease and a dataset of normal patients. After properly processing the fMRI data, we used the processed data to build classifier models using SVM (Support Vector Machine), KNN (K-nearest neighbour) and Naïve Bayes. We also compare the accuracy of our proposed method with existing methods. In the future, we will try other combinations of methods for better accuracy.
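A minimal sketch of the classification pipeline on synthetic data (not the OASIS dataset): functional-connectivity features built from region-correlation matrices feed an SVM:

```python
# Build connectivity vectors (upper triangle of each subject's region
# correlation matrix) and cross-validate a linear SVM on them.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_regions, n_timepoints = 40, 10, 120

X, y = [], []
for s in range(n_subjects):
    ts = rng.standard_normal((n_timepoints, n_regions))  # fMRI time series
    corr = np.corrcoef(ts, rowvar=False)                 # region x region
    iu = np.triu_indices(n_regions, k=1)
    X.append(corr[iu])                                   # connectivity vector
    y.append(s % 2)                                      # hypothetical label
X, y = np.array(X), np.array(y)

print(cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean())
```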
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
In order to treat and analyze real datasets, fuzzy association rules have been proposed, and several algorithms have been introduced to extract these rules. However, these algorithms suffer from problems of utility, redundancy and the large number of extracted fuzzy association rules. The expert is then confronted with this huge amount of fuzzy association rules, and the task of validation becomes tedious. In order to solve these problems, we propose a new validation method based on three steps: (i) we extract a generic base of non-redundant fuzzy association rules by applying the EFAR-PN algorithm, based on fuzzy formal concept analysis; (ii) we categorize the extracted rules into groups; and (iii) we evaluate the relevance of these rules using structural equation modelling.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
In many applications of data mining, class imbalance is observed when examples of one class are overrepresented. Traditional classifiers result in poor accuracy on the minority class due to the class imbalance. Further, the presence of within-class imbalance, where classes are composed of multiple sub-concepts with different numbers of examples, also affects the performance of a classifier. In this paper, we propose an oversampling technique that handles between-class and within-class imbalance simultaneously, and also takes into consideration the generalization ability in data space. The proposed method is based on two steps: performing model-based clustering with respect to the classes to identify the sub-concepts, and then computing the separating hyperplane based on equal posterior probability between the classes. The proposed method is tested on 10 publicly available data sets, and the results show that it is statistically superior to other existing oversampling methods.
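A minimal sketch of the model-based-clustering half of the idea (the posterior-probability hyperplane step is omitted): fit a Gaussian mixture to the minority class and sample synthetic examples from it:

```python
# Capture minority sub-concepts with a mixture model, then oversample.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Minority class with two sub-concepts (within-class imbalance):
minority = np.vstack([rng.normal(0, 0.3, (30, 2)),
                      rng.normal(3, 0.3, (8, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(minority)
synthetic, _ = gmm.sample(60)          # new minority examples
print(synthetic.shape)                 # (60, 2)
```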
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
Data collection is an essential but manpower-intensive procedure in ecological research. The author developed an algorithm that incorporates two important computer vision techniques to automate data cataloging for butterfly measurements: Optical Character Recognition for character recognition and contour detection for image processing. Proper pre-processing is first done on the images to improve accuracy. Although there are limitations to Tesseract's detection of certain fonts, overall it can successfully identify words in basic fonts. Contour detection is an advanced technique that can be utilized to measure an image. Shapes and mathematical calculations are crucial in determining the precise location of the points on which to draw the body and forewing lines of the butterfly. Overall, 92% accuracy was achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets so large and complex
are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in three largest cities in the UAE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes it attractive for hate speech to mask their criminal
activities online posing a challenge to the world and in particular Ethiopia. With this everincreasing
volume of social media data, hate speech identification becomes a challenge in
aggravating conflict between citizens of nations. The high rate of production, has become
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposed the application of apache spark in hate speech detection to reduce the
challenges. Authors developed an apache spark based model to classify Amharic Facebook
posts and comments into hate and not hate. Authors employed Random forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold crossvalidation,
the model based on word2vec embedding performed best with 79.83%accuracy. The
proposed method achieve a promising result with unique feature of spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
difficult to identify. We therefore want to use a data provenance technique to automatically find out where the unexpected data came from when users see anomalous or suspicious values. To get better results, we introduce a query inversion technique for complex aggregation and multiple relationship functions. It uses the property that some derivations can be inverted to find the input data supplied to them to derive the output data. For example, if a database query Q applied to some source data D yields an output containing a tuple T, then we want to understand which tuples in D contributed to T. A natural approach is to generate a new query Q', determined by Q, D and T, such that when Q' is applied to D, it produces the collection of input tuples that contributed to the output tuple T. In other words, we would like to identify the provenance by inverting the original query [10].
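For instance (a minimal sketch with a hypothetical emp table and made-up values, not taken from our experiments), suppose Q is:
Code: select dept, sum(sal) from emp group by dept
and the output tuple T = ('SALES', 9400) seems suspicious. The inverse query Q', determined by Q and T, simply selects the tuples of D that fall into that group:
Code: select * from emp where dept = 'SALES'
Applying Q' to D returns exactly the salary rows that were summed into T, so the suspicious total can be traced back to its source tuples.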
2. OVERVIEW OF EXISTING APPROACHES
There are two major approaches to representing data provenance: annotation and inversion. The inversion approach operates on output data to find the corresponding input data. In the area of data warehouses, Cui, Widom, and Wiener [27] first introduced the problem of tracing data in relational databases using query inversion. A disadvantage of this approach is that it cannot be used in sub-queries of normal relational queries and only partially benefits from the query optimization of the underlying Database Management System (DBMS). Another mechanism is called where-provenance [1]; we mainly use this technique for determining where annotations are propagated from. Boris Glavic et al. [28] also introduced a mechanism called the Provenance Extension of the Relational Model (PERM). The PERM prototype supports provenance computation using Structured Query Language (SQL), but the disadvantage of this technique is that it does not work for correlated sub-queries.
One of the earlier definitions was given in the context of geographic information systems (GIS). In GIS, data provenance is known as lineage, which explicates the relationship among the events and source data that generate a data product [1]. In the context of database systems, data provenance describes how a data product is obtained through transformation activities from its input data [2].
Annotation systems like DBNotes [CTV05] and MONDRIAN [GKM05] are a common approach in the life sciences [4]; they enable a user to annotate a data item with an arbitrary number of notes, which are normally propagated when the annotated data is transformed.
VDL (Virtual Data Language) provides query and data definition facilities for the Chimera system [3]; it supports relational or object-oriented databases and SQL-like transformations.
The PReServ (Provenance Recording for Services) approach [GMM05, GJM+06a] uses a central provenance management service with a common interface that allows different storage systems to be used as a provenance store.
Trio [5] uses a recursive lineage-traversal algorithm to obtain the complete provenance of a particular output tuple and introduces a new query language, TriQL, to deal with uncertainty and lineage information.
3. QUERY INVERSION MECHANISM
A considerable research effort has been made by the database community to manage data provenance. Data provenance can be defined at different granularity levels, such as the relation or the tuple. Furthermore, data provenance has been categorized based on the type of questions (e.g. why, where, how) it can answer. Different techniques have been proposed to generate data provenance in the context of a database system; we use query inversion to find data provenance in relational and complex database systems in a simple and fast way.
To find specific data in a database we write a query, but we do not know which tuples the system has to process at execution time to generate the output. Sometimes, because of those tuples, we get unexpected results, and we cannot tell which tuple is responsible for the unexpected value. Following our query inversion technique, we are able to locate the problem by showing the original data (see figure 1). For example, suppose we want to see the sum of salaries per job category.
Figure 1. Data provenance technique overview.
Code: select job,sum(salary) from emp group by job
We wrote the above query to see the result, but it returned an "invalid number" error. Although the query is correct, it produced an unexpected result because the database may contain an unwanted tuple, and due to that tuple the query does not work properly. We would now need to check all the values in the database to figure out the problem; but data volumes keep growing and we have to process huge amounts of data daily, so checking all the values is neither feasible nor time efficient. After analyzing the query, we developed an inverse query that reveals the unexpected tuple.
Code: select job,salary from emp where job in (select job from emp group by job) (see
figure 2).
Figure 2. Showing the unexpected tuple using an inverse query.
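The "invalid number" behaviour can be reproduced with a small test schema (our own sketch, not the actual experimental data): if salary is stored in a VARCHAR2 column, Oracle implicitly converts it to a number inside sum() and raises ORA-01722 as soon as it reaches a non-numeric row, while the inverse query only displays the raw values and thereby exposes the offending tuple.
Code: create table emp (job varchar2(20), salary varchar2(10));
insert into emp values ('CLERK', '800');
insert into emp values ('CLERK', '1100');
insert into emp values ('ANALYST', 'n/a');          -- the unexpected tuple
select job, sum(salary) from emp group by job;      -- fails: ORA-01722 invalid number
select job, salary from emp
where job in (select job from emp group by job);    -- returns the row with 'n/a'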
3.1. Inverse Query Hypothesis Making Technique
The inversion technique provides a compact representation of provenance, and this is its main advantage; however, it is restricted to a certain class of multiple relationship queries and is not universally applicable [26]. In this section, our aim is therefore to propose a hypothesis about the possibility of developing an inverse query technique that eliminates this limitation of multiple relationship queries.
When we manipulate data, we use queries to get output, but behind this output the system needs to process some tuples. After analyzing those tuples, we have introduced some keys and extra tuples for our inverse queries. Below we describe some complex aggregation and multiple relationship functions in a generalized form of query and inverse query. We also describe the flow charts of the whole development procedure; in each flow chart we follow a life cycle to complete our inverse query (see figures 3, 4, 5). A life cycle is Black Arrow → Green Arrow → Red Arrow → Green Arrow.
3.1.1. Aggregation Functions
Oracle and other query languages support at least five aggregation functions: min, max, count, sum and avg. Aggregate functions are handled by adding the parameter values to the group by list and placing the condition for the aggregate column in the having clause, instead of the where clause, of the inverted query. For example, if the general form of the query is:
Code: select ∆H1, f(∆H2) from r group by ∆H1 having <predicate>
Then, the inverse query is as follows:
Code: select ∆H1, ∆H2 from r where ∆H1 in (select ∆H1 from r group by ∆H1 having <predicate>)
Now, when adding keyword selections to the above inverse query, any selections related to ∆H1 are added to the WHERE clause as before; however, selections relating to f(∆H2) are added to a HAVING clause.
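For instance, instantiating this template with ∆H1 = job, ∆H2 = salary and f = sum on a hypothetical emp table (the threshold 5000 is arbitrary), the query
Code: select job, sum(salary) from emp group by job having sum(salary) > 5000
is inverted to
Code: select job, salary from emp where job in (select job from emp group by job having sum(salary) > 5000)
which returns every individual salary row of the job groups satisfying the having predicate.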
The relational algebraic translation of our above inverse query is:
Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) (r) ) ∩ Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) > 0 (r) )
Here Π = the select clause (projection), σ = the where clause for the predicate, p = the projection of attributes and r = the table name.
Example of complex Aggregation function: Our generalized approach will work for all normal,
complex and multiple relational aggregation functions.
Code: select sum(e.sal), count(t.deptno) from emp e, dept t
Now, if we want to make an inverse query following our algorithm for complex relationships (see figure 3), first we put the select keyword, then all the attributes, the from keyword and the table names. After that, according to our algorithm, we would put the where keyword and the first attribute that is not wrapped in a function; but here we can see that there is no attribute outside a function, so we do not need to check the rest of the query after the table names. Our final inverse query is the following:
Code: select e.sal, t.deptno from emp e, dept t
Following our generalized formula, it is possible to find data provenance for all simple, complex and multiple relationship aggregation functions. However, for some correlated sub-queries our generalized formula does not work, and this is its only limitation.
Figure 3. Inverse Query making procedure for aggregation functions.
3.1.2. Join Operation
The problem of join ordering is very restricted and at the same time very complex, because a join combines two or more tables of a relational database. If an abnormal tuple is encountered at execution time, the query cannot produce a result, and it is very difficult to find the problem by searching through multiple tables. Conceptually, for every tuple in the left input an output tuple must be produced for every tuple in the right input, although a join operation can be implemented much more efficiently. The approach to handling join operations is very similar to the one for handling aggregation function queries described in detail in section 3.1.1; the difference is that here we need to create a relationship between two or more tables. For example, if the general form of the query is:
Code: select ∆H1, f(∆H2) from r1 natural join r2 group by ∆H1 having <predicate>
Then, the inverse query is as follows:
Code: select ∆H1, ∆H2 from r1 natural join r2 where ∆H1 in (select ∆H1 from r1 natural join r2 group by ∆H1 having <predicate>)
In this case we have to be more careful about the tuples that are related across the different tables.
The relational algebraic translation of our above inverse query is:
Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) (r1 ⋈ r2) ) ∩ Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) > 0 (r1 ⋈ r2) )
Here Π = the select clause (projection), σ = the where clause for the predicate, p = the projection of attributes, r1, r2 = the table names and ⋈ = the join operation.
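As an illustration, instantiating the join template with r1 = emp, r2 = dept, ∆H1 = dept_name and f(∆H2) = sum(salary) (hypothetical tables sharing a department key column, not our test schema), the query
Code: select dept_name, sum(salary) from emp natural join dept group by dept_name having sum(salary) > 50000
is inverted to
Code: select dept_name, salary from emp natural join dept where dept_name in (select dept_name from emp natural join dept group by dept_name having sum(salary) > 50000)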
Figure 4. Inverse Query making procedure for join operation.
Example of complex Join operation: Our generalized approach will work for all normal,
complex and multiple relational join operations.
Code: select count(cus.cust_first_name), sum(ord.order_total), sum(pro.quantity) from
demo_customers cus natural join demo_orders ord natural join demo_order_items pro
Now, if we want to make an inverse query following our algorithm for multiple relationships (see figure 4), first we put the select keyword, then all the attributes, the from keyword and the table names with the join keywords. After that, according to our algorithm, we would put the where keyword and the first attribute that is not wrapped in a function; but here we can see that there is no attribute outside a function, so we do not need to check the rest of the query after the table names. Our final inverse query is the following:
Code: select cus.cust_first_name,ord.order_total,pro.quantity from demo_customers cus
natural join demo_orders ord natural join demo_order_items pro
Following our generalized formula, it is possible to find data provenance for all simple, complex and multiple relationship join operations. However, for some correlated sub-queries our generalized formula does not work, and this is its only limitation.
3.1.3. Set Operation
Set operations allow the results of multiple queries to be combined into a single result. Queries containing set operators are called compound queries. The set operators include the union, intersect and minus operations of a database.
3.1.3.1 Intersect
A Set Intersection can be handled by individually inverting a query with respect to each keyword
and then taking a join of the inverted queries on the common parameters. Let’s consider a general
query:
Code: select ∆H1 from r where <predicate1> intersect select ∆H1 from r where
<predicate2>
Then, the inverse query is as follows:
Code: select distinct ∆H1, ∆H2, ∆H3, … from r where <predicate1> and ∆H1 in (select ∆H1 from r where <predicate2>)
The relational algebraic translation of our above inverse query is:
Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) ∩ k > 0 (r ⋈ r) ) ∩ Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) > 0 (r ⋈ r) )
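As a concrete instance on the section table used later in section 4 (the predicates are chosen for illustration only), the query
Code: select course_id from section where year=2009 intersect select course_id from section where year=2010
is inverted to
Code: select distinct course_id, semester, year from section where year=2009 and course_id in (select course_id from section where year=2010)
which lists the full tuples behind the course_ids appearing in both years.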
3.1.3.2 Union
Handling the union clause is complex mainly for these reasons:
1. Each subquery involved in the union may contain only some of the keywords.
2. Each subquery in the union may contain only a subset of the overall query parameters.
The approach to handling the union clause is very similar to the one for handling intersect queries described in section 3.1.3.1. Let's consider a general query:
Code: select ∆H1 from r where <predicate1> union select ∆H1 from r where <predicate2>
Then, the inverse query is as follows:
Code: select distinct ∆H1, ∆H2, ∆H3, … from r where <predicate1> or ∆H1 in (select ∆H1 from r where <predicate2>)
The relational algebraic translation of our above inverse query is:
Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) ∪ k > 0 (r ⋈ r) ) ∩ Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) > 0 (r ⋈ r) )
3.1.3.3 Minus/Except
The minus/except operator is used to combine two SELECT statements and returns the rows from the first SELECT statement that are not returned by the second SELECT statement. This means that except/minus returns only the rows which are not available in the second SELECT statement.
Let’s consider a general query:
Code: select ∆H1 from r where <predicate1> minus select ∆H1 from r where <predicate2>
Then, the inverse query is as follows:
Code: select distinct ∆H1, ∆H2, ∆H3, … from r where <predicate1> and ∆H1 not in (select ∆H1 from r where <predicate2>)
The relational algebraic translation of our above inverse query is:
Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) ∩ k > 0 (r ⋈ r) ) − Π∆H1, ∆H2, …, ∆Hn( σ p ∧ contains((∆H1, ∆H2, …, ∆Hn), k) > 0 (r ⋈ r) )
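Analogously (again on the section table, with illustrative predicates), the query
Code: select course_id from section where year=2009 minus select course_id from section where year=2010
is inverted to
Code: select distinct course_id, semester, year from section where year=2009 and course_id not in (select course_id from section where year=2010)
where the not in condition keeps exactly the tuples that survive the minus.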
Figure 5. Inverse Query making procedure for set operation.
Example of complex Set operation: Our generalized approach will work for all normal,
complex and multiple relational set operations.
Code: select course_id from section where semester='Fall' and year=2009 intersect select
course_id from section where semester='Spring' and year=2010 union select course_id from
teaches where semester='Fall' and year=2009
From the above query we can see that there are two set operations, intersect and union. So, following our algorithm (see figure 5), the inverse query will be:
Code: select course_id,semester,year from section where semester='Fall' and year=2009 and course_id in (select course_id from section where semester='Spring' and year=2010) or course_id in (select course_id from teaches where semester='Fall' and year=2009)
Following our generalized formula, it is possible to find data provenance for all simple, complex and multiple relationship set operations. However, for some correlated sub-queries our generalized formula does not work, and this is its only limitation.
3.2. Prototype Development Algorithm
First, we separate all the keys, tables, attributes and predicates from the main query; then we check the query to determine whether it is an aggregation, a set operation or a join operation. Finally, our prototype dynamically sets those values following the sequence of the corresponding flow chart (see figures 3, 4, 5). For example, for the query "select job, sum(salary) from emp group by job", the prototype separates the keywords (select, from, group by), the table (emp) and the attributes (job, sum(salary)), classifies the query as an aggregation, and then instantiates the inverse template of section 3.1.1.
Figure 6. Algorithm for separating keys, attributes, tables and predicate.
4. EXPERIMENTAL RESULT
We have developed a prototype that produces an inverse query for any given query, and we checked whether each inverse query is right or wrong in Oracle Database XE 11.2. For example, we wrote a query to get the total sum of salary and bonus per department name:
Query: select dept_name,sum(bonus+salary) from instructor group by dept_name
Output: Invalid Number
Now we want to know which tuples were processed to obtain the above result and which one is responsible for the unexpected output. We therefore use our inverse query:
Inverse Query: select dept_name,bonus,salary from instructor where dept_name in (select dept_name from instructor group by dept_name)
Figure 7. Output of inverse query for above aggregation function.
From the output of our inverse query (see figure 7) we can see that there is a tuple which contains an unexpected value; due to this value our main query produces an unexpected result.
We also tested a set operation and a join query, and we obtained accurate results in finding the data provenance.
Query: select course_id from section where semester='Fall' and year=2009 intersect select
course_id from section where semester='Spring' and year=2010 union select course_id from
teaches where semester='Fall' and year=2009 minus select course_id from takes where
semester='Fall' and year=2010 (see figure 8).
Figure 8. Output of above intersect query.
Inverse query: select course_id,semester,year from section where semester='Fall' and year=2009 and course_id in (select course_id from section where semester='Spring' and year=2010) or course_id in (select course_id from teaches where semester='Fall' and year=2009) and course_id not in (select course_id from takes where semester='Fall' and year=2010) (see figure 9).
Figure 9. Output of above intersect inverse query.
Query: select cust_first_name,sum(order_total),sum(quantity) from demo_customers
natural join demo_orders natural join demo_order_items group by cust_first_name (see
figure 10).
Figure 10. Output of above join query.
Inverse Query: select cust_first_name,order_total,quantity from demo_customers natural join demo_orders natural join demo_order_items where cust_first_name in (select cust_first_name from demo_customers natural join demo_orders natural join demo_order_items group by cust_first_name) (see figure 11).
Figure 11. Output of above join inverse query.
Finally, we are able to find the data provenance (see figures 2, 7, 8, 9, 10, 11) using our inverse queries. Now, if any user gets an abnormal or unexpected value, or wants to check which tuples work behind a query, they can easily do so by creating an inverse query.
4.1. Performance Evaluation
To check the performance of our new algorithm, all experiments were performed on an Intel Core i3 machine with 4 GB of RAM, and the sizes of our test databases were 10MB, 100MB and 500MB. For testing we used three types of SQL query: normal (Q1), complex (Q2) and multiple relationship (Q3) over four tables. To get the execution time of each query, we wrote "set statistics time on" before the query and added "set statistics time off" at its end, as sketched below. We evaluated the execution time of each query for the aggregation functions (see table 1), the join operations (see table 2) and the set operations (see table 3). Analyzing the experimental results for the aggregation functions (see table 1), we can see that in most cases there is only a small difference in execution time between the main query and the inverse one; and if we compare the small (10MB) and large (500MB) datasets, the execution time of our inverse queries does not increase too much.
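A sketch of the measurement wrapper described above (SET STATISTICS TIME is a SQL Server session switch; in Oracle SQL*Plus the analogous command would be SET TIMING ON, so the exact client environment is our assumption):
Code: set statistics time on
select dept_name, sum(bonus+salary) from instructor group by dept_name
set statistics time off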
Table 1. Execution time for each query of the aggregation functions.

Query | 10MB (query / inverse) | 100MB (query / inverse) | 500MB (query / inverse)
Q1    | 35 ms / 36 ms          | 69 ms / 76 ms           | 77 ms / 94 ms
Q2    | 66 ms / 69 ms          | 149 ms / 157 ms         | 158 ms / 169 ms
Q3    | 19 ms / 321 ms         | 29 ms / 575 ms          | 31 ms / 581 ms
Table 2. Execution time for each query of the join operations.

Query | 10MB (query / inverse) | 100MB (query / inverse) | 500MB (query / inverse)
Q1    | 49 ms / 70559 ms       | 69 ms / 109513 ms       | 77 ms / 250124 ms
Q2    | 52 ms / 88015 ms       | 81 ms / 191092 ms       | 98 ms / 398029 ms
Q3    | 55 ms / 90939 ms       | 92 ms / 220306 ms       | 123 ms / 502196 ms
Table 3. Execution time for each query of the set operations.

Query | 10MB (query / inverse) | 100MB (query / inverse) | 500MB (query / inverse)
Q1    | 3 ms / 9598 ms         | 8 ms / 15598 ms         | 11 ms / 18400 ms
Q2    | 2 ms / 8598 ms         | 9 ms / 15101 ms         | 12 ms / 18139 ms
Q3    | 3 ms / 9321 ms         | 14 ms / 17004 ms        | 19 ms / 21409 ms
The execution time grows considerably from the main query to the inverse query for the join and set operations. Moreover, if we compare the execution time of our inverse queries on the small (10MB) and large (500MB) datasets, the difference is even higher, because in this case the processor needs to process a huge dataset to provide the results.
5. CONCLUSION AND FUTURE WORK
Nowadays, the provenance of data products is a widely studied topic that attracts much attention from researchers. In this paper the main focus was to provide a guideline for finding the data provenance of unexpected values. To solve this problem we used an inverse query mechanism, and we showed that data provenance can easily be found with the use of inverse queries. We proposed generalized forms that work for all types of normal, complex and multiple relationship queries except nested queries. We also presented the execution times of our inverse queries on small and large datasets. We found that, for aggregation functions in particular, the execution time of the inverse queries remains close to that of the main queries even on large datasets, and in most cases our solution is rather fast. Thus, this technique can be used in different applications. For example, we generate business reports using different applications; if we suspect any errors in these reports, we normally have to check the related source datasets manually. This problem can now be solved using the proposed technique, as we can easily find the possible error in a rather efficient way.
Since the proposed technique does not work for sub-queries, but only for normal, complex and multiple relationship functions, sub-queries are the focus of future research. We plan to solve the sub-query problem by improving the technique and to develop a web prototype so that any user can generate the inverse query related to his or her main query.
ACKNOWLEDGEMENTS
We would like to thank all the colleagues at the School of Software Engineering of NRU HSE for
their feedback and useful recommendations that contributed to bringing this paper to its final
form.
REFERENCES
[1] P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data provenance. In Proceedings of the International Conference on Database Theory, pages 316–330, 2001.
[2] D. P. Lanter. Design of a lineage-based meta-data base for GIS. Cartography and Geographic Information Science, 18(4):255–261, 1991.
[3] Ian T. Foster, Jens-S. Vöckler, Michael Wilde, and Yong Zhao. Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation. In SSDBM '02: Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37–46, Washington, DC, USA, 2002. IEEE Computer Society.
[4] Laura Chiticariu, Wang-Chiew Tan, and Gaurav Vijayvargiya. DBNotes: a post-it system for relational databases based on provenance. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pages 942–944, New York, NY, USA, 2005. ACM Press.
[5] O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In Proceedings of the International Conference on Very Large Data Bases, pages 953–964, 2006.
[6] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom. Trio: A system for data, uncertainty, and lineage. In Proceedings of the International Conference on Very Large Data Bases, pages 1151–1154, 2006.
[7] M. K. Anand, S. Bowers, T. McPhillips, and B. Ludäscher. Efficient provenance storage over nested data collections. In Proceedings of the International Conference on Extending Database Technology: Advances in Database Technology, pages 958–969. ACM, 2009.
[8] S. Davidson, S. C. Boulakia, A. Eyal, B. Ludäscher, T. M. McPhillips, S. Bowers, M. K. Anand, and J. Freire. Provenance in scientific workflow systems. IEEE Data Engineering Bulletin, 30(4):44–50, 2007.
[9] U. Park and J. Heidemann. Provenance in sensornet republishing. In Provenance and Annotation of Data and Processes, volume 5272 of LNCS, pages 280–292. Springer, 2008.
[10] M. R. Huq, P. M. G. Apers, and A. Wombacher. An Inference-based Framework to Manage Data Provenance in Geoscience Applications. IEEE Transactions on Geoscience and Remote Sensing, early access article, DOI: 10.1109/TGRS.2013.2247769, IEEE Geoscience and Remote Sensing Society, 2013.
[11] C. Beeri, A. Eyal, S. Kamenkovich, and T. Milo. Querying business processes. In VLDB, 2006.
[12] O. Benjelloun, A. D. Sarma, A. Y. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In VLDB, 2006.
[13] Grigoris Karvounarakis, Zachary G. Ives, and Val Tannen. Querying Data Provenance. In SIGMOD '10, June 6–11, 2010, Indianapolis, Indiana, USA.
[14] Boris Glavic and Klaus Dittrich. Data Provenance: A Categorization of Existing Approaches. SNF Swiss National Science Foundation: NFS SESAM.
[15] Qi Yang. Computation of chain queries in distributed database systems. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 348–355.
[16] L. Becker and R. H. Güting. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. on Database Systems (to appear).
[17] P. Bernstein, E. Wong, C. Reeve, and J. Rothnie. Query processing in a system for distributed databases (SDD-1). ACM Trans. on Database Systems, 6(4):603–625.
[18] Parag Agrawal, Omar Benjelloun, Anish Das Sarma, Chris Hayworth, Shubha Nabar, Tomoe Sugihara, and Jennifer Widom. An Introduction to ULDBs and the Trio System. IEEE Data Engineering Bulletin, 29(1):5–16, 2006.
[19] Yingwei Cui, Jennifer Widom, and Janet L. Wiener. Tracing the lineage of view data in a warehousing environment. ACM Trans. Database Syst., 25(2):179–227, 2000.
[20] Yogesh L. Simmhan, Beth Plale, and Dennis Gannon. A survey of data provenance in e-science. SIGMOD Rec., 34(3):31–36, 2005.
[21] Paul Groth, Sheng Jiang, Simon Miles, Steve Munroe, Victor Tan, Sofia Tsasakou, and Luc Moreau. An Architecture for Provenance Systems — Executive Summary. Technical report, University of Southampton, February 2006.
[22] Ian T. Foster, Jens-S. Vöckler, Michael Wilde, and Yong Zhao. Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation. In SSDBM '02: Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37–46, Washington, DC, USA, 2002. IEEE Computer Society.
[23] Dennis P. Groth. Information Provenance and the Knowledge Rediscovery Problem. In IV, pages 345–351. IEEE Computer Society, 2004.
[24] P. Yue, Z. Sun, J. Gong, L. Di, and X. Lu. A provenance framework for web geoprocessing workflows. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, pages 3811–3814. IEEE, 2011.
[25] M. R. Huq, A. Wombacher, and A. Mileo. Data Provenance Inference in Logic Programming: Reducing Effort of Instance-driven Debugging. Technical Report TR-CTIT-13-11, Centre for Telematics and Information Technology, University of Twente, 2013.
[26] D. Bhagwat, L. Chiticariu, W. C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. In VLDB, 2004, pp. 900–911.
[27] Y. Cui, J. Widom, and J. Wiener. Tracing the Lineage of View Data in a Warehousing Environment. ACM Transactions on Database Systems (TODS), 25(2):179–227, 2000.
[28] Boris Glavic and Gustavo Alonso. Perm: Processing Provenance and Data on the same Data Model through Query Rewriting. In ICDE '09: Proceedings of the 25th International Conference on Data Engineering, 2009.
AUTHORS
Md Salah Uddin is a Master's student of the program "System and Software Engineering" at NRU HSE, Moscow, Russian Federation.
Research interests: Cloud Computing and Security, Big Data, Solid State Drives, Databases and Neural Networks.
Dmitry V. Alexandrov is a Professor at the School of Software Engineering at NRU HSE, Moscow, Russian Federation, and a Professor at the Chair of Innovative Entrepreneurship at Bauman MSTU, Moscow, Russian Federation.
Research interests: Artificial Intelligence, Multi-Agent Systems, Data Analysis, Databases, Mobile Applications Development.
Armanur Rahman is a Junior Software Engineer at BJIT Limited, Dhaka, Bangladesh. He was awarded his bachelor's degree at East West University, Bangladesh.
Research interests: Data Mining, Databases and Neural Networks.