International Refereed Journal of Engineering and Science (IRJES)
ISSN (Online) 2319-183X, (Print) 2319-1821
Volume 1, Issue 3 (November 2012), PP.28-33
www.irjes.com

Improving Analysis of Data Mining by Creating Data Sets Using SQL Aggregations

* J. Mahesh, ** M. Antony Kumar
* Dean [CSE/IT], P.M.R. Engineering College, Chennai, India
** Assistant Professor, IT Department, P.M.R. Engineering College, Chennai, India
Abstract: In data mining, an important goal is to generate efficient data sets. Efficiency and scalability have always been important concerns in the field of data mining, and the increased complexity of the task calls for algorithms that are inherently more expensive. To analyze data efficiently, data mining systems widely use data sets with columns in a horizontal tabular layout. Preparing such a data set is one of the more complex tasks in a data mining project, requiring many SQL queries, table joins and column aggregations. Conventional RDBMSs usually manage tables in vertical form. Aggregation into a horizontal tabular layout returns a set of numbers per row instead of one number per row. The system uses one parent table and several child tables, and operations are then performed on the data loaded from these multiple tables. The PIVOT operator offered by the RDBMS is used to compute the aggregate operations; it is a much faster method and offers good scalability. Partitioning the large data set obtained from the horizontal aggregation into homogeneous clusters is an important task in this system, and a K-means algorithm expressed in SQL is well suited for implementing this operation.

Index Terms: PIVOT, SQL, Data Mining, Aggregation
I. INTRODUCTION
Existing SQL aggregate functions present important limitations for computing percentages. This article proposes two SQL aggregate functions that address those limitations. The first function returns one row for each percentage in vertical form, like standard SQL aggregations. The second function returns each set of percentages, adding up to 100%, on the same row in horizontal form. These novel aggregate functions are used as a framework to introduce the concept of percentage queries and to generate efficient SQL code. Experiments study different percentage query optimization strategies and compare the evaluation time of percentage queries that take advantage of the proposed aggregations against queries using available OLAP extensions. The proposed percentage aggregations are easy to use, have wide applicability and can be evaluated efficiently.
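To make the contrast concrete, the following is a minimal sketch of a vertical-form percentage query in standard SQL; the table sales(dept, amt) and its columns are hypothetical and not part of this paper.

-- Vertical percentage query in standard SQL.
-- Hypothetical table sales(dept, amt); one output row per department,
-- each holding that department's share of the grand total.
SELECT s.dept,
       SUM(s.amt) * 100.0 / t.total AS pct
FROM sales s
CROSS JOIN (SELECT SUM(amt) AS total FROM sales) t
GROUP BY s.dept, t.total;

Returning all the percentages of a group side by side on one row (the horizontal form) has no equally direct formulation in standard SQL, which is the gap the proposed aggregations address.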
II. RELATED WORK
There are SQL extensions to define aggregate functions for association rule mining. Their optimizations have the purpose of avoiding joins to express cell formulas, but they are not optimized to perform a partial transposition for each group of result rows. Conor Cunningham [1] proposed optimization and execution strategies in an RDBMS based on two operators, PIVOT and UNPIVOT, which exchange the rows and columns of tabular data and enable data transformations useful in data modelling, data analysis and data presentation. They can quite easily be implemented inside a query processor, much like the select, project and join operators, and such a design provides opportunities for better performance during both query optimization and query execution. PIVOT is an extension of GROUP BY with unique restrictions and optimization opportunities, which makes it easy to introduce incrementally on top of existing grouping implementations. H. Wang and C. Zaniolo [2] proposed a small but complete SQL extension for data mining and data streams. It is a powerful database language and system that enables users to develop complete data-intensive applications in SQL by writing new aggregates and table functions in SQL, rather than in procedural languages as in current object-relational systems. The ATLaS system consists of applications, including various data mining functions, that have been coded in ATLaS SQL and execute with a modest (20-40%) performance overhead with respect to the same applications written in C/C++. The system can handle continuous queries using the schema and queries in a query repository. S. Sarawagi, S. Thomas and R. Agrawal [3] proposed integrating association rule mining with relational database systems. The integration comprises several methods: loose coupling through a SQL cursor interface, which encapsulates a mining algorithm in a stored procedure; caching the data to a file system on the fly; and tight coupling that uses primarily user-defined functions and SQL implementations for processing inside the DBMS. For the loose-coupling and stored-procedure architectures, an implementation of the Apriori algorithm can be used for finding association rules. C. Ordonez [4] proposes an integration of K-means clustering with a relational DBMS.
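For concreteness, a minimal sketch of the PIVOT operator discussed in [1], written in SQL Server syntax; the table sales(dept, quarter, amt) and the quarter values are hypothetical.

-- Hypothetical table sales(dept, quarter, amt); SQL Server PIVOT syntax.
-- Produces one row per dept and one column per quarter value listed in IN (...).
SELECT dept, [Q1], [Q2]
FROM (SELECT dept, quarter, amt FROM sales) AS src
PIVOT (SUM(amt) FOR quarter IN ([Q1], [Q2])) AS p;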
III. EXISTING SYSTEM
In existing work, preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, table joins and column aggregations. Existing SQL aggregations have limitations for preparing data sets because they return one column per aggregated group. Standard aggregations are hard to interpret when there are many result rows, especially when the grouping attributes have high cardinalities. Many aggregation functions and operators exist in SQL; unfortunately, all of them have limitations when building data sets for data mining purposes.
IV. PROPOSED SYSTEM
Our proposed horizontal aggregations provide several unique features and advantages. First, they represent a template to generate SQL code from a data mining tool to build data sets for data mining analysis. Such SQL code automates writing SQL queries, optimizing them and testing them for correctness. Horizontal aggregations represent an extended form of traditional SQL aggregations, which return a set of values in a horizontal layout instead of a single value per row. Horizontal aggregations preserve the evaluation semantics of standard SQL aggregations; the main difference is that they return a table with a horizontal layout, possibly having extra nulls.
V. FUNCTIONAL DIAGRAM
VI. EXECUTION STRATEGIES IN AGGREGATION
Horizontal aggregations propose a new class of functions that aggregate numeric expressions and
transpose the results to produce data sets with a horizontal layout. The operation is needed in a number of data
mining tasks, such as unsupervised classification and data summarization, as well as the segmentation of large
heterogeneous data sets into smaller homogeneous subsets that can be easily managed, separately modelled,
and analyzed. Creating data sets for data mining tasks requires an efficient summary of the data. To that end,
the proposed system collects the required attributes from the different fact tables and lays them out as columns
in order to create data in a horizontal layout. The main goal is to define a template to generate SQL code
combining aggregation and transposition (pivoting). A second goal is to extend the SELECT statement with a
clause that combines transposition with aggregation. Consider the following GROUP BY query in standard
SQL that takes a subset L1, ..., Lm from D1, ..., Dp:

SELECT L1, ..., Lm, SUM(A) FROM F1, F2 GROUP BY L1, ..., Lm;

In a horizontal aggregation there are four input parameters to generate SQL code: 1) the input tables
F1, F2, ..., Fn; 2) the list of GROUP BY columns L1, ..., Lj; 3) the column to aggregate (A); and 4) the list of
transposing columns R1, ..., Rk. The GROUP BY query above produces a table with m+1 columns
(automatically determined), with one row for each unique combination of values L1, ..., Lm and one aggregated
value per group (i.e., SUM(A)). In order to evaluate that query, the query optimizer takes three input
parameters: the input table F, the list of grouping columns L1, ..., Lm, and the column to aggregate (A).
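To make these parameters concrete, consider a hedged instantiation with the Bill Table used later in this
section: F = BillTable, grouping list L1 = CustomerId, aggregated column A = Amount, and transposing column
R1 = Month. The target horizontal table FH would then have the following shape (column names are
assumptions of this sketch):

-- Desired horizontal result: one row per CustomerId, one aggregated
-- Amount column per distinct Month value found in the data.
CREATE TABLE FH (
    CustomerId INT,
    AmountJan  DECIMAL(10,2),
    AmountFeb  DECIMAL(10,2),
    AmountMar  DECIMAL(10,2)
);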
Example
In Fig. 1 there is a common field K in F1 and F2. In F2, the column D2 contains only two distinct values, X
and Y, and is used to transpose the table. The aggregate operation used here is SUM(). The values within D1
are repeated; for example, 1 appears three times. For rows 3 and 4 the values of D2 are X and Y respectively,
so D2X and D2Y are the newly generated columns in FH.
Fig. 1. An example of horizontal aggregation
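Since the figure may not reproduce here, the Fig. 1 schema can be sketched as follows (only the facts stated in
the text are assumed; everything else is illustrative):

-- F1 and F2 share the common field K; F2.D2 takes only the values X, Y.
CREATE TABLE F1 (K INT, D1 INT, A INT);
CREATE TABLE F2 (K INT, D2 CHAR(1));
-- Joining F1 and F2 on K, transposing on D2, and aggregating A with
-- SUM() yields the horizontal table FH(D1, D2X, D2Y), where D2X and
-- D2Y are the newly generated columns.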
The query evaluation methods commonly used for horizontal aggregation functions [12] are the SPJ, CASE,
and PIVOT methods.
SPJ method
The SPJ method is based only on relational operators. The basic concept of the SPJ method is to build
a table with a vertical aggregation for each resulting column; to produce the horizontal aggregation FH, the
system must join all those tables. There are two sub-strategies to compute a horizontal aggregation. The first
calculates the aggregation directly from the fact table. The second computes the corresponding vertical
aggregation and stores it in a temporary table FV grouped by LE1, ..., LEj, RI1, ..., RIk; FH can then be
computed from FV. To get FH, the system needs n left outer joins over n+1 tables so that all individual
aggregations are properly assembled as a set of n columns for each group. Null should be set as the default
value for groups with missing combinations.
INSERT INTO FH
SELECT F0.LE1, F0.LE2, ..., F0.LEj,
       F1.A, F2.A, ..., Fn.A
FROM F0
LEFT OUTER JOIN F1
ON F0.LE1 = F1.LE1 AND ... AND F0.LEj = F1.LEj
LEFT OUTER JOIN F2
ON F0.LE1 = F2.LE1 AND ... AND F0.LEj = F2.LEj
...
LEFT OUTER JOIN Fn
ON F0.LE1 = Fn.LE1 AND ... AND F0.LEj = Fn.LEj;
It is easy to see that every left outer join is based on the same columns. Producing the result table through
updates rather than a single insertion would basically require twice the I/O operations.
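As a minimal SPJ sketch for the Fig. 1 example, assume F denotes the fact table joined on K, with columns
D1, D2, and A; the temporary table names F0, FX, and FY are assumptions of this sketch:

-- One vertical aggregation per distinct value of the transposing
-- column D2; left outer joins then assemble the horizontal table FH.
CREATE TABLE F0 AS SELECT DISTINCT D1 FROM F;
CREATE TABLE FX AS
    SELECT D1, SUM(A) AS A FROM F WHERE D2 = 'X' GROUP BY D1;
CREATE TABLE FY AS
    SELECT D1, SUM(A) AS A FROM F WHERE D2 = 'Y' GROUP BY D1;
INSERT INTO FH
SELECT F0.D1, FX.A, FY.A
FROM F0
LEFT OUTER JOIN FX ON F0.D1 = FX.D1
LEFT OUTER JOIN FY ON F0.D1 = FY.D1;
-- Groups lacking an 'X' (or 'Y') row receive NULL in that column.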
CASE method
SQL provides the built-in CASE programming construct, which returns one value selected from a set
of values based on Boolean expressions. Queries for FH can be evaluated by performing direct aggregation
from the fact table F and, at the same time, transposing rows to produce FH.
SELECT DISTINCT RI1
FROM F;

INSERT INTO FH
SELECT LE1, LE2, ..., LEj,
       V(CASE WHEN RI1 = v11 AND ... AND RIk = vk1 THEN A ELSE NULL END),
       ...
       V(CASE WHEN RI1 = v1n AND ... AND RIk = vkn THEN A ELSE NULL END)
FROM F
GROUP BY LE1, LE2, ..., LEj;
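The same Fig. 1 result can be obtained in a single pass over F. A minimal sketch of the CASE method,
assuming FH already exists with columns (D1, D2X, D2Y):

-- One scan of F: each CASE routes a row's A value to the proper output
-- column and SUM() aggregates per group; missing combinations yield NULL.
INSERT INTO FH
SELECT D1,
       SUM(CASE WHEN D2 = 'X' THEN A ELSE NULL END),
       SUM(CASE WHEN D2 = 'Y' THEN A ELSE NULL END)
FROM F
GROUP BY D1;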
PIVOT method
Pivot transforms a series of rows into a series of fewer rows with additional columns. Data in one
source column is used to determine the new column for a row, and another source column is used as the data
for that new column. The wide form can be considered a matrix of column values, while the narrow form is a
natural encoding of a sparse matrix. In the current implementation, the PIVOT operator is used to calculate the
aggregations. One method to express pivoting uses scalar subqueries, where each pivoted column is created
through a separate subquery. The PIVOT operator provides a technique to transpose rows into columns
dynamically at query compilation and execution time.
SELECT * FROM BillTable PIVOT (SUM(Amount) FOR Month IN ('Jan', 'Feb', 'Mar'));

This query generates a table with Jan, Feb, and Mar as column attributes holding the sums of the amounts of
each customer stored inside the Bill Table. The PIVOT method is more efficient than the other two methods
because the PIVOT operator calculates the aggregation internally and there is no need to create extra tables, so
this method performs fewer operations than the others.
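Written out in full, Oracle-style PIVOT syntax for the same query would be as follows (table and column
names are assumptions of this sketch):

-- BillTable(CustomerId, Month, Amount); one output column per month.
SELECT *
FROM (SELECT CustomerId, Month, Amount FROM BillTable)
PIVOT (SUM(Amount) FOR Month IN ('Jan' AS Jan, 'Feb' AS Feb, 'Mar' AS Mar));
-- A customer with no bill in a given month gets NULL in that column,
-- matching the extra nulls noted for horizontal layouts above.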
VII. CONCLUSION
This article proposed two aggregate functions to compute percentages. The first function returns
one row for each computed percentage and is called a vertical percentage aggregation. The second
function returns each set of percentages, adding up to 100%, on the same row in horizontal form and is called
a horizontal percentage aggregation. The proposed aggregations are used as a framework to study percentage
queries. Two practical issues when computing vertical percentage queries were identified: missing rows and
division by zero. We discussed alternatives to tackle them. Horizontal percentages do not present the
missing row issue. We studied how to efficiently evaluate percentage queries with several optimizations,
including indexing, computation from partial aggregates, using either row insertion or update to produce the
result table, and reusing vertical percentages to get horizontal percentages. Our experiments study percentage
query optimization strategies and compare the proposed percentage aggregations against queries using
OLAP aggregations. Both proposed aggregations are significantly faster than existing OLAP aggregate
functions, showing about an order of magnitude improvement.
REFERENCES
[1] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria. PIVOT and UNPIVOT: optimization and execution strategies in an RDBMS. In Proc. VLDB Conference, pages 998–1009, 2004.
[2] H. Wang, C. Zaniolo, and C.R. Luo. ATLaS: a small but complete SQL extension for data mining and data streams. In Proc. VLDB Conference, pages 1113–1116, 2003.
[3] S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: alternatives and implications. In Proc. ACM SIGMOD Conference, pages 343–354, 1998.
[4] C. Ordonez. Integrating K-means clustering with a relational DBMS using SQL. IEEE Transactions on Knowledge and Data Engineering (TKDE), 18(2):188–201, 2006.
[5] G. Graefe, U. Fayyad, and S. Chaudhuri. On the efficient gathering of sufficient statistics for classification from large SQL databases. In Proc. ACM KDD Conference, pages 204–208, 1998.
[6] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: a relational aggregation operator generalizing group-by, cross-tab and sub-total. In Proc. ICDE Conference, pages 152–159, 1996.
[7] G. Luo, J.F. Naughton, C.J. Ellmann, and M. Watzke. Locking protocols for materialized aggregate join views. IEEE Transactions on Knowledge and Data Engineering (TKDE), 17(6):796–807, 2005.
[8] C. Ordonez and S. Pitchaimalai. Bayesian classifiers programmed in SQL. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22(1):139–144, 2010.
[9] X. Lian and L. Chen. General cost models for evaluating dimensionality reduction in high-dimensional spaces. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22(1):139–144, 2010.
[10] C. Ordonez. Statistical model computation with UDFs. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22, 2010.
[11] C. Ordonez. Vertical and horizontal percentage aggregations. In Proc. ACM SIGMOD Conference, pages 866–871, 2004.
[12] C. Ordonez and Z. Chen. Horizontal aggregations in SQL to prepare data sets for data mining analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011.