SlideShare a Scribd company logo
1 of 6
Download to read offline
IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41
www.iosrjournals.org
www.iosrjournals.org 36 | Page
Horizontal Aggregations in SQL to Prepare Data Sets for Data
Mining Analysis
V. Pradeep Kumar1
, Dr. R. V. Krishnaiah2
1
(MTech-SE, D.R.K.Institute of science and technology, Hyderabad, India
2
(Principal, Dept of CSE, D.R.K Institute of science and technology, Hyd, India.
Abstract: Data mining is widely used domain for extracting trends or patterns from historical data. However,
the databases used by enterprises can’t be directly used for data mining. It does mean that Data sets are to be
prepared from real world database to make them suitable for particular data mining operations. However,
preparing datasets for analyzing data is tedious task as it involves many aggregating columns, complex joins,
and SQL queries with sub queries. More over the existing aggregations performed through SQL functions such
as MIN, MAX, COUNT, SUM, AVG return a single value output which is not suitable for making datasets meant
for data mining. In fact these aggregate functions are generating vertical aggregations. This paper presents
techniques to support horizontal aggregations through SQL queries. The result of the queries is the data which
is suitable for data mining operations. It does mean that this paper achieves horizontal aggregations through
some constructs built that includes SQL queries as well. The methods prepared by this paper include CASE, SPJ
and PIVOT. We have developed a prototype application and the empirical results reveal that these constructs
are capable of generating data sets that can be used for further data mining operations.
Index Terms – Aggregations, SQL, data mining, OLAP, and data set generation.s
I. Introduction
RDBMS has become a de facto standard for storing and retrieving large amount of data. This data is
permanently stored and retrieved through front end applications. The applications can use SQL to interact with
relational databases. Preparing databases needs identification of relevant data and then normalizing the tables.
Aggregations are supported by SQL to obtain summary of data. The aggregate functions supported by SQL are
SUM, MIN, MAX, COUNT and AVG. These functions produce single value output. These are known as
vertical aggregations. This is because each function operates on the values of a domain vertically and produces a
single value result. The result of vertical aggregations is useful in calculations or computations. However, they
can’t be directly used in data mining operations further. In fact summary data sets can be prepared and they can
be used further in data mining operations [1]. The summary data can also be used in statistical algorithms [2],
[3]. Most of the data mining operations expect a data set with horizontal layout with many tuples and one
variable or dimension per column. This is the case with many data mining algorithms like PCA, regression,
classification, and clustering [4], [2].
As horizontal aggregations are capable of producing data sets that can be used for real world data
mining activities, this paper presents three horizontal aggregations namely SPJ, PIVOT and CASE. It does mean
that we enhanced these operators that are provided by SQL in one way or the other. The SPJ aggregation is
developed using construct with standard SQL operations. PIVOT makes use of built in pivoting facility in SQL
while the CASE is built based on the SQL CASE construct. We have built a web based prototype application
that demonstrates the effectiveness of these constructs. The empirical results revealed that these operations are
able to produce data sets with horizontal layout that is suitable for OLAP operations or data mining operations.
The motivation behind this work is that developing data sets for data mining operations is very difficult and time
consuming. One problem in this area is that the existing SQL aggregations provide a single row output. This
output is not suitable for data mining operations. For this reason we have extended the functionalities of CASE,
SPJ and PIVOT operators in such a way that they produce data sets with horizontal layouts.
The proposed horizontal aggregations have some unique features and they are very useful. The
advantages include, they provide construct that can generate SQL code that results in dataset which is suitable
for data mining; the SQL code supports automation of writing SQL queries, testing them and optimizing them.
As the proposed constructs are based on SQL, it reduces lot of coding as it is a powerful data retrieval language.
The proposed system is user friendly as users are never expected to write queries. Instead, they just make
queries in a user-friendly fashion. End users who do not know SQL also can make use of the proposed system.
As SQL code is automatically generated based on the operator used in the query, it reduces lot of manual
mistakes. In modern database where data is stored because of day to day operations, users do not get a chance to
use data directly for mining operations. Instead, it has to be transformed to make sense for data mining
operations. Generally data from business database is converted, and loaded into some other data sets known as
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
www.iosrjournals.org 37 | Page
datawarehouse. The proposed horizontal aggregations can be used to generate data sets for the purpose of data
mining analysis. The organization of the rest of this paper includes section2, and 3 providing definitions and
horizontal aggregations respectively. Experiments are presented in section 4, and section 5 compares our work
with existing work in the same area. Section 6 provides conclusions
II. Related Work
SQL has been around since its inception and being used widely for interacting with relational databases
both for storing and retrieving data. The SQL provides all kinds of constructs such as projections, selections,
aggregations, joins and sub queries. Query optimization and using the result of query further is an essential task
in database operations. As part of queries, aggregations are used to get summary of data. Aggregate functions
such as SUM, MIN, MAX, COUNT, and AVG are used for obtaining summary of data [5]. These aggregations
produce a single value output and can’t provide data in horizontal layout which can be used for data mining
operations. In other words, the vertical aggregations can’t produce data sets for data mining. Association rule
mining is one of the problems pertaining to OLAP processing [6]. SQL aggregate functions are extended for the
purpose of association rule mining in [7]. The aim of this is to support data mining operations efficiently. The
drawback of this is that it is not capable of producing results in tabular format with horizontal layout convenient
for data mining operations. In [5] a clustering algorithm is explored which makes use of SQL queries internally.
It is capable of showing horizontal layout for further mining operations. For performing spreadsheet like
operations, alternative SQL extensions are proposed in [8]. They have optimizations too for joins and they do
not have optimizations for partial transposition of resultant groups. Joins can be avoided using CASE and
PIVOT constructs. Traditional relational algebra [9] has to be adapted to generate new class of aggregations
known as horizontal aggregations for generating data sets for data mining operations. This is the focus of our
work. The problem of optimizing outer joins is presented in [10]. However, it is not suitable for large queries.
Traditional query optimizations [11] use tree-based plans for optimization. This is similar to SPJ method. CASE
is also used with SQL optimizations. PIVOT in sql is used for pivoting results. Lot of research has been around
on aggregations and optimizations of SQL operations. They also include cross tabulation and explored much in
[12] in case of cube queries. Unpivoting relational tables is also explored in [13] where each input row is used
to calculate the decision trees. The result contains multiple rows with attribute – value pairs that behave like an
inverse operator for horizontal aggregations. Many SQL operators are available to transform data from one
format to another format [14]. The TRANSPOSE operator is similar to unpivot operator which produces many
rows for each input row. TRANSPOSE can reduce the number of operations when compared with PIVOT.
These two are having inverse relationship as the results are proving this. For data mining operations that
produce decision trees, vertical aggregations can be used while the horizontal aggregations produce more
convenient horizontal layout that is best suited for data mining operations. In SQL Server [15] both pivot and
unpivot operations are made available.
Horizontal aggregations are explored to some extent in [16] and [17] with some limitations. It does
mean that the result of these can’t be efficiently used for further data mining operations. The proposed
horizontal aggregations are different from the built in aggregations that come with SQL. Our operators such as
CASE, PIVOT and SPJ are extensions to corresponding SQL operators. For instance CASE is our programming
construct that is based on the CASE of SQL; PIVOT is our programming construct that is based on SQL
pivoting operation; and the SPJ construct is built using standard SQL queries only.
III. Horizontal Aggregations
For describing the methods pertaining to the proposed horizontal aggregations such as PIVOT, CASE
and SPJ, input table as shown in Fig. 1 (a), traditional vertical sum aggregation as shown in Fig 1 (b) and
horizontal aggregation as shown in Fig. 1 (c) are considered.
Fig. 1 – Input table (a), traditional vertical aggregation (b), and horizontal aggregation (c)
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
www.iosrjournals.org 38 | Page
As can be seen in fig. 1, input table has some sample data. Traditional vertical sum aggregations are presented in
(b) which is the result of SQL SUM function while (c) holds the horizontal aggregation which is the result of
SUM function.
IV. Steps Used in All Methods
Fig. 2 shows steps on all methods based on input table
As can be seen in fig. 2, for all methods such as SPJ, CASE and PIVOT steps are given. For every
method the procedure starts with SELECT query. Afterwards, corresponding operator through underlying
construct is applied. Then horizontal aggregation is computed.
Fig. 3 shows steps on all methods based on table containing results of vertical aggregations
As can be seen in fig. 2, for all methods such as SPJ, CASE and PIVOT steps are given. For every
method the procedure starts with SELECT query. Afterwards, corresponding operator through underlying
construct is applied. Then horizontal aggregation is computed.
IV.1 SPJ Method
This method is based on the relational operators only. In this method one table is created with vertical
aggregation for each column. Then all such tables are joined in order to generate a tbale containing horizontal
aggregations. The actual implementation is based on the details given in [18].
IV.2 PIVOT Method
This aggregation is based on the PIVOT operator available in RDBMS. As it can provide
transpositions, it can be used to evaluate horizontal aggregations. PIVOT operator determines how many
columns are required to hold transpose and it can be combined with GROUP BY clause.
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
www.iosrjournals.org 39 | Page
Listing 1 – Shows optimized instructions for PIVOT construct
As can be seen in listing1, the optimized query projects only the columns that participate in
computation of horizontal aggregations.
IV.3 CASE Method
This construct is built based on the existing CASE struct of SQL. Based on Boolean expression one of
the results is returned by CASE construct. It is same as projection/aggregation query from relational point of
view. Based on some conjunction of conditions each non key value is given by a function. Here two basic
strategies to compute horizontal aggregations. The first strategy is to compute directly from input table. The
second approach is to compute vertical aggregation and save the results into temporary table. Then that table is
further used to compute horizontal aggregations. The actual implementation is based on the details given in [18].
V. Experimental Evaluation
The environment used for the development of prototype web based application that demonstrates the
three horizontal aggregations include Visual Studio 2010, and SQL Server 2008. The former is used to build
front end application with web based interface while the latter is used to store data permanently. The
technologies used include ASP.NET which is meant for developing web services and web applications, and
AJAX (Asynchronous JavaScript and XML) for rich user experience. Programming language used in C# which
is an object oriented high level programming language. The result of horizontal aggregation SPJ is shown in fig.
1.
Fig. 1 – Results of SJP aggregation
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
www.iosrjournals.org 40 | Page
As can be seen in fig. 1, the results are through the SPJ operation that results in data in horizontal layout. Data in
this layout can be considered as data set that can be used for further data mining operations.
Fig. 2 – Result of Pivoting Aggregation
As can be seen in fig. 2, the results are through the PIVOT operation that results in data in horizontal layout.
Data in this layout can be considered as data set that can be used for further data mining operations.
Fig. 3 – Result of CASE Aggregation
As can be seen in fig. 2, the results are through the CASE operation that results in data in horizontal layout. Data
in this layout can be considered as data set that can be used for further data mining operations.
VI. Conclusions
In this paper we extended three aggregate functions such as CASE, SPJ and PIVOT. These are known
as horizontal aggregations. We have achieved it by writing underlying constructs for each operator. When they
are used, internally the corresponding construct gets executed and the resultant data set is meant for OLAP
(Online Analytical Processing). Vertical aggregations such as SUM, MIN, MAX, COUNT, and AVG return a
single value output. However, that output can’t be used for data mining operations. In order to prepare real
world datasets that are very much suitable for data mining operations, we explored horizontal aggregations by
developing constructs in the form of operators such as CASE, SPJ and PIVOT. Instead of single value, the
horizontal aggregations return a set of values in the form of a row. The result resembles a multidimensional
vector. We have implemented SPJ using standard relational query operations. The CASE construct is developed
extending SQL CASE. The PIVOT makes use of built in operator provided by RDBMS for pivoting data. To
evaluate these operators, we have developed a web based prototype application and results reveal that the
proposed horizontal aggregations are capable of preparing data sets for real world data mining operations.
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
www.iosrjournals.org 41 | Page
References
[1] C. Ordonez, “Data Set Preprocessing and Transformation in a Database System,” Intelligent Data Analysis, vol. 15, no. 4, pp. 613-
631, 2011.
[2] C. Ordonez, “Statistical Model Computation with UDFs,” IEEE Trans. Knowledge and Data Eng., vol. 22, no. 12, pp. 1752-1765,
Dec. 2010.
[3] C. Ordonez and S. Pitchaimalai, “Bayesian Classifiers Programmed in SQL,” IEEE Trans. Knowledge and Data Eng., vol. 22, no. 1,
pp. 139-144, Jan. 2010.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, first ed. Morgan Kaufmann, 2001.
[5] C. Ordonez, “Integrating K-Means Clustering with a Relational DBMS Using SQL,” IEEE Trans. Knowledge and Data Eng., vol.
18, no. 2, pp. 188-201, Feb. 2006.
[6] S. Sarawagi, S. Thomas, and R. Agrawal, “Integrating Association Rule Mining with Relational Database Systems: Alternatives and
Implications,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’98), pp. 343-354, 1998.
[7] H. Wang, C. Zaniolo, and C.R. Luo, “ATLAS: A Small But Complete SQL Extension for Data Mining and Data Streams,” Proc.
29th Int’l Conf. Very Large Data Bases (VLDB ’03), pp. 1113- 1116, 2003.
[8] A. Witkowski, S. Bellamkonda, T. Bozkaya, G. Dorman, N. Folkert, A. Gupta, L. Sheng, and S. Subramanian, “Spreadsheets in
RDBMS for OLAP,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’03), pp. 52-63, 2003.
[9] H. Garcia-Molina, J.D. Ullman, and J. Widom, Database Systems: The Complete Book, first ed. Prentice Hall, 2001.
[10] C. Galindo-Legaria and A. Rosenthal, “Outer Join Simplification and Reordering for Query Optimization,” ACM Trans. Database
Systems, vol. 22, no. 1, pp. 43-73, 1997.
[11] G. Bhargava, P. Goel, and B.R. Iyer, “Hypergraph Based Reorderings of Outer Join Queries with Complex Predicates,” Proc. ACM
SIGMOD Int’l Conf. Management of Data (SIGMOD ’95), pp. 304-315, 1995.
[12] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, “Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-
Tab and Sub-Total,” Proc. Int’l Conf. Data Eng., pp. 152-159, 1996.
[13] G. Graefe, U. Fayyad, and S. Chaudhuri, “On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL
Databases,” Proc. ACM Conf. Knowledge Discovery and Data Mining (KDD ’98), pp. 204-208, 1998.
[14] J. Clear, D. Dunn, B. Harvey, M.L. Heytens, and P. Lohman, “Non- Stop SQL/MX Primitives for Knowledge Discovery,” Proc.
ACM SIGKDD Fifth Int’l Conf. Knowledge Discovery and Data Mining (KDD ’99), pp. 425-429, 1999.
[15] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria, “PIVOT and UNPIVOT: Optimization and Execution Strategies in an
RDBMS,” Proc. 13th Int’l Conf. Very Large Data Bases (VLDB ’04), pp. 998-1009, 2004.
[16] C. Ordonez, “Horizontal Aggregations for Building Tabular Data Sets,” Proc. Ninth ACM SIGMOD Workshop Data Mining and
Knowledge Discovery (DMKD ’04), pp. 35-42, 2004.
[17] C. Ordonez, “Vertical and Horizontal Percentage Aggregations,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD
’04), pp. 866-871, 2004.
[18] Carlos Ordonez and Zhibo Chen, “Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis”, IEEE
TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 24, NO. 4, APRIL 2012.
AUTHORS
V.Pradeep kumar is student of DRK college of engineering and technology, Hyderabad,
AP, INDIA. He has received B.Tech information technology M.Tech Degree in software
engineering. His main research interest includes data mining. Software Engineering.
Dr. R.V. Krishnaiah is working as Principal at DRK Institute of Science & Technology,
Hyderabad, AP, and India. He has received M.Tech Degree EIE and CSE. His Main
Research interest includes Data Mining, Software Engineering.

More Related Content

What's hot

Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSyed Hadoop
 
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low CostHow The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low CostDatabricks
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Mark Kromer
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoDB Database
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processingnurmeen1
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mark Kromer
 
86921864 olap-case-study-vj
86921864 olap-case-study-vj86921864 olap-case-study-vj
86921864 olap-case-study-vjhomeworkping4
 
Essbase beginner's guide olap fundamental chapter 1
Essbase beginner's guide olap fundamental chapter 1Essbase beginner's guide olap fundamental chapter 1
Essbase beginner's guide olap fundamental chapter 1Amit Sharma
 
Redirected Recovery of Recovery Expert for DB2 on z/OS
Redirected Recovery of Recovery Expert for DB2 on z/OSRedirected Recovery of Recovery Expert for DB2 on z/OS
Redirected Recovery of Recovery Expert for DB2 on z/OSBharath Nunepalli
 
Azure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryAzure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryMark Kromer
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingPetr Zapletal
 
Revamp the tablespace reorg process with ibm db2 automation tool
Revamp the tablespace reorg process with ibm db2 automation toolRevamp the tablespace reorg process with ibm db2 automation tool
Revamp the tablespace reorg process with ibm db2 automation toolBharath Nunepalli
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mark Kromer
 
An overview of data warehousing and OLAP technology
An overview of  data warehousing and OLAP technology An overview of  data warehousing and OLAP technology
An overview of data warehousing and OLAP technology Nikhatfatima16
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFMark Kromer
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data FlowMark Kromer
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Rukmani Gopalan
 
Harnessing the power of both worlds
Harnessing the power of both worldsHarnessing the power of both worlds
Harnessing the power of both worldsKaran Gulati
 

What's hot (20)

Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
 
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low CostHow The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005
 
OLAP
OLAPOLAP
OLAP
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
 
86921864 olap-case-study-vj
86921864 olap-case-study-vj86921864 olap-case-study-vj
86921864 olap-case-study-vj
 
Essbase beginner's guide olap fundamental chapter 1
Essbase beginner's guide olap fundamental chapter 1Essbase beginner's guide olap fundamental chapter 1
Essbase beginner's guide olap fundamental chapter 1
 
Redirected Recovery of Recovery Expert for DB2 on z/OS
Redirected Recovery of Recovery Expert for DB2 on z/OSRedirected Recovery of Recovery Expert for DB2 on z/OS
Redirected Recovery of Recovery Expert for DB2 on z/OS
 
Azure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryAzure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power Query
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
Revamp the tablespace reorg process with ibm db2 automation tool
Revamp the tablespace reorg process with ibm db2 automation toolRevamp the tablespace reorg process with ibm db2 automation tool
Revamp the tablespace reorg process with ibm db2 automation tool
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22
 
An overview of data warehousing and OLAP technology
An overview of  data warehousing and OLAP technology An overview of  data warehousing and OLAP technology
An overview of data warehousing and OLAP technology
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADF
 
OLAP
OLAPOLAP
OLAP
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
 
Harnessing the power of both worlds
Harnessing the power of both worldsHarnessing the power of both worlds
Harnessing the power of both worlds
 

Viewers also liked

International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)Odyssey Recruitment
 
Herramientas para estimular la creatividad y la innovación
Herramientas para estimular la creatividad y la innovaciónHerramientas para estimular la creatividad y la innovación
Herramientas para estimular la creatividad y la innovaciónGustavo Martin
 
Modeling and Containment of Uniform Scanning Worms
Modeling and Containment of Uniform Scanning WormsModeling and Containment of Uniform Scanning Worms
Modeling and Containment of Uniform Scanning WormsIOSR Journals
 
Evaluation of Uncertain Location
Evaluation of Uncertain Location Evaluation of Uncertain Location
Evaluation of Uncertain Location IOSR Journals
 
Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...
Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...
Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...IOSR Journals
 
пьеса а.м. горького на дне
пьеса а.м. горького на днепьеса а.м. горького на дне
пьеса а.м. горького на днеNatalya Dyrda
 
Quick Identification of Stego Signatures in Images Using Suspicion Value (Sp...
Quick Identification of Stego Signatures in Images Using  Suspicion Value (Sp...Quick Identification of Stego Signatures in Images Using  Suspicion Value (Sp...
Quick Identification of Stego Signatures in Images Using Suspicion Value (Sp...IOSR Journals
 
A Novel Revolutionary highly secured Object authentication schema
A Novel Revolutionary highly secured Object authentication  schemaA Novel Revolutionary highly secured Object authentication  schema
A Novel Revolutionary highly secured Object authentication schemaIOSR Journals
 
Presentazione dogsinthecity
Presentazione dogsinthecityPresentazione dogsinthecity
Presentazione dogsinthecitylucaminetti
 
Planning for Planning
Planning for PlanningPlanning for Planning
Planning for PlanningKeely Galgano
 
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...Thomas Gottron
 

Viewers also liked (20)

International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)
 
Herramientas para estimular la creatividad y la innovación
Herramientas para estimular la creatividad y la innovaciónHerramientas para estimular la creatividad y la innovación
Herramientas para estimular la creatividad y la innovación
 
Modeling and Containment of Uniform Scanning Worms
Modeling and Containment of Uniform Scanning WormsModeling and Containment of Uniform Scanning Worms
Modeling and Containment of Uniform Scanning Worms
 
Evaluation of Uncertain Location
Evaluation of Uncertain Location Evaluation of Uncertain Location
Evaluation of Uncertain Location
 
D0961927
D0961927D0961927
D0961927
 
Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...
Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...
Combining both Plug-in Vehicles and Renewable Energy Resources for Unit Commi...
 
OFIPA
OFIPAOFIPA
OFIPA
 
пьеса а.м. горького на дне
пьеса а.м. горького на днепьеса а.м. горького на дне
пьеса а.м. горького на дне
 
Quick Identification of Stego Signatures in Images Using Suspicion Value (Sp...
Quick Identification of Stego Signatures in Images Using  Suspicion Value (Sp...Quick Identification of Stego Signatures in Images Using  Suspicion Value (Sp...
Quick Identification of Stego Signatures in Images Using Suspicion Value (Sp...
 
A Novel Revolutionary highly secured Object authentication schema
A Novel Revolutionary highly secured Object authentication  schemaA Novel Revolutionary highly secured Object authentication  schema
A Novel Revolutionary highly secured Object authentication schema
 
Vis van bij ons
Vis van bij onsVis van bij ons
Vis van bij ons
 
Pendidikan kewarganegaraan
Pendidikan kewarganegaraanPendidikan kewarganegaraan
Pendidikan kewarganegaraan
 
How to Create Little Planets
How to Create Little PlanetsHow to Create Little Planets
How to Create Little Planets
 
Presentazione dogsinthecity
Presentazione dogsinthecityPresentazione dogsinthecity
Presentazione dogsinthecity
 
Virus Informaticos
Virus InformaticosVirus Informaticos
Virus Informaticos
 
C0431320
C0431320C0431320
C0431320
 
Planning for Planning
Planning for PlanningPlanning for Planning
Planning for Planning
 
8 hw cells_organisation
8 hw cells_organisation8 hw cells_organisation
8 hw cells_organisation
 
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...
 
A0310106
A0310106A0310106
A0310106
 

Similar to Generate Datasets for Data Mining Using SQL Horizontal Aggregations

Efficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP CubeEfficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP CubeIRJET Journal
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGijiert bestjournal
 
A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...Editor IJCATR
 
A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...Editor IJCATR
 
AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...
AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...
AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...IRJET Journal
 
Database Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wiDatabase Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wiOllieShoresna
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudIJAAS Team
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce FrameworkBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce FrameworkMahantesh Angadi
 
Technical Research Document - Anurag
Technical Research Document - AnuragTechnical Research Document - Anurag
Technical Research Document - Anuraganuragrajandekar
 
ArunAndGangadhar_OLaaS_v4
ArunAndGangadhar_OLaaS_v4ArunAndGangadhar_OLaaS_v4
ArunAndGangadhar_OLaaS_v4Arun Patil
 
Sql interview question part 5
Sql interview question part 5Sql interview question part 5
Sql interview question part 5kaashiv1
 
NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013Facundo Farias
 
Sql interview question part 7
Sql interview question part 7Sql interview question part 7
Sql interview question part 7kaashiv1
 

Similar to Generate Datasets for Data Mining Using SQL Horizontal Aggregations (20)

Query Optimization for Big Data Analytics
Query Optimization for Big Data AnalyticsQuery Optimization for Big Data Analytics
Query Optimization for Big Data Analytics
 
Efficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP CubeEfficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP Cube
 
Oracle
OracleOracle
Oracle
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
 
Sql good practices
Sql good practicesSql good practices
Sql good practices
 
A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...
 
A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...
 
AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...
AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...
AUTOMATIC TRANSFER OF DATA USING SERVICE-ORIENTED ARCHITECTURE TO NoSQL DATAB...
 
Database Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wiDatabase Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wi
 
Erciyes university
Erciyes universityErciyes university
Erciyes university
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with Cloud
 
Ebook11
Ebook11Ebook11
Ebook11
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce FrameworkBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
 
Technical Research Document - Anurag
Technical Research Document - AnuragTechnical Research Document - Anurag
Technical Research Document - Anurag
 
ArunAndGangadhar_OLaaS_v4
ArunAndGangadhar_OLaaS_v4ArunAndGangadhar_OLaaS_v4
ArunAndGangadhar_OLaaS_v4
 
Ebook5
Ebook5Ebook5
Ebook5
 
Sql interview question part 5
Sql interview question part 5Sql interview question part 5
Sql interview question part 5
 
NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013
 
Ebook7
Ebook7Ebook7
Ebook7
 
Sql interview question part 7
Sql interview question part 7Sql interview question part 7
Sql interview question part 7
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Generate Datasets for Data Mining Using SQL Horizontal Aggregations

  • 1. IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 www.iosrjournals.org www.iosrjournals.org 36 | Page Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis V. Pradeep Kumar1 , Dr. R. V. Krishnaiah2 1 (MTech-SE, D.R.K.Institute of science and technology, Hyderabad, India 2 (Principal, Dept of CSE, D.R.K Institute of science and technology, Hyd, India. Abstract: Data mining is widely used domain for extracting trends or patterns from historical data. However, the databases used by enterprises can’t be directly used for data mining. It does mean that Data sets are to be prepared from real world database to make them suitable for particular data mining operations. However, preparing datasets for analyzing data is tedious task as it involves many aggregating columns, complex joins, and SQL queries with sub queries. More over the existing aggregations performed through SQL functions such as MIN, MAX, COUNT, SUM, AVG return a single value output which is not suitable for making datasets meant for data mining. In fact these aggregate functions are generating vertical aggregations. This paper presents techniques to support horizontal aggregations through SQL queries. The result of the queries is the data which is suitable for data mining operations. It does mean that this paper achieves horizontal aggregations through some constructs built that includes SQL queries as well. The methods prepared by this paper include CASE, SPJ and PIVOT. We have developed a prototype application and the empirical results reveal that these constructs are capable of generating data sets that can be used for further data mining operations. Index Terms – Aggregations, SQL, data mining, OLAP, and data set generation.s I. Introduction RDBMS has become a de facto standard for storing and retrieving large amount of data. This data is permanently stored and retrieved through front end applications. The applications can use SQL to interact with relational databases. Preparing databases needs identification of relevant data and then normalizing the tables. Aggregations are supported by SQL to obtain summary of data. The aggregate functions supported by SQL are SUM, MIN, MAX, COUNT and AVG. These functions produce single value output. These are known as vertical aggregations. This is because each function operates on the values of a domain vertically and produces a single value result. The result of vertical aggregations is useful in calculations or computations. However, they can’t be directly used in data mining operations further. In fact summary data sets can be prepared and they can be used further in data mining operations [1]. The summary data can also be used in statistical algorithms [2], [3]. Most of the data mining operations expect a data set with horizontal layout with many tuples and one variable or dimension per column. This is the case with many data mining algorithms like PCA, regression, classification, and clustering [4], [2]. As horizontal aggregations are capable of producing data sets that can be used for real world data mining activities, this paper presents three horizontal aggregations namely SPJ, PIVOT and CASE. It does mean that we enhanced these operators that are provided by SQL in one way or the other. The SPJ aggregation is developed using construct with standard SQL operations. PIVOT makes use of built in pivoting facility in SQL while the CASE is built based on the SQL CASE construct. We have built a web based prototype application that demonstrates the effectiveness of these constructs. The empirical results revealed that these operations are able to produce data sets with horizontal layout that is suitable for OLAP operations or data mining operations. The motivation behind this work is that developing data sets for data mining operations is very difficult and time consuming. One problem in this area is that the existing SQL aggregations provide a single row output. This output is not suitable for data mining operations. For this reason we have extended the functionalities of CASE, SPJ and PIVOT operators in such a way that they produce data sets with horizontal layouts. The proposed horizontal aggregations have some unique features and they are very useful. The advantages include, they provide construct that can generate SQL code that results in dataset which is suitable for data mining; the SQL code supports automation of writing SQL queries, testing them and optimizing them. As the proposed constructs are based on SQL, it reduces lot of coding as it is a powerful data retrieval language. The proposed system is user friendly as users are never expected to write queries. Instead, they just make queries in a user-friendly fashion. End users who do not know SQL also can make use of the proposed system. As SQL code is automatically generated based on the operator used in the query, it reduces lot of manual mistakes. In modern database where data is stored because of day to day operations, users do not get a chance to use data directly for mining operations. Instead, it has to be transformed to make sense for data mining operations. Generally data from business database is converted, and loaded into some other data sets known as
  • 2. Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis www.iosrjournals.org 37 | Page datawarehouse. The proposed horizontal aggregations can be used to generate data sets for the purpose of data mining analysis. The organization of the rest of this paper includes section2, and 3 providing definitions and horizontal aggregations respectively. Experiments are presented in section 4, and section 5 compares our work with existing work in the same area. Section 6 provides conclusions II. Related Work SQL has been around since its inception and being used widely for interacting with relational databases both for storing and retrieving data. The SQL provides all kinds of constructs such as projections, selections, aggregations, joins and sub queries. Query optimization and using the result of query further is an essential task in database operations. As part of queries, aggregations are used to get summary of data. Aggregate functions such as SUM, MIN, MAX, COUNT, and AVG are used for obtaining summary of data [5]. These aggregations produce a single value output and can’t provide data in horizontal layout which can be used for data mining operations. In other words, the vertical aggregations can’t produce data sets for data mining. Association rule mining is one of the problems pertaining to OLAP processing [6]. SQL aggregate functions are extended for the purpose of association rule mining in [7]. The aim of this is to support data mining operations efficiently. The drawback of this is that it is not capable of producing results in tabular format with horizontal layout convenient for data mining operations. In [5] a clustering algorithm is explored which makes use of SQL queries internally. It is capable of showing horizontal layout for further mining operations. For performing spreadsheet like operations, alternative SQL extensions are proposed in [8]. They have optimizations too for joins and they do not have optimizations for partial transposition of resultant groups. Joins can be avoided using CASE and PIVOT constructs. Traditional relational algebra [9] has to be adapted to generate new class of aggregations known as horizontal aggregations for generating data sets for data mining operations. This is the focus of our work. The problem of optimizing outer joins is presented in [10]. However, it is not suitable for large queries. Traditional query optimizations [11] use tree-based plans for optimization. This is similar to SPJ method. CASE is also used with SQL optimizations. PIVOT in sql is used for pivoting results. Lot of research has been around on aggregations and optimizations of SQL operations. They also include cross tabulation and explored much in [12] in case of cube queries. Unpivoting relational tables is also explored in [13] where each input row is used to calculate the decision trees. The result contains multiple rows with attribute – value pairs that behave like an inverse operator for horizontal aggregations. Many SQL operators are available to transform data from one format to another format [14]. The TRANSPOSE operator is similar to unpivot operator which produces many rows for each input row. TRANSPOSE can reduce the number of operations when compared with PIVOT. These two are having inverse relationship as the results are proving this. For data mining operations that produce decision trees, vertical aggregations can be used while the horizontal aggregations produce more convenient horizontal layout that is best suited for data mining operations. In SQL Server [15] both pivot and unpivot operations are made available. Horizontal aggregations are explored to some extent in [16] and [17] with some limitations. It does mean that the result of these can’t be efficiently used for further data mining operations. The proposed horizontal aggregations are different from the built in aggregations that come with SQL. Our operators such as CASE, PIVOT and SPJ are extensions to corresponding SQL operators. For instance CASE is our programming construct that is based on the CASE of SQL; PIVOT is our programming construct that is based on SQL pivoting operation; and the SPJ construct is built using standard SQL queries only. III. Horizontal Aggregations For describing the methods pertaining to the proposed horizontal aggregations such as PIVOT, CASE and SPJ, input table as shown in Fig. 1 (a), traditional vertical sum aggregation as shown in Fig 1 (b) and horizontal aggregation as shown in Fig. 1 (c) are considered. Fig. 1 – Input table (a), traditional vertical aggregation (b), and horizontal aggregation (c)
  • 3. Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis www.iosrjournals.org 38 | Page As can be seen in fig. 1, input table has some sample data. Traditional vertical sum aggregations are presented in (b) which is the result of SQL SUM function while (c) holds the horizontal aggregation which is the result of SUM function. IV. Steps Used in All Methods Fig. 2 shows steps on all methods based on input table As can be seen in fig. 2, for all methods such as SPJ, CASE and PIVOT steps are given. For every method the procedure starts with SELECT query. Afterwards, corresponding operator through underlying construct is applied. Then horizontal aggregation is computed. Fig. 3 shows steps on all methods based on table containing results of vertical aggregations As can be seen in fig. 2, for all methods such as SPJ, CASE and PIVOT steps are given. For every method the procedure starts with SELECT query. Afterwards, corresponding operator through underlying construct is applied. Then horizontal aggregation is computed. IV.1 SPJ Method This method is based on the relational operators only. In this method one table is created with vertical aggregation for each column. Then all such tables are joined in order to generate a tbale containing horizontal aggregations. The actual implementation is based on the details given in [18]. IV.2 PIVOT Method This aggregation is based on the PIVOT operator available in RDBMS. As it can provide transpositions, it can be used to evaluate horizontal aggregations. PIVOT operator determines how many columns are required to hold transpose and it can be combined with GROUP BY clause.
  • 4. Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis www.iosrjournals.org 39 | Page Listing 1 – Shows optimized instructions for PIVOT construct As can be seen in listing1, the optimized query projects only the columns that participate in computation of horizontal aggregations. IV.3 CASE Method This construct is built based on the existing CASE struct of SQL. Based on Boolean expression one of the results is returned by CASE construct. It is same as projection/aggregation query from relational point of view. Based on some conjunction of conditions each non key value is given by a function. Here two basic strategies to compute horizontal aggregations. The first strategy is to compute directly from input table. The second approach is to compute vertical aggregation and save the results into temporary table. Then that table is further used to compute horizontal aggregations. The actual implementation is based on the details given in [18]. V. Experimental Evaluation The environment used for the development of prototype web based application that demonstrates the three horizontal aggregations include Visual Studio 2010, and SQL Server 2008. The former is used to build front end application with web based interface while the latter is used to store data permanently. The technologies used include ASP.NET which is meant for developing web services and web applications, and AJAX (Asynchronous JavaScript and XML) for rich user experience. Programming language used in C# which is an object oriented high level programming language. The result of horizontal aggregation SPJ is shown in fig. 1. Fig. 1 – Results of SJP aggregation
  • 5. Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis www.iosrjournals.org 40 | Page As can be seen in fig. 1, the results are through the SPJ operation that results in data in horizontal layout. Data in this layout can be considered as data set that can be used for further data mining operations. Fig. 2 – Result of Pivoting Aggregation As can be seen in fig. 2, the results are through the PIVOT operation that results in data in horizontal layout. Data in this layout can be considered as data set that can be used for further data mining operations. Fig. 3 – Result of CASE Aggregation As can be seen in fig. 2, the results are through the CASE operation that results in data in horizontal layout. Data in this layout can be considered as data set that can be used for further data mining operations. VI. Conclusions In this paper we extended three aggregate functions such as CASE, SPJ and PIVOT. These are known as horizontal aggregations. We have achieved it by writing underlying constructs for each operator. When they are used, internally the corresponding construct gets executed and the resultant data set is meant for OLAP (Online Analytical Processing). Vertical aggregations such as SUM, MIN, MAX, COUNT, and AVG return a single value output. However, that output can’t be used for data mining operations. In order to prepare real world datasets that are very much suitable for data mining operations, we explored horizontal aggregations by developing constructs in the form of operators such as CASE, SPJ and PIVOT. Instead of single value, the horizontal aggregations return a set of values in the form of a row. The result resembles a multidimensional vector. We have implemented SPJ using standard relational query operations. The CASE construct is developed extending SQL CASE. The PIVOT makes use of built in operator provided by RDBMS for pivoting data. To evaluate these operators, we have developed a web based prototype application and results reveal that the proposed horizontal aggregations are capable of preparing data sets for real world data mining operations.
  • 6. Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis www.iosrjournals.org 41 | Page References [1] C. Ordonez, “Data Set Preprocessing and Transformation in a Database System,” Intelligent Data Analysis, vol. 15, no. 4, pp. 613- 631, 2011. [2] C. Ordonez, “Statistical Model Computation with UDFs,” IEEE Trans. Knowledge and Data Eng., vol. 22, no. 12, pp. 1752-1765, Dec. 2010. [3] C. Ordonez and S. Pitchaimalai, “Bayesian Classifiers Programmed in SQL,” IEEE Trans. Knowledge and Data Eng., vol. 22, no. 1, pp. 139-144, Jan. 2010. [4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, first ed. Morgan Kaufmann, 2001. [5] C. Ordonez, “Integrating K-Means Clustering with a Relational DBMS Using SQL,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 2, pp. 188-201, Feb. 2006. [6] S. Sarawagi, S. Thomas, and R. Agrawal, “Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’98), pp. 343-354, 1998. [7] H. Wang, C. Zaniolo, and C.R. Luo, “ATLAS: A Small But Complete SQL Extension for Data Mining and Data Streams,” Proc. 29th Int’l Conf. Very Large Data Bases (VLDB ’03), pp. 1113- 1116, 2003. [8] A. Witkowski, S. Bellamkonda, T. Bozkaya, G. Dorman, N. Folkert, A. Gupta, L. Sheng, and S. Subramanian, “Spreadsheets in RDBMS for OLAP,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’03), pp. 52-63, 2003. [9] H. Garcia-Molina, J.D. Ullman, and J. Widom, Database Systems: The Complete Book, first ed. Prentice Hall, 2001. [10] C. Galindo-Legaria and A. Rosenthal, “Outer Join Simplification and Reordering for Query Optimization,” ACM Trans. Database Systems, vol. 22, no. 1, pp. 43-73, 1997. [11] G. Bhargava, P. Goel, and B.R. Iyer, “Hypergraph Based Reorderings of Outer Join Queries with Complex Predicates,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’95), pp. 304-315, 1995. [12] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, “Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross- Tab and Sub-Total,” Proc. Int’l Conf. Data Eng., pp. 152-159, 1996. [13] G. Graefe, U. Fayyad, and S. Chaudhuri, “On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases,” Proc. ACM Conf. Knowledge Discovery and Data Mining (KDD ’98), pp. 204-208, 1998. [14] J. Clear, D. Dunn, B. Harvey, M.L. Heytens, and P. Lohman, “Non- Stop SQL/MX Primitives for Knowledge Discovery,” Proc. ACM SIGKDD Fifth Int’l Conf. Knowledge Discovery and Data Mining (KDD ’99), pp. 425-429, 1999. [15] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria, “PIVOT and UNPIVOT: Optimization and Execution Strategies in an RDBMS,” Proc. 13th Int’l Conf. Very Large Data Bases (VLDB ’04), pp. 998-1009, 2004. [16] C. Ordonez, “Horizontal Aggregations for Building Tabular Data Sets,” Proc. Ninth ACM SIGMOD Workshop Data Mining and Knowledge Discovery (DMKD ’04), pp. 35-42, 2004. [17] C. Ordonez, “Vertical and Horizontal Percentage Aggregations,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’04), pp. 866-871, 2004. [18] Carlos Ordonez and Zhibo Chen, “Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 24, NO. 4, APRIL 2012. AUTHORS V.Pradeep kumar is student of DRK college of engineering and technology, Hyderabad, AP, INDIA. He has received B.Tech information technology M.Tech Degree in software engineering. His main research interest includes data mining. Software Engineering. Dr. R.V. Krishnaiah is working as Principal at DRK Institute of Science & Technology, Hyderabad, AP, and India. He has received M.Tech Degree EIE and CSE. His Main Research interest includes Data Mining, Software Engineering.