Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

Jasna S; Manu J Pillai

Call for Paper

September Edition

IJCA solicits high quality original research papers for the upcoming September edition of the journal. The last date of research paper submission is 20 August 2025

Submit your paper

Know more

The week's pick

Assessing LLMs as Cognitive Interpreters of Student Prompts: A Typological Framework

Tadeu da Ponte Matevz Vremec Matej Mertik

Random Articles

Reseach Article

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

by Jasna S, Manu J Pillai

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 86 - Number 13

Year of Publication: 2014

Authors: Jasna S, Manu J Pillai

10.5120/15047-3415

Jasna S, Manu J Pillai . Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL. International Journal of Computer Applications. 86, 13 ( January 2014), 32-36. DOI=10.5120/15047-3415

@article{ 10.5120/15047-3415,

author = { Jasna S, Manu J Pillai },

title = { Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL },

journal = { International Journal of Computer Applications },

issue_date = { January 2014 },

volume = { 86 },

number = { 13 },

month = { January },

year = { 2014 },

issn = { 0975-8887 },

pages = { 32-36 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume86/number13/15047-3415/ },

doi = { 10.5120/15047-3415 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T22:04:08.657328+05:30

%A Jasna S

%A Manu J Pillai

%T Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

%J International Journal of Computer Applications

%@ 0975-8887

%V 86

%N 13

%P 32-36

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

A huge amount of time is needed for making the dataset for the data mining analysis because data mining practitioners required to write complex SQL queries and many tables are to be joined to get the aggregated result. The traditional SQL aggregations prepare the data sets in vertical layout that is; they return result on one column per aggregated group. But for the data mining project, the data set to be required in horizontal layout. In order to transform the data into suitable form the existing three horizontal aggregation methods are used. The existing method for evaluating horizontal aggregation are SPJ (select, project, join) method, CASE method and PIVOT method. The analysis become more efficient if the dataset obtained is in the horizontal form. The main aim is to identify the most efficient method from these three methods in terms of time and space complexity. So these methods are compared using large tables and identified that the CASE method is more efficient than SPJ and PIVOT method.

References

H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall, 1st edition, 2001.
C. Ordonez Integrating K-means clustering with a relational DBMS using SQL. IEEE Transactions on Knowledge and Data Engineering (TKDE), 18(2):188-201, 2006.
C. Galindo-Legaria and A. Rosenthal. Outer join simplification and reordering for query optimization. ACM TODS, 22(1):43. 73, 1997.
J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross- tab and subtotal. In ICDE Conference, pages 152. 159, 1996.
G. Graefe, U. Fayyad, and S. Chaudhuri. On the efficient gathering of sufficient statistics for classification from large SQL databases. In proc. ACM KDD Conference, pages 204. 208, 1998.
J. Clear, D. Dunn, B. Harvey, M. L. Heytens, and P. Lohman. Non-stop SQL/MX primitives for knowledge discovery. In ACM KDD Conference, pages 425. 429, 1999.
C. Ordonez. Vertical and horizontal percentage aggregations. In Proc. ACM SIGMOD Conference, pages 866. 871, 2004.
C. Cunningham, G. Graefe, and C. A. Galindo-Legaria. PIVOT and UNPIVOT: Optimization and execution strategies in an RDBMS. In Proc. VLDB Conference, pages 998. 1009, 2004.
C. Ordonez. Horizontal aggregations for building tabular data sets. In Proc. ACM SIGMOD Data Mining and Knowledge Discovery Workshop, pages 35. 42, 2004.
C. Ordonez, Zhibo Chen. Horizontal aggregations in SQL to prepare Data Sets for Data Mining Analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2012.

Index Terms

Computer Science

Information Sciences

Keywords

SQL Operators Aggregate functions Data Set Preparation