CFP last date
20 January 2025
Reseach Article

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

by Jasna S, Manu J Pillai
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 86 - Number 13
Year of Publication: 2014
Authors: Jasna S, Manu J Pillai
10.5120/15047-3415

Jasna S, Manu J Pillai . Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL. International Journal of Computer Applications. 86, 13 ( January 2014), 32-36. DOI=10.5120/15047-3415

@article{ 10.5120/15047-3415,
author = { Jasna S, Manu J Pillai },
title = { Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL },
journal = { International Journal of Computer Applications },
issue_date = { January 2014 },
volume = { 86 },
number = { 13 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 32-36 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume86/number13/15047-3415/ },
doi = { 10.5120/15047-3415 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:04:08.657328+05:30
%A Jasna S
%A Manu J Pillai
%T Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL
%J International Journal of Computer Applications
%@ 0975-8887
%V 86
%N 13
%P 32-36
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A huge amount of time is needed for making the dataset for the data mining analysis because data mining practitioners required to write complex SQL queries and many tables are to be joined to get the aggregated result. The traditional SQL aggregations prepare the data sets in vertical layout that is; they return result on one column per aggregated group. But for the data mining project, the data set to be required in horizontal layout. In order to transform the data into suitable form the existing three horizontal aggregation methods are used. The existing method for evaluating horizontal aggregation are SPJ (select, project, join) method, CASE method and PIVOT method. The analysis become more efficient if the dataset obtained is in the horizontal form. The main aim is to identify the most efficient method from these three methods in terms of time and space complexity. So these methods are compared using large tables and identified that the CASE method is more efficient than SPJ and PIVOT method.

References
  1. H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall, 1st edition, 2001.
  2. C. Ordonez Integrating K-means clustering with a relational DBMS using SQL. IEEE Transactions on Knowledge and Data Engineering (TKDE), 18(2):188-201, 2006.
  3. C. Galindo-Legaria and A. Rosenthal. Outer join simplification and reordering for query optimization. ACM TODS, 22(1):43. 73, 1997.
  4. J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross- tab and subtotal. In ICDE Conference, pages 152. 159, 1996.
  5. G. Graefe, U. Fayyad, and S. Chaudhuri. On the efficient gathering of sufficient statistics for classification from large SQL databases. In proc. ACM KDD Conference, pages 204. 208, 1998.
  6. J. Clear, D. Dunn, B. Harvey, M. L. Heytens, and P. Lohman. Non-stop SQL/MX primitives for knowledge discovery. In ACM KDD Conference, pages 425. 429, 1999.
  7. C. Ordonez. Vertical and horizontal percentage aggregations. In Proc. ACM SIGMOD Conference, pages 866. 871, 2004.
  8. C. Cunningham, G. Graefe, and C. A. Galindo-Legaria. PIVOT and UNPIVOT: Optimization and execution strategies in an RDBMS. In Proc. VLDB Conference, pages 998. 1009, 2004.
  9. C. Ordonez. Horizontal aggregations for building tabular data sets. In Proc. ACM SIGMOD Data Mining and Knowledge Discovery Workshop, pages 35. 42, 2004.
  10. C. Ordonez, Zhibo Chen. Horizontal aggregations in SQL to prepare Data Sets for Data Mining Analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2012.
Index Terms

Computer Science
Information Sciences

Keywords

SQL Operators Aggregate functions Data Set Preparation