International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 118 - Number 12
Year of Publication: 2015
Authors: Archana A. Chaudhari, Harmeet Kaur Khanuja
10.5120/20800-3482
Archana A. Chaudhari, Harmeet Kaur Khanuja. Database Transformation to Build Dataset for Generation of Decision Tree and Extended ER Model. International Journal of Computer Applications. 118, 12 (May 2015), 41-45. DOI=10.5120/20800-3482
In a data mining project, the most time-consuming task is preparing the data set required for analysis, because a relational database generally consists of a collection of tables and views that must be joined, aggregated and transformed to build that data set. As a result, many complex SQL queries are written repeatedly, independently of one another and in a disorganized manner, and the database grows with many tables and views that are not present as entities in the ER model. In addition, standard SQL aggregations are limited for preparing data sets because they return only one column per aggregated group. To address this issue, we propose simple methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, where every row corresponds to an observation and every column corresponds to one variable. This new class of functions is called horizontal aggregations. Horizontal aggregations extend standard SQL aggregations to build data sets with a horizontal, denormalized layout, which is the input expected by most data mining algorithms. The resulting data sets are provided as input to a decision tree generation algorithm to generate a decision tree, and in a similar way an extended ER model can be generated.
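To make the contrast between vertical and horizontal aggregation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common CASE-based way of expressing a horizontal aggregation in standard SQL: one `SUM(CASE WHEN ...)` column is generated per distinct value of a pivot column. The `sales` table, its columns, and the helper names `horizontal_sum_sql` and `horizontal_aggregate` are hypothetical, and the generated SQL is run on an in-memory SQLite database through Python's `sqlite3` module.

```python
import sqlite3


def horizontal_sum_sql(table, group_col, pivot_col, value_col, pivot_values):
    """Generate a SELECT returning one SUM column per pivot value (CASE method).

    Assumption: table/column names and pivot values are trusted, identifier-safe
    strings, because they are interpolated directly into the SQL text.
    """
    cols = ",\n  ".join(
        f"SUM(CASE WHEN {pivot_col} = '{v}' THEN {value_col} ELSE NULL END) "
        f"AS {value_col}_{v}"
        for v in pivot_values
    )
    return (f"SELECT {group_col},\n  {cols}\nFROM {table}\n"
            f"GROUP BY {group_col}\nORDER BY {group_col}")


def horizontal_aggregate(conn, table, group_col, pivot_col, value_col):
    """Find the distinct pivot values, generate the SQL, and run it."""
    pivot_values = [r[0] for r in conn.execute(
        f"SELECT DISTINCT {pivot_col} FROM {table} ORDER BY {pivot_col}")]
    sql = horizontal_sum_sql(table, group_col, pivot_col, value_col, pivot_values)
    cur = conn.execute(sql)
    header = [d[0] for d in cur.description]  # column names of the result
    return header, cur.fetchall()


# Hypothetical example data: sales amounts per store and quarter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "Q1", 10.0), (1, "Q2", 5.0), (1, "Q1", 2.0),
    (2, "Q2", 7.0), (2, "Q3", 3.0),
])

# Standard (vertical) aggregation: one row per (store, quarter) group.
vertical = conn.execute(
    "SELECT store_id, quarter, SUM(amount) FROM sales "
    "GROUP BY store_id, quarter ORDER BY store_id, quarter").fetchall()

# Horizontal aggregation: one row per store, one column per quarter.
header, rows = horizontal_aggregate(conn, "sales", "store_id", "quarter", "amount")

print(vertical)  # [(1, 'Q1', 12.0), (1, 'Q2', 5.0), (2, 'Q2', 7.0), (2, 'Q3', 3.0)]
print(header)    # ['store_id', 'amount_Q1', 'amount_Q2', 'amount_Q3']
print(rows)      # [(1, 12.0, 5.0, None), (2, None, 7.0, 3.0)]
```

Each row of the horizontal result is one observation (a store) and each column is one variable (a quarter's total), which is the tabular shape a decision tree learner expects. A `NULL` (`None`) appears where a store has no rows for a quarter, because `SUM` over only `NULL` inputs returns `NULL`.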