Research Article

An Open Source ETL Tool - Medium and Small Scale Enterprise ETL(MaSSEETL)

by Rupali Gill, Jaiteg Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 4
Year of Publication: 2014
Authors: Rupali Gill, Jaiteg Singh
DOI: 10.5120/18899-0190

Rupali Gill, Jaiteg Singh. An Open Source ETL Tool - Medium and Small Scale Enterprise ETL(MaSSEETL). International Journal of Computer Applications. 108, 4 (December 2014), 15-22. DOI=10.5120/18899-0190

@article{ 10.5120/18899-0190,
author = { Rupali Gill, Jaiteg Singh },
title = { An Open Source ETL Tool - Medium and Small Scale Enterprise ETL(MaSSEETL) },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 4 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 15-22 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume108/number4/18899-0190/ },
doi = { 10.5120/18899-0190 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:42:06.576137+05:30
%A Rupali Gill
%A Jaiteg Singh
%T An Open Source ETL Tool - Medium and Small Scale Enterprise ETL(MaSSEETL)
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 4
%P 15-22
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In a Data Warehouse (DW) environment, Extraction-Transformation-Loading (ETL) processes consume up to 70% of resources. Data quality tools aim to detect and correct data problems that affect the accuracy and efficiency of data analysis applications. Source data imported into the data warehouse often differs in quality, format, coding and so on. ETL tools are used to bring all of this data together in a standard, homogeneous environment. The ETL solutions provided so far are either proprietary or have limited functionality. Small and Medium Scale Enterprises (SME) and Small Scale Enterprises (SSE) cannot afford the licensing cost of these paid tools. The developed tool, MaSSEETL, provides an integrated, open source data quality solution that deals with naming conflicts, structural conflicts, date conversions, missing values and changing dimensions. MaSSEETL resolves the relevant errors and raises an appropriate level of warning. In this paper, we present the working of MaSSEETL. The tool provides pragmatic evidence of the strategic intensification of quality data in academic and business enterprises.
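
As an illustration of three of the problem classes named in the abstract (naming conflicts, date conversions and missing values), the following is a minimal Python sketch of a cleaning step that also records a warning level for each fix. It is not MaSSEETL's implementation, which this page does not describe. The alias map, the date formats, the `UNKNOWN` default, the warning labels and the function names are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical alias map: source column names -> one canonical name.
COLUMN_ALIASES = {
    "cust_name": "customer_name",
    "CustomerName": "customer_name",
    "dob": "birth_date",
    "DOB": "birth_date",
}

# Hypothetical list of date formats that the source systems might use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(record, date_fields=("birth_date",), default="UNKNOWN"):
    """Clean one source row and return (cleaned_row, warnings)."""
    cleaned, warnings = {}, []
    for name, value in record.items():
        # Naming conflict: map differing source names to one canonical name.
        canonical = COLUMN_ALIASES.get(name, name)
        if canonical != name:
            warnings.append(f"INFO: renamed {name} to {canonical}")
        if value is None or str(value).strip() == "":
            # Missing value: substitute a default and flag it.
            warnings.append(f"WARNING: missing value in {canonical}")
            value = default
        elif canonical in date_fields:
            # Date conversion: normalise to ISO 8601, flag unparseable input.
            iso = to_iso_date(str(value))
            if iso is None:
                warnings.append(f"ERROR: unparseable date {value!r} in {canonical}")
            else:
                value = iso
        cleaned[canonical] = value
    return cleaned, warnings


row, notes = clean_record({"cust_name": "Asha", "DOB": "05/03/1990", "city": ""})
```

Returning the warnings next to the cleaned row, rather than raising on the first problem, lets a load continue while leaving a record of every correction. Structural conflicts and changing dimensions are not covered by this sketch.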

Index Terms

Computer Science
Information Sciences

Keywords

Data inconsistency, identification of errors, organization growth, ETL, data quality