International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 16
Year of Publication: 2021
Authors: Hussein Bakiri, Hamisi Ndyetabura, Libe Massawe, Hellen Maziku
DOI: 10.5120/ijca2021921503
Hussein Bakiri, Hamisi Ndyetabura, Libe Massawe, Hellen Maziku. A Novel Cleansing Method for Random-Walk Data using Extended Multivariate Nonlinear Regression: A Data Preprocessor for Load Forecasting Mechanism. International Journal of Computer Applications. 183, 16 (Jul 2021), 49-57. DOI=10.5120/ijca2021921503
The efficiency of any load forecasting mechanism depends on the quality and distribution characteristics of the training data. Outliers and missing values are the primary concern, especially in load data from developing countries. Several studies have proposed imputation models to deal with outliers before forecasting. However, the efficiency of these approaches is compromised when the data follows a random-walk distribution. This study therefore aims to develop an efficient data cleansing model that accounts for a random-walk distribution by extending the Multivariate Nonlinear Regression (MNLR) method. The k-means algorithm is used to detect outliers and analyze their extent in the data. Twenty-minute-interval load data from 2015 to 2019, collected at Kinondoni-North (at the Mikocheni distribution network in Dar es Salaam), is used in this study. The outlier analysis finds that outliers make up 5.17852% of the data (5207 out of 105192). Finally, the extended-MNLR (e-MNLR) model achieves promising results compared with the ANN, SVM, MissForest, MICE, and KNN algorithms, attaining RMSE, MAE, and MAPE values of 2.109137, 1.956039, and 7.787976, respectively.
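The three error measures reported above can be illustrated with a minimal sketch. It uses the standard textbook definitions of RMSE, MAE, and MAPE; the abstract does not say how the authors computed them (for instance, whether MAPE is expressed as a percentage), so that scaling, the function names, and the toy load values below are all assumptions made for illustration only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed scaling).

    Undefined when any actual value is zero, so this sketch assumes
    strictly nonzero load readings.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical load readings and their imputed counterparts (not from the paper).
actual_load = [100.0, 200.0, 400.0]
imputed_load = [110.0, 190.0, 400.0]

rmse_value = rmse(actual_load, imputed_load)
mae_value = mae(actual_load, imputed_load)
mape_value = mape(actual_load, imputed_load)
```

Lower values on all three measures indicate that the imputed readings sit closer to the true ones. RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the size of each reading.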