A Comparison of Imputation Techniques using Network Traffic Data

Fidan Kaya Gülağız; Onur Gök; Adnan Kavak

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 142 - Number 7

Year of Publication: 2016

Authors: Fidan Kaya Gülağız, Onur Gök, Adnan Kavak

10.5120/ijca2016909903

Fidan Kaya Gülağız, Onur Gök, Adnan Kavak. A Comparison of Imputation Techniques using Network Traffic Data. International Journal of Computer Applications. 142, 7 (May 2016), 25-29. DOI=10.5120/ijca2016909903

@article{ 10.5120/ijca2016909903,

author = { Fidan Kaya Gülağız and Onur Gök and Adnan Kavak },

title = { A Comparison of Imputation Techniques using Network Traffic Data },

journal = { International Journal of Computer Applications },

issue_date = { May 2016 },

volume = { 142 },

number = { 7 },

month = { May },

year = { 2016 },

issn = { 0975-8887 },

pages = { 25-29 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume142/number7/24909-2016909903/ },

doi = { 10.5120/ijca2016909903 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T23:44:21.119482+05:30

%A Fidan Kaya Gülağız

%A Onur Gök

%A Adnan Kavak

%T A Comparison of Imputation Techniques using Network Traffic Data

%J International Journal of Computer Applications

%@ 0975-8887

%V 142

%N 7

%P 25-29

%D 2016

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Creating data sets is an important step for studies in many fields of research. However, these data sets often suffer from missing values. There are many ways of handling missing values, of which deletion methods and single imputation methods are the most common. However, these methods lead to high errors in data sets with high loss rates. Data sets used for the analysis of network traffic also commonly contain missing values. In this study, data sets of different sizes and with different missing value rates were produced for the analysis of network traffic in distributed systems. Different data imputation methods were then compared for dealing with the missing values in these data sets. Experimental results showed that the Expectation Maximization method is more applicable and performs better at relatively high missing data rates, while the k Nearest Neighbors method performs better at low missing rates.
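The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the synthetic three-column rows, the function names, the value of k, and the use of RMSE on the masked cells are all assumptions made for illustration. It compares only mean imputation (a single imputation method) with a simple k Nearest Neighbors imputation; Expectation Maximization and Least Square Estimation are not implemented here.

```python
import math
import random


def make_rows(n_rows, rng):
    """Generate synthetic, correlated rows (hypothetical stand-ins for traffic features)."""
    rows = []
    for _ in range(n_rows):
        base = rng.uniform(10.0, 100.0)
        rows.append([base,
                     0.5 * base + rng.gauss(0.0, 2.0),
                     0.1 * base + rng.gauss(0.0, 0.5)])
    return rows


def mask_rows(rows, rate, rng):
    """Copy rows, replacing each cell with None with probability `rate`."""
    masked, positions = [], []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if rng.random() < rate:
                new_row.append(None)
                positions.append((i, j))
            else:
                new_row.append(value)
        masked.append(new_row)
    return masked, positions


def column_means(rows):
    """Mean of the observed values in each column (0.0 if a column has none)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def mean_impute(rows):
    """Single imputation: fill each missing cell with its column mean."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=3):
    """Fill each missing cell with the average of that column over the k nearest rows.

    Distance is the root mean squared difference over the features that both
    rows have observed (unscaled, which is a simplification). Falls back to the
    column mean when no comparable row exists.
    """
    means = column_means(rows)
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other in rows:
                if other[j] is None:
                    continue
                diffs = [(a - b) ** 2 for a, b in zip(row, other)
                         if a is not None and b is not None]
                if diffs:
                    candidates.append((math.sqrt(sum(diffs) / len(diffs)), other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            out[i][j] = sum(nearest) / len(nearest) if nearest else means[j]
    return out


def rmse(truth, imputed, positions):
    """Root mean squared error over the cells that were masked."""
    if not positions:
        return 0.0
    errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(errors) / len(errors))


if __name__ == "__main__":
    rng = random.Random(42)
    truth = make_rows(200, rng)
    for rate in (0.1, 0.3, 0.5):  # assumed missing value rates
        masked, positions = mask_rows(truth, rate, rng)
        print(rate,
              "mean:", round(rmse(truth, mean_impute(masked), positions), 3),
              "k-NN:", round(rmse(truth, knn_impute(masked), positions), 3))
```

Scoring only the masked cells against the known ground truth keeps the comparison fair across methods, since observed cells are left unchanged by both. Results on this toy data say nothing about the network traffic data sets used in the study.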

Index Terms

Computer Science

Information Sciences

Keywords

Least Square Estimation (LSE), Expectation Maximization (EM), k Nearest Neighbors (k-NN), Traffic Data, Missing Value Imputation