Research Article

An Enhanced Approach for Treating Missing Value using Boosted K-NN

by K. Sathesh Kumar, M. Hemalatha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 54 - Number 9
Year of Publication: 2012
Authors: K. Sathesh Kumar, M. Hemalatha
DOI: 10.5120/8597-2361

K. Sathesh Kumar, M. Hemalatha. An Enhanced Approach for Treating Missing Value using Boosted K-NN. International Journal of Computer Applications. 54, 9 (September 2012), 35-41. DOI=10.5120/8597-2361

@article{ 10.5120/8597-2361,
author = { K. Sathesh Kumar and M. Hemalatha },
title = { An Enhanced Approach for Treating Missing Value using Boosted K-NN },
journal = { International Journal of Computer Applications },
issue_date = { September 2012 },
volume = { 54 },
number = { 9 },
month = { September },
year = { 2012 },
issn = { 0975-8887 },
pages = { 35-41 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume54/number9/8597-2361/ },
doi = { 10.5120/8597-2361 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:55:17.361603+05:30
%A K. Sathesh Kumar
%A M. Hemalatha
%T An Enhanced Approach for Treating Missing Value using Boosted K-NN
%J International Journal of Computer Applications
%@ 0975-8887
%V 54
%N 9
%P 35-41
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Knowledge Discovery in Databases (KDD) plays a vital role in applications based on information analysis and retrieval. Data quality is the most indispensable component of KDD, and a key factor affecting the quality of datasets is the presence of missing values. Data collected from the real world often has serious quality problems: it can be incomplete, redundant, inconsistent, and/or noisy. Missing values must be handled with care, or else bias may be introduced into the induced knowledge. The current work investigates three different treatments for missing values in the United States Congressional Voting Records Database. All the machine learning methods were run in one of the leading open-source data mining applications. The study centers on evaluating the performance of several classification models induced from the data after applying each of the three treatments. Results show that boosting the k-nearest neighbor method for imputation offers a significant improvement over traditional techniques (case/pairwise deletion and replacing missing values with the mean).
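The three treatments named in the abstract can be illustrated with a minimal sketch. It assumes votes are encoded as 1.0 (yes) and 0.0 (no), with `None` marking a missing value. The toy records, the function names, and the choice of `k` are hypothetical. The k-NN step shown here is plain, unboosted k-nearest-neighbor imputation; it is not the authors' boosted variant, whose details the abstract does not give.

```python
from statistics import mean

# Hypothetical toy records (not from the paper's dataset):
# 1.0 = "yes", 0.0 = "no", None = missing vote.
ROWS = [
    [1.0, 0.0, 1.0],
    [1.0, None, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, None],
    [1.0, 0.0, 0.0],
]


def case_deletion(rows):
    """Treatment 1: drop every record with at least one missing value."""
    return [r for r in rows if None not in r]


def column_means(rows):
    """Mean of the observed values in each column.

    Assumes every column has at least one observed value.
    """
    return [mean(v for v in col if v is not None) for col in zip(*rows)]


def mean_imputation(rows):
    """Treatment 2: replace each missing value with its column mean."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def distance(a, b):
    """Mean squared difference over attributes observed in both records."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)


def knn_imputation(rows, k=2):
    """Treatment 3 (plain k-NN): fill a missing value with the mean of
    that attribute over the k nearest records in which it is observed."""
    means = column_means(rows)
    result = []
    for i, r in enumerate(rows):
        filled = list(r)
        for j, v in enumerate(r):
            if v is not None:
                continue
            donors = sorted(
                (distance(r, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            # Fall back to the column mean if no donor record exists.
            filled[j] = mean(val for _, val in donors) if donors else means[j]
        result.append(filled)
    return result


if __name__ == "__main__":
    print("case deletion:  ", case_deletion(ROWS))
    print("mean imputation:", mean_imputation(ROWS))
    print("k-NN imputation:", knn_imputation(ROWS, k=2))
```

On this toy data, case deletion discards two of the five records, mean imputation fills the gaps with 0.5, and k-NN imputation fills them with the value shared by the two most similar records. For yes/no votes a mode or majority vote over the neighbors would be a natural alternative to the mean; the mean is used here only to keep the sketch short.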

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Information Retrieval, Data Preprocessing, Data Cleaning, Data Warehouse