Research Article

Dynamic k-NN with Attribute Weighting for Automatic Web Page Classification(Dk-NNwAW)

by Manan Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 58 - Number 10
Year of Publication: 2012
Authors: Manan Gupta
10.5120/9321-3554

Manan Gupta. Dynamic k-NN with Attribute Weighting for Automatic Web Page Classification(Dk-NNwAW). International Journal of Computer Applications. 58, 10 (November 2012), 34-40. DOI=10.5120/9321-3554

@article{ 10.5120/9321-3554,
author = { Manan Gupta },
title = { Dynamic k-NN with Attribute Weighting for Automatic Web Page Classification(Dk-NNwAW) },
journal = { International Journal of Computer Applications },
issue_date = { November 2012 },
volume = { 58 },
number = { 10 },
month = { November },
year = { 2012 },
issn = { 0975-8887 },
pages = { 34-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume58/number10/9321-3554/ },
doi = { 10.5120/9321-3554 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:02:07.954676+05:30
%A Manan Gupta
%T Dynamic k-NN with Attribute Weighting for Automatic Web Page Classification(Dk-NNwAW)
%J International Journal of Computer Applications
%@ 0975-8887
%V 58
%N 10
%P 34-40
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The Internet has expanded explosively over the last decade and a half. A vast array of authors has added numerous web pages to the World Wide Web on a plethora of topics, which creates the problem of organizing these pages so that searches return more relevant information. This paper proposes a modified attribute-weighted dynamic k-Nearest Neighbor (k-NN) classification algorithm that uses k-Means clustering. The adaptive, dynamic nature of the algorithm supports its use for the automatic classification of web pages on the WWW. Web pages are classified based on the class distribution of the pages in their neighborhood. Attribute weighting is used primarily to improve classification accuracy when the class distribution is imbalanced. Empirical results show good classification accuracy and improvement on other shortcomings of the traditional k-NN classification model.
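
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of attribute-weighted k-NN with a data-driven choice of k, not the paper's method. The function names, the toy data, the class labels, the weight values, and the leave-one-out rule for picking k are all assumptions made for illustration, and the k-Means clustering step is omitted entirely.

```python
from collections import Counter
from math import sqrt


def weighted_distance(a, b, weights):
    """Euclidean distance with a per-attribute weight on each squared difference."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_predict(train, query, k, weights):
    """Label `query` by majority vote over its k nearest training points."""
    nearest = sorted(
        train, key=lambda item: weighted_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def choose_k(train, candidates, weights):
    """Pick the candidate k with the best leave-one-out accuracy on `train`.

    This is one simple way to make k data-dependent; it is an assumption,
    not the selection rule used in the paper.
    """
    def loo_accuracy(k):
        hits = 0
        for i, (features, label) in enumerate(train):
            rest = train[:i] + train[i + 1:]
            hits += knn_predict(rest, features, k, weights) == label
        return hits / len(train)

    return max(candidates, key=loo_accuracy)


# Hypothetical toy data: (feature vector, class). The second attribute is noise.
train = [
    ((0.0, 5.0), "course"), ((0.2, 1.0), "course"), ((0.1, 9.0), "course"),
    ((1.0, 4.0), "faculty"), ((0.9, 8.0), "faculty"), ((1.1, 0.0), "faculty"),
]
query = (0.15, 7.0)

weights = (1.0, 0.0)  # hypothetical weights that ignore the noisy attribute
k = choose_k(train, candidates=(1, 3), weights=weights)
prediction = knn_predict(train, query, k, weights)

# The same query with equal weights and k = 1, for comparison.
unweighted_prediction = knn_predict(train, query, 1, (1.0, 1.0))
```

In this toy setup the noisy second attribute dominates the unweighted distance, so the equal-weight vote and the weighted vote can disagree. That is the kind of effect attribute weighting is meant to counter, although how the paper derives its weights is not stated in the abstract.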

Index Terms

Computer Science
Information Sciences

Keywords

Dynamic k-NN