Research Article

Normalization Technique for Structure based Web Documents Classification using Rough Set Theory

by Amit Rathore, Kamlesh Namdev
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 179 - Number 26
Year of Publication: 2018
Authors: Amit Rathore, Kamlesh Namdev
10.5120/ijca2018916543

Amit Rathore, Kamlesh Namdev. Normalization Technique for Structure based Web Documents Classification using Rough Set Theory. International Journal of Computer Applications. 179, 26 (Mar 2018), 1-4. DOI=10.5120/ijca2018916543

@article{ 10.5120/ijca2018916543,
author = { Amit Rathore and Kamlesh Namdev },
title = { Normalization Technique for Structure based Web Documents Classification using Rough Set Theory },
journal = { International Journal of Computer Applications },
issue_date = { Mar 2018 },
volume = { 179 },
number = { 26 },
month = { Mar },
year = { 2018 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume179/number26/29093-2018916543/ },
doi = { 10.5120/ijca2018916543 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:56:31.418997+05:30
%A Amit Rathore
%A Kamlesh Namdev
%T Normalization Technique for Structure based Web Documents Classification using Rough Set Theory
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 26
%P 1-4
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid development of the internet and of web publishing techniques has created numerous information sources published as HTML documents on the World Wide Web (WWW). The WWW is now a popular medium through which people around the world spread and gather information of all kinds. However, the web documents generated by various sites also contain undesired information, called noisy or irrelevant content. The need for innovative and effective technologies that help find and use useful information and knowledge from a large variety of data sources is continually increasing. As web information becomes increasingly diverse, people look to new technology that can effectively organize and use online information. Classification is one of the most important data mining techniques: it assigns the items in a collection to predefined classes or groups, and its main goal is to accurately predict the target class for each case in the data. Web document classification is the data mining technique that applies classification to web documents. Information providers on the web are interested in techniques that could improve the effectiveness of web search engines. This paper studies the relationships among the techniques used in data mining and also examines web usage with respect to optimizing web document classification.
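
The abstract's notion of noisy or irrelevant content in HTML documents can be illustrated with a minimal sketch. This is not the authors' normalization technique, which the abstract does not describe. It only shows one plausible preprocessing step before classification: dropping the text inside elements that rarely carry the main content. The tag list `NOISY_TAGS`, the class `NoiseStripper`, and the function `strip_noise` are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as "noisy"; the paper does not list them.
NOISY_TAGS = {"script", "style", "nav", "footer", "aside"}


class NoiseStripper(HTMLParser):
    """Collects visible text while skipping everything inside NOISY_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a noisy element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISY_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISY_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every noisy element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def strip_noise(html):
    """Return the text of an HTML string with noisy elements removed."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><body><nav>Home | About</nav>"
        "<p>Rough set theory</p>"
        "<script>var x = 1;</script>"
        "<footer>Copyright</footer></body></html>"
    )
    print(strip_noise(page))
```

The cleaned text could then be passed to any classifier. A fixed tag list is a deliberate simplification, since real pages often mark navigation and advertising with generic `div` elements that this sketch would not catch.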

Index Terms

Computer Science
Information Sciences

Keywords

Structured data, Unstructured data, Rough Set Theory, Web Document Classification