Normalization Technique for Structure based Web Documents Classification using Rough Set Theory

Amit Rathore; Kamlesh Namdev

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 179 - Number 26

Year of Publication: 2018

10.5120/ijca2018916543

Amit Rathore, Kamlesh Namdev. Normalization Technique for Structure based Web Documents Classification using Rough Set Theory. International Journal of Computer Applications. 179, 26 (Mar 2018), 1-4. DOI=10.5120/ijca2018916543

@article{ 10.5120/ijca2018916543,
author = { Amit Rathore and Kamlesh Namdev },
title = { Normalization Technique for Structure based Web Documents Classification using Rough Set Theory },
journal = { International Journal of Computer Applications },
issue_date = { Mar 2018 },
volume = { 179 },
number = { 26 },
month = { Mar },
year = { 2018 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume179/number26/29093-2018916543/ },
doi = { 10.5120/ijca2018916543 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:56:31.418997+05:30
%A Amit Rathore
%A Kamlesh Namdev
%T Normalization Technique for Structure based Web Documents Classification using Rough Set Theory
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 26
%P 1-4
%D 2018
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The rapid development of the Internet and of web publishing techniques has created numerous information sources published as HTML documents on the World Wide Web (WWW). The WWW is now a popular medium through which people around the world spread and gather information of all kinds. However, the web documents generated by various sites also contain undesired information, referred to as noisy or irrelevant content. The need for innovative and effective technologies that help find and use useful information and knowledge from a large variety of data sources is continually increasing. Web information has become increasingly diverse, and to make better use of it, people pursue the latest technologies that can effectively organize and use online information. Classification is an important data mining technique that assigns the items in a collection to predefined classes or groups. The main goal of classification is to accurately predict the target class for each case in the data. Web document classification is a data mining technique for assigning web documents to classes. Information providers on the web are interested in techniques that could improve the effectiveness of web search engines. In this paper, the relationships among the techniques used in data mining are studied. A study of web usage is also carried out for the optimization of this web document classification.
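The abstract does not describe how Rough Set Theory is applied, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It computes the lower and upper approximations of one document class over a toy collection. The documents, the structural attributes (`has_table`, `many_links`), and the class labels are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: each web document is described by made-up
# structural attributes (not taken from the paper) and a known class label.
DOCS = {
    "d1": ({"has_table": 1, "many_links": 0}, "article"),
    "d2": ({"has_table": 1, "many_links": 0}, "article"),
    "d3": ({"has_table": 0, "many_links": 1}, "index"),
    "d4": ({"has_table": 0, "many_links": 1}, "article"),
    "d5": ({"has_table": 0, "many_links": 0}, "index"),
}


def equivalence_classes(docs, attributes):
    """Group documents that are indiscernible on the chosen attributes."""
    groups = defaultdict(set)
    for name, (features, _label) in docs.items():
        key = tuple(features[a] for a in attributes)
        groups[key].add(name)
    return list(groups.values())


def approximations(docs, attributes, target_label):
    """Return the (lower, upper) rough-set approximations of one class."""
    target = {n for n, (_f, label) in docs.items() if label == target_label}
    lower, upper = set(), set()
    for block in equivalence_classes(docs, attributes):
        if block <= target:   # every document in the block is in the class
            lower |= block
        if block & target:    # at least one document in the block is in the class
            upper |= block
    return lower, upper


lower, upper = approximations(DOCS, ["has_table", "many_links"], "article")
print(sorted(lower))          # documents that certainly belong to "article"
print(sorted(upper))          # documents that possibly belong to "article"
print(sorted(upper - lower))  # boundary region: undecidable on these attributes
```

In this sketch, documents in the lower approximation can be classified with certainty from the chosen attributes, while those in the boundary region share identical attribute values but carry different labels, which is the kind of ambiguity that rough sets are meant to make explicit.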

Index Terms

Computer Science

Information Sciences

Keywords

Structured data, Unstructured data, Rough Set Theory, Web Document Classification