An Improved Technique for Web Page Classification in Respect of Domain Specific Search

Vivek Chandra; Nidhi Saxena

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 102 - Number 4

Year of Publication: 2014

Authors: Vivek Chandra, Nidhi Saxena

10.5120/17801-8615

Vivek Chandra, Nidhi Saxena. An Improved Technique for Web Page Classification in Respect of Domain Specific Search. International Journal of Computer Applications. 102, 4 (September 2014), 7-10. DOI=10.5120/17801-8615

@article{10.5120/17801-8615,
  author = { Vivek Chandra and Nidhi Saxena },
  title = { An Improved Technique for Web Page Classification in Respect of Domain Specific Search },
  journal = { International Journal of Computer Applications },
  issue_date = { September 2014 },
  volume = { 102 },
  number = { 4 },
  month = { September },
  year = { 2014 },
  issn = { 0975-8887 },
  pages = { 7-10 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume102/number4/17801-8615/ },
  doi = { 10.5120/17801-8615 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:32:13.522793+05:30
%A Vivek Chandra
%A Nidhi Saxena
%T An Improved Technique for Web Page Classification in Respect of Domain Specific Search
%J International Journal of Computer Applications
%@ 0975-8887
%V 102
%N 4
%P 7-10
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

A domain-specific crawler, as distinct from a general web search engine, focuses on a specific segment of web content. Search engines built around such crawlers are called vertical or topical search engines. Common vertical search engines serve shopping, the automotive industry, legal information, medical information, scholarly literature, and travel; examples include Trulia.com, Mocavo.com, and Yelp. In contrast to general-purpose web search engines, which attempt to index large portions of the World Wide Web using a web crawler, vertical search engines typically use a domain-specific crawler that attempts to index only web pages relevant to a predefined topic or set of topics. Vertical search offers several potential benefits over general search: greater precision due to its limited scope, the ability to leverage domain knowledge such as taxonomies and ontologies, and support for specific, unique user tasks. This paper analyzes the machine learning techniques ANN, SVM, and Hi-SVM as used for web page classification and suggests suitable improvements. A crawling framework has been designed and developed that allows flexible addition of new classifiers, and it has been used to classify web content for a few domains. The crawlers themselves are implemented as multithreaded objects that run concurrently. The comparative analysis of the three classifiers shows that Hi-SVM performs best of the three, making it a better choice for guiding a topical crawler than a Support Vector Machine or a Neural Network.
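The framework described in the abstract (pluggable classifiers guiding concurrently running crawler threads) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the in-memory `FAKE_WEB`, the `register` decorator, the `Crawler` class, the `crawl` function, and the 0.5 relevance threshold are all hypothetical, and a toy keyword scorer stands in for the trained ANN, SVM, and Hi-SVM models.

```python
import threading
from queue import Queue
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB: Dict[str, Tuple[str, List[str]]] = {
    "a": ("cheap flights and hotel travel deals", ["b", "c"]),
    "b": ("travel guide: flights, hotels, tours", ["d"]),
    "c": ("football scores and league tables", ["d"]),
    "d": ("hotel booking and travel insurance", []),
}

# Registry of pluggable classifiers: name -> function(text) -> score in [0, 1].
CLASSIFIERS: Dict[str, Callable[[str], float]] = {}

def register(name: str):
    """Decorator that adds a classifier to the registry under `name`."""
    def wrap(fn: Callable[[str], float]) -> Callable[[str], float]:
        CLASSIFIERS[name] = fn
        return fn
    return wrap

@register("keyword")
def keyword_classifier(text: str) -> float:
    """Toy stand-in for a trained model: fraction of topic words present."""
    topic = {"travel", "flights", "hotel"}
    words = set(text.lower().replace(",", " ").replace(":", " ").split())
    return len(topic & words) / len(topic)

class Crawler(threading.Thread):
    """One crawler thread; several instances share a frontier queue."""

    def __init__(self, frontier: Queue, seen: Set[str], lock: threading.Lock,
                 relevant: Set[str], classify: Callable[[str], float],
                 threshold: float = 0.5):
        super().__init__()
        self.frontier, self.seen, self.lock = frontier, seen, lock
        self.relevant, self.classify, self.threshold = relevant, classify, threshold

    def run(self) -> None:
        while True:
            url = self.frontier.get()
            if url is None:  # sentinel: no more work
                self.frontier.task_done()
                return
            try:
                text, links = FAKE_WEB.get(url, ("", []))
                if self.classify(text) >= self.threshold:
                    with self.lock:
                        self.relevant.add(url)
                    # Follow links only from pages judged relevant.
                    for link in links:
                        with self.lock:
                            if link in self.seen:
                                continue
                            self.seen.add(link)
                        self.frontier.put(link)
            finally:
                self.frontier.task_done()

def crawl(seeds: List[str], classifier_name: str, workers: int = 3) -> Set[str]:
    """Run `workers` crawler threads and return the URLs judged relevant."""
    frontier: Queue = Queue()
    seen: Set[str] = set(seeds)
    relevant: Set[str] = set()
    lock = threading.Lock()
    for seed in seeds:
        frontier.put(seed)
    threads = [Crawler(frontier, seen, lock, relevant, CLASSIFIERS[classifier_name])
               for _ in range(workers)]
    for t in threads:
        t.start()
    frontier.join()            # wait until every queued URL has been processed
    for _ in threads:
        frontier.put(None)     # one sentinel per thread
    for t in threads:
        t.join()
    return relevant
```

Under these assumptions, swapping in a different model only requires registering another scoring function under a new name and passing that name to `crawl`; the crawler threads themselves are unchanged.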

Index Terms

Computer Science

Information Sciences

Keywords

ANN, SVM, HiSVM, VSM, ROC, REC, POS, WSD, SOE