Research Article

An Improved Technique for Web Page Classification in Respect of Domain Specific Search

by Vivek Chandra, Nidhi Saxena
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 102 - Number 4
Year of Publication: 2014
Authors: Vivek Chandra, Nidhi Saxena
10.5120/17801-8615

Vivek Chandra, Nidhi Saxena. An Improved Technique for Web Page Classification in Respect of Domain Specific Search. International Journal of Computer Applications. 102, 4 (September 2014), 7-10. DOI=10.5120/17801-8615

@article{ 10.5120/17801-8615,
author = { Vivek Chandra and Nidhi Saxena },
title = { An Improved Technique for Web Page Classification in Respect of Domain Specific Search },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 102 },
number = { 4 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 7-10 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume102/number4/17801-8615/ },
doi = { 10.5120/17801-8615 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:32:13.522793+05:30
%A Vivek Chandra
%A Nidhi Saxena
%T An Improved Technique for Web Page Classification in Respect of Domain Specific Search
%J International Journal of Computer Applications
%@ 0975-8887
%V 102
%N 4
%P 7-10
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A domain-specific crawler, as distinct from a general web search engine, focuses on a specific segment of web content. Search engines built on such crawlers are also called vertical or topical search engines. Common vertical search engines cover shopping, the automotive industry, legal information, medical information, scholarly literature, and travel; examples include Trulia.com, Mocavo.com, and Yelp. In contrast to general-purpose Web search engines, which attempt to index large portions of the World Wide Web using a web crawler, vertical search engines typically use a domain-specific crawler that attempts to index only Web pages relevant to a predefined topic or set of topics. Vertical search offers several potential benefits over general search: greater precision due to its limited scope, the ability to leverage domain knowledge including taxonomies and ontologies, and support for specific, unique user tasks. This paper analyzes three machine learning techniques used for Web page classification, namely ANN, SVM, and Hi-SVM, and suggests suitable improvements. A crawling framework has been designed and developed that allows flexible addition of new classifiers, and this crawler has been used to classify web content for a few domains. The crawlers themselves are implemented as multithreaded objects that run concurrently. The comparative analysis of the three classifiers shows that Hi-SVM gives the best performance, making it a better choice for guiding a topical crawler than a Support Vector Machine or a Neural Network.
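The idea of a crawling framework with pluggable classifiers and concurrent workers can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not the authors' implementation: the in-memory TOY_WEB graph stands in for real HTTP fetching, the keyword scorer stands in for the ANN, SVM, and Hi-SVM classifiers the paper compares, and the names register, CLASSIFIERS, and crawl are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "web": url -> (page text, outgoing links).
TOY_WEB: Dict[str, Tuple[str, List[str]]] = {
    "seed": ("travel deals and hotel booking", ["a", "b"]),
    "a": ("cheap flight and hotel offers", ["c"]),
    "b": ("football scores and league tables", ["d"]),
    "c": ("hotel reviews for travel planning", []),
    "d": ("match report and transfer news", []),
}

# A classifier maps page text to a relevance score in [0, 1].
Classifier = Callable[[str], float]
CLASSIFIERS: Dict[str, Classifier] = {}


def register(name: str):
    """Decorator that adds a classifier to the registry under `name`."""
    def wrap(fn: Classifier) -> Classifier:
        CLASSIFIERS[name] = fn
        return fn
    return wrap


@register("keyword")
def keyword_score(text: str) -> float:
    """Toy stand-in for a trained model: fraction of on-topic words."""
    topic = {"travel", "hotel", "flight"}
    words = text.lower().split()
    return sum(w in topic for w in words) / len(words) if words else 0.0


def crawl(seed: str, classifier_name: str, threshold: float = 0.2,
          workers: int = 2, max_pages: int = 10) -> List[str]:
    """Best-first focused crawl; returns relevant URLs in visit order."""
    classify = CLASSIFIERS[classifier_name]
    frontier = [(-1.0, seed)]  # max-heap emulated with negated priority
    seen = {seed}
    relevant: List[str] = []

    def fetch_and_score(url: str):
        text, links = TOY_WEB[url]  # a real crawler would fetch over HTTP
        return url, classify(text), links

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(relevant) < max_pages:
            # Take up to `workers` of the most promising URLs per round.
            batch = [heapq.heappop(frontier)[1]
                     for _ in range(min(workers, len(frontier)))]
            # Pages in a batch are fetched and classified concurrently.
            for url, score, links in pool.map(fetch_and_score, batch):
                if score < threshold:
                    continue  # off-topic page: do not follow its links
                relevant.append(url)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        # A link inherits its parent's score as priority.
                        heapq.heappush(frontier, (-score, link))
    return relevant


print(crawl("seed", "keyword"))
```

In this sketch, adding a new classifier only requires decorating another scoring function with register and passing its name to crawl; the crawl loop itself is unchanged. Links found on off-topic pages are never followed, which is what keeps the crawl inside the target domain.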

Index Terms

Computer Science
Information Sciences

Keywords

ANN, SVM, HiSVM, VSM, ROC, REC, POS, WSD, SOE