Optimization of Internet Search based on Noun Phrases and Clustering Techniques

R. Subhashini; V. Jawahar Senthil Kumar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 20 - Number 2

Year of Publication: 2011

Authors: R. Subhashini, V. Jawahar Senthil Kumar

10.5120/2402-3195

R. Subhashini, V. Jawahar Senthil Kumar. Optimization of Internet Search based on Noun Phrases and Clustering Techniques. International Journal of Computer Applications. 20, 2 (April 2011), 49-54. DOI=10.5120/2402-3195

@article{ 10.5120/2402-3195,

author = { R. Subhashini and V. Jawahar Senthil Kumar },

title = { Optimization of Internet Search based on Noun Phrases and Clustering Techniques },

journal = { International Journal of Computer Applications },

issue_date = { April 2011 },

volume = { 20 },

number = { 2 },

month = { April },

year = { 2011 },

issn = { 0975-8887 },

pages = { 49-54 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume20/number2/2402-3195/ },

doi = { 10.5120/2402-3195 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T20:06:47.348610+05:30

%A R. Subhashini

%A V. Jawahar Senthil Kumar

%T Optimization of Internet Search based on Noun Phrases and Clustering Techniques

%J International Journal of Computer Applications

%@ 0975-8887

%V 20

%N 2

%P 49-54

%D 2011

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Information retrieval plays a vital role in daily activities, most prominently in search engines. Retrieving relevant natural-language text documents remains challenging. Search engines typically respond to a query with low precision, retrieving many useless web pages and missing some important ones. In this paper, we apply two NLP techniques, shallow parsing and chunking, to extract noun phrases. These noun phrases are used as key phrases to rank the documents (typically a list of titles and snippets returned by a Web search engine). Organizing Web search results into clusters lets users browse them quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. We also propose an approach to Web search results clustering based on a phrase-based clustering algorithm known as Optimized Snippet Flat Clustering (OSFC). As an alternative to the single ordered result list of a search engine, this approach presents a list of clusters to the user. Experimental results verify the method's feasibility and effectiveness.
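To make the two ideas in the abstract concrete, the following is a minimal sketch, not the OSFC algorithm itself (the abstract does not specify its details). It assumes each snippet has already been part-of-speech tagged with Penn Treebank style tags by some external tagger. It chunks noun phrases with a simple "adjectives followed by nouns" rule, then groups snippets into flat clusters labeled by a shared noun phrase. The function names `noun_phrases` and `cluster_by_phrase`, the `min_size` parameter, and the sample snippets are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def noun_phrases(tagged):
    """Chunk (word, tag) pairs into noun phrases: zero or more JJ*, then one or more NN*."""
    phrases, current, has_noun = [], [], False
    # A sentinel token with a non-noun tag flushes the final chunk.
    for word, tag in tagged + [("", "END")]:
        if tag.startswith("NN"):
            current.append(word.lower())
            has_noun = True
        elif tag.startswith("JJ") and not has_noun:
            current.append(word.lower())
        else:
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
            if tag.startswith("JJ"):
                # An adjective right after a noun run starts a new chunk.
                current.append(word.lower())
    return phrases


def cluster_by_phrase(snippet_phrases, min_size=2):
    """Flat clusters: each noun phrase shared by at least min_size snippets labels one cluster."""
    members = defaultdict(set)
    for snippet_id, phrases in snippet_phrases.items():
        for phrase in phrases:
            members[phrase].add(snippet_id)
    return {p: sorted(ids) for p, ids in members.items() if len(ids) >= min_size}


# Hypothetical pre-tagged snippets (assumed output of an external POS tagger).
snippets = {
    "s1": [("The", "DT"), ("suffix", "NN"), ("tree", "NN"), ("clusters", "VBZ"),
           ("web", "NN"), ("documents", "NNS")],
    "s2": [("A", "DT"), ("suffix", "NN"), ("tree", "NN"), ("is", "VBZ"), ("fast", "JJ")],
    "s3": [("Search", "NN"), ("engines", "NNS"), ("rank", "VBP"),
           ("web", "NN"), ("documents", "NNS")],
}

phrases_per_snippet = {sid: noun_phrases(tokens) for sid, tokens in snippets.items()}
clusters = cluster_by_phrase(phrases_per_snippet)
for label, ids in clusters.items():
    print(label, ids)
```

Because each cluster is named after the noun phrase its snippets share, the labels are readable by construction, which is the property the abstract says traditional clustering lacks. A snippet may appear in more than one cluster in this sketch; whether OSFC allows such overlap is not stated in the abstract.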

Index Terms

Computer Science

Information Sciences

Keywords

Noun Phrases, Document Clustering, Information Retrieval, Natural Language Processing, Web Mining