Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold

Gourav Bathla; Rajni Jindal

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 33 - Number 5

Year of Publication: 2011

Authors: Gourav Bathla, Rajni Jindal

10.5120/4014-5701

Gourav Bathla, Rajni Jindal. Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold. International Journal of Computer Applications. 33, 5 (November 2011), 9-13. DOI=10.5120/4014-5701

@article{ 10.5120/4014-5701,
author = { Gourav Bathla and Rajni Jindal },
title = { Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold },
journal = { International Journal of Computer Applications },
issue_date = { November 2011 },
volume = { 33 },
number = { 5 },
month = { November },
year = { 2011 },
issn = { 0975-8887 },
pages = { 9-13 },
numpages = { 5 },
url = { https://ijcaonline.org/archives/volume33/number5/4014-5701/ },
doi = { 10.5120/4014-5701 },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Journal Article

%1 2024-02-06T20:19:16.615508+05:30

%A Gourav Bathla

%A Rajni Jindal

%T Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold

%J International Journal of Computer Applications

%@ 0975-8887

%V 33

%N 5

%P 9-13

%D 2011

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Patents and research papers are published in many fields and stored in the databases of various conferences and journals. A user (a researcher or a general user) who wants to find a patent or research paper in a particular field has few search criteria available for doing so. In this paper, we use the nearest neighbor algorithm with cosine similarity to categorize patents and research papers. Experimental results show that a user searching for a patent or research paper in a particular field or category gets better results. The advantage of this approach is that the search space becomes much smaller, so the time a user waits for the answer to a query is greatly reduced. Much previous research has addressed how to decide the category of a research paper or patent, but the resulting categorization was not very accurate. We calculate the threshold from the similarity of terms between the query and the research paper or patent. This proposed threshold calculation is not based on numerical values, so this novel approach categorizes more accurately than previous work.
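The pipeline the abstract describes (term weighting, cosine similarity, nearest neighbor categorization, then a threshold on similarity) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the toy corpus, the function names, the smoothed TF-IDF weighting, and above all the cut-off, which is taken as the mean similarity score for the current query. The abstract does not specify how the authors compute their threshold, so this is only one possible parameter-free choice and not their method.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (category, text) pairs, for illustration only.
CORPUS = [
    ("information retrieval",
     "search engine ranking with term frequency and inverse document frequency"),
    ("machine learning",
     "neural network training for handwriting recognition"),
    ("software engineering",
     "reverse engineering java code to class diagram"),
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: +1 keeps terms that occur in every document from vanishing.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(query, corpus):
    """Return the nearest neighbor's category and all documents above the cut-off."""
    docs = [tokenize(text) for _, text in corpus]
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(tokenize(query))
    q_len = sum(q_tf.values())
    # Query terms unseen in the corpus carry no weight.
    q_vec = {t: (c / q_len) * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    # ASSUMPTION: the mean score for this query serves as an adaptive,
    # parameter-free cut-off. This is not taken from the paper.
    threshold = sum(scores) / len(scores)
    best = max(range(len(scores)), key=scores.__getitem__)
    kept = [(corpus[i][0], scores[i])
            for i in range(len(scores)) if scores[i] > threshold]
    return corpus[best][0], kept


if __name__ == "__main__":
    label, kept = categorize("term frequency weighting for search engine", CORPUS)
    print(label)
    print(kept)
```

Because the cut-off is recomputed from the scores of each query, no numeric constant has to be tuned in advance, which is the property the title calls adaptive and parameter free. Restricting the later search to the documents in `kept` is what shrinks the search space.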

Index Terms

Computer Science

Information Sciences

Keywords

Search Engine, Term Frequency, Inverse Document Frequency, Vector Space Model, Nearest Neighbor, S-Cut Threshold