Research Article

Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold

by Gourav Bathla, Rajni Jindal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 33 - Number 5
Year of Publication: 2011
Authors: Gourav Bathla, Rajni Jindal
DOI: 10.5120/4014-5701

Gourav Bathla, Rajni Jindal. Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold. International Journal of Computer Applications. 33, 5 (November 2011), 9-13. DOI=10.5120/4014-5701

@article{ 10.5120/4014-5701,
author = { Gourav Bathla, Rajni Jindal },
title = { Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold },
journal = { International Journal of Computer Applications },
issue_date = { November 2011 },
volume = { 33 },
number = { 5 },
month = { November },
year = { 2011 },
issn = { 0975-8887 },
pages = { 9-13 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume33/number5/4014-5701/ },
doi = { 10.5120/4014-5701 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:19:16.615508+05:30
%A Gourav Bathla
%A Rajni Jindal
%T Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold
%J International Journal of Computer Applications
%@ 0975-8887
%V 33
%N 5
%P 9-13
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Patents and research papers are published in many fields and are stored in the databases of various conferences and journals. When a user (a researcher or a general user) wants to find a patent or research paper in a particular field, adequate search criteria are lacking. In this paper, we use the nearest neighbor algorithm with cosine similarity to categorize patents and research papers. Experimental results show that a user searching for a patent or research paper within a particular field or category gets better results. The advantage of this approach is that the search space becomes very small, so the time a user waits for the answer to a query is greatly reduced. Much previous research has addressed deciding the category of a given research paper or patent, but the categorization was not very accurate. We calculate the threshold from the similarity of terms between the query and the research paper or patent. The proposed threshold calculation is not based on preset numerical values, so this novel approach categorizes more accurately than previous work.
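
The pipeline described in the abstract (term weighting, cosine similarity, nearest neighbor categorization, and a threshold derived from the similarities themselves) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and in particular the use of the mean similarity as the data-derived cut-off are all assumptions made for illustration, since the abstract does not specify how the threshold is computed.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: log(n / df) + 1, so terms found in every document keep a small weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def vectorize_query(query, idf):
    """Weight query terms with the collection's IDF; unseen terms are ignored."""
    toks = tokenize(query)
    tf = Counter(t for t in toks if t in idf)
    return {t: (c / len(toks)) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(query, docs, labels):
    """Return (category, candidate indices, similarities, threshold) for a query."""
    vectors, idf = tfidf_vectors(docs)
    q = vectorize_query(query, idf)
    sims = [cosine(q, v) for v in vectors]
    # Hypothetical adaptive threshold: the mean similarity over the collection.
    # It needs no user-chosen parameter, but it is NOT taken from the paper.
    threshold = sum(sims) / len(sims)
    candidates = [i for i, s in enumerate(sims) if s > threshold]
    # Nearest neighbor: the label of the most similar document, if any term matched.
    best = max(range(len(docs)), key=lambda i: sims[i])
    category = labels[best] if sims[best] > 0.0 else None
    return category, candidates, sims, threshold


if __name__ == "__main__":
    docs = [
        "patent for wireless antenna signal transmission",
        "neural network training for image classification",
        "antenna design for mobile signal reception",
        "deep neural network for speech recognition",
    ]
    labels = ["electronics", "machine learning", "electronics", "machine learning"]
    category, candidates, sims, threshold = categorize(
        "neural network image recognition", docs, labels
    )
    print(category, candidates)
```

Only the documents whose similarity exceeds the data-derived threshold remain as candidates, which mirrors the abstract's point that the search space shrinks once a query is assigned to a category.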

Index Terms

Computer Science
Information Sciences

Keywords

Search Engine, Term Frequency, Inverse Document Frequency, Vector Space Model, Nearest Neighbor, S-Cut Threshold