Research Article

Automatic Discovery of Association Orders between Name and Aliases from the Web using Anchor Texts-based Co-occurrences

by Rama Subbu Lakshmi B, Jayabhaduri R
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 41 - Number 19
Year of Publication: 2012
10.5120/5651-8038

Rama Subbu Lakshmi B, Jayabhaduri R. Automatic Discovery of Association Orders between Name and Aliases from the Web using Anchor Texts-based Co-occurrences. International Journal of Computer Applications. 41, 19 (March 2012), 23-29. DOI=10.5120/5651-8038

@article{ 10.5120/5651-8038,
author = { Rama Subbu Lakshmi B and Jayabhaduri R },
title = { Automatic Discovery of Association Orders between Name and Aliases from the Web using Anchor Texts-based Co-occurrences },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 41 },
number = { 19 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 23-29 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume41/number19/5651-8038/ },
doi = { 10.5120/5651-8038 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:30:02.160358+05:30
%A Rama Subbu Lakshmi B
%A Jayabhaduri R
%T Automatic Discovery of Association Orders between Name and Aliases from the Web using Anchor Texts-based Co-occurrences
%J International Journal of Computer Applications
%@ 0975-8887
%V 41
%N 19
%P 23-29
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many celebrities and experts in various fields are referred to on the web not only by their personal names but also by their aliases. Aliases matter in information retrieval: because some web pages refer to a person only by an alias, retrieving complete information about a personal name requires knowing its aliases. The aliases of a personal name are extracted using a previously proposed alias extraction method. A web search engine can then automatically expand a query on a person's name by tagging it with the aliases, which improves recall in the relation detection task and achieves a significant mean reciprocal rank (MRR). To improve recall and MRR substantially beyond the previously proposed methods, our method orders the aliases by the strength of their association with the name, defined through anchor texts-based co-occurrences between the name and its aliases, so that the search engine can tag the aliases in order of association. The association orders are discovered automatically by building an anchor texts-based co-occurrence graph between the name and its aliases. A ranking support vector machine (SVM) creates the connections between the name and aliases in the graph by ranking the anchor texts-based co-occurrence measures. The hop distances between nodes, found by mining the graph, give the associations between the name and its aliases. The proposed method will outperform previously proposed methods, with substantial gains in recall and MRR.
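
The graph-based ordering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract uses a ranking SVM over several co-occurrence measures to decide which nodes to connect, whereas this sketch substitutes a simple threshold on the number of URLs that two anchor texts share. The function names, the `min_cooccurrence` parameter, and the sample data (`name`, `alias_a`, and so on) are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(anchor_texts_by_url, min_cooccurrence=1):
    """Connect two anchor texts when they point to enough shared URLs.

    Assumption: a plain count threshold stands in for the ranking SVM
    that the abstract uses to create connections.
    """
    counts = defaultdict(int)
    for anchors in anchor_texts_by_url.values():
        # Two anchor texts co-occur when they link to the same URL.
        for a, b in combinations(sorted(set(anchors)), 2):
            counts[(a, b)] += 1

    graph = defaultdict(set)
    for (a, b), shared_urls in counts.items():
        if shared_urls >= min_cooccurrence:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hop_distances(graph, source):
    """Breadth-first search: fewest edges from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def order_aliases(graph, name, aliases):
    """Order aliases by hop distance from the name (closest first).

    Aliases that cannot be reached from the name are left out.
    Ties are broken alphabetically to keep the order deterministic.
    """
    dist = hop_distances(graph, name)
    reachable = [alias for alias in aliases if alias in dist]
    return sorted(reachable, key=lambda alias: (dist[alias], alias))


# Hypothetical data: each URL maps to the anchor texts of its inbound links.
anchor_texts_by_url = {
    "url1": ["name", "alias_a"],
    "url2": ["alias_a", "alias_b"],
    "url3": ["name", "alias_a"],
    "url4": ["alias_b", "alias_c"],
}

graph = build_cooccurrence_graph(anchor_texts_by_url)
ordered = order_aliases(graph, "name", ["alias_c", "alias_b", "alias_a"])
print(ordered)
```

In this toy graph `alias_a` shares URLs with the name directly, `alias_b` is reached only through `alias_a`, and `alias_c` only through `alias_b`, so the hop distances are 1, 2, and 3. A search engine following the abstract's idea would tag the aliases onto the query in that order.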

Index Terms

Computer Science
Information Sciences

Keywords

Anchor Text Mining, Graph Mining, Word Co-occurrence Graph