Research Article

Top-K Search Query Grouping using SOM Clustering for Search Engine

by Sami Uddin, Amit Kumar Nandanwar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 109 - Number 8
Year of Publication: 2015
Authors: Sami Uddin, Amit Kumar Nandanwar
10.5120/19210-0984

Sami Uddin, Amit Kumar Nandanwar. Top-K Search Query Grouping using SOM Clustering for Search Engine. International Journal of Computer Applications. 109, 8 (January 2015), 32-39. DOI=10.5120/19210-0984

@article{ 10.5120/19210-0984,
author = { Sami Uddin and Amit Kumar Nandanwar },
title = { Top-K Search Query Grouping using SOM Clustering for Search Engine },
journal = { International Journal of Computer Applications },
issue_date = { January 2015 },
volume = { 109 },
number = { 8 },
month = { January },
year = { 2015 },
issn = { 0975-8887 },
pages = { 32-39 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume109/number8/19210-0984/ },
doi = { 10.5120/19210-0984 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:44:16.680453+05:30
%A Sami Uddin
%A Amit Kumar Nandanwar
%T Top-K Search Query Grouping using SOM Clustering for Search Engine
%J International Journal of Computer Applications
%@ 0975-8887
%V 109
%N 8
%P 32-39
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is an important task for any recommendation system, and many researchers have suggested clustering methods for search engine optimization. Search engines help users search more effectively by recommending queries. Clustering helps uncover the actual relations between queries whose true relationship differs from what their wording suggests. Clustering user queries is difficult, however, because users enter many types of widely varying queries. These queries are often too short to convey their real meaning and can be interpreted in different ways. A single query may have several meanings, while many different query words may share a common meaning when searching for content. Many clustering methods for search engine optimization have been proposed in recent decades, but they fail to make proper use of the information hidden in the user query log. This paper presents a novel clustering approach that identifies query similarity and applies self-organizing map (SOM) clustering to obtain effective clustering results. We propose a novel similarity matrix for user queries based on the URLs that users click in the search results. Text similarity and time similarity are also measured when calculating the similarity between two queries. The method shows good clustering performance compared with other existing methods.
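
The abstract does not give the similarity formula or the SOM configuration, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Jaccard overlap for both clicked URLs and query terms, an exponential decay for time similarity, fixed mixing weights, and a small one-dimensional SOM trained on the rows of the similarity matrix. All function names, parameters, weights, and sample queries here are hypothetical.

```python
import math
import random

def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def query_similarity(q1, q2, weights=(0.5, 0.3, 0.2), time_scale=3600.0):
    """Weighted mix of click, text and time similarity (weights are assumed)."""
    click_sim = jaccard(q1["clicks"], q2["clicks"])
    text_sim = jaccard(q1["text"].lower().split(), q2["text"].lower().split())
    # Assumed time similarity: decays with the gap (in seconds) between queries.
    time_sim = math.exp(-abs(q1["time"] - q2["time"]) / time_scale)
    w_click, w_text, w_time = weights
    return w_click * click_sim + w_text * text_sim + w_time * time_sim

def similarity_matrix(queries):
    """Pairwise similarity of every query against every other query."""
    return [[query_similarity(a, b) for b in queries] for a in queries]

def best_unit(units, row):
    """Index of the SOM unit closest to `row` (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((u - x) ** 2 for u, x in zip(units[j], row)))

def train_som(rows, n_units=2, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map; each row is one input vector."""
    rng = random.Random(seed)
    dim = len(rows[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # shrinks towards 0 over training
        lr = lr0 * frac                      # decaying learning rate
        radius = max(1e-9, (n_units / 2.0) * frac)  # decaying neighbourhood
        for row in rows:
            bmu = best_unit(units, row)
            for j, unit in enumerate(units):
                # Gaussian neighbourhood around the best-matching unit.
                influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    unit[k] += lr * influence * (row[k] - unit[k])
    return units

# Hypothetical query-log entries: text, clicked URLs, timestamp in seconds.
queries = [
    {"text": "cheap flights paris", "clicks": {"air.example/fares", "trips.example"}, "time": 0},
    {"text": "paris airfare", "clicks": {"trips.example"}, "time": 120},
    {"text": "som clustering tutorial", "clicks": {"ml.example/som"}, "time": 90000},
]
matrix = similarity_matrix(queries)
units = train_som(matrix)
groups = [best_unit(units, row) for row in matrix]  # cluster index per query
```

Each query is represented by its row of similarities to all other queries, and queries mapped to the same SOM unit form one group. In practice the weights, the time scale, and the map size would need to be tuned on a real query log.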

Index Terms

Computer Science
Information Sciences

Keywords

Query Logs, Query Process, SOM Clustering