Research Article

A Survey on using Clustering to Enhance Search Engine Performance

by Mennatollah Mamdouh Mahmoud, Doaa Saad Elzanfaly, Ahmed El-Sayed Yacoup
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 27
Year of Publication: 2023
10.5120/ijca2023923016

Mennatollah Mamdouh Mahmoud, Doaa Saad Elzanfaly, Ahmed El-Sayed Yacoup. A Survey on using Clustering to Enhance Search Engine Performance. International Journal of Computer Applications. 185, 27 (Aug 2023), 6-13. DOI=10.5120/ijca2023923016

@article{ 10.5120/ijca2023923016,
author = { Mennatollah Mamdouh Mahmoud and Doaa Saad Elzanfaly and Ahmed El-Sayed Yacoup },
title = { A Survey on using Clustering to Enhance Search Engine Performance },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2023 },
volume = { 185 },
number = { 27 },
month = { Aug },
year = { 2023 },
issn = { 0975-8887 },
pages = { 6-13 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume185/number27/32858-2023923016/ },
doi = { 10.5120/ijca2023923016 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:27:10.077579+05:30
%A Mennatollah Mamdouh Mahmoud
%A Doaa Saad Elzanfaly
%A Ahmed El-Sayed Yacoup
%T A Survey on using Clustering to Enhance Search Engine Performance
%J International Journal of Computer Applications
%@ 0975-8887
%V 185
%N 27
%P 6-13
%D 2023
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Searching the web for relevant documents or information has become a difficult task. The causes are the huge amount of information available on the web and poor indexing. It is therefore necessary to manage web resources so that people can find the content they are interested in, and users need a reliable information retrieval system to find and organize the pertinent information. Many research studies have been conducted to improve web search so that it returns the most important information for a user's query. Some use ontologies and the semantic web to retrieve the most relevant information for the user. Others use machine learning techniques, such as clustering, to enhance search engine performance. This paper reviews search engine components, the search engine index structure, and ways to update the index. It also reviews clustering techniques, including hard and overlapping clustering, and the methods used for labeling clusters. Finally, it discusses techniques that have been proposed to improve clustering, cluster labeling, and the web search index.

Index Terms

Computer Science
Information Sciences

Keywords

Information Retrieval (IR), Search Engine, Clustering, Clustering Labeling, Web Document Indexing