Improving Web Search Results by removing Outliers using Data Mining Techniques

Mennatollah M. Mahmoud; Shaimaa Salama; Doaa S. Elzanfaly

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 176 - Number 7

Year of Publication: 2017

Authors: Mennatollah M. Mahmoud, Shaimaa Salama, Doaa S. Elzanfaly

10.5120/ijca2017915635

Mennatollah M. Mahmoud, Shaimaa Salama, Doaa S. Elzanfaly. Improving Web Search Results by removing Outliers using Data Mining Techniques. International Journal of Computer Applications. 176, 7 (Oct 2017), 9-14. DOI=10.5120/ijca2017915635

@article{10.5120/ijca2017915635,
  author = {Mennatollah M. Mahmoud and Shaimaa Salama and Doaa S. Elzanfaly},
  title = {Improving Web Search Results by removing Outliers using Data Mining Techniques},
  journal = {International Journal of Computer Applications},
  issue_date = {Oct 2017},
  volume = {176},
  number = {7},
  month = {Oct},
  year = {2017},
  issn = {0975-8887},
  pages = {9-14},
  numpages = {6},
  url = {https://ijcaonline.org/archives/volume176/number7/28565-2017915635/},
  doi = {10.5120/ijca2017915635},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:41:52.528320+05:30
%A Mennatollah M. Mahmoud
%A Shaimaa Salama
%A Doaa S. Elzanfaly
%T Improving Web Search Results by removing Outliers using Data Mining Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 7
%P 9-14
%D 2017
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Many users turn to the web in search of information. They submit a query or question to a search engine, which may return pages or results that are irrelevant to their needs. This research paper proposes a model to remove outliers from the search results. The proposed model is based on association rules, a modified Naïve Bayes algorithm, and clustering techniques. The Naïve Bayes algorithm is modified to help remove outliers from the search results. The proposed model has been evaluated against the standard k-medoids algorithm using three evaluation measures: the Sum of Squared Errors (SSE), the silhouette coefficient, and entropy. Experimental results show that the proposed model outperforms the standard k-medoids clustering algorithm in removing search outliers.
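The abstract names three evaluation measures without giving formulas. As a minimal sketch, assuming the usual textbook definitions and Euclidean distance (the paper's exact formulation, distance function, and data are not shown on this page), the measures could be computed in Python as follows. The function names and the toy data are hypothetical and serve only as illustration.

```python
import math
from collections import Counter

def dist(p, q):
    """Euclidean distance between two points (an assumption, not the paper's stated metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster's representative."""
    return sum(dist(p, centers[k]) ** 2 for p, k in zip(points, labels))

def cluster_entropy(labels, classes):
    """Size-weighted average entropy (base 2) of the true classes inside each cluster."""
    n = len(labels)
    total = 0.0
    for k in set(labels):
        members = [c for l, c in zip(labels, classes) if l == k]
        counts = Counter(members)
        h = -sum((m / len(members)) * math.log2(m / len(members))
                 for m in counts.values())
        total += (len(members) / n) * h
    return total

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters and distinct points."""
    scores = []
    for i, (p, k) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == k and j != i]
        if not same:
            scores.append(0.0)  # convention for a singleton cluster
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != k
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated clusters of two points each.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
centers = {0: (0, 1), 1: (10, 1)}
classes = ["a", "b", "b", "b"]  # made-up ground-truth classes

print("SSE:", sse(points, labels, centers))
print("Entropy:", cluster_entropy(labels, classes))
print("Silhouette:", silhouette(points, labels))
```

Lower SSE and entropy and a higher silhouette coefficient indicate tighter, purer, and better-separated clusters, which is the sense in which one clustering can be said to outperform another on these measures.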

Index Terms

Computer Science

Information Sciences

Keywords

Information Retrieval (IR), Web mining, Association rules (AR), Classification, Clustering, Outlier detection