Research Article

Novel Pre-Processing Technique for Web Log Mining by Removing Global Noise, Cookies and Web Robots

by P. Nithya, P. Sumathi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 53 - Number 17
Year of Publication: 2012
Authors: P. Nithya, P. Sumathi
10.5120/8510-1684

P. Nithya, P. Sumathi. Novel Pre-Processing Technique for Web Log Mining by Removing Global Noise, Cookies and Web Robots. International Journal of Computer Applications. 53, 17 (September 2012), 1-6. DOI=10.5120/8510-1684

@article{ 10.5120/8510-1684,
author = { P. Nithya and P. Sumathi },
title = { Novel Pre-Processing Technique for Web Log Mining by Removing Global Noise, Cookies and Web Robots },
journal = { International Journal of Computer Applications },
issue_date = { September 2012 },
volume = { 53 },
number = { 17 },
month = { September },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume53/number17/8510-1684/ },
doi = { 10.5120/8510-1684 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:54:17.731999+05:30
%A P. Nithya
%A P. Sumathi
%T Novel Pre-Processing Technique for Web Log Mining by Removing Global Noise, Cookies and Web Robots
%J International Journal of Computer Applications
%@ 0975-8887
%V 53
%N 17
%P 1-6
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The internet has become indispensable to daily life, and almost anything can be searched for online. Web pages often contain large amounts of information that does not interest the user because it is not part of the page's main content. Web Usage Mining (WUM) applies data mining, artificial intelligence, and related techniques to web data in order to forecast users' visiting behavior and infer their interests from observed usage samples. WUM is directly involved in applications such as e-commerce, e-learning, web analytics, and information retrieval. Web log data is one of its major sources: it records the links users visit, their browsing patterns, and the time spent on a particular page or link, and this information can be used for adaptive web sites, modified services, customer summaries, pre-fetching, and the design of attractive web sites. Existing web usage mining approaches have a variety of problems; in particular, existing algorithms are difficult to apply in practice. This paper continues the line of research on web access log analysis, which aims to analyze patterns of web site usage and the features of user behavior. Raw log data is characteristically noisy and unclear, so preprocessing is essential for an effective mining process. Preprocessing comprises three phases: data cleaning, user identification, and pattern discovery and pattern analysis. In this paper, a novel preprocessing technique is proposed that removes local and global noise and web robots. Preprocessing is an important step because the Web architecture is very complex in nature and 80% of the mining process is done at this phase. The Anonymous Microsoft Web Dataset and the MSNBC.com Anonymous Web Dataset are used to evaluate the proposed preprocessing technique.
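
The data cleaning phase mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes the raw input is a server access log in the Combined Log Format (an assumption; the two evaluation datasets are distributed in their own formats), and it uses commonly cited heuristics: drop requests for embedded resources, drop failed requests, and drop hosts that look like web robots because they request `/robots.txt` or send a bot-like user agent. The names `clean_log`, `NOISE_SUFFIXES`, `ROBOT_HINTS`, and `SAMPLE` are hypothetical.

```python
import re

# Combined Log Format: host, identity, user, time, request, status, size,
# referrer, user agent. The format itself is an assumption of this sketch.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical heuristics, chosen for illustration only.
NOISE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_HINTS = ("bot", "crawler", "spider")


def clean_log(lines):
    """Return parsed records that look like successful page views by humans."""
    parsed = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # silently skip lines that do not fit the format
            parsed.append(match.groupdict())

    # Any host that fetched /robots.txt is treated as a robot for all requests.
    robot_hosts = {
        r["host"] for r in parsed
        if r["url"].lower().split("?")[0] == "/robots.txt"
    }

    kept = []
    for r in parsed:
        path = r["url"].lower().split("?")[0]
        agent = r["agent"].lower()
        if r["host"] in robot_hosts:
            continue  # robot identified by its robots.txt request
        if any(hint in agent for hint in ROBOT_HINTS):
            continue  # robot identified by its user agent string
        if path.endswith(NOISE_SUFFIXES):
            continue  # embedded resource, not an explicit page request
        if r["method"] != "GET" or not r["status"].startswith("2"):
            continue  # non-GET or unsuccessful request
        kept.append(r)
    return kept


# Made-up log lines for demonstration; not taken from either dataset.
SAMPLE = [
    '10.0.0.1 - - [10/Sep/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Sep/2012:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 512 "/index.html" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Sep/2012:10:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleAgent/1.0"',
    '10.0.0.2 - - [10/Sep/2012:10:00:03 +0000] "GET /news.html HTTP/1.1" 200 2048 "-" "ExampleAgent/1.0"',
    '10.0.0.3 - - [10/Sep/2012:10:00:04 +0000] "GET /about.html HTTP/1.1" 200 900 "-" "Googlebot/2.1"',
    '10.0.0.4 - - [10/Sep/2012:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for record in clean_log(SAMPLE):
        print(record["host"], record["url"])
```

On the made-up sample, only the `/index.html` request from `10.0.0.1` survives. The image request, both requests from the host that fetched `/robots.txt`, the request with a bot-like user agent, and the 404 are all filtered out. A real pipeline would likely need a maintained robot list and site-specific noise rules rather than these fixed tuples.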

Index Terms

Computer Science
Information Sciences

Keywords

Preprocessing, Data Cleaning, Path Completion, Travel Path Set, Content Path Set