Research Article

Improved Preprocessing techniques for Analyzing Patterns in Web Personalization Process

by R. Gobinath, M. Hemalatha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 58 - Number 3
Year of Publication: 2012
Authors: R. Gobinath, M. Hemalatha
10.5120/9261-3438

R. Gobinath, M. Hemalatha. Improved Preprocessing techniques for Analyzing Patterns in Web Personalization Process. International Journal of Computer Applications. 58, 3 (November 2012), 13-20. DOI=10.5120/9261-3438

@article{ 10.5120/9261-3438,
author = { R. Gobinath and M. Hemalatha },
title = { Improved Preprocessing techniques for Analyzing Patterns in Web Personalization Process },
journal = { International Journal of Computer Applications },
issue_date = { November 2012 },
volume = { 58 },
number = { 3 },
month = { November },
year = { 2012 },
issn = { 0975-8887 },
pages = { 13-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume58/number3/9261-3438/ },
doi = { 10.5120/9261-3438 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:03:20.298061+05:30
%A R. Gobinath
%A M. Hemalatha
%T Improved Preprocessing techniques for Analyzing Patterns in Web Personalization Process
%J International Journal of Computer Applications
%@ 0975-8887
%V 58
%N 3
%P 13-20
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data preprocessing plays a vital role in data mining. In this paper we adopt web mining concepts to cleanse web server log files. Web mining extracts useful information from hypertext documents. When a user accesses web pages or sites, the details of each request are recorded as an entry in a file called the log file. Web server log files are mined for patterns that characterize the access behavior of users. Before mining, the raw data must be preprocessed to improve the quality of the data to be mined. This paper discusses the significance of data preprocessing methods and the steps involved in obtaining the required content. A complete preprocessing technique is proposed to prepare the web log for the extraction of user patterns. A data cleansing algorithm eliminates extraneous entries from the web log, while a filtering algorithm discards irrelevant attributes from the log file. Outliers are detected and removed from the dataset, and users and sessions are identified. The performance of the data cleansing process was evaluated using a wrapper approach, in which the cleaned dataset is clustered with five different clustering algorithms (Farthest First, K-means, COBWEB, Make Density Based, and Expectation Maximization) to assess the quality of the web log data.
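The cleansing, user identification, and session identification steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' algorithms, so everything specific here is an assumption made for illustration: the Common Log Format input, the list of ignored file suffixes, the use of the client IP address as the user identifier, and the 30-minute session timeout. Outlier removal, attribute filtering, and the clustering-based evaluation are not covered.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines (the abstract does not name a format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical cleansing rules: suffixes of embedded resources to discard.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Hypothetical session timeout; 30 minutes is a common heuristic, not the paper's value.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Field extraction: return a dict of fields, or None if the line is malformed."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ip": m["ip"],
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "url": m["url"],
        "status": int(m["status"]),
    }


def is_relevant(entry):
    """Data cleansing: keep successful GET requests for pages, drop everything else."""
    if entry["method"] != "GET" or not 200 <= entry["status"] < 300:
        return False
    path = entry["url"].split("?", 1)[0].lower()
    return not path.endswith(IGNORED_SUFFIXES)


def clean_log(lines):
    """Parse all lines and keep only the relevant entries."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and is_relevant(e)]


def identify_sessions(entries):
    """Group entries by user (here: IP address) and split on gaps above the timeout."""
    sessions = {}
    for e in sorted(entries, key=lambda e: (e["ip"], e["time"])):
        user_sessions = sessions.setdefault(e["ip"], [])
        if user_sessions and e["time"] - user_sessions[-1][-1]["time"] <= SESSION_TIMEOUT:
            user_sessions[-1].append(e)
        else:
            user_sessions.append([e])
    return sessions


# Invented sample data, for demonstration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Nov/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Nov/2012:10:00:01 +0000] "GET /logo.gif HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Nov/2012:10:05:00 +0000] "GET /papers.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [10/Nov/2012:11:30:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [10/Nov/2012:10:01:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
    '10.0.0.2 - - [10/Nov/2012:10:02:00 +0000] "POST /login HTTP/1.1" 200 12',
    '10.0.0.2 - - [10/Nov/2012:10:03:00 +0000] "GET /about.html HTTP/1.1" 200 900',
]

if __name__ == "__main__":
    for user, user_sessions in identify_sessions(clean_log(SAMPLE_LOG)).items():
        for number, session in enumerate(user_sessions, start=1):
            print(user, number, [e["url"] for e in session])
```

Identifying users by IP address alone is the simplest possible choice and misattributes requests made through shared proxies; a fuller implementation would typically also compare the user agent and referrer fields when the log format provides them.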

Index Terms

Computer Science
Information Sciences

Keywords

Web Mining, Field extraction, Data cleansing, User identification, Session identification, Server Log files