Research Article

Advanced Preprocessing Techniques used in Web Mining: A Study

by T. Gopalakrishnan, M. Kavya, V. S. Gowthami
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 101 - Number 13
Year of Publication: 2014
Authors: T. Gopalakrishnan, M. Kavya, V. S. Gowthami
10.5120/17747-8822

T. Gopalakrishnan, M. Kavya, V. S. Gowthami. Advanced Preprocessing Techniques used in Web Mining: A Study. International Journal of Computer Applications. 101, 13 (September 2014), 16-20. DOI=10.5120/17747-8822

@article{ 10.5120/17747-8822,
author = { T. Gopalakrishnan and M. Kavya and V. S. Gowthami },
title = { Advanced Preprocessing Techniques used in Web Mining: A Study },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 101 },
number = { 13 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 16-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume101/number13/17747-8822/ },
doi = { 10.5120/17747-8822 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:31:34.511440+05:30
%A T. Gopalakrishnan
%A M. Kavya
%A V. S. Gowthami
%T Advanced Preprocessing Techniques used in Web Mining: A Study
%J International Journal of Computer Applications
%@ 0975-8887
%V 101
%N 13
%P 16-20
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Web-based applications are becoming increasingly popular among users across the world. User interactions with these applications are tracked in the web log files maintained by the web server. Web usage mining (WUM), the process of extracting user patterns from web usage data, is used to analyze these logs. In web usage mining, preprocessing plays a key role, since a large amount of irrelevant information is present in the web data. Preprocessing is used to improve the quality and efficiency of the data. A number of techniques are available at the preprocessing level of WUM, such as data cleaning, data filtering, and data integration. In this paper, we present a survey of the various preprocessing techniques that have been used to improve efficiency.
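
As a rough illustration of what log cleaning and sessionization can look like in practice, here is a minimal Python sketch. It is not taken from the paper or from any of the surveyed methods. The input format (Common Log Format), the list of ignored file suffixes, the 30-minute session timeout, and the use of the client IP address as a stand-in for user identification are all assumptions made for this example, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines (the abstract does not fix a format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Illustrative cleaning rules: static resources treated as irrelevant noise.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Assumed inactivity threshold that separates two sessions of one user.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": match.group("method"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


def is_relevant(entry):
    """Keep successful GET requests for pages; drop images, styles, robots.txt."""
    if entry["method"] != "GET":
        return False
    if not 200 <= entry["status"] < 300:
        return False
    path = entry["url"].split("?", 1)[0].lower()
    if path.endswith(IGNORED_SUFFIXES):
        return False
    return path != "/robots.txt"


def clean_log(lines):
    """Data cleaning: parse every line and keep only the relevant entries."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and is_relevant(e)]


def sessionize(entries):
    """Group entries per IP (a crude user identifier) into timeout-based sessions."""
    sessions = {}
    for entry in sorted(entries, key=lambda e: e["time"]):
        user_sessions = sessions.setdefault(entry["ip"], [])
        last = user_sessions[-1][-1] if user_sessions else None
        if last is not None and entry["time"] - last["time"] <= SESSION_TIMEOUT:
            user_sessions[-1].append(entry)
        else:
            user_sessions.append([entry])
    return sessions
```

Calling `sessionize(clean_log(lines))` on an iterable of raw log lines returns a dictionary that maps each IP address to a list of sessions, where each session is a list of cleaned entries in time order. Identifying users by IP address alone is a simplification; proxies and shared addresses would need extra signals such as the user agent.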

Index Terms

Computer Science
Information Sciences

Keywords

Web usage mining, log cleaning, user identification, sessionization