International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 58 - Number 3
Year of Publication: 2012
Authors: R. Gobinath, M. Hemalatha
DOI: 10.5120/9261-3438
R. Gobinath, M. Hemalatha. Improved Preprocessing techniques for Analyzing Patterns in Web Personalization Process. International Journal of Computer Applications. 58, 3 (November 2012), 13-20. DOI=10.5120/9261-3438
Data preprocessing plays a vital role in data mining. In this paper we adopt web mining concepts to cleanse web server log files. Web mining extracts useful information from hypertext documents. When a user accesses web pages or sites, each request is recorded as an entry in a file called the log file. Web server log files are mined for patterns that describe the access behavior of users. Before mining, the raw data has to be preprocessed to improve the quality of the data to be mined. This paper discusses the significance of data preprocessing methods and the steps involved in obtaining the required content. A complete preprocessing technique is proposed to prepare the web log for the extraction of user patterns. A data cleansing algorithm is applied to eliminate extraneous entries from the web log, while a filtering algorithm is used to discard irrelevant attributes from the log file. Outliers are detected and removed from the dataset, and users and sessions are then identified. The performance of the data cleansing process was evaluated by adopting the wrapper approach, in which the resulting cleaned dataset is clustered using five different clustering algorithms, namely Farthest First, K-means, COBWEB, the Make Density Based clustering algorithm, and the Expectation Maximization algorithm, to assess the quality of the web log data.
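The cleansing and session identification steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the log format, the cleansing rules, or the session rule, so everything here is an assumption made for illustration. The sketch assumes Common Log Format entries, treats image, stylesheet, and script requests, failed requests, and non-GET requests as extraneous, identifies a user by IP address alone, and starts a new session after a 30-minute gap. The names `parse_line`, `is_relevant`, `clean_log`, and `sessionize` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines (the abstract does not name a format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumed cleansing rules: embedded resources are treated as extraneous.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

# Assumed session rule: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": match.group("method"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


def is_relevant(entry):
    """Keep successful GET requests for pages; drop embedded resources."""
    path = entry["url"].split("?", 1)[0].lower()
    return (
        entry["method"] == "GET"
        and 200 <= entry["status"] < 300
        and not path.endswith(IGNORED_SUFFIXES)
        and path != "/robots.txt"
    )


def clean_log(lines):
    """Return the parsed entries that survive the cleansing rules."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and is_relevant(entry)]


def sessionize(entries):
    """Group entries per IP address into sessions split by SESSION_TIMEOUT."""
    sessions = {}
    for entry in sorted(entries, key=lambda e: (e["ip"], e["time"])):
        user_sessions = sessions.setdefault(entry["ip"], [])
        last = user_sessions[-1][-1] if user_sessions else None
        if last is not None and entry["time"] - last["time"] <= SESSION_TIMEOUT:
            user_sessions[-1].append(entry)
        else:
            user_sessions.append([entry])
    return sessions
```

Calling `sessionize(clean_log(lines))` on an iterable of raw log lines yields a dictionary that maps each IP address to its list of sessions. Identifying users by IP address alone is the simplest heuristic; combining it with the user agent field, where the log records one, would separate users who share an address.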