International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 97 - Number 18
Year of Publication: 2014
Authors: Mitali Srivastava, Rakhi Garg, P. K. Mishra
DOI: 10.5120/17104-7737
Mitali Srivastava, Rakhi Garg, P. K. Mishra. Preprocessing Techniques in Web Usage Mining: A Survey. International Journal of Computer Applications. 97, 18 (July 2014), 1-9. DOI=10.5120/17104-7737
Because the data available on the web is huge, unstructured and scattered, it is difficult for users to find relevant information quickly. To address this, web site design is improved, content is personalized, and prefetching and caching are performed on the basis of user behavior analysis. Users' activities are captured in special files called log files. There are several types of log: server logs, proxy server logs and client/browser logs. Web usage mining uses these log files to analyze and discover useful patterns. The process of web usage mining involves three interdependent steps: data preprocessing, pattern discovery and pattern analysis. Among these steps, data preprocessing plays a vital role because log data is unstructured, redundant and noisy. To improve the later phases of web usage mining, namely pattern discovery and pattern analysis, several data preprocessing techniques such as data cleaning, user identification, session identification and path completion have been used. This paper discusses these techniques in detail. The techniques are also categorized and presented with their advantages and disadvantages, which will help scientists, researchers and academicians working in this direction.
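To make the preprocessing steps named above concrete, the following is a minimal Python sketch of data cleaning, user identification and session identification on a server log. It is an illustration under stated assumptions, not the method of any particular work covered by the survey: the log lines are hypothetical samples in Combined Log Format, the list of ignored file suffixes is an arbitrary choice, users are approximated by the pair (IP address, user agent), and sessions are split with an assumed 30-minute inactivity timeout. Path completion is not covered.

```python
import re
from datetime import datetime, timedelta

# Assumption: entries follow the Combined Log Format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed list
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def parse(line):
    """Turn one log line into a dict, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def clean(records):
    """Data cleaning: keep successful GET requests for non-embedded resources."""
    return [
        r for r in records
        if r["method"] == "GET"
        and 200 <= r["status"] < 300
        and not r["url"].lower().endswith(IGNORED_SUFFIXES)
    ]


def identify_users(records):
    """User identification: group requests by (IP address, user agent)."""
    users = {}
    for r in records:
        users.setdefault((r["ip"], r["agent"]), []).append(r)
    return users


def sessionize(user_records):
    """Session identification: start a new session after a long gap."""
    sessions = []
    for r in sorted(user_records, key=lambda rec: rec["ts"]):
        if sessions and r["ts"] - sessions[-1][-1]["ts"] <= SESSION_TIMEOUT:
            sessions[-1].append(r)
        else:
            sessions.append([r])
    return sessions


def preprocess(lines):
    """Run parsing, cleaning, user identification and sessionization."""
    records = [r for r in map(parse, lines) if r is not None]
    users = identify_users(clean(records))
    return {user: sessionize(recs) for user, recs in users.items()}


# Hypothetical log lines, invented for this illustration.
SAMPLE = [
    '10.0.0.1 - - [01/Jul/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jul/2014:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 512 "/index.html" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jul/2014:10:05:00 +0000] "GET /about.html HTTP/1.1" 200 2048 "/index.html" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jul/2014:11:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jul/2014:11:00:05 +0000] "GET /missing.html HTTP/1.1" 404 0 "-" "Opera/9.80"',
]

if __name__ == "__main__":
    for user, sessions in preprocess(SAMPLE).items():
        for i, session in enumerate(sessions, start=1):
            print(user, i, [r["url"] for r in session])
```

In this sketch the image request and the failed request are dropped during cleaning, and the remaining requests from the same (IP address, user agent) pair are split wherever consecutive requests are more than 30 minutes apart. Both the grouping key and the timeout are simplifying assumptions; other heuristics for user and session identification may behave differently on the same log.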