International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 48 - Number 8 |
Year of Publication: 2012 |
Authors: Surbhi Anand, Rinkle Rani Aggarwal |
10.5120/7367-0097 |
Surbhi Anand, Rinkle Rani Aggarwal. An Efficient Algorithm for Data Cleaning of Log File using File Extensions. International Journal of Computer Applications. 48, 8 (June 2012), 13-18. DOI=10.5120/7367-0097
The World Wide Web is a vast repository of web pages that provides Internet users with a wealth of information. As websites have grown in number and complexity, the web itself has become massive. Web Usage Mining is a branch of web mining that applies mining techniques to web server logs in order to extract user behavior. A Web Usage Mining process comprises three phases: data preprocessing, pattern discovery and pattern analysis. Data preprocessing is carried out before the mining algorithms are applied, and it translates the raw data collected in server log files into a useful data abstraction. Proper analysis of a web server log helps manage websites efficiently from both the administrator's and the users' perspective. Preprocessing results also strongly influence the later phases of Web Usage Mining, which makes the preprocessing of server log files a significant step in the process. This paper focuses on the Web Usage Mining process and explores the field of data cleaning.
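The abstract does not spell out the cleaning algorithm itself, so the following is only a minimal sketch of the general idea of cleaning a server log by file extension. It is not the authors' method. It assumes that entries are in Common Log Format and that requests for embedded resources (images, stylesheets, scripts) are noise for usage mining. The extension set `IRRELEVANT_EXTENSIONS` and the helper names `requested_extension` and `clean_log` are hypothetical choices made for illustration.

```python
import re
from typing import Iterable, Iterator

# Hypothetical, illustrative set of extensions for embedded resources that a
# browser fetches automatically rather than because the user asked for them.
IRRELEVANT_EXTENSIONS = frozenset(
    {".jpg", ".jpeg", ".gif", ".png", ".bmp", ".ico", ".css", ".js"}
)

# Assumed Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def requested_extension(path: str) -> str:
    """Return the lowercase extension of the requested file, or '' if none."""
    # Strip any query string or fragment before looking at the file name.
    resource = path.split("?", 1)[0].split("#", 1)[0]
    name = resource.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    return name[dot:].lower() if dot > 0 else ""


def clean_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log entries that survive extension-based cleaning."""
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            # Entries that do not parse as Common Log Format are discarded.
            continue
        if requested_extension(match.group("path")) in IRRELEVANT_EXTENSIONS:
            # Requests for images, stylesheets, scripts, etc. are dropped.
            continue
        yield line


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [10/Jun/2012:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326',
        '10.0.0.1 - - [10/Jun/2012:13:55:37 +0530] "GET /images/logo.png HTTP/1.1" 200 5120',
        '10.0.0.1 - - [10/Jun/2012:13:55:37 +0530] "GET /style.css?v=2 HTTP/1.1" 200 880',
        '10.0.0.2 - - [10/Jun/2012:13:56:01 +0530] "GET /products/list.php HTTP/1.1" 200 4410',
    ]
    for entry in clean_log(sample):
        print(entry)
```

Run on the sample, only the `/index.html` and `/products/list.php` requests are kept. Filtering is done lazily with a generator so that large log files can be streamed line by line rather than loaded into memory. A real cleaning step would likely also consider status codes, request methods or robot traffic, which this sketch deliberately leaves out.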