Research Article

Analyzing Web Access Logs using Spark with Hadoop

by Vandita Jain, Tripti Saxena, Vineet Richhariya
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 1
Year of Publication: 2017
Authors: Vandita Jain, Tripti Saxena, Vineet Richhariya
10.5120/ijca2017915904

Vandita Jain, Tripti Saxena, Vineet Richhariya. Analyzing Web Access Logs using Spark with Hadoop. International Journal of Computer Applications. 180, 1 (Dec 2017), 47-51. DOI=10.5120/ijca2017915904

@article{ 10.5120/ijca2017915904,
author = { Vandita Jain and Tripti Saxena and Vineet Richhariya },
title = { Analyzing Web Access Logs using Spark with Hadoop },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2017 },
volume = { 180 },
number = { 1 },
month = { Dec },
year = { 2017 },
issn = { 0975-8887 },
pages = { 47-51 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume180/number1/28768-2017915904/ },
doi = { 10.5120/ijca2017915904 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:59:28.541252+05:30
%A Vandita Jain
%A Tripti Saxena
%A Vineet Richhariya
%T Analyzing Web Access Logs using Spark with Hadoop
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 1
%P 47-51
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Web usage mining is the process of finding user navigation patterns in web server access logs. These navigation patterns are further analyzed using various data mining techniques, and the discovered patterns can be used for purposes such as identifying a user's frequent patterns and predicting the user's future requests. In recent years there has been huge growth in electronic commerce websites such as Flipkart and Amazon. With so many online shopping websites, it is important to know how many users are actually reaching them. When users access a website, web access logs are generated on the server. These logs contain information such as IP address, user name, URL, timestamp, and bytes transferred, and they help us analyze user behaviour. Analyzing web access logs is therefore valuable for identifying emerging trends in electronic commerce. These e-commerce websites generate petabytes of log data every day, which traditional tools and techniques cannot store and analyze. In this paper we propose using the Hadoop framework, which is reliable for storing such huge amounts of data in HDFS, and then analyzing the unstructured log data with the Apache Spark framework to find user behaviour. We also analyze the log data using the MapReduce framework and finally compare the performance of Spark and MapReduce on analyzing the log data.
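
As a rough illustration of the kind of analysis the abstract describes, the following minimal Python sketch parses access log lines into the listed fields (IP address, user name, URL, timestamp, bytes transferred) and aggregates them per URL and per IP. It is not the authors' implementation: the Common Log Format layout, the regular expression, the function names `parse_line`, `hits_per_url`, and `bytes_per_ip`, and the sample lines are all assumptions made for illustration. In Spark, the same parsing function would typically be applied with a `map` over the log lines and the counting done with `reduceByKey`; plain Python is used here so the sketch runs without a cluster.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format. The paper does not specify the log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # A "-" in the size field means no body was sent; treat it as 0 bytes.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


def hits_per_url(lines):
    """Count requests per URL, skipping lines that fail to parse."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["url"]] += 1
    return counts


def bytes_per_ip(lines):
    """Sum bytes transferred per client IP, skipping unparseable lines."""
    totals = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            totals[record["ip"]] += record["bytes"]
    return totals


# Hypothetical sample data, invented for this example.
SAMPLE_LINES = [
    '10.0.0.1 - alice [10/Dec/2017:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [10/Dec/2017:10:00:05 +0530] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.1 - alice [10/Dec/2017:10:00:09 +0530] "GET /cart HTTP/1.1" 304 -',
    'malformed line',
]

if __name__ == "__main__":
    print(hits_per_url(SAMPLE_LINES).most_common())
    print(bytes_per_ip(SAMPLE_LINES).most_common())
```

Skipping unparseable lines rather than raising an error is a design choice suited to large, unstructured logs, where a few malformed records should not abort the whole job.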

Index Terms

Computer Science
Information Sciences

Keywords

Hadoop, HDFS, MapReduce, Log analysis, Spark, User behaviour