Analyzing Web Access Logs using Spark with Hadoop

Vandita Jain; Tripti Saxena; Vineet Richhariya

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 180 - Number 1

Year of Publication: 2017

Authors: Vandita Jain, Tripti Saxena, Vineet Richhariya

10.5120/ijca2017915904

Vandita Jain, Tripti Saxena, Vineet Richhariya . Analyzing Web Access Logs using Spark with Hadoop. International Journal of Computer Applications. 180, 1 ( Dec 2017), 47-51. DOI=10.5120/ijca2017915904

@article{ 10.5120/ijca2017915904,

author = { Vandita Jain and Tripti Saxena and Vineet Richhariya },

title = { Analyzing Web Access Logs using Spark with Hadoop },

journal = { International Journal of Computer Applications },

issue_date = { Dec 2017 },

volume = { 180 },

number = { 1 },

month = { Dec },

year = { 2017 },

issn = { 0975-8887 },

pages = { 47-51 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume180/number1/28768-2017915904/ },

doi = { 10.5120/ijca2017915904 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

Abstract

Web usage mining is the process of finding user navigation patterns in web server access logs. These navigation patterns are then analyzed with various data mining techniques and can be used for tasks such as identifying a user's frequent patterns and predicting a user's future requests. Electronic commerce websites such as Flipkart and Amazon have grown enormously in recent years, and with so many online shopping websites it is important to know how many users actually reach each site. Whenever a user accesses a website, web access logs are generated on the server. These logs contain information such as IP address, user name, URL, timestamp, and bytes transferred, which helps in analyzing user behavior. Analyzing web access logs is therefore valuable for identifying emerging trends in electronic commerce. E-commerce websites generate petabytes of log data every day, which traditional tools and techniques cannot store and analyze. In this paper we propose using the Hadoop framework, which is very reliable, to store this large volume of data in HDFS, and then analyzing the unstructured log data with the Apache Spark framework to discover user behaviour. We also analyze the log data with the MapReduce framework and finally compare the performance of Spark and MapReduce on this log analysis.
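As a rough illustration of the kind of log analysis the abstract describes, the following sketch parses access-log lines into the fields listed above (IP address, user name, URL, timestamp, bytes transferred) and aggregates requests per IP and bytes per URL. It is not the authors' implementation: the log layout is assumed to be the Apache Common Log Format, which the abstract does not specify, and the names `LOG_PATTERN`, `LogEntry`, `parse_line`, `summarize`, and `SAMPLE_LINES` are hypothetical. The aggregation runs in plain Python on a small in-memory sample rather than on Spark or MapReduce.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple

# Assumption: lines follow the Apache Common Log Format.
# The paper's abstract does not state the exact log layout.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


class LogEntry(NamedTuple):
    """Fields named in the abstract (hypothetical container)."""
    ip: str
    user: str
    timestamp: str
    url: str
    bytes_sent: int


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for lines that do not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    raw_bytes = match.group("bytes")
    size = 0 if raw_bytes == "-" else int(raw_bytes)  # "-" means no body sent
    return LogEntry(
        match.group("ip"),
        match.group("user"),
        match.group("timestamp"),
        match.group("url"),
        size,
    )


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count requests per IP and total bytes per URL, skipping bad lines."""
    hits_per_ip: Counter = Counter()
    bytes_per_url: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        hits_per_ip[entry.ip] += 1
        bytes_per_url[entry.url] += entry.bytes_sent
    return hits_per_ip, bytes_per_url


# Made-up sample data for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - alice [10/Oct/2017:13:55:36 +0530] "GET /cart HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2017:13:55:40 +0530] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - alice [10/Oct/2017:13:56:01 +0530] "GET /home HTTP/1.1" 304 -',
    'not a log line',
]

if __name__ == "__main__":
    hits, traffic = summarize(SAMPLE_LINES)
    print(hits.most_common())
    print(traffic.most_common())
```

Keeping the per-line parser separate from the aggregation is a deliberate choice: a function like `parse_line` could in principle be mapped over the lines of a file stored in HDFS by a distributed framework, with the counting step expressed as a reduce by key, although the abstract does not say how the authors structured their jobs.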

Index Terms

Computer Science

Information Sciences

Keywords

Hadoop, HDFS, MapReduce, log analysis, Spark, user behaviour