Research Article

News Feed Processing and Analysis using Hadoop Framework

Published in June 2016 by Pankaj Karpe, Vijay Bhor, Chetana Agarwal
National Conference on Advances in Computing, Communication and Networking
Foundation of Computer Science USA
ACCNET2016 - Number 1

Pankaj Karpe, Vijay Bhor, Chetana Agarwal. News Feed Processing and Analysis using Hadoop Framework. National Conference on Advances in Computing, Communication and Networking. ACCNET2016, 1 (June 2016), 16-18.

@article{
author = { Pankaj Karpe and Vijay Bhor and Chetana Agarwal },
title = { News Feed Processing and Analysis using Hadoop Framework },
journal = { National Conference on Advances in Computing, Communication and Networking },
issue_date = { June 2016 },
volume = { ACCNET2016 },
number = { 1 },
month = { June },
year = { 2016 },
issn = {0975-8887},
pages = { 16-18 },
numpages = 3,
url = { /proceedings/accnet2016/number1/24970-2256/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Advances in Computing, Communication and Networking
%A Pankaj Karpe
%A Vijay Bhor
%A Chetana Agarwal
%T News Feed Processing and Analysis using Hadoop Framework
%J National Conference on Advances in Computing, Communication and Networking
%@ 0975-8887
%V ACCNET2016
%N 1
%P 16-18
%D 2016
%I International Journal of Computer Applications
Abstract

This paper presents a Hadoop-based system for news feed processing and analysis that automatically groups news items on the same topic, published in different newspapers on different days, according to geographical region (i.e., spatial analysis). By grouping the titles of the news feeds selected by the user, it is possible to identify sets of related news on the basis of syntactic and lexical similarity. The user may tune some parameters to improve the grouping results. The exploration is performed on data collected with the Indian Media Monitor (IMM), a system that monitors over 2500 online sources and processes 90,000 articles per day. By analyzing the news feeds, we want to find out which topics are important in different countries. In the spatial description of the news feeds, every article can be represented by two geographic attributes: the origin of the news and the location of the event itself. To assess these spatial properties of news articles, we conducted a geo-analysis that is able to cope with the size and spatial distribution of the data. Within this application framework, we show how real-time news feed data can be analyzed efficiently.
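
The title-grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the tunable parameters, so everything here is an assumption made for illustration: token-level Jaccard similarity stands in for "syntactic and lexical similarity", the hypothetical `threshold` argument stands in for a user-tunable parameter, and the code runs on a single machine rather than on Hadoop.

```python
import re
from typing import List, Set


def tokens(title: str) -> Set[str]:
    """Lowercase a title and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_titles(titles: List[str], threshold: float = 0.4) -> List[List[str]]:
    """Greedy single-pass grouping (an assumed strategy, not the paper's).

    A title joins the first group whose representative (the group's first
    title) is at least `threshold` similar; otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    representatives: List[Set[str]] = []
    for title in titles:
        current = tokens(title)
        for index, rep in enumerate(representatives):
            if jaccard(current, rep) >= threshold:
                groups[index].append(title)
                break
        else:
            # No existing group was similar enough: open a new one.
            groups.append([title])
            representatives.append(current)
    return groups


if __name__ == "__main__":
    # Made-up example titles, not data from the paper.
    sample = [
        "Floods hit Mumbai after heavy rain",
        "Heavy rain floods Mumbai suburbs",
        "Election results announced in Delhi",
    ]
    for group in group_titles(sample, threshold=0.4):
        print(group)
```

Raising `threshold` yields more, smaller groups, while lowering it merges loosely related titles; this is one plausible reading of how parameter tuning could affect the grouping results.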

Index Terms

Computer Science
Information Sciences

Keywords

News Feed Analysis, Spatiotemporal Analysis, HDFS, MapReduce, Data Mining, Heterogeneity