National Conference on Advances in Computing, Communication and Networking |
Foundation of Computer Science USA |
ACCNET2016 - Number 1 |
June 2016 |
Authors: Pankaj Karpe, Vijay Bhor, Chetana Agarwal |
Pankaj Karpe, Vijay Bhor, Chetana Agarwal. News Feed Processing and Analysis using Hadoop Framework. National Conference on Advances in Computing, Communication and Networking. ACCNET2016, 1 (June 2016), 16-18.
This paper presents News Feed Processing and Analysis using Hadoop, a system that automatically groups news on the same topic published in different newspapers on different days according to geographical region, i.e. spatial analysis. By grouping the titles of the news feeds selected by the user, it identifies sets of related news on the basis of syntactic and lexical similarity. The user may tune some parameters to improve the grouping results. The exploration is performed on data collected with Indian Media Monitor (IMM), a system which monitors over 2500 online sources and processes 90,000 articles per day. By analyzing the news feeds, we aim to find out which topics are important in different countries. In the spatial description of the news feeds, every article is represented by two geographic attributes: the origin of the news and the location of the event itself. To assess these spatial properties of news articles, we conducted a geo-analysis that is able to cope with the size and spatial distribution of the data. Within this application framework, we show how real-time news feed data can be analyzed efficiently.
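The title grouping described above can be illustrated with a minimal sketch. The abstract does not say which similarity measure the system uses, so Jaccard overlap of word tokens is assumed here as a stand-in for lexical similarity. The function names (`tokens`, `jaccard`, `group_titles`) and the `threshold` parameter are hypothetical and chosen only for illustration. The sketch runs on a single machine and does not show the Hadoop processing.

```python
import re


def tokens(title):
    """Lowercase a title and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_titles(titles, threshold=0.3):
    """Greedy single-pass grouping (an assumed strategy, not the paper's).

    A title joins the first group that contains a member whose similarity
    to it is at least `threshold`; otherwise it starts a new group.
    """
    groups = []
    for title in titles:
        t = tokens(title)
        for group in groups:
            if any(jaccard(t, tokens(other)) >= threshold for other in group):
                group.append(title)
                break
        else:
            # No existing group was similar enough.
            groups.append([title])
    return groups


if __name__ == "__main__":
    sample = [
        "Floods hit Mumbai after heavy rain",
        "Heavy rain floods Mumbai suburbs",
        "Election results announced in Delhi",
    ]
    for group in group_titles(sample, threshold=0.3):
        print(group)
```

Here `threshold` plays the role of a user-tunable parameter: raising it yields more, smaller groups, and lowering it merges more titles together.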