International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 16
Year of Publication: 2020
Authors: Zakria Mahrousa, Dima Mufti Alchawafa, Hasan Kazzaz |
DOI: 10.5120/ijca2020920670
Zakria Mahrousa, Dima Mufti Alchawafa, Hasan Kazzaz. A Dynamic Sliding Window based Balanced Parallel Frequent Itemset Mining Algorithm in Data Stream. International Journal of Computer Applications. 175, 16 (Sep 2020), 48-55. DOI=10.5120/ijca2020920670
Frequent itemset mining has been one of the most active research topics in recent years. It plays an important role in finding association rules in continuous, massive data streams, in applications such as customer behavior tracking, retail sales, and network monitoring. This paper introduces a novel approach that removes some drawbacks of parallel FP-Growth and enables it to handle data streams. The proposed algorithm, DSW-BPGFP (Dynamic Sliding Window - Balanced Parallel Graph Frequent Pattern), reduces the space and time required by using a compact data structure, called FP-Graph, to store and maintain the transactions of a dynamic sliding window. The algorithm dynamically reconstructs and compresses this directed graph structure to control space usage, and adjusts the size of the dynamic window based on concept change detection. Moreover, DSW-BPGFP introduces a load balancing strategy that distributes the load equally among the nodes of a Hadoop cluster. Experiments show that the proposed algorithm achieves good speedup and a good degree of balance between nodes, and that it processes large dynamic datasets efficiently. In addition, it improves both the memory consumed to store frequent patterns and the time complexity.
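To make the sliding-window idea concrete, the following is a minimal Python sketch of support counting over a resizable window of transactions. It is not the DSW-BPGFP algorithm or the FP-Graph structure: it uses a plain counter instead of a compressed graph, runs on a single machine, and leaves the decision of when to resize to the caller, since the abstract does not specify how concept change is detected. The names `SlidingWindowCounter`, `max_len`, `resize`, and `frequent` are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Naive itemset support counter over a sliding window (illustration only)."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size  # maximum number of transactions kept
        self.max_len = max_len          # largest itemset size to count (assumption)
        self.window = deque()           # transactions currently in the window
        self.counts = Counter()         # support count per itemset

    def _itemsets(self, transaction):
        # Enumerate all itemsets of size 1..max_len as sorted tuples.
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def _evict(self):
        # Drop the oldest transactions until the window fits its size limit.
        while len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts.subtract(self._itemsets(old))

    def add(self, transaction):
        # Insert a new transaction, then slide the window forward if needed.
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        self._evict()

    def resize(self, new_size):
        # Called by an external change detector (not modeled here).
        self.window_size = new_size
        self._evict()

    def frequent(self, min_support):
        # Itemsets whose support in the current window reaches min_support.
        return {s: c for s, c in self.counts.items() if c >= min_support}


stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
counter = SlidingWindowCounter(window_size=3)
for transaction in stream:
    counter.add(transaction)
print(counter.frequent(min_support=2))
```

Enumerating every itemset per transaction grows quickly with transaction length, which is one reason compact structures such as the FP-Graph described in the abstract are used in practice.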