International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 62 - Number 10
Year of Publication: 2013
Authors: Ali El-matarawy, Mohammad El-ramly, Reem Bahgat
DOI: 10.5120/10118-4792
Ali El-matarawy, Mohammad El-ramly, Reem Bahgat. Parallel and Distributed Code Clone Detection using Sequential Pattern Mining. International Journal of Computer Applications. 62, 10 (January 2013), 25-31. DOI=10.5120/10118-4792
This research presents a parallel and distributed data mining approach to code clone detection. It aims to demonstrate the value of deploying parallel and distributed computing for real-time, large-scale code clone detection. The approach is implemented in a family of clone detectors called PD EgyCD (Parallel and Distributed Egypt Clone Detector). It builds on the authors' earlier work on code clone and plagiarism detection using sequential pattern mining by adding parallelism and distribution to the earlier tool, EgyCD. The approach performs data mining through a tailored Apriori-based algorithm for code clone detection, and it uses parallelization and distribution to achieve the performance needed to scale clone detection up to very large systems. It has been implemented as a database application that leverages the capabilities of modern database tools. Two versions of the distributed technique have been developed. The first uses a client-server technique in which all clients and the server work with a single database. The second uses agents: each client acts as a separate agent with its own database and, after working on a sub-problem, submits its partial solution to the server, which assembles the complete solution (the set of code clones). Experiments show that the agent technique is faster than the client-server one. Distribution improves performance substantially, and the speed improvement is a function of the number of clients or agents used. Our conclusion is that data mining, combined with parallel and distributed computing, can be deployed efficiently for code clone detection on very large systems.
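The agent-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not the PD EgyCD implementation, which the abstract describes as a database application. Everything here is an assumption made for illustration: the function names `mine_partition` and `detect_clones`, the representation of source code as a list of normalized statement strings, the choice to split work by the first statement of each candidate sequence, and the use of threads as stand-ins for agents on separate machines. Each agent grows repeated contiguous statement sequences one statement at a time, keeping only candidates that still occur at least `min_support` times (the Apriori-style pruning step), and the server merges the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mine_partition(lines, seeds, min_len=2, min_support=2):
    """One hypothetical agent: mine repeated contiguous statement
    sequences that start with one of the seed statements given to it."""
    # Level 1: record where each seed statement occurs.
    level = defaultdict(list)
    for pos, stmt in enumerate(lines):
        if stmt in seeds:
            level[(stmt,)].append(pos)
    level = {p: occ for p, occ in level.items() if len(occ) >= min_support}

    found = {}
    while level:
        next_level = defaultdict(list)
        for pattern, starts in level.items():
            if len(pattern) >= min_len:
                found[pattern] = starts
            # Extend every occurrence by the statement that follows it.
            for start in starts:
                end = start + len(pattern)
                if end < len(lines):
                    next_level[pattern + (lines[end],)].append(start)
        # Apriori-style pruning: only frequent candidates are extended again.
        level = {p: occ for p, occ in next_level.items()
                 if len(occ) >= min_support}
    return found


def detect_clones(lines, n_agents=2, min_len=2, min_support=2):
    """Hypothetical server: split the seed statements among agents,
    run the agents concurrently, and merge their partial solutions."""
    statements = sorted(set(lines))
    partitions = [set(statements[i::n_agents]) for i in range(n_agents)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        partials = list(pool.map(
            lambda seeds: mine_partition(lines, seeds, min_len, min_support),
            partitions))
    merged = {}
    for partial in partials:
        merged.update(partial)  # partitions are disjoint, so keys never clash
    return merged


if __name__ == "__main__":
    code = ["a=1", "b=2", "c=3", "x=0", "a=1", "b=2", "c=3"]
    for pattern, starts in sorted(detect_clones(code).items()):
        print(pattern, "at lines", starts)
```

Because every candidate sequence is assigned to exactly one agent by its first statement, the agents never need to communicate while mining, and the merged result does not depend on the number of agents. The sketch reports every repeated sequence, including sub-sequences of longer clones, and does not check for overlapping occurrences; a practical detector would filter both.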