International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 154 - Number 2
Year of Publication: 2016
Authors: Shivani, Vishal Goyal
10.5120/ijca2016912041
Shivani, Vishal Goyal. A Novel Approach for Plagiarism Detection in English Text. International Journal of Computer Applications. 154, 2 (Nov 2016), 32-37. DOI=10.5120/ijca2016912041
Digitalization has made text from many academic areas easily available on the web, which makes plagiarism a serious problem for academic institutions. Plagiarism detection (PD) means detecting text that has been copied from original sources such as websites, books, journals, previously published papers, and online search engines. This paper presents the development of a web-based PD system that discovers similarity in English written text only. It discusses text-based PD using an exact string matching technique over a database (DB) and the web, and describes the principle behind the proposed system. The proposed system works in three steps. The first is pre-processing, in which the input string is split into sentences and stop words are removed. The second is sentence searching through the DB and the web. If a plagiarized sentence is already present in the DB, it is retrieved directly from the DB together with its stored URL. If the sentence is not found in the DB, it is searched for on the web (Google), covering both semantic and syntactic similarity using the cosine similarity approach; after the web search, the plagiarized sentence is stored in the DB. The third is similarity analysis, which gives a detailed description of all plagiarized sentences with their URLs (source addresses). As a result, the proposed system displays the plagiarized sentences with the original source's URL and the percentage of plagiarism within the input string.
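The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the `db` dictionary standing in for the DB, the `web_search` callable standing in for the Google search, the 0.8 threshold, and the definition of the plagiarism percentage as the share of flagged sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "for"}


def split_sentences(text):
    """Step 1a: split the input on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(sentence):
    """Step 1b: lowercase, keep alphanumeric words, drop stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def check(text, db, web_search, threshold=0.8):
    """Steps 2 and 3: look each sentence up in db, fall back to web_search.

    db maps a normalized sentence to its source URL (hypothetical DB stand-in).
    web_search(sentence) returns (candidate_text, url) pairs (hypothetical).
    Returns the flagged (sentence, url) pairs and the percentage flagged.
    """
    sentences = split_sentences(text)
    hits = []
    for sentence in sentences:
        toks = tokens(sentence)
        key = " ".join(toks)
        if not key:
            continue
        if key in db:  # already known: retrieve the stored URL directly
            hits.append((sentence, db[key]))
            continue
        for candidate, url in web_search(sentence):
            if cosine_similarity(toks, tokens(candidate)) >= threshold:
                db[key] = url  # store the web result for later lookups
                hits.append((sentence, url))
                break
    percent = 100.0 * len(hits) / len(sentences) if sentences else 0.0
    return hits, percent
```

Passing the search function in as a parameter keeps the sketch independent of any particular search API; a real system would replace it with calls to whatever search service it uses.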