Research Article

A Novel Approach for Plagiarism Detection in English Text

by Shivani, Vishal Goyal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 154 - Number 2
Year of Publication: 2016
10.5120/ijca2016912041

Shivani, Vishal Goyal. A Novel Approach for Plagiarism Detection in English Text. International Journal of Computer Applications. 154, 2 (Nov 2016), 32-37. DOI=10.5120/ijca2016912041

@article{ 10.5120/ijca2016912041,
author = { Shivani, Vishal Goyal },
title = { A Novel Approach for Plagiarism Detection in English Text },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2016 },
volume = { 154 },
number = { 2 },
month = { Nov },
year = { 2016 },
issn = { 0975-8887 },
pages = { 32-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume154/number2/26467-2016912041/ },
doi = { 10.5120/ijca2016912041 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:59:11.262945+05:30
%A Shivani
%A Vishal Goyal
%T A Novel Approach for Plagiarism Detection in English Text
%J International Journal of Computer Applications
%@ 0975-8887
%V 154
%N 2
%P 32-37
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Digitization has made text from many academic areas easily available on the web, which makes plagiarism a serious problem for academic enterprises and institutes. Plagiarism detection (PD) means detecting text that has been copied from original sources such as websites, books, journals, previously published papers, and online search engines. This paper presents the development of a web-based PD system that discovers similarity in English written text only. It discusses text-based PD using an exact string matching technique over a database (DB) and the web, and describes the principles behind the system. The proposed system works in three steps. First, pre-processing splits the input string into sentences and removes stop words. Second, each sentence is searched for in the DB and on the web. If a plagiarized sentence is already in the DB, it is retrieved directly from the DB together with its stored URL. If the sentence is not found in the DB, a web search ("GOOGLE") is started for both semantic and syntactic matches using the Cosine Similarity approach, and after the web search the plagiarized sentence is stored in the DB. Third, similarity analysis produces a detailed description of all plagiarized sentences with their URLs (source addresses). As a result, the proposed system displays the plagiarized sentences with the original sources' URLs and the percentage of plagiarism within the input string.
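The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the 0.8 threshold, the function names, and the in-memory dictionary standing in for the DB are all assumptions made for illustration, the web search step is left out entirely, and the plagiarism percentage is assumed here to be the share of input sentences that were flagged.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the paper does not give one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence, keep alphanumeric tokens, drop stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def check(text, db, threshold=0.8):
    """Flag sentences whose best match in db reaches the threshold.

    db maps a known source sentence to its URL (an assumed stand-in for
    the paper's DB). Returns the flagged sentences and the percentage of
    input sentences that were flagged.
    """
    sentences = split_sentences(text)
    hits = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        best_url, best_score = None, 0.0
        for source, url in db.items():
            score = cosine_similarity(tokens, tokenize(source))
            if score > best_score:
                best_url, best_score = url, score
        if best_score >= threshold:
            hits.append((sentence, best_url, best_score))
    percent = 100.0 * len(hits) / len(sentences) if sentences else 0.0
    return hits, percent
```

Calling `check(text, db)` with a two-sentence input in which only one sentence matches an entry of `db` returns that sentence with its stored URL and a percentage of 50.0. A fuller system would fall back to a web search when the DB lookup finds nothing and then store the result, as the abstract describes.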

Index Terms

Computer Science
Information Sciences

Keywords

NLP
Plagiarism Detection
Textual Similarity
Exact string matching scheme
Results analysis by comparison with other tools