International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 163 - Number 5
Year of Publication: 2017
Authors: Kavita Garg, Jayshankar Prasad, Saba Hilal |
10.5120/ijca2017913526 |
Kavita Garg, Jayshankar Prasad, Saba Hilal. Study of Near Duplicate Content: Identification of Categories Generating Maximum Duplicate URL in Results. International Journal of Computer Applications. 163, 5 (Apr 2017), 20-23. DOI=10.5120/ijca2017913526
The study of near duplicate content involves identifying the search categories that return the same URL more than once in a query result. These categories need to be identified so that results can be improved by removing the duplicate URLs. Repeating the same URL in the results irritates the user and lowers the priority of other URLs, which are then displayed on the second or third page, where users rarely bother to look. Near duplicate content can therefore hide better results from the user and make the search results ineffective. Many algorithms, procedures, and filters exist to reduce duplication, but reducing it first requires identifying the duplicates: which categories generate the most duplicate results, in what form the redundancy exists, which search engine generates these duplicate results, and so on. This paper presents an effort to identify the categories with the maximum number of duplicates in terms of identical URLs.
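As a rough illustration of the kind of measurement described above, the following Python sketch counts repeated URLs per search category and ranks the categories by that count. It is not the authors' procedure: the function names, the normalization rule (lowercased host, trailing slash removed, scheme ignored), and the sample data are assumptions made for this example only.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a comparable key (assumed rule, not from the paper)."""
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def duplicate_count(urls):
    """Number of results that repeat a URL already seen in the same list."""
    counts = Counter(normalize_url(u) for u in urls)
    return sum(n - 1 for n in counts.values() if n > 1)


def rank_categories(results_by_category):
    """Return (category, duplicate count) pairs, most duplicates first."""
    scores = {
        category: duplicate_count(urls)
        for category, urls in results_by_category.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: category -> URLs returned for a query.
sample_results = {
    "travel": [
        "http://example.com/x",
        "http://example.com/y",
    ],
    "news": [
        "http://example.com/a",
        "http://example.com/a/",
        "http://EXAMPLE.com/a",
        "http://example.org/b",
    ],
}

ranking = rank_categories(sample_results)
for category, duplicates in ranking:
    print(category, duplicates)
```

A real study would also have to decide what counts as "the same URL" (for instance, whether tracking parameters or differing schemes matter); the rule used here is only one possible choice.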