International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 62 - Number 1
Year of Publication: 2013
Authors: Faritha Banu. A, Chandrasekar. C
DOI: 10.5120/10043-4627
Faritha Banu. A, Chandrasekar. C. An Optimized Approach of Modified BAT Algorithm to Record Deduplication. International Journal of Computer Applications. 62, 1 (January 2013), 10-15. DOI=10.5120/10043-4627
Record deduplication is the task of identifying, in a data warehouse, records that refer to the same real-world entity despite misspelled words, typos, different writing styles, or even different schema representations or data types. Existing research proposed a genetic programming (GP) approach to record deduplication. That approach combines several pieces of evidence extracted from the data content to generate a deduplication function that can identify whether two or more entries in a repository are duplicates. Because record deduplication is time-consuming even for small repositories, the aim of that work was a method that finds a proper combination of the best pieces of evidence, yielding a deduplication function that maximizes performance while using only a small representative portion of the data for training. However, the process itself is only weakly optimized. Our research addresses these issues with a novel technique, a modified bat algorithm for record deduplication. The motivation is to create a flexible and effective method that employs data mining algorithms. The method shares many similarities with evolutionary computation techniques such as genetic programming: it is initialized with a population of random solutions and searches for optima by iteratively updating the bats. Unlike GP, however, the modified bat algorithm has no evolution operators such as crossover and mutation. We also compare the proposed algorithm experimentally with existing algorithms, including GP.
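
To make the search loop concrete, the following is a minimal sketch of the standard bat algorithm (Yang, 2010), not the modified variant this paper proposes, whose details the abstract does not give. It shows a population of random solutions being updated by frequency-tuned velocities, with no crossover or mutation. The function name `bat_minimize`, all parameter values, the `walk` step scale, and the toy `sphere` objective are assumptions for illustration; in a deduplication setting the objective would instead score a candidate combination of evidence.

```python
import math
import random


def bat_minimize(objective, dim, lower, upper, n_bats=20, n_iter=200,
                 f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, r0=0.5,
                 walk=0.01, seed=0):
    """Minimize `objective` over [lower, upper]^dim (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    # Population of random solutions (bat positions) with zero velocities.
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in pos]
    loud = [1.0] * n_bats    # loudness A_i, decreases as a bat improves
    pulse = [r0] * n_bats    # pulse emission rate r_i

    best_i = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Frequency-tuned velocity and position update toward the best bat.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq
                      for v, x, b in zip(vel[i], pos[i], best)]
            cand = [clip(x + v) for x, v in zip(pos[i], vel[i])]

            # Occasionally replace the move with a random walk around the best.
            # The `walk` scaling by the domain width is an assumption.
            if rng.random() > pulse[i]:
                step = walk * mean_loud * (upper - lower)
                cand = [clip(b + rng.uniform(-1.0, 1.0) * step) for b in best]

            cand_fit = objective(cand)

            # Accept a non-worsening candidate with probability tied to loudness.
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))

            # Track the best solution seen so far.
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit

    return best, best_fit


def sphere(x):
    # Toy objective standing in for a deduplication fitness function.
    return sum(v * v for v in x)


best, best_fit = bat_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
print(best, best_fit)
```

Note that each bat is updated only from its own state and the current best position, which is the sense in which the method explores without recombining or mutating individuals as GP does.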