| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 103 |
| Year of Publication: 2026 |
| Authors: Saiteja Jonnalagadda |
10.5120/ijcaea79a3291b37
Saiteja Jonnalagadda. Generative AI for Synthetic Patient Data Generation to Enhance Identity Matching and Deduplication Models. International Journal of Computer Applications. 187, 103 (May 2026), 32-38. DOI=10.5120/ijcaea79a3291b37
This paper examines how generative artificial intelligence, specifically Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), can address the problem of patient identity matching and deduplication in healthcare informatics. Privacy restrictions and data fragmentation make it difficult to develop effective record linkage algorithms. To work around this limitation, the study employs a synthetic data generation framework that produces high-fidelity patient records reflecting the statistical characteristics of real-world clinical datasets. The experiment uses the Synthea patient population simulator and Python-based GAN libraries to generate a specialized sample of 389 cases. These cases include demographic attributes, longitudinal medical records, and deliberately introduced clerical errors such as phonetic misspellings and reversed numbers. Effectiveness is assessed by training deduplication models on this synthetically augmented data and measuring the improvement in accuracy and recall when matching similar entries across different systems. The software stack comprises TensorFlow for building the model architecture, RecordLinkage toolkits for matching, and Pandas for data manipulation. The findings show that generative models can capture the characteristic patterns of human error and substantially increase the sensitivity of deduplication models without compromising patient privacy. The study concludes that, in contemporary electronic health record settings, synthetic data is an effective tool for optimizing identity resolution mechanisms.
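The clerical errors mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The substitution table, the field names (`last_name`, `birth_date`), and the helper functions are all hypothetical, chosen only to show how a synthetic record might be copied and corrupted to create a labeled duplicate pair for training a deduplication model.

```python
import random

# Hypothetical phonetic substitutions; the paper does not list its actual rules.
PHONETIC_SWAPS = [("ph", "f"), ("ck", "k"), ("ie", "y"), ("ee", "ea")]


def phonetic_misspell(name: str) -> str:
    """Apply the first matching phonetic substitution, if any."""
    lowered = name.lower()
    for old, new in PHONETIC_SWAPS:
        if old in lowered:
            return lowered.replace(old, new, 1).title()
    return name


def transpose_digits(value: str, rng: random.Random) -> str:
    """Swap one randomly chosen pair of adjacent, differing digits."""
    positions = [
        i for i in range(len(value) - 1)
        if value[i].isdigit() and value[i + 1].isdigit() and value[i] != value[i + 1]
    ]
    if not positions:
        return value  # nothing to swap, return the value unchanged
    i = rng.choice(positions)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]


def make_duplicate(record: dict, rng: random.Random) -> dict:
    """Return a corrupted copy of a record that still refers to the same patient."""
    noisy = dict(record)
    noisy["last_name"] = phonetic_misspell(record["last_name"])
    noisy["birth_date"] = transpose_digits(record["birth_date"], rng)
    return noisy


# Assumed record layout, for illustration only.
record = {
    "patient_id": "P001",
    "first_name": "Stephen",
    "last_name": "Murphy",
    "birth_date": "1984-03-17",
}
duplicate = make_duplicate(record, random.Random(0))
```

Because the corrupted copy keeps the original `patient_id`, each original and duplicate pair carries a known ground-truth match label, which is what allows a deduplication model to be scored on how many true matches it recovers.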