Audio Enhancement and Synthesis using Generative Adversarial Networks: A Survey

Norberto Torres-Reyes; Shahram Latifi

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 182 - Number 35

Year of Publication: 2019

Authors: Norberto Torres-Reyes, Shahram Latifi

10.5120/ijca2019918334

Norberto Torres-Reyes, Shahram Latifi. Audio Enhancement and Synthesis using Generative Adversarial Networks: A Survey. International Journal of Computer Applications. 182, 35 (Jan 2019), 27-31. DOI=10.5120/ijca2019918334

@article{10.5120/ijca2019918334,
  author = {Norberto Torres-Reyes and Shahram Latifi},
  title = {Audio Enhancement and Synthesis using Generative Adversarial Networks: A Survey},
  journal = {International Journal of Computer Applications},
  issue_date = {Jan 2019},
  volume = {182},
  number = {35},
  month = {Jan},
  year = {2019},
  issn = {0975-8887},
  pages = {27-31},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume182/number35/30291-2019918334/},
  doi = {10.5120/ijca2019918334},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:13:22.311699+05:30
%A Norberto Torres-Reyes
%A Shahram Latifi
%T Audio Enhancement and Synthesis using Generative Adversarial Networks: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 35
%P 27-31
%D 2019
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Generative adversarial networks (GANs) have become prominent in the field of machine learning. They are based on a minimax game in which a generator and a discriminator “compete” against each other until an optimal point is reached. The generator's goal is to produce synthetic samples that match the real data. The discriminator tries to classify real data as real and generated data as fake. As training progresses, the generator improves to the point where the discriminator can no longer tell generated data from real data. GANs have been applied successfully in image processing through a wide range of variant architectures. Although less prominent there, audio enhancement and synthesis have also benefited from GANs in a variety of forms. This survey explores GAN-based techniques for speech synthesis, speech enhancement, music generation, and general audio synthesis. It examines the strengths and weaknesses of GANs, including variants created to address those weaknesses, and also covers a few similar machine learning architectures that may help achieve promising results.
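The minimax game described in the abstract can be made concrete with a small numerical sketch. It assumes the standard value function from the original GAN formulation (Goodfellow et al., 2014), V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The function name `gan_value` and the sample discriminator scores are illustrative assumptions, not taken from the survey.

```python
import math

def gan_value(d_real, d_fake):
    """Empirical estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Discriminator is winning: real samples score near 1, generated samples near 0,
# so both log terms are close to 0 and the value is close to its maximum of 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])

# Discriminator is fooled: it outputs 0.5 for every sample, which is the
# "optimal point" where real and generated data look identical to it.
fooled = gan_value([0.5, 0.5], [0.5, 0.5])

print(f"confident discriminator: {confident:.4f}")
print(f"fooled discriminator:    {fooled:.4f}")
print(f"-log(4):                 {-math.log(4):.4f}")
```

When the discriminator outputs 0.5 everywhere, the value equals -log 4, which is the theoretical value of the game at equilibrium in the original formulation. In practice the two expectations are estimated over mini-batches and optimized by alternating gradient steps, which this sketch leaves out.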

Index Terms

Computer Science

Information Sciences

Keywords

Audio synthesis, generative adversarial networks, survey, enhancement.