International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 31
Year of Publication: 2020
Authors: Arthav Mane, Janhavi Bhopale, Ria Motghare, Priya Chimurkar
10.5120/ijca2020920867 |
Arthav Mane, Janhavi Bhopale, Ria Motghare, Priya Chimurkar. An Overview of Speaker Recognition and Implementation of Speaker Diarization with Transcription. International Journal of Computer Applications. 175, 31 (Nov 2020), 1-6. DOI=10.5120/ijca2020920867
This paper presents an overview of the generic process of a speaker recognition system and an implementation of its use in speaker diarization. The motivation is to present a simple speaker diarization system that incorporates speaker recognition, speech segmentation and speech transcription. Speaker modelling draws on speech features such as Mel Frequency Cepstral Coefficients (MFCCs) and on modelling techniques such as Joint Factor Analysis (JFA), i-vectors and Probabilistic Linear Discriminant Analysis (PLDA), among others, to train Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs) and to perform clustering. Speaker diarization is then applied to obtain each speaker's speech segments, which are converted into text for the user. The methods discussed and implemented emphasize a maximum identification rate and minimal error in diarization and audio transcription, and are aimed at helping the user create a transcript of conversations that take place between multiple people.