Text-to-Speech Recognition using Google API

Orlunwo Placida Orochi; Ledisi Giok Kabari

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 183 - Number 15

Year of Publication: 2021

Authors: Orlunwo Placida Orochi, Ledisi Giok Kabari

DOI: 10.5120/ijca2021921474

Orlunwo Placida Orochi, Ledisi Giok Kabari. Text-to-Speech Recognition using Google API. International Journal of Computer Applications. 183, 15 (Jul 2021), 18-20. DOI=10.5120/ijca2021921474

@article{10.5120/ijca2021921474,
  author = {Orlunwo Placida Orochi and Ledisi Giok Kabari},
  title = {Text-to-Speech Recognition using Google API},
  journal = {International Journal of Computer Applications},
  issue_date = {Jul 2021},
  volume = {183},
  number = {15},
  month = {Jul},
  year = {2021},
  issn = {0975-8887},
  pages = {18-20},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume183/number15/32002-2021921474/},
  doi = {10.5120/ijca2021921474},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:16:52.783720+05:30
%A Orlunwo Placida Orochi
%A Ledisi Giok Kabari
%T Text-to-Speech Recognition using Google API
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 15
%P 18-20
%D 2021
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Speech is the most natural mode of human communication. Computers that can understand human speech can act as intermediaries for human experts, responding accurately and reliably to human voices. This can be accomplished with a text-to-speech recognition system, which allows a data processor to interpret the language in which a message was written and translate it into an audio file that can be played through an output device such as a speaker. The aim of this study is to implement a text-to-speech model in the Python programming language and to check whether written messages are read aloud. Using the Google API, text-to-speech conversion was successful.
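The conversion described in the abstract can be sketched in Python as follows. The abstract does not say which Google interface the authors called, so this sketch assumes the Google Cloud Text-to-Speech v1 REST endpoint with an API key. The helper names (`build_payload`, `decode_audio`, `synthesize`) and the output file name are hypothetical and are not taken from the paper.

```python
import base64
import json
import urllib.request

# Assumption: Google Cloud Text-to-Speech v1 REST endpoint.
# The paper does not name the specific Google API it used.
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_payload(text, language_code="en-US"):
    """Build the JSON body for a synthesis request (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def decode_audio(response_body):
    """Extract the audio bytes, which the service returns base64-encoded."""
    return base64.b64decode(json.loads(response_body)["audioContent"])


def synthesize(text, api_key, out_path="message.mp3"):
    """Send the text to the service and save the returned MP3 to out_path."""
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        audio = decode_audio(response.read())
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

Calling `synthesize("Hello, world", api_key)` with a valid key would write an MP3 file that any audio player can open, which corresponds to the check in the abstract that the written message is actually read aloud. Building the request and decoding the response are kept in separate functions so they can be checked without network access.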

Index Terms

Computer Science

Information Sciences

Keywords

API, Artificial Intelligence, Speech, Text