Research Article

Innovative Technique for Audio Segmentation

Published in April 2012 by Borawake Madhuri P
Emerging Trends in Computer Science and Information Technology (ETCSIT2012)
Foundation of Computer Science USA
ETCSIT - Number 4
April 2012
Authors: Borawake Madhuri P

Borawake Madhuri P. Innovative Technique for Audio Segmentation. Emerging Trends in Computer Science and Information Technology (ETCSIT2012). ETCSIT, 4 (April 2012), 27-30.

@article{
author = { Borawake Madhuri P },
title = { Innovative Technique for Audio Segmentation },
journal = { Emerging Trends in Computer Science and Information Technology (ETCSIT2012) },
issue_date = { April 2012 },
volume = { ETCSIT },
number = { 4 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 27-30 },
numpages = 4,
url = { /proceedings/etcsit/number4/5988-1031/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Emerging Trends in Computer Science and Information Technology (ETCSIT2012)
%A Borawake Madhuri P
%T Innovative Technique for Audio Segmentation
%J Emerging Trends in Computer Science and Information Technology (ETCSIT2012)
%@ 0975-8887
%V ETCSIT
%N 4
%P 27-30
%D 2012
%I International Journal of Computer Applications
Abstract

Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. The term applies both to the mental processes used by humans and to the artificial processes of natural language processing. Speech segmentation is an important subproblem of speech recognition and cannot be adequately solved in isolation. The lowest level of speech segmentation is the breakup and classification of the sound signal into a string of phones. The difficulty of this problem is compounded by the co-articulation of speech sounds, in which one sound may be modified in various ways by the adjacent sounds: it may blend smoothly with them, fuse with them, split, or even disappear. This phenomenon can occur between adjacent words just as easily as within a single word. The notion that speech is produced like writing, as a sequence of distinct vowels and consonants, is misleading. In fact, the way we produce vowels depends on the surrounding consonants, and the way we produce consonants depends on the surrounding vowels. Therefore, even with the best algorithms, the result of phonetic segmentation will usually be very distant from the standard written language.
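As an illustration only, and not the method proposed in the paper, the simplest form of boundary detection can be sketched with short-time energy: the signal is cut into frames, and runs of frames whose energy exceeds a threshold are reported as segments. The function names, frame length, threshold, and the synthetic 8 kHz test signal below are all assumptions chosen for the example. A detector of this kind finds pauses, but it cannot separate co-articulated phones, because no drop in energy marks the boundary between them.

```python
import math


def short_time_energy(signal, frame_len, hop):
    """Return the mean squared amplitude of each frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def segment_by_energy(signal, frame_len=100, hop=100, threshold=0.01):
    """Return (start_sample, end_sample) pairs for runs of high-energy frames.

    Boundaries are exact only to frame resolution; with hop == frame_len
    each boundary falls on a frame edge.
    """
    energies = short_time_energy(signal, frame_len, hop)
    segments = []
    seg_start = None  # sample index where the current active run began
    for i, energy in enumerate(energies):
        active = energy > threshold
        if active and seg_start is None:
            seg_start = i * hop
        elif not active and seg_start is not None:
            segments.append((seg_start, i * hop))
            seg_start = None
    if seg_start is not None:
        # The signal ended while still active: close at the last frame's end.
        segments.append((seg_start, (len(energies) - 1) * hop + frame_len))
    return segments


# Hypothetical test signal: silence, a 200 Hz tone, then silence (8 kHz assumed).
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800

print(segment_by_energy(signal))  # [(800, 2400)]
```

The threshold and frame size would need tuning for real recordings, and practical systems typically combine energy with spectral features and a statistical classifier rather than relying on a fixed threshold.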

Index Terms

Computer Science
Information Sciences

Keywords

Audio Content Analysis
Audio Database Management
Audio Segmentation