Research Article

A Novel Fast Algorithm for Exon Prediction in Eukaryotic Genes using Linear Predictive Coding Model and Goertzel Algorithm based on the Z-Curve

by Hamidreza Saberkari, Mousa Shamsi, Hamed Heravi, Mohammad Hossein Sedaaghi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 67 - Number 17
Year of Publication: 2013
Authors: Hamidreza Saberkari, Mousa Shamsi, Hamed Heravi, Mohammad Hossein Sedaaghi
10.5120/11489-7194

Hamidreza Saberkari, Mousa Shamsi, Hamed Heravi, Mohammad Hossein Sedaaghi. A Novel Fast Algorithm for Exon Prediction in Eukaryotic Genes using Linear Predictive Coding Model and Goertzel Algorithm based on the Z-Curve. International Journal of Computer Applications. 67, 17 (April 2013), 25-38. DOI=10.5120/11489-7194

@article{ 10.5120/11489-7194,
author = { Hamidreza Saberkari, Mousa Shamsi, Hamed Heravi, Mohammad Hossein Sedaaghi },
title = { A Novel Fast Algorithm for Exon Prediction in Eukaryotic Genes using Linear Predictive Coding Model and Goertzel Algorithm based on the Z-Curve },
journal = { International Journal of Computer Applications },
issue_date = { April 2013 },
volume = { 67 },
number = { 17 },
month = { April },
year = { 2013 },
issn = { 0975-8887 },
pages = { 25-38 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume67/number17/11489-7194/ },
doi = { 10.5120/11489-7194 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:25:43.431269+05:30
%A Hamidreza Saberkari
%A Mousa Shamsi
%A Hamed Heravi
%A Mohammad Hossein Sedaaghi
%T A Novel Fast Algorithm for Exon Prediction in Eukaryotic Genes using Linear Predictive Coding Model and Goertzel Algorithm based on the Z-Curve
%J International Journal of Computer Applications
%@ 0975-8887
%V 67
%N 17
%P 25-38
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Accurate identification of protein-coding regions in deoxyribonucleic acid (DNA) sequences, which are characterized by a 3-base periodicity, has been a challenging problem in bioinformatics. Many digital signal processing (DSP) techniques have been applied to this task. They assign numerical values to the symbolic DNA sequence and then apply spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate periodic components. In this paper, the symbolic DNA sequences are first converted to a digital signal using the Z-curve method, a unique 3-D curve that represents a DNA sequence and reflects its biological behavior. A novel fast algorithm is then proposed to locate exons in a DNA strand by combining the Linear Predictive Coding Model (LPCM) with the Goertzel algorithm. The proposed algorithm increases processing speed and therefore reduces computational complexity. Another advantage is that it accurately detects short exons in DNA sequences. Its exon prediction ability is compared with several existing methods at the nucleotide level using: (i) sensitivity and specificity values; (ii) receiver operating characteristic (ROC) curves; and (iii) the area under the ROC curve. Simulation results show that the algorithm detects exons more accurately than the other exon prediction methods considered. We have also developed a user-friendly software package for analyzing DNA sequences.
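The two signal-processing ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the LPCM step entirely, and the function names, the window length, the step size, and the choice to analyze per-base Z-curve increments rather than the cumulative curve are all assumptions made for illustration. The sketch maps each base to its Z-curve step and uses the Goertzel recursion to evaluate only the DFT bin at frequency 1/3 in a sliding window, which avoids computing a full DFT per window.

```python
import math


def z_curve_increments(seq):
    """Return per-base Z-curve steps (dx, dy, dz), each +1 or -1.

    x: purine (A, G) vs pyrimidine (C, T)
    y: amino (A, C) vs keto (G, T)
    z: weak H-bond (A, T) vs strong H-bond (C, G)
    Unknown symbols map to 0 in all components (an assumption).
    """
    steps = {
        "A": (1, 1, 1),
        "C": (-1, 1, -1),
        "G": (1, -1, -1),
        "T": (-1, -1, 1),
    }
    dx, dy, dz = [], [], []
    for base in seq.upper():
        sx, sy, sz = steps.get(base, (0, 0, 0))
        dx.append(sx)
        dy.append(sy)
        dz.append(sz)
    return dx, dy, dz


def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k of `samples` via the Goertzel recursion."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def period3_profile(seq, window=351, step=3):
    """Period-3 spectral power per sliding window, summed over dx, dy, dz.

    `window` must be a multiple of 3 so that bin window // 3 sits exactly
    at frequency 1/3. Both defaults are illustrative assumptions.
    """
    if window % 3 != 0:
        raise ValueError("window must be a multiple of 3")
    components = z_curve_increments(seq)
    k = window // 3
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        total = sum(
            goertzel_power(c[start:start + window], k) for c in components
        )
        profile.append(total)
    return profile
```

Calling `period3_profile(sequence)` returns one power value per window position. Peaks in that profile are candidate coding regions, and turning them into exon calls would still require a threshold, which this sketch does not choose.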

Index Terms

Computer Science
Information Sciences

Keywords

DNA sequence, Protein coding regions, Signal processing, Exon, Linear predictive coding model, Goertzel algorithm