Research Article

Phylogenetic Tree Generation using Different Scoring Methods

by Rajbir Singh, Sinapreet Kaur, Dheeraj Pal Kaur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 100 - Number 14
Year of Publication: 2014
Authors: Rajbir Singh, Sinapreet Kaur, Dheeraj Pal Kaur
DOI: 10.5120/17597-8404

Rajbir Singh, Sinapreet Kaur, Dheeraj Pal Kaur. Phylogenetic Tree Generation using Different Scoring Methods. International Journal of Computer Applications. 100, 14 (August 2014), 38-45. DOI=10.5120/17597-8404

@article{ 10.5120/17597-8404,
author = { Rajbir Singh and Sinapreet Kaur and Dheeraj Pal Kaur },
title = { Phylogenetic Tree Generation using Different Scoring Methods },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 100 },
number = { 14 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 38-45 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume100/number14/17597-8404/ },
doi = { 10.5120/17597-8404 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:30:00.311134+05:30
%A Rajbir Singh
%A Sinapreet Kaur
%A Dheeraj Pal Kaur
%T Phylogenetic Tree Generation using Different Scoring Methods
%J International Journal of Computer Applications
%@ 0975-8887
%V 100
%N 14
%P 38-45
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining is a branch of knowledge discovery used in research and development. Biological data is available in many formats and is comparatively complex, so discovering knowledge in these large, complex databases is a key problem. It calls for data mining and machine learning techniques that scale to the size of the problems and can be customized to biological applications. Hierarchical clustering is one of the main data mining techniques. A phylogeny is the evolutionary history of a set of evolutionarily related species. One approach to determining the evolutionary history of a dataset is to use scoring-based methods. There are a number of distance-based methods, two of which are dealt with here: UPGMA (Unweighted Pair Group Method using Arithmetic average) and Neighbor Joining. A method for constructing distance-based phylogenetic trees using hierarchical clustering is proposed and implemented on different rice varieties. The sequences are downloaded from the NCBI databank. Multiple sequence alignment is applied to the different datasets, and evolutionary distances are calculated using the Jukes-Cantor distance method. Trees are constructed for the different datasets from the available data using both distance-based methods and a pruning technique. SNAP calculates synonymous and non-synonymous substitution rates from a set of codon-aligned nucleotide sequences. The multiple DNA sequences are also used to calculate the GC content of eukaryotes, molecular weight, melting temperature, and tree information. Closely related varieties are extracted by applying a threshold condition. A final tree is then constructed from these closely related rice varieties.
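The distance-based pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `jukes_cantor` and `upgma` and the four toy sequences are assumptions made for illustration, the sequences are assumed to be already aligned, and the sketch returns only the tree topology (no branch lengths). It covers the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites, and UPGMA clustering with size-weighted average linkage.

```python
import math
from itertools import combinations


def jukes_cantor(a, b):
    """Jukes-Cantor distance between two aligned, equal-length DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names, dist):
    """Build a UPGMA tree as nested tuples.

    dist maps frozenset({name_i, name_j}) to the pairwise distance.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy aligned sequences (hypothetical, not the rice data from the paper).
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACTTAT",
    "D": "TCGAACTTGT",
}
names = list(seqs)
dist = {frozenset((x, y)): jukes_cantor(seqs[x], seqs[y])
        for x, y in combinations(names, 2)}
tree = upgma(names, dist)
print(tree)
```

With these toy sequences the closest pair (A, B) is merged first, then (C, D), and the two clusters are joined last. Neighbor Joining would take the same distance matrix as input but selects pairs with a rate-corrected criterion, so it can recover a different topology when lineages evolve at unequal rates.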

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, DNA, Phylogenetics