Research Article

Transaction Encoding Algorithm (TEA) for Distributed Data

by A.Anbarasi, D.Sathyasrinivas, Dr.K.Vivekanandan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 16 - Number 8
Year of Publication: 2011
10.5120/2030-2580

A.Anbarasi, D.Sathyasrinivas, Dr.K.Vivekanandan. Transaction Encoding Algorithm (TEA) for Distributed Data. International Journal of Computer Applications. 16, 8 (February 2011), 43-49. DOI=10.5120/2030-2580

@article{ 10.5120/2030-2580,
author = { A.Anbarasi and D.Sathyasrinivas and Dr.K.Vivekanandan },
title = { Transaction Encoding Algorithm (TEA) for Distributed Data },
journal = { International Journal of Computer Applications },
issue_date = { February 2011 },
volume = { 16 },
number = { 8 },
month = { February },
year = { 2011 },
issn = { 0975-8887 },
pages = { 43-49 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume16/number8/2030-2580/ },
doi = { 10.5120/2030-2580 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:04:23.654735+05:30
%A A.Anbarasi
%A D.Sathyasrinivas
%A Dr.K.Vivekanandan
%T Transaction Encoding Algorithm (TEA) for Distributed Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 16
%N 8
%P 43-49
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Analyzing huge datasets has been a major concern in almost all areas of technology over the past decade, and data mining has become crucial as a result. As datasets grow from gigabytes to terabytes or larger, collecting and warehousing them at a single site becomes impractical, since that site may not have enough main memory to hold all the data. The data are therefore usually accumulated at geographically distributed sites. The challenge in distributed data mining is to learn as much from distributed databases as from a centralized database without consuming too much communication bandwidth. One solution is to reduce the dimensionality of the dataset so that it can be collected and warehoused at a single site. Dimension reduction algorithms are generally classified into feature selection, feature extraction, and random projection. In this paper we propose a dimension reduction algorithm, different from all of these methods, that encodes transactions to reduce their size, which in turn reduces the communication cost. Experimental results on datasets demonstrate the performance of the proposed algorithm.
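
The abstract does not spell out the encoding itself, so the following is only an illustrative sketch of the general idea of shrinking transactions before they are sent between sites. It assumes a simple fixed-width bitmap encoding over a shared item vocabulary, which is not necessarily the TEA method. The function names and the sample market-basket data are hypothetical.

```python
# Illustrative sketch only: a generic bitmap encoding of transactions.
# This is an assumed stand-in, NOT the paper's TEA algorithm.

def build_vocabulary(transactions):
    """Map each distinct item to a bit position (sorted for determinism)."""
    items = sorted({item for t in transactions for item in t})
    return {item: pos for pos, item in enumerate(items)}

def encode(transaction, vocab):
    """Pack a transaction into one integer with a bit set per item present."""
    code = 0
    for item in transaction:
        code |= 1 << vocab[item]
    return code

def decode(code, vocab):
    """Recover the item set from its integer code."""
    return {item for item, pos in vocab.items() if (code >> pos) & 1}

def encoded_size_bytes(n_transactions, vocab):
    """Bytes needed to send each transaction as a fixed-width bitmap."""
    width = (len(vocab) + 7) // 8
    return n_transactions * width

def raw_size_bytes(transactions):
    """Bytes needed to send transactions as comma-separated UTF-8 text."""
    return sum(len(",".join(sorted(t)).encode("utf-8")) for t in transactions)

# Hypothetical sample data held at one site.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
    {"milk", "eggs"},
]
vocab = build_vocabulary(transactions)
codes = [encode(t, vocab) for t in transactions]

print("raw bytes:    ", raw_size_bytes(transactions))
print("encoded bytes:", encoded_size_bytes(len(transactions), vocab))
```

In this sketch the vocabulary must also be shared between sites once, so any saving only appears when the number of transactions is large relative to the number of distinct items.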

Index Terms

Computer Science
Information Sciences

Keywords

Centralized Database, Data Mining, Distributed Data Mining, Dataset, Dimension Reduction