Research Article

A Study on Various Data De-duplication Systems

by Rashmi Vikraman, Abirami S
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 4
Year of Publication: 2014
Authors: Rashmi Vikraman, Abirami S
10.5120/16334-5616

Rashmi Vikraman, Abirami S. A Study on Various Data De-duplication Systems. International Journal of Computer Applications. 94, 4 (May 2014), 35-40. DOI=10.5120/16334-5616

@article{ 10.5120/16334-5616,
author = { Rashmi Vikraman, Abirami S },
title = { A Study on Various Data De-duplication Systems },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 4 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 35-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume94/number4/16334-5616/ },
doi = { 10.5120/16334-5616 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:16:44.037801+05:30
%A Rashmi Vikraman
%A Abirami S
%T A Study on Various Data De-duplication Systems
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 4
%P 35-40
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data is the heart of any organization, so it must be protected, and doing so requires a good backup and recovery plan. However, backup data is highly redundant, which makes storage a concern; it is therefore necessary to eliminate the redundant data present in backups. Data de-duplication is one such solution: it discovers and removes redundancies among data blocks. This paper presents a broad study of the technology, process and types of data de-duplication systems, and gives readers a detailed analysis of the various data de-duplication systems that have been proposed by many researchers.
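To make the idea of discovering and removing redundancy among data blocks concrete, the following is a minimal sketch and not the method of any system surveyed in the paper. It assumes fixed-size blocks and SHA-256 fingerprints; the function names `deduplicate` and `restore`, the 4096-byte block size, and the in-memory dictionary used as the block store are all illustrative assumptions.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each distinct block once.

    Assumption: fixed-size chunking with SHA-256 fingerprints. Real systems
    may use content-defined chunking and on-disk indexes instead.
    """
    store = {}   # fingerprint -> block contents (unique blocks only)
    recipe = []  # ordered fingerprints needed to rebuild the input
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            # First time this content is seen: store the block itself.
            store[fingerprint] = block
        # Duplicate or not, record which block belongs at this position.
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the block store and the recipe."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    # Three identical blocks followed by one different block.
    backup = b"A" * 4096 * 3 + b"B" * 4096
    store, recipe = deduplicate(backup)
    print(len(recipe), "blocks referenced,", len(store), "blocks stored")
    assert restore(store, recipe) == backup
```

In this sketch the storage saving comes from the gap between the number of blocks referenced in the recipe and the number of blocks actually kept in the store; the recipe is what allows the original backup to be reconstructed.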

Index Terms

Computer Science
Information Sciences

Keywords

Data de-duplication
Flash Storage
Chunking
Backup