Research Article

Semi-Adaptive Substitution Coder for Lossless Text Compression

by Rexline S J, Robert L, Trujila Lobo
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 80 - Number 4
Year of Publication: 2013
Authors: Rexline S J, Robert L, Trujila Lobo
10.5120/13846-1678

Rexline S J, Robert L, Trujila Lobo. Semi-Adaptive Substitution Coder for Lossless Text Compression. International Journal of Computer Applications. 80, 4 (October 2013), 1-5. DOI=10.5120/13846-1678

@article{ 10.5120/13846-1678,
author = { Rexline S J and Robert L and Trujila Lobo },
title = { Semi-Adaptive Substitution Coder for Lossless Text Compression },
journal = { International Journal of Computer Applications },
issue_date = { October 2013 },
volume = { 80 },
number = { 4 },
month = { October },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume80/number4/13846-1678/ },
doi = { 10.5120/13846-1678 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:53:37.678683+05:30
%A Rexline S J
%A Robert L
%A Trujila Lobo
%T Semi-Adaptive Substitution Coder for Lossless Text Compression
%J International Journal of Computer Applications
%@ 0975-8887
%V 80
%N 4
%P 1-5
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, a new text transformation technique called the Semi-Adaptive Substitution Coder for lossless text compression is proposed. The main advantage of this substitution coder is that it replaces each word with a codeword derived from the word's position in the dictionary, which expedites dictionary mapping. Because the codewords are shorter than the words they replace, the same amount of text requires less space. In general, text transformation needs an external dictionary to store frequently used words. To keep the transformation efficient, a semi-adaptive dictionary is used; its smaller size reduces memory overhead and speeds up the transformation. The transformation algorithm is implemented and tested on the Calgary Corpus and the Large Corpus. In this implementation, the Semi-Adaptive Substitution Coder, used in conjunction with the popular bzip2 and the commonly used gzip compressors, improves compression performance by about 7–9% on large files.
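
The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the codeword format, so the escape byte `ESC`, the base-52 letter codes, the tokenizer, and the names `build_dictionary`, `encode`, `decode`, and `bz2_size` are all assumptions made for illustration. The sketch builds the dictionary from the input itself (one reading of "semi-adaptive"), replaces each frequent word with a short code derived from its dictionary position, and hands the result to bzip2.

```python
import bz2
import re
import string
from collections import Counter

ESC = "\x01"                      # assumed escape marker; must not occur in the input
ALPHABET = string.ascii_letters   # 52 symbols used to spell a dictionary index
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]")   # maximal letter runs, or one other char
CODE = re.compile(ESC + r"([A-Za-z]+)")      # an escape marker followed by its index


def build_dictionary(text, max_words=1000, min_len=3):
    """First pass: collect the most frequent words of this particular input."""
    counts = Counter(t for t in TOKEN.findall(text)
                     if t.isalpha() and len(t) >= min_len)
    return [word for word, _ in counts.most_common(max_words)]


def index_to_code(i):
    """Spell dictionary position i in base 52, prefixed by the escape marker."""
    digits = []
    while True:
        i, r = divmod(i, len(ALPHABET))
        digits.append(ALPHABET[r])
        if i == 0:
            break
    return ESC + "".join(reversed(digits))


def encode(text, max_words=1000):
    """Second pass: substitute a word only when its codeword is shorter."""
    if ESC in text:
        raise ValueError("input already contains the escape character")
    words = build_dictionary(text, max_words)
    codes = {w: index_to_code(i) for i, w in enumerate(words)}
    out = []
    for tok in TOKEN.findall(text):
        code = codes.get(tok)
        out.append(code if code is not None and len(code) < len(tok) else tok)
    return words, "".join(out)


def decode(words, body):
    """Invert encode(): map every codeword back to the word at that position."""
    def restore(match):
        value = 0
        for ch in match.group(1):
            value = value * len(ALPHABET) + ALPHABET.index(ch)
        return words[value]
    return CODE.sub(restore, body)


def bz2_size(words, body):
    """Size after bzip2, counting the dictionary that must travel with the text."""
    payload = "\n".join(words) + "\n" + ESC + "\n" + body
    return len(bz2.compress(payload.encode("utf-8")))
```

A caller would compare `bz2_size(*encode(text))` against `len(bz2.compress(text.encode("utf-8")))` on a large file. Whether the transformed input compresses better depends on the text and the dictionary size, so the sketch makes no claim about reproducing the reported gains.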

Index Terms

Computer Science
Information Sciences

Keywords

Transformation, preprocessing, adaptive dictionary, compression, decompression