Research Article

Optimising Storage Resource using Morpheme based Text Compression Technique

by Rockson Kwasi Afriyie, J. B. Hayfron-acquah, Joseph K. Panford
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 93 - Number 2
Year of Publication: 2014
10.5120/16190-5414

Rockson Kwasi Afriyie, J. B. Hayfron-acquah, Joseph K. Panford. Optimising Storage Resource using Morpheme based Text Compression Technique. International Journal of Computer Applications. 93, 2 (May 2014), 33-42. DOI=10.5120/16190-5414

@article{ 10.5120/16190-5414,
author = { Rockson Kwasi Afriyie, J. B. Hayfron-acquah, Joseph K. Panford },
title = { Optimising Storage Resource using Morpheme based Text Compression Technique },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 93 },
number = { 2 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 33-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume93/number2/16190-5414/ },
doi = { 10.5120/16190-5414 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:14:48.356306+05:30
%A Rockson Kwasi Afriyie
%A J. B. Hayfron-acquah
%A Joseph K. Panford
%T Optimising Storage Resource using Morpheme based Text Compression Technique
%J International Journal of Computer Applications
%@ 0975-8887
%V 93
%N 2
%P 33-42
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, we present a morpheme-based text compression technique for optimising storage resources. The technique decomposes words into their morphemes and then produces code representations of those morphemes for compression. The proposed algorithm is implemented for English-language text and applied to 30 texts of varying lengths, collected from varied sources and of varied nature. Its efficiency increases with the number of long, repetitive morphemes in the input data. To the best of our knowledge, the resulting implementation is the first to demonstrate lossless compression using such a technique. We illustrate its suitability and effectiveness on benchmark files of four sizes (small, medium, large and very large) taken from real-world applications. The results indicated a compression performance of 98%, making the approach an attractive one. A further virtue of the method is that it can be applied dynamically: where compression degrades, morphemes identified within the document can be appended to the dictionary to improve it. The evaluation experiments show that, where storage space is the primary consideration, the morpheme-based text compression technique is an efficient approach for compressing text data.
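The abstract describes three ideas: decomposing words into morphemes, replacing each morpheme with a code, and appending morphemes found in the document to the dictionary when compression degrades. The paper's actual segmentation rules, code format and dictionary are not given here, so the following is only a minimal Python sketch under stated assumptions. `MorphemeCodec`, its methods and the sample morpheme list are hypothetical; segmentation is a greedy longest-match lookup against a fixed morpheme list, which is a stand-in for real morphological analysis; and any character not covered by a morpheme is kept as a literal so that decoding is lossless.

```python
from typing import Dict, List, Union

# A token is either a morpheme code (an int index into the dictionary)
# or a single literal character that no dictionary morpheme covers.
Token = Union[int, str]


class MorphemeCodec:
    """Hypothetical morpheme-dictionary coder (illustrative, not the paper's scheme)."""

    def __init__(self, morphemes: List[str]) -> None:
        self.morphemes: List[str] = []     # code -> morpheme
        self.codes: Dict[str, int] = {}    # morpheme -> code
        self.add_morphemes(morphemes)

    def add_morphemes(self, morphemes: List[str]) -> None:
        # Append new morphemes at the end so previously issued codes stay valid.
        for m in morphemes:
            if m and m not in self.codes:
                self.codes[m] = len(self.morphemes)
                self.morphemes.append(m)

    def encode(self, text: str) -> List[Token]:
        tokens: List[Token] = []
        longest = max((len(m) for m in self.morphemes), default=0)
        i = 0
        while i < len(text):
            # Greedy longest match: try the longest dictionary morpheme starting at i.
            for length in range(min(longest, len(text) - i), 0, -1):
                code = self.codes.get(text[i:i + length])
                if code is not None:
                    tokens.append(code)
                    i += length
                    break
            else:
                # No morpheme starts here: keep the character as a literal (lossless).
                tokens.append(text[i])
                i += 1
        return tokens

    def decode(self, tokens: List[Token]) -> str:
        # Codes map back to morphemes; literals are copied through unchanged.
        return "".join(self.morphemes[t] if isinstance(t, int) else t for t in tokens)


codec = MorphemeCodec(["un", "break", "able", "re", "s"])
print(codec.encode("unbreakable rebreaks"))  # [0, 1, 2, ' ', 3, 1, 4]

# Dynamic extension: append morphemes identified in the document itself.
before = codec.encode("compression")
codec.add_morphemes(["compress", "ion"])
after = codec.encode("compression")
print(len(before), len(after))  # 10 2
assert codec.decode(after) == "compression"
```

Appending new morphemes at the end of the list keeps every previously issued code valid, which is one plausible way to support the dynamic dictionary extension described in the abstract without invalidating earlier output. A practical coder would also serialise the token stream into a compact bit-level format; the sketch stops at the token stream.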

Index Terms

Computer Science
Information Sciences

Keywords

Algorithm, morpheme, clean data, storage resource