Research Article

Algorithm for XML Compression using DTD and Stack

Published on March 2012 by G. M. Tere, B. T. Jadhav
International Conference on Recent Trends in Information Technology and Computer Science
Foundation of Computer Science USA
ICRTITCS - Number 2
March 2012

G. M. Tere, B. T. Jadhav. Algorithm for XML Compression using DTD and Stack. International Conference on Recent Trends in Information Technology and Computer Science. ICRTITCS, 2 (March 2012), 12-17.

@article{tere2012xml,
author = { G. M. Tere and B. T. Jadhav },
title = { Algorithm for XML Compression using DTD and Stack },
journal = { International Conference on Recent Trends in Information Technology and Computer Science },
issue_date = { March 2012 },
volume = { ICRTITCS },
number = { 2 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 12-17 },
numpages = 6,
url = { /proceedings/icrtitcs/number2/5179-1011/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Recent Trends in Information Technology and Computer Science
%A G. M. Tere
%A B. T. Jadhav
%T Algorithm for XML Compression using DTD and Stack
%J International Conference on Recent Trends in Information Technology and Computer Science
%@ 0975-8887
%V ICRTITCS
%N 2
%P 12-17
%D 2012
%I International Journal of Computer Applications
Abstract

XML is a worldwide standard for data definition and is used extensively in developing SOA-based applications. Such applications consist of many different applications integrated with one another, and XML documents are used to solve the resulting interoperability problem. XML is also widely used for a variety of other tasks, including configuration files, protocols, and web services. Its main drawback for processing is its verbosity: even simple messages can be quite large while carrying very little information. XML documents also duplicate a great deal of information, which consumes more computing resources and thus degrades the performance of web services. Much research is under way on how to process XML so that the performance of web services can improve. We present an algorithm for compressing XML documents using Document Type Definition (DTD) specifications. Our algorithm is based on a lossless compression technique. The model used for compression and decompression is generated automatically from the DTD and is used in conjunction with an arithmetic encoder to produce a compressed XML document. Our compression technique is on-line; that is, it can compress the document as it is being read. We have implemented the compressor generator, and we report the results of experiments performed on XML documents created from an Oracle database. The average compression is better than that of XMLPPM and XMill. The processor, XPrFAST, is able to compress large documents on which XMLPPM failed because it ran out of memory. The proposed technique is simple and effective, and we have compared it with XMLPPM and XMill.
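
The idea of driving compression from a DTD model can be illustrated with a minimal sketch. It is not the authors' implementation: the toy DTD, the hand-built automata in `MODEL`, the event format, and the function names `encode` and `decode` are all assumptions made for illustration, and the arithmetic encoder is omitted. The sketch only shows that, once the DTD fixes which elements may appear next, the encoder needs to record a choice only where the automaton offers more than one option; a stack tracks the currently open elements so the document can be processed as it is read.

```python
END = "</>"  # pseudo-symbol meaning "close the current element"

# Hand-built automata for a toy DTD (an assumption; a real tool would
# generate these from the DTD text):
#   <!ELEMENT library (book*)>
#   <!ELEMENT book (title, author+)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT author (#PCDATA)>
# Each element maps state -> {symbol: next_state}. None marks #PCDATA,
# assumed here to hold exactly one text event.
MODEL = {
    "library": {0: {"book": 0, END: None}},
    "book": {0: {"title": 1}, 1: {"author": 2}, 2: {"author": 2, END: None}},
    "title": None,
    "author": None,
}


def options(elem, state):
    """Symbols allowed in this state, in a fixed order shared by both sides."""
    return sorted(MODEL[elem][state])


def encode(events):
    """Turn a valid event stream into (choice indices, text values)."""
    choices, texts, stack = [], [], []
    for kind, value in events:
        if kind == "text":
            texts.append(value)
            continue
        symbol = value if kind == "start" else END
        # Consult the automaton of the element that consumes this symbol:
        # the parent for a start tag, the element itself for an end tag.
        if stack and MODEL[stack[-1][0]] is not None:
            elem, state = stack[-1]
            opts = options(elem, state)
            if len(opts) > 1:  # forced transitions cost nothing
                choices.append(opts.index(symbol))
            stack[-1] = (elem, MODEL[elem][state][symbol])
        if kind == "start":
            stack.append((value, 0))
        else:
            stack.pop()
    return choices, texts


def decode(root, choices, texts):
    """Rebuild the event stream by replaying the automata."""
    choices, texts = iter(choices), iter(texts)
    events, stack = [("start", root)], [(root, 0)]
    while stack:
        elem, state = stack[-1]
        if MODEL[elem] is None:  # #PCDATA: one text, then close
            events += [("text", next(texts)), ("end", elem)]
            stack.pop()
            continue
        opts = options(elem, state)
        symbol = opts[next(choices)] if len(opts) > 1 else opts[0]
        if symbol == END:
            events.append(("end", elem))
            stack.pop()
        else:
            stack[-1] = (elem, MODEL[elem][state][symbol])
            events.append(("start", symbol))
            stack.append((symbol, 0))
    return events


doc = [
    ("start", "library"), ("start", "book"),
    ("start", "title"), ("text", "XML"), ("end", "title"),
    ("start", "author"), ("text", "Tere"), ("end", "author"),
    ("start", "author"), ("text", "Jadhav"), ("end", "author"),
    ("end", "book"), ("end", "library"),
]
choices, texts = encode(doc)  # choices == [1, 1, 0, 0]
assert decode("library", choices, texts) == doc
```

In this example the thirteen events reduce to four small choice indices plus the three text values. In a scheme like the one the abstract describes, such choices would be passed to an arithmetic encoder with probabilities attached to each state, rather than stored as plain integers.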

Index Terms

Computer Science
Information Sciences

Keywords

Arithmetic coding, compression ratio, DTD, DFA, XMLPrFAST