Research Article

XML: URL Data Set Creation for Future Web Mining Research Avenues

Published on March 2012 by Krishna Murthy. A, Suresha
International Conference in Computational Intelligence
Foundation of Computer Science USA
ICCIA - Number 3
March 2012
Authors: Krishna Murthy. A, Suresha

Krishna Murthy. A, Suresha. XML: URL Data Set Creation for Future Web Mining Research Avenues. International Conference in Computational Intelligence. ICCIA, 3 (March 2012), 1-4.

@article{
author = { Krishna Murthy. A and Suresha },
title = { XML: URL Data Set Creation for Future Web Mining Research Avenues },
journal = { International Conference in Computational Intelligence },
issue_date = { March 2012 },
volume = { ICCIA },
number = { 3 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = 4,
url = { /proceedings/iccia/number3/5111-1024/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference in Computational Intelligence
%A Krishna Murthy. A
%A Suresha
%T XML: URL Data Set Creation for Future Web Mining Research Avenues
%J International Conference in Computational Intelligence
%@ 0975-8887
%V ICCIA
%N 3
%P 1-4
%D 2012
%I International Journal of Computer Applications
Abstract

The rapid expansion of the Internet has made the Web a popular place for disseminating and collecting information, and it has opened up many research topics across various research fields. Over the last few years, several attempts have been made at Web-based research, particularly on HTML web pages because of their wide availability. As a result, many research data sets have been created, and most of them are available on the Web. However, the W3 Consortium has stated that HTML does not provide a good description of the semantic structure of web page contents. To overcome this drawback, Web developers have started to build web pages with newer technologies such as XML and Flash [1]. This opens the way for new research methods. This article focuses on creating a data set of XML web pages using Sequential Search, Link Extraction and String-based Classification methods, for future research avenues on XML web pages.
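The abstract names three steps (Sequential Search, Link Extraction and String-based Classification) but does not give their details here. The following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes that pages are visited one after another, that links are taken from `href` attributes of anchor tags, and that a URL counts as an XML URL when its path ends in `.xml`. All function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links so every stored URL is absolute.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Link extraction step: return all anchor URLs found in one page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def is_xml_url(url):
    """String-based classification (assumed rule): path ends in '.xml'.

    Only the URL string is inspected; the target page is never fetched.
    """
    return urlparse(url).path.lower().endswith(".xml")


def build_xml_url_set(pages):
    """Sequential pass over (base_url, html) pairs, keeping unique XML URLs."""
    seen = set()
    result = []
    for base_url, html in pages:
        for link in extract_links(html, base_url):
            if is_xml_url(link) and link not in seen:
                seen.add(link)
                result.append(link)
    return result
```

A suffix test is the simplest possible string-based rule. It will miss XML documents served from URLs without an `.xml` extension, so a real data set would likely need additional checks.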

References
  1. Book: Ed Tittel, ‘Complete Coverage of XML’, Tata McGraw-Hill Edition.
  2. Book: Magdalini Eirinaki, ‘Web Mining: A Roadmap’.
  3. Lan Yi, Bing Liu, and Xiaoli Li, 2003, ‘Eliminating noisy information in web pages for data mining’. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 296–305, New York, NY, USA. ACM.
  4. http://www.w3c.org/DOM/
  5. P.F Xiang et al., 2006, ‘Effective Page Segmentation Combining Pattern Analysis and Visual Separators for Browsing on Small Screens’. Web Intelligence.
  6. Shumeet Baluja, 2006, ‘Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework’. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 33–42, New York, NY, USA. ACM.
  7. Y. Chen, X. Xie, W.-Y. Ma, and H.-J. Zhang, 2005, ‘Adapting web pages for small-screen devices’. Internet Computing, 9(1):50–56.
  8. Xin Yang, Yuanchun Shi, 2009, ‘Enhanced Gestalt Theory Guided Web Page Segmentation for Mobile Browsing’. IEEE/WIC/ACM.
  9. Jaideep Srivastava, Robert Cooley, et al., 2000, ‘Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data’. Volume 1, Issue 2, page 12, ACM SIGKDD.
  10. Abraham. A, 2003, ‘Business Intelligence from Web Usage Mining’. Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co., Singapore, Vol. 2, No. 4, pp. 375–390.
  11. Soumen Chakrabarti, 2000, ‘Data mining for hypertext: A tutorial survey’. Volume 1, Issue 2, page 1, ACM SIGKDD.
  12. Soumen Chakrabarti, Byron E. Dom et al., 1999, ‘Mining the Link Structure of the World Wide Web’. IBM Almaden Research Center, 650 Harry Road, San Jose CA 95120.
  13. C. Kohlschütter and W. Nejdl, 2008, ‘A Densitometric Approach to Web Page Segmentation’. In ACM 17th Conf. on Information and Knowledge Management (CIKM 2008).
  14. Christian Kohlschütter, Peter Fankhauser, Wolfgang Nejdl, 2010, ‘Boilerplate Detection using Shallow Text Features’. WSDM, New York, USA, ACM.
  15. G. Poonkuzhali, K. Thiagarajan, and K. Sarukesi, 2009, ‘Signed Approach for Mining Web content Outliers’. World Academy of Science, Engineering and Technology 56.
  16. Bar-Yossef, Z. and Rajagopalan, S., 2002, ‘Template Detection via Data Mining and its Applications’. In Proceedings of the 11th International World Wide Web Conference (WWW2002).
  17. Lin, S.-H. and Ho, J.-M., 2002, ‘Discovering Informative Content Blocks from Web Documents’. In Proceedings of ACM SIGKDD'02.
Index Terms

Computer Science
Information Sciences

Keywords

URL data set, XML URLs, URL Extraction, URL Classification