Research Article

A Noise Reduction Approach based on n x 1 Table and XSL Display Method for Efficient Web Data Extraction

by Neeraj Raheja, V. K. Katiyar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 64 - Number 11
Year of Publication: 2013
DOI: 10.5120/10677-5552

Neeraj Raheja, V. K. Katiyar. A Noise Reduction Approach based on n x 1 Table and XSL Display Method for Efficient Web Data Extraction. International Journal of Computer Applications. 64, 11 (February 2013), 12-17. DOI=10.5120/10677-5552

@article{ 10.5120/10677-5552,
author = { Neeraj Raheja and V. K. Katiyar },
title = { A Noise Reduction Approach based on n x 1 Table and XSL Display Method for Efficient Web Data Extraction },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 64 },
number = { 11 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 12-17 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume64/number11/10677-5552/ },
doi = { 10.5120/10677-5552 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:16:07.594454+05:30
%A Neeraj Raheja
%A V. K. Katiyar
%T A Noise Reduction Approach based on n x 1 Table and XSL Display Method for Efficient Web Data Extraction
%J International Journal of Computer Applications
%@ 0975-8887
%V 64
%N 11
%P 12-17
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A web page is a source of information made up of many parts, of which only some are useful for a particular application; the rest is noise. Users urgently need an effective technique for extracting the useful information from the whole. Removing these noise patterns from the web page therefore improves the efficiency of web data extraction. This work proposes an approach for removing local noise from a given web page, based on an n x 1 table and an XSL display method with a filter feature, to improve the efficiency of web data extraction.

Index Terms

Computer Science
Information Sciences

Keywords

noise reduction, web data mining, data extraction, XML, XSL