Signed-With-Weight Technique for Mining Web Content Outliers


Research Article


Published in February 2013 by S. Poonkuzhali, P. Sudhakar, K. Sarukesi

International Conference on Communication, Computing and Information Technology

Foundation of Computer Science USA

ICCCMIT - Number 2

February 2013

Authors: S. Poonkuzhali, P. Sudhakar, K. Sarukesi


S. Poonkuzhali, P. Sudhakar, K. Sarukesi. Signed-With-Weight Technique for Mining Web Content Outliers. International Conference on Communication, Computing and Information Technology. ICCCMIT, 2 (February 2013), 40-45.

@article{
author = { S. Poonkuzhali and P. Sudhakar and K. Sarukesi },
title = { Signed-With-Weight Technique for Mining Web Content Outliers },
journal = { International Conference on Communication, Computing and Information Technology },
issue_date = { February 2013 },
volume = { ICCCMIT },
number = { 2 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 40-45 },
numpages = { 6 },
url = { /specialissues/icccmit/number2/10336-1021/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Special Issue Article
%1 International Conference on Communication, Computing and Information Technology
%A S. Poonkuzhali
%A P. Sudhakar
%A K. Sarukesi
%T Signed-With-Weight Technique for Mining Web Content Outliers
%J International Conference on Communication, Computing and Information Technology
%@ 0975-8887
%V ICCCMIT
%N 2
%P 40-45
%D 2013
%I International Journal of Computer Applications

Abstract

Web outlier mining is dedicated to finding web pages that differ significantly from the rest of the web documents taken from the same category. Most existing algorithms for web content outlier mining are developed for structured documents, whereas the WWW contains mostly unstructured and semi-structured documents. Moreover, the false positive rate of the existing algorithms for mining web content outliers is more than 30%. Therefore, there is a need for a technique that mines web outliers from unstructured and semi-structured document types with a lower false positive rate. This paper concentrates on mining web content outliers, that is, on extracting the dissimilar web documents from a group of documents of the same domain. The proposed work implements a novel mathematical approach based on a signed-with-weight technique for mining web content outliers, which retrieves the top n outlier web documents from both structured and unstructured web documents. The results show that the precision and recall of this approach are more than 90%. Also, the false positive rate of this algorithm is less than 15%.
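
The abstract reports precision, recall, and false positive rate for the retrieved outliers. As a reading aid, the sketch below shows how these three measures are conventionally computed for a set of documents flagged as outliers. It is not the authors' evaluation code: the function name `outlier_metrics`, the document identifiers, and the toy data are assumptions made for illustration only.

```python
from typing import Dict, Set


def outlier_metrics(
    flagged: Set[str], true_outliers: Set[str], all_docs: Set[str]
) -> Dict[str, float]:
    """Standard precision, recall and false positive rate for outlier retrieval.

    flagged       -- documents the method reports as outliers (hypothetical input)
    true_outliers -- documents known to be outliers (ground truth)
    all_docs      -- every document in the evaluated collection
    """
    tp = len(flagged & true_outliers)             # outliers correctly flagged
    fp = len(flagged - true_outliers)             # normal documents flagged by mistake
    fn = len(true_outliers - flagged)             # outliers that were missed
    tn = len(all_docs - flagged - true_outliers)  # normal documents left alone

    # Guard against empty denominators so that degenerate inputs return 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
    }


# Toy collection of ten documents, three of which are true outliers.
docs = {f"d{i}" for i in range(1, 11)}
metrics = outlier_metrics(
    flagged={"d1", "d2", "d4"},
    true_outliers={"d1", "d2", "d3"},
    all_docs=docs,
)
print(metrics)
```

In this toy run two of the three flagged documents are true outliers and one of the seven normal documents is flagged by mistake, so precision and recall are both 2/3 and the false positive rate is 1/7.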

Index Terms

Computer Science

Information Sciences

Keywords

Dissimilarity Weight, Outlier Mining, Term Frequency, Weighted Approach, Web Content Mining, Web Content Outliers