International Conference on Communication, Computing and Information Technology
Foundation of Computer Science USA
ICCCMIT - Number 2
February 2013
Authors: S. Poonkuzhali, P. Sudhakar, K. Sarukesi
S. Poonkuzhali, P. Sudhakar, K. Sarukesi. Signed-With-Weight Technique for Mining Web Content Outliers. International Conference on Communication, Computing and Information Technology. ICCCMIT, 2 (February 2013), 40-45.
Web outlier mining is dedicated to finding web pages that differ significantly from the rest of the web documents taken from the same category. Most existing algorithms for web content outlier mining were developed for structured documents, whereas the WWW contains mostly unstructured and semi-structured documents. Moreover, the false positive rate of the existing algorithms for mining web content outliers is more than 30%. Therefore, there is a need for a technique that mines web outliers from unstructured and semi-structured document types with a lower false positive rate. This paper concentrates on mining web content outliers, that is, on extracting the dissimilar web documents from a group of documents belonging to the same domain. The proposed work implements a novel mathematical approach, based on a signed-with-weight technique, that retrieves the top n outlier web documents from both structured and unstructured web documents. The results show that the precision and recall of this approach are both more than 90%, and that the false positive rate of the algorithm is less than 15%.
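For readers unfamiliar with the three measures quoted above, the following minimal sketch shows how precision, recall and false positive rate are conventionally computed for an outlier detector. It is not the authors' evaluation code: the function name `outlier_metrics`, its parameters and the document identifiers are hypothetical, and the counts in the usage example are invented for illustration rather than taken from the paper.

```python
def outlier_metrics(predicted, actual, all_docs):
    """Compute precision, recall and false positive rate for outlier detection.

    predicted: documents flagged as outliers by a detector (hypothetical input)
    actual:    documents that are truly outliers (ground truth)
    all_docs:  every document in the evaluated collection
    """
    predicted, actual, all_docs = set(predicted), set(actual), set(all_docs)

    tp = len(predicted & actual)             # outliers correctly flagged
    fp = len(predicted - actual)             # normal documents wrongly flagged
    fn = len(actual - predicted)             # outliers that were missed
    tn = len(all_docs - predicted - actual)  # normal documents left unflagged

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
    }


if __name__ == "__main__":
    # Illustrative data only: 10 documents, 2 true outliers, 2 flagged.
    docs = range(10)
    print(outlier_metrics(predicted={0, 2}, actual={0, 1}, all_docs=docs))
```

Under these definitions, a false positive rate below 15% means that fewer than 15 in every 100 non-outlier documents are wrongly reported as outliers.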