International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 99 - Number 14
Year of Publication: 2014
Authors: Shraddha Sarode, Jayant Gadge
DOI: 10.5120/17443-8245
Shraddha Sarode, Jayant Gadge. Approach for Dimensionality Reduction in Web Page Classification. International Journal of Computer Applications. 99, 14 (August 2014), 32-37. DOI=10.5120/17443-8245
Dimensionality refers to the number of terms in a web page. High dimensionality causes problems when classifying web pages. The main objective of reducing the dimensionality of web pages is to improve the performance of the classifier. Processing time and accuracy are two parameters that influence the performance of a classifier. To reduce processing time, less informative and redundant terms have to be removed from web pages. This research describes a hybrid approach to dimensionality reduction in web page classification that uses a rough set method and the naïve Bayesian method. Both feature selection and dimensionality reduction methods are used to reduce the dimensionality: the information gain method is used for feature selection, and the rough set based Quick Reduct algorithm is used for dimensionality reduction. The words remaining after feature selection and dimensionality reduction are given to the classifier. The naïve Bayesian classifier then assigns each web page to the predefined category with the maximum posterior probability.
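The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: terms are treated as binary presence features, information gain is applied before Quick Reduct, the greedy Quick Reduct variant always adds the attribute that yields the highest dependency degree, and the naïve Bayesian classifier uses a Bernoulli model with Laplace smoothing. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def information_gain(docs, labels, term):
    # Class entropy minus the entropy left after splitting on term presence.
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))

def select_by_information_gain(docs, labels, k):
    # Feature selection: keep the k terms with the highest information gain.
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)[:k]

def to_row(doc, attrs):
    # Assumed representation: 1 if the term occurs in the page, else 0.
    return {a: int(a in doc) for a in attrs}

def dependency(rows, labels, attrs):
    # Rough-set dependency degree: share of pages whose equivalence class
    # under attrs contains a single category (the positive region).
    classes, sizes = defaultdict(set), Counter()
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes[key].add(y)
        sizes[key] += 1
    return sum(sizes[k] for k, ys in classes.items() if len(ys) == 1) / len(rows)

def quick_reduct(rows, labels, attrs):
    # Greedy variant: add the attribute giving the highest dependency until
    # the subset is as discerning as the full attribute set.
    reduct = []
    target = dependency(rows, labels, attrs)
    while dependency(rows, labels, reduct) < target:
        remaining = [a for a in attrs if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct

def train_naive_bayes(rows, labels, attrs):
    # Assumed Bernoulli model with Laplace smoothing.
    model = {}
    for c, n_c in Counter(labels).items():
        members = [r for r, y in zip(rows, labels) if y == c]
        p = {a: (sum(r[a] for r in members) + 1) / (n_c + 2) for a in attrs}
        model[c] = (math.log(n_c / len(labels)), p)
    return model

def classify(model, row):
    # Assign the category with the maximum (log) posterior probability.
    def log_posterior(c):
        prior, p = model[c]
        return prior + sum(math.log(p[a] if row[a] else 1 - p[a]) for a in p)
    return max(model, key=log_posterior)

# Hypothetical toy corpus: each web page is a list of terms.
docs = [["football", "match", "goal"], ["football", "league", "team"],
        ["stock", "market", "shares"], ["market", "bank", "shares"]]
labels = ["sports", "sports", "finance", "finance"]

selected = select_by_information_gain(docs, labels, k=3)
rows = [to_row(d, selected) for d in docs]
reduct = quick_reduct(rows, labels, selected)
model = train_naive_bayes(rows, labels, reduct)
print(reduct, classify(model, to_row(["football", "cup", "final"], reduct)))
```

Working in log space avoids numerical underflow when many terms survive the reduction, and the smoothing keeps a term that never occurs in a category from driving its posterior to zero.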