International Conference and Workshop on Emerging Trends in Technology
Foundation of Computer Science USA
ICWET2012 - Number 12
March 2012
Authors: Komal Kumar Bhatia, Atul Srivastava, Veena Garg
Komal Kumar Bhatia, Atul Srivastava, Veena Garg. VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes. International Conference and Workshop on Emerging Trends in Technology. ICWET2012, 12 (March 2012), 37-41.
Classification is a data mining technique for identifying the class to which a particular data object belongs. In this paper we present a classification technique that enhances an existing information retrieval method, the Vector Space Model (VSM). VSM is normally applied to text data, where it is used to determine the relevance of a query to web pages. Data objects fall into two communities based on their attributes: those with discrete-valued attributes and those with continuous-valued attributes. Almost every previous attempt in this area has handled the two communities separately: to make the classifier scale, one type of attribute (discrete or continuous) is converted to the other, and this conversion can hurt accuracy. In this paper, continuous and discrete attributes are instead treated individually, without tampering with their representation. The paper adapts VSM so that it classifies a data object in the same way it scores the relevance of a query in information retrieval. The results show that the enhanced model performs well and that its setup time remains satisfactory for a large collection of data objects.

The paper is organized as follows. Section 1 introduces the basic terminology of classification and the vector space model. Section 2 reviews related work from the literature. Section 3 describes model construction for classification, i.e. how the existing vector space model for information retrieval is simulated and then used to classify an unseen data tuple. Section 4 gives pseudocode for VSM classification. Section 5 presents the experiment and analyzes the results through an example. Section 6 concludes the paper and discusses future work.
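To make the idea of reusing VSM for classification concrete, the following is a minimal sketch, not the method of the paper itself, whose details the abstract does not give. It assumes that an unseen tuple plays the role of the query, that continuous attributes are compared by cosine similarity on their raw values, that discrete attributes are compared by the fraction of matching values (which equals the cosine between their one-hot encodings), and that the two scores are combined with a hypothetical `weight` parameter. The function names, the class-mean decision rule, and the toy data are all illustrative assumptions.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def discrete_similarity(d1, d2):
    """Fraction of discrete attributes on which two tuples agree.

    This equals the cosine between the one-hot encodings of the tuples,
    so the discrete values never have to be mapped to numbers.
    """
    if not d1:
        return 0.0
    return sum(a == b for a, b in zip(d1, d2)) / len(d1)


def similarity(x, y, weight=0.5):
    """Combine both parts; `weight` is an assumed mixing parameter."""
    (dx, cx), (dy, cy) = x, y
    return weight * discrete_similarity(dx, dy) + (1 - weight) * cosine(cx, cy)


def classify(training, query, weight=0.5):
    """Return the class whose training tuples are most similar on average.

    The class-mean rule is an assumption made for this sketch.
    """
    scores = defaultdict(list)
    for features, label in training:
        scores[label].append(similarity(features, query, weight))
    return max(scores, key=lambda lbl: sum(scores[lbl]) / len(scores[lbl]))


# Toy data (made up): each tuple is ((discrete attrs), (continuous attrs)).
training = [
    ((("red", "round"), (1.0, 9.0)), "A"),
    ((("red", "round"), (2.0, 8.0)), "A"),
    ((("blue", "square"), (9.0, 1.0)), "B"),
    ((("blue", "round"), (8.0, 2.0)), "B"),
]

query = (("red", "round"), (1.5, 8.5))
print(classify(training, query))
```

Keeping the two attribute types in separate components means neither has to be discretized or mapped to numbers before comparison. Because cosine similarity ignores vector magnitude, a real implementation might need a different measure for the continuous part.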