International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 68 - Number 19
Year of Publication: 2013
Authors: Ankit Bhakkad, S. C. Dharamadhikari, Parag Kulkarni
10.5120/11686-7356 |
Ankit Bhakkad, S. C. Dharamadhikari, Parag Kulkarni. Efficient Approach to find Bigram Frequency in Text Document using E-VSM. International Journal of Computer Applications. 68, 19 (April 2013), 9-11. DOI=10.5120/11686-7356
This paper proposes a novel and efficient approach to calculating bigram frequency that uses E-VSM as the basis for representing a text document. E-VSM (Enhanced-Vector Space Model) is an extension of the simple VSM that stores the positions of tokens in addition to their frequencies in the document. Many recent methodologies in Information Retrieval and Text Mining use bigrams along with unigrams, since bigrams give more information gain than unigrams. Recent efforts to provide a richer text document representation than the simple 'Bag of Words' have also used bigrams along with unigrams. The proposed approach to calculating bigram frequency outperforms the state of the art in terms of time complexity, and the analysis shows that it improves time complexity to a significant extent.
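The idea of storing token positions alongside frequencies can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not spell out; it only assumes that a position-aware representation maps each token to the list of positions where it occurs, so that a bigram count can be read off by checking which positions of the first token are immediately followed by a position of the second. The function names `build_evsm` and `bigram_frequency` and the sample sentence are hypothetical.

```python
from collections import defaultdict


def build_evsm(tokens):
    """Map each token to the list of positions where it occurs.

    The unigram frequency of a token is the length of its position list.
    (Illustrative structure only; the paper's exact layout is not given.)
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)
    return dict(positions)


def bigram_frequency(evsm, first, second):
    """Count how often `first` is immediately followed by `second`."""
    second_positions = set(evsm.get(second, ()))
    # A bigram occurs wherever a position of `first`, plus one,
    # is also a position of `second`.
    return sum(1 for p in evsm.get(first, ()) if p + 1 in second_positions)


tokens = "the cat sat on the mat the cat ran".split()
evsm = build_evsm(tokens)
print(len(evsm["the"]))                      # unigram frequency of "the"
print(bigram_frequency(evsm, "the", "cat"))  # frequency of bigram "the cat"
```

In this sketch, a bigram lookup touches only the position lists of its two tokens rather than rescanning the whole document, which is one plausible reading of where a position-aware representation could save time.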