Efficient Approach to find Bigram Frequency in Text Document using E-VSM

Ankit Bhakkad; S. C. Dharamadhikari; Parag Kulkarni

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 68 - Number 19

Year of Publication: 2013

Authors: Ankit Bhakkad, S. C. Dharamadhikari, Parag Kulkarni

DOI: 10.5120/11686-7356

Ankit Bhakkad, S. C. Dharamadhikari, Parag Kulkarni. Efficient Approach to find Bigram Frequency in Text Document using E-VSM. International Journal of Computer Applications. 68, 19 (April 2013), 9-11. DOI=10.5120/11686-7356

@article{ 10.5120/11686-7356,

author = { Ankit Bhakkad and S. C. Dharamadhikari and Parag Kulkarni },

title = { Efficient Approach to find Bigram Frequency in Text Document using E-VSM },

journal = { International Journal of Computer Applications },

issue_date = { April 2013 },

volume = { 68 },

number = { 19 },

month = { April },

year = { 2013 },

issn = { 0975-8887 },

pages = { 9-11 },

numpages = {3},

url = { https://ijcaonline.org/archives/volume68/number19/11686-7356/ },

doi = { 10.5120/11686-7356 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T21:28:18.047942+05:30

%A Ankit Bhakkad

%A S. C. Dharamadhikari

%A Parag Kulkarni

%T Efficient Approach to find Bigram Frequency in Text Document using E-VSM

%J International Journal of Computer Applications

%@ 0975-8887

%V 68

%N 19

%P 9-11

%D 2013

%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper proposes a novel and efficient approach to calculating bigram frequency that uses E-VSM as the basis for representing a text document. E-VSM (Enhanced Vector Space Model) is an extension of the simple VSM that stores the positions of tokens in addition to their frequency in the document. Many recent methodologies in Information Retrieval and Text Mining use bigrams along with unigrams, since bigrams give more information gain than unigrams. Recent efforts to provide a richer text document representation than the simple 'Bag of Words' have also used bigrams along with unigrams. The proposed approach to calculating bigram frequency outperforms the state of the art in terms of time complexity. Analysis shows that the proposed approach improves time complexity to a significant extent.
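The idea of a position-aware representation can be illustrated with a minimal sketch. The abstract does not give the paper's algorithm, so the following is only one plausible reading under stated assumptions: each token maps to the list of positions where it occurs (its frequency is the length of that list), and a bigram's frequency is the number of positions of the first token that are immediately followed by a position of the second. The function names `build_positional_index` and `bigram_frequency` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_positional_index(tokens):
    """Map each token to the ascending list of positions where it occurs.

    Assumption: this stands in for an E-VSM-style representation that keeps
    token positions alongside frequency (frequency == len(position list)).
    """
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return dict(index)


def bigram_frequency(index, first, second):
    """Count how often `first` is immediately followed by `second`.

    Only the two position lists are consulted; the token stream itself is
    not rescanned.
    """
    second_positions = set(index.get(second, []))
    return sum(1 for p in index.get(first, []) if p + 1 in second_positions)


tokens = "the cat sat on the mat and the cat slept".split()
index = build_positional_index(tokens)
unigram_count = len(index["the"])                    # occurrences of "the"
bigram_count = bigram_frequency(index, "the", "cat")  # occurrences of "the cat"
```

In this sketch, the cost of one bigram query depends on the number of occurrences of the two tokens rather than on the length of the document. Whether this matches the complexity analysis in the paper is not something the abstract confirms.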

Index Terms

Computer Science

Information Sciences

Keywords

E-VSM, bigram, trigram, n-gram, frequency count