Text Document Tokenization for Word Frequency Count using Rapid miner


Research Article


Published on August 2015 by Gaurav Gupta, Sumit Malhotra

International Conference on Advancements in Engineering and Technology

Foundation of Computer Science USA

ICAET2015 - Number 12

August 2015

Authors: Gaurav Gupta, Sumit Malhotra

Gaurav Gupta, Sumit Malhotra. Text Document Tokenization for Word Frequency Count using Rapid miner. International Conference on Advancements in Engineering and Technology. ICAET2015, 12 (August 2015), 24-26.

@article{
author = { Gaurav Gupta and Sumit Malhotra },
title = { Text Document Tokenization for Word Frequency Count using Rapid miner },
journal = { International Conference on Advancements in Engineering and Technology },
issue_date = { August 2015 },
volume = { ICAET2015 },
number = { 12 },
month = { August },
year = { 2015 },
issn = { 0975-8887 },
pages = { 24-26 },
numpages = 3,
url = { /proceedings/icaet2015/number12/22291-4172/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Proceeding Article
%1 International Conference on Advancements in Engineering and Technology
%A Gaurav Gupta
%A Sumit Malhotra
%T Text Document Tokenization for Word Frequency Count using Rapid miner
%J International Conference on Advancements in Engineering and Technology
%@ 0975-8887
%V ICAET2015
%N 12
%P 24-26
%D 2015
%I International Journal of Computer Applications

Abstract

Text mining, sometimes referred to as text data mining, is roughly equivalent to text analytics, which refers to the process of deriving high-quality information from text. RapidMiner is a widely used open-source system for data mining. It is available both as a stand-alone application for data analysis and as a data mining engine that can be integrated into other products. Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. A word frequency counter counts how many times each word occurs in a document. Applying tokenization and word frequency counting to a text document (a resume in this case) shows how often each word occurs in the document, but there is no provision for finding the frequency of a particular word chosen by the user.
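The tokenize-and-count idea described in the abstract, together with the per-word lookup it notes is missing, can be sketched outside RapidMiner. The following is a minimal Python sketch, not the authors' RapidMiner process. It assumes that tokens are runs of letters and that words are compared case-insensitively (roughly what a lower-case transformation followed by a split on non-letter characters would give). The function names `tokenize`, `word_frequencies` and `frequency_of` and the sample resume text are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into tokens, keeping only runs of letters.

    Assumption: this approximates a "split on non-letters" tokenizer;
    it is not RapidMiner's own implementation.
    """
    # [^\W\d_]+ matches runs of Unicode letters
    # (word characters minus digits and underscore).
    return re.findall(r"[^\W\d_]+", text)


def word_frequencies(text: str) -> Counter:
    """Lower-case every token (a case transformation) and count occurrences."""
    return Counter(token.lower() for token in tokenize(text))


def frequency_of(word: str, freqs: Counter) -> int:
    """Return how often a user-chosen word occurs; 0 if it never appears."""
    # Counter returns 0 for missing keys without inserting them.
    return freqs[word.lower()]


if __name__ == "__main__":
    # Hypothetical resume fragment used only for illustration.
    resume = "Skills: Java, Python, SQL. Projects in Python and Java; Python scripting."
    freqs = word_frequencies(resume)
    for word, count in freqs.most_common(3):
        print(word, count)
    print("python ->", frequency_of("Python", freqs))
```

Because `frequency_of` reads from the same `Counter`, answering a query for any user-chosen word costs only a dictionary lookup once the document has been counted.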


Index Terms

Computer Science

Information Sciences

Keywords

RapidMiner, RapidMiner Text Processing, RapidMiner Process Documents from Files Operator, RapidMiner Transform Cases Operator, RapidMiner Tokenize Operator.