International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 131 - Number 4
Year of Publication: 2015
Authors: Kritika Bhowmik, Tejal Aher, Vaibhav Kale, K. Rajeswari, M. Karthikeyan
DOI: 10.5120/ijca2015907296
Kritika Bhowmik, Tejal Aher, Vaibhav Kale, K. Rajeswari, M. Karthikeyan. Hadoop based Text Mining System for Identification of Chemicals Associated with Disease of Interest. International Journal of Computer Applications. 131, 4 (December 2015), 26-29. DOI=10.5120/ijca2015907296
With huge amounts of biomedical data being generated every day, manually extracting statistical information about the chemicals mentioned in such large databases is tedious and time consuming. Our system, designed mainly for non-expert users, aims to automate data collection and knowledge extraction from chemical literature in a user-friendly and efficient way on the Hadoop platform. The system downloads the abstracts related to the disease of interest from the PubMed database. The text of the abstracts is then parsed extensively for chemicals, such as protein/gene names and chemical compound names, and the extracted entities are classified into different classes. This analysis should prove helpful to the biomedical and pharmaceutical industries. Important information is extracted using the LingPipe API: a training dataset is supplied to LingPipe, which then classifies the extracted bioentities into their respective classes. Because it is deployed on the Hadoop platform, the system is scalable and distributed and can process a large number of abstracts in a short time. The system also provides a user-friendly interface so that non-technical users can easily use the Hadoop system.
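The counting pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a small hypothetical lexicon (`LEXICON`) stands in for the trained LingPipe classifier, the two sample abstracts are invented for illustration, and the map and reduce steps run in a single Python process rather than on a Hadoop cluster. All function and variable names are assumptions.

```python
from collections import Counter
from itertools import chain
import re

# Hypothetical toy lexicon mapping entity names to classes.
# The paper instead trains a LingPipe model; this lookup is only a stand-in.
LEXICON = {
    "insulin": "protein/gene",
    "tp53": "protein/gene",
    "metformin": "chemical compound",
    "glucose": "chemical compound",
}

TOKEN = re.compile(r"[A-Za-z0-9]+")


def map_abstract(text):
    """Map step: emit ((entity, class), 1) for each lexicon hit in one abstract."""
    for token in TOKEN.findall(text.lower()):
        if token in LEXICON:
            yield (token, LEXICON[token]), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each (entity, class) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_job(abstracts):
    """Run the map step over every abstract, then reduce all emitted pairs."""
    mapped = chain.from_iterable(map_abstract(a) for a in abstracts)
    return reduce_counts(mapped)


# Invented sample abstracts, used only to exercise the sketch.
ABSTRACTS = [
    "Metformin lowers glucose and improves insulin sensitivity.",
    "Insulin resistance alters glucose uptake.",
]

counts = run_job(ABSTRACTS)
for (entity, label), n in sorted(counts.items()):
    print(f"{entity}\t{label}\t{n}")
```

Keying the emitted pairs on (entity, class) means the reduce step yields per-entity mention counts that can also be grouped by class, which is the kind of statistical summary the abstract refers to. In a real deployment the map function would run in parallel over partitions of the downloaded abstracts.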