International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 165 - Number 3
Year of Publication: 2017
Authors: Shailendra Singh Kathait, Shubhrita Tiwari
DOI: 10.5120/ijca2017913825
Shailendra Singh Kathait, Shubhrita Tiwari. Unsupervised Tagging of Chinese Articles. International Journal of Computer Applications. 165, 3 (May 2017), 29-32. DOI=10.5120/ijca2017913825
Many insights can be drawn from articles published online. Manually reading every article and assigning tags that fit its content is slow, so an automated process for this task would be far more efficient. This paper implements an unsupervised approach to the automated tagging of Chinese-language articles: the input is an article and the output is a set of tags for that article. The major challenge is segmenting Chinese text, which, unlike English, does not use separators between words. To overcome this, different approaches are combined to obtain accurate results. Efficient tagging of articles supports many analytical applications, one of which is a recommendation engine. The tagging process should consider all aspects of the article and assign the most relevant tags accordingly. The proposed algorithm was implemented for a Chinese publication house, and relevant tags were assigned to its articles across different categories. At the end of the project, the results were manually checked on a corpus of 10,000 Chinese articles, showing an overall accuracy of around 85%, higher than that obtained through different traditional methods.
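
To make the segmentation challenge concrete, the following is a minimal sketch, not the authors' algorithm (the abstract does not specify which approaches were combined). It assumes a simple dictionary-based forward maximum matching segmenter followed by frequency ranking of the resulting words. The function names, the toy dictionary `VOCAB`, the stop-word set `STOP`, and the sample sentence `TEXT` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match segmentation of unspaced text against a word list."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def top_tags(text, vocab, stopwords, k=2):
    """Rank multi-character, non-stop-word tokens by frequency as candidate tags."""
    counts = Counter(
        t for t in forward_max_match(text, vocab)
        if len(t) > 1 and t not in stopwords
    )
    return [word for word, _ in counts.most_common(k)]


# Hypothetical toy dictionary, stop words, and sentence (illustration only).
VOCAB = {"机器", "学习", "机器学习", "推荐", "系统", "推荐系统", "用于"}
STOP = {"用于"}
TEXT = "机器学习用于推荐系统机器学习"

print(forward_max_match(TEXT, VOCAB))
print(top_tags(TEXT, VOCAB, STOP))
```

Because the text carries no word separators, the segmenter has to decide where each word ends. Greedy longest matching is one common baseline for this, and it fails on ambiguous boundaries, which is one reason a system might combine several segmentation approaches.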