Research Article

A Methodology for the Usage of Side Data in Content Mining

by Solunke B.R., Priyanka S. Muttur, Amol U. Kuntham, Seema S. Chavan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 112 - Number 6
Year of Publication: 2015
DOI: 10.5120/19667-1101

Solunke B.R., Priyanka S. Muttur, Amol U. Kuntham, Seema S. Chavan. A Methodology for the Usage of Side Data in Content Mining. International Journal of Computer Applications 112, 6 (February 2015), 1-8. DOI=10.5120/19667-1101

@article{ 10.5120/19667-1101,
author = { Solunke, B. R. and Muttur, Priyanka S. and Kuntham, Amol U. and Chavan, Seema S. },
title = { A Methodology for the Usage of Side Data in Content Mining },
journal = { International Journal of Computer Applications },
issue_date = { February 2015 },
volume = { 112 },
number = { 6 },
month = { February },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume112/number6/19667-1101/ },
doi = { 10.5120/19667-1101 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%A Solunke B.R.
%A Priyanka S. Muttur
%A Amol U. Kuntham
%A Seema S. Chavan
%T A Methodology for the Usage of Side Data in Content Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 112
%N 6
%P 1-8
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In many text mining applications, side-information is available along with the text documents. Such side-information may be of different kinds, for example document provenance information, the links in a document, user-access behavior from web logs, or other non-textual attributes embedded in the text document. These attributes may contain a tremendous amount of information for clustering purposes. However, the relative importance of this side-information may be difficult to estimate, especially when some of the information is noisy. In such cases it can be risky to incorporate side-information into the mining process, because it can either improve the quality of the representation for the mining process or add noise to it. Therefore, a principled way of performing the mining process is needed, so as to maximize the advantages of using this side-information. In this paper, we design an algorithm that combines classical partitioning algorithms with probabilistic models in order to create an effective clustering approach. We then show how to extend the approach to the classification problem. We present experimental results on a number of real data sets to illustrate the advantages of using such an approach.
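The abstract states the idea only at a high level. As a hedged illustration, and explicitly not the authors' algorithm, the sketch below shows one way a classical partitioning step can be mixed with a probabilistic model of side-information. It assumes each document is a term-count dictionary and its side-information is a set of categorical attributes (tags, links, and so on). Each pass reassigns a document to the cluster that maximizes a weighted sum of (a) cosine similarity to the cluster's text centroid (a k-means-style step) and (b) a Laplace-smoothed naive-Bayes posterior over clusters computed from its attributes. The function names, the mixing parameter `weight`, and the toy data are all hypothetical; `weight` is simply one crude knob for limiting how much noisy side-information can override the text.

```python
# Hypothetical sketch: NOT the paper's algorithm, only an illustration of
# mixing a partitioning (k-means-style) step with a probabilistic side-data model.
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of sparse vectors; an empty cluster gets an empty centroid."""
    if not vectors:
        return {}
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def side_posterior(i, attrs, labels, k):
    """P(cluster | side attributes of document i), naive-Bayes style.

    Cluster statistics exclude document i (leave-one-out) and are
    Laplace-smoothed, so no probability is ever zero.
    """
    n = len(labels)
    sizes = Counter(lab for j, lab in enumerate(labels) if j != i)
    counts = [Counter() for _ in range(k)]
    for j, lab in enumerate(labels):
        if j != i:
            counts[lab].update(attrs[j])
    logs = []
    for c in range(k):
        ll = math.log((sizes[c] + 1) / (n - 1 + k))  # smoothed cluster prior
        for a in attrs[i]:
            # smoothed P(attribute a present | cluster c)
            ll += math.log((counts[c][a] + 1) / (sizes[c] + 2))
        logs.append(ll)
    top = max(logs)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - top) for x in logs]
    z = sum(exps)
    return [e / z for e in exps]


def cluster_with_side_info(texts, attrs, labels, k, weight=0.5, max_iter=10):
    """Reassign each document to the cluster maximizing
    weight * cosine(text, centroid) + (1 - weight) * P(cluster | attrs).

    weight=1.0 ignores side information (text-only, k-means-style);
    lower values trust the side attributes more.
    """
    labels = list(labels)
    for _ in range(max_iter):
        cents = [centroid([t for t, lab in zip(texts, labels) if lab == c])
                 for c in range(k)]
        new = []
        for i, text in enumerate(texts):
            post = side_posterior(i, attrs, labels, k)
            scores = [weight * cosine(text, cents[c]) + (1 - weight) * post[c]
                      for c in range(k)]
            new.append(max(range(k), key=scores.__getitem__))
        if new == labels:  # converged: no document changed cluster
            break
        labels = new
    return labels


# Toy data: term counts per document, plus one side attribute (a tag) each.
texts = [Counter({"ball": 2, "game": 1}), Counter({"ball": 1, "team": 1}),
         Counter({"stock": 2, "market": 1}), Counter({"stock": 1, "price": 1}),
         Counter({"game": 1, "price": 1})]
attrs = [{"sports"}, {"sports"}, {"finance"}, {"finance"}, {"sports"}]
init = [0, 0, 1, 1, 1]

print(cluster_with_side_info(texts, attrs, init, k=2, weight=1.0))  # [0, 0, 1, 1, 1]
print(cluster_with_side_info(texts, attrs, init, k=2, weight=0.5))  # [0, 0, 1, 1, 0]
```

With `weight=1.0` the ambiguous fifth document (words "game" and "price") stays in the finance cluster where it started; at `weight=0.5` its "sports" tag moves it to the sports cluster. The leave-one-out statistics keep a document's own attributes from voting for its current cluster, but recomputing them per document costs O(n^2) per pass, which is acceptable only for toy inputs like this one.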

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Data mining, Text mining