Research Article

Document Clustering in Forensic Investigation by Hybrid Approach

by G. Thilagavathi, J. Anitha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 91 - Number 3
Year of Publication: 2014
10.5120/15860-4784

G. Thilagavathi, J. Anitha. Document Clustering in Forensic Investigation by Hybrid Approach. International Journal of Computer Applications. 91, 3 (April 2014), 14-19. DOI=10.5120/15860-4784

@article{ 10.5120/15860-4784,
author = { G. Thilagavathi and J. Anitha },
title = { Document Clustering in Forensic Investigation by Hybrid Approach },
journal = { International Journal of Computer Applications },
issue_date = { April 2014 },
volume = { 91 },
number = { 3 },
month = { April },
year = { 2014 },
issn = { 0975-8887 },
pages = { 14-19 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume91/number3/15860-4784/ },
doi = { 10.5120/15860-4784 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:11:47.963606+05:30
%A G. Thilagavathi
%A J. Anitha
%T Document Clustering in Forensic Investigation by Hybrid Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 91
%N 3
%P 14-19
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Digital forensic investigation is the branch of forensic science concerned with examining material found on digital devices in connection with computer crime. Digital evidence of a particular incident is any digital data that supports or refutes a hypothesis about that incident. An essential part of the digital forensic process is analyzing the documents on a suspect's computer, but the growing number of documents and the increasing capacity of storage devices make this analysis very difficult. To address these problems, a hybrid approach is proposed that combines a subject-based semantic document clustering algorithm with bisecting k-means, allowing the examiner to analyze and cluster documents according to particular subjects as well as according to terms that do not belong to any subject. This hybrid approach improves the accuracy of document clustering.
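The abstract describes the hybrid approach only at a high level. The following Python is a minimal sketch, not the authors' implementation, of one way subject-based assignment could be paired with bisecting k-means. It assumes raw term-frequency vectors, cosine similarity as the document-subject similarity measure, examiner-supplied keyword lists for each subject, and a similarity cutoff of 0.2; these choices, and every function name (`tokenize`, `two_means`, `bisecting_kmeans`, `hybrid_cluster`), are hypothetical. Stemming and stop-word removal are omitted, and the bisection uses the basic variant that repeatedly splits the largest cluster with a single deterministic 2-means run.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphanumeric tokens. A fuller pipeline would also drop stop
    # words and apply a stemmer (e.g. Porter); both are omitted in this sketch.
    return re.findall(r"[a-z0-9]+", text.lower())

def vectorize(tokens):
    # Raw term-frequency vector stored as a sparse dict (assumed weighting).
    return dict(Counter(tokens))

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    # Component-wise mean of sparse vectors.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def two_means(indices, vecs, iters=10):
    """Split one cluster (len >= 2) into two non-empty parts with cosine 2-means."""
    a = indices[0]
    # Seed with the first document and the document least similar to it.
    b = min(indices[1:], key=lambda i: cosine(vecs[a], vecs[i]))
    c1, c2 = vecs[a], vecs[b]
    left, right = [a], indices[1:]  # fallback split if 2-means degenerates
    for _ in range(iters):
        new_left = [i for i in indices if cosine(vecs[i], c1) >= cosine(vecs[i], c2)]
        new_right = [i for i in indices if i not in new_left]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split or converged: keep the previous partition
        left, right = new_left, new_right
        c1 = centroid([vecs[i] for i in left])
        c2 = centroid([vecs[i] for i in right])
    return left, right

def bisecting_kmeans(vecs, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    if not vecs:
        return []
    clusters = [list(range(len(vecs)))]
    while len(clusters) < k:
        largest = max(clusters, key=len)
        if len(largest) < 2:
            break  # nothing left to split
        clusters.remove(largest)
        clusters.extend(two_means(largest, vecs))
    return clusters

def hybrid_cluster(docs, subjects, threshold=0.2, k=2):
    """Hypothetical hybrid: assign by subject similarity, then bisect the rest."""
    vecs = [vectorize(tokenize(d)) for d in docs]
    subject_vecs = {name: vectorize([t.lower() for t in terms])
                    for name, terms in subjects.items()}
    by_subject = {name: [] for name in subjects}
    unmatched = []
    for i, v in enumerate(vecs):
        best, score = max(((n, cosine(v, s)) for n, s in subject_vecs.items()),
                          key=lambda p: p[1], default=(None, 0.0))
        if best is not None and score >= threshold:
            by_subject[best].append(i)
        else:
            unmatched.append(i)
    # Documents that match no subject are grouped by bisecting k-means.
    groups = bisecting_kmeans([vecs[i] for i in unmatched], k)
    others = [[unmatched[j] for j in g] for g in groups]
    return by_subject, others

if __name__ == "__main__":
    docs = [
        "transfer funds to offshore account tonight",
        "wire the funds to the offshore bank account",
        "buy cocaine and heroin from the dealer",
        "meeting agenda for the football club",
        "football match schedule and club training",
        "recipe for chocolate cake with sugar",
        "bake the cake with chocolate and sugar",
    ]
    subjects = {
        "fraud": ["funds", "offshore", "account", "wire", "transfer"],
        "drugs": ["cocaine", "heroin", "dealer"],
    }
    by_subject, others = hybrid_cluster(docs, subjects)
    print(by_subject)
    print(others)
```

With these sample inputs, the two money-transfer documents are assigned to `fraud`, the drug document to `drugs`, and the four documents that match no subject are bisected into a sports pair and a baking pair. Seeding 2-means with the first document and its least similar neighbour keeps the sketch deterministic; bisecting k-means as commonly described instead tries several splits of the chosen cluster and keeps the best one.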

Index Terms

Computer Science
Information Sciences

Keywords

Digital Forensic, Stemming, Term Importance, Subject-based semantic clustering, Document-subject similarity