Research Article

Agent for Documents Clustering using Semantic-based Model and Fuzzy

by Khaled M. Fouad, Moataz O. Hassan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 62 - Number 3
Year of Publication: 2013
Authors: Khaled M. Fouad, Moataz O. Hassan
10.5120/10059-4651

Khaled M. Fouad, Moataz O. Hassan. Agent for Documents Clustering using Semantic-based Model and Fuzzy. International Journal of Computer Applications. 62, 3 (January 2013), 10-16. DOI=10.5120/10059-4651

@article{ 10.5120/10059-4651,
author = { Khaled M. Fouad, Moataz O. Hassan },
title = { Agent for Documents Clustering using Semantic-based Model and Fuzzy },
journal = { International Journal of Computer Applications },
issue_date = { January 2013 },
volume = { 62 },
number = { 3 },
month = { January },
year = { 2013 },
issn = { 0975-8887 },
pages = { 10-16 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume62/number3/10059-4651/ },
doi = { 10.5120/10059-4651 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:11:07.702833+05:30
%A Khaled M. Fouad
%A Moataz O. Hassan
%T Agent for Documents Clustering using Semantic-based Model and Fuzzy
%J International Journal of Computer Applications
%@ 0975-8887
%V 62
%N 3
%P 10-16
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large sets of documents into a small number of meaningful clusters. Many clustering algorithms, such as K-means, treat documents as a bag of words. This representation is often unsatisfactory because it ignores the semantics of words. The proposed agent exploits the WordNet ontology to create a low-dimensional feature vector, which allows an efficient clustering algorithm to be developed. A new semantic-based model that represents documents by the semantic concepts of their words is proposed. The approach aims to improve the performance of the information retrieval process by enhancing document clustering. The accuracy and speed of clustering were examined before and after combining the ontology with the Vector Space Model (VSM). Experimental results demonstrate that using the semantic-based model with fuzzy clustering enhances the clustering quality of document sets.
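
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the CONCEPTS table is a hypothetical stand-in for WordNet lookups, the sample documents are invented, and the clustering step is a plain textbook fuzzy c-means written for this illustration. Words are mapped to concepts before counting, so synonyms share one dimension and the feature vector stays small; fuzzy c-means then assigns each document a degree of membership in every cluster.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table; a real system would query WordNet instead.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "dog": "animal", "puppy": "animal", "cat": "animal",
}
AXES = sorted(set(CONCEPTS.values()))  # one dimension per concept


def concept_vector(text):
    """Count concepts (not raw words) and L2-normalise the result."""
    counts = Counter(CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS)
    vec = [float(counts[a]) for a in AXES]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-9):
    """Textbook fuzzy c-means; returns (memberships, centers)."""
    c, dims = len(centers), len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        u = []
        for p in points:
            d = [math.dist(p, ctr) + eps for ctr in centers]
            u.append([
                1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for i in range(c)
            ])
        # Center update: mean of the points weighted by u_ij^m
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers.append([
                sum(wj * p[x] for wj, p in zip(w, points)) / total
                for x in range(dims)
            ])
    return u, centers


# Invented sample documents: two about vehicles, two about animals, one mixed.
docs = [
    "the car and the truck",
    "automobile truck car",
    "dog and puppy",
    "cat dog puppy",
    "dog in a car",
]
points = [concept_vector(d) for d in docs]
# Seed the two clusters with one vehicle document and one animal document.
memberships, centers = fuzzy_c_means(points, [points[0], points[2]])
for doc, row in zip(docs, memberships):
    print(doc, [round(x, 3) for x in row])
```

In this toy setup "car", "automobile" and "truck" all fall on the same axis, which a bag-of-words vector would treat as three unrelated dimensions. The mixed document receives partial membership in both clusters, which is the behaviour that distinguishes fuzzy clustering from a hard assignment such as K-means.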

Index Terms

Computer Science
Information Sciences

Keywords

Document Clustering Semantic Text Representation Agent WordNet