Research Article

Deciphering Indus Scripts through Clustering Techniques and Frequency Analysis

by Geetha Ramani, Joseph Samuel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 30
Year of Publication: 2024
Authors: Geetha Ramani, Joseph Samuel
10.5120/ijca2024923834

Geetha Ramani, Joseph Samuel. Deciphering Indus Scripts through Clustering Techniques and Frequency Analysis. International Journal of Computer Applications. 186, 30 (Jul 2024), 5-17. DOI=10.5120/ijca2024923834

@article{ 10.5120/ijca2024923834,
author = { Geetha Ramani and Joseph Samuel },
title = { Deciphering Indus Scripts through Clustering Techniques and Frequency Analysis },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2024 },
volume = { 186 },
number = { 30 },
month = { Jul },
year = { 2024 },
issn = { 0975-8887 },
pages = { 5-17 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number30/deciphering-indus-scripts-through-clustering-techniques-and-frequency-analysis/ },
doi = { 10.5120/ijca2024923834 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-07-26T23:00:35.466808+05:30
%A Geetha Ramani
%A Joseph Samuel
%T Deciphering Indus Scripts through Clustering Techniques and Frequency Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 30
%P 5-17
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This work approaches the decipherment of the Indus script through a methodology that combines clustering analysis, comparative analysis with Tamil Brahmi, identification of primary, secondary, and composite symbols, N-gram analysis, and grammar analysis. The clustering stage employs four algorithms: K-means, Agglomerative, BIRCH, and Spectral Clustering. Combining the outputs of these algorithms through voting identifies coherent patterns within the Indus script, which supports closer study of its structural and semantic properties. A comparative analysis of the Indus script symbols with those of Tamil Brahmi then explores potential linguistic connections and cultural influences. Next, primary, secondary, and composite symbols within the Indus script corpus are identified, shedding light on their hierarchical usage and contextual coherence. This hierarchical classification clarifies the script's semantic organization and usage patterns. Finally, N-gram analysis is used for predictive modeling of symbol sequences, with the aim of uncovering structures and linguistic patterns encoded in the corpus. This analysis yields a list of influential signs.
The research also analyzes the grammatical aspects of the Indus script. Frequency analysis of the co-occurrence of symbols within the corpus uncovers recurring patterns and potential grammatical markers. Pattern recognition and contextual analysis then identify linguistic patterns in the script. By placing these patterns in the context of the inscriptions and comparing them with known linguistic structures, the work aims to decipher the grammar encoded in the script. Overall, this interdisciplinary approach contributes methodologies and insights for future research in ancient linguistics and archaeology. Notably, this research marks the first application of clustering algorithms to the Indus script. The analysis is conducted on the Interactive Corpus of Indus Texts (ICIT), which comprises 694 symbols.
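
The abstract states that the outputs of four clustering algorithms are combined through voting, but it does not specify the voting scheme. The following is a minimal sketch of one common possibility, pairwise co-association voting: two signs are grouped together when at least `min_votes` algorithms place them in the same cluster. The function name `consensus_clusters`, the `min_votes` parameter, and the toy label lists are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def consensus_clusters(labelings, min_votes):
    """Group items that at least `min_votes` labelings place in the same cluster.

    `labelings` is a list of equal-length label lists, one per algorithm.
    Cluster ids need not match across algorithms: only co-membership is compared.
    """
    n_items = len(labelings[0])
    if any(len(labels) != n_items for labels in labelings):
        raise ValueError("every labeling must cover the same items")

    # Union-find over items; a pair is merged when enough algorithms agree on it.
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n_items), 2):
        votes = sum(1 for labels in labelings if labels[i] == labels[j])
        if votes >= min_votes:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n_items):
        groups[find(i)].append(i)
    # Sort the groups so the result is deterministic.
    return sorted(groups.values())


# Hypothetical labels for six signs from four algorithms (toy data, not from the paper).
toy_labelings = [
    [0, 0, 0, 1, 1, 1],  # e.g. K-means
    [1, 1, 1, 0, 0, 0],  # e.g. Agglomerative (same partition, different ids)
    [0, 0, 1, 1, 2, 2],  # e.g. BIRCH
    [0, 0, 0, 1, 1, 2],  # e.g. Spectral
]
result = consensus_clusters(toy_labelings, min_votes=3)
print(result)
```

Because merging is transitive, two signs can end up in the same consensus group through a chain of agreeing pairs even when fewer than `min_votes` algorithms link them directly. A stricter scheme would require every pair in a group to reach the threshold.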

Index Terms

Computer Science
Information Sciences

Keywords

Indus script, Deciphering, predictive modeling, N-gram analysis, Comparative analysis, Tamil Brahmi, Clustering analysis, Grammar Analysis