International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 30
Year of Publication: 2024
Authors: Geetha Ramani, Joseph Samuel |
10.5120/ijca2024923834 |
Geetha Ramani, Joseph Samuel. Deciphering Indus Scripts through Clustering Techniques and Frequency Analysis. International Journal of Computer Applications. 186, 30 (Jul 2024), 5-17. DOI=10.5120/ijca2024923834
This work approaches the decipherment of the Indus script through a methodology that integrates clustering analysis, comparison with Tamil Brahmi, identification of primary, secondary, and composite symbols, N-gram analysis, and grammar analysis. The clustering stage employs four algorithms: K-means, Agglomerative, BIRCH, and Spectral Clustering. Combining their outputs through voting identifies coherent patterns within the Indus script, which supports a closer study of its structural and semantic properties. Next, the Indus script symbols are compared with those of Tamil Brahmi to explore potential linguistic connections and cultural influences. Primary, secondary, and composite symbols within the Indus script corpus are then identified, shedding light on their hierarchical usage and contextual coherence. This hierarchical classification clarifies the script's semantic organization and usage patterns and offers insight into its communicative capabilities and linguistic conventions. N-gram analysis is then used for predictive modeling of symbol sequences, with the aim of uncovering structures and linguistic patterns encoded in the corpus. This analysis yields a list of influential signs, offering fresh perspectives on the script's symbolic and cultural significance. The research also analyzes the grammatical aspects of the Indus script. Frequency analysis of symbol co-occurrence within the corpus uncovers recurring patterns and potential grammatical markers. Pattern recognition and contextual analysis then identify linguistic patterns, giving a deeper understanding of the script's structural and semantic properties.
Contextualizing these patterns within the inscriptions and comparing them with known linguistic structures is intended to help decipher the grammar encoded in the script. Overall, this interdisciplinary approach represents a significant milestone in the ongoing effort to decipher the Indus script, providing methodologies and insights for future research in ancient linguistics and archaeology. Notably, this research marks the first application of clustering algorithms to the Indus script, pioneering a novel approach to decipherment. The analysis is conducted on the Interactive Corpus of Indus Texts (ICIT), which comprises 694 symbols and provides a robust foundation for the investigation.
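To make two of the steps named in the abstract more concrete, the sketch below illustrates one possible way to combine several cluster labelings by voting, followed by plain N-gram counting over symbol sequences. It is not the authors' implementation: the abstract does not state the voting rule, so a pairwise strict-majority rule is assumed here, and the function names `consensus_clusters` and `ngram_counts`, the toy labelings, and the placeholder signs `"A"` to `"D"` are all hypothetical (they are not ICIT sign numbers or real clustering outputs).

```python
from collections import Counter
from itertools import combinations


def consensus_clusters(labelings):
    """Group items that a strict majority of labelings place in the same cluster.

    Assumed voting rule: two items are linked when more than half of the
    labelings give them the same label. Linked items are merged transitively
    (connected components), so a chain of pairwise links joins a whole group.
    """
    n_items = len(labelings[0])
    parent = list(range(n_items))  # union-find forest, one node per item

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(range(n_items), 2):
        votes = sum(1 for labels in labelings if labels[a] == labels[b])
        if 2 * votes > len(labelings):
            parent[find(a)] = find(b)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def ngram_counts(sequences, n):
    """Count every run of n consecutive signs across all inscriptions."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


# Hypothetical labelings of five items, standing in for the outputs of four
# clustering algorithms. Label values need not match across rows, because
# only "same cluster or not" is compared within each row.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [2, 2, 0, 0, 1],
]
consensus = consensus_clusters(labelings)

# Hypothetical inscriptions written with placeholder signs.
inscriptions = [["A", "B", "C"], ["B", "C"], ["A", "B", "C", "D"]]
bigrams = ngram_counts(inscriptions, 2)
top_bigram = bigrams.most_common(1)
```

With this toy input, items 0 and 1 end up together, items 2 and 3 end up together, and item 4 stays alone because only two of the four labelings group it with items 2 and 3, which is not a strict majority. Comparing pairs rather than raw label values sidesteps the problem that different algorithms number their clusters arbitrarily. The bigram counter shows the kind of frequency table from which recurring sign pairs could be read off.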