International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 62
Year of Publication: 2025
Authors: Urmishree Bedamatta
DOI: 10.5120/ijca2025924461
Urmishree Bedamatta. Enhancing Data Authenticity: Leveraging Humanities Annotation Practices for NLP. International Journal of Computer Applications. 186, 62 (Jan 2025), 34-37. DOI=10.5120/ijca2025924461
This paper explores how textual criticism practices, traditionally a core part of humanities research, can enhance the authenticity and interpretability of linguistic data for Natural Language Processing (NLP) applications. It proposes a multi-layered annotation model and argues that annotations extending beyond syntactic and semantic labels to cover historical, cultural, and rhetorical contexts can give NLP systems a deeper, context-aware understanding of language. Drawing on examples from the digital edition of the Odia Mahabharata, the paper illustrates how annotations that capture word evolution, cultural nuances, and stylistic choices can mitigate challenges in transcription while preserving the authenticity of texts. The paper further demonstrates how such annotation practices help NLP systems handle linguistic subtleties such as ambiguity, irony, and sentiment, making them more effective for complex tasks like machine translation, sentiment analysis, and content generation. Ultimately, the study argues that integrating humanities-driven annotation practices into NLP can both improve the quality of computational models and ensure the preservation and accessibility of culturally and historically significant language forms.
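As a rough illustration of what a multi-layered annotation record might look like in practice, the following minimal Python sketch attaches notes to a token under separate layers. The layer names are taken from the contexts listed in the abstract; the class name `AnnotatedToken`, its methods, and the placeholder token and notes are assumptions made for illustration only and are not drawn from the paper or from the Odia Mahabharata edition.

```python
from dataclasses import dataclass, field

# Layer names follow the contexts named in the abstract; treating them as a
# fixed, ordered tuple is an assumption of this sketch.
LAYERS = ("syntactic", "semantic", "historical", "cultural", "rhetorical")


@dataclass
class AnnotatedToken:
    """Hypothetical record: one surface form plus notes grouped by layer."""
    surface: str
    annotations: dict = field(default_factory=dict)

    def annotate(self, layer: str, note: str) -> None:
        # Reject layers outside the assumed model so typos do not go unnoticed.
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.annotations.setdefault(layer, []).append(note)

    def layers_present(self) -> list:
        # Report annotated layers in the canonical order defined by LAYERS.
        return [layer for layer in LAYERS if layer in self.annotations]


# Placeholder content only; no real edition data is used here.
token = AnnotatedToken("WORD")
token.annotate("syntactic", "noun")
token.annotate("historical", "placeholder: note on an earlier spelling")
print(token.layers_present())
```

Keeping each layer as a separate list means a downstream NLP pipeline could consume only the layers it needs (for example, syntactic labels alone) without discarding the historical or cultural notes stored alongside them.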