Research Article

Enhancing Data Authenticity: Leveraging Humanities Annotation Practices for NLP

by Urmishree Bedamatta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 62
Year of Publication: 2025
Authors: Urmishree Bedamatta
DOI: 10.5120/ijca2025924461

Urmishree Bedamatta. Enhancing Data Authenticity: Leveraging Humanities Annotation Practices for NLP. International Journal of Computer Applications. 186, 62 (Jan 2025), 34-37. DOI=10.5120/ijca2025924461

@article{ 10.5120/ijca2025924461,
author = { Urmishree Bedamatta },
title = { Enhancing Data Authenticity: Leveraging Humanities Annotation Practices for NLP },
journal = { International Journal of Computer Applications },
issue_date = { Jan 2025 },
volume = { 186 },
number = { 62 },
month = { Jan },
year = { 2025 },
issn = { 0975-8887 },
pages = { 34-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number62/enhancing-data-authenticity-leveraging-humanities-annotation-practices-for-nlp/ },
doi = { 10.5120/ijca2025924461 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-01-28T19:07:12.382070+05:30
%A Urmishree Bedamatta
%T Enhancing Data Authenticity: Leveraging Humanities Annotation Practices for NLP
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 62
%P 34-37
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper explores how practices from textual criticism, traditionally a core part of humanities research, can enhance the authenticity and interpretability of linguistic data for Natural Language Processing (NLP) applications. It proposes a multi-layered annotation model and argues that annotations extending beyond syntactic and semantic labels to cover historical, cultural, and rhetorical contexts can give NLP systems a deeper, context-aware understanding of language. Drawing on examples from the digital edition of the Odia Mahabharata, the paper illustrates how annotations that capture word evolution, cultural nuances, and stylistic choices can mitigate challenges in transcription while preserving the authenticity of texts. It further demonstrates how such annotation practices help NLP systems handle linguistic subtleties such as ambiguity, irony, and sentiment, making them more effective for complex tasks like machine translation, sentiment analysis, and content generation. Ultimately, the study contends that integrating humanities-driven annotation practices into NLP can both improve the quality of computational models and ensure the preservation and accessibility of culturally and historically significant language forms.
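
As a rough illustration of what a multi-layered annotation record might look like in practice, the following minimal Python sketch stores standoff annotations (character spans plus a layer name) over a piece of text. It is not the model proposed in the paper: the layer names are taken loosely from the abstract's wording, and the class names `Annotation` and `AnnotatedText`, the method `layers_for`, the sample sentence, and all labels are hypothetical placeholders invented for this example.

```python
from dataclasses import dataclass, field

# Layer names are illustrative assumptions based on the abstract's wording,
# not a schema defined by the paper.
LAYERS = ("syntactic", "semantic", "historical", "cultural", "rhetorical")


@dataclass(frozen=True)
class Annotation:
    layer: str      # one of LAYERS
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive
    label: str      # short tag for the span
    note: str = ""  # optional free-text editorial comment


@dataclass
class AnnotatedText:
    text: str
    annotations: list = field(default_factory=list)

    def add(self, layer, start, end, label, note=""):
        """Attach one annotation to the span [start, end) on the given layer."""
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span out of range")
        self.annotations.append(Annotation(layer, start, end, label, note))

    def layers_for(self, start, end):
        """Return {layer: [labels]} for annotations overlapping [start, end)."""
        result = {}
        for a in self.annotations:
            if a.start < end and start < a.end:
                result.setdefault(a.layer, []).append(a.label)
        return result


# Placeholder sentence and labels, invented purely for illustration.
doc = AnnotatedText("the old king spoke")
doc.add("syntactic", 8, 12, "NOUN")
doc.add("historical", 8, 12, "archaic-spelling-variant", note="hypothetical")
doc.add("rhetorical", 0, 18, "formal-register")

# All layers that touch the word "king" (characters 8 to 12).
king_layers = doc.layers_for(8, 12)
```

Keeping the annotations separate from the text (standoff style) is one possible design choice here: it leaves the transcription untouched, which fits the abstract's concern with preserving the authenticity of the source, and lets any number of layers overlap on the same span.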

Index Terms

Computer Science
Information Sciences
Natural Language Processing

Keywords

Textual criticism
Multi-layered annotation
Odia Mahabharata
Natural language processing
Digital humanities