Research Article

Enhancing Semantic Understanding by Visualizing Sentence-Level Embeddings

by Akshata Upadhye
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 46
Year of Publication: 2023
Authors: Akshata Upadhye
10.5120/ijca2023923275

Akshata Upadhye. Enhancing Semantic Understanding by Visualizing Sentence-Level Embeddings. International Journal of Computer Applications. 185, 46 (Nov 2023), 20-24. DOI=10.5120/ijca2023923275

@article{ 10.5120/ijca2023923275,
author = { Akshata Upadhye },
title = { Enhancing Semantic Understanding by Visualizing Sentence-Level Embeddings },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2023 },
volume = { 185 },
number = { 46 },
month = { Nov },
year = { 2023 },
issn = { 0975-8887 },
pages = { 20-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume185/number46/33000-2023923275/ },
doi = { 10.5120/ijca2023923275 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:28:47.824098+05:30
%A Akshata Upadhye
%T Enhancing Semantic Understanding by Visualizing Sentence-Level Embeddings
%J International Journal of Computer Applications
%@ 0975-8887
%V 185
%N 46
%P 20-24
%D 2023
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Natural Language Processing and Machine Learning are advancing rapidly, and new language models and new architectures for training them are introduced frequently. These models can be applied to a wide range of tasks involving text data. With so many options available, researchers need the right tools to evaluate them. Visualization is one such tool: it helps researchers understand the semantic relationships within the data used to train these models, and it can show whether the language model used to extract features from the text captures those relationships. Because text data is typically high dimensional, dimensionality reduction is needed before it can be visualized. This paper therefore discusses several dimensionality reduction techniques and demonstrates how UMAP can be used to reduce the dimensionality of sentence-level embeddings for visualization.
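
The general workflow described above (embed sentences, reduce to two dimensions, inspect the layout) can be illustrated with a minimal, standard-library-only sketch. It is not the paper's demonstration: it swaps in PCA (reference 1) for UMAP and toy bag-of-words count vectors for learned sentence embeddings, so that it runs without third-party packages. The example sentences and all function names (`embed`, `pca_2d`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def embed(sentences):
    """Toy stand-in for a sentence encoder: bag-of-words count vectors."""
    tokens = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokens for w in toks})
    counts = [Counter(toks) for toks in tokens]
    return [[float(c[w]) for w in vocab] for c in counts]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_direction(rows, iters=500, seed=0):
    """Power iteration for the top eigenvector of X^T X (rows = X)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in rows[0]]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        scores = [dot(row, v) for row in rows]        # X v
        w = [dot(scores, col) for col in zip(*rows)]  # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project rows onto their first two principal directions."""
    x = center(rows)
    v1 = leading_direction(x)
    # Deflate: remove the part of every row that lies along v1,
    # so the next power iteration finds the second direction.
    residual = [[a - dot(row, v1) * b for a, b in zip(row, v1)] for row in x]
    v2 = leading_direction(residual)
    return [(dot(row, v1), dot(row, v2)) for row in x], (v1, v2)


# Hypothetical example sentences: two about cats, two about markets.
sentences = [
    "the cat sat on the mat",
    "a cat lay on the mat",
    "stocks fell as markets closed",
    "markets rose as stocks rallied",
]

coords, (v1, v2) = pca_2d(embed(sentences))
for sentence, (px, py) in zip(sentences, coords):
    print(f"{px:+.2f} {py:+.2f}  {sentence}")
```

In the printed coordinates, sentences on the same topic should land closer to each other than to sentences on the other topic, which is the kind of semantic structure a 2-D plot is meant to reveal. PCA is linear; a nonlinear method such as UMAP, applied to embeddings from a trained language model, would replace both stand-ins in a realistic setting.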

References
  1. Jolliffe, Ian T., and Jorge Cadima. "Principal component analysis: a review and recent developments." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, no. 2065 (2016): 20150202.
  2. Ghojogh, Benyamin, Ali Ghodsi, Fakhri Karray, and Mark Crowley. "Multidimensional scaling, Sammon mapping, and Isomap: Tutorial and survey." arXiv preprint arXiv:2009.08136 (2020).
  3. Van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9, no. 11 (2008).
  4. Di Giovanni, Daniele, Roberto Enea, Valentina Di Micco, Arianna Benvenuto, Paolo Curatolo, and Leonardo Emberti Gialloreti. "Using machine learning to explore shared genetic pathways and possible endophenotypes in autism spectrum disorder." Genes 14, no. 2 (2023): 313.
  5. McInnes, Leland, John Healy, and James Melville. "UMAP: Uniform manifold approximation and projection for dimension reduction." arXiv preprint arXiv:1802.03426 (2018).
  6. Becht, Etienne, Leland McInnes, John Healy, Charles-Antoine Dutertre, Immanuel WH Kwok, Lai Guan Ng, Florent Ginhoux, and Evan W. Newell. "Dimensionality reduction for visualizing single-cell data using UMAP." Nature Biotechnology 37, no. 1 (2019): 38-44.
  7. Dorrity, Michael W., Lauren M. Saunders, Christine Queitsch, Stanley Fields, and Cole Trapnell. "Dimensionality reduction by UMAP to visualize physical and genetic interactions." Nature Communications 11, no. 1 (2020): 1537.
  8. Diaz-Papkovich, Alex, Luke Anderson-Trocmé, and Simon Gravel. "A review of UMAP in population genetics." Journal of Human Genetics 66, no. 1 (2021): 85-91.
  9. Diaz-Papkovich, Alex, Luke Anderson-Trocmé, Chief Ben-Eghan, and Simon Gravel. "UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts." PLoS Genetics 15, no. 11 (2019): e1008432.
  10. Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." In International Conference on Machine Learning, pp. 1188-1196. PMLR, 2014.
Index Terms

Computer Science
Information Sciences

Keywords

UMAP, Sentence embeddings, Text data visualization, Semantic structure, Natural language processing, Dimensionality reduction