Analyzing Semantic Properties of Word Embeddings Using Eigenvectors

Bassma Abdlrazg; Hanan Atetalla

doi:10.54361/ajmas.247424

Authors

Bassma Abdlrazg, Department of Mathematics, University of Omar Al-Mokhtar, Albaida, Libya. https://orcid.org/0000-0002-7173-2028
Hanan Atetalla, Department of Mathematics, University of Omar Al-Mokhtar, Albaida, Libya.

Abstract

Dense word vectors have proven effective in many downstream natural language processing (NLP) tasks in recent years. Nevertheless, the dimensions of these embeddings remain difficult to interpret. In this paper, we investigate how eigenvectors can reveal the different semantic properties captured by word embedding models. To this end, we train word embeddings (e.g., Word2Vec) on an English Wikipedia corpus, analyze the top eigenvectors, identify the specific semantic properties (e.g., sentiment, formality) associated with each, and explore how these properties are encoded in the embedding space. The paper also discusses the limitations and potential benefits of this approach compared with other methods for analyzing word embeddings.
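The abstract does not specify which matrix the eigenvectors are taken from. The sketch below assumes the eigenvectors of the covariance matrix of mean-centred word vectors (i.e., a principal component analysis), and it uses a hypothetical set of toy 3-dimensional vectors in place of trained Word2Vec embeddings. For each top eigenvector it lists the words at the two extremes of their projections, which is one simple way to attach a semantic label such as sentiment to a direction. It is written in pure Python to stay dependency-free; with NumPy, `numpy.linalg.eigh` would normally replace the hand-written power iteration.

```python
import math
import random

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
# Axis 0 was constructed to vary with sentiment, axis 1 with a weaker trait.
WORDS = ["good", "excellent", "great", "bad", "terrible", "awful"]
VECTORS = [
    [2.0, 0.3, 0.1],
    [2.2, 0.9, -0.1],
    [1.9, -0.6, 0.05],
    [-2.0, -0.5, 0.0],
    [-2.1, 0.4, -0.05],
    [-1.8, -0.3, 0.1],
]


def centre(vectors):
    """Subtract the mean vector from every row."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [[v[j] - mean[j] for j in range(d)] for v in vectors]


def covariance(centred):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(a, v):
    """Multiply matrix `a` (list of rows) by vector `v`."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_eigenvectors(a, k, iters=500, seed=0):
    """Power iteration with Hotelling deflation for a symmetric PSD matrix.

    Returns a list of (eigenvalue, unit eigenvector) pairs, largest first.
    """
    rng = random.Random(seed)
    d = len(a)
    a = [row[:] for row in a]  # work on a copy that gets deflated
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # the remaining matrix annihilates v
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Remove the found component so the next pass finds the next eigenvector.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def poles(words, centred, direction, n=3):
    """Words with the lowest and highest projections onto `direction`.

    The sign of an eigenvector is arbitrary, so the two poles may swap.
    """
    scored = sorted(
        (sum(c_j * u_j for c_j, u_j in zip(c, direction)), w)
        for c, w in zip(centred, words)
    )
    return [w for _, w in scored[:n]], [w for _, w in scored[-n:]]


if __name__ == "__main__":
    centred = centre(VECTORS)
    for rank, (lam, vec) in enumerate(top_eigenvectors(covariance(centred), k=2), 1):
        low, high = poles(WORDS, centred, vec)
        print(f"eigenvector {rank} (eigenvalue {lam:.3f}): {low} <-> {high}")
```

On these toy vectors the first eigenvector separates the positive words from the negative ones, mirroring how a sentiment direction might be identified; with real embeddings the poles would be inspected over a large vocabulary, and the labelling of each direction remains a human judgement.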

لقد أثبتت متجهات الكلمات الكثيفة فعاليتها في العديد من مهام معالجة اللغة الطبيعية اللاحقة في السنوات الأخيرة. ومع ذلك، لا تزال قابلية تفسير أبعاد هذه التضمينات تشكل تحديًا. في هذه الورقة، نستكشف كيف يمكن للمتجهات الذاتية أن تكشف عن خصائص دلالية مختلفة تم التقاطها بواسطة نماذج تضمين الكلمات. وبالتالي، نقوم بتدريب تضمينات الكلمات (على سبيل المثال، Word2Vec) على مجموعة ويكيبيديا الإنجليزية لتحليل المتجهات الذاتية العليا، وتحديد الخصائص الدلالية المحددة (على سبيل المثال، المشاعر، والشكلية) المرتبطة بكل منها، واستكشاف كيفية ترميز هذه الخصائص في مساحة التضمين. ناقشت هذه الورقة أيضًا القيود والفوائد المحتملة لهذا النهج مقارنة بالطرق الأخرى لتحليل تضمينات الكلمات