Stability of a Multilingual Sentiment Analysis based on Word-to-Word Translations

  • Noriko Horibe, Sojo University
  • Keita Fujihira, Japan Advanced Institute of Science and Technology
Keywords: machine translation, multilingual, sentiment analysis, sentiment dictionary

Abstract

People’s sentiments are known to have a large impact on changes in stock prices, product sales, and trends. Since web users generally state their opinions in various languages, it is important to develop a method of multilingual sentiment analysis for web texts. In this study, we design a multilingual sentiment analysis method based on word-to-word translation. The method classifies sentences by using a sentiment dictionary in a native language. It consists of three phases: morphological analysis of a sentence, sentiment extraction for each word using the sentiment dictionary, and sentiment extraction for the sentence based on the sentiments of its words. We conduct sentiment classification experiments on sentences in English, German, French, and Spanish. In the experiments, we compare our method with three previous methods using the evaluation metrics “Accuracy,” “Precision,” “Recall,” and “F1-score.” The experimental results show that our method has an advantage in stability across languages.
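
The three phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny translation table, the sentiment scores, the regex tokenizer standing in for morphological analysis, and the sum-based aggregation rule are all assumptions chosen for illustration only.

```python
import re

# Hypothetical toy resources (assumptions, not the paper's dictionaries).
# Word-to-word translation: source-language word -> word in the dictionary's language.
TRANSLATION = {
    "bon": "good", "mauvais": "bad", "film": "movie",
    "gut": "good", "schlecht": "bad",
}
# Sentiment dictionary: word -> polarity score.
SENTIMENT = {"good": 1.0, "bad": -1.0}


def tokenize(sentence):
    """Phase 1 stand-in: lowercase the sentence and split it into letter-only tokens.

    A real system would use a morphological analyzer or tagger here.
    """
    return re.findall(r"[^\W\d_]+", sentence.lower())


def word_scores(tokens):
    """Phase 2: translate each token word-to-word, then look up its polarity."""
    scores = []
    for token in tokens:
        translated = TRANSLATION.get(token, token)  # fall back to the token itself
        if translated in SENTIMENT:
            scores.append(SENTIMENT[translated])
    return scores


def classify(sentence):
    """Phase 3: aggregate word polarities (assumed rule: sign of the sum)."""
    total = sum(word_scores(tokenize(sentence)))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["Un bon film", "Das ist schlecht", "Hola"]:
        print(text, "->", classify(text))
```

Because only the translation table depends on the input language, the same dictionary and aggregation rule are reused for every language in this sketch, which is one plausible reading of why a word-to-word approach could behave consistently across languages.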

Published
2022-09-09
Section
Technical Papers