Vector Similarity of Related Words and Synonyms in the Japanese WordNet
Word2vec is a tool that learns vector representations of words from large text corpora. In this study, we show that a part of the vector space produced by word2vec suffices to represent the collective sense of a set of related words in the Japanese WordNet. We further show that the vector space contains a subspace unrelated to the collective sense of related words and synonyms. Using these vectors, we construct a compact decision tree that determines whether a given word belongs to a set of related words.
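To make the idea of a compact decision tree over word vectors concrete, the following is a minimal sketch and not the procedure used in the study. It assumes synthetic 8-dimensional vectors in place of real word2vec output, with a single hypothetical informative dimension (`INFORMATIVE`) that separates members of a related-word set from non-members. The helper names (`build_tree`, `predict`, `used_dimensions`) and the Gini-based axis-aligned splitting are illustrative choices.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    """Return (dimension, threshold, weighted impurity) of the best axis-aligned split."""
    best = None
    n = len(y)
    for d in range(len(X[0])):
        values = sorted(set(row[d] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][d] <= t]
            right = [y[i] for i in range(n) if X[i][d] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (d, t, score)
    return best

def build_tree(X, y, depth=0, max_depth=2):
    """Grow a small tree; a leaf is the majority label, a node is (dim, threshold, left, right)."""
    majority = int(sum(y) * 2 >= len(y))
    if depth == max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    d, t, _ = split
    left = [i for i in range(len(y)) if X[i][d] <= t]
    right = [i for i in range(len(y)) if X[i][d] > t]
    return (d, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict(tree, x):
    """Follow the splits until a leaf label (0 or 1) is reached."""
    while isinstance(tree, tuple):
        d, t, left, right = tree
        tree = left if x[d] <= t else right
    return tree

def used_dimensions(tree):
    """Set of vector dimensions the tree actually tests."""
    if not isinstance(tree, tuple):
        return set()
    d, _, left, right = tree
    return {d} | used_dimensions(left) | used_dimensions(right)

# Synthetic stand-in for word vectors (assumption: not real word2vec data).
rng = random.Random(0)
DIM, INFORMATIVE = 8, 3  # hypothetical sizes chosen for illustration

def make_vector(is_member):
    v = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    # Only one dimension carries the "collective sense" in this toy setup.
    v[INFORMATIVE] = rng.uniform(0.5, 1.0) if is_member else rng.uniform(-1.0, -0.5)
    return v

y = [1] * 20 + [0] * 20          # 1 = word belongs to the related-word set
X = [make_vector(label == 1) for label in y]

tree = build_tree(X, y)
accuracy = sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)
print("dimensions used by the tree:", sorted(used_dimensions(tree)))
print("training accuracy:", accuracy)
```

In this toy setup the tree needs to test only the informative dimension, while the remaining dimensions play the role of a subspace unrelated to set membership. With real word2vec vectors, the number of dimensions a tree uses and its accuracy would have to be measured on held-out words.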