Vector Similarity of Related Words and Synonyms in the Japanese WordNet

Keywords: Decision tree, Thesaurus, Word2vec, WordNet


Word2vec is a tool that produces vector representations of words from large amounts of text. In this study, we show that only part of the vector space produced by word2vec is needed to represent the collective sense of a set of related words in the Japanese WordNet. Furthermore, we show that a subspace of this vector space is unrelated to the collective sense of related words and synonyms. Using these vectors, we construct a compact decision tree that determines whether a given word belongs to the set of related words.
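
To make the idea of a compact decision tree over word vectors concrete, the following minimal sketch trains a small tree on toy data and reports which vector dimensions the tree actually consults. Everything in it is an assumption made for illustration: the four-dimensional vectors are invented rather than produced by word2vec, the labels do not come from the Japanese WordNet, and the greedy Gini-impurity splitting is a common textbook choice, not necessarily the procedure used in this study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (dimension, threshold, weighted impurity) of the best
    axis-aligned split, or None if no dimension can separate the rows."""
    best = None
    n = len(y)
    for d in range(len(X[0])):
        values = sorted(set(row[d] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][d] <= t]
            right = [y[i] for i in range(n) if X[i][d] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (d, t, score)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree greedily. A leaf is a label; an internal node is a
    tuple (dimension, threshold, left_subtree, right_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    d, t, _ = split
    left = [i for i in range(len(y)) if X[i][d] <= t]
    right = [i for i in range(len(y)) if X[i][d] > t]
    return (d, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth))

def predict(tree, x):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        d, t, left, right = tree
        tree = left if x[d] <= t else right
    return tree

def used_dimensions(tree):
    """Set of vector dimensions that appear in at least one split."""
    if not isinstance(tree, tuple):
        return set()
    d, _, left, right = tree
    return {d} | used_dimensions(left) | used_dimensions(right)

# Toy 4-dimensional "word vectors" (invented for illustration).
# Label 1 = member of the related-word set, 0 = non-member.
# By construction, membership depends only on dimensions 0 and 1.
X = [
    [0.9, 0.8, 0.1, 0.7],
    [0.8, 0.7, 0.9, 0.2],
    [0.7, 0.9, 0.4, 0.5],
    [0.9, 0.1, 0.3, 0.6],
    [0.1, 0.8, 0.8, 0.3],
    [0.2, 0.2, 0.2, 0.9],
    [0.3, 0.9, 0.6, 0.1],
    [0.8, 0.2, 0.5, 0.4],
]
y = [1, 1, 1, 0, 0, 0, 0, 0]

tree = build_tree(X, y)
print("dimensions used by the tree:", sorted(used_dimensions(tree)))
print("prediction for an unseen vector:", predict(tree, [0.85, 0.75, 0.0, 0.0]))
```

On this toy data the tree separates members from non-members using two of the four dimensions, so changing the remaining two dimensions of an input vector cannot change its prediction. This is the sense in which only part of a vector space may be needed for a membership decision; whether and how strongly the same holds for real word2vec vectors is an empirical question that the sketch does not address.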


