Improving the Consistency of Dialog Models Through Speaker Separation Learning

Sakuei Onishi; Takamune Onishi; Hiromitsu Shiina

doi:10.52731/iee.v10.i1.821

Sakuei Onishi Graduate School of Informatics, Okayama University of Science
Takamune Onishi Systems Nakashima
Hiromitsu Shiina Okayama University of Science

Keywords: Dialogue System, User-RNN, Conditional Variational Autoencoder, Global Variational Transformer, Extended GVT

Abstract

In recent years, dialog systems, one application of natural language processing, have become increasingly common in daily life, for example in help desk services. In dialog response generation, the response generated for one context may differ from those for other contexts not only grammatically but, in some cases, also semantically. Thus, simply applying translation techniques causes problems with the diversity of the generated responses. Previous studies, such as VHRED and GVT, achieved response diversity by using sampled latent variables for response generation. In this study, we propose a method (extended GVTSC) that classifies dialogs and reflects the classification, together with the characteristics of each speaker, in internal dialog processing, to improve diversity while maintaining consistency.

References

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.

S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

Copyright © by IIAI. Unauthorized reproduction of this article is prohibited.

F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural computation, vol. 12, no. 10, pp. 2451–2471, 2000.

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2016.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.

O. Vinyals and Q. Le, “A neural conversational model,” in ICML Deep Learning Workshop 2015, 2015.

A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell, J.-Y. Nie, J. Gao, and B. Dolan, “A neural network approach to context-sensitive generation of conversational responses,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, Colorado: Association for Computational Linguistics, May–Jun. 2015, pp. 196–205. [Online]. Available: https://aclanthology.org/N15-1020

R. Csáky, P. Purgai, and G. Recski, “Improving neural conversational models with entropy-based data filtering,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, Jul. 2019, pp. 5650–5669. [Online]. Available: https://aclanthology.org/P19-1567

I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau, “Building end-to-end dialogue systems using generative hierarchical neural network models,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, ser. AAAI’16. AAAI Press, 2016, pp. 3776–3783.

I. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio, “A hierarchical latent variable encoder-decoder model for generating dialogues,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, Feb. 2017. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/10983

Z. Lin, G. I. Winata, P. Xu, Z. Liu, and P. Fung, “Variational transformers for diverse response generation,” arXiv preprint arXiv:2003.12738, 2020.

B. Sun, S. Feng, Y. Li, J. Liu, and K. Li, “Generating relevant and coherent dialogue responses using self-separated conditional variational AutoEncoders,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, Aug. 2021, pp. 5624–5637. [Online]. Available: https://aclanthology.org/2021.acl-long.437

J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan, “A diversity-promoting objective function for neural conversation models,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, California: Association for Computational Linguistics, Jun. 2016, pp. 110–119. [Online]. Available: https://aclanthology.org/N16-1014

T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating text generation with BERT,” in International Conference on Learning Representations, 2019.

M. Inaba, “A example based dialogue system using the open 2channel dialogue corpus,” Journal of Japanese Society for Artificial Intelligence, vol. 87, pp. 129–132, 2019.

T. Zhao, R. Zhao, and M. Eskenazi, “Learning discourse-level diversity for neural dialog models using conditional variational autoencoders,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada: Association for Computational Linguistics, Jul. 2017, pp. 654–664. [Online]. Available: https://aclanthology.org/P17-1061

S. R. Bowman, L. Vilnis, O. Vinyals, A. Dai, R. Jozefowicz, and S. Bengio, “Generating sentences from a continuous space,” in Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 10–21. [Online]. Available: https://aclanthology.org/K16-1002

X. Zhou and W. Y. Wang, “MojiTalk: Generating emotional responses at scale,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 1128–1137. [Online]. Available: https://aclanthology.org/P18-1104