Improving Abstractive Summarization by Transfer Learning with Adaptive Document Selection

Masato Shirai; Kei Wakabayashi

doi:10.52731/ijscai.v7.i2.701

Masato Shirai, Shimane University
Kei Wakabayashi, University of Tsukuba


Abstract

Abstractive document summarization based on neural networks is a promising approach to generating flexible summaries, but it requires a large amount of training data.
Transfer learning can address this issue, but it risks negative transfer: performance deteriorates when the training documents are irrelevant to the target domain. This effect has not been explicitly explored in document summarization tasks.
In this paper, we propose a method that selects training documents from the source domain that are expected to be useful for summarization in the target domain.
The proposed method is based on the similarity of word distributions between each source document and a set of target documents.
We further propose an adaptive approach that builds a custom-made summarization model for each test document by selecting source documents similar to the test document.
In our experiments, we confirm that negative transfer does occur in document summarization tasks.
Additionally, we show that the proposed method effectively avoids the negative transfer issue and improves summarization performance.
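
The selection step described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure, tokenization, or selection threshold is used, so everything here is an assumption made for illustration: unigram distributions over whitespace tokens, cosine similarity between distributions, a fixed top-k cutoff, and the hypothetical function names `word_distribution`, `cosine_similarity`, and `select_source_documents`. It is not the authors' implementation.

```python
import math
from collections import Counter


def word_distribution(texts):
    """Unigram distribution over lowercased whitespace tokens (assumed tokenization)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse distributions (assumed measure)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def select_source_documents(source_docs, target_docs, k):
    """Return the k source documents most similar to the pooled target documents."""
    target_dist = word_distribution(target_docs)
    scored = [
        (cosine_similarity(word_distribution([doc]), target_dist), index)
        for index, doc in enumerate(source_docs)
    ]
    # Highest similarity first; ties are broken by original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [source_docs[index] for _, index in scored[:k]]


# Toy data, invented for illustration only.
source = [
    "the team won the football match",
    "stocks fell as the market closed lower",
    "the striker scored in the football final",
]
target = ["football fans celebrated the match result"]

# Selection against a set of target documents.
selected = select_source_documents(source, target, k=2)

# Adaptive variant: treat a single test document as the target set.
test_doc = "market stocks closed lower today"
adaptive = select_source_documents(source, [test_doc], k=1)
```

In this sketch the adaptive variant differs only in passing one test document as the target set, so a separate selection (and, in the paper's setting, a separately trained summarization model) would be produced for each test document.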
