Automatic Extractive Summarization for Japanese Academic Papers by LDA

  • Hideyuki Sawahata Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications
  • Tetsuro Nishino Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications
Keywords: Automatic Summarization, Extractive Summarization, Natural Language Processing, LDA, LSA

Abstract

The demand for automatic summarization of newspaper headlines and articles is increasing, and various studies on automatic summarization are currently being conducted.
However, there are only a few studies on summarization of Japanese documents compared to those for English documents.

In this study, we verify whether existing summarization methods are effective for academic papers written in Japanese. First, we demonstrate the effectiveness of topic-based extractive summarization methods based on Latent Semantic Analysis (LSA). We then show that more effective topic-based extractive summarization can be achieved by using Latent Dirichlet Allocation (LDA).

Published
2023-10-07
Section
Technical Papers (Artificial Intelligence)