Time-series Keyword Extraction Method and Its Application to Discovery Japanese Key Technology Transition Insights
Abstract
This paper presents a time-series keyword extraction method and its application to discovering insights into key technology transitions in Japan. Understanding trends in science and technology is an important issue. If a method can be established to extract the important keywords of each period, these trends can be visualized over time. As an example, we use the White Paper on Science, Technology, and Innovation, which the Japanese government publishes every year. In this method, words are extracted from the text data of each period, and a new Importance Transition Discovering Score (WITD-Score) is proposed, following the concept of the F-Score, as an index representing the likelihood of the occurrence of each word in that period. The transition of keywords over time is then extracted from the change in each word's WITD-Score. By implementing this method, we can discover important keywords in time-series text data and visualize how they change over time.
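The abstract does not give the WITD-Score formula, so the following is only a minimal sketch of the general idea of an F-Score-style word score computed per period. It is not the paper's definition. The function names `period_word_scores` and `score_transition`, the toy tokens, and the choice of a plain harmonic mean of two ratios are all assumptions made for illustration. The two ratios are the share of a word's total occurrences that fall in the period and the share of the period's tokens taken up by the word.

```python
from collections import Counter


def period_word_scores(tokens_by_period):
    """Return {period: {word: score}} with an F-score-style harmonic mean.

    Illustrative stand-in only; NOT the WITD-Score defined in the paper.
    """
    counts = {p: Counter(toks) for p, toks in tokens_by_period.items()}

    # Total occurrences of each word across all periods.
    totals = Counter()
    for c in counts.values():
        totals.update(c)

    scores = {}
    for period, c in counts.items():
        n_tokens = sum(c.values())
        scores[period] = {}
        for word, k in c.items():
            # Share of this word's occurrences that fall in this period.
            precision = k / totals[word]
            # Share of this period's tokens that are this word.
            frequency = k / n_tokens
            scores[period][word] = (
                2 * precision * frequency / (precision + frequency)
            )
    return scores


def score_transition(scores, word):
    """Return the word's score per period, in period order (0.0 if absent)."""
    return [(p, scores[p].get(word, 0.0)) for p in sorted(scores)]


# Hypothetical toy data: one token list per publication year.
toy = {
    2019: ["ai", "energy", "ai", "policy"],
    2020: ["ai", "quantum", "quantum", "policy"],
}
scores = period_word_scores(toy)
transition = score_transition(scores, "quantum")
```

In this toy data, "quantum" appears only in 2020, so its score rises from zero in 2019 to a positive value in 2020, which is the kind of change a transition analysis would look for. With real text, raw frequencies are far smaller than the first ratio, so some rescaling of both ratios would likely be needed before combining them.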
References
Science, Technology, and Innovation white paper in Japan https://whitepaper-search.nistep.go.jp/white-paper/list
A. A. Taha, A. Hanbury, "Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool," BMC Medical Imaging, 15(1), pp.1-28, 2015.
ScatterText package https://github.com/JasonKessler/scattertext#understanding-scaled-f-score
S. Kato, T. Nakanishi, B. Ahsan, H. Shimauchi, "Time-series topic analysis using singular spectrum transformation for detecting political business cycles," Journal of Cloud Computing: Advances, Systems and Applications, 10, 21, 2021.
D. M. Blei, A. Y. Ng, M. I. Jordan, "Latent dirichlet allocation," Journal of Machine Learning Research, 3(Jan), pp.993-1022, 2003.
T. Idé, K. Inoue, "Knowledge discovery from heterogeneous dynamic systems using change-point correlations," In Proceedings of the 2005 SIAM International Conference on Data Mining, pp. 571-575, 2005.
Google Colaboratory. https://colab.research.google.com/
T. Kudo, Mecab: Yet another part-of-speech and morphological analyzer. http://mecab.sourceforge.net/, 2005.
J. S. Kessler, "Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ," ACL System Demonstrations, 2017.