Visual Explanation of Eigenvalues and Math Process in Latent Semantic Analysis

  • Yukari Shirota, Gakushuin University
  • Basabi Chakraborty, Iwate Prefectural University
Keywords: LSA, text mining, SVD, visualization, mathematics, multivariate analysis, eigenvalue

Abstract

Latent Semantic Analysis (LSA) is a widely used method in the text mining field for extracting the underlying concepts in text documents. The mathematical technique behind LSA is Singular Value Decomposition (SVD), whose key concept is the eigenvalue. The underlying mathematics is difficult to understand for general readers who are not proficient in mathematics. One reason may be that the linear algebra textbooks on the market are not written for non-mathematics majors such as economics students. We believe there is a better way to teach eigenvalues and eigenvectors to our students, and in this paper we illustrate such a method. In the main part of the paper, we propose a visualization of the mathematical process behind LSA that makes it easy to understand for general readers who are novices in mathematics. In addition, to deepen the understanding of the SVD process, we present another example: a time series data analysis using SVD.
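To make the link between LSA, SVD, and eigenvalues concrete, here is a minimal NumPy sketch. The 4-by-4 term-document matrix and the term labels are hypothetical toy data, not taken from the paper, and raw counts are used without any weighting. The sketch checks that the squared singular values equal the eigenvalues of A^T A, then keeps the two largest singular values to place terms and documents in a two-dimensional "concept" space, the kind of picture a visual explanation would plot. It is an illustration under these assumptions, not the authors' own procedure.

```python
import numpy as np

# Hypothetical toy term-document matrix (not from the paper): rows are terms,
# columns are documents, entries are raw term counts. Real LSA pipelines often
# apply a weighting such as tf-idf first; that step is omitted here.
terms = ["stock", "price", "flood", "rain"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 2.0, 1.0],
])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Link to eigenvalues: the squared singular values are the eigenvalues of
# the symmetric matrix A^T A (and of A A^T).
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigvalsh returns ascending order
print("squared singular values:", np.round(s**2, 4))
print("eigenvalues of A^T A:   ", np.round(eigvals, 4))

# Keep only the k largest singular values: the k strongest latent "concepts".
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Coordinates of terms and documents in the k-dimensional concept space.
term_coords = U[:, :k] * s[:k]      # U_k Sigma_k, one row per term
doc_coords = Vt[:k, :].T * s[:k]    # V_k Sigma_k, one row per document
for name, xy in zip(terms, np.round(term_coords, 3)):
    print(f"{name:>6}: {xy}")
print("document coordinates:\n", np.round(doc_coords, 3))
```

The choice k = 2 is arbitrary here; it is convenient because the resulting coordinates can be drawn on a plane. Singular vectors are only determined up to sign, so such a plot may appear mirrored when computed with a different library or version.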

Author Biographies

Yukari Shirota, Gakushuin University

DSc. Professor, Faculty of Economics, Gakushuin University. Her research fields are web data visualization, social media analysis, and visual education methods for business mathematics. For over 17 years, she has developed visual teaching materials for business mathematics and statistics. At VINCI 2015, she gave a tutorial titled “Visually Do Statistics for Business Persons: Visual Materials from Regression to Black-Scholes Model.” She has published several visual teaching material sites on the web, which are freely available at the following addresses:

http://www.gakushuin.ac.jp/univ/eco/english/teacher/sirota.html

http://www-cc.gakushuin.ac.jp/~20010570/VDStat/

http://www-cc.gakushuin.ac.jp/~20010570/mathABC/ABC/

Basabi Chakraborty, Iwate Prefectural University

She received B.Tech, M.Tech, and Ph.D. degrees in Radio Physics and Electronics from Calcutta University, India, and worked at the Indian Statistical Institute, Calcutta, India, until 1990. She received another Ph.D. in Information Science from Tohoku University, Japan, in 1996. She is currently a full professor in the Software and Information Science department of Iwate Prefectural University, Japan. Her main research interests are in the areas of pattern recognition, machine learning, soft computing techniques, data mining, and online social media mining. She is a senior member of IEEE and a member of ACM, INNS, and the Japanese Society for Artificial Intelligence.

Published: 2016-03-30

Section: Technical Papers (Data Science & Institutional Research)