Syllabus Mining for Analysis of Searchable Information

  • Michiko Yasukawa, Gunma University
  • Hirofumi Yokouchi, Gunma University
  • Koichi Yamazaki, Gunma University

Abstract

Searchable information in syllabus documents is beneficial for effective education at universities. In this study, we empirically investigate syllabus mining as a means of evaluating that searchable information, analyzing 6,493 online syllabi from a national university in Japan. First, we investigate syllabus classification and syllabus search using established text mining methods (random forest, naive Bayes, and support vector machine) and an information retrieval method that uses pivoted document length normalization. We evaluate syllabus classification with the F1 score and measure search effectiveness with the mean reciprocal rank. The results of these experiments demonstrate that feature words in each syllabus document are effective in improving searchability. Next, we investigate methods that provide word suggestions using deep learning approaches and large text corpora. In this experiment, we use a bibliographic database of university libraries in Japan, which contains 3,990,646 bibliographic entries, and a version of Japanese Wikipedia, which contains 2,351,545 articles. The results indicate that a wide range of vocabulary is effective in improving the searchability of syllabi. Finally, on the basis of our findings, we propose guiding principles for assessing syllabus documents.
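
For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how the F1 score and the mean reciprocal rank are commonly computed. It is not the authors' implementation: the function names, the argument layout, and the convention that a query with no relevant result contributes zero are assumptions made for illustration.

```python
from typing import Optional, Sequence


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = harmonic mean of precision and recall, written in count form.

    Hypothetical helper: takes raw counts for a single class.
    """
    denom = 2 * true_pos + false_pos + false_neg
    # Assumption: return 0.0 when the class never occurs and is never predicted.
    return 0.0 if denom == 0 else 2 * true_pos / denom


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank of the first relevant result over all queries.

    Each entry is the 1-based rank of the first relevant syllabus for one
    query, or None if no relevant syllabus was retrieved (assumed to
    contribute 0 to the sum).
    """
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_relevant_ranks if r is not None)
    return total / len(first_relevant_ranks)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    print(f1_score(true_pos=8, false_pos=2, false_neg=2))
    print(mean_reciprocal_rank([1, 2, None, 4]))
```

Under these conventions, a search that always ranks the target syllabus first has a mean reciprocal rank of 1, and the value falls toward 0 as the target appears lower in the result list.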
Published
2020-05-30
Section
Technical Papers