Syllabus Mining for Analysis of Searchable Information

  • Michiko Yasukawa, Gunma University
  • Hirofumi Yokouchi, Gunma University
  • Koichi Yamazaki, Gunma University
Keywords: database, faculty development, information retrieval, text mining


Writing a good syllabus is critically important for instructors who want to provide effective university education. However, little is known about how to write one well, so it is necessary to clarify what kind of information a syllabus must include. To address this, we focus on the searchable information in syllabi and analyze a real-world collection of 6,493 syllabus documents from a national university in Japan. First, we investigate syllabus classification and syllabus search using established text mining methods and an information retrieval method. The results of these experiments demonstrate that (i) knowledge discovery from syllabus documents is a challenging, non-trivial task, and (ii) adding just one particular word to a syllabus can increase its searchability in syllabus search. Next, we investigate methods that suggest words using deep learning approaches and large text corpora. For this experiment, we used a bibliographic database of university libraries in Japan containing 3,990,646 bibliographic entries and a version of Japanese Wikipedia containing 2,351,545 articles. The results indicate that (iii) a vocabulary drawn from the bibliographic database of university libraries is effective in improving performance as measured by mean reciprocal rank (MRR), and (iv) a wide-ranging vocabulary is essential for improving the recall of word suggestions.
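The abstract evaluates word suggestions by mean reciprocal rank and recall but does not spell out how either is computed. The sketch below assumes the standard definitions: for each query, the reciprocal rank is 1/r, where r is the rank of the first relevant suggestion (0 if none appears), and MRR averages this over all queries; recall@k is the fraction of relevant words found among the top k suggestions. It is a minimal illustration under those assumptions, not the authors' evaluation code; the cutoff `k`, the function names, and the example words and relevance judgments are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical helpers illustrating standard MRR and recall@k;
# this is not the authors' evaluation code.


def reciprocal_rank(suggestions: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant suggestion, or 0.0 if none is relevant."""
    for rank, word in enumerate(suggestions, start=1):
        if word in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over all queries."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(s, r) for s, r in runs) / len(runs)


def recall_at_k(suggestions: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant words that appear among the top-k suggestions."""
    if not relevant:
        return 0.0
    found = set(suggestions[:k]) & relevant
    return len(found) / len(relevant)


if __name__ == "__main__":
    # Hypothetical suggestion lists paired with hypothetical relevant words.
    runs = [
        (["algorithm", "database", "SQL"], {"database", "SQL"}),  # first hit at rank 2
        (["matrix", "vector", "eigenvalue"], {"matrix"}),          # first hit at rank 1
        (["circuit", "voltage"], {"transistor"}),                  # no hit
    ]
    print(mean_reciprocal_rank(runs))  # prints 0.5
    print(recall_at_k(*runs[0], k=3))  # prints 1.0
```

Because MRR credits only the first relevant suggestion, a vocabulary that places one good word near the top can score well on MRR while still missing other relevant words, which is why recall is reported as a separate measure.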
