Performance Comparison on Automated Generation of Coding Rules: A Case Study on ISO 26000

  • Tetsuya Nakatoh, Kyushu University
  • Satoru Uchida, Kyushu University
  • Emi Ishita, Kyushu University
  • Toru Oga, Kyushu University
Keywords: Text mining, Coding rules, Automated Generation, SVM, ISO 26000


When mining texts for meaningful information, an important step is to construct a coding rule that categorizes key terms into conceptual groups. Such rules are usually created by hand and therefore tend to be subjective. The present study attempts to build coding rules automatically from the ISO 26000 document using two proposed methods, one of which is based on a support vector machine (SVM). The automatically generated rules were compared with manually created coding rules, and the SVM method proved more effective.

Author Biographies

Tetsuya Nakatoh, Kyushu University
Assistant Professor
Academic Information Section, Research Institute for Information Technology

Satoru Uchida, Kyushu University
Faculty of Languages and Cultures

Emi Ishita, Kyushu University
Research and Development Division, Kyushu University Library

Toru Oga, Kyushu University
Faculty of Law

