Performance Comparison on Automated Generation of Coding Rules: A Case Study on ISO 26000
Abstract
When texts are mined for meaningful information, an important step is constructing a coding rule that assigns key terms to several conceptual groups. Such rules are usually made by hand and therefore tend to be subjective. The present study attempts to build coding rules automatically from the ISO 26000 document using two proposed methods. The results were compared with manually created coding rules, and the SVM method was found to be more effective.