The Relationship of English Foreign Language Learner Proficiency and an Entropy Based Measure

  • Brendan Flanagan Kyushu University
  • Sachio Hirokawa Kyushu University
Keywords: Learner Proficiency, Proficiency Prediction, Speaking Errors, Entropy


It is important for education systems to analyze learners and provide an appropriate level of feedback that meets their needs. Predicting a learner’s proficiency level can be used to inform learners about their progress, and can also aid other parts of the characteristic analysis and feedback process, such as focused analysis of learner proficiency subgroups. In this paper, we propose a measure based on the frequency of words in the sentences produced by learners during speaking exams to predict the learner’s language proficiency. The proposed measure is compared to the learner’s vocabulary size by correlation analysis. The results suggest that the proposed measure correlates more strongly with the learner’s proficiency than the learner’s vocabulary size does.
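The abstract does not give the formula for the proposed measure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the measure is the Shannon entropy of the word-frequency distribution in a learner's transcribed sentences, that vocabulary size is the number of distinct words, and that the correlation analysis uses Pearson's coefficient. The function names (`word_entropy`, `vocabulary_size`, `pearson`) and the whitespace tokenization are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def _tokens(sentences):
    # Assumption: lowercase whitespace tokenization; the paper's
    # actual preprocessing is not described in the abstract.
    return [w.lower() for s in sentences for w in s.split()]


def word_entropy(sentences):
    """Shannon entropy (in bits) of the word-frequency distribution."""
    counts = Counter(_tokens(sentences))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def vocabulary_size(sentences):
    """Number of distinct word types used by the learner."""
    return len(set(_tokens(sentences)))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Under these assumptions, one would compute `word_entropy` and `vocabulary_size` for each learner's exam transcript and then compare `pearson(entropies, levels)` against `pearson(vocab_sizes, levels)`, where `levels` holds the learners' assessed proficiency levels.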



Technical Papers (Learning Technologies and Learning Environments)