Automatic Classification of Complaint Reports about City Park

Yuta Sano; Kohei Yamaguchi; Tsunenori Mine

doi:10.52731/iee.v1.i4.35

Yuta Sano Kyushu University
Kohei Yamaguchi Kyushu University
Tsunenori Mine Kyushu University

Keywords: Categorization, Complaint Report, Government 2.0, Mutual Information, Random Forest

Abstract

Recently, the growing Government 2.0 movement has made it easy for citizens to report complaints. At the same time, government responses are increasingly delayed, because the rising number of complaint reports overloads the government's capacity to handle them. In this paper, we propose a method for automatically categorizing complaint reports as a first step toward reducing this burden. We conducted experiments on categorizing complaint reports and obtained the following findings: (1) Feature selection is key to improving the accuracy (F-score) of complaint report categorization; the words that are strongly effective for categorization make up about 3.9% of all distinct words. (2) The proposed Mutual-Information-based methods outperform a conventional Random-Forest-based method. (3) The city management section seems to classify complaint reports by focusing on the demands expressed in them. (4) Categorization performance is usually high if the training data includes various types of categories.
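As a rough illustration of the mutual-information-based feature selection mentioned in finding (1), the following minimal sketch scores each word by the mutual information between its presence in a report and the report's membership in one category, then keeps the highest-scoring fraction of words. It is not the authors' implementation: the function names `mutual_information` and `select_top_words`, the toy reports, the category labels, and the chosen fraction are all assumptions made for illustration, and the abstract does not specify the exact formulation used in the paper.

```python
import math
from collections import Counter


def mutual_information(docs, labels, category):
    """Score each word by the mutual information (in bits) between
    'word occurs in the report' and 'report belongs to `category`'."""
    n = len(docs)
    n_cat = sum(1 for y in labels if y == category)
    doc_freq = Counter()      # number of reports containing the word
    doc_freq_cat = Counter()  # same, restricted to reports in `category`
    for words, y in zip(docs, labels):
        for w in set(words):
            doc_freq[w] += 1
            if y == category:
                doc_freq_cat[w] += 1

    scores = {}
    for w, n_w in doc_freq.items():
        n11 = doc_freq_cat[w]   # word present, in category
        n10 = n_w - n11         # word present, not in category
        n01 = n_cat - n11       # word absent, in category
        n00 = n - n_w - n01     # word absent, not in category
        mi = 0.0
        for n_joint, n_word, n_class in (
            (n11, n_w, n_cat),
            (n10, n_w, n - n_cat),
            (n01, n - n_w, n_cat),
            (n00, n - n_w, n - n_cat),
        ):
            if n_joint > 0:  # empty cells contribute nothing to the sum
                mi += (n_joint / n) * math.log2(n * n_joint / (n_word * n_class))
        scores[w] = mi
    return scores


def select_top_words(scores, fraction):
    """Keep the given fraction of words with the highest scores (at least one)."""
    k = max(1, round(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: tokenized reports and made-up category labels.
docs = [
    ["bench", "broken"],
    ["bench", "repair"],
    ["tree", "fallen"],
    ["tree", "trim"],
]
labels = ["facility", "facility", "plants", "plants"]

scores = mutual_information(docs, labels, "facility")
selected = select_top_words(scores, 0.34)
```

In this toy data, "bench" and "tree" each separate the two categories perfectly, so they receive the highest scores, while words that occur in a single report score lower. On a real corpus the fraction passed to `select_top_words` would be tuned experimentally; the abstract reports that about 3.9% of distinct words are strongly effective.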
