Automatic Classification of Complaint Reports about City Park

Yuta Sano, Kohei Yamaguchi, Tsunenori Mine

Abstract


Recently, the growing Government 2.0 movement has made it easy for citizens to report complaints. On the other hand, there is a clearly identified and growing delay in responses from the government side, because the increasing number of complaint reports overloads the government's capacity to deal with them.

In this paper, we propose a method of automatically categorizing complaint reports as a first step toward reducing the pressure on the government side. We conducted experiments in categorizing complaint reports, and the results showed the following findings: (1) Feature selection is key to improving the accuracy (F-score) of the categorization. Only about 3.9% of all distinct words are strongly effective for categorization. (2) The proposed mutual-information (MI)-based methods outperform a conventional random-forest (RF)-based method. (3) The city management section seems to classify complaint reports by focusing on the demands expressed in the reports. (4) The categorization performance is usually high when the training data include a variety of categories.
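As a rough illustration of finding (1), the following Python sketch ranks words by the mutual information between a word's presence and a report's category, then keeps only a small fraction of the vocabulary. The abstract does not describe the authors' exact procedure, so this is a generic textbook formulation and not their method. The names `mutual_information` and `select_features`, the toy reports, and the category labels are all hypothetical and made up for illustration; only the 3.9% figure comes from the abstract.

```python
import math
from collections import Counter


def mutual_information(docs, labels, word, category):
    """Mutual information (in bits) between the presence of `word`
    and membership in `category`, estimated from document counts."""
    n = len(docs)
    # Key (w, c): w = 1 if the word occurs in the report, c = 1 if the
    # report belongs to the category.
    counts = Counter(
        (int(word in doc), int(label == category))
        for doc, label in zip(docs, labels)
    )
    mi = 0.0
    for (w, c), n_wc in list(counts.items()):
        n_w = counts[(w, 0)] + counts[(w, 1)]  # reports with this word value
        n_c = counts[(0, c)] + counts[(1, c)]  # reports with this class value
        mi += (n_wc / n) * math.log2(n * n_wc / (n_w * n_c))
    return mi


def select_features(docs, labels, fraction):
    """Keep the top `fraction` of distinct words, scoring each word by its
    highest mutual information with any single category."""
    vocab = sorted(set().union(*docs))
    categories = sorted(set(labels))
    scores = {
        w: max(mutual_information(docs, labels, w, c) for c in categories)
        for w in vocab
    }
    k = max(1, round(fraction * len(vocab)))
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:k]


# Toy data (invented): each report is a set of already-tokenized words.
docs = [
    {"park", "bench", "broken"},
    {"park", "tree", "fallen"},
    {"park", "bench", "dirty"},
    {"park", "tree", "branch"},
]
labels = ["facility", "greenery", "facility", "greenery"]

selected = select_features(docs, labels, fraction=0.3)
```

On real data the fraction would be far smaller (the abstract reports about 3.9% of distinct words), and the selected words would then serve as the features passed to a classifier. A word such as "park" that appears in every report scores zero and is dropped, which is the intuition behind this kind of selection.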




