Detection of Unnatural Parts of Statistical Data

  • Tetsuya Nakatoh, Nakamura Gakuen University
  • Takahiko Suzuki, Kyushu University
  • Tsukasa Kamimasu, Kyushu University
  • Sachio Hirokawa, Advanced Institute of Industrial Technology
Keywords: Benford’s law, data reliability, statistical data, unnatural subsets


Statistical data inform many kinds of decision-making, so ensuring their authenticity is important. In practice, however, several types of data alteration have been reported, and the accuracy of statistical data therefore needs to be validated. Benford’s law is a well-known basis for detecting unnatural numerical data: it states that the first significant digits of the data follow a particular distribution. A test against this distribution can indicate that a dataset as a whole is unnatural, but it cannot accurately identify which parts of the data are unnatural. In this study, we attempted to identify the unnatural parts of statistical data available in tabular format. A subset of the target data was specified using the row and column names that define each cell in the table, or the words that appear in the table title. By measuring the divergence of each subset, we identified the unnatural subsets. In this paper, we present the results of identifying unnatural subsets in agricultural data taken from the China Statistical Yearbook.
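As a rough illustration of the idea described above, the following minimal sketch scores label-defined subsets of table cells against the Benford distribution, P(d) = log10(1 + 1/d) for d = 1, ..., 9. It is not the authors' implementation. The abstract does not say which divergence measure was used, so the Kullback-Leibler divergence here is an assumption, as are the function names, the `(labels, value)` cell representation, and the `min_size` threshold.

```python
import math
from collections import Counter

# Benford's law: expected probability of each first significant digit.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit of x, or None for 0/NaN/inf."""
    x = abs(float(x))
    if x == 0 or math.isnan(x) or math.isinf(x):
        return None
    # Scientific notation puts the first significant digit first.
    return int(f"{x:.17e}"[0])


def benford_divergence(values):
    """KL divergence of observed first-digit frequencies from Benford.

    KL divergence is an assumed choice; the source only says 'divergence'.
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        return None
    counts = Counter(digits)
    n = len(digits)
    kl = 0.0
    for d in range(1, 10):
        p = counts[d] / n
        if p > 0:  # terms with p == 0 contribute nothing
            kl += p * math.log(p / BENFORD[d])
    return kl


def rank_subsets(cells, min_size=30):
    """Rank label-defined subsets from most to least divergent.

    cells: iterable of (labels, value), where labels is a set of
    row names, column names, and title words attached to the cell
    (a hypothetical representation). min_size is an assumed cutoff
    that skips subsets too small to compare meaningfully.
    """
    groups = {}
    for labels, value in cells:
        for label in labels:
            groups.setdefault(label, []).append(value)
    scored = []
    for label, vals in groups.items():
        if len(vals) < min_size:
            continue
        div = benford_divergence(vals)
        if div is not None:
            scored.append((label, div, len(vals)))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Calling `rank_subsets` on a list of labeled cells returns `(label, divergence, size)` tuples, with the subsets that deviate most from Benford's law first. Small subsets give noisy digit frequencies, so the threshold would need tuning for real tables.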


Technical Papers (Data Science & Institutional Research)