Iterative Consistency-Based Feature Selection and Its Application to Nucleotide Sequences of Influenza A Viruses

Kouichi Hirata; Sho Shimamura

doi:10.52731/iee.v5.i1.262

Kouichi Hirata, Kyushu Institute of Technology
Sho Shimamura, Kyushu Institute of Technology

Keywords: Iterative Consistency-Based Feature Selection, Consistency-Based Feature Selection, CWC, LCC, Nucleotide Sequences, Influenza A Viruses

Abstract

In this paper, we first formulate the consistency-based feature selection problem as combinatorial optimization problems. Next, we introduce iterative consistency-based feature selection and design an algorithm to compute it. Its aim is to increase the number of instances explained by the selected features, which we call explanatory instances, rather than to decrease the number of features as in ordinary consistency-based feature selection. Finally, we apply the method to several nucleotide sequences of influenza A viruses and evaluate its advantages.
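
The consistency criterion that the method builds on can be illustrated with a minimal sketch. It assumes the usual binary notion of consistency: a feature set is consistent if no two instances agree on every selected feature yet carry different class labels. The selection step shown here is a plain backward elimination in a fixed order, used only as a stand-in. It is neither the iterative algorithm of the paper nor the symmetric-uncertainty ordering used by CWC, and the function names and toy sequences are hypothetical.

```python
def is_consistent(rows, labels, features):
    """Return True if no two instances agree on all `features` but differ in label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        # The first label seen for a key is stored; a later mismatch is a conflict.
        if seen.setdefault(key, label) != label:
            return False
    return True


def backward_eliminate(rows, labels):
    """Drop features one at a time while the remaining set stays consistent.

    Assumption: features are tried in index order, which is a simplification.
    """
    selected = list(range(len(rows[0])))
    if not is_consistent(rows, labels, selected):
        raise ValueError("data set is inconsistent even with all features")
    for f in list(selected):
        trial = [g for g in selected if g != f]
        if is_consistent(rows, labels, trial):
            selected = trial
    return selected


# Hypothetical toy data: aligned nucleotide sequences, one feature per position.
rows = ["ACGT", "ACGA", "TCGT", "TCGA"]
labels = [0, 1, 0, 1]
print(backward_eliminate(rows, labels))  # [3]
```

In this toy data only the last position separates the two classes, so the elimination keeps position 3 alone; position 0 by itself is inconsistent, because "ACGT" and "ACGA" share it but differ in label.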
