Selective Potentiality and Moving Focus for Interpreting Multi-Layered Neural Networks
Abstract
The present paper aims to demonstrate the existence of simplification forces in neural networks. These forces can be represented by the simplest network, called the ``prototype.'' To extract the prototype, we need to identify the necessary and important information during learning. Structural potentiality has been proposed to eliminate unnecessary information, but one of its problems lies in excessive information reduction. To preserve important information, we need to minimize, or at least weaken, this excessive information reduction. To solve this problem, we introduce a new potentiality, called ``selective potentiality,'' which allows us to move a focus field in which a group of connection weights can be flexibly reduced. This method aims to replace the troublesome and contradictory operations of potentiality reduction and augmentation with more concrete and manageable ones.
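To make the idea of a moving focus over groups of connection weights more concrete, the following is a minimal sketch. Since the exact definition of potentiality is not reproduced here, a simple normalized weight-strength measure is assumed in its place; the group structure, function names, and reduction factor are illustrative assumptions rather than the method itself.

```python
import numpy as np

def group_potentiality(W, n_groups):
    """Split the weight matrix column-wise into groups and return each
    group's share of the total absolute weight strength (an assumed
    stand-in for the paper's potentiality measure)."""
    groups = np.array_split(np.abs(W), n_groups, axis=1)
    strengths = np.array([g.sum() for g in groups])
    return strengths / (strengths.sum() + 1e-12)

def apply_moving_focus(W, n_groups, focus, reduction=0.5):
    """Attenuate the weights of every group outside the current focus,
    leaving the focused group intact (selective reduction)."""
    W = W.copy()
    col_groups = np.array_split(np.arange(W.shape[1]), n_groups)
    for g, idx in enumerate(col_groups):
        if g != focus:
            W[:, idx] *= reduction
    return W

# Example: move the focus to the group with the highest potentiality.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 12))            # hidden-by-input weight matrix
p = group_potentiality(W, n_groups=4)    # one potentiality value per group
W_new = apply_moving_focus(W, n_groups=4, focus=int(np.argmax(p)))
```

In this sketch, only the weights outside the focused group are attenuated, so important information inside the focus is preserved while the rest of the network is simplified.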
The method was applied to an artificial dataset into which linear and non-linear relations were introduced. The results confirmed that selective potentiality could be increased to weaken the excessive reduction of structural potentiality. Selective potentiality exerted strong simplification forces throughout the entire learning process. In addition, by seeking the simplest prototype, the networks tried to infer the outputs by enhancing both linear and non-linear inputs, leading to better generalization.
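The exact artificial dataset is not specified in this abstract; the following hypothetical construction merely illustrates what a dataset with both linear and non-linear input-output relations could look like (all names, coefficients, and the binarization step are assumptions).

```python
import numpy as np

def make_dataset(n_samples=500, seed=0):
    """Hypothetical dataset: two inputs act linearly on the target,
    two act non-linearly; the target is then binarized."""
    rng = np.random.default_rng(seed)
    x_lin = rng.uniform(-1.0, 1.0, size=(n_samples, 2))   # linear inputs
    x_non = rng.uniform(-1.0, 1.0, size=(n_samples, 2))   # non-linear inputs
    y = (0.8 * x_lin[:, 0] - 0.5 * x_lin[:, 1]             # linear part
         + np.sin(np.pi * x_non[:, 0]) * x_non[:, 1]       # non-linear part
         + 0.05 * rng.normal(size=n_samples))               # small noise
    X = np.hstack([x_lin, x_non])
    return X, (y > np.median(y)).astype(int)

X, y = make_dataset()
```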