Characterizing Adaptive Optimizer in CNN by Reverse Mode Differentiation from Full-Scratch
Ruo Ando1, Yoshihisa Fukuhara2, Yoshiyasu Takefuji3

1Ruo Ando, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan.

2Yoshihisa Fukuhara, Musashino University, Department of Data Science, 3-3-3 Ariake, Koto-Ku, Tokyo, Japan.

3Yoshiyasu Takefuji, Musashino University, Department of Data Science, 3-3-3 Ariake, Koto-Ku, Tokyo, Japan.

Manuscript received on 30 June 2023 | Revised Manuscript received on 15 June 2023 | Manuscript Accepted on 15 June 2023 | Manuscript published on 30 June 2023 | PP: 1-6 | Volume-3 Issue-4, June 2023 | Retrieval Number: 100.1/ijainn.D1070063423 | DOI: 10.54105/ijainn.D1070.063423

© The Authors. Published by Lattice Science Publication (LSP). This is an open access article under the CC-BY-NC-ND license.

Abstract: Recently, datasets have been identified on which adaptive optimizers perform no better than simpler ones, and no evaluation criteria have been established for deciding which optimization algorithm is appropriate. In this paper, we propose a characterization method that implements reverse-mode automatic differentiation and characterizes the optimizer by tracking, at each epoch, the gradient and the value of the signal flowing into the output layer. The proposed method was applied to a CNN (Convolutional Neural Network) recognizing CIFAR-10, and experiments were conducted comparing Adam (adaptive moment estimation) and SGD (stochastic gradient descent). The experiments revealed that for batch sizes of 50, 100, 150, and 200, SGD and Adam differ significantly in the characteristics of the time series of signals sent to the output layer. This shows that the Adam optimizer can be clearly characterized from the input signal series for each batch size.
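To illustrate the reverse-mode (backward) automatic differentiation the method builds on, the following is a minimal scalar sketch, not the paper's implementation: each node records its parents together with local partial derivatives, and a backward pass accumulates adjoints in reverse topological order, which is how the gradients tracked at the output layer are obtained.

```python
# Minimal reverse-mode automatic differentiation sketch (illustrative only;
# names and structure are assumptions, not the authors' code).
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self):
        # Topologically order the graph so each node's adjoint is complete
        # before it is propagated to its parents.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0  # seed the output adjoint
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local

x = Var(3.0)
y = Var(4.0)
z = x * y + x   # z = x*y + x, so dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

In a full implementation the same accumulation runs over the tensors of each convolutional and dense layer, and the adjoints reaching the output layer are the per-epoch signals the paper records for SGD and Adam.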

Keywords: Characterization of Optimizers, Adaptive Optimizer, Reverse Mode Differentiation, CNN
Scope of the Article: Neural Networks