
Controlling Model Complexity in Probabilistic Model-Based Dynamic Optimization of Neural Network Structures

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning (ICANN 2019)

Abstract

Simultaneously optimizing both the structure of a neural network and its connection weights in a single training loop can reduce the enormous computational cost of neural architecture search. We focus on probabilistic model-based dynamic neural network structure optimization, which maintains a probability distribution over structure parameters and simultaneously optimizes both the distribution parameters and the connection weights by gradient methods. Because the existing algorithm searches only for structures that minimize the training loss, it may find overly complicated structures. In this paper, we propose introducing a penalty term to control the model complexity of the obtained structures. We formulate the penalty term using the number of weights or units and derive its analytical natural gradient. The proposed method minimizes the objective function with the injected penalty term by stochastic gradient descent. We apply the proposed method to unit selection in a fully-connected neural network and to connection selection in a convolutional neural network. The experimental results show that the proposed method can control model complexity while maintaining performance.
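To make the idea in the abstract concrete, the following is a minimal toy sketch, not the authors' algorithm: it assumes each candidate unit is gated by a Bernoulli variable with parameter theta_i (as in the probabilistic model-based framework the paper builds on), uses a simple centered-loss utility in place of the paper's utility function, and penalizes the expected number of active units, whose natural gradient under the Bernoulli model is theta*(1-theta). The toy `loss` and `target` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: d candidate units, unit i active with probability theta_i.
d = 8
theta = np.full(d, 0.5)                          # distribution parameters
target = np.array([1., 1., 1., 0., 0., 0., 0., 0.])  # units that actually help

def loss(x):
    # Stand-in for the training loss of the sampled sub-network:
    # smaller when exactly the "useful" units are active.
    return float(np.sum((x - target) ** 2))

eta = 0.1   # learning rate for theta
eps = 0.05  # penalty coefficient (eps > 0 shrinks the network)
lam = 4     # structures sampled per step

for _ in range(500):
    xs = (rng.random((lam, d)) < theta).astype(float)  # sample structures
    ls = np.array([loss(x) for x in xs])
    u = -(ls - ls.mean())                              # lower loss -> higher utility
    # Monte Carlo estimate of the natural gradient of the expected utility
    # under the Bernoulli model (ascent direction).
    ng_loss = (u[:, None] * (xs - theta)).mean(axis=0)
    # Analytical natural gradient of the penalty E[sum_i x_i] = sum_i theta_i.
    ng_pen = theta * (1.0 - theta)
    theta = np.clip(theta + eta * (ng_loss - eps * ng_pen), 1e-3, 1 - 1e-3)

print(np.round(theta, 2))
```

With the penalty active, the probabilities of the three useful units are driven toward one and those of the remaining units toward zero; increasing `eps` prunes more aggressively, which is the complexity-control knob the paper formalizes.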



Notes

  1. A negative value of \(\epsilon'\) encourages an increase in the number of active units.
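The sign behavior in this note can be sketched with a short derivation. Assuming (illustratively, following the Bernoulli structure model of the underlying framework) that unit \(i\) is active with probability \(\theta_i\), the expected number of active units is \(\sum_i \theta_i\), and the Fisher information matrix of the factorized Bernoulli distribution is \(\mathrm{diag}\big(1/(\theta_i(1-\theta_i))\big)\); the symbols \(\eta\) and \(\bar{L}\) below are placeholders for the learning rate and expected loss:

```latex
% Natural gradient F^{-1}\nabla of the penalty E[\sum_i x_i] = \sum_i \theta_i:
\tilde{\nabla}_{\theta_i} \sum\nolimits_j \theta_j = \theta_i (1 - \theta_i)

% Penalized update for minimization (stepping against both natural gradients):
\theta_i \leftarrow \theta_i
  - \eta \left( \tilde{\nabla}_{\theta_i} \bar{L}
  + \epsilon' \, \theta_i (1 - \theta_i) \right)
```

With \(\epsilon' > 0\) the extra term pushes each \(\theta_i\) toward zero (fewer active units), while \(\epsilon' < 0\) flips its sign and pushes \(\theta_i\) up, which is why a negative \(\epsilon'\) encourages more active units.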



Acknowledgment

This work is partially supported by the SECOM Science and Technology Foundation.

Author information

Corresponding author

Correspondence to Shota Saito.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Saito, S., Shirakawa, S. (2019). Controlling Model Complexity in Probabilistic Model-Based Dynamic Optimization of Neural Network Structures. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_33


  • DOI: https://doi.org/10.1007/978-3-030-30484-3_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30483-6

  • Online ISBN: 978-3-030-30484-3

  • eBook Packages: Computer Science; Computer Science (R0)
