An adaptive mechanism to achieve learning rate dynamically

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Gradient descent is prevalent in large-scale optimization problems in machine learning; in particular, it now plays a major role in computing and correcting the connection strengths of neural networks in deep learning. However, many gradient-based optimization methods involve sensitive hyper-parameters that require extensive configuration. In this paper, we present a novel adaptive mechanism called the adaptive exponential decay rate (AEDR). AEDR replaces the fixed, preconfigured exponential decay rate with an adaptive one, which allows us to eliminate one otherwise sensitive hyper-parameter. The decay rate is computed adaptively from the moving averages of both the gradients and the squared gradients over time. Applying the mechanism to Adadelta and Adam reduces the number of hyper-parameters to be tuned in each to a single one. We use long short-term memory networks and LeNet to demonstrate how the learning rate adapts dynamically, and we show promising results compared with other state-of-the-art methods on four data sets: IMDB (movie reviews), SemEval-2016 (sentiment analysis in Twitter), CIFAR-10, and Pascal VOC-2012.
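The abstract only sketches the mechanism, so the following is a minimal, illustrative Python sketch of an Adam-style step in which the fixed decay rates (beta1/beta2) are replaced by a rate derived from the running averages themselves. The specific formula for the adaptive rate used here (a bounded ratio of the squared gradient average to the average of squared gradients) is an assumption made for illustration, not the exact AEDR update defined in the paper, and the function name aedr_adam_step is hypothetical.

```python
import numpy as np

def aedr_adam_step(param, grad, m, v, lr=0.001, eps=1e-8):
    """One Adam-like update whose exponential decay rate is derived adaptively
    from the running averages instead of being a fixed constant.

    NOTE: illustrative sketch only. The paper defines the exact AEDR formula;
    here the decay rate is assumed to be the bounded ratio m^2 / v, which lies
    in [0, 1), grows when gradients are consistent, and shrinks when they are noisy.
    """
    # Hypothetical adaptive decay rate: elementwise ratio of the squared mean
    # gradient to the mean squared gradient, clipped away from 1 for safety.
    beta = np.clip((m * m) / (v + eps), 0.0, 1.0 - 1e-3)

    # Exponential moving averages of gradients and squared gradients,
    # driven by the adaptive decay rate instead of fixed beta1/beta2.
    m = beta * m + (1.0 - beta) * grad
    v = beta * v + (1.0 - beta) * grad * grad

    # Adam-style parameter update; lr is the single remaining hyper-parameter.
    param = param - lr * m / (np.sqrt(v) + eps)
    return param, m, v
```

With zero-initialized m and v, the adaptive rate starts at 0, so the averages begin from the raw gradient, and the learning rate lr remains the only hyper-parameter to tune in this sketch.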




Notes

  1. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  2. http://alt.qcri.org/semeval2016/task4/.

  3. http://www.cs.toronto.edu/~kriz/cifar.html.

  4. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html.



Acknowledgements

The work was supported by the Fundamental Research Funds for the Central Universities (No. XDJK2017D059), the Scientific and Technological Research Program of the Chongqing University of Education (Nos. KY2016TZ02 and 2017XJPT07), and the Key Research Program of the Chongqing Education Science 13th Five-Year Plan 2017 (No. 2017-GX-139). Li Li is the corresponding author of the paper.

Author information


Corresponding author

Correspondence to Li Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Zhang, J., Hu, F., Li, L. et al. An adaptive mechanism to achieve learning rate dynamically. Neural Comput & Applic 31, 6685–6698 (2019). https://doi.org/10.1007/s00521-018-3495-0

