Abstract
Hyperparameter optimization (HPO) methods alleviate the significant effort required to obtain hyperparameters that perform optimally on visual learning problems. Existing methods are computationally inefficient because they are task agnostic, i.e., they do not adapt to a given task. We present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a task-aware HPO algorithm that improves HPO efficiency on a target dataset by using knowledge from previous hyperparameter searches to recommend effective hyperparameters conditioned on that dataset. HyperSTAR ranks and recommends hyperparameter configurations by predicting their performance on the target dataset. To do so, it learns a joint dataset-hyperparameter space in an end-to-end manner, which enables its performance predictor to exploit hyperparameters that proved effective on similar tasks. Combined with existing HPO techniques, HyperSTAR's recommendations yield a task-aware HPO system that reduces the time needed to find optimal hyperparameters for the target learning problem. Our experiments on image classification, object detection, and model pruning show that HyperSTAR reduces the number of hyperparameter configurations that must be evaluated by about \(50\%\) compared to existing methods and, when combined with Hyperband, requires only \(25\%\) of the budget that vanilla Hyperband and Bayesian-Optimized Hyperband (BOHB) need to reach the best performance.
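The following is a minimal, illustrative PyTorch sketch of this idea, not the authors' released implementation: all module names, dimensions, and the hyperparameter encoding are assumptions. It embeds a sample of the target dataset and each candidate configuration into a joint space, regresses a performance score, and ranks candidates so that a search method such as Hyperband can be warm-started with the top-ranked configurations.

```python
# Illustrative sketch (not the authors' code) of a task-aware performance
# predictor: embed the dataset and a candidate hyperparameter configuration
# into a joint space, predict the configuration's performance, and rank
# candidates by the predicted score.
import torch
import torch.nn as nn
import torchvision.models as models


class PerformancePredictor(nn.Module):
    def __init__(self, hp_dim: int, embed_dim: int = 128):
        super().__init__()
        # Dataset encoder: features from a frozen ImageNet backbone,
        # averaged over a sample of images to give one task representation.
        backbone = models.resnet18(weights="DEFAULT")
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.dataset_proj = nn.Linear(512, embed_dim)
        # Hyperparameter encoder: maps an encoded configuration vector
        # (e.g., one-hot choices plus scaled continuous values) to the joint space.
        self.hp_proj = nn.Sequential(nn.Linear(hp_dim, embed_dim), nn.ReLU(),
                                     nn.Linear(embed_dim, embed_dim))
        # Regressor: predicts performance (e.g., validation accuracy) from the
        # concatenated dataset and hyperparameter embeddings.
        self.head = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.ReLU(),
                                  nn.Linear(embed_dim, 1))

    def forward(self, images: torch.Tensor, hp: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images).flatten(1)               # (N, 512)
        task = self.dataset_proj(feats.mean(0, keepdim=True))  # (1, embed_dim)
        cfg = self.hp_proj(hp)                                 # (B, embed_dim)
        joint = torch.cat([task.expand(cfg.size(0), -1), cfg], dim=1)
        return self.head(joint).squeeze(1)                     # scores, shape (B,)


# Usage: score a pool of candidate configurations for a new task and keep the
# top-k as a warm-start for an HPO method such as Hyperband.
model = PerformancePredictor(hp_dim=16).eval()
images = torch.randn(32, 3, 224, 224)   # sample of the target dataset
candidates = torch.rand(100, 16)        # encoded hyperparameter configurations
with torch.no_grad():
    scores = model(images, candidates)
topk = scores.topk(10).indices          # recommended configurations to try first
```

In the paper, such a predictor is trained end to end on performance records from hyperparameter searches over previous datasets, so the joint space generalizes to unseen tasks.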
Notes
Dataset available at http://bit.ly/3r16oIA.
References
Achille, A., Lam, M., Tewari, R., et al. (2019). Task2vec: Task embedding for meta-learning. In Proceedings of IEEE ICCV.
Bardenet, R., Brendel, M., Kégl, B., et al. (2013). Collaborative hyperparameter tuning. In Proceedings of ICML.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb), 281–305.
Bossard, L., Guillaumin, M., & Van Gool, L. (2014). Food-101: Mining discriminative components with random forests. In Proceedings of ECCV.
Chen, W., Liu, T. Y., Lan, Y., et al. (2009). Ranking measures and loss functions in learning to rank. In Proceedings of NeurIPS.
Cimpoi, M., Maji, S., Kokkinos, I., et al. (2014). Describing textures in the wild. In Proceedings of IEEE CVPR.
Deng, J., Dong, W., Socher, R., et al. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE CVPR.
Donahue, J., Jia, Y., Vinyals, O., et al. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of ICML.
Everingham, M., Van Gool, L., Williams, C. K., et al. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Falkner, S., Klein, A., & Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. In Proceedings of ICML.
Feurer, M., Klein, A., Eggensperger, K., et al. (2015a). Efficient and robust automated machine learning. In Proceedings of NeurIPS.
Feurer, M., Springenberg, J., & Hutter, F. (2015b). Initializing Bayesian hyperparameter optimization via meta-learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
Franceschi, L., Donini, M., Frasconi, P., et al. (2017). Forward and reverse gradient-based hyperparameter optimization. In Proceedings of ICML.
Ganin, Y., & Lempitsky, V. (2015). Unsupervised domain adaptation by backpropagation. In Proceedings of ICML.
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of IEEE CVPR.
Girshick, R. (2015). Fast R-CNN. In Proceedings of IEEE ICCV.
Goldman, E., Herzig, R., Eisenschtat, A., et al. (2019). Precise detection in densely packed scenes. In Proceedings of IEEE CVPR.
Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report, California Institute of Technology.
He, K., Gkioxari, G., Dollár, P., et al. (2017). Mask R-CNN. In Proceedings of IEEE ICCV.
He, K., Zhang, X., Ren, S., et al. (2016). Deep residual learning for image recognition. In Proceedings of IEEE CVPR.
Hoffman, J., Tzeng, E., Park, T., et al. (2018). CyCADA: Cycle-consistent adversarial domain adaptation. In Proceedings of ICML.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of IEEE CVPR.
Huang, G., Liu, Z., Van Der Maaten, L., et al. (2017). Densely connected convolutional networks. In Proceedings of IEEE CVPR.
Hutter, F., Kotthoff, L., & Vanschoren, J. (Eds.) (2018). Automated machine learning: Methods, systems, challenges. Springer (in press), available at http://automl.org/book.
Iwana, B. K., Raza Rizvi, S. T., Ahmed, S., et al. (2016). Judging a book by its cover. arXiv:1610.09204.
Jamieson, K., & Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. In Proceedings of AISTATS.
Jin, H., Song, Q., & Hu, X. (2019). Auto-Keras: An efficient neural architecture search system. In Proceedings of ACM KDD.
Kandasamy, K., Dasarathy, G., Schneider, J., et al. (2017). Multi-fidelity Bayesian optimisation with continuous approximations. In Proceedings of ICML.
Kim, J., Kim, S., & Choi, S. (2017). Learning to warm-start Bayesian hyperparameter optimization. arXiv:1710.06219.
Klein, A., Falkner, S., Bartels, S., et al. (2017). Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Proceedings of AISTATS.
Klein, A., Falkner, S., Springenberg, J. T., et al. (2016). Learning curve prediction with Bayesian neural networks. In Proceedings of ICLR.
Kokiopoulou, E., Hauth, A., Sbaiz, L., et al. (2019). Fast task-aware architecture inference. arXiv:1902.05781.
Kozerawski, J., Fragoso, V., Karianakis, N., et al. (2020). BLT: Balancing long-tailed datasets with adversarially-perturbed images. In Proceedings of ACCV.
Li, H., Fowlkes, C., Yang, H., et al. (2023). Guided recommendation for model fine-tuning. In Proceedings of IEEE/CVF CVPR (pp. 3633–3642).
Li, L., Jamieson, K., DeSalvo, G., et al. (2017). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research.
Lindauer, M., & Hutter, F. (2018). Warmstarting of model-based algorithm configuration. In Proceedings of the AAAI Conference on Artificial Intelligence.
Liu, Z., Luo, P., Qiu, S., et al. (2016). DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of IEEE CVPR.
Ma, N., Zhang, X., Zheng, H. T., et al. (2018). ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of ECCV.
Maclaurin, D., Duvenaud, D., & Adams, R. (2015). Gradient-based hyperparameter optimization through reversible learning. In Proceedings of ICML.
Milan, A., Leal-Taixé, L., Reid, I., et al. (2016). MOT16: A benchmark for multi-object tracking. arXiv:1603.00831.
Mittal, G., Liu, C., Karianakis, N., et al. (2020). HyperSTAR: Task-aware hyperparameters for deep networks. In Proceedings of IEEE/CVF CVPR.
Molchanov, P., Mallya, A., Tyree, S., et al. (2019). Importance estimation for neural network pruning. In Proceedings of IEEE CVPR.
Netzer, Y., Wang, T., Coates, A., et al. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Parkhi, O. M., Vedaldi, A., Zisserman, A., et al. (2012). Cats and dogs. In Proceedings of IEEE CVPR.
Pedregosa, F. (2016). Hyperparameter optimization with approximate gradient. In Proceedings of ICML.
Perrone, V., Jenatton, R., Seeger, M. W., et al. (2018). Scalable hyperparameter transfer learning. In Proceedings of NeurIPS.
Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In Proceedings of IEEE CVPR.
Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of IEEE CVPR.
Ren, S., He, K., Girshick, R., et al. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of NeurIPS.
Romberg, S., Pueyo, L. G., Lienhart, R., et al. (2011). Scalable logo recognition in real-world images. In Proceedings of ACM ICMR.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Proceedings of NeurIPS.
Snoek, J., Rippel, O., Swersky, K., et al. (2015). Scalable Bayesian optimization using deep neural networks. In Proceedings of ICML.
Swersky, K., Snoek, J., & Adams, R. P. (2013). Multi-task Bayesian optimization. In Proceedings of NeurIPS (pp. 2004–2012).
Swersky, K., Snoek, J., & Prescott Adams, R. (2014). Freeze-thaw Bayesian optimization. arXiv:1406.3896.
Tzeng, E., Hoffman, J., Saenko, K., et al. (2017). Adversarial discriminative domain adaptation. In Proceedings of IEEE CVPR.
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In Proceedings of NeurIPS.
Wang, Y. X., Ramanan, D., & Hebert, M. (2017). Learning to model the tail. In Proceedings of NeurIPS.
Wong, C., Houlsby, N., Lu, Y., et al. (2018). Transfer learning with neural AutoML. In Proceedings of NeurIPS.
Wu, X., Zhan, C., Lai, Y. K., et al. (2019). IP102: A large-scale benchmark dataset for insect pest recognition. In Proceedings of IEEE CVPR.
Xiao, J., Hays, J., Ehinger, K. A., et al. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings of IEEE CVPR.
Xiao, Y., Xing, E. P., & Neiswanger, W. (2021). Amortized auto-tuning: Cost-efficient Bayesian transfer optimization for hyperparameter recommendation. arXiv:2106.09179.
Xu, H., Kang, N., Zhang, G., et al. (2021). NASOA: Towards faster task-oriented online fine-tuning with a zoo of models. In Proceedings of IEEE/CVF ICCV (pp. 5097–5106).
Xue, C., Yan, J., Yan, R., et al. (2019). Transferable AutoML by model sharing over grouped datasets. In Proceedings of IEEE CVPR.
Yan, C., Zhang, Y., Zhang, Q., et al. (2022). Privacy-preserving online AutoML for domain-specific face detection. In Proceedings of IEEE/CVF CVPR (pp. 4134–4144).
Yang, D., Myronenko, A., Wang, X., et al. (2021). T-AutoML: Automated machine learning for lesion segmentation using transformers in 3D medical imaging. In Proceedings of IEEE/CVF ICCV (pp. 3962–3974).
Yogatama, D., & Mann, G. (2014). Efficient transfer learning method for automatic hyperparameter tuning. In Proceedings of AISTATS.
Zhou, B., Lapedriza, A., Khosla, A., et al. (2017). Places: A 10 million image database for scene recognition. IEEE T-PAMI.
Zhou, K., Hong, L., Hu, S., et al. (2021). DHA: End-to-end joint optimization of data augmentation policy, hyper-parameter and architecture. arXiv:2109.05765.
Zhu, M. (2004). Recall, precision and average precision. Department of Statistics and Actuarial Science, University of Waterloo, 2(30), 6.
Ziller, A., Hansjakob, J., Rusinov, V., et al. (2019). Oktoberfest food dataset. arXiv:1912.05007.
Acknowledgements
Special thanks to the Microsoft Custom Vision team for their valuable feedback and support.
Additional information
Communicated by Arun Mallya.
About this article
Cite this article
Liu, C., Mittal, G., Karianakis, N. et al. HyperSTAR: Task-Aware Hyperparameter Recommendation for Training and Compression. Int J Comput Vis 132, 1913–1927 (2024). https://doi.org/10.1007/s11263-023-01961-0