Abstract
Hyperparameter optimization (HPO) methods alleviate the significant effort required to obtain hyperparameters that perform optimally on visual learning problems. Existing methods are computationally inefficient because they are task agnostic, i.e., they do not adapt to a given task. We present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a task-aware HPO algorithm that improves HPO efficiency on a target dataset by using knowledge from previous hyperparameter searches to recommend effective hyperparameters conditioned on that dataset. HyperSTAR ranks and recommends hyperparameter configurations by predicting their performance on the target dataset. To do so, it learns a joint dataset-hyperparameter space in an end-to-end manner, which enables its performance predictor to exploit hyperparameters that proved effective on similar tasks. Combined with existing HPO techniques, HyperSTAR's recommendations yield a task-aware HPO system that reduces the time needed to find optimal hyperparameters for the target learning problem. Our experiments on image classification, object detection, and model pruning show that HyperSTAR reduces the number of hyperparameter configurations that must be evaluated by about \(50\%\) compared to existing methods and, when combined with Hyperband, requires only \(25\%\) of the budget that vanilla Hyperband and Bayesian-Optimized Hyperband (BOHB) need to reach the best performance.
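The following is a minimal, illustrative PyTorch sketch of this idea, not the authors' released implementation: all module names, dimensions, and the hyperparameter encoding are assumptions. It embeds a sample of the target dataset and each candidate configuration into a joint space, regresses a performance score, and ranks candidates so that a search method such as Hyperband can be warm-started with the top-ranked configurations.

```python
# Illustrative sketch (not the authors' code) of a task-aware performance
# predictor: embed the dataset and a candidate hyperparameter configuration
# into a joint space, predict the configuration's performance, and rank
# candidates by the predicted score.
import torch
import torch.nn as nn
import torchvision.models as models


class PerformancePredictor(nn.Module):
    def __init__(self, hp_dim: int, embed_dim: int = 128):
        super().__init__()
        # Dataset encoder: features from a frozen ImageNet backbone,
        # averaged over a sample of images to give one task representation.
        backbone = models.resnet18(weights="DEFAULT")
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.dataset_proj = nn.Linear(512, embed_dim)
        # Hyperparameter encoder: maps an encoded configuration vector
        # (e.g., one-hot choices plus scaled continuous values) to the joint space.
        self.hp_proj = nn.Sequential(nn.Linear(hp_dim, embed_dim), nn.ReLU(),
                                     nn.Linear(embed_dim, embed_dim))
        # Regressor: predicts performance (e.g., validation accuracy) from the
        # concatenated dataset and hyperparameter embeddings.
        self.head = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.ReLU(),
                                  nn.Linear(embed_dim, 1))

    def forward(self, images: torch.Tensor, hp: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images).flatten(1)               # (N, 512)
        task = self.dataset_proj(feats.mean(0, keepdim=True))  # (1, embed_dim)
        cfg = self.hp_proj(hp)                                 # (B, embed_dim)
        joint = torch.cat([task.expand(cfg.size(0), -1), cfg], dim=1)
        return self.head(joint).squeeze(1)                     # scores, shape (B,)


# Usage: score a pool of candidate configurations for a new task and keep the
# top-k as a warm-start for an HPO method such as Hyperband.
model = PerformancePredictor(hp_dim=16).eval()
images = torch.randn(32, 3, 224, 224)   # sample of the target dataset
candidates = torch.rand(100, 16)        # encoded hyperparameter configurations
with torch.no_grad():
    scores = model(images, candidates)
topk = scores.topk(10).indices          # recommended configurations to try first
```

In the paper, such a predictor is trained end to end on performance records from hyperparameter searches over previous datasets, so the joint space generalizes to unseen tasks.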
Notes
Dataset available at http://bit.ly/3r16oIA.
References
Achille, A., Lam, M., Tewari, R., et al. (2019). Task2vec: Task embedding for meta-learning. In Proceedings of IEEE ICCV.
Bardenet, R., Brendel, M., Kégl, B., et al. (2013). Collaborative hyperparameter tuning. In Proceedings of ICML.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb), 281–305.
Bossard, L., Guillaumin, M., & Van Gool, L. (2014). Food-101: Mining discriminative components with random forests. In Proceedings of ECCV.
Chen, W., Liu, T. Y., Lan, Y., et al. (2009). Ranking measures and loss functions in learning to rank. In Proceedings of NeurIPS.
Cimpoi, M., Maji, S., Kokkinos, I., et al. (2014). Describing textures in the wild. In Proceedings of IEEE CVPR.
Deng, J., Dong, W., Socher, R., et al. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE CVPR.
Donahue, J., Jia, Y., Vinyals, O., et al. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of ICML.
Everingham, M., Van Gool, L., Williams, C. K., et al. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Falkner, S., Klein, A., & Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. In Proceedings of ICML.
Feurer, M., Klein, A., Eggensperger, K., et al. (2015a). Efficient and robust automated machine learning. In Proceedings of NeurIPS.
Feurer, M., Springenberg, J., & Hutter, F. (2015b). Initializing Bayesian hyperparameter optimization via meta-learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
Franceschi, L., Donini, M., Frasconi, P., et al. (2017). Forward and reverse gradient-based hyperparameter optimization. In Proceedings of ICML.
Ganin, Y., & Lempitsky, V. (2015). Unsupervised domain adaptation by backpropagation. In Proceedings of ICML.
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of IEEE CVPR.
Girshick, R. (2015). Fast R-CNN. In Proceedings of IEEE ICCV.
Goldman, E., Herzig, R., Eisenschtat, A., et al. (2019). Precise detection in densely packed scenes. In Proceedings of IEEE CVPR.
Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report, California Institute of Technology.
He, K., Gkioxari, G., Dollár, P., et al. (2017). Mask R-CNN. In Proceedings of IEEE ICCV.
He, K., Zhang, X., Ren, S., et al. (2016). Deep residual learning for image recognition. In Proceedings of IEEE CVPR.
Hoffman, J., Tzeng, E., Park, T., et al. (2018). CyCADA: Cycle-consistent adversarial domain adaptation. In Proceedings of ICML.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of IEEE CVPR.
Huang, G., Liu, Z., Van Der Maaten, L., et al. (2017). Densely connected convolutional networks. In Proceedings of IEEE CVPR.
Hutter, F., Kotthoff, L., & Vanschoren, J. (Eds.) (2018). Automated machine learning: Methods, systems, challenges. Springer (in press), available at http://automl.org/book.
Iwana, B. K., Raza Rizvi, S. T., Ahmed, S., et al. (2016). Judging a book by its cover. arXiv:1610.09204.
Jamieson, K., & Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. In Proceedings of AISTATS.
Jin, H., Song, Q., & Hu, X. (2019). Auto-Keras: An efficient neural architecture search system. In Proceedings of ACM KDD.
Kandasamy, K., Dasarathy, G., Schneider, J., et al. (2017). Multi-fidelity Bayesian optimisation with continuous approximations. In Proceedings of ICML.
Kim, J., Kim, S., & Choi, S. (2017). Learning to warm-start Bayesian hyperparameter optimization. arXiv:1710.06219.
Klein, A., Falkner, S., Bartels, S., et al. (2017). Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Proceedings of AISTATS.
Klein, A., Falkner, S., Springenberg, J. T., et al. (2016). Learning curve prediction with Bayesian neural networks. In Proceedings of ICLR.
Kokiopoulou, E., Hauth, A., Sbaiz, L., et al. (2019). Fast task-aware architecture inference. arXiv:1902.05781.
Kozerawski, J., Fragoso, V., Karianakis, N., et al. (2020). BLT: Balancing long-tailed datasets with adversarially-perturbed images. In Proceedings of ACCV.
Li, H., Fowlkes, C., Yang, H., et al. (2023). Guided recommendation for model fine-tuning. In Proceedings of IEEE/CVF CVPR (pp. 3633–3642).
Li, L., Jamieson, K., DeSalvo, G., et al. (2017). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research.
Lindauer, M., & Hutter, F. (2018). Warmstarting of model-based algorithm configuration. In Proceedings of the AAAI Conference on Artificial Intelligence.
Liu, Z., Luo, P., Qiu, S., et al. (2016). DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of IEEE CVPR.
Ma, N., Zhang, X., Zheng, H. T., et al. (2018). ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of ECCV.
Maclaurin, D., Duvenaud, D., & Adams, R. (2015). Gradient-based hyperparameter optimization through reversible learning. In Proceedings of ICML.
Milan, A., Leal-Taixé, L., Reid, I., et al. (2016). MOT16: A benchmark for multi-object tracking. arXiv:1603.00831.
Mittal, G., Liu, C., Karianakis, N., et al. (2020). HyperSTAR: Task-aware hyperparameters for deep networks. In Proceedings of IEEE/CVF CVPR.
Molchanov, P., Mallya, A., Tyree, S., et al. (2019). Importance estimation for neural network pruning. In Proceedings of IEEE CVPR.
Netzer, Y., Wang, T., Coates, A., et al. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Parkhi, O. M., Vedaldi, A., Zisserman, A., et al. (2012). Cats and dogs. In Proceedings of IEEE CVPR.
Pedregosa, F. (2016). Hyperparameter optimization with approximate gradient. In Proceedings of ICML.
Perrone, V., Jenatton, R., Seeger, M. W., et al. (2018). Scalable hyperparameter transfer learning. In Proceedings of NeurIPS.
Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In Proceedings of IEEE CVPR.
Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of IEEE CVPR.
Ren, S., He, K., Girshick, R., et al. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of NeurIPS.
Romberg, S., Pueyo, L. G., Lienhart, R., et al. (2011). Scalable logo recognition in real-world images. In Proceedings of ACM ICMR.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Proceedings of NeurIPS.
Snoek, J., Rippel, O., Swersky, K., et al. (2015). Scalable Bayesian optimization using deep neural networks. In Proceedings of ICML.
Swersky, K., Snoek, J., & Adams, R. P. (2013). Multi-task Bayesian optimization. In Proceedings of NeurIPS (pp. 2004–2012).
Swersky, K., Snoek, J., & Prescott Adams, R. (2014). Freeze-thaw Bayesian optimization. arXiv:1406.3896.
Tzeng, E., Hoffman, J., Saenko, K., et al. (2017). Adversarial discriminative domain adaptation. In Proceedings of IEEE CVPR.
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In Proceedings of NeurIPS.
Wang, Y. X., Ramanan, D., & Hebert, M. (2017). Learning to model the tail. In Proceedings of NeurIPS.
Wong, C., Houlsby, N., Lu, Y., et al. (2018). Transfer learning with neural AutoML. In Proceedings of NeurIPS.
Wu, X., Zhan, C., Lai, Y. K., et al. (2019). IP102: A large-scale benchmark dataset for insect pest recognition. In Proceedings of IEEE CVPR.
Xiao, J., Hays, J., Ehinger, K. A., et al. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings of IEEE CVPR.
Xiao, Y., Xing, E. P., & Neiswanger, W. (2021). Amortized auto-tuning: Cost-efficient Bayesian transfer optimization for hyperparameter recommendation. arXiv:2106.09179.
Xu, H., Kang, N., Zhang, G., et al. (2021). NASOA: Towards faster task-oriented online fine-tuning with a zoo of models. In Proceedings of IEEE/CVF ICCV (pp. 5097–5106).
Xue, C., Yan, J., Yan, R., et al. (2019). Transferable AutoML by model sharing over grouped datasets. In Proceedings of IEEE CVPR.
Yan, C., Zhang, Y., Zhang, Q., et al. (2022). Privacy-preserving online AutoML for domain-specific face detection. In Proceedings of IEEE/CVF CVPR (pp. 4134–4144).
Yang, D., Myronenko, A., Wang, X., et al. (2021). T-AutoML: Automated machine learning for lesion segmentation using transformers in 3D medical imaging. In Proceedings of IEEE/CVF ICCV (pp. 3962–3974).
Yogatama, D., & Mann, G. (2014). Efficient transfer learning method for automatic hyperparameter tuning. In Proceedings of AISTATS.
Zhou, B., Lapedriza, A., Khosla, A., et al. (2017). Places: A 10 million image database for scene recognition. IEEE T-PAMI.
Zhou, K., Hong, L., Hu, S., et al. (2021). DHA: End-to-end joint optimization of data augmentation policy, hyper-parameter and architecture. arXiv:2109.05765.
Zhu, M. (2004). Recall, precision and average precision. Department of Statistics and Actuarial Science, University of Waterloo, 2(30), 6.
Ziller, A., Hansjakob, J., Rusinov, V., et al. (2019). Oktoberfest food dataset. arXiv:1912.05007.
Acknowledgements
Special thanks to the Microsoft Custom Vision team for their valuable feedback and support.
Additional information
Communicated by Arun Mallya.
About this article
Cite this article
Liu, C., Mittal, G., Karianakis, N. et al. HyperSTAR: Task-Aware Hyperparameter Recommendation for Training and Compression. Int J Comput Vis 132, 1913–1927 (2024). https://doi.org/10.1007/s11263-023-01961-0