Abstract
Accurately estimating model performance poses a significant challenge, particularly in scenarios where the source and target domains follow different data distributions. Most existing performance prediction methods heavily rely on the source data in their estimation process, limiting their applicability in a more realistic setting where only the trained model is accessible. The few methods that do not require source data exhibit considerably inferior performance. In this work, we propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data. We establish connections between our approach for unsupervised calibration and temperature scaling. We then employ a gradient-based strategy to evaluate the correctness of the calibrated predictions. Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability. Furthermore, our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
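To make the confidence-based view in the abstract concrete, the sketch below applies temperature scaling to hypothetical target-domain logits and reads off a naive accuracy estimate from the scaled confidences. This is a minimal illustration only: the function names, the synthetic logits, and the mean-confidence estimator are assumptions, and the paper's actual approach (generative-model calibration without source data plus a gradient-based correctness check) is not reproduced here.

```python
# Minimal sketch: temperature-scaled confidences as a crude proxy for target accuracy.
# Illustrative baseline only; not the method proposed in the paper.
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def estimate_accuracy(logits: np.ndarray, temperature: float) -> float:
    """Estimate accuracy on unlabelled data as the mean top-class confidence."""
    probs = softmax(logits, temperature)
    return float(probs.max(axis=-1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, 10))  # hypothetical target-domain logits
    for t in (0.5, 1.0, 2.0):             # larger T -> softer, less confident predictions
        print(f"T={t:.1f}  estimated accuracy={estimate_accuracy(logits, t):.3f}")
```

The temperature controls how sharply the softmax concentrates probability mass, which is why calibrating it changes the resulting accuracy estimate even though the predicted classes stay the same.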
Acknowledgements
This research is supported by the National Key Research and Development Program of China No. 2020AAA0109400 and the Shenyang Science and Technology Plan Fund (No. 21-102-0-09).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khramtsova, E., Baktashmotlagh, M., Zuccon, G., Wang, X., Salzmann, M. (2025). Source-Free Domain-Invariant Performance Prediction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15138. Springer, Cham. https://doi.org/10.1007/978-3-031-72989-8_6
DOI: https://doi.org/10.1007/978-3-031-72989-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72988-1
Online ISBN: 978-3-031-72989-8