
Direct Distillation Between Different Domains

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15138)


Abstract

Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. In practical applications, however, the student network may be required to perform in a new scenario (i.e., the target domain) that differs significantly from the known scenario of the teacher network (i.e., the source domain). Traditional domain adaptation techniques can be combined with KD in a two-stage process to bridge the domain gap, but the reliability of such two-stage approaches tends to be limited by their high computational cost and by the errors accumulated across both stages. To solve this problem, we propose a new one-stage method dubbed “Direct Distillation between Different Domains” (4Ds). We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge. Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data. As a result, the teacher network can effectively transfer categorical knowledge that aligns with the target domain of the student network. Extensive experiments on various benchmark datasets demonstrate that our proposed 4Ds method produces reliable student networks and outperforms state-of-the-art approaches. Code is available at https://github.com/tangjialiang97/4Ds.
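The Fourier-based separation the abstract describes can be illustrated with a small sketch. This is not the paper's adapter (which is a learnable module operating on teacher feature maps); it only shows the underlying decomposition of a 2D feature map into amplitude and phase, where amplitude is commonly associated with domain-specific style and phase with domain-invariant structure. The helper names below are ours, not the paper's.

```python
import numpy as np

def separate_frequency_components(feat):
    """Decompose a 2D feature map into amplitude and phase via the 2D DFT.

    Illustrative only: the 4Ds adapter itself is learnable; this just shows
    the Fourier decomposition such adapters build on.
    """
    F = np.fft.fft2(feat)
    amplitude = np.abs(F)    # often treated as domain-specific (style) information
    phase = np.angle(F)      # often treated as domain-invariant (structure) information
    return amplitude, phase

def recombine(amplitude, phase):
    """Invert the decomposition: rebuild the feature map from amplitude and phase."""
    F = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(F))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))
amp, pha = separate_frequency_components(feat)
recon = recombine(amp, pha)
assert np.allclose(recon, feat)  # the decomposition is lossless
```

A domain-adaptation-style use of this decomposition would, for example, keep the phase of a target-domain feature while adapting its amplitude; the paper's fusion-activation mechanism is more involved than this round trip.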


Notes

  1. Each Fourier coefficient \(\boldsymbol{\mathcal {F}}^{T}_{\text {ad}}(u, v)=\frac{1}{H W} \sum _{h=1}^{H} \sum _{w=1}^{W} \textbf{f}_{\text {ad}}^{T}(h, w) e^{-\textrm{i} 2 \pi \left( \frac{u h}{H}+\frac{v w}{W}\right) }=\boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {real}}(u, v)+\textrm{i} \boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {img}}(u, v)\), where \(\textrm{i}\) is the imaginary unit, and \(\boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {real}}\) and \(\boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {img}}\) are the real and imaginary parts, respectively.

  2. Each \(\textbf{f}_{\text {ift}}^{T}(h, w)\) is computed as \(\textbf{f}_{\text {ift}}^{T}(h, w)=\frac{1}{U V} \sum _{u=1}^{U} \sum _{v=1}^{V} \boldsymbol{\mathcal {F}}_{\text {ref}}^{T}(u, v) e^{\textrm{i} 2 \pi \left( \frac{u h}{U}+\frac{v w}{V}\right) }\).
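The footnotes' forward and inverse transforms map directly onto NumPy's FFT routines, with two caveats: `np.fft` indexes frequencies from 0 rather than 1, and by default it places the \(1/(HW)\) factor on the inverse rather than the forward transform, so we pass `norm="forward"` to mirror the footnote's forward scaling. Note also that a lossless round trip requires the normalization to appear on only one side of the transform pair. A quick numerical check of the real/imaginary split and the round trip:

```python
import numpy as np

H, W = 4, 4
rng = np.random.default_rng(1)
f_ad = rng.standard_normal((H, W))      # stand-in for the feature map in footnote 1

# Forward transform with the 1/(HW) factor on the forward pass,
# matching footnote 1's scaling convention.
F_ad = np.fft.fft2(f_ad, norm="forward")
F_real, F_imag = F_ad.real, F_ad.imag   # real and imaginary parts of each coefficient

# With norm="forward", the DC coefficient equals the mean of the input.
assert np.isclose(F_ad[0, 0].real, f_ad.mean())

# Inverse transform: with norm="forward" the inverse applies no extra
# factor, so reassembling the coefficients recovers the input exactly.
f_rec = np.fft.ifft2(F_real + 1j * F_imag, norm="forward").real
assert np.allclose(f_rec, f_ad)
```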



Acknowledgment

C. Gong was supported by NSF of China (Nos: 62336003, 12371510), NSF of Jiangsu Province (No: BZ2021013), NSF for Distinguished Young Scholar of Jiangsu Province (No: BK20220080), the Fundamental Research Funds for the Central Universities (Nos: 30920032202, 30921013114), and the “111” Program (No: B13022). H. Zhu was supported by A*STAR AME Programmatic Funding (No: A18A2b0046), the RobotHTPO Seed Fund under Project (No: C211518008), and the EDB Space Technology Development Grant under Project (No: S22-19016-STDP). M. Sugiyama was supported by JST CREST Grant Number JPMJCR18A2 and a grant from Apple, Inc. J. Zhou was supported by the National Research Foundation, Prime Minister’s Office, Singapore, and the Ministry of Communications and Information, under its Online Trust and Safety (OTS) Research Programme (No: MCl-OTS-001) and the SERC Central Research Fund (Use-inspired Basic Research). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of Apple, Inc., the National Research Foundation, Singapore, or the Ministry of Communications and Information.

Author information


Corresponding authors

Correspondence to Shuo Chen or Chen Gong.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 879 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tang, J. et al. (2025). Direct Distillation Between Different Domains. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15138. Springer, Cham. https://doi.org/10.1007/978-3-031-72989-8_9


  • DOI: https://doi.org/10.1007/978-3-031-72989-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72988-1

  • Online ISBN: 978-3-031-72989-8

  • eBook Packages: Computer Science, Computer Science (R0)
