
Direct Distillation Between Different Domains

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15138)


Abstract

Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. In practical applications, however, the student network may be required to perform in a new scenario (i.e., the target domain) that differs significantly from the known scenario of the teacher network (i.e., the source domain). Traditional domain adaptation techniques can be combined with KD in a two-stage process to bridge the domain gap, but the reliability of such two-stage approaches tends to be limited by their high computational cost and by the errors accumulated across both stages. To solve this problem, we propose a new one-stage method dubbed “Direct Distillation between Different Domains” (4Ds). We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge. Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data. As a result, the teacher network can effectively transfer categorical knowledge that aligns with the target domain of the student network. Extensive experiments on various benchmark datasets demonstrate that our proposed 4Ds method produces reliable student networks and outperforms state-of-the-art approaches. Code is available at https://github.com/tangjialiang97/4Ds.
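The Fourier-based separation the abstract describes can be illustrated with a small sketch. This is not the paper's adapter (which is a learnable module operating on teacher feature maps); it only shows the underlying decomposition of a 2D feature map into amplitude and phase, where amplitude is commonly associated with domain-specific style and phase with domain-invariant structure. The helper names below are ours, not the paper's.

```python
import numpy as np

def separate_frequency_components(feat):
    """Decompose a 2D feature map into amplitude and phase via the 2D DFT.

    Illustrative only: the 4Ds adapter itself is learnable; this just shows
    the Fourier decomposition such adapters build on.
    """
    F = np.fft.fft2(feat)
    amplitude = np.abs(F)    # often treated as domain-specific (style) information
    phase = np.angle(F)      # often treated as domain-invariant (structure) information
    return amplitude, phase

def recombine(amplitude, phase):
    """Invert the decomposition: rebuild the feature map from amplitude and phase."""
    F = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(F))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))
amp, pha = separate_frequency_components(feat)
recon = recombine(amp, pha)
assert np.allclose(recon, feat)  # the decomposition is lossless
```

A domain-adaptation-style use of this decomposition would, for example, keep the phase of a target-domain feature while adapting its amplitude; the paper's fusion-activation mechanism is more involved than this round trip.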


Notes

  1. Each Fourier coefficient \(\boldsymbol{\mathcal {F}}^{T}_{\text {ad}}(u, v)=\frac{1}{H W} \sum _{h=1}^{H} \sum _{w=1}^{W} \textbf{f}_{\text {ad}}^{T}(h, w) e^{-\textrm{i} 2 \pi \left( \frac{u h}{H}+\frac{v w}{W}\right) }=\boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {real}}(u, v)+\textrm{i} \boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {img}}(u, v)\), where \(\textrm{i}\) is the imaginary unit, and \(\boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {real}}\) and \(\boldsymbol{\mathcal {F}}^{T}_{\text {ad}\_\text {img}}\) are the real and imaginary parts, respectively.

  2. Each \(\textbf{f}_{\text {ift}}^{T}(h, w)\) is computed as \(\textbf{f}_{\text {ift}}^{T}(h, w)=\frac{1}{U V} \sum _{u=1}^{U} \sum _{v=1}^{V} \boldsymbol{\mathcal {F}}_{\text {ref}}^{T}(u, v) e^{\textrm{i} 2 \pi \left( \frac{u h}{U}+\frac{v w}{V}\right) }\).
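The footnotes' forward and inverse transforms map directly onto NumPy's FFT routines, with two caveats: `np.fft` indexes frequencies from 0 rather than 1, and by default it places the \(1/(HW)\) factor on the inverse rather than the forward transform, so we pass `norm="forward"` to mirror the footnote's forward scaling. Note also that a lossless round trip requires the normalization to appear on only one side of the transform pair. A quick numerical check of the real/imaginary split and the round trip:

```python
import numpy as np

H, W = 4, 4
rng = np.random.default_rng(1)
f_ad = rng.standard_normal((H, W))      # stand-in for the feature map in footnote 1

# Forward transform with the 1/(HW) factor on the forward pass,
# matching footnote 1's scaling convention.
F_ad = np.fft.fft2(f_ad, norm="forward")
F_real, F_imag = F_ad.real, F_ad.imag   # real and imaginary parts of each coefficient

# With norm="forward", the DC coefficient equals the mean of the input.
assert np.isclose(F_ad[0, 0].real, f_ad.mean())

# Inverse transform: with norm="forward" the inverse applies no extra
# factor, so reassembling the coefficients recovers the input exactly.
f_rec = np.fft.ifft2(F_real + 1j * F_imag, norm="forward").real
assert np.allclose(f_rec, f_ad)
```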



Acknowledgment

C. Gong was supported by NSF of China (Nos: 62336003, 12371510), NSF of Jiangsu Province (No: BZ2021013), NSF for Distinguished Young Scholar of Jiangsu Province (No: BK20220080), the Fundamental Research Funds for the Central Universities (Nos: 30920032202, 30921013114), and the “111” Program (No: B13022). H. Zhu was supported by A*STAR AME Programmatic Funding (No: A18A2b0046), the RobotHTPO Seed Fund under Project (No: C211518008), and the EDB Space Technology Development Grant under Project (No: S22-19016-STDP). M. Sugiyama was supported by JST CREST Grant Number JPMJCR18A2 and a grant from Apple, Inc. J. Zhou was supported by the National Research Foundation, Prime Minister’s Office, Singapore, and the Ministry of Communications and Information, under its Online Trust and Safety (OTS) Research Programme (No: MCl-OTS-001) and the SERC Central Research Fund (Use-inspired Basic Research). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of Apple, Inc., the National Research Foundation, Singapore, or the Ministry of Communications and Information.

Author information


Corresponding authors

Correspondence to Shuo Chen or Chen Gong.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 879 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tang, J. et al. (2025). Direct Distillation Between Different Domains. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15138. Springer, Cham. https://doi.org/10.1007/978-3-031-72989-8_9


  • DOI: https://doi.org/10.1007/978-3-031-72989-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72988-1

  • Online ISBN: 978-3-031-72989-8

  • eBook Packages: Computer Science, Computer Science (R0)
