Wavelet Knowledge Distillation via Decoupled Target for Scene Text Detection

Qu, Kefan; Lin, Jianmin; Li, Jinrong; Yang, Ming; He, Wangpeng

doi:10.1007/978-3-031-46311-2_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14357)

Included in the following conference series:

International Conference on Image and Graphics

Abstract

In this paper, we investigate knowledge distillation for training a compact student model for scene text detection, using a cumbersome teacher model that is too computationally expensive to deploy on resource-constrained devices. We observe that the frequency-domain information of the response map differs noticeably between the teacher and student models, and that this difference can effectively guide the student model to learn more useful knowledge. Building on this observation, we propose a wavelet knowledge distillation method via decoupled target for training accurate, compact scene text detection networks. Specifically, we first use the discrete wavelet transform to decompose the probability map into frequency bands that contain different characteristic components, and transfer knowledge in the high-frequency and low-frequency bands separately. In addition, we decouple the target by separating text and background regions with the ground-truth mask, which enhances the distillation effect in each region. Extensive experiments demonstrate that our method consistently improves the F-measure of the student model and outperforms other mainstream distillation methods.
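
The two ideas in the abstract (band-wise wavelet distillation and a target decoupled by the ground-truth mask) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the wavelet, the loss function, or the weighting, so the one-level Haar transform, the L1 distance, and the names `haar_dwt2`, `wavelet_decoupled_kd_loss`, `w_text`, `w_bg`, `w_low`, and `w_high` are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar DWT of an array with even height and width.

    Returns (LL, (LH, HL, HH)): the low-frequency band and three
    high-frequency detail bands, each half the input size.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (low frequency)
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, (lh, hl, hh)


def wavelet_decoupled_kd_loss(student_prob, teacher_prob, gt_mask,
                              w_text=1.0, w_bg=1.0, w_low=1.0, w_high=1.0):
    """Hypothetical distillation loss between two probability maps.

    Assumed design: mask each map into text and background regions,
    transform each region with the Haar DWT, and sum L1 distances
    over the low- and high-frequency bands.
    """
    text = gt_mask.astype(np.float64)
    bg = 1.0 - text
    loss = 0.0
    for region, w_region in ((text, w_text), (bg, w_bg)):
        # Decouple the target: keep only one region before the transform.
        s_ll, s_high = haar_dwt2(student_prob * region)
        t_ll, t_high = haar_dwt2(teacher_prob * region)
        low = np.abs(s_ll - t_ll).mean()
        high = sum(np.abs(s - t).mean() for s, t in zip(s_high, t_high))
        loss += w_region * (w_low * low + w_high * high)
    return float(loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.random((8, 8))   # stand-in for a teacher probability map
    student = rng.random((8, 8))   # stand-in for a student probability map
    mask = np.zeros((8, 8))
    mask[2:6, 2:6] = 1             # toy ground-truth text region
    print(wavelet_decoupled_kd_loss(student, teacher, mask))
```

In a real training setup the same computation would be written with differentiable tensor operations so that gradients reach the student; the separate region and band weights are included only to show where text/background and low/high-frequency terms could be balanced.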

Author information

Authors and Affiliations

Xidian University, Xi’an, China
Kefan Qu & Wangpeng He
CVTE Research, Guangzhou, China
Jianmin Lin, Jinrong Li & Ming Yang

Corresponding authors

Correspondence to Jianmin Lin or Wangpeng He.

Editor information

Editors and Affiliations

Dalian University of Technology, Dalian, China
Huchuan Lu
University of Sydney, Sydney, NSW, Australia
Wanli Ouyang
Shenzhen University, Shenzhen, China
Hui Huang
Tsinghua University, Beijing, China
Jiwen Lu
Dalian University of Technology, Dalian, China
Risheng Liu
Institute of Automation, CAS, Beijing, China
Jing Dong
University of Technology Sydney, Sydney, NSW, Australia
Min Xu

Cite this paper

Qu, K., Lin, J., Li, J., Yang, M., He, W. (2023). Wavelet Knowledge Distillation via Decoupled Target for Scene Text Detection. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14357. Springer, Cham. https://doi.org/10.1007/978-3-031-46311-2_13

DOI: https://doi.org/10.1007/978-3-031-46311-2_13
Published: 29 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46310-5
Online ISBN: 978-3-031-46311-2
eBook Packages: Computer Science, Computer Science (R0)