Wavelet Knowledge Distillation via Decoupled Target for Scene Text Detection

  • Conference paper
Image and Graphics (ICIG 2023)

Abstract

In this paper, we investigate knowledge distillation for training a compact student model for scene text detection, guided by a cumbersome teacher model that is too computationally expensive to deploy on resource-constrained devices. We observe that the frequency-domain information of the response map differs markedly between the teacher and student models, and that this difference can guide the student model toward more useful knowledge. Building on this observation, we propose a wavelet knowledge distillation method via decoupled target for training accurate, compact scene text detection networks. Specifically, we first use the discrete wavelet transform to decompose the probability map into frequency bands that carry different characteristic components, and transfer knowledge in the high-frequency and low-frequency bands separately. In addition, we decouple the target by separating text and background regions with the ground-truth mask, which strengthens the distillation effect in each region. Extensive experiments demonstrate that our method consistently improves the F-measure of the student model and outperforms other mainstream distillation methods.
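
The two ideas in the abstract, band-wise transfer after a discrete wavelet transform and region-wise decoupling with the ground-truth mask, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the loss formulation, so the choice of a single-level Haar wavelet, the L1 distance, the order of masking and transforming, and every name and weight below (`haar_dwt2`, `band_l1`, `decoupled_wavelet_loss`, `w_text`, `w_bg`, `w_low`, `w_high`) are assumptions made for illustration only.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    H and W must be even. Returns the low-frequency band followed by
    three high-frequency (detail) bands, each of shape (H/2, W/2).
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0
    detail_1 = (a + b - c - d) / 2.0
    detail_2 = (a - b + c - d) / 2.0
    detail_3 = (a - b - c + d) / 2.0
    return low, detail_1, detail_2, detail_3


def band_l1(student, teacher):
    """Mean absolute difference in the low band and in the detail bands."""
    low_s, *high_s = haar_dwt2(student)
    low_t, *high_t = haar_dwt2(teacher)
    low = np.abs(low_s - low_t).mean()
    high = np.mean([np.abs(hs - ht).mean() for hs, ht in zip(high_s, high_t)])
    return low, high


def decoupled_wavelet_loss(student, teacher, text_mask,
                           w_text=1.0, w_bg=1.0, w_low=1.0, w_high=1.0):
    """Hypothetical loss: mask maps into text and background regions,
    then compare teacher and student band by band in each region.

    All weights are illustrative assumptions, not values from the paper.
    """
    total = 0.0
    for region, w_region in ((text_mask, w_text), (1.0 - text_mask, w_bg)):
        low, high = band_l1(student * region, teacher * region)
        total += w_region * (w_low * low + w_high * high)
    return float(total)
```

A call would pass two probability maps of equal, even-sized shape and a binary mask that is 1 inside text regions, for example `decoupled_wavelet_loss(student_prob, teacher_prob, gt_mask)`. Separate weights per region and per band are one simple way to let text and background, and low and high frequencies, contribute differently to the distillation signal.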

Author information

Corresponding authors

Correspondence to Jianmin Lin or Wangpeng He.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qu, K., Lin, J., Li, J., Yang, M., He, W. (2023). Wavelet Knowledge Distillation via Decoupled Target for Scene Text Detection. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14357. Springer, Cham. https://doi.org/10.1007/978-3-031-46311-2_13

  • DOI: https://doi.org/10.1007/978-3-031-46311-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46310-5

  • Online ISBN: 978-3-031-46311-2

  • eBook Packages: Computer Science (R0)
