
Natural Language Integration for Multimodal Few-Shot Class-Incremental Learning: Image Classification Problem

  • Conference paper
  • First Online:
Knowledge Management and Acquisition for Intelligent Systems (PKAW 2024)

Abstract

Few-Shot Class-Incremental Learning (FSCIL) suffers from two main problems: (1) the difficulty of optimizing the model on the few-shot samples of new classes, and (2) catastrophic forgetting of previously learned classes, caused by the limited ability to reuse samples from earlier training sessions. To address these problems in the image classification task, we propose improving the input feature of the classification layer by integrating a visual-semantic network that projects an image feature onto a sentence-embedding feature. The network applies a pre-trained language model to image-text descriptions to distill multi-modal prior knowledge; the projected feature is called the distilled-Word-Embeddings (dWE). We conducted experiments on the benchmark open dataset CUB-200-2011 to compare the effects of three features: (1) the image feature as a baseline, (2) the sentence-embedding feature obtained from image descriptions, and (3) our proposed dWE feature. We found that combining multiple features outperformed using a single feature of either type. Compared with the baseline, combining the image feature with dWE improved the average accuracy over all sessions from 48.73% to 49.94%. The average rate-of-change (ROC) of classification accuracy per session was used to evaluate catastrophic forgetting. The ROC improved from −6.15% for the baseline to −3.16% with dWE and to −2.99% with the combination of the image feature and dWE, which amounts to more than a 40% improvement over the baseline. Moreover, the combination of the image feature and dWE gave a higher ROC than most previous FSCIL techniques.
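Although the paper's implementation is not included here, the following minimal PyTorch sketch illustrates the kind of visual-semantic projection the abstract describes: a small projection head maps a backbone image feature to the dimensionality of a sentence-embedding target (for example, a Sentence-BERT encoding of the image's text description), a cosine distillation loss pulls the projected feature (dWE) toward that target, and the classifier consumes the concatenation of the image feature and dWE. All module names, feature dimensions, and the specific loss are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the visual-semantic projection described in the abstract.
# Assumptions (not from the paper): feature sizes, a cosine distillation loss,
# and concatenation of the image feature with dWE as the classifier input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticProjector(nn.Module):
    """Projects a backbone image feature onto the sentence-embedding space."""
    def __init__(self, img_dim=512, text_dim=384, hidden_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(img_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, text_dim),
        )

    def forward(self, img_feat):
        # The projected feature plays the role of the distilled-Word-Embeddings (dWE).
        return self.proj(img_feat)

class MultimodalClassifier(nn.Module):
    """Classifier fed with the concatenation of the image feature and dWE."""
    def __init__(self, img_dim=512, text_dim=384, num_classes=200):
        super().__init__()
        self.projector = VisualSemanticProjector(img_dim, text_dim)
        self.head = nn.Linear(img_dim + text_dim, num_classes)

    def forward(self, img_feat):
        dwe = self.projector(img_feat)
        logits = self.head(torch.cat([img_feat, dwe], dim=-1))
        return logits, dwe

# Toy forward/backward pass with random tensors standing in for
# (a) backbone image features and (b) sentence embeddings of image descriptions.
model = MultimodalClassifier()
img_feat = torch.randn(8, 512)       # assumed output of a frozen image backbone
sbert_target = torch.randn(8, 384)   # assumed output of a pre-trained language model
labels = torch.randint(0, 200, (8,))

logits, dwe = model(img_feat)
distill_loss = 1.0 - F.cosine_similarity(dwe, sbert_target, dim=-1).mean()
cls_loss = F.cross_entropy(logits, labels)
(cls_loss + distill_loss).backward()
```

Under the paper's evaluation, the per-session ROC would then be computed from the session-by-session change in classification accuracy; the sketch above covers only the feature-construction step, not the incremental training schedule.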


Notes

  1. The Program Management Unit for Human Resources & Institutional Development, Research and Innovation (PMU-B) [grant number B04G640107].


Acknowledgements

The first author acknowledges the Faculty’s Quota Scholarship awarded by Sirindhorn International Institute of Technology, Thammasat University.

Author information


Corresponding author

Correspondence to Natsuda Kaothanthong.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Temniranrat, P., Kaothanthong, N., Marukatat, S. (2025). Natural Language Integration for Multimodal Few-Shot Class-Incremental Learning: Image Classification Problem. In: Wu, S., Su, X., Xu, X., Kang, B.H. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2024. Lecture Notes in Computer Science, vol 15372. Springer, Singapore. https://doi.org/10.1007/978-981-96-0026-7_12


  • DOI: https://doi.org/10.1007/978-981-96-0026-7_12

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0025-0

  • Online ISBN: 978-981-96-0026-7

  • eBook Packages: Computer Science, Computer Science (R0)
