Abstract
Few-Shot Class-Incremental Learning (FSCIL) suffers from two main problems: 1) the difficulty of optimizing the model on the few-shot samples of new classes, and 2) the catastrophic forgetting of previously learned classes, caused by the limited ability to reuse samples from earlier training. To address these problems in the image classification task, we propose improving the input feature of the classification layer by integrating a visual-semantic network that projects an image feature onto a sentence-embeddings feature. The network applies a pre-trained language model and image-text descriptions to distill multi-modal prior knowledge. We call the projected feature distilled-Word-Embeddings (dWE). We conducted experiments on the benchmark open dataset CUB-200-2011 to compare the effects of three features: 1) the image feature as a baseline, 2) the sentence-embeddings feature taken from image descriptions, and 3) our proposed dWE feature. We found that combining multiple features outperformed any single feature of the same type. Compared with the baseline, combining the image feature with dWE improved the average accuracy over all sessions from 48.73% to 49.94%. The average rate-of-change (ROC) of the classification accuracy per session was employed to evaluate catastrophic forgetting. The ROC improved from the baseline's −6.15% to −3.16% with dWE and to −2.99% with the combination of the image feature and dWE, an improvement of more than 40% over the baseline. Moreover, the combination of the image feature and dWE gave a higher ROC than most previous FSCIL techniques.
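As a minimal sketch of the ROC metric described in the abstract, the average rate-of-change can be computed as the mean of the session-to-session differences in accuracy. The function name and the example accuracies below are illustrative assumptions, not values from the paper:

```python
def average_roc(session_accs):
    """Average rate-of-change of classification accuracy across
    incremental sessions, in percentage points per session.

    A negative value indicates forgetting; values closer to zero
    mean the model retains old-class performance better.
    """
    deltas = [b - a for a, b in zip(session_accs, session_accs[1:])]
    return sum(deltas) / len(deltas)

# Hypothetical per-session accuracies (%) over four sessions:
print(average_roc([70.0, 64.0, 60.0, 57.0]))  # mean of (-6, -4, -3) = -13/3
```

Under this reading, a baseline ROC of −6.15% means accuracy drops about six percentage points each time a new session of classes is added, so the improvement to −2.99% halves the per-session forgetting.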
Notes
1. The Program Management Unit for Human Resources & Institutional Development, Research and Innovation (PMU-B) [grant number B04G640107].
Acknowledgements
The first author acknowledges the Faculty's Quota Scholarship awarded by Sirindhorn International Institute of Technology, Thammasat University.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Temniranrat, P., Kaothanthong, N., Marukatat, S. (2025). Natural Language Integration for Multimodal Few-Shot Class-Incremental Learning: Image Classification Problem. In: Wu, S., Su, X., Xu, X., Kang, B.H. (eds.) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2024. Lecture Notes in Computer Science, vol. 15372. Springer, Singapore. https://doi.org/10.1007/978-981-96-0026-7_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0025-0
Online ISBN: 978-981-96-0026-7
eBook Packages: Computer Science, Computer Science (R0)