Abstract
Few-Shot Class-Incremental Learning (FSCIL) suffers from two main problems: 1) the difficulty of optimizing the model on the few-shot samples of new classes, and 2) the catastrophic forgetting of previously learned classes, caused by the limited ability to reuse samples from earlier training. To address these problems in the image classification task, we propose improving the input feature of the classification layer by integrating a visual-semantic network that projects an image feature onto a sentence-embeddings feature. The network applies a pre-trained language model and image-text descriptions to distill multi-modal prior knowledge. We call the projected feature distilled-Word-Embeddings (dWE). We conducted experiments on the benchmark open dataset CUB-200-2011 to compare the effects of three features: 1) the image feature as a baseline, 2) the sentence-embeddings feature taken from image descriptions, and 3) our proposed dWE feature. We found that combining multiple features outperformed any single feature of the same type. Compared with the baseline, combining the image feature with dWE improved the average accuracy over all sessions from 48.73% to 49.94%. The average rate-of-change (ROC) of the classification accuracy per session was employed to evaluate catastrophic forgetting. The ROC improved from the baseline's −6.15% to −3.16% with dWE and to −2.99% with the combination of the image feature and dWE, an improvement of more than 40% over the baseline. Moreover, the combination of the image feature and dWE gave a higher ROC than most previous FSCIL techniques.
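As a minimal sketch of the ROC metric described in the abstract, the average rate-of-change can be computed as the mean of the session-to-session differences in accuracy. The function name and the example accuracies below are illustrative assumptions, not values from the paper:

```python
def average_roc(session_accs):
    """Average rate-of-change of classification accuracy across
    incremental sessions, in percentage points per session.

    A negative value indicates forgetting; values closer to zero
    mean the model retains old-class performance better.
    """
    deltas = [b - a for a, b in zip(session_accs, session_accs[1:])]
    return sum(deltas) / len(deltas)

# Hypothetical per-session accuracies (%) over four sessions:
print(average_roc([70.0, 64.0, 60.0, 57.0]))  # mean of (-6, -4, -3) = -13/3
```

Under this reading, a baseline ROC of −6.15% means accuracy drops about six percentage points each time a new session of classes is added, so the improvement to −2.99% halves the per-session forgetting.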
Notes
1. The Program Management Unit for Human Resources & Institutional Development, Research and Innovation (PMU-B) [grant number B04G640107].
Acknowledgements
The first author acknowledges the Faculty's Quota Scholarship awarded by Sirindhorn International Institute of Technology, Thammasat University.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Temniranrat, P., Kaothanthong, N., Marukatat, S. (2025). Natural Language Integration for Multimodal Few-Shot Class-Incremental Learning: Image Classification Problem. In: Wu, S., Su, X., Xu, X., Kang, B.H. (eds.) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2024. Lecture Notes in Computer Science, vol. 15372. Springer, Singapore. https://doi.org/10.1007/978-981-96-0026-7_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0025-0
Online ISBN: 978-981-96-0026-7
eBook Packages: Computer Science, Computer Science (R0)