Abstract
Facial expression recognition (FER) suffers from insufficient label information: human expressions are complex and diverse, and many expressions are ambiguous. Using labels of low quality or low quantity aggravates the ambiguity of model predictions and reduces FER accuracy. Improving the robustness of FER to ambiguous data with insufficient information remains challenging. To this end, we propose the Suppressing Ambiguity Self-Training (SAST) framework, the first attempt to address the problem of insufficient information in both label quality and label quantity simultaneously. Specifically, we design an Ambiguous Relative Label Usage (ARLU) strategy that mixes hard labels and soft labels to alleviate the information loss caused by hard labels. We also enhance the robustness of the model to ambiguous data by means of Self-Training Resampling (STR). We further use facial landmarks and a Patch Branch (PB) to strengthen the ability to suppress ambiguity. Experiments on the RAF-DB, FERPlus, SFEW, and AffectNet datasets show that SAST outperforms six semi-supervised methods with fewer annotations and achieves accuracy competitive with state-of-the-art (SOTA) FER methods. Our code is available at https://github.com/Liuxww/SAST.
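The core idea of mixing hard and soft labels can be sketched as a convex combination of a one-hot annotation and a predicted class distribution. This is a minimal illustration, not the paper's exact ARLU formulation; the blending weight `alpha` and the seven-class example distribution are hypothetical.

```python
import numpy as np

def mix_labels(hard_label, soft_dist, alpha=0.7):
    """Blend a one-hot hard label with a predicted soft distribution.

    hard_label: integer class index from the annotation.
    soft_dist:  model-predicted probability vector over the classes.
    alpha:      hypothetical weight on the hard label (1.0 = hard only).
    """
    soft_dist = np.asarray(soft_dist, dtype=float)
    one_hot = np.zeros_like(soft_dist)
    one_hot[hard_label] = 1.0
    mixed = alpha * one_hot + (1.0 - alpha) * soft_dist
    return mixed / mixed.sum()  # renormalize to a valid distribution

# Example: an ambiguous face annotated "happy" (index 3) whose prediction
# also places mass on "surprise" (index 5).
soft = [0.02, 0.03, 0.05, 0.55, 0.05, 0.25, 0.05]
mixed = mix_labels(3, soft, alpha=0.7)
```

Training against `mixed` (e.g., with a cross-entropy loss over distributions) keeps the annotated class dominant while preserving the ambiguity information that a pure hard label would discard.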







Data Availability Statement
The data that support the findings of this study are openly available in public repositories:
• RAF-DB: http://whdeng.cn/RAF/model1.html/data-set
• SFEW: https://cs.anu.edu.au/few/emotiw2015.html
• FERPlus: https://github.com/microsoft/FERPlus
• AffectNet: http://mohammadmahoor.com/affectnet/
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62071384 and 62371399, the Key Research and Development Project of Shaanxi Province under Grant 2023-YBGY-239, and Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2023-JC-YB-531.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, Z., Wei, B., Liu, X. et al. SAST: a suppressing ambiguity self-training framework for facial expression recognition. Multimed Tools Appl 83, 56059–56076 (2024). https://doi.org/10.1007/s11042-023-17749-w