Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal

Conference paper in: Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14169)

Abstract

An accurate and substantial dataset is essential for training a reliable and well-performing model. However, even manually annotated datasets contain label errors, not to mention automatically labeled ones. Previous methods for label denoising have primarily focused on detecting outliers and permanently removing them – a process that is likely to over- or under-filter the dataset. In this work, we propose AGRA: a new method for learning with noisy labels by using Adaptive GRAdient-based outlier removal (we share our code at https://github.com/anasedova/AGRA). Instead of cleaning the dataset prior to model training, the dataset is dynamically adjusted during the training process. By comparing the aggregated gradient of a batch of samples with the gradient of an individual example, our method dynamically decides whether that example is helpful for the model at this point or is counter-productive and should be left out of the current update. An extensive evaluation on several datasets demonstrates AGRA’s effectiveness, while a comprehensive analysis of the results supports our initial hypothesis: permanent hard outlier removal is not always what the model benefits from most.
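To make the gradient-comparison idea from the abstract concrete, here is a minimal PyTorch sketch of one training step that keeps an example only if its individual gradient has positive cosine similarity with the aggregated gradient of a separate comparison batch. The function name, the `comp_x`/`comp_y` arguments, and the zero threshold are illustrative assumptions; for the authors' actual implementation, see the linked repository.

```python
import torch
import torch.nn.functional as F


def agra_style_step(model, loss_fn, optimizer, batch_x, batch_y, comp_x, comp_y):
    """One training step with adaptive gradient-based sample filtering.

    A hypothetical sketch of the idea described in the abstract: an example
    is retained for the current update only if its individual gradient has
    positive cosine similarity with the aggregated gradient of a comparison
    batch (comp_x, comp_y). Not the authors' implementation.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(loss):
        # Gradient of a scalar loss w.r.t. all parameters, flattened.
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.reshape(-1) for g in grads])

    # Aggregated gradient over the comparison batch.
    comp_grad = flat_grad(loss_fn(model(comp_x), comp_y))

    # Keep only examples whose gradient points in a similar direction.
    keep = []
    for i in range(batch_x.size(0)):
        g_i = flat_grad(loss_fn(model(batch_x[i:i + 1]), batch_y[i:i + 1]))
        if F.cosine_similarity(g_i, comp_grad, dim=0) > 0:  # assumed threshold
            keep.append(i)

    if not keep:  # every example was judged counter-productive this round
        return None

    # Standard update on the retained examples only.
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x[keep]), batch_y[keep])
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the comparison batch is sampled independently of the update batch, so the filtering decision is not dominated by the very examples being judged; nothing is removed permanently, and an example excluded in one step can be used again in a later one.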

A. Sedova and L. Zellinger contributed equally.

Notes

  1. The subscript \(x_t\) is omitted in the short-hand notation \(sim_{y_t}\) for brevity. A sketch of one plausible form of this similarity is given after this list.

  2. The appendix is available at the following link: https://github.com/anasedova/AGRA/raw/main/appendix.pdf.

  3. The reports are not publicly accessible; for the training data, only the noisy labels are available, and no gold labels are provided.

  4. The appendix is available at the following link: https://github.com/anasedova/AGRA/raw/main/appendix.pdf.

  5. AGRA can also be used with any PyTorch-compatible deep model, as our method has no model-related limitations.

  6. The appendix is available at the following link: https://github.com/anasedova/AGRA/raw/main/appendix.pdf.

  7. The AUROC was computed on the nine classes that have more than one positive observation in the test set.

  8. The weak supervision baselines cannot be run on CIFAR since it is not a weakly supervised dataset; they also cannot be run on CheXpert, as we do not have access to the labeling function matches. Furthermore, CORES\(^2\) is not applicable to CheXpert, as it does not support multi-label settings.

  9. The appendix is available at the following link: https://github.com/anasedova/AGRA/raw/main/appendix.pdf.
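The body of the paper is not reproduced on this page, so the exact definition behind footnote 1 is unavailable here. A plausible reconstruction, assuming the similarity is the cosine between a per-example gradient and the aggregated gradient of a comparison batch (the batch notation \(\tilde{B}\) and the averaging are assumptions, not quoted from the paper), would be:

```latex
% Assumed reconstruction, not quoted from the paper: sim_{y_t} as the
% cosine similarity between the gradient for a single example (x_t, y_t)
% and the aggregated gradient over a comparison batch \tilde{B}.
\[
  sim_{y_t}
  = \cos\!\Big(
      \nabla_{\theta}\,\ell\big(f_{\theta}(x_t),\, y_t\big),\;
      \frac{1}{|\tilde{B}|} \sum_{(x,\,y) \in \tilde{B}}
        \nabla_{\theta}\,\ell\big(f_{\theta}(x),\, y\big)
    \Big)
\]
```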

Acknowledgement

This research has been funded by the Vienna Science and Technology Fund (WWTF)[10.47379/VRG19008] “Knowledge-infused Deep Learning for Natural Language Processing”.

Author information

Corresponding author

Correspondence to Anastasiia Sedova.

Ethics declarations

Ethical Statement

Our method can improve model predictions and produce more useful results, but we cannot guarantee that those predictions are correct, especially in life-critical domains such as healthcare. Training data can contain biases that machine learning methods may pick up, so care is needed when deploying such models in real applications. We relied on datasets that were already published and did not hire anyone to annotate them for our work.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sedova, A., Zellinger, L., Roth, B. (2023). Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_14

  • DOI: https://doi.org/10.1007/978-3-031-43412-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43411-2

  • Online ISBN: 978-3-031-43412-9

  • eBook Packages: Computer Science, Computer Science (R0)
