Abstract
Self-supervised contrastive learning (CL) has achieved state-of-the-art performance in representation learning by minimizing the distance between positive pairs while maximizing the distance between negative ones. Recent work has verified that the model learns better representations from diversely augmented positive pairs, because such pairs make the model more view-invariant. However, only a few studies on CL have considered the difference between augmented views, and they have not gone beyond hand-crafted heuristics. In this paper, we first observe that the score-matching function can measure how much data has been changed from the original by augmentation. Using this property, every pair in CL can be weighted adaptively by the difference of its score values, which boosts performance. We show the generality of our method, referred to as ScoreCL, by consistently improving various CL methods (SimCLR, SimSiam, W-MSE, and VICReg) by up to 3%p in image classification on the CIFAR and ImageNet datasets. Moreover, we conduct exhaustive experiments and ablations, including results on diverse downstream tasks, comparisons with possible baselines, and further applications in combination with other augmentation methods. We hope our exploration inspires more research on exploiting score matching for CL.
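The pair-weighting idea described above can be sketched in code. The snippet below is an illustrative assumption, not the paper's exact formulation: `score_gap_weight` stands in for the score-matching-based measure (here, the gap between the norms of two views' score estimates), and the weight is applied to the positive-pair term of a SimCLR-style NT-Xent loss. All function names and the specific weighting form are hypothetical.

```python
import numpy as np

def score_gap_weight(s1, s2, eps=1e-8):
    """Hypothetical per-pair weight: views whose score estimates differ more
    (i.e., were changed more unevenly by augmentation) get a larger weight."""
    gap = np.abs(np.linalg.norm(s1, axis=1) - np.linalg.norm(s2, axis=1))
    return 1.0 + gap / (gap.mean() + eps)  # normalized so weights stay O(1)

def ntxent_weighted(z1, z2, w, tau=0.5):
    """NT-Xent (SimCLR) loss where sample i's positive-pair term is scaled by w[i]."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)          # (2N, d) embeddings
    sim = z @ z.T / tau                           # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive index
    # log-probability of picking the positive among all other samples
    logprob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    ww = np.concatenate([w, w])                   # same weight for both views
    return -(ww * logprob).mean()

# Usage with random stand-ins for embeddings and score estimates
rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 16))
z2 = z1 + 0.1 * rng.normal(size=(4, 16))          # mildly perturbed positives
s1, s2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
loss = ntxent_weighted(z1, z2, score_gap_weight(s1, s2))
```

In the unweighted case (`w = 1` everywhere) this reduces to the standard NT-Xent objective; the adaptive weights simply emphasize pairs whose views differ more under the score measure.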








Data availability
No datasets were generated or analysed during the current study.
Notes
In this paper, we use CL to refer to contrastive learning and related methods that model image similarity and dissimilarity (or similarity only) between two or more augmented image views, encompassing siamese networks and joint-embedding methods.
Author information
Authors and Affiliations
Contributions
JY Kim, S Kwon, and H Go contributed equally to this work. All authors reviewed the manuscript.
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Editor: Mingming Gong.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kim, JY., Kwon, S., Go, H. et al. ScoreCL: augmentation-adaptive contrastive learning via score-matching function. Mach Learn 114, 12 (2025). https://doi.org/10.1007/s10994-024-06707-8