Region-Adaptive Transform with Segmentation Prior for Image Compression

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Learned Image Compression (LIC) has made remarkable progress in recent years. Existing works commonly employ CNN-based or Transformer-based modules as transforms for compression. However, no prior research on neural transforms focuses on specific regions. In response, we introduce class-agnostic segmentation masks (i.e., semantic masks without category labels) to extract region-adaptive contextual information. Our proposed module, Region-Adaptive Transform, applies adaptive convolutions to different regions guided by the masks. Additionally, we introduce a plug-and-play module named Scale Affine Layer to incorporate rich contexts from various regions. Although prior image compression efforts have used segmentation masks as additional intermediate inputs, our approach differs significantly from them. To avoid extra bitrate overhead, we treat these masks as privilege information, which is accessible during the model training stage but not required during the inference phase. To the best of our knowledge, we are the first to employ class-agnostic masks as privilege information and achieve superior performance in pixel-fidelity metrics, such as Peak Signal-to-Noise Ratio (PSNR). The experimental results demonstrate an improvement over previously well-performing methods, with about 8.2% bitrate saving compared to VTM-17.0. The source code is available at https://github.com/GityuxiLiu/SegPIC-for-Image-Compression.
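To make the idea of mask-guided adaptive convolution concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single-channel feature map, an integer mask of region ids, and one hand-picked 3×3 kernel per region; the function names `conv3x3_same` and `region_adaptive_conv` are hypothetical. How the paper generates its kernels, and how it avoids needing masks at inference, is not covered here.

```python
import numpy as np

def conv3x3_same(x, k):
    """3x3 cross-correlation with zero padding; output has the shape of x."""
    h, w = x.shape
    padded = np.pad(x, 1)  # zero padding of one pixel on every side
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def region_adaptive_conv(x, mask, kernels):
    """Filter every pixel with the kernel assigned to its region id.

    Illustrative assumption: kernels[r] is the kernel for region r.
    """
    out = np.zeros_like(x, dtype=float)
    for region_id, k in enumerate(kernels):
        # Filter the whole map with this region's kernel, keep only its pixels.
        out = np.where(mask == region_id, conv3x3_same(x, k), out)
    return out

# Toy example: the left half is region 0, the right half is region 1.
x = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=int)
mask[:, 2:] = 1

identity = np.zeros((3, 3))
identity[1, 1] = 1.0               # region 0: leave pixels unchanged
box = np.full((3, 3), 1.0 / 9.0)   # region 1: 3x3 box blur

y = region_adaptive_conv(x, mask, [identity, box])
```

In this toy setup the left half of `y` equals the input and the right half is smoothed, which shows how a class-agnostic mask can select a different filter for each region without knowing what the regions contain.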

Notes

  1. “Class-agnostic” means distinguishing different objects in an image regardless of what those objects are. Such masks can be regarded as a generalized representation of partition maps widely used in conventional codecs.

Acknowledgements

This work was supported in part by the National Key R&D Program of China (No. 2021ZD0112100), the National NSF of China (No. 62120106009), the Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515010454), the Basic and Frontier Research Project of PCL, and the Major Key Project of PCL.

Author information

Corresponding author

Correspondence to Yao Zhao.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y., Yang, W., Bai, H., Wei, Y., Zhao, Y. (2025). Region-Adaptive Transform with Segmentation Prior for Image Compression. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15104. Springer, Cham. https://doi.org/10.1007/978-3-031-72952-2_11

  • DOI: https://doi.org/10.1007/978-3-031-72952-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72951-5

  • Online ISBN: 978-3-031-72952-2

  • eBook Packages: Computer Science, Computer Science (R0)