Abstract
Learned Image Compression (LIC) has shown remarkable progress in recent years. Existing works commonly employ CNN-based or Transformer-based modules as transforms for compression. However, no prior work has explored neural transforms that focus on specific regions. In response, we introduce class-agnostic segmentation masks (i.e., semantic masks without category labels) to extract region-adaptive contextual information. Our proposed module, the Region-Adaptive Transform, applies adaptive convolutions to different regions under the guidance of the masks. Additionally, we introduce a plug-and-play module named the Scale Affine Layer to incorporate rich contexts from various regions. Although prior image compression works have used segmentation masks as additional intermediate inputs, our approach differs significantly from them: to avoid extra bitrate overhead, we treat the masks as privilege information, which is accessible during training but not required during inference. To the best of our knowledge, we are the first to employ class-agnostic masks as privilege information and achieve superior performance on pixel-fidelity metrics such as Peak Signal-to-Noise Ratio (PSNR). Experimental results demonstrate improvements over previously well-performing methods, with about 8.2% bitrate savings compared to VTM-17.0. The source code is available at https://github.com/GityuxiLiu/SegPIC-for-Image-Compression.
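To make the idea of a mask-guided, region-adaptive transform concrete, the sketch below shows one plausible way to apply a different convolution kernel inside each mask region: pool the features of each region, predict a region-specific depthwise kernel from that pooled context, and write the result back only inside that region. This is a minimal PyTorch illustration under our own assumptions, not the paper's implementation; the module name RegionAdaptiveConv, the linear kernel generator, and all shapes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionAdaptiveConv(nn.Module):
    """Illustrative mask-guided adaptive convolution (not the paper's code).

    Each region of a class-agnostic mask contributes a pooled context
    vector, which is mapped to a per-region depthwise kernel.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Predict one depthwise kernel per region from its pooled features.
        self.kernel_gen = nn.Linear(channels, channels * kernel_size * kernel_size)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x:    (1, C, H, W) feature map
        # mask: (1, H, W) integer map, one arbitrary ID per region
        out = torch.zeros_like(x)
        pad = self.kernel_size // 2
        for region_id in mask.unique():
            region = (mask == region_id).float().unsqueeze(1)  # (1, 1, H, W)
            # Masked global average pooling over this region.
            context = (x * region).sum(dim=(2, 3)) / region.sum().clamp(min=1.0)
            weight = self.kernel_gen(context).view(
                self.channels, 1, self.kernel_size, self.kernel_size)
            # Depthwise convolution with the region-specific kernel,
            # kept only inside the region it was generated for.
            y = F.conv2d(x, weight, padding=pad, groups=self.channels)
            out = out + y * region
        return out


# Toy usage: 64-channel features with three arbitrary regions.
x = torch.randn(1, 64, 32, 32)
mask = torch.randint(0, 3, (1, 32, 32))
y = RegionAdaptiveConv(64)(x, mask)  # y.shape == (1, 64, 32, 32)
```

Because the masks act as privilege information in the setting described above, a branch like this would only rely on masks on the training path; at inference time no mask is transmitted, and how the paper removes that dependency is not reproduced here.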
Notes
- 1.
“Class-agnostic” means distinguishing different objects in an image without identifying what those objects are. Such masks can be regarded as a generalized form of the partition maps widely used in conventional codecs (see the small example after this note).
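As a concrete illustration of the distinction, the snippet below builds a toy class-agnostic mask: an integer map whose IDs separate regions but carry no category meaning. The array values are made up purely for exposition.

```python
import numpy as np

# A toy 4x4 class-agnostic mask with three regions and arbitrary integer IDs.
# A semantic mask would instead assign category labels (e.g. "sky", "car");
# here the IDs only say "these pixels belong to the same object/region".
class_agnostic_mask = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
])

# Swapping region IDs (e.g. 1 <-> 2) describes the same partition,
# much like the block-partition maps used in conventional codecs.
num_regions = len(np.unique(class_agnostic_mask))  # 3
```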
Acknowledgements
This work was supported in part by the National Key R&D Program of China (No. 2021ZD0112100), the National NSF of China (No. 62120106009), the Guangdong Basic and Applied Basic Research Foundation (2024A1515010454), the Basic and Frontier Research Project of PCL, and the Major Key Project of PCL.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Yang, W., Bai, H., Wei, Y., Zhao, Y. (2025). Region-Adaptive Transform with Segmentation Prior for Image Compression. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15104. Springer, Cham. https://doi.org/10.1007/978-3-031-72952-2_11
DOI: https://doi.org/10.1007/978-3-031-72952-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72951-5
Online ISBN: 978-3-031-72952-2
eBook Packages: Computer Science, Computer Science (R0)