Abstract
Learned Image Compression (LIC) has shown remarkable progress in recent years. Existing works commonly employ CNN-based or Transformer-based modules as transforms for compression. However, no prior work has explored neural transforms that focus on specific regions. In response, we introduce class-agnostic segmentation masks (i.e., semantic masks without category labels) to extract region-adaptive contextual information. Our proposed module, the Region-Adaptive Transform, applies adaptive convolutions to different regions under the guidance of the masks. Additionally, we introduce a plug-and-play module named the Scale Affine Layer to incorporate rich contexts from various regions. Although prior image compression works have used segmentation masks as additional intermediate inputs, our approach differs significantly from them: to avoid extra bitrate overhead, we treat the masks as privilege information, which is accessible during training but not required during inference. To the best of our knowledge, we are the first to employ class-agnostic masks as privilege information and achieve superior performance on pixel-fidelity metrics such as Peak Signal-to-Noise Ratio (PSNR). Experimental results demonstrate improvements over previously well-performing methods, with about 8.2% bitrate savings compared to VTM-17.0. The source code is available at https://github.com/GityuxiLiu/SegPIC-for-Image-Compression.
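To make the idea of a mask-guided, region-adaptive transform concrete, the sketch below shows one plausible way to apply a different convolution kernel inside each mask region: pool the features of each region, predict a region-specific depthwise kernel from that pooled context, and write the result back only inside that region. This is a minimal PyTorch illustration under our own assumptions, not the paper's implementation; the module name RegionAdaptiveConv, the linear kernel generator, and all shapes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionAdaptiveConv(nn.Module):
    """Illustrative mask-guided adaptive convolution (not the paper's code).

    Each region of a class-agnostic mask contributes a pooled context
    vector, which is mapped to a per-region depthwise kernel.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Predict one depthwise kernel per region from its pooled features.
        self.kernel_gen = nn.Linear(channels, channels * kernel_size * kernel_size)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x:    (1, C, H, W) feature map
        # mask: (1, H, W) integer map, one arbitrary ID per region
        out = torch.zeros_like(x)
        pad = self.kernel_size // 2
        for region_id in mask.unique():
            region = (mask == region_id).float().unsqueeze(1)  # (1, 1, H, W)
            # Masked global average pooling over this region.
            context = (x * region).sum(dim=(2, 3)) / region.sum().clamp(min=1.0)
            weight = self.kernel_gen(context).view(
                self.channels, 1, self.kernel_size, self.kernel_size)
            # Depthwise convolution with the region-specific kernel,
            # kept only inside the region it was generated for.
            y = F.conv2d(x, weight, padding=pad, groups=self.channels)
            out = out + y * region
        return out


# Toy usage: 64-channel features with three arbitrary regions.
x = torch.randn(1, 64, 32, 32)
mask = torch.randint(0, 3, (1, 32, 32))
y = RegionAdaptiveConv(64)(x, mask)  # y.shape == (1, 64, 32, 32)
```

Because the masks act as privilege information in the setting described above, a branch like this would only rely on masks on the training path; at inference time no mask is transmitted, and how the paper removes that dependency is not reproduced here.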
Notes
- 1.
“Class-agnostic” means distinguishing different objects in an image without identifying what those objects are. Such masks can be regarded as a generalized form of the partition maps widely used in conventional codecs (see the small example after this note).
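As a concrete illustration of the distinction, the snippet below builds a toy class-agnostic mask: an integer map whose IDs separate regions but carry no category meaning. The array values are made up purely for exposition.

```python
import numpy as np

# A toy 4x4 class-agnostic mask with three regions and arbitrary integer IDs.
# A semantic mask would instead assign category labels (e.g. "sky", "car");
# here the IDs only say "these pixels belong to the same object/region".
class_agnostic_mask = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
])

# Swapping region IDs (e.g. 1 <-> 2) describes the same partition,
# much like the block-partition maps used in conventional codecs.
num_regions = len(np.unique(class_agnostic_mask))  # 3
```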
Acknowledgements
This work was supported in part by the National Key R&D Program of China (No. 2021ZD0112100), the National NSF of China (No. 62120106009), the Guangdong Basic and Applied Basic Research Foundation (2024A1515010454), the Basic and Frontier Research Project of PCL, and the Major Key Project of PCL.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Yang, W., Bai, H., Wei, Y., Zhao, Y. (2025). Region-Adaptive Transform with Segmentation Prior for Image Compression. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15104. Springer, Cham. https://doi.org/10.1007/978-3-031-72952-2_11
DOI: https://doi.org/10.1007/978-3-031-72952-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72951-5
Online ISBN: 978-3-031-72952-2
eBook Packages: Computer Science, Computer Science (R0)