Abstract
In recent years, generative adversarial networks have successfully synthesized images from text descriptions. However, several problems remain: the generated image may fail to capture the semantics of the text description deeply, the target object in the generated image may be incomplete, and its texture structure may lack richness. We therefore propose a network framework, the cross-domain feature fusion generative adversarial network (CF-GAN), which comprises two modules, a feature fusion-enhanced response module (FFERM) and a multi-branch residual module (MBRM), to refine the generated images through deep fusion. FFERM deeply integrates word-level text features with image features. MBRM is a simple yet novel residual structure that replaces the traditional residual module to extract features more fully. We conducted experiments on the CUB and COCO datasets: compared with AttnGAN, the Inception Score on CUB improves from 4.36 to 4.83 (a 10.78% increase), and compared with DM-GAN, the Inception Score on COCO improves from 30.49 to 31.13 (a 2.06% increase). Extensive experiments and ablation studies demonstrate the superiority of the proposed CF-GAN over other methods.
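For intuition only, below is a minimal PyTorch sketch of the two generic patterns the abstract names: cross-attention fusion of word-level text features with image features (in the spirit of FFERM) and a multi-branch residual block (in the spirit of MBRM). The paper's exact module definitions are not given in this abstract, so all layer shapes, channel widths, and kernel sizes here are illustrative assumptions, not CF-GAN's actual configuration.

```python
# Hypothetical sketch of the two patterns named in the abstract; the real
# FFERM/MBRM definitions are in the paper, not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordImageFusion(nn.Module):
    """Fuses word-level text features into image features via cross-attention,
    one attention distribution over words per spatial location (cf. FFERM)."""
    def __init__(self, img_channels: int, word_dim: int):
        super().__init__()
        # Project word embeddings into the image feature space.
        self.project = nn.Conv1d(word_dim, img_channels, kernel_size=1)

    def forward(self, img_feat, word_feat):
        # img_feat: (B, C, H, W); word_feat: (B, D, T) word embeddings
        b, c, h, w = img_feat.shape
        words = self.project(word_feat)                   # (B, C, T)
        queries = img_feat.view(b, c, h * w)              # (B, C, HW)
        attn = torch.bmm(queries.transpose(1, 2), words)  # (B, HW, T)
        attn = F.softmax(attn, dim=-1)                    # weights over words
        context = torch.bmm(words, attn.transpose(1, 2))  # (B, C, HW)
        # Residually fuse the attended text context into the image features.
        return img_feat + context.view(b, c, h, w)

class MultiBranchResidualBlock(nn.Module):
    """Parallel conv branches with different receptive fields, summed with
    the identity shortcut, replacing a plain residual block (cf. MBRM)."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)

    def forward(self, x):
        return F.relu(x + self.branch3(x) + self.branch5(x))

# Usage: fuse 18 word vectors (256-d) into a 64x64, 128-channel feature map.
img = torch.randn(2, 128, 64, 64)
words = torch.randn(2, 256, 18)
fused = WordImageFusion(128, 256)(img, words)
out = MultiBranchResidualBlock(128)(fused)
print(out.shape)  # torch.Size([2, 128, 64, 64])
```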
References
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press (1995)
Chen, X., Qing, L., He, X., Luo, X., Xu, Y.: Ftgan: a fully-trained generative adversarial networks for text to face generation. arXiv preprint arXiv:1904.05729 (2019)
Dash, A., Gamboa, J.C.B., Ahmed, S., Liwicki, M., Afzal, M.Z.: Tac-gan: text conditioned auxiliary classifier generative adversarial network. arXiv preprint arXiv:1703.06412 (2017)
Dong, H., Yu, S., Wu, C., Guo, Y.: Semantic image synthesis via adversarial learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5706–5714 (2017)
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. arXiv preprint arXiv:1406.2661 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Li, B., Qi, X., Lukasiewicz, T., Torr, P.H.: Controllable text-to-image generation. arXiv preprint arXiv:1909.07083 (2019)
Li, B., Qi, X., Lukasiewicz, T., Torr, P.H.: Manigan: text-guided image manipulation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7880–7889 (2020)
Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., Gao, J.: Object-driven text-to-image synthesis via adversarial training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12174–12182 (2019)
Li, Y., Gan, Z., Shen, Y., Liu, J., Cheng, Y., Wu, Y., Carin, L., Carlson, D., Gao, J.: Storygan: a sequential conditional gan for story visualization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6329–6338 (2019)
Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)
Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J.: Plug & play generative networks: conditional iterative generation of images in latent space. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4467–4477 (2017)
Ni, J., Zhang, S., Zhou, Z., Hou, J., Gao, F.: Instance mask embedding and attribute-adaptive generative adversarial network for text-to-image synthesis. IEEE Access 8, 37697–37711 (2020)
Qiao, T., Zhang, J., Xu, D., Tao, D.: Mirrorgan: learning text-to-image generation by redescription. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1505–1514 (2019)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
Reed, S., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H.: Learning what and where to draw. arXiv preprint arXiv:1610.02454 (2016)
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: International Conference on Machine Learning, pp. 1060–1069. PMLR (2016)
Ripley, B.D.: Pattern Recognition and Neural Networks. Cambridge University Press (2007)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. arXiv preprint arXiv:1606.03498 (2016)
Shi, C., Pun, C.M.: Adaptive multi-scale deep neural networks with perceptual loss for panchromatic and multispectral images classification. Inf. Sci. 490, 1–17 (2019)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Tan, F., Feng, S., Ordonez, V.: Text2scene: generating compositional scenes from textual descriptions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6710–6719 (2019)
Tan, H., Liu, X., Li, X., Zhang, Y., Yin, B.: Semantics-enhanced adversarial nets for text-to-image synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10501–10510 (2019)
Tao, M., Tang, H., Wu, S., Sebe, N., Wu, F., Jing, X.Y.: Df-gan: deep fusion generative adversarial networks for text-to-image synthesis. arXiv preprint arXiv:2008.05865 (2020)
Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S-PLUS. Springer Science & Business Media (2013)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset (2011)
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X.: Attngan: fine-grained text to image generation with attentional generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1316–1324 (2018)
Yang, Y., Wang, L., Xie, D., Deng, C., Tao, D.: Multi-sentence auxiliary adversarial networks for fine-grained text-to-image synthesis. IEEE Trans. Image Process. 30, 2798–2809 (2021)
Yin, G., Liu, B., Sheng, L., Yu, N., Wang, X., Shao, J.: Semantics disentangling for text-to-image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2327–2336 (2019)
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.N.: Stackgan: text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915 (2017)
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.N.: Stackgan++: realistic image synthesis with stacked generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1947–1962 (2018)
Zhang, Z., Xie, Y., Yang, L.: Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6199–6208 (2018)
Zhang, Z., Zhou, J., Yu, W., Jiang, N.: Drawgan: text to image synthesis with drawing generative adversarial networks. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4195–4199. IEEE (2021)
Zhu, M., Pan, P., Chen, W., Yang, Y.: Dm-gan: dynamic memory generative adversarial networks for text-to-image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5802–5810 (2019)
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhang, Y., Han, S., Zhang, Z. et al. CF-GAN: cross-domain feature fusion generative adversarial network for text-to-image synthesis. Vis Comput 39, 1283–1293 (2023). https://doi.org/10.1007/s00371-022-02404-6