Abstract
Image processing plays an essential role in daily life, professional work, and education, spanning tasks such as image reconstruction, inpainting, super-resolution, colorization, and editing. In recent years, models based on Generative Adversarial Networks (GANs) have demonstrated remarkable image-synthesis capabilities, making the direct application of these models to image processing an active research direction. Within this context, GAN inversion has emerged as a pivotal paradigm. This paper studies image inversion based on the latent space of GAN models. To address the limitations of existing GAN inversion methods, we introduce three innovations. First, in place of the convolutional generators used in existing GAN inversion techniques, we employ generators composed entirely of fully connected layers, dispensing with spatial convolutions and with information propagation across pixels. Second, we exploit the generator's conditionally independent pixel synthesis by fusing feature maps from adjacent levels of a feature pyramid network during feature extraction. Third, our framework is versatile: beyond image reconstruction, it extends to image inpainting, super-resolution, and image colorization. Experiments on the CelebFaces Attributes-HQ (CelebA-HQ) dataset demonstrate that GAN inversion built on conditionally independent pixel synthesis yields superior reconstruction results and adapts readily to image inpainting, super-resolution, and image colorization. These advances open new avenues for image processing.
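The abstract's first innovation, a generator built entirely from fully connected layers that synthesizes each pixel conditionally independently, can be illustrated with a minimal sketch. This is not the paper's actual architecture; the function name, layer sizes, and (untrained) random weights are illustrative assumptions. It only shows the computation pattern: one shared MLP maps a pixel's (x, y) coordinate plus a shared latent code to that pixel's RGB value, with no convolution and no information flow between pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixel_mlp_generator(latent, height, width, hidden=32):
    """Synthesize an image pixel-by-pixel with one shared fully connected
    network. Each pixel's colour depends only on its own coordinate and
    the shared latent code, so pixels are conditionally independent."""
    latent_dim = latent.shape[0]
    # Untrained random weights, just to demonstrate the forward pass.
    w1 = rng.standard_normal((2 + latent_dim, hidden)) * 0.1
    w2 = rng.standard_normal((hidden, 3)) * 0.1
    # Normalised pixel coordinate grid in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(-1, 2)        # (H*W, 2)
    z = np.broadcast_to(latent, (coords.shape[0], latent_dim))  # share latent
    x = np.concatenate([coords, z], axis=-1)                    # per-pixel input
    h = np.maximum(0.0, x @ w1)                                 # ReLU hidden layer
    rgb = 1.0 / (1.0 + np.exp(-(h @ w2)))                       # sigmoid -> [0, 1]
    return rgb.reshape(height, width, 3)

img = pixel_mlp_generator(rng.standard_normal(8), 16, 16)
print(img.shape)  # (16, 16, 3)
```

Because every pixel is computed by the same row-wise matrix products, the image can be rendered at any resolution by changing the coordinate grid, which is what makes this family of generators attractive for super-resolution.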
Availability of data and materials
All datasets used in this manuscript are publicly available and in the public domain.
Acknowledgements
This work was supported by the Natural Science Foundation of Fujian Province (Grant Nos. 2021J011086, 2023J01964, 2023J01965, 2023J01966), by the External Collaboration Project of the Science and Technology Department of Fujian Province (Grant No. 2023I0025), by the Fujian Province Chinese Academy of Sciences STS Program Supporting Project (Grant Nos. 2023T3084, 2023T3088), by the Guidance Project of the Science and Technology Department of Fujian Province (Grant No. 2023H0017), by the Qimai Science and Technology Innovation Project of Wuping County, by the Xinluo District Industry-University-Research Science and Technology Joint Innovation Project (Grant Nos. 2022XLXYZ002, 2022XLXYZ004), and by the Special Project of the Ministry of Education's Higher Education Science Research and Development Center on "Innovative Applications of Virtual Simulation Technology in Vocational Education Teaching" (Grant No. ZJXF2022278).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical approval
Ethical approval and informed consent were not required for this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, C., Sun, X., Tian, Z. et al. Exploring conditional pixel-independent generation in GAN inversion for image processing. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18395-6