Abstract
Image processing plays an essential role in daily life, professional work, and education, spanning tasks such as image reconstruction, inpainting, super-resolution, colorization, and editing. In recent years, models based on Generative Adversarial Networks (GANs) have demonstrated remarkable image-synthesis capabilities, making the direct application of these models to image processing an active research direction. Within this context, GAN inversion has emerged as a pivotal paradigm. This paper studies image inversion based on the latent space of GAN models. To address the limitations of existing GAN inversion methods, we introduce three innovations. First, in place of the convolutional generators used in existing GAN inversion techniques, we employ generators composed entirely of fully connected layers, dispensing with spatial convolutions and with information propagation across pixels. Second, we exploit the generator's conditionally independent pixel synthesis by fusing feature maps from adjacent levels of a feature pyramid network during feature extraction. Third, our framework is versatile: beyond image reconstruction, it extends to image inpainting, super-resolution, and image colorization. Experiments on the CelebFaces Attributes-HQ (CelebA-HQ) dataset demonstrate that GAN inversion built on conditionally independent pixel synthesis yields superior reconstruction results and adapts readily to image inpainting, super-resolution, and image colorization. These advances open new avenues for image processing.
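The abstract's first innovation, a generator built entirely from fully connected layers that synthesizes each pixel conditionally independently, can be illustrated with a minimal sketch. This is not the paper's actual architecture; the function name, layer sizes, and (untrained) random weights are illustrative assumptions. It only shows the computation pattern: one shared MLP maps a pixel's (x, y) coordinate plus a shared latent code to that pixel's RGB value, with no convolution and no information flow between pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixel_mlp_generator(latent, height, width, hidden=32):
    """Synthesize an image pixel-by-pixel with one shared fully connected
    network. Each pixel's colour depends only on its own coordinate and
    the shared latent code, so pixels are conditionally independent."""
    latent_dim = latent.shape[0]
    # Untrained random weights, just to demonstrate the forward pass.
    w1 = rng.standard_normal((2 + latent_dim, hidden)) * 0.1
    w2 = rng.standard_normal((hidden, 3)) * 0.1
    # Normalised pixel coordinate grid in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(-1, 2)        # (H*W, 2)
    z = np.broadcast_to(latent, (coords.shape[0], latent_dim))  # share latent
    x = np.concatenate([coords, z], axis=-1)                    # per-pixel input
    h = np.maximum(0.0, x @ w1)                                 # ReLU hidden layer
    rgb = 1.0 / (1.0 + np.exp(-(h @ w2)))                       # sigmoid -> [0, 1]
    return rgb.reshape(height, width, 3)

img = pixel_mlp_generator(rng.standard_normal(8), 16, 16)
print(img.shape)  # (16, 16, 3)
```

Because every pixel is computed by the same row-wise matrix products, the image can be rendered at any resolution by changing the coordinate grid, which is what makes this family of generators attractive for super-resolution.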
Availability of data and materials
All datasets used in this manuscript are publicly available and in the public domain.
Acknowledgements
This work was supported by the Natural Science Foundation of Fujian Province (Grant Nos. 2021J011086, 2023J01964, 2023J01965, 2023J01966), by the External Collaboration Project of the Science and Technology Department of Fujian Province (Grant No. 2023I0025), by the Fujian Province Chinese Academy of Sciences STS Program Supporting Project (Grant Nos. 2023T3084, 2023T3088), by the Guidance Project of the Science and Technology Department of Fujian Province (Grant No. 2023H0017), by the Qimai Science and Technology Innovation Project of Wuping County, by the Xinluo District Industry-University-Research Science and Technology Joint Innovation Project (Grant Nos. 2022XLXYZ002, 2022XLXYZ004), and by the Special Project of the Ministry of Education's Higher Education Science Research and Development Center on "Innovative Applications of Virtual Simulation Technology in Vocational Education Teaching" (Grant No. ZJXF2022278).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical approval
Ethical approval and informed consent were not required for this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, C., Sun, X., Tian, Z. et al. Exploring conditional pixel-independent generation in GAN inversion for image processing. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18395-6